Jul 15th, 2018

Architectures:

- Singlecore (one core)
- Simultaneous Multithreading or Hyper-Threading (multiple hardware threads per core)
- Multiprocessor (has multiple processors)
- Multicore (example: dual core or quad core)

Multithreaded Programming:

- Processes
  - Memory mapping
  - Memory protection
- Threads
  - Several inside the same process
  - Share the process's memory

Synchronisation:

- Critical sections
  - Faster because they only work inside the same process
- Semaphores
  - Thread-safe waitable counter
  - This counter can be incremented (signal) and decremented (wait)
- Mutexes
  - Semaphore with a maximum value of 1
  - Critical section that can cross process boundaries (slower)
- Events
  - Signaled or non-signaled
  - Manual reset or auto reset
- Message queues
  - Thread-safe queue of commands/messages to another thread
  - Sometimes native support (e.g. Windows)
  - Unidirectional, so two are needed for bidirectional communication
  - Receive vs send, blocking or non-blocking
- Lockless
  - Synchronising without explicit waiting or locking
  - Based on atomic operations
  - InterlockedXxx functions (Windows)
  - Look out for:
    - CPU read/write reordering
      - Fix with barriers / volatile (platform-specific)
    - Compiler instruction reordering
      - Fix with compiler directives / volatile (platform-specific)
- Deadlock
  - Two or more threads each holding a lock the other is waiting for, so none can continue
- Memory architectures: SMP and NUMA
  - SMP = Symmetric Multiprocessing
    - Each processor accesses all RAM through the same bus
    - Bus contention becomes the bottleneck, so it does not scale well beyond 8 to 12 processors
    - Examples: Xbox 360, PS3 (PPU side of the bus) and the Intel Xeon
  - NUMA = Non-Uniform Memory Access
    - Node = one or more processors with local memory + their own fast way of accessing it
    - Nodes can access all other nodes' memory, but slower than their own
    - Scales well with more nodes
    - Needs NUMA-aware code to properly take advantage of this
    - Examples: AMD Opteron, Intel Itanium and the PS3 (SPUs with their local memory)

Debugging multithreaded code:

- Best practice if possible: write a singlethreaded version first, port it to multithreaded later
- Problems with debugging multithreaded code:
  - Non-deterministic behaviour
  - Deadlocks
  - Race conditions
  - Performance problems
    - Synchronisation overhead
    - Data sharing
    - False sharing (you think you are not sharing anything, but unrelated variables sit on the same cache line, which slows you down because the line keeps bouncing between cores)
  - Bugs are hard to reproduce

- Tips and tricks for debugging multithreaded code:
  - Tools such as:
    - Visual Studio
    - Intel Thread Checker
    - Intel Thread Profiler
  - Tracers
    - Provide a run-log of your app so you can see what happened
    - "Pair tracing" to catch synchronisation bugs

- Multithreaded design
  - Functional: split off entire subsystems (physics in one thread, AI in another thread)
    - Less scalable to more cores
    - Easy to convert existing linear code to this
    - Easy to debug
  - Job- or data-driven design:
    - Split one task into more independent jobs that each work on a part of the data at the same time
    - With support for job dependencies, jobs from different tasks can be spread better over all cores
    - Better scalability on more cores
    - Harder to debug
    - Requires a different way of thinking for the programmer (paradigm shift) — see FINAL FANTASY 13
    - More code overhead for setting up the jobs

- Job system
  - Big jobs vs small jobs:
    - Do small ones if you can do it without too much overhead
    - Big ones if you can't
    - Context can determine the choice (example: raycasting)
  - Not everything is suited for the job system
    - I/O that makes the CPU wait is better off in a separate background thread instead of a job (so it doesn't make the job system idle); context switching is acceptable here
  - Middleware and job systems
    - Most middleware uses its own multithreading
      - Has overhead
      - Less optimal use of all cores
    - A shared job system is ideal (such as the PS3 SPURS system)

Fastest multithreading (fastest first):

1. No synchronisation whatsoever
2. Lockless (with barriers)
3. Critical sections
4. Semaphores, mutexes, events

Fast multithreading: performance dangers

- Data sharing (continuously invalidating cache lines by reading/writing a shared variable all the time)
  - Fix by working with copies if possible
- False sharing (continuously invalidating cache lines by reading/writing unrelated variables that sit in the same cache line)
  - Fix by aligning independent variables on cache-line granularity