- Architectures:
- - Singlecore (one core)
- - Simultaneous Multithreading or Hyperthreading (Multiple threads per core)
- - Multiprocessor (Has multiple processors)
- - Multicore (example Dual Core or Quad Core)
- Multithreaded Programming:
- - Processes
- - Each has its own memory mapping (address space)
- - Memory protection between processes
- - Threads
- - Several inside same process
- - Share process memory
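A minimal sketch of threads sharing their process's memory, assuming standard C++ threads. The helper name `fill_in_parallel` is made up for illustration. Every thread writes into the same vector, but each one owns its own slice, so no locking is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: several threads fill one vector that they all share.
// Each thread owns a contiguous slice, so no two threads write the same element.
std::vector<int> fill_in_parallel(std::size_t size, unsigned thread_count) {
    assert(thread_count > 0);
    std::vector<int> shared(size, 0);  // one buffer in the process's memory
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < thread_count; ++t) {
        const std::size_t begin = size * t / thread_count;
        const std::size_t end = size * (t + 1) / thread_count;
        workers.emplace_back([&shared, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                shared[i] = static_cast<int>(i) * 2;
        });
    }
    for (std::thread& w : workers) w.join();  // after join, all writes are visible
    return shared;
}
```

Calling `fill_in_parallel(100, 4)` returns a vector in which element `i` holds `2 * i`, written by four different threads.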
- Synchronisation:
- - Critical Sections
- - Faster because they only work within a single process
- - Semaphores
- - Thread safe waitable counter
- - This counter can be incremented (signal) and decremented (wait)
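The waitable counter described above can be sketched with a mutex and a condition variable. This is an illustrative version, not any particular platform's semaphore. The names `Semaphore` and `consume_all` are assumptions made for the example.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Illustrative counting semaphore: signal() increments, wait() decrements
// and blocks while the counter is zero.
class Semaphore {
public:
    explicit Semaphore(int initial) : count_(initial) { assert(initial >= 0); }

    void signal() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        cv_.notify_one();  // wake one waiter, if any
    }

    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int count_;
};

// Hypothetical demo: the main thread signals `items` times and a consumer
// thread waits the same number of times.
int consume_all(int items) {
    Semaphore available(0);
    int consumed = 0;
    std::thread consumer([&] {
        for (int i = 0; i < items; ++i) {
            available.wait();
            ++consumed;
        }
    });
    for (int i = 0; i < items; ++i) available.signal();
    consumer.join();
    return consumed;
}
```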
- - Mutexes
- - Semaphore with a max value of 1
- - Critical section that can go across process boundaries (slower)
- - Events
- - Signaled or non-signaled
- - Manual reset or auto reset
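A rough sketch of the difference between manual-reset and auto-reset events, built on a mutex and a condition variable. The class `Event` and its `try_wait` helper are hypothetical and are not the Win32 event API.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Illustrative event object. A manual-reset event stays signaled until
// reset() is called; an auto-reset event clears itself when one waiter
// gets through.
class Event {
public:
    explicit Event(bool manual_reset) : manual_reset_(manual_reset) {}

    void set() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;
        }
        cv_.notify_all();
    }

    void reset() {
        std::lock_guard<std::mutex> lock(mutex_);
        signaled_ = false;
    }

    // Blocks until the event is signaled.
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return signaled_; });
        if (!manual_reset_) signaled_ = false;  // auto reset: only one waiter passes
    }

    // Non-blocking variant: returns true if the event was signaled.
    bool try_wait() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!signaled_) return false;
        if (!manual_reset_) signaled_ = false;
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signaled_ = false;
    const bool manual_reset_;
};
```

With a manual-reset event, `try_wait()` keeps returning true after `set()` until `reset()` is called. With an auto-reset event, only the first `try_wait()` after `set()` returns true.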
- - Message queues
- - Thread safe queue of commands/messages to other thread
- - Sometimes natively supported (e.g. Windows)
- - Unidirectional, so two are needed for bidirectional communication
- - Receive vs Send, blocking or non-blocking
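One way such a queue might look, assuming standard C++ primitives rather than a native OS queue. `MessageQueue` and `sum_via_queue` are names made up for this sketch. It is unidirectional, with a blocking `receive` and a non-blocking `try_receive`.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// Illustrative thread-safe, one-directional message queue.
template <typename T>
class MessageQueue {
public:
    void send(T message) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(message));
        }
        cv_.notify_one();
    }

    // Blocking receive: waits until a message is available.
    T receive() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        T message = std::move(queue_.front());
        queue_.pop_front();
        return message;
    }

    // Non-blocking receive: returns false if the queue is empty.
    bool try_receive(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<T> queue_;
};

// Hypothetical demo: a worker thread adds up the numbers it receives and
// stops at a negative sentinel value.
int sum_via_queue(int n) {
    MessageQueue<int> commands;
    int total = 0;
    std::thread worker([&] {
        for (;;) {
            const int value = commands.receive();
            if (value < 0) break;
            total += value;
        }
    });
    for (int i = 1; i <= n; ++i) commands.send(i);
    commands.send(-1);  // tell the worker to stop
    worker.join();
    return total;
}
```

For replies from the worker, a second queue in the opposite direction would be needed.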
- - Lockless
- - Synchronising without explicit waiting or locking
- - Based on atomic operations
- - InterlockedXXX functions
- - Look out for
- - CPU read/write reordering
- - Barriers, volatile (platform specific)
- - Compiler instruction reordering
- - Compiler directives, volatile (platform specific)
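A small sketch of lockless synchronisation using `std::atomic`, which is the portable C++ counterpart of the Interlocked-style functions. The function names are assumptions made for illustration. The first function is an atomic counter. The second uses release/acquire ordering as the barrier that keeps the CPU and the compiler from reordering the payload write past the flag. Note that in standard C++ `volatile` alone does not provide this ordering.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Several threads bump one counter without any lock.
int count_with_atomics(int thread_count, int increments) {
    assert(thread_count >= 0 && increments >= 0);
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&counter, increments] {
            for (int i = 0; i < increments; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);  // atomic increment
        });
    }
    for (std::thread& w : workers) w.join();
    return counter.load();
}

// Publish plain data through an atomic flag. The release store and the
// acquire load act as barriers: the reader cannot see ready == true
// without also seeing the payload written before it.
int publish_and_read(int value) {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};
    int seen = -1;
    std::thread reader([&] {
        while (!ready.load(std::memory_order_acquire))
            std::this_thread::yield();
        seen = payload;
    });
    payload = value;
    ready.store(true, std::memory_order_release);
    reader.join();
    return seen;
}
```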
- - Deadlock
- - Two or more threads each hold a lock that another one is waiting for, so none can continue
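A sketch of one common way to avoid the deadlock described above, assuming two locks that different threads would otherwise take in opposite order. `Account`, `transfer` and `run_opposing_transfers` are hypothetical names. `std::lock` acquires both mutexes with a deadlock-avoidance algorithm, so the order of the arguments does not matter.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical shared object guarded by its own mutex.
struct Account {
    explicit Account(int initial) : balance(initial) {}
    std::mutex mutex;
    int balance;
};

// Needs both locks. Taking them one after the other in caller order could
// deadlock when two threads transfer in opposite directions.
void transfer(Account& from, Account& to, int amount) {
    if (&from == &to) return;
    std::lock(from.mutex, to.mutex);  // acquires both without deadlocking
    std::lock_guard<std::mutex> from_guard(from.mutex, std::adopt_lock);
    std::lock_guard<std::mutex> to_guard(to.mutex, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

// Two threads transfer in opposite directions; returns the combined balance.
int run_opposing_transfers(int rounds) {
    Account a(1000);
    Account b(1000);
    std::thread t1([&] { for (int i = 0; i < rounds; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < rounds; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

The other common fix is to always take the locks in one fixed global order.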
- - Memory Architectures: SMP and NUMA
- - SMP = Symmetric Multi Processing
- - Each processor accesses all RAM through the same bus
- - Bus contention becomes the bottleneck, so it does not scale well beyond 8 to 12 processors
- - Examples: Xbox 360, PS3 (PPU access on the SPU bus) and the Intel Xeon
- - NUMA = Non-Uniform Memory Access
- - Node = one or more processors with local memory and their own fast path to it
- - Nodes can access every other node's memory, but slower than their own
- - Scales well with more nodes
- - Needs NUMA-aware code to take proper advantage of this
- - Examples: AMD Opteron, Intel Itanium and the PS3 (SPU vs SPU)
- Debugging Multithreaded code:
- - Best practice if possible: write a single-threaded version first, port to multithreaded later
- - Problems with debugging multithreaded code:
- - Non-deterministic
- - Deadlocks
- - Race conditions
- - Performance problems
- - Synchronisation overhead
- - Data sharing
- - False sharing (threads write to unrelated variables that happen to sit in the same cache line, so the line keeps getting invalidated and re-fetched, which slows you down)
- - Hard to reproduce the bug
- - Tips and tricks for debugging multithreaded code:
- - Tools such as
- - Visual Studio
- - Intel Thread Checker
- - Intel Thread Profiler
- - Tracers
- - Provide run-log of your app so you can see what happened
- - "Pair Tracing" to catch synch bugs
- - Multithreaded Design
- - Functional: split by entire functionality (physics in one thread, AI in another)
- - Less scalable to more cores
- - Easy to convert existing linear code to this
- - Easy to debug
- - Job or data driven design:
- - Split one task into several independent jobs that each work on a part of the data at the same time
- - Jobs can declare dependencies, so that jobs from different tasks can be spread better over all cores
- - Better scalability on more cores
- - Harder to debug
- - Requires a different way of thinking from the programmer (paradigm shift)
- - More code overhead for the setup of the jobs
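A minimal sketch of the data-driven split, assuming the task is summing an array. `parallel_sum` is a made-up name. For brevity it starts one thread per job, whereas a real job system would normally feed jobs to a fixed pool of worker threads.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Splits one task (summing `data`) into `jobs` independent jobs, each
// working on its own part of the data at the same time.
long long parallel_sum(const std::vector<int>& data, unsigned jobs) {
    assert(jobs > 0);
    std::vector<long long> partial(jobs, 0);  // one result slot per job
    std::vector<std::thread> workers;
    for (unsigned j = 0; j < jobs; ++j) {
        const std::size_t begin = data.size() * j / jobs;
        const std::size_t end = data.size() * (j + 1) / jobs;
        workers.emplace_back([&data, &partial, j, begin, end] {
            long long local = 0;  // accumulate locally, write the slot once
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            partial[j] = local;
        });
    }
    for (std::thread& w : workers) w.join();
    // The final combine step depends on all jobs having finished.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

The setup code around the loop is the extra overhead the notes mention: the work itself is one line, while the rest is splitting, launching and combining.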
- - Job system
- - Big jobs against small jobs:
- - Use small jobs if you can do so without too much overhead
- - Use big ones if you can't
- - Context can determine choice (example: raycasting)
- - Not everything is good for the job system
- - I/O that makes the CPU wait is better off in a separate background thread than in a job (so it doesn't leave the job system idle). Context switching is acceptable here.
- - Middleware and jobsystems
- - Most middleware uses its own multithreading
- - Has overhead
- - Less optimal use of all cores
- - A shared job system is ideal (such as the PS3 SPURS system)
- Fastest multithreading (from fastest to slowest)
- 1 No synchronisation whatsoever
- 2 Lockless (with barriers)
- 3 Critical sections
- 4 Semaphores, mutexes, events
- Fast multithreading: Performance dangers
- - Data sharing (continuously invalidating cache lines because threads read and write a shared variable all the time)
- - Fix by working with copies if possible
- - False sharing (continuously invalidating cache lines because threads read and write unrelated variables in the same cache line all the time)
- - Fix this by aligning independent variables on cache-line granularity
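A sketch of the alignment fix, assuming a 64-byte cache line (a common size, but platform specific, so check the target hardware). `PaddedCounter`, `kCacheLine`, `kThreads` and the function name are made up for illustration. Each thread gets a counter on its own cache line, so writes by one thread do not invalidate the lines used by the others.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kCacheLine = 64;  // assumption: 64-byte cache lines
constexpr unsigned kThreads = 4;

// alignas pads the struct out to a full cache line, so neighbouring
// array elements never share a line.
struct alignas(kCacheLine) PaddedCounter {
    long value = 0;
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");

long count_without_false_sharing(long increments) {
    std::array<PaddedCounter, kThreads> counters{};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < kThreads; ++t) {
        workers.emplace_back([&counters, t, increments] {
            for (long i = 0; i < increments; ++i)
                ++counters[t].value;  // touches only this thread's cache line
        });
    }
    for (std::thread& w : workers) w.join();
    long total = 0;
    for (const PaddedCounter& c : counters) total += c.value;
    return total;
}
```

Without the `alignas`, the four `long` counters would sit next to each other in a single cache line, which is exactly the false-sharing case described above.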