Architectures:

- Single core (one core)
- Simultaneous multithreading (SMT) or Hyper-Threading (multiple hardware threads per core)
- Multiprocessor (multiple physical processors)
- Multicore (multiple cores in one processor, for example dual core or quad core)

Multithreaded Programming:

- Processes
	- Memory mapping
	- Memory protection
- Threads
	- Several can run inside the same process
	- Share the process's memory
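
A minimal sketch of the last point, threads sharing the memory of their process, written with C++ `std::thread`. The helper name `square_in_threads` is made up for illustration; the notes do not prescribe any particular API.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: every thread squares one element of a vector that
// lives in the caller's memory. Nothing is copied between threads because
// all threads of a process see the same address space.
std::vector<int> square_in_threads(std::vector<int> values) {
    std::vector<std::thread> workers;
    workers.reserve(values.size());
    for (std::size_t i = 0; i < values.size(); ++i) {
        // Each thread touches only its own slot, so no lock is needed.
        workers.emplace_back([&values, i] { values[i] *= values[i]; });
    }
    for (std::thread& w : workers) {
        w.join();  // after join() the worker's write is visible here
    }
    return values;
}
```

Calling `square_in_threads({1, 2, 3, 4})` should give back the squares in the same order. One thread per element is only for clarity; real code would use far fewer threads.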

Synchronisation:

- Critical Sections
	- Faster than a mutex because they only work between threads of the same process
- Semaphores
	- Thread safe waitable counter
	- This counter can be incremented (signal) and decremented (wait)
- Mutexes
	- Semaphore with a max value of 1
	- Like a critical section, but can go across process boundaries (slower)
- Events
	- Signaled or non-signaled
	- Manual reset or auto reset
- Message queues
	- Thread-safe queue of commands/messages to another thread
	- Sometimes natively supported (e.g. Windows)
	- Unidirectional, so two are needed for bidirectional communication
	- Receive and send can each be blocking or non-blocking
- Lockless
	- Synchronising without explicit waiting or locking
	- Based on atomic operations
	- InterlockedXxx functions (Windows API, e.g. InterlockedIncrement)
	- Look out for
		- CPU read/write reordering
			- Barriers, volatile (platform specific)
		- Compiler instruction reordering
			- Compiler directives, volatile (platform specific)
- Deadlock
	- Two or more threads each hold a lock while waiting for a lock held by another, so none can continue
- Memory architectures: SMP and NUMA
	- SMP = Symmetric Multiprocessing
		- Each processor accesses all RAM through the same bus
		- Bus contention becomes the bottleneck, so it does not scale well beyond 8 to 12 processors
		- Examples: Xbox 360, PS3 (PPU on the SPU bus access) and the Intel Xeon
	- NUMA = Non-Uniform Memory Access
		- Node = one or more processors with local memory and their own fast way of accessing it
		- A node can access every other node's memory, but more slowly than its own
		- Scales well with more nodes
		- Needs NUMA-aware code to take proper advantage of this
		- Examples: AMD Opteron, Intel Itanium and the PS3 (SPU vs SPU)
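
Going back to the semaphore entry in the list above: one way to read "thread safe waitable counter" is a counter guarded by a mutex and a condition variable. The sketch below assumes portable C++11 rather than a native OS semaphore; `Semaphore` and `peak_concurrency` are illustrative names, not part of any library.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Illustrative counting semaphore: signal() increments, wait() decrements
// and blocks while the counter is zero.
class Semaphore {
public:
    explicit Semaphore(int initial) : count_(initial) {}

    void signal() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        cv_.notify_one();  // wake one waiter, if any
    }

    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int count_;
};

// Hypothetical demo: `threads` workers compete for `slots` permits.
// Returns the highest number of workers seen inside the guarded region.
int peak_concurrency(int slots, int threads) {
    Semaphore sem(slots);
    std::mutex stats_mutex;
    int inside = 0;
    int peak = 0;
    std::vector<std::thread> workers;
    for (int i = 0; i < threads; ++i) {
        workers.emplace_back([&] {
            sem.wait();
            {
                std::lock_guard<std::mutex> lock(stats_mutex);
                ++inside;
                if (inside > peak) peak = inside;
            }
            std::this_thread::yield();  // give other workers a chance to enter
            {
                std::lock_guard<std::mutex> lock(stats_mutex);
                --inside;
            }
            sem.signal();
        });
    }
    for (std::thread& w : workers) w.join();
    return peak;
}
```

With an initial value of 1 this behaves like the mutex described in the notes: `peak_concurrency(1, 4)` can never report more than one worker inside at a time.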

Debugging Multithreaded code:

- Best practice, if possible: write a single-threaded version first, port it to multithreaded later
- Problems when debugging multithreaded code:
	- Non-deterministic
	- Deadlocks
	- Race conditions
	- Performance problems
		- Synchronisation overhead
		- Data sharing
		- False sharing (unrelated variables sit in the same cache line without you realising it; every write on one core invalidates the line on the other cores, which then have to fetch it again, and that slows you down)
	- Hard to reproduce the bug

- Tips and tricks for debugging multithreaded code:
	- Tools such as
		- Visual Studio
		- Intel Thread Checker
		- Intel Thread Profiler
	- Tracers
		- Provide a run log of your app so you can see what happened
		- "Pair tracing" to catch synchronisation bugs
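
As a rough illustration of the tracer idea (a run log you can read after the fact), here is a small sketch. `Tracer`, `TraceEntry` and `trace_two_threads` are hypothetical names, and a mutex-protected vector is just one possible way to collect the log.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

struct TraceEntry {
    std::size_t sequence;    // global order in which the entry was recorded
    std::thread::id thread;  // which thread recorded it
    std::string message;
};

// Illustrative tracer: appends entries under a mutex so the log stays
// consistent even when several threads write at once.
class Tracer {
public:
    void log(std::string message) {
        std::lock_guard<std::mutex> lock(mutex_);
        entries_.push_back(
            {entries_.size(), std::this_thread::get_id(), std::move(message)});
    }

    std::vector<TraceEntry> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return entries_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<TraceEntry> entries_;
};

// Hypothetical demo: two threads each log a begin and an end message.
std::vector<TraceEntry> trace_two_threads() {
    Tracer tracer;
    std::thread a([&tracer] { tracer.log("A: begin"); tracer.log("A: end"); });
    std::thread b([&tracer] { tracer.log("B: begin"); tracer.log("B: end"); });
    a.join();
    b.join();
    return tracer.snapshot();
}
```

The interleaving of A and B entries can differ from run to run, which is exactly what the log is meant to reveal. Keep in mind that the lock inside `log` changes timing slightly, so a tracer can hide a bug as well as expose it.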

- Multithreaded Design
	- Functional: split by entire subsystem (physics in one thread, AI in another thread)
		- Less scalable to more cores
		- Easy to convert existing linear code to this
		- Easy to debug
	- Job or data driven design:
		- Split one task into several independent tasks that each work on a part of the data at the same time
		- Jobs support dependencies, so jobs from different tasks can be spread better over all cores
		- Better scalability on more cores
		- Harder to debug
		- Requires a different way of thinking from the programmer (paradigm shift). FINAL FANTASY 13!!!!!!11
		- More code overhead for the setup of the jobs
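
A small sketch of the data-driven split described above: one task (summing an array) is cut into independent pieces that each work on part of the data. `parallel_sum` is an illustrative name, and spawning one `std::thread` per piece is an assumption made to keep the example short; a job system would normally run the pieces.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sums `data` by splitting it into `task_count` chunks.
long long parallel_sum(const std::vector<int>& data, std::size_t task_count) {
    task_count = std::max<std::size_t>(1, task_count);
    std::vector<long long> partial(task_count, 0);  // one result slot per task
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + task_count - 1) / task_count;
    for (std::size_t t = 0; t < task_count; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        // Each task reads its own range and writes its own slot only.
        workers.emplace_back([&data, &partial, t, begin, end] {
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }
    for (std::thread& w : workers) w.join();
    // Combine step: the only part that depends on all the tasks.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

The extra setup code around the actual summing is the "code overhead" the list mentions: for small inputs it can easily cost more than the work itself.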

	- Job system
		- Big jobs versus small jobs:
			- Use small ones if you can do it without too much overhead
			- Use big ones if you can't
			- Context can determine choice (example: raycasting)
		- Not everything suits the job system
			- I/O that makes the CPU wait is better off in a separate background thread than in a job, so that it does not leave the job system idle. Context switching is acceptable here.
		- Middleware and job systems
			- Most middleware uses its own multithreading
				- Has overhead
				- Less optimal use of all cores
			- Shared job system is ideal (such as the PS3 SPURS system)
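
To make the job system idea more concrete, here is a deliberately small sketch: a fixed set of worker threads pulling jobs from one shared, thread-safe queue. `JobSystem` and `run_counting_jobs` are hypothetical names. This is not how SPURS or any particular middleware is implemented, and it leaves out job dependencies entirely.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Illustrative job system. Assumes worker_count > 0, otherwise queued jobs
// would never run.
class JobSystem {
public:
    explicit JobSystem(std::size_t worker_count) {
        for (std::size_t i = 0; i < worker_count; ++i) {
            workers_.emplace_back([this] { run(); });
        }
    }

    // Lets the workers drain the queue, then stops and joins them.
    ~JobSystem() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (std::thread& w : workers_) w.join();
    }

    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        wake_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // only reached when stopping_
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so other workers can fetch jobs
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;  // declared last: started after the rest
};

// Hypothetical demo: submit `jobs` tiny jobs and count how many ran.
int run_counting_jobs(std::size_t workers, int jobs) {
    std::atomic<int> done{0};
    {
        JobSystem system(workers);
        for (int i = 0; i < jobs; ++i) {
            system.submit([&done] { ++done; });
        }
    }  // destructor waits until every queued job has run
    return done.load();
}
```

Jobs as tiny as the ones in `run_counting_jobs` are dominated by queue overhead, which is the big-versus-small trade-off from the list. Blocking I/O inside a job would stall one of the workers, which is why the notes suggest a separate background thread for it.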

Fastest multithreading (fastest first):
	1 No synchronisation whatsoever
	2 Lockless (with barriers)
	3 Critical sections
	4 Semaphores, mutexes, events
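
To illustrate levels 2 and 3 of this ranking, the sketch below counts with an atomic increment (the portable C++ counterpart of the Interlocked functions) and with a lock, and then shows a release/acquire pair acting as the barrier that keeps reads and writes from being reordered. All function names are made up for the example, and no timing claims are made here; measure on your own platform.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Level 2: lockless counter based on an atomic read-modify-write.
int count_lockless(int threads, int increments) {
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, increments] {
            for (int i = 0; i < increments; ++i) {
                // Relaxed is enough for a pure counter; join() below
                // makes the final value visible to the caller.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (std::thread& w : workers) w.join();
    return counter.load();
}

// Level 3: the same counter protected by a lock.
int count_locked(int threads, int increments) {
    int counter = 0;
    std::mutex guard;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, &guard, increments] {
            for (int i = 0; i < increments; ++i) {
                std::lock_guard<std::mutex> lock(guard);
                ++counter;
            }
        });
    }
    for (std::thread& w : workers) w.join();
    return counter;
}

// Barrier example: the release store may not be reordered before the write
// to `payload`, and the acquire load may not be reordered after the read.
int publish_with_barrier() {
    int payload = 0;  // plain, non-atomic data
    std::atomic<bool> ready{false};
    std::thread producer([&payload, &ready] {
        payload = 42;
        ready.store(true, std::memory_order_release);
    });
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    const int seen = payload;  // safe: ordered after the producer's write
    producer.join();
    return seen;
}
```

Without the release/acquire pair (for example with a plain `bool` flag), the compiler or CPU would be free to reorder the accesses and the program would contain a data race.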

Fast multithreading: Performance dangers
	- Data sharing (cache lines are continuously invalidated because threads read and write a shared variable all the time)
		- Fix by working with copies where possible
	- False sharing (cache lines are continuously invalidated because threads read and write unrelated variables in the same cache line all the time)
		- Fix this by aligning independent variables on cache line boundaries
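
Both fixes can be sketched in a few lines of C++. The 64-byte cache line size is an assumption (common on x86, but platform specific), and `PaddedCounter` and `count_with_local_copies` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>

constexpr std::size_t kCacheLine = 64;  // assumption: check your target CPU

// Fix for false sharing: each counter gets a cache line of its own.
struct alignas(kCacheLine) PaddedCounter {
    long value = 0;
};

static_assert(sizeof(PaddedCounter) == kCacheLine,
              "padding should round the struct up to one cache line");

// Hypothetical demo: two threads count independently.
long count_with_local_copies(long increments) {
    PaddedCounter results[2];  // neighbours, but on separate cache lines
    auto work = [increments](PaddedCounter& out) {
        long local = 0;  // fix for data sharing: work on a private copy
        for (long i = 0; i < increments; ++i) {
            ++local;
        }
        out.value = local;  // a single write to the shared location
    };
    std::thread a(work, std::ref(results[0]));
    std::thread b(work, std::ref(results[1]));
    a.join();
    b.join();
    return results[0].value + results[1].value;
}
```

Without `alignas`, the two `long` values would most likely sit next to each other in one cache line, so any code that wrote to them repeatedly from different cores would keep bouncing that line between caches.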