- Thank you all for your feedback. I can't wait to try those suggestions over the weekend.
- Until then, my TODO:
- LeeHowes, vmiura: use an atomic in the while() loop
- "Workgroups are scheduled in batches on the GPU. So, workgroups in Batch-1 can be infinitely spinning, waiting for other batches to complete. The other batches don't get scheduled until Batch-1 completes, and that's a classic deadlock."
- Yea, I'm dealing with this:
- - Simply don't let the number of workgroups go above 2*NumberOfCUs (though it's weird that it didn't crash at 2*CUs+1).
- (On GCN I could use the s_sleep instruction to let other waves increment and poll that flag, and use the 'complete path' with the glc flag that drallan mentioned earlier.)
- This time the bottleneck is LDS memory rather than processing power, so I hope that if I don't use that many waves, there will be no deadlocks.
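To make the quoted deadlock concrete, here is a toy model in Python (not GPU code). It assumes that a workgroup keeps its hardware slot while it spins at the global barrier, and that `resident_slots` is the number of workgroups the device can hold at once. The function name, the parameters and the 20-CU figure are all made up for illustration; the real limit depends on the card and on how much LDS and how many registers each workgroup uses.

```python
def global_barrier_completes(num_workgroups, resident_slots):
    """Toy model of a spin-wait barrier across workgroups.

    Each launched workgroup atomically increments a shared counter and then
    spins until the counter equals num_workgroups. A spinning workgroup never
    gives up its slot, so workgroups that were not launched in the first
    batch can never start.
    """
    arrived = 0               # shared counter incremented by each launched workgroup
    pending = num_workgroups  # workgroups still waiting to be scheduled
    running = 0               # workgroups currently holding a slot (all spinning)

    while True:
        # Fill any free slots with pending workgroups.
        launch = min(pending, resident_slots - running)
        pending -= launch
        running += launch
        arrived += launch

        if arrived == num_workgroups:
            return True       # everyone reached the barrier, the spin loops exit
        if launch == 0:
            return False      # slots are full of spinners, nothing new can start: deadlock


# Hypothetical device: 20 CUs, assumed to hold 2 of these workgroups each.
num_cus = 20
max_safe = 2 * num_cus
print(global_barrier_completes(max_safe, max_safe))      # fits in one batch
print(global_barrier_completes(max_safe + 1, max_safe))  # one workgroup too many
```

The model only captures the "everything must be resident at once" rule. It does not explain why a real run might survive at 2*CUs+1, which would depend on how many workgroups the hardware actually keeps resident.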
- The program I want to make will simulate waves in the strings of a virtual piano. It's basically like a 2D elastic water-surface effect, but in 1D and at 192K frames per second. That's why LDS is needed: as fast, somewhat randomly accessible memory.
- The synchronization will let all the piano strings give and receive vibrations to and from each other when the sustain pedal is pressed (still at 192 kHz).
- I already did a simulation for 3..6 strings (that's only 1-2 keys pressed) on a single 3 GHz Phenom II core with the help of SSE, but I want all 200+ strings to burn simultaneously on a 1..2 TFLOPS GPU.
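For readers who want to see what one simulation frame of a single string could look like, here is a minimal Python sketch of the textbook explicit finite-difference update for the 1D wave equation with both ends fixed. This is an assumption about the general technique, not the actual SSE or GPU kernel. The name `step_string` and the parameter `courant2` (the squared Courant number, (c*dt/dx)^2, which should stay at or below 1 for stability) are made up for illustration.

```python
def step_string(prev, curr, courant2=0.25):
    """Advance one string by one time step (e.g. 1/192000 s per frame).

    prev, curr: displacement of each string segment at the previous and
    current frame. Returns the displacement at the next frame.
    The first and last segments are held at zero (fixed ends).
    """
    n = len(curr)
    nxt = [0.0] * n
    for i in range(1, n - 1):
        # Discrete second spatial derivative: the pull from the two neighbours.
        laplacian = curr[i - 1] - 2.0 * curr[i] + curr[i + 1]
        nxt[i] = 2.0 * curr[i] - prev[i] + courant2 * laplacian
    return nxt


# Pluck the middle segment with zero initial velocity (prev == curr).
pluck = [0.0, 0.0, 1.0, 0.0, 0.0]
print(step_string(pluck, pluck))  # the bump starts spreading to its neighbours
```

Each segment only reads its two neighbours from the current frame, which is why a small, fast, randomly accessible memory like LDS suits the per-string update. Exchanging vibrations between strings would need an extra coupling term and a synchronization point per frame, which this sketch leaves out.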