FioraAeterna

Untitled

Dec 21st, 2017

the driver folks made a change that caused one of our internal tests, inigo quilez's "leizex" demo, to hang a very old model of GPU.

the catch is, the change should have been a complete no-op: they *literally did nothing but remove a bit of dead code* before passing the shader to the compiler.

we looked into it. it turns out the presence of the dead code slightly perturbed the decisions made by the reassociation pass. but the differences were fine and correct, just slightly different placements of instructions and so on. and it worked correctly on all *other* GPUs with the same input code, despite those differences. just not this one.

a coworker sobbed into his monitor and did a whole bunch of debugging, made harder by the GPU's comparative age and lack of more modern debugging features. eventually he concluded, with fairly high certainty, that the core loop of the shader was infinite-looping.

but why? it looked like this (pseudocode):

for (t = 0.1f; t < 5.f; ) {
    pos = ro + (t * rd);
    /* ... */
    /* lots of code omitted that performs a raymarching step and calculates a new step size, 'h' */
    /* ... */
    if (h < 0.001f) break;
    t += h * 0.12f;
}

even considering floating point numerics and "fast-math" optimizations, this should not be possible: every iteration either breaks or advances t by at least 0.001f * 0.12f, so t has to reach 5 eventually. the code is not numerically sensitive, and the code transformations appeared correct.
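
to put a number on "should not be possible": with steps no smaller than 0.001f * 0.12f (about 0.00012), t gets from 0.1 to 5 in roughly 41,000 iterations. here's a minimal C sketch of that worst case, assuming every iteration takes the smallest step that doesn't trigger the break; worst_case_iterations is a made-up name, not anything from the shader:

```c
/* hypothetical helper: counts iterations of the loop above when every
 * step uses the smallest h that doesn't trigger the break (h == 0.001f).
 * any larger h only makes the loop finish sooner. */
long worst_case_iterations(void)
{
    const float h = 0.001f;
    long iterations = 0;
    for (float t = 0.1f; t < 5.f; ) {
        if (h < 0.001f)
            break;            /* never taken for this h */
        t += h * 0.12f;       /* advances t by about 0.00012 */
        iterations++;
    }
    return iterations;
}
```

calling it from a small main should give a count close to 40,834 (the exact-arithmetic value); float rounding in t += h * 0.12f can nudge it slightly. either way it's far below the 1,000,000 cap on the debugging counter, so a loop that hits that cap isn't making the progress the code says it should.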

we confirmed that the loop was hanging by adding a counter:

int foo = 0;
for (t = 0.1f; t < 5.f; ) {
    pos = ro + (t * rd);
    /* ... */
    if (h < 0.001f) break;
    t += h * 0.12f;
    if (foo++ > 1000000) break;
}

which stopped the hang, after a noticeable wait. yet this should be quite impossible.

let's dig deeper.

if (h < 0.001f) break;
t += h * 0.12f;

after control flow flattening, and folding in the for loop's exit condition, this ends up looking like:

t = h < 0.001f ? t : t + h * 0.12f;
if (h < 0.001f || !(t < 5.f)) break;
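
the flattening itself is safe as long as both comparisons give the same answer. a minimal C sketch that models one iteration of the loop tail in both shapes and checks they agree; step_original, step_flattened and step_result are made-up names, and the omitted raymarching code is replaced by passing h in directly:

```c
#include <stdbool.h>

/* one iteration's outcome: the new t, and whether the loop exits */
typedef struct {
    float t;
    bool done;
} step_result;

/* original shape: early break, then the increment, then the for-loop test */
step_result step_original(float t, float h)
{
    step_result r = { t, true };
    if (h < 0.001f)
        return r;                  /* break: t unchanged, loop exits */
    r.t = t + h * 0.12f;
    r.done = !(r.t < 5.f);         /* for-loop condition */
    return r;
}

/* flattened shape: a select instead of a branch, one combined exit test */
step_result step_flattened(float t, float h)
{
    step_result r;
    r.t = h < 0.001f ? t : t + h * 0.12f;
    r.done = h < 0.001f || !(r.t < 5.f);
    return r;
}
```

comparing the two over a grid of t and h values should find no difference: with a single, consistent h, the transformation is fine.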

this is still reasonable. but note: the comparison now occurs twice. also, this GPU is VLIW, and its compiler eagerly duplicates operations when it can, to make better use of the slots in its instruction "bundles". once an operation is duplicated, each copy can be computed in a different way. here, the first comparison becomes a subtraction checked against zero:

t = (h - 0.001f < 0) ? t : t + h * 0.12f;
if (h < 0.001f || !(t < 5.f)) break;
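
this rewrite is also harmless by itself: rounding h - 0.001f to nearest can't flip its sign, and for h near 0.001f the difference is far too large to underflow to zero, so (h - 0.001f < 0) matches (h < 0.001f). a minimal C sketch that checks this for the million floats on either side of 0.001f by walking their bit patterns; forms_agree_near and the two bit-cast helpers are made-up names:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* bit-cast helpers: for positive floats, consecutive integers are
 * consecutive float values */
static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* hypothetical check: for the `radius` floats on each side of c (c > 0),
 * does (h < c) always match (h - c < 0)? */
bool forms_agree_near(float c, uint32_t radius)
{
    uint32_t center = float_bits(c);
    for (uint32_t u = center - radius; u <= center + radius; u++) {
        float h = bits_float(u);
        bool direct = h < c;
        bool via_sub = h - c < 0.0f;
        if (direct != via_sub)
            return false;
    }
    return true;
}
```

forms_agree_near(0.001f, 1000000u) should return true. the trouble only starts when the two comparisons stop looking at the same rounded h.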

h itself was a multiply (let's call it a*b).

t = (a * b - 0.001f < 0) ? t : t + a * b * 0.12f;
if (a * b < 0.001f || !(t < 5.f)) break;

now we form FMAs (fused multiply-adds).

t = (FMA(a, b, -0.001f) < 0) ? t : FMA(a * b, 0.12f, t);
if (a * b < 0.001f || !(t < 5.f)) break;
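
now one can come up with values of a and b that make the first condition true ("do not update t") and the second condition false ("don't break"). FMA(a, b, -0.001f) rounds only once, so its sign is that of the exact a*b - 0.001f; the break check instead rounds a*b to float before comparing. if the exact product is a hair below 0.001f but rounds up to exactly 0.001f, the select keeps the old t while the break check says to keep going. since t didn't change, the next iteration computes the same pos and the same h, and it all happens again.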

for a few pixels in the frame, the shader infinite-loops, and the GPU hangs.

we cry.

oh, and it didn't break before because the multiply was in a different block of the shader, so the compiler couldn't be "clever" with it.
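
a minimal C sketch of why the old placement was safe, under the assumption that keeping the multiply in a separate block meant both tests saw the same float-rounded h. the volatile store is a stand-in for that block boundary, not something the shader actually contained, and run_unfused_loop is a made-up name:

```c
/* hypothetical model of the pre-change shader: h = a * b is rounded to
 * float once, and both the select and the break test reuse that same h.
 * returns the iteration that exited, or -1 if max_iters was reached. */
long run_unfused_loop(float a, float b, long max_iters)
{
    float t = 0.1f;
    for (long i = 1; i <= max_iters; i++) {
        volatile float h_rounded = a * b;  /* stand-in for the block boundary */
        float h = h_rounded;
        t = h < 0.001f ? t : t + h * 0.12f;
        if (h < 0.001f || !(t < 5.f))
            return i;
    }
    return -1;
}
```

with a = 9 and b = 15270995 * 2^-37, h rounds to exactly 0.001f, both tests agree it isn't below the threshold, and the loop finishes in roughly 41,000 iterations instead of hanging.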