Untitled

a guest
Oct 20th, 2018
Challenge:

Implement selb using only odd-pipeline instructions on the SPU.
----------------------------------------------------------------------------------

input : mask (comes from a floating-point compare, so each of the four 32-bit values is either all zeros or all ones)
a, b to select between

----------------------------------------------------------------------------------
Suggestion 1 by @daniel_collin
----------------------------------------------------------------------------------

// 18 cycles latency
gb t, mask // 4: gather the low bit of each word into a 4-bit index
rotqbii offset, t, 4 // 4: index * 16 = byte offset of the table entry
lqx shufb_mask, offset, shuffle_table // 6: load the shufb mask for this index
shufb res, a, b, shufb_mask // 4: pick each word from a or b

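The paste does not show the contents of shuffle_table. The Python sketch below simulates the four instructions under one assumed layout: 16 entries of 16 bytes each, where entry t takes word w from b when bit (3 - w) of t is set. The helper names (gb, shufb, build_shuffle_table, selb_odd, selb) are hypothetical, quadwords are modelled as 16-byte big-endian bytes objects, and the shufb model only handles control bytes 00 to 1F.

```python
# Simulation sketch of Suggestion 1. The table layout is an assumption,
# not something stated in the original paste.

def gb(q):
    """Gather the least significant bit of each 32-bit word (word 0 becomes the MSB)."""
    t = 0
    for w in range(4):
        t = (t << 1) | (q[4 * w + 3] & 1)
    return t

def shufb(a, b, c):
    """Simplified shufb: control bytes 0x00-0x1F index into the 32 bytes a||b."""
    src = a + b
    return bytes(src[x & 0x1F] for x in c)

def build_shuffle_table():
    """Assumed layout: entry t takes word w from b if bit (3 - w) of t is set."""
    table = bytearray()
    for t in range(16):
        for w in range(4):
            base = 0x10 if (t >> (3 - w)) & 1 else 0x00
            table += bytes(base + 4 * w + i for i in range(4))
    return bytes(table)

def selb_odd(a, b, mask, table):
    t = gb(mask)                             # gb
    offset = t << 4                          # rotqbii by 4 bits: t * 16
    shufb_mask = table[offset:offset + 16]   # lqx
    return shufb(a, b, shufb_mask)           # shufb

def selb(a, b, mask):
    """Reference bitwise select: mask bit 0 takes a, mask bit 1 takes b."""
    return bytes((x & ~m & 0xFF) | (y & m) for x, y, m in zip(a, b, mask))

def lane_mask(t):
    """Build a compare-style mask whose word w is all ones when bit (3 - w) of t is set."""
    return b"".join(b"\xff" * 4 if (t >> (3 - w)) & 1 else bytes(4) for w in range(4))
```

Checking selb_odd against selb for all 16 values of lane_mask(t) exercises every table entry. The table costs 256 bytes of local store under this layout.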
----------------------------------------------------------------------------------
Suggestion 2 by @postgoodism
----------------------------------------------------------------------------------

SPU selb using only odd instructions
(with details elided because I'm only half-awake)

Given a selb mask:
v1 = FF00FFFF 0000FFFF 00FF0000 FFFFFFFF

SHUFB a qword of zeros, using v1 as the shuffle mask. A control byte of FF produces the constant 80; a control byte of 00 selects byte 0 of the zero qword.
v2 = 80008080 00008080 00800000 80808080

Shift v2 right by 7 bits with ROTQMBII (ROTQMBYBI shifts by whole bytes, so it is not the one needed here).
v3 = 01000101 00000101 00010000 01010101

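The first two steps can be replayed in Python as a sanity check. This is a sketch only: shufb models the three special control-byte patterns (10xxxxxx gives 00, 110xxxxx gives FF, 111xxxxx gives 80), shift_right_bits is a hypothetical stand-in for ROTQMBII, and quadwords are 16-byte big-endian bytes objects.

```python
def shufb(a, b, c):
    """Byte shuffle including the special control-byte patterns of the SPU shufb."""
    src = a + b
    out = bytearray()
    for x in c:
        if x & 0xC0 == 0x80:      # 10xxxxxx -> constant 0x00
            out.append(0x00)
        elif x & 0xE0 == 0xC0:    # 110xxxxx -> constant 0xFF
            out.append(0xFF)
        elif x & 0xE0 == 0xE0:    # 111xxxxx -> constant 0x80
            out.append(0x80)
        else:                     # 000xxxxx -> byte of a||b
            out.append(src[x & 0x1F])
    return bytes(out)

def shift_right_bits(q, n):
    """Logical right shift of the whole quadword by n bits (stand-in for rotqmbii)."""
    return (int.from_bytes(q, "big") >> n).to_bytes(16, "big")

ZERO = bytes(16)
v1 = bytes.fromhex("FF00FFFF0000FFFF00FF0000FFFFFFFF")

v2 = shufb(ZERO, ZERO, v1)    # FF control bytes become 80, 00 picks a zero byte
v3 = shift_right_bits(v2, 7)  # each 80 becomes 01 within the same byte

assert v2 == bytes.fromhex("80008080000080800080000080808080")
assert v3 == bytes.fromhex("01000101000001010001000001010101")
```

The 7-bit shift stays inside each byte because the only bit set in any byte of v2 is its top bit.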
Spread the bytes of v3 into two new qwords v4 and v5 using SHUFB, interleaving each byte with the matching byte from the following constant k1 (v4 takes the first eight bytes of v3 and k1, v5 the last eight):
k1 = 00102030 40506070 8090A0B0 C0D0E0F0
v4 = 01000010 01200130 00400050 01600170
v5 = 00800190 00A000B0 01C001D0 01E001F0

Shift v4 and v5 right by 4 bits using ROTQMBII to create v6/v7
v6 = 00100001 00120013 00040005 00160017
v7 = 00080019 000A000B 001C001D 001E001F

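The shuffle controls for the interleave are not given in the paste. The sketch below derives one plausible pair (ctrl_lo and ctrl_hi are hypothetical names) and replays the interleave and the 4-bit shift in Python, again with 16-byte big-endian bytes objects and a simplified shufb that only handles control bytes 00 to 1F.

```python
def shufb(a, b, c):
    """Simplified shufb: control bytes 0x00-0x1F index into the 32 bytes a||b."""
    src = a + b
    return bytes(src[x & 0x1F] for x in c)

def shift_right_bits(q, n):
    """Logical right shift of the whole quadword by n bits (stand-in for rotqmbii)."""
    return (int.from_bytes(q, "big") >> n).to_bytes(16, "big")

v3 = bytes.fromhex("01000101000001010001000001010101")
k1 = bytes.fromhex("00102030405060708090A0B0C0D0E0F0")

# Assumed controls: alternate byte i of v3 (0x00 + i) with byte i of k1 (0x10 + i).
ctrl_lo = bytes(x for i in range(0, 8) for x in (i, 0x10 + i))
ctrl_hi = bytes(x for i in range(8, 16) for x in (i, 0x10 + i))

v4 = shufb(v3, k1, ctrl_lo)
v5 = shufb(v3, k1, ctrl_hi)

# Each halfword now looks like 0s p0 (s = select bit, p = byte position);
# shifting right by 4 bits turns it into 00 sp.
v6 = shift_right_bits(v4, 4)
v7 = shift_right_bits(v5, 4)

assert v4 == bytes.fromhex("01000010012001300040005001600170")
assert v5 == bytes.fromhex("0080019000A000B001C001D001E001F0")
assert v6 == bytes.fromhex("00100001001200130004000500160017")
assert v7 == bytes.fromhex("00080019000A000B001C001D001E001F")
```

Nothing leaks between halfwords during the shift, because the low nibble of every k1 byte is zero.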
Re-combine v6 and v7 into v8 using shufb, taking only the low byte of each halfword (bytes 1, 3, 5, ... counting from 0):
v8 = 10011213 04051617 08190A0B 1C1D1E1F

v8 is a shufb mask that replicates the original selb mask: shufb res, a, b, v8 takes byte i from b where v1 is FF and from a where v1 is 00.
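
As a final check, the sketch below rebuilds v8 from v6 and v7 and confirms that using it as a shufb mask gives the same result as a bitwise select with v1. The control constant ctrl_pack and the test operands a and b are assumptions made for illustration; the shufb model only handles control bytes 00 to 1F.

```python
def shufb(a, b, c):
    """Simplified shufb: control bytes 0x00-0x1F index into the 32 bytes a||b."""
    src = a + b
    return bytes(src[x & 0x1F] for x in c)

def selb(a, b, mask):
    """Reference bitwise select: mask bit 0 takes a, mask bit 1 takes b."""
    return bytes((x & ~m & 0xFF) | (y & m) for x, y, m in zip(a, b, mask))

v1 = bytes.fromhex("FF00FFFF0000FFFF00FF0000FFFFFFFF")
v6 = bytes.fromhex("00100001001200130004000500160017")
v7 = bytes.fromhex("00080019000A000B001C001D001E001F")

# Assumed control: bytes 1, 3, ..., 15 of v6 followed by bytes 1, 3, ..., 15 of v7.
ctrl_pack = bytes(range(0x01, 0x10, 2)) + bytes(range(0x11, 0x20, 2))
v8 = shufb(v6, v7, ctrl_pack)
assert v8 == bytes.fromhex("1001121304051617" "08190A0B1C1D1E1F")

# Arbitrary test operands, chosen so every byte is distinguishable.
a = bytes(range(0x00, 0x10))
b = bytes(range(0xA0, 0xB0))

res = shufb(a, b, v8)
assert res == selb(a, b, v1)
```

The equality holds because every byte of v1 is either 00 or FF, so a byte-granular shuffle is enough to reproduce the bitwise select.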