Untitled

a guest
Oct 17th, 2019
# The AMD OpenCL compiler apparently thinks this is the best way to do a 64-bit byte swap (endian swap).
# Inputs are v0 for the low dword and v2 for the high dword; outputs are v4 for low and v5 for high.

v_lshrrev_b32 v4, 24, v2        # v4 = v2 >> 24 (top byte of the high dword)
v_lshrrev_b32 v5, 24, v0        # v5 = v0 >> 24 (top byte of the low dword)
s_movk_i32 s0, 0xff
v_bfi_b32 v4, s0, v4, 0         # v4 &= 0xff (already true after the shift by 24)
v_bfi_b32 v5, s0, v5, 0         # v5 &= 0xff
s_mov_b32 s0, 0x3020600
v_perm_b32 v4, v2, v4, s0       # byte 1 of v4 = byte 2 of v2
v_perm_b32 v5, v0, v5, s0       # byte 1 of v5 = byte 2 of v0
s_mov_b32 s0, 0x3050100
v_perm_b32 v4, v2, v4, s0       # byte 2 of v4 = byte 1 of v2
v_perm_b32 v5, v0, v5, s0       # byte 2 of v5 = byte 1 of v0
s_mov_b32 s0, 0x4020100
v_perm_b32 v4, v2, v4, s0       # byte 3 of v4 = byte 0 of v2
v_perm_b32 v5, v0, v5, s0       # byte 3 of v5 = byte 0 of v0

# I thought they were on some bullshit, so I deleted ALL of the above code and replaced it with this.
# Inputs are v0 for the low dword and v2 for the high dword; outputs are v4 for low and v5 for high.

s_mov_b32 s0, 0x04050607        # selector: bytes 4..7 (the first source operand) in reverse order

v_perm_b32 v5, v0, v2, s0       # v5 = byte-reversed v0
v_perm_b32 v4, v2, v0, s0       # v4 = byte-reversed v2
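
To check that the two sequences really compute the same thing, here is a minimal Python sketch that emulates the instructions involved and compares both against a plain 64-bit byte swap. It is not part of the original paste. It assumes the documented GCN3 behaviour of v_perm_b32: the byte pool is {src0, src1}, with src1 supplying bytes 0-3 and src0 supplying bytes 4-7, and each selector byte picks one pool byte. Selector values 8..11 are left unmodelled because neither sequence uses them. The function names (v_perm_b32, v_bfi_b32, bswap64_compiler, bswap64_short, bswap64_reference) are made up for this illustration.

```python
MASK32 = 0xFFFFFFFF


def v_perm_b32(s0, s1, sel):
    """Emulate v_perm_b32 for selector bytes 0..7, 12 and >= 13.

    Assumed semantics: pool bytes 0-3 come from s1, bytes 4-7 from s0.
    """
    pool = (s1 & MASK32).to_bytes(4, "little") + (s0 & MASK32).to_bytes(4, "little")
    out = 0
    for i in range(4):
        code = (sel >> (8 * i)) & 0xFF
        if code >= 13:
            byte = 0xFF
        elif code == 12:
            byte = 0x00
        elif code < 8:
            byte = pool[code]
        else:
            raise NotImplementedError("selector values 8..11 are not modelled")
        out |= byte << (8 * i)
    return out


def v_bfi_b32(mask, a, b):
    """Bitfield insert: take bits of a where mask is 1, bits of b elsewhere."""
    return ((mask & a) | (~mask & b)) & MASK32


def bswap64_compiler(lo, hi):
    """Trace of the compiler-generated sequence; returns (v4, v5)."""
    v0, v2 = lo, hi
    v4 = v2 >> 24
    v5 = v0 >> 24
    v4 = v_bfi_b32(0xFF, v4, 0)
    v5 = v_bfi_b32(0xFF, v5, 0)
    for sel in (0x03020600, 0x03050100, 0x04020100):
        v4 = v_perm_b32(v2, v4, sel)
        v5 = v_perm_b32(v0, v5, sel)
    return v4, v5


def bswap64_short(lo, hi):
    """Trace of the two-instruction replacement; returns (v4, v5)."""
    v0, v2 = lo, hi
    sel = 0x04050607
    v5 = v_perm_b32(v0, v2, sel)
    v4 = v_perm_b32(v2, v0, sel)
    return v4, v5


def bswap64_reference(lo, hi):
    """Plain 64-bit byte swap, returned as (low dword, high dword)."""
    value = (hi << 32) | lo
    swapped = int.from_bytes(value.to_bytes(8, "little"), "big")
    return swapped & MASK32, swapped >> 32
```

Under those assumptions, all three functions should return the same pair for any 32-bit lo and hi, which is a quick way to sanity-check a selector constant before touching the assembly.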