- # The AMD OpenCL compiler apparently thinks this is the best way to do a 64-bit byte swap (endian swap).
- # Inputs are v0 for the low dword, and v2 for the high dword; outputs are v4 for low, and v5 for high.
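- # Move the top byte of each input into byte 0. The v_bfi_b32 with mask 0xff and 0 is just an AND with 0xff, which the shift has already done.
- v_lshrrev_b32 v4, 24, v2
- v_lshrrev_b32 v5, 24, v0
- s_movk_i32 s0, 0xff
- v_bfi_b32 v4, s0, v4, 0
- v_bfi_b32 v5, s0, v5, 0
- # Selector 0x03020600: copy input byte 2 into result byte 1.
- s_mov_b32 s0, 0x3020600
- v_perm_b32 v4, v2, v4, s0
- v_perm_b32 v5, v0, v5, s0
- # Selector 0x03050100: copy input byte 1 into result byte 2.
- s_mov_b32 s0, 0x3050100
- v_perm_b32 v4, v2, v4, s0
- v_perm_b32 v5, v0, v5, s0
- # Selector 0x04020100: copy input byte 0 into result byte 3, leaving each dword fully byte-reversed.
- s_mov_b32 s0, 0x4020100
- v_perm_b32 v4, v2, v4, s0
- v_perm_b32 v5, v0, v5, s0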
- # I thought they were on some bullshit, so I deleted ALL of the above code and replaced it with this:
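- # Inputs are v0 for the low dword and v2 for the high one; outputs are v4 (low) and v5 (high).
- # Selector 0x04050607 makes v_perm_b32 return its first source with the bytes reversed, so v5 = bswap(v0) and v4 = bswap(v2), which also swaps the two dwords.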
- s_mov_b32 s0, 0x04050607
- v_perm_b32 v5, v0, v2, s0
- v_perm_b32 v4, v2, v0, s0
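A quick way to check that the two sequences really compute the same thing is a small C model of the instructions involved. This is a sketch that assumes the v_perm_b32 and v_bfi_b32 semantics from AMD's GCN ISA pseudocode. In that pseudocode, the 8-byte pool is {S0, S1}, with S1 supplying bytes 0..3. The function names are hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of one v_perm_b32 selector byte (assumed GCN ISA semantics).
   The 8-byte pool is {S0, S1}: bytes 0..3 come from S1, bytes 4..7 from S0. */
static uint8_t perm_byte(uint32_t s0, uint32_t s1, uint8_t sel)
{
    uint64_t pool = ((uint64_t)s0 << 32) | s1;
    if (sel >= 13) return 0xff;             /* constant 0xff */
    if (sel == 12) return 0x00;             /* constant 0x00 */
    if (sel >= 8) {                         /* 8..11: sign of byte 1, 3, 5, 7 */
        unsigned src = (sel - 8u) * 2u + 1u;
        return ((pool >> (src * 8u + 7u)) & 1u) ? 0xff : 0x00;
    }
    return (uint8_t)(pool >> (sel * 8u));   /* 0..7: copy a pool byte */
}

/* D = permute({S0, S1}, S2), one selector byte per result byte. */
static uint32_t v_perm_b32(uint32_t s0, uint32_t s1, uint32_t s2)
{
    uint32_t d = 0;
    for (unsigned i = 0; i < 4; i++) {
        uint8_t sel = (uint8_t)(s2 >> (i * 8u));
        d |= (uint32_t)perm_byte(s0, s1, sel) << (i * 8u);
    }
    return d;
}

/* v_bfi_b32: D = (S0 & S1) | (~S0 & S2). */
static uint32_t v_bfi_b32(uint32_t s0, uint32_t s1, uint32_t s2)
{
    return (s0 & s1) | (~s0 & s2);
}

/* The compiler's 14-instruction sequence; v0 = low dword, v2 = high dword. */
static uint64_t bswap64_compiler(uint64_t x)
{
    uint32_t v0 = (uint32_t)x, v2 = (uint32_t)(x >> 32);
    uint32_t v4 = v2 >> 24, v5 = v0 >> 24;             /* v_lshrrev_b32 */
    v4 = v_bfi_b32(0xff, v4, 0);
    v5 = v_bfi_b32(0xff, v5, 0);
    v4 = v_perm_b32(v2, v4, 0x03020600); v5 = v_perm_b32(v0, v5, 0x03020600);
    v4 = v_perm_b32(v2, v4, 0x03050100); v5 = v_perm_b32(v0, v5, 0x03050100);
    v4 = v_perm_b32(v2, v4, 0x04020100); v5 = v_perm_b32(v0, v5, 0x04020100);
    return ((uint64_t)v5 << 32) | v4;                  /* v5 = high, v4 = low */
}

/* The hand-written 3-instruction replacement. */
static uint64_t bswap64_perm(uint64_t x)
{
    uint32_t v0 = (uint32_t)x, v2 = (uint32_t)(x >> 32);
    uint32_t v5 = v_perm_b32(v0, v2, 0x04050607);      /* high = bswap(low) */
    uint32_t v4 = v_perm_b32(v2, v0, 0x04050607);      /* low = bswap(high) */
    return ((uint64_t)v5 << 32) | v4;
}
```

Under these assumed semantics, both functions turn 0x0123456789abcdef into 0xefcdab8967452301. The selector 0x04050607 reads, from result byte 3 down to byte 0, pool bytes 4, 5, 6, 7. Those are S0's bytes 0..3, so S0 comes out reversed in a single instruction, and the compiler's shift, mask, and three partial permutes collapse into one v_perm_b32 per dword.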