- Challenge:
- Implement selb using only odd-pipeline instructions on the SPU.
- ----------------------------------------------------------------------------------
- input: mask (comes from a floating-point compare, so each of the four 32-bit words is either all zeros or all ones)
- a, b to select between
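For reference, here is a minimal Python model of what selb computes, so the suggestions below have something to be checked against. The function name and the sample values for a, b and mask are only illustrative; they are not from the original challenge.

```python
def selb(a: bytes, b: bytes, mask: bytes) -> bytes:
    """Bitwise select: take each bit from b where the mask bit is 1, else from a."""
    return bytes((x & ~m & 0xFF) | (y & m) for x, y, m in zip(a, b, mask))

# Illustrative 16-byte inputs: a holds bytes 00..0F, b holds bytes 10..1F.
a = bytes(range(0x00, 0x10))
b = bytes(range(0x10, 0x20))

# Word-granular mask, as a floating-point compare would produce it.
mask = bytes.fromhex("FFFFFFFF000000000000000FFFFFFFFF"[:0] + "FFFFFFFF0000000000000000FFFFFFFF")

result = selb(a, b, mask)
print(result.hex())  # words 0 and 3 come from b, words 1 and 2 from a
```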
- ----------------------------------------------------------------------------------
- Suggestion 1 by @daniel_collin
- ----------------------------------------------------------------------------------
- // 18 cycles latency
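- gb t, mask // 4: gather the LSB of each word into a 4-bit index
- rotqbii offset, t, 4 // 4: index * 16 = byte offset of the table entry
- lqx shufb_mask, offset, shuffle_table // 6: load the 16-byte shufb control for this mask
- shufb res, a, b, shufb_mask // 4: pick each byte from a or b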
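The listing above does not show what shuffle_table contains. The Python sketch below simulates the four instructions and builds one plausible table, assuming 16 entries of 16 bytes each, indexed by the gb result with word 0 in the most significant of the four bits. The helper names and the table layout are assumptions for illustration, not the author's actual data.

```python
def build_shuffle_table() -> bytes:
    """Assumed layout: entry idx (16 bytes) is the shufb control for gb result idx."""
    table = bytearray()
    for idx in range(16):
        for w in range(4):
            # gb puts word 0's bit in the most significant of the 4 gathered bits.
            from_b = (idx >> (3 - w)) & 1
            base = 0x10 if from_b else 0x00  # 0x10..0x1F addresses bytes of b
            table.extend(base + 4 * w + i for i in range(4))
    return bytes(table)

def gb(q: bytes) -> int:
    """Model of gb: concatenate the least significant bit of each 32-bit word."""
    bits = 0
    for w in range(4):
        bits = (bits << 1) | (q[4 * w + 3] & 1)
    return bits

def shufb(a: bytes, b: bytes, ctrl: bytes) -> bytes:
    """Model of shufb for plain byte selects (control bytes 0x00..0x1F only)."""
    src = a + b
    return bytes(src[c & 0x1F] for c in ctrl)

def select_via_table(a: bytes, b: bytes, mask: bytes, table: bytes) -> bytes:
    offset = gb(mask) << 4              # rotqbii by 4 bits: index * 16
    ctrl = table[offset:offset + 16]    # lqx: load one 16-byte table entry
    return shufb(a, b, ctrl)            # final select

TABLE = build_shuffle_table()
a = bytes(range(0x00, 0x10))
b = bytes(range(0x10, 0x20))
mask = bytes.fromhex("FFFFFFFF0000000000000000FFFFFFFF")
print(select_via_table(a, b, mask, TABLE).hex())
```

With this layout the table costs 256 bytes of local store, which is the trade-off for the short four-instruction sequence.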
- ----------------------------------------------------------------------------------
- Suggestion 2 by @postgoodism
- ----------------------------------------------------------------------------------
- SPU selb using only odd instructions
- (with details elided because I'm only half-awake)
- Given a selb mask:
- v1 = FF00FFFF 0000FFFF 00FF0000 FFFFFFFF
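- SHUFB a qword of zeros with v1 as the shuffle mask. A control byte of FF produces the constant 80, and a control byte of 00 selects byte 0 of the first source, so the zero qword has to be the first source operand.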
- v2 = 80008080 00008080 00800000 80808080
- Shift v2 right by 7 bits with ROTQMBII (the bit-granular quadword shift; ROTQMBYBI shifts by whole bytes, so it is not the one wanted here)
- v3 = 01000101 00000101 00010000 01010101
- Broadcast the bytes of v3 into two new qwords v4 and v5 using SHUFB, interleaved with bytes from the following constant k1:
- k1 = 00102030 40506070 8090A0B0 C0D0E0F0
- v4 = 01000010 01200130 00400050 01600170
- v5 = 00800190 00A000B0 01C001D0 01E001F0
- Shift v4 and v5 right by 4 bits using ROTQMBII to create v6/v7
- v6 = 00100001 00120013 00040005 00160017
- v7 = 00080019 000A000B 001C001D 001E001F
- Re-combine v6 and v7 into v8 using shufb, taking only the low byte of each halfword (byte indices 1, 3, ..., 15 counting from 0) from each:
- v8 = 10011213 04051617 08190A0B 1C1D1E1F
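- v8 is a shufb mask that reproduces the original selb: byte i is 1i where the selb mask byte was FF (take byte i of b) and 0i where it was 00 (take byte i of a)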
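The steps above can be checked with a small Python simulation. This is only a sketch: the shufb model covers the control encodings used here, the shifts are modeled as plain 128-bit logical right shifts, and the three control constants (CTRL_LO, CTRL_HI, CTRL_PICK) are assumed values that produce the interleaving described, since the original leaves them out.

```python
def shufb(a: bytes, b: bytes, ctrl: bytes) -> bytes:
    """Model of shufb, including the constant-generating control bytes."""
    src = a + b
    out = bytearray()
    for c in ctrl:
        if c & 0xC0 == 0x80:      # 10xxxxxx -> 0x00
            out.append(0x00)
        elif c & 0xE0 == 0xC0:    # 110xxxxx -> 0xFF
            out.append(0xFF)
        elif c & 0xE0 == 0xE0:    # 111xxxxx -> 0x80
            out.append(0x80)
        else:                     # otherwise select byte (c & 0x1F) of a||b
            out.append(src[c & 0x1F])
    return bytes(out)

def shr_bits(q: bytes, n: int) -> bytes:
    """Logical right shift of a 128-bit quadword by n bits (models rotqmbii)."""
    return (int.from_bytes(q, "big") >> n).to_bytes(16, "big")

ZERO = bytes(16)
K1 = bytes.fromhex("00102030405060708090A0B0C0D0E0F0")
# Assumed controls: interleave bytes of the first source with bytes of the second.
CTRL_LO = bytes(x for i in range(8) for x in (i, 0x10 + i))
CTRL_HI = bytes(x for i in range(8, 16) for x in (i, 0x10 + i))
# Assumed control: low byte of each halfword of the first, then the second source.
CTRL_PICK = bytes(list(range(0x01, 0x10, 2)) + list(range(0x11, 0x20, 2)))

def selb_mask_to_shufb_mask(v1: bytes) -> bytes:
    v2 = shufb(ZERO, v1, v1)      # FF -> 80, 00 -> byte 0 of ZERO = 00
    v3 = shr_bits(v2, 7)          # 80 -> 01
    v4 = shufb(v3, K1, CTRL_LO)
    v5 = shufb(v3, K1, CTRL_HI)
    v6 = shr_bits(v4, 4)
    v7 = shr_bits(v5, 4)
    return shufb(v6, v7, CTRL_PICK)

v1 = bytes.fromhex("FF00FFFF0000FFFF00FF0000FFFFFFFF")
v8 = selb_mask_to_shufb_mask(v1)
print(v8.hex())  # 100112130405161708190a0b1c1d1e1f, matching v8 above
```

Feeding v8 to shufb with a and b as the two sources then gives the same bytes as selb would with v1, as long as every byte of v1 is either 00 or FF.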