- # C908 clang-18 -march=rv64gcv_zbb_zba -Os -DCOUNT_CYCLE
- source matrix:
- 0.000 0.756 0.223 0.306
- 0.918 0.153 0.174 0.885
- 0.321 0.854 0.991 0.624
- 0.648 0.471 0.567 0.869
- --------------------------------------------------------------------------------
- baseline matrix_transpose 4x4 result:
- 0.000 0.918 0.321 0.648
- 0.756 0.153 0.854 0.471
- 0.223 0.174 0.991 0.567
- 0.306 0.885 0.624 0.869
- baseline matrix_transpose 4x4 used 199 instruction(s) to transpose 4x4=16 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose intrinsics 4x4 result:
- 0.000 0.918 0.321 0.648
- 0.756 0.153 0.854 0.471
- 0.223 0.174 0.991 0.567
- 0.306 0.885 0.624 0.869
- matrix_transpose intrinsics 4x4 used 86 instruction(s) to transpose 4x4=16 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose_segmented_load 4x4 result:
- 0.000 0.918 0.321 0.648
- 0.756 0.153 0.854 0.471
- 0.223 0.174 0.991 0.567
- 0.306 0.885 0.624 0.869
- matrix_transpose_segmented_load 4x4 used 40 instruction(s) to transpose 4x4=16 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose_segmented_store 4x4 result:
- 0.000 0.918 0.321 0.648
- 0.756 0.153 0.854 0.471
- 0.223 0.174 0.991 0.567
- 0.306 0.885 0.624 0.869
- matrix_transpose_segmented_store 4x4 used 44 instruction(s) to transpose 4x4=16 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose_vrgather 4x4 result:
- 0.000 0.918 0.321 0.648
- 0.756 0.153 0.854 0.471
- 0.223 0.174 0.991 0.567
- 0.306 0.885 0.624 0.869
- matrix_transpose_vrgather 4x4 used 122 instruction(s) to transpose 4x4=16 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose_vslide 4x4 result:
- 0.000 0.918 0.321 0.648
- 0.756 0.153 0.854 0.471
- 0.223 0.174 0.991 0.567
- 0.306 0.885 0.624 0.869
- matrix_transpose_vslide 4x4 used 64 instruction(s) to transpose 4x4=16 element(s).
- --------------------------------------------------------------------------------
- baseline matrix_transpose nxn result:
- 0.000 0.918 0.321 0.648
- 0.756 0.153 0.854 0.471
- 0.223 0.174 0.991 0.567
- 0.306 0.885 0.624 0.869
- baseline matrix_transpose nxn used 149 instruction(s) to transpose 4x4=16 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose intrinsics nxn result:
- 0.000 0.918 0.321 0.648
- 0.756 0.153 0.854 0.471
- 0.223 0.174 0.991 0.567
- 0.306 0.885 0.624 0.869
- matrix_transpose intrinsics nxn used 154 instruction(s) to transpose 4x4=16 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose_loads intrinsics nxn result:
- 0.000 0.918 0.321 0.648
- 0.756 0.153 0.854 0.471
- 0.223 0.174 0.991 0.567
- 0.306 0.885 0.624 0.869
- matrix_transpose_loads intrinsics nxn used 123 instruction(s) to transpose 4x4=16 element(s).
- --------------------------------------------------------------------------------
- baseline matrix_transpose nxn used 192 instruction(s) to transpose 4x4=16 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose intrinsics nxn used 178 instruction(s) to transpose 4x4=16 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose_loads intrinsics nxn used 101 instruction(s) to transpose 4x4=16 element(s).
- --------------------------------------------------------------------------------
- baseline matrix_transpose nxn used 1134 instruction(s) to transpose 16x16=256 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose intrinsics nxn used 1891 instruction(s) to transpose 16x16=256 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose_loads intrinsics nxn used 925 instruction(s) to transpose 16x16=256 element(s).
- --------------------------------------------------------------------------------
- baseline matrix_transpose nxn used 315100 instruction(s) to transpose 128x128=16384 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose intrinsics nxn used 162125 instruction(s) to transpose 128x128=16384 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose_loads intrinsics nxn used 310578 instruction(s) to transpose 128x128=16384 element(s).
- --------------------------------------------------------------------------------
- baseline matrix_transpose nxn used 21059587 instruction(s) to transpose 512x512=262144 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose intrinsics nxn used 36683380 instruction(s) to transpose 512x512=262144 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose_loads intrinsics nxn used 24279336 instruction(s) to transpose 512x512=262144 element(s).
- --------------------------------------------------------------------------------
- baseline matrix_transpose nxn used 1286 instruction(s) to transpose 17x17=289 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose intrinsics nxn used 1518 instruction(s) to transpose 17x17=289 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose_loads intrinsics nxn used 1165 instruction(s) to transpose 17x17=289 element(s).
- --------------------------------------------------------------------------------
- baseline matrix_transpose nxn used 126784 instruction(s) to transpose 129x129=16641 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose intrinsics nxn used 119435 instruction(s) to transpose 129x129=16641 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose_loads intrinsics nxn used 99035 instruction(s) to transpose 129x129=16641 element(s).
- --------------------------------------------------------------------------------
- baseline matrix_transpose nxn used 7057337 instruction(s) to transpose 511x511=261121 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose intrinsics nxn used 5721012 instruction(s) to transpose 511x511=261121 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose_loads intrinsics nxn used 6939505 instruction(s) to transpose 511x511=261121 element(s).