- # C908 clang-18 -march=rv64gcv_zbb_zba -Ofast -DCOUNT_CYCLE
- source matrix:
- 0.000 0.756 0.223 0.306
- 0.918 0.153 0.174 0.885
- 0.321 0.854 0.991 0.624
- 0.648 0.471 0.567 0.869
- --------------------------------------------------------------------------------
- baseline matrix_transpose 4x4 result:
- 0.000 0.918 0.321 0.648
- 0.756 0.153 0.854 0.471
- 0.223 0.174 0.991 0.567
- 0.306 0.885 0.624 0.869
- baseline matrix_transpose 4x4 used 56 instruction(s) to transpose 4x4=16 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose intrinsics 4x4 result:
- 0.000 0.918 0.321 0.648
- 0.756 0.153 0.854 0.471
- 0.223 0.174 0.991 0.567
- 0.306 0.885 0.624 0.869
- matrix_transpose intrinsics 4x4 used 43 instruction(s) to transpose 4x4=16 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose_segmented_load 4x4 result:
- 0.000 0.918 0.321 0.648
- 0.756 0.153 0.854 0.471
- 0.223 0.174 0.991 0.567
- 0.306 0.885 0.624 0.869
- matrix_transpose_segmented_load 4x4 used 85 instruction(s) to transpose 4x4=16 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose_segmented_store 4x4 result:
- 0.000 0.918 0.321 0.648
- 0.756 0.153 0.854 0.471
- 0.223 0.174 0.991 0.567
- 0.306 0.885 0.624 0.869
- matrix_transpose_segmented_store 4x4 used 68 instruction(s) to transpose 4x4=16 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose_vrgather 4x4 result:
- 0.000 0.918 0.321 0.648
- 0.756 0.153 0.854 0.471
- 0.223 0.174 0.991 0.567
- 0.306 0.885 0.624 0.869
- matrix_transpose_vrgather 4x4 used 427 instruction(s) to transpose 4x4=16 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose_vslide 4x4 result:
- 0.000 0.918 0.321 0.648
- 0.756 0.153 0.854 0.471
- 0.223 0.174 0.991 0.567
- 0.306 0.885 0.624 0.869
- matrix_transpose_vslide 4x4 used 58 instruction(s) to transpose 4x4=16 element(s).
- --------------------------------------------------------------------------------
- baseline matrix_transpose nxn result:
- 0.000 0.918 0.321 0.648
- 0.756 0.153 0.854 0.471
- 0.223 0.174 0.991 0.567
- 0.306 0.885 0.624 0.869
- baseline matrix_transpose nxn used 329 instruction(s) to transpose 4x4=16 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose intrinsics nxn result:
- 0.000 0.918 0.321 0.648
- 0.756 0.153 0.854 0.471
- 0.223 0.174 0.991 0.567
- 0.306 0.885 0.624 0.869
- matrix_transpose intrinsics nxn used 123 instruction(s) to transpose 4x4=16 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose_loads intrinsics nxn result:
- 0.000 0.918 0.321 0.648
- 0.756 0.153 0.854 0.471
- 0.223 0.174 0.991 0.567
- 0.306 0.885 0.624 0.869
- matrix_transpose_loads intrinsics nxn used 87 instruction(s) to transpose 4x4=16 element(s).
- --------------------------------------------------------------------------------
- baseline matrix_transpose nxn used 263 instruction(s) to transpose 4x4=16 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose intrinsics nxn used 139 instruction(s) to transpose 4x4=16 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose_loads intrinsics nxn used 108 instruction(s) to transpose 4x4=16 element(s).
- --------------------------------------------------------------------------------
- baseline matrix_transpose nxn used 933 instruction(s) to transpose 16x16=256 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose intrinsics nxn used 1501 instruction(s) to transpose 16x16=256 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose_loads intrinsics nxn used 938 instruction(s) to transpose 16x16=256 element(s).
- --------------------------------------------------------------------------------
- baseline matrix_transpose nxn used 507136 instruction(s) to transpose 128x128=16384 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose intrinsics nxn used 185526 instruction(s) to transpose 128x128=16384 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose_loads intrinsics nxn used 324543 instruction(s) to transpose 128x128=16384 element(s).
- --------------------------------------------------------------------------------
- baseline matrix_transpose nxn used 17908513 instruction(s) to transpose 512x512=262144 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose intrinsics nxn used 36578739 instruction(s) to transpose 512x512=262144 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose_loads intrinsics nxn used 18015288 instruction(s) to transpose 512x512=262144 element(s).
- --------------------------------------------------------------------------------
- baseline matrix_transpose nxn used 1120 instruction(s) to transpose 17x17=289 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose intrinsics nxn used 2157 instruction(s) to transpose 17x17=289 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose_loads intrinsics nxn used 1098 instruction(s) to transpose 17x17=289 element(s).
- --------------------------------------------------------------------------------
- baseline matrix_transpose nxn used 102887 instruction(s) to transpose 129x129=16641 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose intrinsics nxn used 287793 instruction(s) to transpose 129x129=16641 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose_loads intrinsics nxn used 113698 instruction(s) to transpose 129x129=16641 element(s).
- --------------------------------------------------------------------------------
- baseline matrix_transpose nxn used 6704986 instruction(s) to transpose 511x511=261121 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose intrinsics nxn used 5760246 instruction(s) to transpose 511x511=261121 element(s).
- --------------------------------------------------------------------------------
- matrix_transpose_loads intrinsics nxn used 6801455 instruction(s) to transpose 511x511=261121 element(s).