baseline autovec

a guest
Jan 9th, 2024
# C908 clang-18 -march=rv64gcv_zbb_zba -Ofast -DCOUNT_CYCLE

source matrix:
0.000 0.756 0.223 0.306
0.918 0.153 0.174 0.885
0.321 0.854 0.991 0.624
0.648 0.471 0.567 0.869
--------------------------------------------------------------------------------
baseline matrix_transpose 4x4 result:
0.000 0.918 0.321 0.648
0.756 0.153 0.854 0.471
0.223 0.174 0.991 0.567
0.306 0.885 0.624 0.869
baseline matrix_transpose 4x4 used 56 instruction(s) to transpose 4x4=16 element(s).
--------------------------------------------------------------------------------
matrix_transpose intrinsics 4x4 result:
0.000 0.918 0.321 0.648
0.756 0.153 0.854 0.471
0.223 0.174 0.991 0.567
0.306 0.885 0.624 0.869
matrix_transpose intrinsics 4x4 used 43 instruction(s) to transpose 4x4=16 element(s).
--------------------------------------------------------------------------------
matrix_transpose_segmented_load 4x4 result:
0.000 0.918 0.321 0.648
0.756 0.153 0.854 0.471
0.223 0.174 0.991 0.567
0.306 0.885 0.624 0.869
matrix_transpose_segmented_load 4x4 used 85 instruction(s) to transpose 4x4=16 element(s).
--------------------------------------------------------------------------------
matrix_transpose_segmented_store 4x4 result:
0.000 0.918 0.321 0.648
0.756 0.153 0.854 0.471
0.223 0.174 0.991 0.567
0.306 0.885 0.624 0.869
matrix_transpose_segmented_store 4x4 used 68 instruction(s) to transpose 4x4=16 element(s).
--------------------------------------------------------------------------------
matrix_transpose_vrgather 4x4 result:
0.000 0.918 0.321 0.648
0.756 0.153 0.854 0.471
0.223 0.174 0.991 0.567
0.306 0.885 0.624 0.869
matrix_transpose_vrgather 4x4 used 427 instruction(s) to transpose 4x4=16 element(s).
--------------------------------------------------------------------------------
matrix_transpose_vslide 4x4 result:
0.000 0.918 0.321 0.648
0.756 0.153 0.854 0.471
0.223 0.174 0.991 0.567
0.306 0.885 0.624 0.869
matrix_transpose_vslide 4x4 used 58 instruction(s) to transpose 4x4=16 element(s).
--------------------------------------------------------------------------------
baseline matrix_transpose nxn result:
0.000 0.918 0.321 0.648
0.756 0.153 0.854 0.471
0.223 0.174 0.991 0.567
0.306 0.885 0.624 0.869
baseline matrix_transpose nxn used 329 instruction(s) to transpose 4x4=16 element(s).
--------------------------------------------------------------------------------
matrix_transpose intrinsics nxn result:
0.000 0.918 0.321 0.648
0.756 0.153 0.854 0.471
0.223 0.174 0.991 0.567
0.306 0.885 0.624 0.869
matrix_transpose intrinsics nxn used 123 instruction(s) to transpose 4x4=16 element(s).
--------------------------------------------------------------------------------
matrix_transpose_loads intrinsics nxn result:
0.000 0.918 0.321 0.648
0.756 0.153 0.854 0.471
0.223 0.174 0.991 0.567
0.306 0.885 0.624 0.869
matrix_transpose_loads intrinsics nxn used 87 instruction(s) to transpose 4x4=16 element(s).
--------------------------------------------------------------------------------
baseline matrix_transpose nxn used 263 instruction(s) to transpose 4x4=16 element(s).
--------------------------------------------------------------------------------
matrix_transpose intrinsics nxn used 139 instruction(s) to transpose 4x4=16 element(s).
--------------------------------------------------------------------------------
matrix_transpose_loads intrinsics nxn used 108 instruction(s) to transpose 4x4=16 element(s).
--------------------------------------------------------------------------------
baseline matrix_transpose nxn used 933 instruction(s) to transpose 16x16=256 element(s).
--------------------------------------------------------------------------------
matrix_transpose intrinsics nxn used 1501 instruction(s) to transpose 16x16=256 element(s).
--------------------------------------------------------------------------------
matrix_transpose_loads intrinsics nxn used 938 instruction(s) to transpose 16x16=256 element(s).
--------------------------------------------------------------------------------
baseline matrix_transpose nxn used 507136 instruction(s) to transpose 128x128=16384 element(s).
--------------------------------------------------------------------------------
matrix_transpose intrinsics nxn used 185526 instruction(s) to transpose 128x128=16384 element(s).
--------------------------------------------------------------------------------
matrix_transpose_loads intrinsics nxn used 324543 instruction(s) to transpose 128x128=16384 element(s).
--------------------------------------------------------------------------------
baseline matrix_transpose nxn used 17908513 instruction(s) to transpose 512x512=262144 element(s).
--------------------------------------------------------------------------------
matrix_transpose intrinsics nxn used 36578739 instruction(s) to transpose 512x512=262144 element(s).
--------------------------------------------------------------------------------
matrix_transpose_loads intrinsics nxn used 18015288 instruction(s) to transpose 512x512=262144 element(s).
--------------------------------------------------------------------------------
baseline matrix_transpose nxn used 1120 instruction(s) to transpose 17x17=289 element(s).
--------------------------------------------------------------------------------
matrix_transpose intrinsics nxn used 2157 instruction(s) to transpose 17x17=289 element(s).
--------------------------------------------------------------------------------
matrix_transpose_loads intrinsics nxn used 1098 instruction(s) to transpose 17x17=289 element(s).
--------------------------------------------------------------------------------
baseline matrix_transpose nxn used 102887 instruction(s) to transpose 129x129=16641 element(s).
--------------------------------------------------------------------------------
matrix_transpose intrinsics nxn used 287793 instruction(s) to transpose 129x129=16641 element(s).
--------------------------------------------------------------------------------
matrix_transpose_loads intrinsics nxn used 113698 instruction(s) to transpose 129x129=16641 element(s).
--------------------------------------------------------------------------------
baseline matrix_transpose nxn used 6704986 instruction(s) to transpose 511x511=261121 element(s).
--------------------------------------------------------------------------------
matrix_transpose intrinsics nxn used 5760246 instruction(s) to transpose 511x511=261121 element(s).
--------------------------------------------------------------------------------
matrix_transpose_loads intrinsics nxn used 6801455 instruction(s) to transpose 511x511=261121 element(s).
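
The benchmark source is not part of this paste. Going by the title ("baseline autovec"), the "baseline" rows in the log are presumably a plain scalar loop that clang is left to auto-vectorize. The following is only a minimal sketch of what such a baseline might look like; the function name `transpose_baseline`, the `float` element type, and the row-major layout are assumptions, not taken from the original code.

```c
#include <stddef.h>

/* Hypothetical scalar baseline (assumed, not the benchmark's actual code):
 * writes the transpose of the n x n row-major matrix `src` into `dst`,
 * i.e. dst[j][i] = src[i][j]. `restrict` tells the compiler the two
 * buffers do not overlap, which leaves it free to auto-vectorize. */
void transpose_baseline(const float *restrict src, float *restrict dst, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            dst[j * n + i] = src[i * n + j];
        }
    }
}
```

Called with the 4x4 source matrix from the top of the log and n = 4, this would give the transposed matrix printed in each "result" block. It says nothing about the instruction counts, which depend on the compiler, flags, and hardware.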