Untitled

a guest
May 20th, 2018
  1. Welcome to uarch-bench (e1d92fb-dirty)
  2. Median CPU speed: 1.499 GHz
  3. Running benchmarks groups using timer clock
  4.  
  5. ** Running benchmark group Default Group **
  6. Benchmark Cycles Nanos
  7. Dependent add chain 1.00 0.67
  8. Independent add chain 0.25 0.17
  9. Dependent imul 64->128 3.00 2.00
  10. Dependent imul 64->64 3.00 2.00
  11. Independent imul 64->128 2.00 1.33
  12. Same location stores 1.00 0.67
  13. Disjoint location stores 1.00 0.67
  14. Dependent push/pop chain 7.00 4.67
  15. Independent push/pop chain 1.00 0.67
  16.  
  17. ** Inverse throughput for load/16-bit **
  18. offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  19. 0 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
  20. 16 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
  21. 32 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
  22. 48 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
  23.  
  24. ** Inverse throughput for load/32-bit **
  25. offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  26. 0 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
  27. 16 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0
  28. 32 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
  29. 48 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0
  30.  
  31. ** Inverse throughput for load/64-bit **
  32. offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  33. 0 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
  34. 16 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0 1.0 1.0 1.0 1.0
  35. 32 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
  36. 48 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0 1.0 1.0 1.0 1.0
  37.  
  38. ** Inverse throughput for load/128-bit **
  39. offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  40. 0 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
  41. 16 : 0.5 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
  42. 32 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
  43. 48 : 0.5 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
  44.  
  45. ** Inverse throughput for load/256-bit **
  46. offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  47. 0 : 1.0 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
  48. 16 : 1.0 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
  49. 32 : 1.0 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
  50. 48 : 1.0 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
  51.  
  52. ** Inverse throughput for store/16-bit **
  53. offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  54. 0 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0
  55. 16 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0
  56. 32 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0
  57. 48 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0
  58.  
  59. ** Inverse throughput for store/32-bit **
  60. offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  61. 0 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0 5.0 5.0
  62. 16 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0 5.0 5.0
  63. 32 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0 5.0 5.0
  64. 48 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0 5.0 5.0
  65.  
  66. ** Inverse throughput for store/64-bit **
  67. offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  68. 0 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0
  69. 16 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0
  70. 32 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0
  71. 48 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0
  72.  
  73. ** Inverse throughput for store/128-bit **
  74. offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  75. 0 : 1.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0
  76. 16 : 1.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0
  77. 32 : 1.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0
  78. 48 : 1.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0
  79.  
  80. ** Inverse throughput for store/256-bit **
  81. offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  82. 0 : 2.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0
  83. 16 : 2.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0
  84. 32 : 2.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0
  85. 48 : 2.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0
  86.  
  87. ** Running benchmark group Parallel load/prefetches from fixed-size regions **
  88. Benchmark Cycles Nanos
  89. 16-KiB parallel-loads 0.53 0.35
  90. 16-KiB parallel-prefetcht0 0.50 0.33
  91. 16-KiB parallel-prefetcht1 0.50 0.33
  92. 16-KiB parallel-prefetcht2 0.50 0.33
  93. 16-KiB parallel-prefetchnta 0.50 0.33
  94. 32-KiB parallel-loads 0.53 0.35
  95. 32-KiB parallel-prefetcht0 0.50 0.33
  96. 32-KiB parallel-prefetcht1 0.50 0.33
  97. 32-KiB parallel-prefetcht2 0.50 0.33
  98. 32-KiB parallel-prefetchnta 0.50 0.33
  99. 64-KiB parallel-loads 2.00 1.33
  100. 64-KiB parallel-prefetcht0 0.50 0.33
  101. 64-KiB parallel-prefetcht1 0.50 0.33
  102. 64-KiB parallel-prefetcht2 0.50 0.33
  103. 64-KiB parallel-prefetchnta 0.50 0.33
  104. 128-KiB parallel-loads 2.00 1.33
  105. 128-KiB parallel-prefetcht0 0.50 0.33
  106. 128-KiB parallel-prefetcht1 0.50 0.33
  107. 128-KiB parallel-prefetcht2 0.50 0.33
  108. 128-KiB parallel-prefetchnta 0.50 0.33
  109. 256-KiB parallel-loads 2.00 1.33
  110. 256-KiB parallel-prefetcht0 0.50 0.33
  111. 256-KiB parallel-prefetcht1 0.50 0.33
  112. 256-KiB parallel-prefetcht2 0.50 0.33
  113. 256-KiB parallel-prefetchnta 0.50 0.33
  114. 512-KiB parallel-loads 2.01 1.34
  115. 512-KiB parallel-prefetcht0 0.50 0.33
  116. 512-KiB parallel-prefetcht1 0.50 0.33
  117. 512-KiB parallel-prefetcht2 0.50 0.34
  118. 512-KiB parallel-prefetchnta 0.50 0.33
  119. 2048-KiB parallel-loads 2.06 1.37
  120. 2048-KiB parallel-prefetcht0 0.51 0.34
  121. 2048-KiB parallel-prefetcht1 0.51 0.34
  122. 2048-KiB parallel-prefetcht2 0.51 0.34
  123. 2048-KiB parallel-prefetchnta 0.50 0.33
  124.  
  125. ** Running benchmark group Serial loads from fixed-size regions **
  126. Benchmark Cycles Nanos
  127. 16-KiB serial loads 4.00 2.67
  128. 24-KiB serial loads 4.00 2.67
  129. 30-KiB serial loads 4.00 2.67
  130. 31-KiB serial loads 4.00 2.67
  131. 32-KiB serial loads 4.00 2.67
  132. 33-KiB serial loads 5.11 3.41
  133. 34-KiB serial loads 5.76 3.84
  134. 35-KiB serial loads 8.53 5.69
  135. 40-KiB serial loads 12.06 8.04
  136. 48-KiB serial loads 12.16 8.11
  137. 56-KiB serial loads 12.06 8.05
  138. 64-KiB serial loads 12.05 8.03
  139. 80-KiB serial loads 12.06 8.04
  140. 96-KiB serial loads 12.07 8.05
  141. 112-KiB serial loads 12.08 8.06
  142. 128-KiB serial loads 12.05 8.04
  143. 196-KiB serial loads 12.06 8.05
  144. 252-KiB serial loads 12.06 8.04
  145. 256-KiB serial loads 12.06 8.04
  146. 260-KiB serial loads 12.19 8.13
  147. 384-KiB serial loads 17.28 11.53
  148. 512-KiB serial loads 21.46 14.31
  149. 1024-KiB serial loads 35.88 23.93
  150. 2048-KiB serial loads 39.74 26.51
  151.  
  152. ** Running benchmark group Store forwarding latency and throughput **
  153. Benchmark Cycles Nanos
  154. Store forward latency delay 0 6.99 4.66
  155. Store forward latency delay 1 6.99 4.66
  156. Store forward latency delay 2 6.99 4.66
  157. Store forward latency delay 3 6.99 4.66
  158. Store forward latency delay 4 6.99 4.66
  159. Store forward latency delay 5 6.31 4.21
  160. Store fwd tput concurrency 1 6.99 4.66
  161. Store fwd tput concurrency 2 3.50 2.33
  162. Store fwd tput concurrency 3 2.33 1.55
  163. Store fwd tput concurrency 4 1.75 1.17
  164. Store fwd tput concurrency 5 1.40 0.93
  165. Store fwd tput concurrency 6 1.17 0.78
  166. Store fwd tput concurrency 7 1.06 0.71
  167. Store fwd tput concurrency 8 1.00 0.67
  168. Store fwd tput concurrency 9 1.00 0.67
  169. Store fwd tput concurrency 10 1.00 0.67
  170.  
  171. ** Running benchmark group Store forwarding latency and throughput **
  172.  
  173. ---------- Oneshot calibration start --------------
  174. Benchmark Cycles Nanos
  175. Oneshot overhead min 89.96 60.00
  176. Oneshot overhead median (used) 104.95 70.00
  177. Oneshot overhead max 104.95 70.00
  178. ---------- Oneshot calibration end --------------
  179.  
  180. oneshot-dummy @ 0x0x494100
  181. Benchmark Sample Cycles Nanos
  182. Empty oneshot bench 1 0.00 0.00
  183. Empty oneshot bench 2 0.00 0.00
  184. Empty oneshot bench 3 0.00 0.00
  185. Empty oneshot bench 4 -14.99 -10.00
  186. Empty oneshot bench 5 -14.99 -10.00
  187. Empty oneshot bench 6 0.00 0.00
  188. Empty oneshot bench 7 -14.99 -10.00
  189. Empty oneshot bench 8 -14.99 -10.00
  190. Empty oneshot bench 9 -14.99 -10.00
  191. Empty oneshot bench 10 0.00 0.00
  192. Empty oneshot bench 11 -14.99 -10.00
  193. Empty oneshot bench 12 -14.99 -10.00
  194. Empty oneshot bench 13 0.00 0.00
  195. Empty oneshot bench 14 0.00 0.00
  196. Empty oneshot bench 15 -14.99 -10.00
  197. Empty oneshot bench 16 -14.99 -10.00
  198. Empty oneshot bench 17 0.00 0.00
  199. Empty oneshot bench 18 0.00 0.00
  200. Empty oneshot bench 19 -14.99 -10.00
  201. Empty oneshot bench 20 -14.99 -10.00
  202.  
  203. oneshot-latency-2 @ 0x0x4a11c0
  204. Benchmark Sample Cycles Nanos
  205. StFwd oneshot lat (delay 2) 1 144767.62 96560.00
  206. StFwd oneshot lat (delay 2) 2 144482.76 96370.00
  207. StFwd oneshot lat (delay 2) 3 144481.26 96369.00
  208. StFwd oneshot lat (delay 2) 4 144482.76 96370.00
  209. StFwd oneshot lat (delay 2) 5 144467.77 96360.00
  210. StFwd oneshot lat (delay 2) 6 144481.26 96369.00
  211. StFwd oneshot lat (delay 2) 7 144482.76 96370.00
  212. StFwd oneshot lat (delay 2) 8 144481.26 96369.00
  213. StFwd oneshot lat (delay 2) 9 144482.76 96370.00
  214. StFwd oneshot lat (delay 2) 10 144467.77 96360.00
  215. StFwd oneshot lat (delay 2) 11 144481.26 96369.00
  216. StFwd oneshot lat (delay 2) 12 144482.76 96370.00
  217. StFwd oneshot lat (delay 2) 13 144482.76 96370.00
  218. StFwd oneshot lat (delay 2) 14 144481.26 96369.00
  219. StFwd oneshot lat (delay 2) 15 144482.76 96370.00
  220. StFwd oneshot lat (delay 2) 16 151587.71 101109.00
  221. StFwd oneshot lat (delay 2) 17 144482.76 96370.00
  222. StFwd oneshot lat (delay 2) 18 144482.76 96370.00
  223. StFwd oneshot lat (delay 2) 19 144481.26 96369.00
  224. StFwd oneshot lat (delay 2) 20 144482.76 96370.00
  225.  
  226. oneshot-latency-1 @ 0x0x4a0e40
  227. Benchmark Sample Cycles Nanos
  228. StFwd oneshot lat (delay 1) 1 144767.62 96560.00
  229. StFwd oneshot lat (delay 1) 2 144482.76 96370.00
  230. StFwd oneshot lat (delay 1) 3 144481.26 96369.00
  231. StFwd oneshot lat (delay 1) 4 144482.76 96370.00
  232. StFwd oneshot lat (delay 1) 5 144482.76 96370.00
  233. StFwd oneshot lat (delay 1) 6 144481.26 96369.00
  234. StFwd oneshot lat (delay 1) 7 144482.76 96370.00
  235. StFwd oneshot lat (delay 1) 8 144482.76 96370.00
  236. StFwd oneshot lat (delay 1) 9 144481.26 96369.00
  237. StFwd oneshot lat (delay 1) 10 144482.76 96370.00
  238. StFwd oneshot lat (delay 1) 11 144466.27 96359.00
  239. StFwd oneshot lat (delay 1) 12 144467.77 96360.00
  240. StFwd oneshot lat (delay 1) 13 144482.76 96370.00
  241. StFwd oneshot lat (delay 1) 14 144481.26 96369.00
  242. StFwd oneshot lat (delay 1) 15 144482.76 96370.00
  243. StFwd oneshot lat (delay 1) 16 144482.76 96370.00
  244. StFwd oneshot lat (delay 1) 17 144481.26 96369.00
  245. StFwd oneshot lat (delay 1) 18 144482.76 96370.00
  246. StFwd oneshot lat (delay 1) 19 144482.76 96370.00
  247. StFwd oneshot lat (delay 1) 20 144481.26 96369.00
  248.  
  249. oneshot-latency-0 @ 0x0x4a0ac0
  250. Benchmark Sample Cycles Nanos
  251. StFwd oneshot lat (delay 0) 1 144691.15 96509.00
  252. StFwd oneshot lat (delay 0) 2 144482.76 96370.00
  253. StFwd oneshot lat (delay 0) 3 144482.76 96370.00
  254. StFwd oneshot lat (delay 0) 4 144481.26 96369.00
  255. StFwd oneshot lat (delay 0) 5 144482.76 96370.00
  256. StFwd oneshot lat (delay 0) 6 144482.76 96370.00
  257. StFwd oneshot lat (delay 0) 7 144481.26 96369.00
  258. StFwd oneshot lat (delay 0) 8 144467.77 96360.00
  259. StFwd oneshot lat (delay 0) 9 152052.47 101419.00
  260. StFwd oneshot lat (delay 0) 10 144482.76 96370.00
  261. StFwd oneshot lat (delay 0) 11 144482.76 96370.00
  262. StFwd oneshot lat (delay 0) 12 144481.26 96369.00
  263. StFwd oneshot lat (delay 0) 13 144482.76 96370.00
  264. StFwd oneshot lat (delay 0) 14 144482.76 96370.00
  265. StFwd oneshot lat (delay 0) 15 144481.26 96369.00
  266. StFwd oneshot lat (delay 0) 16 144482.76 96370.00
  267. StFwd oneshot lat (delay 0) 17 144466.27 96359.00
  268. StFwd oneshot lat (delay 0) 18 144467.77 96360.00
  269. StFwd oneshot lat (delay 0) 19 144467.77 96360.00
  270. StFwd oneshot lat (delay 0) 20 144466.27 96359.00
  271.  
  272.  
  273. ** Running benchmark group Store forward attempts **
  274. oneshot-dummy @ 0x0x494100
  275. Benchmark Sample Cycles Nanos
  276. Empty oneshot bench 1 -14.99 -10.00
  277. Empty oneshot bench 2 -14.99 -10.00
  278. Empty oneshot bench 3 -14.99 -10.00
  279. Empty oneshot bench 4 -14.99 -10.00
  280. Empty oneshot bench 5 0.00 0.00
  281. Empty oneshot bench 6 0.00 0.00
  282. Empty oneshot bench 7 -14.99 -10.00
  283. Empty oneshot bench 8 -14.99 -10.00
  284. Empty oneshot bench 9 -14.99 -10.00
  285. Empty oneshot bench 10 0.00 0.00
  286. Empty oneshot bench 11 -14.99 -10.00
  287. Empty oneshot bench 12 -14.99 -10.00
  288. Empty oneshot bench 13 0.00 0.00
  289. Empty oneshot bench 14 0.00 0.00
  290. Empty oneshot bench 15 -14.99 -10.00
  291. Empty oneshot bench 16 -14.99 -10.00
  292. Empty oneshot bench 17 0.00 0.00
  293. Empty oneshot bench 18 -14.99 -10.00
  294. Empty oneshot bench 19 -14.99 -10.00
  295. Empty oneshot bench 20 -14.99 -10.00
  296.  
  297. stfwd-try1 @ 0x0x4a0780
  298. Benchmark Sample Cycles Nanos
  299. stfwd-try1 1 674.66 450.00
  300. stfwd-try1 2 89.96 60.00
  301. stfwd-try1 3 89.96 60.00
  302. stfwd-try1 4 89.96 60.00
  303. stfwd-try1 5 89.96 60.00
  304. stfwd-try1 6 74.96 50.00
  305. stfwd-try1 7 89.96 60.00
  306. stfwd-try1 8 74.96 50.00
  307. stfwd-try1 9 89.96 60.00
  308. stfwd-try1 10 74.96 50.00
  309. stfwd-try1 11 74.96 50.00
  310. stfwd-try1 12 74.96 50.00
  311. stfwd-try1 13 74.96 50.00
  312. stfwd-try1 14 74.96 50.00
  313. stfwd-try1 15 89.96 60.00
  314. stfwd-try1 16 74.96 50.00
  315. stfwd-try1 17 89.96 60.00
  316. stfwd-try1 18 74.96 50.00
  317. stfwd-try1 19 89.96 60.00
  318. stfwd-try1 20 89.96 60.00
  319.  
  320. stfwd-try2 @ 0x0x4a02c0
  321. Benchmark Sample Cycles Nanos
  322. stfwd-try2 100 loads 1 614.69 410.00
  323. stfwd-try2 100 loads 2 3658.17 2440.00
  324. stfwd-try2 100 loads 3 254.87 170.00
  325. stfwd-try2 100 loads 4 254.87 170.00
  326. stfwd-try2 100 loads 5 254.87 170.00
  327. stfwd-try2 100 loads 6 254.87 170.00
  328. stfwd-try2 100 loads 7 269.87 180.00
  329. stfwd-try2 100 loads 8 254.87 170.00
  330. stfwd-try2 100 loads 9 254.87 170.00
  331. stfwd-try2 100 loads 10 254.87 170.00
  332. stfwd-try2 100 loads 11 254.87 170.00
  333. stfwd-try2 100 loads 12 269.87 180.00
  334. stfwd-try2 100 loads 13 254.87 170.00
  335. stfwd-try2 100 loads 14 254.87 170.00
  336. stfwd-try2 100 loads 15 254.87 170.00
  337. stfwd-try2 100 loads 16 254.87 170.00
  338. stfwd-try2 100 loads 17 269.87 180.00
  339. stfwd-try2 100 loads 18 254.87 170.00
  340. stfwd-try2 100 loads 19 254.87 170.00
  341. stfwd-try2 100 loads 20 254.87 170.00
  342.  
  343. stfwd-try2-4 @ 0x0x49d200
  344. Benchmark Sample Cycles Nanos
  345. stfwd-try2 4 loads 1 74.96 50.00
  346. stfwd-try2 4 loads 2 164.92 110.00
  347. stfwd-try2 4 loads 3 -14.99 -10.00
  348. stfwd-try2 4 loads 4 0.00 0.00
  349. stfwd-try2 4 loads 5 0.00 0.00
  350. stfwd-try2 4 loads 6 -14.99 -10.00
  351. stfwd-try2 4 loads 7 -14.99 -10.00
  352. stfwd-try2 4 loads 8 -14.99 -10.00
  353. stfwd-try2 4 loads 9 0.00 0.00
  354. stfwd-try2 4 loads 10 0.00 0.00
  355. stfwd-try2 4 loads 11 -14.99 -10.00
  356. stfwd-try2 4 loads 12 -14.99 -10.00
  357. stfwd-try2 4 loads 13 -14.99 -10.00
  358. stfwd-try2 4 loads 14 0.00 0.00
  359. stfwd-try2 4 loads 15 0.00 0.00
  360. stfwd-try2 4 loads 16 -14.99 -10.00
  361. stfwd-try2 4 loads 17 -14.99 -10.00
  362. stfwd-try2 4 loads 18 -14.99 -10.00
  363. stfwd-try2 4 loads 19 0.00 0.00
  364. stfwd-try2 4 loads 20 0.00 0.00
  365.  
  366. stfwd-try2-10 @ 0x0x49d240
  367. Benchmark Sample Cycles Nanos
  368. stfwd-try2 10 loads 1 44.98 30.00
  369. stfwd-try2 10 loads 2 389.81 260.00
  370. stfwd-try2 10 loads 3 0.00 0.00
  371. stfwd-try2 10 loads 4 0.00 0.00
  372. stfwd-try2 10 loads 5 0.00 0.00
  373. stfwd-try2 10 loads 6 0.00 0.00
  374. stfwd-try2 10 loads 7 -14.99 -10.00
  375. stfwd-try2 10 loads 8 -14.99 -10.00
  376. stfwd-try2 10 loads 9 -14.99 -10.00
  377. stfwd-try2 10 loads 10 0.00 0.00
  378. stfwd-try2 10 loads 11 0.00 0.00
  379. stfwd-try2 10 loads 12 0.00 0.00
  380. stfwd-try2 10 loads 13 0.00 0.00
  381. stfwd-try2 10 loads 14 0.00 0.00
  382. stfwd-try2 10 loads 15 -14.99 -10.00
  383. stfwd-try2 10 loads 16 -14.99 -10.00
  384. stfwd-try2 10 loads 17 0.00 0.00
  385. stfwd-try2 10 loads 18 0.00 0.00
  386. stfwd-try2 10 loads 19 0.00 0.00
  387. stfwd-try2 10 loads 20 0.00 0.00
  388.  
  389. stfwd-try2-20 @ 0x0x4a01c0
  390. Benchmark Sample Cycles Nanos
  391. stfwd-try2 20 loads 1 509.75 340.00
  392. stfwd-try2 20 loads 2 734.63 490.00
  393. stfwd-try2 20 loads 3 14.99 10.00
  394. stfwd-try2 20 loads 4 29.99 20.00
  395. stfwd-try2 20 loads 5 14.99 10.00
  396. stfwd-try2 20 loads 6 29.99 20.00
  397. stfwd-try2 20 loads 7 14.99 10.00
  398. stfwd-try2 20 loads 8 14.99 10.00
  399. stfwd-try2 20 loads 9 14.99 10.00
  400. stfwd-try2 20 loads 10 14.99 10.00
  401. stfwd-try2 20 loads 11 14.99 10.00
  402. stfwd-try2 20 loads 12 14.99 10.00
  403. stfwd-try2 20 loads 13 14.99 10.00
  404. stfwd-try2 20 loads 14 29.99 20.00
  405. stfwd-try2 20 loads 15 14.99 10.00
  406. stfwd-try2 20 loads 16 29.99 20.00
  407. stfwd-try2 20 loads 17 14.99 10.00
  408. stfwd-try2 20 loads 18 14.99 10.00
  409. stfwd-try2 20 loads 19 14.99 10.00
  410. stfwd-try2 20 loads 20 14.99 10.00
  411.  
  412. stfwd-try2-1000 @ 0x0x49d2c0
  413. Benchmark Sample Cycles Nanos
  414. stfwd-try2 1000 loads 1 32188.91 21470.00
  415. stfwd-try2 1000 loads 2 36236.88 24170.00
  416. stfwd-try2 1000 loads 3 2968.52 1980.00
  417. stfwd-try2 1000 loads 4 2968.52 1980.00
  418. stfwd-try2 1000 loads 5 2953.52 1970.00
  419. stfwd-try2 1000 loads 6 2953.52 1970.00
  420. stfwd-try2 1000 loads 7 2968.52 1980.00
  421. stfwd-try2 1000 loads 8 2968.52 1980.00
  422. stfwd-try2 1000 loads 9 2968.52 1980.00
  423. stfwd-try2 1000 loads 10 2953.52 1970.00
  424. stfwd-try2 1000 loads 11 2953.52 1970.00
  425. stfwd-try2 1000 loads 12 2968.52 1980.00
  426. stfwd-try2 1000 loads 13 2968.52 1980.00
  427. stfwd-try2 1000 loads 14 2968.52 1980.00
  428. stfwd-try2 1000 loads 15 2953.52 1970.00
  429. stfwd-try2 1000 loads 16 2953.52 1970.00
  430. stfwd-try2 1000 loads 17 2968.52 1980.00
  431. stfwd-try2 1000 loads 18 2968.52 1980.00
  432. stfwd-try2 1000 loads 19 2968.52 1980.00
  433. stfwd-try2 1000 loads 20 2953.52 1970.00
  434.  
  435. stfwd-try2-1000w @ 0x0x49d2c0
  436. Benchmark Sample Cycles Nanos
  437. stfwd-try2 1000 loads warm 1 2983.51 1990.00
  438. stfwd-try2 1000 loads warm 2 36236.88 24170.00
  439. stfwd-try2 1000 loads warm 3 36236.88 24170.00
  440. stfwd-try2 1000 loads warm 4 36236.88 24170.00
  441. stfwd-try2 1000 loads warm 5 36251.87 24180.00
  442. stfwd-try2 1000 loads warm 6 36236.88 24170.00
  443. stfwd-try2 1000 loads warm 7 36236.88 24170.00
  444. stfwd-try2 1000 loads warm 8 36236.88 24170.00
  445. stfwd-try2 1000 loads warm 9 36236.88 24170.00
  446. stfwd-try2 1000 loads warm 10 36235.38 24169.00
  447. stfwd-try2 1000 loads warm 11 36236.88 24170.00
  448. stfwd-try2 1000 loads warm 12 36236.88 24170.00
  449. stfwd-try2 1000 loads warm 13 36236.88 24170.00
  450. stfwd-try2 1000 loads warm 14 36251.87 24180.00
  451. stfwd-try2 1000 loads warm 15 36236.88 24170.00
  452. stfwd-try2 1000 loads warm 16 36236.88 24170.00
  453. stfwd-try2 1000 loads warm 17 36236.88 24170.00
  454. stfwd-try2 1000 loads warm 18 36236.88 24170.00
  455. stfwd-try2 1000 loads warm 19 36236.88 24170.00
  456. stfwd-try2 1000 loads warm 20 36235.38 24169.00
  457.  
  458. stfwd-try2b @ 0x0x4a02c0
  459. Benchmark Sample Cycles Nanos
  460. stfwd-try2 100 loads 1 254.87 170.00
  461. stfwd-try2 100 loads 2 254.87 170.00
  462. stfwd-try2 100 loads 3 254.87 170.00
  463. stfwd-try2 100 loads 4 269.87 180.00
  464. stfwd-try2 100 loads 5 254.87 170.00
  465. stfwd-try2 100 loads 6 254.87 170.00
  466. stfwd-try2 100 loads 7 254.87 170.00
  467. stfwd-try2 100 loads 8 254.87 170.00
  468. stfwd-try2 100 loads 9 269.87 180.00
  469. stfwd-try2 100 loads 10 254.87 170.00
  470. stfwd-try2 100 loads 11 254.87 170.00
  471. stfwd-try2 100 loads 12 254.87 170.00
  472. stfwd-try2 100 loads 13 254.87 170.00
  473. stfwd-try2 100 loads 14 269.87 180.00
  474. stfwd-try2 100 loads 15 254.87 170.00
  475. stfwd-try2 100 loads 16 254.87 170.00
  476. stfwd-try2 100 loads 17 254.87 170.00
  477. stfwd-try2 100 loads 18 254.87 170.00
  478. stfwd-try2 100 loads 19 269.87 180.00
  479. stfwd-try2 100 loads 20 254.87 170.00
  480.  
  481. stfwd-try2c @ 0x0x49d180
  482. Benchmark Sample Cycles Nanos
  483. trained loads 1 209.90 140.00
  484. trained loads 2 209.90 140.00
  485. trained loads 3 149.93 100.00
  486. trained loads 4 149.93 100.00
  487. trained loads 5 74.96 50.00
  488. trained loads 6 74.96 50.00
  489. trained loads 7 74.96 50.00
  490. trained loads 8 59.97 40.00
  491. trained loads 9 59.97 40.00
  492. trained loads 10 59.97 40.00
  493. trained loads 11 29.99 20.00
  494. trained loads 12 44.98 30.00
  495. trained loads 13 29.99 20.00
  496. trained loads 14 44.98 30.00
  497. trained loads 15 44.98 30.00
  498. trained loads 16 44.98 30.00
  499. trained loads 17 44.98 30.00
  500. trained loads 18 29.99 20.00
  501. trained loads 19 44.98 30.00
  502. trained loads 20 44.98 30.00
  503.  
  504.  
  505. ** Running benchmark group Miscellaneous tests **
  506. Benchmark Cycles Nanos
  507. 32-bit add-loop 2.50 1.67
  508. 64-bit add-loop 2.50 1.67
  509. Can port7 be used by loads 1.50 1.00
  510. Test micro-fused add 1.00 0.67
  511. Add-JO fusion 1.00 0.67
  512. Flag merge 1 1.24 0.83
  513. Flag merge 2 1.17 0.78
  514. Flag merge 3 1.24 0.83
  515. Loop weirdness fast 6.99 4.66
  516.  
  517. ** Running benchmark group Fusion tests from dendibakh blog **
  518. Benchmark Cycles Nanos
  519. Crosses 64-byte i-boundary 300.83 200.65
  520. No cross 64-byte i-boundary 173.95 116.02
  521. Fused (original) 1.38 0.92
  522. Fused (simple addr) 1.36 0.91
  523. Fused (add [reg + reg * 4], 1) 1.38 0.92
  524. Fused (add [reg], 1) 1.36 0.91
  525. Unfused (original) 1.61 1.07
  526. Fused summation 2.15 1.44
  527. Unfused summation 1.63 1.08
  528.  
  529. ** Running benchmark group BMI false-dependency tests **
  530. Benchmark Cycles Nanos
  531. dest-dependent tzcnt 0.50 0.34
  532. dest-dependent lzcnt 0.25 0.17
  533. dest-dependent popcnt 0.25 0.17
  534.  
  535. ** Running benchmark group retpoline tests **
  536. Benchmark Cycles Nanos
  537. Dense retpoline call pause 55.60 37.08
  538. Dense retpoline call lfence 55.48 37.01
  539. Dense indirect pred calls 4.15 2.77
  540. Dense indirect unpred calls 21.38 14.26
  541. Sparse retpo indep call pause 13.69 9.13
  542. Sparse retpo indep call lfence 15.43 10.29
  543. Sparse retpo dep call pause 46.79 31.21
  544. Sparse retpo dep call lfence 47.29 31.54
  545.  
  546. ** Running benchmark group Tests written in C++ **
  547. Benchmark Cycles Nanos
  548. Dependent inline divisions 16.99 11.33
  549. Dependent 64-bit divisions 16.99 11.33
  550. Independent inline divisions 14.53 9.69
  551. Independent divisions 14.53 9.69
  552. Linked-list w/ sentinel 9.74 6.49
  553. Linked-list w/ count 10.14 6.76
  554.  
  555. ** Running benchmark group Vector unit bypass latency **
  556. Benchmark Cycles Nanos
  557. movdqa [mem] -> paddb latency 10.99 7.33
  558. movdqu [mem] -> paddb latency 10.99 7.33
  559. movups [mem] -> paddb latency 10.99 7.33
  560. movupd [mem] -> paddb latency 10.99 7.33
  561. movq rax,xmm0 -> xmm0,rax lat 6.00 4.00
  562. movq rax,xmm0 -> xmm0,rax lat 6.00 4.00
  563.  
  564. ** Running benchmark group Vector load-load latency **
  565. Benchmark Cycles Nanos
  566. aligned movdqu load lat 9.99 6.67
  567. aligned vmovdqu load lat 9.99 6.67
  568. aligned lddqu load lat 9.99 6.67
  569. aligned vlddqu load lat 9.99 6.67
  570. misaligned movdqu load lat 10.99 7.33
  571. misaligned vmovdqu load lat 10.99 7.33
  572. misaligned lddqu load lat 10.99 7.33
  573. misaligned vlddqu load lat 10.99 7.33
  574.  
  575. ** Running benchmark group Call/ret benchmarks **
  576. Benchmark Cycles Nanos
  577. calls sparsed by 0 4.12 2.75
  578. calls sparsed by 1 4.19 2.79
  579. calls sparsed by 2 4.12 2.75
  580. calls sparsed by 3 4.25 2.83
  581. calls sparsed by 4 4.31 2.87
  582. calls sparsed by 5 5.00 3.33
  583. calls sparsed by 6 6.00 4.00
  584. calls sparsed by 7 7.00 4.67
  585. calls chained by 0 4.06 2.71
  586. calls chained by 1 4.06 2.71
  587. calls chained by 2 4.06 2.71
  588. calls chained by 3 4.06 2.71
  589. calls to pushpop fn 7.00 4.67
  590. calls to addrsp0 fn 13.99 9.33
  591. calls to addrsp8 fn 13.99 9.33
  592.  
  593. ** Running benchmark group Oneshot Group **
  594. dep-add-oneshot @ 0x0x494380
  595. Benchmark Sample Cycles Nanos
  596. Oneshot dep add chain 1 1.51 1.01
  597. Oneshot dep add chain 2 0.70 0.47
  598. Oneshot dep add chain 3 0.70 0.47
  599. Oneshot dep add chain 4 0.70 0.47
  600. Oneshot dep add chain 5 0.70 0.47
  601. Oneshot dep add chain 6 0.70 0.47
  602. Oneshot dep add chain 7 0.70 0.47
  603. Oneshot dep add chain 8 0.70 0.47
  604. Oneshot dep add chain 9 0.70 0.47
  605. Oneshot dep add chain 10 0.70 0.47
  606. Oneshot dep add chain 11 0.70 0.47
  607. Oneshot dep add chain 12 0.70 0.47
  608. Oneshot dep add chain 13 0.70 0.47
  609. Oneshot dep add chain 14 0.70 0.47
  610. Oneshot dep add chain 15 0.70 0.47
  611. Oneshot dep add chain 16 0.70 0.47
  612. Oneshot dep add chain 17 0.70 0.47
  613. Oneshot dep add chain 18 0.70 0.47
  614. Oneshot dep add chain 19 0.70 0.47
  615. Oneshot dep add chain 20 0.70 0.47
  616.  
  617. indep-add-oneshot @ 0x0x495ac0
  618. Benchmark Sample Cycles Nanos
  619. Oneshot indep add chain 1 2.51 1.68
  620. Oneshot indep add chain 2 0.19 0.12
  621. Oneshot indep add chain 3 0.26 0.18
  622. Oneshot indep add chain 4 0.22 0.15
  623. Oneshot indep add chain 5 0.22 0.15
  624. Oneshot indep add chain 6 0.22 0.15
  625. Oneshot indep add chain 7 0.22 0.15
  626. Oneshot indep add chain 8 0.26 0.18
  627. Oneshot indep add chain 9 0.22 0.15
  628. Oneshot indep add chain 10 0.22 0.15
  629. Oneshot indep add chain 11 0.22 0.15
  630. Oneshot indep add chain 12 0.22 0.15
  631. Oneshot indep add chain 13 0.26 0.18
  632. Oneshot indep add chain 14 0.22 0.15
  633. Oneshot indep add chain 15 0.22 0.15
  634. Oneshot indep add chain 16 0.22 0.15
  635. Oneshot indep add chain 17 0.22 0.15
  636. Oneshot indep add chain 18 0.26 0.18
  637. Oneshot indep add chain 19 0.22 0.15
  638. Oneshot indep add chain 20 0.22 0.15
  639.  
  640. dep-add128 @ 0x0x4941c0
  641. Benchmark Sample Cycles Nanos
  642. 128 dependent add instructions 1 3.98 2.66
  643. 128 dependent add instructions 2 0.59 0.39
  644. 128 dependent add instructions 3 0.59 0.39
  645. 128 dependent add instructions 4 0.59 0.39
  646. 128 dependent add instructions 5 0.59 0.39
  647. 128 dependent add instructions 6 0.59 0.39
  648. 128 dependent add instructions 7 0.59 0.39
  649. 128 dependent add instructions 8 0.59 0.39
  650. 128 dependent add instructions 9 0.70 0.47
  651. 128 dependent add instructions 10 0.70 0.47
  652. 128 dependent add instructions 11 0.70 0.47
  653. 128 dependent add instructions 12 0.70 0.47
  654. 128 dependent add instructions 13 0.70 0.47
  655. 128 dependent add instructions 14 0.70 0.47
  656. 128 dependent add instructions 15 0.70 0.47
  657. 128 dependent add instructions 16 0.70 0.47
  658. 128 dependent add instructions 17 0.59 0.39
  659. 128 dependent add instructions 18 0.59 0.39
  660. 128 dependent add instructions 19 0.59 0.39
  661. 128 dependent add instructions 20 0.59 0.39
  662.  
  663. oneshot-dummy-touch @ 0x0x494180
  664. Benchmark Sample Cycles Nanos
  665. Empty touched oneshot bench 1 44.98 30.00
  666. Empty touched oneshot bench 2 -14.99 -10.00
  667. Empty touched oneshot bench 3 -14.99 -10.00
  668. Empty touched oneshot bench 4 0.00 0.00
  669. Empty touched oneshot bench 5 0.00 0.00
  670. Empty touched oneshot bench 6 -14.99 -10.00
  671. Empty touched oneshot bench 7 -14.99 -10.00
  672. Empty touched oneshot bench 8 0.00 0.00
  673. Empty touched oneshot bench 9 0.00 0.00
  674. Empty touched oneshot bench 10 -14.99 -10.00
  675. Empty touched oneshot bench 11 -14.99 -10.00
  676. Empty touched oneshot bench 12 0.00 0.00
  677. Empty touched oneshot bench 13 0.00 0.00
  678. Empty touched oneshot bench 14 -14.99 -10.00
  679. Empty touched oneshot bench 15 -14.99 -10.00
  680. Empty touched oneshot bench 16 0.00 0.00
  681. Empty touched oneshot bench 17 -14.99 -10.00
  682. Empty touched oneshot bench 18 -14.99 -10.00
  683. Empty touched oneshot bench 19 -14.99 -10.00
  684. Empty touched oneshot bench 20 0.00 0.00
  685.  
  686. oneshot-dummy-notouch @ 0x0x494140
  687. Benchmark Sample Cycles Nanos
  688. Empty untouched oneshot bench 1 74.96 50.00
  689. Empty untouched oneshot bench 2 -14.99 -10.00
  690. Empty untouched oneshot bench 3 0.00 0.00
  691. Empty untouched oneshot bench 4 0.00 0.00
  692. Empty untouched oneshot bench 5 -14.99 -10.00
  693. Empty untouched oneshot bench 6 -14.99 -10.00
  694. Empty untouched oneshot bench 7 0.00 0.00
  695. Empty untouched oneshot bench 8 0.00 0.00
  696. Empty untouched oneshot bench 9 -14.99 -10.00
  697. Empty untouched oneshot bench 10 -14.99 -10.00
  698. Empty untouched oneshot bench 11 0.00 0.00
  699. Empty untouched oneshot bench 12 -14.99 -10.00
  700. Empty untouched oneshot bench 13 -14.99 -10.00
  701. Empty untouched oneshot bench 14 -14.99 -10.00
  702. Empty untouched oneshot bench 15 0.00 0.00
  703. Empty untouched oneshot bench 16 -14.99 -10.00
  704. Empty untouched oneshot bench 17 -14.99 -10.00
  705. Empty untouched oneshot bench 18 0.00 0.00
  706. Empty untouched oneshot bench 19 0.00 0.00
  707. Empty untouched oneshot bench 20 -14.99 -10.00