- Welcome to uarch-bench (e1d92fb-dirty)
- Median CPU speed: 1.499 GHz
- Running benchmarks groups using timer clock
- ** Running benchmark group Default Group **
- Benchmark Cycles Nanos
- Dependent add chain 1.00 0.67
- Independent add chain 0.25 0.17
- Dependent imul 64->128 3.00 2.00
- Dependent imul 64->64 3.00 2.00
- Independent imul 64->128 2.00 1.33
- Same location stores 1.00 0.67
- Disjoint location stores 1.00 0.67
- Dependent push/pop chain 7.00 4.67
- Independent push/pop chain 1.00 0.67
- ** Inverse throughput for load/16-bit **
- offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
- 0 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
- 16 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
- 32 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
- 48 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
- ** Inverse throughput for load/32-bit **
- offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
- 0 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
- 16 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0
- 32 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
- 48 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0
- ** Inverse throughput for load/64-bit **
- offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
- 0 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
- 16 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0 1.0 1.0 1.0 1.0
- 32 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
- 48 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0 1.0 1.0 1.0 1.0
- ** Inverse throughput for load/128-bit **
- offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
- 0 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
- 16 : 0.5 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
- 32 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
- 48 : 0.5 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
- ** Inverse throughput for load/256-bit **
- offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
- 0 : 1.0 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
- 16 : 1.0 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
- 32 : 1.0 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
- 48 : 1.0 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
- ** Inverse throughput for store/16-bit **
- offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
- 0 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0
- 16 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0
- 32 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0
- 48 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0
- ** Inverse throughput for store/32-bit **
- offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
- 0 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0 5.0 5.0
- 16 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0 5.0 5.0
- 32 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0 5.0 5.0
- 48 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0 5.0 5.0
- ** Inverse throughput for store/64-bit **
- offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
- 0 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0
- 16 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0
- 32 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0
- 48 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0
- ** Inverse throughput for store/128-bit **
- offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
- 0 : 1.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0
- 16 : 1.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0
- 32 : 1.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0
- 48 : 1.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0
- ** Inverse throughput for store/256-bit **
- offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
- 0 : 2.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0
- 16 : 2.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0
- 32 : 2.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0
- 48 : 2.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0
- ** Running benchmark group Parallel load/prefetches from fixed-size regions **
- Benchmark Cycles Nanos
- 16-KiB parallel-loads 0.53 0.35
- 16-KiB parallel-prefetcht0 0.50 0.33
- 16-KiB parallel-prefetcht1 0.50 0.33
- 16-KiB parallel-prefetcht2 0.50 0.33
- 16-KiB parallel-prefetchnta 0.50 0.33
- 32-KiB parallel-loads 0.53 0.35
- 32-KiB parallel-prefetcht0 0.50 0.33
- 32-KiB parallel-prefetcht1 0.50 0.33
- 32-KiB parallel-prefetcht2 0.50 0.33
- 32-KiB parallel-prefetchnta 0.50 0.33
- 64-KiB parallel-loads 2.00 1.33
- 64-KiB parallel-prefetcht0 0.50 0.33
- 64-KiB parallel-prefetcht1 0.50 0.33
- 64-KiB parallel-prefetcht2 0.50 0.33
- 64-KiB parallel-prefetchnta 0.50 0.33
- 128-KiB parallel-loads 2.00 1.33
- 128-KiB parallel-prefetcht0 0.50 0.33
- 128-KiB parallel-prefetcht1 0.50 0.33
- 128-KiB parallel-prefetcht2 0.50 0.33
- 128-KiB parallel-prefetchnta 0.50 0.33
- 256-KiB parallel-loads 2.00 1.33
- 256-KiB parallel-prefetcht0 0.50 0.33
- 256-KiB parallel-prefetcht1 0.50 0.33
- 256-KiB parallel-prefetcht2 0.50 0.33
- 256-KiB parallel-prefetchnta 0.50 0.33
- 512-KiB parallel-loads 2.01 1.34
- 512-KiB parallel-prefetcht0 0.50 0.33
- 512-KiB parallel-prefetcht1 0.50 0.33
- 512-KiB parallel-prefetcht2 0.50 0.34
- 512-KiB parallel-prefetchnta 0.50 0.33
- 2048-KiB parallel-loads 2.06 1.37
- 2048-KiB parallel-prefetcht0 0.51 0.34
- 2048-KiB parallel-prefetcht1 0.51 0.34
- 2048-KiB parallel-prefetcht2 0.51 0.34
- 2048-KiB parallel-prefetchnta 0.50 0.33
- ** Running benchmark group Serial loads from fixed-size regions **
- Benchmark Cycles Nanos
- 16-KiB serial loads 4.00 2.67
- 24-KiB serial loads 4.00 2.67
- 30-KiB serial loads 4.00 2.67
- 31-KiB serial loads 4.00 2.67
- 32-KiB serial loads 4.00 2.67
- 33-KiB serial loads 5.11 3.41
- 34-KiB serial loads 5.76 3.84
- 35-KiB serial loads 8.53 5.69
- 40-KiB serial loads 12.06 8.04
- 48-KiB serial loads 12.16 8.11
- 56-KiB serial loads 12.06 8.05
- 64-KiB serial loads 12.05 8.03
- 80-KiB serial loads 12.06 8.04
- 96-KiB serial loads 12.07 8.05
- 112-KiB serial loads 12.08 8.06
- 128-KiB serial loads 12.05 8.04
- 196-KiB serial loads 12.06 8.05
- 252-KiB serial loads 12.06 8.04
- 256-KiB serial loads 12.06 8.04
- 260-KiB serial loads 12.19 8.13
- 384-KiB serial loads 17.28 11.53
- 512-KiB serial loads 21.46 14.31
- 1024-KiB serial loads 35.88 23.93
- 2048-KiB serial loads 39.74 26.51
- ** Running benchmark group Store forwarding latency and throughput **
- Benchmark Cycles Nanos
- Store forward latency delay 0 6.99 4.66
- Store forward latency delay 1 6.99 4.66
- Store forward latency delay 2 6.99 4.66
- Store forward latency delay 3 6.99 4.66
- Store forward latency delay 4 6.99 4.66
- Store forward latency delay 5 6.31 4.21
- Store fwd tput concurrency 1 6.99 4.66
- Store fwd tput concurrency 2 3.50 2.33
- Store fwd tput concurrency 3 2.33 1.55
- Store fwd tput concurrency 4 1.75 1.17
- Store fwd tput concurrency 5 1.40 0.93
- Store fwd tput concurrency 6 1.17 0.78
- Store fwd tput concurrency 7 1.06 0.71
- Store fwd tput concurrency 8 1.00 0.67
- Store fwd tput concurrency 9 1.00 0.67
- Store fwd tput concurrency 10 1.00 0.67
- ** Running benchmark group Store forwarding latency and throughput **
- ---------- Oneshot calibration start --------------
- Benchmark Cycles Nanos
- Oneshot overhead min 89.96 60.00
- Oneshot overhead median (used) 104.95 70.00
- Oneshot overhead max 104.95 70.00
- ---------- Oneshot calibration end --------------
- oneshot-dummy @ 0x0x494100
- Benchmark Sample Cycles Nanos
- Empty oneshot bench 1 0.00 0.00
- Empty oneshot bench 2 0.00 0.00
- Empty oneshot bench 3 0.00 0.00
- Empty oneshot bench 4 -14.99 -10.00
- Empty oneshot bench 5 -14.99 -10.00
- Empty oneshot bench 6 0.00 0.00
- Empty oneshot bench 7 -14.99 -10.00
- Empty oneshot bench 8 -14.99 -10.00
- Empty oneshot bench 9 -14.99 -10.00
- Empty oneshot bench 10 0.00 0.00
- Empty oneshot bench 11 -14.99 -10.00
- Empty oneshot bench 12 -14.99 -10.00
- Empty oneshot bench 13 0.00 0.00
- Empty oneshot bench 14 0.00 0.00
- Empty oneshot bench 15 -14.99 -10.00
- Empty oneshot bench 16 -14.99 -10.00
- Empty oneshot bench 17 0.00 0.00
- Empty oneshot bench 18 0.00 0.00
- Empty oneshot bench 19 -14.99 -10.00
- Empty oneshot bench 20 -14.99 -10.00
- oneshot-latency-2 @ 0x0x4a11c0
- Benchmark Sample Cycles Nanos
- StFwd oneshot lat (delay 2) 1 144767.62 96560.00
- StFwd oneshot lat (delay 2) 2 144482.76 96370.00
- StFwd oneshot lat (delay 2) 3 144481.26 96369.00
- StFwd oneshot lat (delay 2) 4 144482.76 96370.00
- StFwd oneshot lat (delay 2) 5 144467.77 96360.00
- StFwd oneshot lat (delay 2) 6 144481.26 96369.00
- StFwd oneshot lat (delay 2) 7 144482.76 96370.00
- StFwd oneshot lat (delay 2) 8 144481.26 96369.00
- StFwd oneshot lat (delay 2) 9 144482.76 96370.00
- StFwd oneshot lat (delay 2) 10 144467.77 96360.00
- StFwd oneshot lat (delay 2) 11 144481.26 96369.00
- StFwd oneshot lat (delay 2) 12 144482.76 96370.00
- StFwd oneshot lat (delay 2) 13 144482.76 96370.00
- StFwd oneshot lat (delay 2) 14 144481.26 96369.00
- StFwd oneshot lat (delay 2) 15 144482.76 96370.00
- StFwd oneshot lat (delay 2) 16 151587.71 101109.00
- StFwd oneshot lat (delay 2) 17 144482.76 96370.00
- StFwd oneshot lat (delay 2) 18 144482.76 96370.00
- StFwd oneshot lat (delay 2) 19 144481.26 96369.00
- StFwd oneshot lat (delay 2) 20 144482.76 96370.00
- oneshot-latency-1 @ 0x0x4a0e40
- Benchmark Sample Cycles Nanos
- StFwd oneshot lat (delay 1) 1 144767.62 96560.00
- StFwd oneshot lat (delay 1) 2 144482.76 96370.00
- StFwd oneshot lat (delay 1) 3 144481.26 96369.00
- StFwd oneshot lat (delay 1) 4 144482.76 96370.00
- StFwd oneshot lat (delay 1) 5 144482.76 96370.00
- StFwd oneshot lat (delay 1) 6 144481.26 96369.00
- StFwd oneshot lat (delay 1) 7 144482.76 96370.00
- StFwd oneshot lat (delay 1) 8 144482.76 96370.00
- StFwd oneshot lat (delay 1) 9 144481.26 96369.00
- StFwd oneshot lat (delay 1) 10 144482.76 96370.00
- StFwd oneshot lat (delay 1) 11 144466.27 96359.00
- StFwd oneshot lat (delay 1) 12 144467.77 96360.00
- StFwd oneshot lat (delay 1) 13 144482.76 96370.00
- StFwd oneshot lat (delay 1) 14 144481.26 96369.00
- StFwd oneshot lat (delay 1) 15 144482.76 96370.00
- StFwd oneshot lat (delay 1) 16 144482.76 96370.00
- StFwd oneshot lat (delay 1) 17 144481.26 96369.00
- StFwd oneshot lat (delay 1) 18 144482.76 96370.00
- StFwd oneshot lat (delay 1) 19 144482.76 96370.00
- StFwd oneshot lat (delay 1) 20 144481.26 96369.00
- oneshot-latency-0 @ 0x0x4a0ac0
- Benchmark Sample Cycles Nanos
- StFwd oneshot lat (delay 0) 1 144691.15 96509.00
- StFwd oneshot lat (delay 0) 2 144482.76 96370.00
- StFwd oneshot lat (delay 0) 3 144482.76 96370.00
- StFwd oneshot lat (delay 0) 4 144481.26 96369.00
- StFwd oneshot lat (delay 0) 5 144482.76 96370.00
- StFwd oneshot lat (delay 0) 6 144482.76 96370.00
- StFwd oneshot lat (delay 0) 7 144481.26 96369.00
- StFwd oneshot lat (delay 0) 8 144467.77 96360.00
- StFwd oneshot lat (delay 0) 9 152052.47 101419.00
- StFwd oneshot lat (delay 0) 10 144482.76 96370.00
- StFwd oneshot lat (delay 0) 11 144482.76 96370.00
- StFwd oneshot lat (delay 0) 12 144481.26 96369.00
- StFwd oneshot lat (delay 0) 13 144482.76 96370.00
- StFwd oneshot lat (delay 0) 14 144482.76 96370.00
- StFwd oneshot lat (delay 0) 15 144481.26 96369.00
- StFwd oneshot lat (delay 0) 16 144482.76 96370.00
- StFwd oneshot lat (delay 0) 17 144466.27 96359.00
- StFwd oneshot lat (delay 0) 18 144467.77 96360.00
- StFwd oneshot lat (delay 0) 19 144467.77 96360.00
- StFwd oneshot lat (delay 0) 20 144466.27 96359.00
- ** Running benchmark group Store forward attempts **
- oneshot-dummy @ 0x0x494100
- Benchmark Sample Cycles Nanos
- Empty oneshot bench 1 -14.99 -10.00
- Empty oneshot bench 2 -14.99 -10.00
- Empty oneshot bench 3 -14.99 -10.00
- Empty oneshot bench 4 -14.99 -10.00
- Empty oneshot bench 5 0.00 0.00
- Empty oneshot bench 6 0.00 0.00
- Empty oneshot bench 7 -14.99 -10.00
- Empty oneshot bench 8 -14.99 -10.00
- Empty oneshot bench 9 -14.99 -10.00
- Empty oneshot bench 10 0.00 0.00
- Empty oneshot bench 11 -14.99 -10.00
- Empty oneshot bench 12 -14.99 -10.00
- Empty oneshot bench 13 0.00 0.00
- Empty oneshot bench 14 0.00 0.00
- Empty oneshot bench 15 -14.99 -10.00
- Empty oneshot bench 16 -14.99 -10.00
- Empty oneshot bench 17 0.00 0.00
- Empty oneshot bench 18 -14.99 -10.00
- Empty oneshot bench 19 -14.99 -10.00
- Empty oneshot bench 20 -14.99 -10.00
- stfwd-try1 @ 0x0x4a0780
- Benchmark Sample Cycles Nanos
- stfwd-try1 1 674.66 450.00
- stfwd-try1 2 89.96 60.00
- stfwd-try1 3 89.96 60.00
- stfwd-try1 4 89.96 60.00
- stfwd-try1 5 89.96 60.00
- stfwd-try1 6 74.96 50.00
- stfwd-try1 7 89.96 60.00
- stfwd-try1 8 74.96 50.00
- stfwd-try1 9 89.96 60.00
- stfwd-try1 10 74.96 50.00
- stfwd-try1 11 74.96 50.00
- stfwd-try1 12 74.96 50.00
- stfwd-try1 13 74.96 50.00
- stfwd-try1 14 74.96 50.00
- stfwd-try1 15 89.96 60.00
- stfwd-try1 16 74.96 50.00
- stfwd-try1 17 89.96 60.00
- stfwd-try1 18 74.96 50.00
- stfwd-try1 19 89.96 60.00
- stfwd-try1 20 89.96 60.00
- stfwd-try2 @ 0x0x4a02c0
- Benchmark Sample Cycles Nanos
- stfwd-try2 100 loads 1 614.69 410.00
- stfwd-try2 100 loads 2 3658.17 2440.00
- stfwd-try2 100 loads 3 254.87 170.00
- stfwd-try2 100 loads 4 254.87 170.00
- stfwd-try2 100 loads 5 254.87 170.00
- stfwd-try2 100 loads 6 254.87 170.00
- stfwd-try2 100 loads 7 269.87 180.00
- stfwd-try2 100 loads 8 254.87 170.00
- stfwd-try2 100 loads 9 254.87 170.00
- stfwd-try2 100 loads 10 254.87 170.00
- stfwd-try2 100 loads 11 254.87 170.00
- stfwd-try2 100 loads 12 269.87 180.00
- stfwd-try2 100 loads 13 254.87 170.00
- stfwd-try2 100 loads 14 254.87 170.00
- stfwd-try2 100 loads 15 254.87 170.00
- stfwd-try2 100 loads 16 254.87 170.00
- stfwd-try2 100 loads 17 269.87 180.00
- stfwd-try2 100 loads 18 254.87 170.00
- stfwd-try2 100 loads 19 254.87 170.00
- stfwd-try2 100 loads 20 254.87 170.00
- stfwd-try2-4 @ 0x0x49d200
- Benchmark Sample Cycles Nanos
- stfwd-try2 4 loads 1 74.96 50.00
- stfwd-try2 4 loads 2 164.92 110.00
- stfwd-try2 4 loads 3 -14.99 -10.00
- stfwd-try2 4 loads 4 0.00 0.00
- stfwd-try2 4 loads 5 0.00 0.00
- stfwd-try2 4 loads 6 -14.99 -10.00
- stfwd-try2 4 loads 7 -14.99 -10.00
- stfwd-try2 4 loads 8 -14.99 -10.00
- stfwd-try2 4 loads 9 0.00 0.00
- stfwd-try2 4 loads 10 0.00 0.00
- stfwd-try2 4 loads 11 -14.99 -10.00
- stfwd-try2 4 loads 12 -14.99 -10.00
- stfwd-try2 4 loads 13 -14.99 -10.00
- stfwd-try2 4 loads 14 0.00 0.00
- stfwd-try2 4 loads 15 0.00 0.00
- stfwd-try2 4 loads 16 -14.99 -10.00
- stfwd-try2 4 loads 17 -14.99 -10.00
- stfwd-try2 4 loads 18 -14.99 -10.00
- stfwd-try2 4 loads 19 0.00 0.00
- stfwd-try2 4 loads 20 0.00 0.00
- stfwd-try2-10 @ 0x0x49d240
- Benchmark Sample Cycles Nanos
- stfwd-try2 10 loads 1 44.98 30.00
- stfwd-try2 10 loads 2 389.81 260.00
- stfwd-try2 10 loads 3 0.00 0.00
- stfwd-try2 10 loads 4 0.00 0.00
- stfwd-try2 10 loads 5 0.00 0.00
- stfwd-try2 10 loads 6 0.00 0.00
- stfwd-try2 10 loads 7 -14.99 -10.00
- stfwd-try2 10 loads 8 -14.99 -10.00
- stfwd-try2 10 loads 9 -14.99 -10.00
- stfwd-try2 10 loads 10 0.00 0.00
- stfwd-try2 10 loads 11 0.00 0.00
- stfwd-try2 10 loads 12 0.00 0.00
- stfwd-try2 10 loads 13 0.00 0.00
- stfwd-try2 10 loads 14 0.00 0.00
- stfwd-try2 10 loads 15 -14.99 -10.00
- stfwd-try2 10 loads 16 -14.99 -10.00
- stfwd-try2 10 loads 17 0.00 0.00
- stfwd-try2 10 loads 18 0.00 0.00
- stfwd-try2 10 loads 19 0.00 0.00
- stfwd-try2 10 loads 20 0.00 0.00
- stfwd-try2-20 @ 0x0x4a01c0
- Benchmark Sample Cycles Nanos
- stfwd-try2 20 loads 1 509.75 340.00
- stfwd-try2 20 loads 2 734.63 490.00
- stfwd-try2 20 loads 3 14.99 10.00
- stfwd-try2 20 loads 4 29.99 20.00
- stfwd-try2 20 loads 5 14.99 10.00
- stfwd-try2 20 loads 6 29.99 20.00
- stfwd-try2 20 loads 7 14.99 10.00
- stfwd-try2 20 loads 8 14.99 10.00
- stfwd-try2 20 loads 9 14.99 10.00
- stfwd-try2 20 loads 10 14.99 10.00
- stfwd-try2 20 loads 11 14.99 10.00
- stfwd-try2 20 loads 12 14.99 10.00
- stfwd-try2 20 loads 13 14.99 10.00
- stfwd-try2 20 loads 14 29.99 20.00
- stfwd-try2 20 loads 15 14.99 10.00
- stfwd-try2 20 loads 16 29.99 20.00
- stfwd-try2 20 loads 17 14.99 10.00
- stfwd-try2 20 loads 18 14.99 10.00
- stfwd-try2 20 loads 19 14.99 10.00
- stfwd-try2 20 loads 20 14.99 10.00
- stfwd-try2-1000 @ 0x0x49d2c0
- Benchmark Sample Cycles Nanos
- stfwd-try2 1000 loads 1 32188.91 21470.00
- stfwd-try2 1000 loads 2 36236.88 24170.00
- stfwd-try2 1000 loads 3 2968.52 1980.00
- stfwd-try2 1000 loads 4 2968.52 1980.00
- stfwd-try2 1000 loads 5 2953.52 1970.00
- stfwd-try2 1000 loads 6 2953.52 1970.00
- stfwd-try2 1000 loads 7 2968.52 1980.00
- stfwd-try2 1000 loads 8 2968.52 1980.00
- stfwd-try2 1000 loads 9 2968.52 1980.00
- stfwd-try2 1000 loads 10 2953.52 1970.00
- stfwd-try2 1000 loads 11 2953.52 1970.00
- stfwd-try2 1000 loads 12 2968.52 1980.00
- stfwd-try2 1000 loads 13 2968.52 1980.00
- stfwd-try2 1000 loads 14 2968.52 1980.00
- stfwd-try2 1000 loads 15 2953.52 1970.00
- stfwd-try2 1000 loads 16 2953.52 1970.00
- stfwd-try2 1000 loads 17 2968.52 1980.00
- stfwd-try2 1000 loads 18 2968.52 1980.00
- stfwd-try2 1000 loads 19 2968.52 1980.00
- stfwd-try2 1000 loads 20 2953.52 1970.00
- stfwd-try2-1000w @ 0x0x49d2c0
- Benchmark Sample Cycles Nanos
- stfwd-try2 1000 loads warm 1 2983.51 1990.00
- stfwd-try2 1000 loads warm 2 36236.88 24170.00
- stfwd-try2 1000 loads warm 3 36236.88 24170.00
- stfwd-try2 1000 loads warm 4 36236.88 24170.00
- stfwd-try2 1000 loads warm 5 36251.87 24180.00
- stfwd-try2 1000 loads warm 6 36236.88 24170.00
- stfwd-try2 1000 loads warm 7 36236.88 24170.00
- stfwd-try2 1000 loads warm 8 36236.88 24170.00
- stfwd-try2 1000 loads warm 9 36236.88 24170.00
- stfwd-try2 1000 loads warm 10 36235.38 24169.00
- stfwd-try2 1000 loads warm 11 36236.88 24170.00
- stfwd-try2 1000 loads warm 12 36236.88 24170.00
- stfwd-try2 1000 loads warm 13 36236.88 24170.00
- stfwd-try2 1000 loads warm 14 36251.87 24180.00
- stfwd-try2 1000 loads warm 15 36236.88 24170.00
- stfwd-try2 1000 loads warm 16 36236.88 24170.00
- stfwd-try2 1000 loads warm 17 36236.88 24170.00
- stfwd-try2 1000 loads warm 18 36236.88 24170.00
- stfwd-try2 1000 loads warm 19 36236.88 24170.00
- stfwd-try2 1000 loads warm 20 36235.38 24169.00
- stfwd-try2b @ 0x0x4a02c0
- Benchmark Sample Cycles Nanos
- stfwd-try2 100 loads 1 254.87 170.00
- stfwd-try2 100 loads 2 254.87 170.00
- stfwd-try2 100 loads 3 254.87 170.00
- stfwd-try2 100 loads 4 269.87 180.00
- stfwd-try2 100 loads 5 254.87 170.00
- stfwd-try2 100 loads 6 254.87 170.00
- stfwd-try2 100 loads 7 254.87 170.00
- stfwd-try2 100 loads 8 254.87 170.00
- stfwd-try2 100 loads 9 269.87 180.00
- stfwd-try2 100 loads 10 254.87 170.00
- stfwd-try2 100 loads 11 254.87 170.00
- stfwd-try2 100 loads 12 254.87 170.00
- stfwd-try2 100 loads 13 254.87 170.00
- stfwd-try2 100 loads 14 269.87 180.00
- stfwd-try2 100 loads 15 254.87 170.00
- stfwd-try2 100 loads 16 254.87 170.00
- stfwd-try2 100 loads 17 254.87 170.00
- stfwd-try2 100 loads 18 254.87 170.00
- stfwd-try2 100 loads 19 269.87 180.00
- stfwd-try2 100 loads 20 254.87 170.00
- stfwd-try2c @ 0x0x49d180
- Benchmark Sample Cycles Nanos
- trained loads 1 209.90 140.00
- trained loads 2 209.90 140.00
- trained loads 3 149.93 100.00
- trained loads 4 149.93 100.00
- trained loads 5 74.96 50.00
- trained loads 6 74.96 50.00
- trained loads 7 74.96 50.00
- trained loads 8 59.97 40.00
- trained loads 9 59.97 40.00
- trained loads 10 59.97 40.00
- trained loads 11 29.99 20.00
- trained loads 12 44.98 30.00
- trained loads 13 29.99 20.00
- trained loads 14 44.98 30.00
- trained loads 15 44.98 30.00
- trained loads 16 44.98 30.00
- trained loads 17 44.98 30.00
- trained loads 18 29.99 20.00
- trained loads 19 44.98 30.00
- trained loads 20 44.98 30.00
- ** Running benchmark group Miscellaneous tests **
- Benchmark Cycles Nanos
- 32-bit add-loop 2.50 1.67
- 64-bit add-loop 2.50 1.67
- Can port7 be used by loads 1.50 1.00
- Test micro-fused add 1.00 0.67
- Add-JO fusion 1.00 0.67
- Flag merge 1 1.24 0.83
- Flag merge 2 1.17 0.78
- Flag merge 3 1.24 0.83
- Loop weirdness fast 6.99 4.66
- ** Running benchmark group Fusion tests from dendibakh blog **
- Benchmark Cycles Nanos
- Crosses 64-byte i-boundary 300.83 200.65
- No cross 64-byte i-boundary 173.95 116.02
- Fused (original) 1.38 0.92
- Fused (simple addr) 1.36 0.91
- Fused (add [reg + reg * 4], 1) 1.38 0.92
- Fused (add [reg], 1) 1.36 0.91
- Unfused (original) 1.61 1.07
- Fused summation 2.15 1.44
- Unfused summation 1.63 1.08
- ** Running benchmark group BMI false-dependency tests **
- Benchmark Cycles Nanos
- dest-dependent tzcnt 0.50 0.34
- dest-dependent lzcnt 0.25 0.17
- dest-dependent popcnt 0.25 0.17
- ** Running benchmark group retpoline tests **
- Benchmark Cycles Nanos
- Dense retpoline call pause 55.60 37.08
- Dense retpoline call lfence 55.48 37.01
- Dense indirect pred calls 4.15 2.77
- Dense indirect unpred calls 21.38 14.26
- Sparse retpo indep call pause 13.69 9.13
- Sparse retpo indep call lfence 15.43 10.29
- Sparse retpo dep call pause 46.79 31.21
- Sparse retpo dep call lfence 47.29 31.54
- ** Running benchmark group Tests written in C++ **
- Benchmark Cycles Nanos
- Dependent inline divisions 16.99 11.33
- Dependent 64-bit divisions 16.99 11.33
- Independent inline divisions 14.53 9.69
- Independent divisions 14.53 9.69
- Linked-list w/ sentinel 9.74 6.49
- Linked-list w/ count 10.14 6.76
- ** Running benchmark group Vector unit bypass latency **
- Benchmark Cycles Nanos
- movdqa [mem] -> paddb latency 10.99 7.33
- movdqu [mem] -> paddb latency 10.99 7.33
- movups [mem] -> paddb latency 10.99 7.33
- movupd [mem] -> paddb latency 10.99 7.33
- movq rax,xmm0 -> xmm0,rax lat 6.00 4.00
- movq rax,xmm0 -> xmm0,rax lat 6.00 4.00
- ** Running benchmark group Vector load-load latency **
- Benchmark Cycles Nanos
- aligned movdqu load lat 9.99 6.67
- aligned vmovdqu load lat 9.99 6.67
- aligned lddqu load lat 9.99 6.67
- aligned vlddqu load lat 9.99 6.67
- misaligned movdqu load lat 10.99 7.33
- misaligned vmovdqu load lat 10.99 7.33
- misaligned lddqu load lat 10.99 7.33
- misaligned vlddqu load lat 10.99 7.33
- ** Running benchmark group Call/ret benchmarks **
- Benchmark Cycles Nanos
- calls sparsed by 0 4.12 2.75
- calls sparsed by 1 4.19 2.79
- calls sparsed by 2 4.12 2.75
- calls sparsed by 3 4.25 2.83
- calls sparsed by 4 4.31 2.87
- calls sparsed by 5 5.00 3.33
- calls sparsed by 6 6.00 4.00
- calls sparsed by 7 7.00 4.67
- calls chained by 0 4.06 2.71
- calls chained by 1 4.06 2.71
- calls chained by 2 4.06 2.71
- calls chained by 3 4.06 2.71
- calls to pushpop fn 7.00 4.67
- calls to addrsp0 fn 13.99 9.33
- calls to addrsp8 fn 13.99 9.33
- ** Running benchmark group Oneshot Group **
- dep-add-oneshot @ 0x0x494380
- Benchmark Sample Cycles Nanos
- Oneshot dep add chain 1 1.51 1.01
- Oneshot dep add chain 2 0.70 0.47
- Oneshot dep add chain 3 0.70 0.47
- Oneshot dep add chain 4 0.70 0.47
- Oneshot dep add chain 5 0.70 0.47
- Oneshot dep add chain 6 0.70 0.47
- Oneshot dep add chain 7 0.70 0.47
- Oneshot dep add chain 8 0.70 0.47
- Oneshot dep add chain 9 0.70 0.47
- Oneshot dep add chain 10 0.70 0.47
- Oneshot dep add chain 11 0.70 0.47
- Oneshot dep add chain 12 0.70 0.47
- Oneshot dep add chain 13 0.70 0.47
- Oneshot dep add chain 14 0.70 0.47
- Oneshot dep add chain 15 0.70 0.47
- Oneshot dep add chain 16 0.70 0.47
- Oneshot dep add chain 17 0.70 0.47
- Oneshot dep add chain 18 0.70 0.47
- Oneshot dep add chain 19 0.70 0.47
- Oneshot dep add chain 20 0.70 0.47
- indep-add-oneshot @ 0x0x495ac0
- Benchmark Sample Cycles Nanos
- Oneshot indep add chain 1 2.51 1.68
- Oneshot indep add chain 2 0.19 0.12
- Oneshot indep add chain 3 0.26 0.18
- Oneshot indep add chain 4 0.22 0.15
- Oneshot indep add chain 5 0.22 0.15
- Oneshot indep add chain 6 0.22 0.15
- Oneshot indep add chain 7 0.22 0.15
- Oneshot indep add chain 8 0.26 0.18
- Oneshot indep add chain 9 0.22 0.15
- Oneshot indep add chain 10 0.22 0.15
- Oneshot indep add chain 11 0.22 0.15
- Oneshot indep add chain 12 0.22 0.15
- Oneshot indep add chain 13 0.26 0.18
- Oneshot indep add chain 14 0.22 0.15
- Oneshot indep add chain 15 0.22 0.15
- Oneshot indep add chain 16 0.22 0.15
- Oneshot indep add chain 17 0.22 0.15
- Oneshot indep add chain 18 0.26 0.18
- Oneshot indep add chain 19 0.22 0.15
- Oneshot indep add chain 20 0.22 0.15
- dep-add128 @ 0x0x4941c0
- Benchmark Sample Cycles Nanos
- 128 dependent add instructions 1 3.98 2.66
- 128 dependent add instructions 2 0.59 0.39
- 128 dependent add instructions 3 0.59 0.39
- 128 dependent add instructions 4 0.59 0.39
- 128 dependent add instructions 5 0.59 0.39
- 128 dependent add instructions 6 0.59 0.39
- 128 dependent add instructions 7 0.59 0.39
- 128 dependent add instructions 8 0.59 0.39
- 128 dependent add instructions 9 0.70 0.47
- 128 dependent add instructions 10 0.70 0.47
- 128 dependent add instructions 11 0.70 0.47
- 128 dependent add instructions 12 0.70 0.47
- 128 dependent add instructions 13 0.70 0.47
- 128 dependent add instructions 14 0.70 0.47
- 128 dependent add instructions 15 0.70 0.47
- 128 dependent add instructions 16 0.70 0.47
- 128 dependent add instructions 17 0.59 0.39
- 128 dependent add instructions 18 0.59 0.39
- 128 dependent add instructions 19 0.59 0.39
- 128 dependent add instructions 20 0.59 0.39
- oneshot-dummy-touch @ 0x0x494180
- Benchmark Sample Cycles Nanos
- Empty touched oneshot bench 1 44.98 30.00
- Empty touched oneshot bench 2 -14.99 -10.00
- Empty touched oneshot bench 3 -14.99 -10.00
- Empty touched oneshot bench 4 0.00 0.00
- Empty touched oneshot bench 5 0.00 0.00
- Empty touched oneshot bench 6 -14.99 -10.00
- Empty touched oneshot bench 7 -14.99 -10.00
- Empty touched oneshot bench 8 0.00 0.00
- Empty touched oneshot bench 9 0.00 0.00
- Empty touched oneshot bench 10 -14.99 -10.00
- Empty touched oneshot bench 11 -14.99 -10.00
- Empty touched oneshot bench 12 0.00 0.00
- Empty touched oneshot bench 13 0.00 0.00
- Empty touched oneshot bench 14 -14.99 -10.00
- Empty touched oneshot bench 15 -14.99 -10.00
- Empty touched oneshot bench 16 0.00 0.00
- Empty touched oneshot bench 17 -14.99 -10.00
- Empty touched oneshot bench 18 -14.99 -10.00
- Empty touched oneshot bench 19 -14.99 -10.00
- Empty touched oneshot bench 20 0.00 0.00
- oneshot-dummy-notouch @ 0x0x494140
- Benchmark Sample Cycles Nanos
- Empty untouched oneshot bench 1 74.96 50.00
- Empty untouched oneshot bench 2 -14.99 -10.00
- Empty untouched oneshot bench 3 0.00 0.00
- Empty untouched oneshot bench 4 0.00 0.00
- Empty untouched oneshot bench 5 -14.99 -10.00
- Empty untouched oneshot bench 6 -14.99 -10.00
- Empty untouched oneshot bench 7 0.00 0.00
- Empty untouched oneshot bench 8 0.00 0.00
- Empty untouched oneshot bench 9 -14.99 -10.00
- Empty untouched oneshot bench 10 -14.99 -10.00
- Empty untouched oneshot bench 11 0.00 0.00
- Empty untouched oneshot bench 12 -14.99 -10.00
- Empty untouched oneshot bench 13 -14.99 -10.00
- Empty untouched oneshot bench 14 -14.99 -10.00
- Empty untouched oneshot bench 15 0.00 0.00
- Empty untouched oneshot bench 16 -14.99 -10.00
- Empty untouched oneshot bench 17 -14.99 -10.00
- Empty untouched oneshot bench 18 0.00 0.00
- Empty untouched oneshot bench 19 0.00 0.00
- Empty untouched oneshot bench 20 -14.99 -10.00
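
For readers cross-checking the two numeric columns: the Nanos figures in the log appear to be the Cycles figures divided by the reported median CPU speed (1.499 GHz). The sketch below is not part of uarch-bench; `cycles_to_nanos` and `CPU_GHZ` are hypothetical names introduced only for illustration, and the sample rows are copied from the Default Group table. Because the log prints rounded values (the clock speed included), larger entries may differ from this calculation in the last digits.

```python
# Hypothetical helper, not taken from uarch-bench: relates the Cycles and
# Nanos columns under the assumption nanos = cycles / (clock speed in GHz).

CPU_GHZ = 1.499  # "Median CPU speed" as printed in the log (rounded)


def cycles_to_nanos(cycles: float, ghz: float = CPU_GHZ) -> float:
    """Convert a cycle count to nanoseconds at the given clock speed."""
    return cycles / ghz


# A few rows copied from the Default Group table above.
rows = [
    ("Dependent add chain", 1.00),
    ("Independent add chain", 0.25),
    ("Dependent imul 64->64", 3.00),
    ("Dependent push/pop chain", 7.00),
]

for name, cycles in rows:
    print(f"{name:<26} {cycles:6.2f} {cycles_to_nanos(cycles):6.2f}")
```

Running this on the four rows gives nanosecond values that round to the figures printed in the log for those benchmarks.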