- root@pserver1:~# ./stream
- -------------------------------------------------------------
- STREAM version $Revision: 5.10 $
- -------------------------------------------------------------
- This system uses 8 bytes per array element.
- -------------------------------------------------------------
- Array size = 2500000 (elements), Offset = 0 (elements)
- Memory per array = 19.1 MiB (= 0.0 GiB).
- Total memory required = 57.2 MiB (= 0.1 GiB).
- Each kernel will be executed 10 times.
- The *best* time for each kernel (excluding the first iteration)
- will be used to compute the reported bandwidth.
- -------------------------------------------------------------
- Number of Threads requested = 48
- Number of Threads counted = 48
- -------------------------------------------------------------
- Your clock granularity/precision appears to be 1 microseconds.
- Each test below will take on the order of 5883 microseconds.
- (= 5883 clock ticks)
- Increase the size of the arrays if this shows that
- you are not getting at least 20 clock ticks per test.
- -------------------------------------------------------------
- WARNING -- The above is only a rough guideline.
- For best results, please be sure you know the
- precision of your system timer.
- -------------------------------------------------------------
- Function    Best Rate MB/s  Avg time     Min time     Max time
- Copy:           31034.4     0.001320     0.001289     0.001436
- Scale:          17912.9     0.002265     0.002233     0.002377
- Add:            19867.2     0.003085     0.003020     0.003259
- Triad:          20100.5     0.003202     0.002985     0.003579
- -------------------------------------------------------------
- Solution Validates: avg error less than 1.000000e-13 on all three arrays
- -------------------------------------------------------------
- root@pserver1:~# numastat
-                        node0        node1        node2        node3
- numa_hit              426500       786579       163910       896998
- numa_miss                  0            0            0            0
- numa_foreign               0            0            0            0
- interleave_hit          7305         7252         7313         7248
- local_node            425470       779319       156510       889659
- other_node              1030         7260         7400         7339
-                        node4        node5        node6        node7
- numa_hit               93984       235942        47527       102039
- numa_miss                  0            0            0            0
- numa_foreign               0            0            0            0
- interleave_hit          7296         7253         7302         7249
- local_node             86601       228602        40138        95576
- other_node              7383         7340         7389         6463
- root@pserver1:~# ./stream
- -------------------------------------------------------------
- STREAM version $Revision: 5.10 $
- -------------------------------------------------------------
- This system uses 8 bytes per array element.
- -------------------------------------------------------------
- Array size = 2500000 (elements), Offset = 0 (elements)
- Memory per array = 19.1 MiB (= 0.0 GiB).
- Total memory required = 57.2 MiB (= 0.1 GiB).
- Each kernel will be executed 10 times.
- The *best* time for each kernel (excluding the first iteration)
- will be used to compute the reported bandwidth.
- -------------------------------------------------------------
- Number of Threads requested = 48
- Number of Threads counted = 48
- -------------------------------------------------------------
- Your clock granularity/precision appears to be 1 microseconds.
- Each test below will take on the order of 5275 microseconds.
- (= 5275 clock ticks)
- Increase the size of the arrays if this shows that
- you are not getting at least 20 clock ticks per test.
- -------------------------------------------------------------
- WARNING -- The above is only a rough guideline.
- For best results, please be sure you know the
- precision of your system timer.
- -------------------------------------------------------------
- Function    Best Rate MB/s  Avg time     Min time     Max time
- Copy:          136511.1     0.000321     0.000293     0.000419
- Scale:         141341.3     0.000303     0.000283     0.000361
- Add:           144216.8     0.000532     0.000416     0.001333
- Triad:         145973.5     0.000436     0.000411     0.000498
- -------------------------------------------------------------
- Solution Validates: avg error less than 1.000000e-13 on all three arrays
- -------------------------------------------------------------
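
The "Best Rate MB/s" column in both runs can be approximately recomputed from the array size and the "Min time" column. The sketch below assumes STREAM's usual byte accounting (Copy and Scale touch two arrays per iteration, Add and Triad touch three) and 1 MB = 10^6 bytes. The names `stream_rate_mb_s`, `first_run`, and `second_run` are illustrative only and are not part of STREAM. Because the log prints the minimum time rounded to six decimals, the recomputed rates match the reported ones only to within a small fraction of a percent.

```python
# Values taken from the log above.
BYTES_PER_ELEMENT = 8          # "This system uses 8 bytes per array element."
ARRAY_ELEMENTS = 2_500_000     # "Array size = 2500000 (elements)"

# Assumption: STREAM's usual accounting of arrays read plus written per kernel.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def stream_rate_mb_s(kernel: str, min_time_s: float) -> float:
    """Approximate a kernel's best rate in MB/s (1 MB = 1e6 bytes)."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * ARRAY_ELEMENTS
    return 1e-6 * bytes_moved / min_time_s


# "Min time" column (seconds) from the first and second ./stream runs.
first_run = {"Copy": 0.001289, "Scale": 0.002233, "Add": 0.003020, "Triad": 0.002985}
second_run = {"Copy": 0.000293, "Scale": 0.000283, "Add": 0.000416, "Triad": 0.000411}

if __name__ == "__main__":
    for label, run in (("first run", first_run), ("second run", second_run)):
        for kernel, min_time in run.items():
            print(f"{label:10s} {kernel:6s} {stream_rate_mb_s(kernel, min_time):12.1f} MB/s")
```

Note that each array is only 19.1 MiB, so these figures are for a small working set; the log's own hint about increasing the array size applies if longer per-test times are wanted.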