stream on pserver1

root@pserver1:~# ./stream
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 2500000 (elements), Offset = 0 (elements)
Memory per array = 19.1 MiB (= 0.0 GiB).
Total memory required = 57.2 MiB (= 0.1 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 48
Number of Threads counted = 48
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 5883 microseconds.
   (= 5883 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           31034.4     0.001320     0.001289     0.001436
Scale:          17912.9     0.002265     0.002233     0.002377
Add:            19867.2     0.003085     0.003020     0.003259
Triad:          20100.5     0.003202     0.002985     0.003579
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
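The sizes and rates in the run above can be reproduced from the printed parameters. The sketch below assumes STREAM's usual byte-counting convention: Copy and Scale move 2 arrays per pass, Add and Triad move 3, and MB/s uses 1 MB = 10^6 bytes. The names `best_rate_mb_s` and `MIN_TIMES_RUN1` are illustrative and do not come from the STREAM source. The log prints min times rounded to the microsecond, so the recomputed rates only approximate the reported ones.

```python
# Minimal sketch: recompute STREAM's reported sizes and "Best Rate MB/s"
# from values printed in the first run above.
# Assumption: 2 arrays of traffic for Copy/Scale, 3 for Add/Triad,
# and rates reported in MB/s with 1 MB = 10**6 bytes.

N = 2_500_000      # "Array size" in elements
ELEM_BYTES = 8     # "8 bytes per array element"
MIB = 1024 ** 2

array_bytes = N * ELEM_BYTES
print(f"Memory per array = {array_bytes / MIB:.1f} MiB")           # 19.1
print(f"Total memory required = {3 * array_bytes / MIB:.1f} MiB")  # 57.2

# (kernel, arrays touched per pass, "Min time" in seconds from the log)
MIN_TIMES_RUN1 = [
    ("Copy", 2, 0.001289),
    ("Scale", 2, 0.002233),
    ("Add", 3, 0.003020),
    ("Triad", 3, 0.002985),
]

def best_rate_mb_s(arrays: int, min_time_s: float) -> float:
    """Bytes moved in one kernel pass divided by the fastest pass, in MB/s."""
    return arrays * array_bytes / min_time_s / 1e6

for name, arrays, t_min in MIN_TIMES_RUN1:
    print(f"{name:6s} {best_rate_mb_s(arrays, t_min):10.1f} MB/s")
```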
root@pserver1:~# numastat
                           node0           node1           node2           node3
numa_hit                  426500          786579          163910          896998
numa_miss                      0               0               0               0
numa_foreign                   0               0               0               0
interleave_hit              7305            7252            7313            7248
local_node                425470          779319          156510          889659
other_node                  1030            7260            7400            7339

                           node4           node5           node6           node7
numa_hit                   93984          235942           47527          102039
numa_miss                      0               0               0               0
numa_foreign                   0               0               0               0
interleave_hit              7296            7253            7302            7249
local_node                 86601          228602           40138           95576
other_node                  7383            7340            7389            6463
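The numastat counters can be checked against each other. The kernel's numastat documentation defines numa_hit and numa_miss by whether the allocation landed on the intended node. It defines local_node and other_node by whether the allocating task ran on that node. Every allocation served from a node should therefore be counted once in each pair, so numa_hit + numa_miss should equal local_node + other_node. The sketch below is a minimal consistency check under that reading. It copies its data from the table above, and the function name `counters_consistent` is illustrative.

```python
# Minimal sketch: cross-check the numastat counters printed above.
# Assumption (per the kernel's numastat counter definitions): each allocation
# served from a node counts once as numa_hit or numa_miss and once as
# local_node or other_node, so hit + miss == local + other on every node.

NUMASTAT = {
    # node: (numa_hit, numa_miss, local_node, other_node)
    "node0": (426500, 0, 425470, 1030),
    "node1": (786579, 0, 779319, 7260),
    "node2": (163910, 0, 156510, 7400),
    "node3": (896998, 0, 889659, 7339),
    "node4": (93984, 0, 86601, 7383),
    "node5": (235942, 0, 228602, 7340),
    "node6": (47527, 0, 40138, 7389),
    "node7": (102039, 0, 95576, 6463),
}

def counters_consistent(hit: int, miss: int, local: int, other: int) -> bool:
    """True if the two ways of partitioning a node's allocations agree."""
    return hit + miss == local + other

for node, (hit, miss, local, other) in NUMASTAT.items():
    total = hit + miss
    ok = counters_consistent(hit, miss, local, other)
    # Share of this node's allocations made by tasks running on the node itself.
    print(f"{node}: consistent={ok}, local share {100 * local / total:5.1f}%")
```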
root@pserver1:~# ./stream
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 2500000 (elements), Offset = 0 (elements)
Memory per array = 19.1 MiB (= 0.0 GiB).
Total memory required = 57.2 MiB (= 0.1 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 48
Number of Threads counted = 48
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 5275 microseconds.
   (= 5275 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          136511.1     0.000321     0.000293     0.000419
Scale:         141341.3     0.000303     0.000283     0.000361
Add:           144216.8     0.000532     0.000416     0.001333
Triad:         145973.5     0.000436     0.000411     0.000498
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
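The second run reports much higher bandwidth than the first with identical parameters. The transcript does not record what changed between the two runs. The sketch below only quantifies the per-kernel ratio of the two "Best Rate MB/s" columns, using the values printed above. The names `BEST_RATE_RUN1`, `BEST_RATE_RUN2` and `speedups` are illustrative.

```python
# Minimal sketch: per-kernel ratio of the second STREAM run's "Best Rate MB/s"
# to the first run's, using values copied from the log. It does not explain
# the difference; the log does not say what changed between runs.

BEST_RATE_RUN1 = {"Copy": 31034.4, "Scale": 17912.9, "Add": 19867.2, "Triad": 20100.5}
BEST_RATE_RUN2 = {"Copy": 136511.1, "Scale": 141341.3, "Add": 144216.8, "Triad": 145973.5}

def speedups(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Ratio after/before for every kernel present in `before`."""
    return {kernel: after[kernel] / before[kernel] for kernel in before}

for kernel, ratio in speedups(BEST_RATE_RUN1, BEST_RATE_RUN2).items():
    print(f"{kernel:6s} {ratio:5.2f}x")
```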