tinymembench Haswell i7-4700MQ DDR3 1600 MHz

Apr 23rd, 2017

tinymembench v0.4.9 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests ==
== ==
== Note 1: 1MB = 1000000 bytes ==
== Note 2: Results for 'copy' tests show how many bytes can be ==
== copied per second (adding together read and written ==
== bytes would have given numbers twice as high) ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
== to first fetch data into it, and only then write it to the ==
== destination (source -> L1 cache, L1 cache -> destination) ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in ==
== brackets ==
==========================================================================
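A minimal C sketch of the 2-pass copy from Note 3 and the MB/s convention from Notes 1 and 2. It is an illustration only, not tinymembench's source: the names `two_pass_copy`, `copy_mb_per_s` and `TMP_SIZE` are hypothetical, `memcpy` stands in for the hand-written C/SSE2 loops, and the 4 KiB temporary buffer size is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical temporary buffer size (an assumption), small enough to
   stay resident in a 32 KiB L1 data cache. */
#define TMP_SIZE 4096

/* 2-pass copy: fetch a chunk of the source into a small temporary buffer
   (source -> L1 cache), then write it out (L1 cache -> destination). */
static void two_pass_copy(void *dst, const void *src, size_t size)
{
    static unsigned char tmp[TMP_SIZE];
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (size > 0) {
        size_t chunk = size < TMP_SIZE ? size : TMP_SIZE;
        memcpy(tmp, s, chunk);  /* pass 1: read source into the small buffer */
        memcpy(d, tmp, chunk);  /* pass 2: write the buffer to the destination */
        s += chunk;
        d += chunk;
        size -= chunk;
    }
}

/* Copy throughput as in Note 2: each copied byte is counted once (not once
   for the read and again for the write), and 1 MB = 1000000 bytes (Note 1). */
static double copy_mb_per_s(size_t bytes_copied, double seconds)
{
    return (double)bytes_copied / seconds / 1000000.0;
}
```

Under this convention, copying 64 MB in 10 ms is reported as 6400 MB/s, even though 64 MB were read and another 64 MB were written.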

C copy backwards : 6799.8 MB/s (1.3%)
C copy backwards (32 byte blocks) : 6787.4 MB/s (1.1%)
C copy backwards (64 byte blocks) : 6817.8 MB/s (0.5%)
C copy : 6777.8 MB/s (0.4%)
C copy prefetched (32 bytes step) : 6755.4 MB/s (0.6%)
C copy prefetched (64 bytes step) : 6755.4 MB/s (0.5%)
C 2-pass copy : 6346.9 MB/s (0.4%)
C 2-pass copy prefetched (32 bytes step) : 6542.7 MB/s (0.3%)
C 2-pass copy prefetched (64 bytes step) : 6546.1 MB/s (0.7%)
C fill : 10658.5 MB/s (0.6%)
C fill (shuffle within 16 byte blocks) : 10642.1 MB/s (0.6%)
C fill (shuffle within 32 byte blocks) : 10603.8 MB/s (0.3%)
C fill (shuffle within 64 byte blocks) : 10644.7 MB/s (0.3%)
---
standard memcpy : 10487.3 MB/s (0.5%)
standard memset : 26842.2 MB/s (0.9%)
---
MOVSB copy : 9393.9 MB/s (0.2%)
MOVSD copy : 9155.0 MB/s (1.6%)
SSE2 copy : 6780.5 MB/s (0.4%)
SSE2 nontemporal copy : 10688.2 MB/s (0.3%)
SSE2 copy prefetched (32 bytes step) : 6751.9 MB/s (0.4%)
SSE2 copy prefetched (64 bytes step) : 6744.2 MB/s (0.5%)
SSE2 nontemporal copy prefetched (32 bytes step) : 10707.7 MB/s (1.3%)
SSE2 nontemporal copy prefetched (64 bytes step) : 10698.8 MB/s (1.3%)
SSE2 2-pass copy : 6457.4 MB/s (0.5%)
SSE2 2-pass copy prefetched (32 bytes step) : 6373.7 MB/s (0.4%)
SSE2 2-pass copy prefetched (64 bytes step) : 6358.8 MB/s (0.5%)
SSE2 2-pass nontemporal copy : 4915.7 MB/s (0.3%)
SSE2 fill : 10525.7 MB/s (0.5%)
SSE2 nontemporal fill : 19563.3 MB/s

==========================================================================
== Framebuffer read tests. ==
== ==
== Many ARM devices use a part of the system memory as the framebuffer, ==
== typically mapped as uncached but with write-combining enabled. ==
== Writes to such framebuffers are quite fast, but reads are much ==
== slower and very sensitive to the alignment and the selection of ==
== CPU instructions which are used for accessing memory. ==
== ==
== Many x86 systems allocate the framebuffer in the GPU memory, ==
== accessible for the CPU via a relatively slow PCI-E bus. Moreover, ==
== PCI-E is asymmetric and handles reads a lot worse than writes. ==
== ==
== If uncached framebuffer reads are reasonably fast (at least 100 MB/s ==
== or preferably >300 MB/s), then using the shadow framebuffer layer ==
== is not necessary in Xorg DDX drivers, resulting in a nice overall ==
== performance improvement. For example, the xf86-video-fbturbo DDX ==
== uses this trick. ==
==========================================================================

MOVSD copy (from framebuffer) : 206.2 MB/s (1.6%)
MOVSD 2-pass copy (from framebuffer) : 225.8 MB/s
SSE2 copy (from framebuffer) : 140.1 MB/s (0.1%)
SSE2 2-pass copy (from framebuffer) : 139.5 MB/s

==========================================================================
== Memory latency test ==
== ==
== Average time is measured for random memory accesses in buffers ==
== of different sizes. The larger the buffer, the more significant ==
== the relative contributions of TLB misses, L1/L2 cache misses and ==
== SDRAM accesses. For extremely large buffer sizes we expect to see ==
== a page table walk with several requests to SDRAM for almost every ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest). ==
== ==
== Note 1: All the numbers represent extra time that needs to be ==
== added to the L1 cache latency. The cycle timings for L1 cache ==
== latency can usually be found in the processor documentation. ==
== Note 2: Dual random read means that we perform two independent ==
== memory accesses simultaneously. If the memory subsystem can't ==
== handle multiple outstanding requests, dual random read has the ==
== same timings as two single reads performed one after another. ==
==========================================================================
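The latency test described above can be sketched as pointer chasing over a randomly permuted buffer. This is a hedged illustration, not tinymembench's actual code: the helper names are hypothetical, Sattolo's algorithm is used here (an assumption) to build one random cycle, each slot is a single `size_t` rather than being spread across cache lines, and the sketch reports total nanoseconds per access rather than the extra time beyond L1 latency shown in the tables below.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: fill next[] with one random cycle through all n slots
   (Sattolo's algorithm), so every hop lands on an unpredictable address.
   rand() is used for brevity; it is not a high-quality generator. */
static void make_random_cycle(size_t *next, size_t n, unsigned seed)
{
    if (n == 0)
        return;
    srand(seed);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;  /* j < i guarantees a single cycle */
        size_t t = next[i];
        next[i] = next[j];
        next[j] = t;
    }
}

/* Single random read: each load needs the previous result, so the
   misses cannot overlap. */
static size_t chase_single(const size_t *next, size_t steps)
{
    size_t p = 0;
    for (size_t s = 0; s < steps; s++)
        p = next[p];
    return p;
}

/* Dual random read: two independent chains advanced in the same loop, so a
   CPU that supports multiple outstanding misses can service both at once. */
static void chase_dual(const size_t *next, size_t steps, size_t *a, size_t *b)
{
    size_t p = *a, q = *b;
    for (size_t s = 0; s < steps; s++) {
        p = next[p];
        q = next[q];
    }
    *a = p;
    *b = q;
}

/* Average nanoseconds per hop of chase_single(); the final position is
   written to *sink so the compiler cannot discard the loop. */
static double ns_per_step_single(const size_t *next, size_t steps, size_t *sink)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    *sink = chase_single(next, steps);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

Timing `chase_dual` the same way gives a time per step (two reads). If that figure stays close to the single-chain time rather than doubling, the memory system is overlapping the two outstanding misses, which is what Note 2 is probing.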

block size : single random read / dual random read, [MADV_NOHUGEPAGE]
      1024 :    0.0 ns /    0.0 ns
      2048 :    0.0 ns /    0.0 ns
      4096 :    0.0 ns /    0.0 ns
      8192 :    0.0 ns /    0.0 ns
     16384 :    0.0 ns /    0.0 ns
     32768 :    0.0 ns /    0.1 ns
     65536 :    1.1 ns /    1.5 ns
    131072 :    1.6 ns /    1.9 ns
    262144 :    2.3 ns /    3.0 ns
    524288 :    6.7 ns /    8.6 ns
   1048576 :    9.1 ns /   10.6 ns
   2097152 :   10.4 ns /   11.3 ns
   4194304 :   12.0 ns /   12.9 ns
   8388608 :   30.7 ns /   45.7 ns
  16777216 :   53.5 ns /   72.7 ns
  33554432 :   67.0 ns /   83.2 ns
  67108864 :   72.8 ns /   89.5 ns

block size : single random read / dual random read, [MADV_HUGEPAGE]
      1024 :    0.0 ns /    0.0 ns
      2048 :    0.0 ns /    0.0 ns
      4096 :    0.0 ns /    0.0 ns
      8192 :    0.0 ns /    0.0 ns
     16384 :    0.0 ns /    0.0 ns
     32768 :    0.0 ns /    0.0 ns
     65536 :    1.1 ns /    1.5 ns
    131072 :    1.6 ns /    2.0 ns
    262144 :    1.9 ns /    2.1 ns
    524288 :    5.5 ns /    7.3 ns
   1048576 :    7.3 ns /    8.7 ns
   2097152 :    8.3 ns /    9.1 ns
   4194304 :    8.7 ns /    9.2 ns
   8388608 :   25.2 ns /   37.2 ns
  16777216 :   47.7 ns /   63.7 ns
  33554432 :   58.6 ns /   71.8 ns
  67108864 :   63.8 ns /   74.7 ns
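The two tables differ only in the `madvise()` hint applied to the test buffer. On Linux that request can be sketched as below; the helper name `alloc_with_thp_hint` and the 2 MiB alignment are assumptions for illustration, and the kernel is free to reject or ignore the hint (for example when transparent huge pages are disabled).

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical helper: allocate a buffer aligned to 2 MiB (the usual x86-64
   huge page size) and ask the Linux kernel to back it with transparent huge
   pages (MADV_HUGEPAGE) or to avoid them (MADV_NOHUGEPAGE).
   Returns NULL if allocation fails; *advice_ok is 1 if madvise() accepted
   the hint, 0 otherwise. The hint mainly affects pages faulted in after the
   call, so the buffer should be touched only afterwards. Release with free(). */
static void *alloc_with_thp_hint(size_t size, int use_huge, int *advice_ok)
{
    const size_t huge = 2u * 1024 * 1024;
    void *buf = NULL;

    if (posix_memalign(&buf, huge, size) != 0)
        return NULL;

    int advice = use_huge ? MADV_HUGEPAGE : MADV_NOHUGEPAGE;
    *advice_ok = (madvise(buf, size, advice) == 0);
    return buf;
}
```

Huge pages reduce the number of TLB entries needed to cover the buffer, which is a plausible reason the MADV_HUGEPAGE rows are lower from 262144 bytes upward (for example 63.8 ns versus 72.8 ns at 67108864 bytes).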