NanoPC T4 with 4.17.0-rc6 mmind kernel

root@nanopct4:~/tinymembench# taskset -c 5 ./tinymembench ; taskset -c 3 ./tinymembench
tinymembench v0.4.9 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and written          ==
==         bytes would have given numbers twice as high)                ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                             :   4087.7 MB/s
 C copy backwards (32 byte blocks)            :   4089.3 MB/s
 C copy backwards (64 byte blocks)            :   4089.2 MB/s
 C copy                                       :   4091.0 MB/s
 C copy prefetched (32 bytes step)            :   4067.9 MB/s
 C copy prefetched (64 bytes step)            :   4065.3 MB/s
 C 2-pass copy                                :   3595.6 MB/s
 C 2-pass copy prefetched (32 bytes step)     :   3771.6 MB/s
 C 2-pass copy prefetched (64 bytes step)     :   3762.5 MB/s
^C
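
Note 3 above describes the 2-pass copy variant. A minimal C sketch of the idea, assuming a scratch buffer small enough to stay resident in L1; this is an illustration, not tinymembench's actual implementation:

    #include <string.h>
    #include <stddef.h>

    #define TMPBUF 8192  /* small scratch buffer, assumed to fit in L1 cache */

    /* 2-pass copy: pull a chunk of the source into a cache-resident scratch
       buffer first, then write it out to the destination. */
    static void two_pass_copy(char *dst, const char *src, size_t n)
    {
        static char tmp[TMPBUF];
        while (n > 0) {
            size_t chunk = n < TMPBUF ? n : TMPBUF;
            memcpy(tmp, src, chunk);   /* pass 1: source -> L1 cache */
            memcpy(dst, tmp, chunk);   /* pass 2: L1 cache -> destination */
            src += chunk; dst += chunk; n -= chunk;
        }
    }

The extra round trip through the scratch buffer costs bandwidth, which matches the 2-pass numbers above coming in below the plain C copy.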
root@nanopct4:~/tinymembench# cpufreq-set -g performance
root@nanopct4:~/tinymembench# taskset -c 5 ./tinymembench ; taskset -c 3 ./tinymembench
tinymembench v0.4.9 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and written          ==
==         bytes would have given numbers twice as high)                ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                             :   4089.0 MB/s
 C copy backwards (32 byte blocks)            :   4089.9 MB/s
 C copy backwards (64 byte blocks)            :   4091.0 MB/s
 C copy                                       :   4094.3 MB/s
 C copy prefetched (32 bytes step)            :   4070.8 MB/s
 C copy prefetched (64 bytes step)            :   4068.3 MB/s
 C 2-pass copy                                :   3597.1 MB/s
 C 2-pass copy prefetched (32 bytes step)     :   3776.2 MB/s
 C 2-pass copy prefetched (64 bytes step)     :   3764.9 MB/s
 C fill                                       :   9035.1 MB/s (0.7%)
 C fill (shuffle within 16 byte blocks)       :   9073.4 MB/s (0.2%)
 C fill (shuffle within 32 byte blocks)       :   9079.7 MB/s
 C fill (shuffle within 64 byte blocks)       :   9079.3 MB/s
 ---
 standard memcpy                              :   4094.2 MB/s
 standard memset                              :   9033.6 MB/s (0.7%)
 ---
 NEON LDP/STP copy                            :   4093.2 MB/s
 NEON LDP/STP copy pldl2strm (32 bytes step)  :   4134.2 MB/s
 NEON LDP/STP copy pldl2strm (64 bytes step)  :   4134.0 MB/s
 NEON LDP/STP copy pldl1keep (32 bytes step)  :   4064.0 MB/s
 NEON LDP/STP copy pldl1keep (64 bytes step)  :   4061.0 MB/s
 NEON LD1/ST1 copy                            :   4092.6 MB/s
 NEON STP fill                                :   9028.1 MB/s (0.7%)
 NEON STNP fill                               :   9035.0 MB/s
 ARM LDP/STP copy                             :   4093.3 MB/s
 ARM STP fill                                 :   9031.8 MB/s (0.7%)
 ARM STNP fill                                :   9031.8 MB/s

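The 'prefetched' copy variants issue a software prefetch a fixed step ahead of the read pointer, so the next block is already in flight when the loop reaches it (the pldl2strm/pldl1keep rows use the corresponding AArch64 prefetch hints). A rough C sketch of the idea using the compiler builtin; the function name and the 16-byte inner block are illustrative, and tinymembench's own loops are hand-written assembly:

    #include <stddef.h>
    #include <stdint.h>

    /* Copy with a software prefetch issued 'step' bytes ahead of the current
       read position (cf. the "(32 bytes step)" / "(64 bytes step)" rows). */
    static void copy_prefetched(uint8_t *dst, const uint8_t *src,
                                size_t n, size_t step)
    {
        for (size_t i = 0; i < n; i += 16) {
            /* hint: read access, low temporal locality (like pldl2strm) */
            __builtin_prefetch(src + i + step, 0, 0);
            for (size_t j = 0; j < 16 && i + j < n; j++)
                dst[i + j] = src[i + j];
        }
    }
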
==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger the buffer, the more significant      ==
== the relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses become. For extremely large buffer sizes we expect to see   ==
== page table walks with several requests to SDRAM for almost every     ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers represent extra time, which needs to be      ==
==         added to the L1 cache latency. The cycle timings for L1      ==
==         cache latency can usually be found in the processor          ==
==         documentation.                                               ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. If the memory     ==
==         subsystem can't handle multiple outstanding requests, dual   ==
==         random read has the same timings as two single reads         ==
==         performed one after another.                                 ==
==========================================================================

block size : single random read / dual random read, [MADV_NOHUGEPAGE]
      1024 :    0.0 ns /    0.0 ns
      2048 :    0.0 ns /    0.0 ns
      4096 :    0.0 ns /    0.0 ns
      8192 :    0.0 ns /    0.0 ns
     16384 :    0.0 ns /    0.0 ns
     32768 :    0.0 ns /    0.0 ns
     65536 :    4.5 ns /    7.1 ns
    131072 :    6.8 ns /    9.6 ns
    262144 :    9.8 ns /   12.8 ns
    524288 :   11.3 ns /   14.6 ns
   1048576 :   16.2 ns /   23.1 ns
   2097152 :   93.2 ns /  140.0 ns
   4194304 :  130.7 ns /  174.1 ns
   8388608 :  155.1 ns /  194.7 ns
  16777216 :  166.9 ns /  199.3 ns
  33554432 :  173.6 ns /  206.7 ns
  67108864 :  185.0 ns /  225.6 ns

block size : single random read / dual random read, [MADV_HUGEPAGE]
      1024 :    0.0 ns /    0.0 ns
      2048 :    0.0 ns /    0.0 ns
      4096 :    0.0 ns /    0.0 ns
      8192 :    0.0 ns /    0.0 ns
     16384 :    0.0 ns /    0.0 ns
     32768 :    0.0 ns /    0.0 ns
     65536 :    4.5 ns /    7.1 ns
    131072 :    6.7 ns /    9.6 ns
    262144 :    7.9 ns /   10.5 ns
    524288 :    8.5 ns /   10.9 ns
   1048576 :   12.4 ns /   17.8 ns
   2097152 :   85.7 ns /  130.5 ns
   4194304 :  122.5 ns /  163.8 ns
   8388608 :  144.1 ns /  182.2 ns
  16777216 :  152.9 ns /  185.3 ns
  33554432 :  156.9 ns /  185.1 ns
  67108864 :  161.6 ns /  187.2 ns
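
The second tinymembench output below is the 'taskset -c 3' half of the command line above, i.e. the same binary pinned to CPU 3, one of the RK3399's Cortex-A53 little cores (CPUs 0-3 are A53, CPUs 4-5 are the Cortex-A72 big cores, which is why CPU 5 was used for the first run). taskset sets the CPU affinity mask before exec; the same pinning can be done from inside a program with sched_setaffinity. A minimal sketch, with the CPU number hard-coded for illustration:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(3, &set);   /* CPU 3: a Cortex-A53 little core on RK3399 */
        /* pid 0 = calling thread; equivalent to running under taskset -c 3 */
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        /* ... run the benchmark workload here ... */
        return 0;
    }
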
tinymembench v0.4.9 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and written          ==
==         bytes would have given numbers twice as high)                ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                             :   1886.1 MB/s (1.6%)
 C copy backwards (32 byte blocks)            :   1897.8 MB/s (3.4%)
 C copy backwards (64 byte blocks)            :   1886.0 MB/s (2.8%)
 C copy                                       :   2012.7 MB/s (0.7%)
 C copy prefetched (32 bytes step)            :   1410.2 MB/s
 C copy prefetched (64 bytes step)            :   1624.7 MB/s
 C 2-pass copy                                :   1615.8 MB/s
 C 2-pass copy prefetched (32 bytes step)     :   1163.1 MB/s
 C 2-pass copy prefetched (64 bytes step)     :   1142.3 MB/s
 C fill                                       :   8427.2 MB/s (0.1%)
 C fill (shuffle within 16 byte blocks)       :   8425.9 MB/s
 C fill (shuffle within 32 byte blocks)       :   8424.7 MB/s
 C fill (shuffle within 64 byte blocks)       :   8424.9 MB/s
 ---
 standard memcpy                              :   2031.3 MB/s
 standard memset                              :   8449.6 MB/s (0.2%)
 ---
 NEON LDP/STP copy                            :   2051.3 MB/s (0.2%)
 NEON LDP/STP copy pldl2strm (32 bytes step)  :   1358.5 MB/s (1.3%)
 NEON LDP/STP copy pldl2strm (64 bytes step)  :   1716.7 MB/s
 NEON LDP/STP copy pldl1keep (32 bytes step)  :   2230.4 MB/s
 NEON LDP/STP copy pldl1keep (64 bytes step)  :   2233.0 MB/s
 NEON LD1/ST1 copy                            :   2022.6 MB/s (2.1%)
 NEON STP fill                                :   8450.9 MB/s (0.2%)
 NEON STNP fill                               :   3514.7 MB/s (3.8%)
 ARM LDP/STP copy                             :   2048.3 MB/s (0.3%)
 ARM STP fill                                 :   8449.6 MB/s (0.2%)
 ARM STNP fill                                :   3365.4 MB/s (2.9%)

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger the buffer, the more significant      ==
== the relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses become. For extremely large buffer sizes we expect to see   ==
== page table walks with several requests to SDRAM for almost every     ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers represent extra time, which needs to be      ==
==         added to the L1 cache latency. The cycle timings for L1      ==
==         cache latency can usually be found in the processor          ==
==         documentation.                                               ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. If the memory     ==
==         subsystem can't handle multiple outstanding requests, dual   ==
==         random read has the same timings as two single reads         ==
==         performed one after another.                                 ==
==========================================================================

block size : single random read / dual random read, [MADV_NOHUGEPAGE]
      1024 :    0.0 ns /    0.0 ns
      2048 :    0.0 ns /    0.0 ns
      4096 :    0.0 ns /    0.0 ns
      8192 :    0.0 ns /    0.0 ns
     16384 :    0.0 ns /    0.0 ns
     32768 :    0.0 ns /    0.0 ns
     65536 :    4.8 ns /    8.2 ns
    131072 :    7.4 ns /   11.1 ns
    262144 :    8.7 ns /   12.2 ns
    524288 :    9.8 ns /   13.8 ns
   1048576 :   79.1 ns /  122.1 ns
   2097152 :  116.9 ns /  157.6 ns
   4194304 :  141.7 ns /  177.4 ns
   8388608 :  155.1 ns /  187.1 ns
  16777216 :  163.2 ns /  193.6 ns
  33554432 :  167.3 ns /  197.8 ns
  67108864 :  169.3 ns /  201.7 ns

block size : single random read / dual random read, [MADV_HUGEPAGE]
      1024 :    0.0 ns /    0.0 ns
      2048 :    0.0 ns /    0.0 ns
      4096 :    0.0 ns /    0.0 ns
      8192 :    0.0 ns /    0.0 ns
     16384 :    0.0 ns /    0.0 ns
     32768 :    0.0 ns /    0.0 ns
     65536 :    4.8 ns /    8.0 ns
    131072 :    7.4 ns /   11.1 ns
    262144 :    8.7 ns /   12.2 ns
    524288 :    9.8 ns /   13.6 ns
   1048576 :   79.1 ns /  122.0 ns
   2097152 :  116.7 ns /  157.4 ns
   4194304 :  135.5 ns /  169.1 ns
   8388608 :  144.7 ns /  173.2 ns
  16777216 :  149.6 ns /  174.7 ns
  33554432 :  152.0 ns /  175.4 ns
  67108864 :  153.3 ns /  175.6 ns
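
Both latency columns are measured with randomized dependent loads. Note 2 of the latency header describes the dual variant: with one chain every load has to wait for the previous one, while two independent chains let the core overlap cache misses. A self-contained C sketch of that pointer-chasing idea; the buffer size, iteration count and names are made up, and this is not tinymembench's actual code:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Walk one or two dependent load chains; report average ns per step. */
    static double chase(const uint32_t *next, size_t n, long iters, int dual)
    {
        uint32_t a = 0, b = (uint32_t)(n / 2);
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < iters; i++) {
            a = next[a];            /* serialized: each load waits for the last */
            if (dual) b = next[b];  /* independent chain, misses can overlap */
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        volatile uint32_t sink = a + b; (void)sink;  /* keep the loop alive */
        return ((t1.tv_sec - t0.tv_sec) * 1e9 +
                (t1.tv_nsec - t0.tv_nsec)) / iters;
    }

    int main(void)
    {
        size_t n = 1 << 22;         /* 4M entries * 4 bytes = 16 MiB buffer */
        uint32_t *next = malloc(n * sizeof *next);
        if (!next) return 1;
        for (size_t i = 0; i < n; i++) next[i] = (uint32_t)i;
        /* Sattolo's shuffle: yields a single cycle covering all n entries */
        srand(1);
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;
            uint32_t t = next[i]; next[i] = next[j]; next[j] = t;
        }
        printf("single random read: %.1f ns\n", chase(next, n, 1 << 24, 0));
        printf("dual random read:   %.1f ns\n", chase(next, n, 1 << 24, 1));
        free(next);
        return 0;
    }

If the memory subsystem can track multiple outstanding misses, the dual figure comes in well under twice the single figure, which is exactly the pattern in the tables above.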