OPi Zero Plus with vendor Xenial image

a guest, Nov 15th, 2017
Orange Pi Zero Plus with vendor's own Ubuntu Xenial arm64 image (64-bit kernel 3.10.65, settings limiting cpufreq to 1008 MHz and clocking DRAM at 624 MHz):

root@Orangepi:~/tinymembench# ./tinymembench
tinymembench v0.4.9 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                     :    851.7 MB/s (2.0%)
 C copy backwards (32 byte blocks)                    :    861.5 MB/s (1.2%)
 C copy backwards (64 byte blocks)                    :    867.8 MB/s (1.1%)
 C copy                                               :    861.7 MB/s (1.2%)
 C copy prefetched (32 bytes step)                    :    693.5 MB/s
 C copy prefetched (64 bytes step)                    :    775.7 MB/s (0.5%)
 C 2-pass copy                                        :    863.1 MB/s (0.5%)
 C 2-pass copy prefetched (32 bytes step)             :    646.7 MB/s
 C 2-pass copy prefetched (64 bytes step)             :    344.1 MB/s (0.3%)
 C fill                                               :   2037.2 MB/s (0.6%)
 C fill (shuffle within 16 byte blocks)               :   2037.8 MB/s
 C fill (shuffle within 32 byte blocks)               :   2038.4 MB/s (0.6%)
 C fill (shuffle within 64 byte blocks)               :   2036.8 MB/s
 ---
 standard memcpy                                      :    887.9 MB/s (0.6%)
 standard memset                                      :   2037.9 MB/s (0.6%)
 ---
 NEON LDP/STP copy                                    :    871.7 MB/s (0.8%)
 NEON LDP/STP copy pldl2strm (32 bytes step)          :    670.7 MB/s (0.6%)
 NEON LDP/STP copy pldl2strm (64 bytes step)          :    781.7 MB/s (0.3%)
 NEON LDP/STP copy pldl1keep (32 bytes step)          :    932.4 MB/s
 NEON LDP/STP copy pldl1keep (64 bytes step)          :    934.4 MB/s (0.5%)
 NEON LD1/ST1 copy                                    :    860.8 MB/s (1.2%)
 NEON STP fill                                        :   2035.9 MB/s (2.0%)
 NEON STNP fill                                       :   1772.4 MB/s (3.5%)
 ARM LDP/STP copy                                     :    871.1 MB/s (0.7%)
 ARM STP fill                                         :   2036.1 MB/s (0.6%)
 ARM STNP fill                                        :   1780.6 MB/s (2.1%)

==========================================================================
== Framebuffer read tests.                                              ==
==                                                                      ==
== Many ARM devices use a part of the system memory as the framebuffer, ==
== typically mapped as uncached but with write-combining enabled.       ==
== Writes to such framebuffers are quite fast, but reads are much       ==
== slower and very sensitive to the alignment and the selection of      ==
== CPU instructions which are used for accessing memory.                ==
==                                                                      ==
== Many x86 systems allocate the framebuffer in the GPU memory,         ==
== accessible for the CPU via a relatively slow PCI-E bus. Moreover,    ==
== PCI-E is asymmetric and handles reads a lot worse than writes.       ==
==                                                                      ==
== If uncached framebuffer reads are reasonably fast (at least 100 MB/s ==
== or preferably >300 MB/s), then using the shadow framebuffer layer    ==
== is not necessary in Xorg DDX drivers, resulting in a nice overall    ==
== performance improvement. For example, the xf86-video-fbturbo DDX     ==
== uses this trick.                                                     ==
==========================================================================

 NEON LDP/STP copy (from framebuffer)                 :    172.1 MB/s (0.2%)
 NEON LDP/STP 2-pass copy (from framebuffer)          :    164.4 MB/s (2.5%)
 NEON LD1/ST1 copy (from framebuffer)                 :     45.6 MB/s (0.8%)
 NEON LD1/ST1 2-pass copy (from framebuffer)          :     45.0 MB/s (0.1%)
 ARM LDP/STP copy (from framebuffer)                  :     89.9 MB/s (0.4%)
 ARM LDP/STP 2-pass copy (from framebuffer)           :     87.6 MB/s

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    6.9 ns          /    11.7 ns
    131072 :   10.7 ns          /    16.3 ns
    262144 :   13.7 ns          /    19.4 ns
    524288 :   74.2 ns          /    99.8 ns
   1048576 :  122.2 ns          /   177.3 ns
   2097152 :  168.2 ns          /   212.1 ns
   4194304 :  184.3 ns          /   229.6 ns
   8388608 :  197.1 ns          /   246.4 ns
  16777216 :  205.9 ns          /   248.3 ns
  33554432 :  210.5 ns          /   250.2 ns
  67108864 :  218.7 ns          /   255.3 ns


root@Orangepi:~/tinymembench# /bin/bash /tmp/sysbench.sh
480:     execution time (avg/stddev):   77.1597/0.00
648:     execution time (avg/stddev):   56.9540/0.00
720:     execution time (avg/stddev):   51.3270/0.00
816:     execution time (avg/stddev):   45.2233/0.00
912:     execution time (avg/stddev):   40.4356/0.00
1008:     execution time (avg/stddev):   36.5771/0.00
1104:     execution time (avg/stddev):   36.5767/0.00
1152:     execution time (avg/stddev):   36.5724/0.00
1200:     execution time (avg/stddev):   36.5786/0.00
480:     execution time (avg/stddev):   19.5235/0.01
648:     execution time (avg/stddev):   14.4798/0.01
720:     execution time (avg/stddev):   13.0305/0.00
816:     execution time (avg/stddev):   11.4151/0.01
912:     execution time (avg/stddev):   10.3376/0.00
1008:     execution time (avg/stddev):   9.2395/0.01
1104:     execution time (avg/stddev):   9.2420/0.01
1152:     execution time (avg/stddev):   9.2317/0.01
1200:     execution time (avg/stddev):   9.2291/0.01
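
One way to read the sysbench timings above is to multiply each clock value by its execution time: if the runtime scales linearly with the CPU clock, the product stays roughly constant. The sketch below does that with awk. It assumes the leading number on each result line is the requested cpufreq in MHz (the contents of /tmp/sysbench.sh are not shown in this paste), and `scaling_product` is a hypothetical helper name.

```shell
# Print "<MHz> MHz: <MHz * seconds>" for every sysbench result line on stdin.
# Assumption: the field before "execution" is the clock in MHz, followed by ":".
scaling_product() {
  awk '{
    mhz = $(NF-4)             # "480:" (also works if a paste line number precedes it)
    sub(/:$/, "", mhz)        # drop the trailing colon
    split($NF, t, "/")        # "77.1597/0.00" -> t[1] = average time in seconds
    printf "%s MHz: %.0f\n", mhz, mhz * t[1]
  }'
}

scaling_product <<'EOF'
480:     execution time (avg/stddev):   77.1597/0.00
1008:     execution time (avg/stddev):   36.5771/0.00
1200:     execution time (avg/stddev):   36.5786/0.00
EOF
# prints:
# 480 MHz: 37037
# 1008 MHz: 36870
# 1200 MHz: 43894
```

For the first run the product stays near 37000 up to 1008 MHz and then grows, because the execution time stops falling (about 36.58 s at 1008, 1104, 1152 and 1200). That is consistent with the 1008 MHz cpufreq limit mentioned at the top.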


root@Orangepi:~/tinymembench# 7zr b

7-Zip (A) 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=C,Utf16=off,HugeFiles=on,4 CPUs)

RAM size:     468 MB,  # CPU hardware threads:   4
RAM usage:    434 MB,  # Benchmark threads:      4

Dict        Compressing          |        Decompressing
      Speed Usage    R/U Rating  |    Speed Usage    R/U Rating
       KB/s     %   MIPS   MIPS  |     KB/s     %   MIPS   MIPS

22:    1324   281    457   1288  |    36532   382    862   3296
23:    1319   291    462   1344  |    35139   373    861   3215
Killed


for i in 128 192 256 ; do openssl speed -elapsed -evp aes-${i}-cbc ; done
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     126688.02k   339073.43k   566900.48k   704829.78k   758764.89k
aes-192-cbc     118457.32k   291691.16k   453099.78k   539112.45k   570556.42k
aes-256-cbc     113399.49k   262801.51k   386358.36k   448731.82k   462544.90k
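
openssl speed reports throughput in thousands of bytes processed per second (the trailing k), so the figures above convert to MB/s by dividing by 1000. A minimal sketch of that conversion, with `k_to_mbs` as a hypothetical helper name:

```shell
# Convert an "openssl speed" figure such as 758764.89k
# (thousands of bytes per second) to MB/s, with 1 MB = 1000000 bytes.
k_to_mbs() {
  awk -v k="${1%k}" 'BEGIN { printf "%.1f\n", k / 1000 }'
}

k_to_mbs 758764.89k   # aes-128-cbc, 8192-byte blocks
k_to_mbs 462544.90k   # aes-256-cbc, 8192-byte blocks
```

With these two inputs it prints 758.8 and 462.5, i.e. roughly 759 MB/s for aes-128-cbc and 463 MB/s for aes-256-cbc at the largest block size.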


root@Orangepi:~/StabilityTester# ./stabilityTester.sh
Testing frequency 648000
Cooling down    CPU Freq: 648000    CPU Core: 1200000
Testing frequency 720000
Cooling down    CPU Freq: 720000    CPU Core: 1200000
Testing frequency 816000
Cooling down    CPU Freq: 816000    CPU Core: 1200000
Testing frequency 912000
Cooling down    CPU Freq: 912000    CPU Core: 1200000
Testing frequency 1008000
Cooling down    CPU Freq: 1008000   CPU Core: 1200000
Testing frequency 1104000
./stabilityTester.sh: line 55: echo: write error: Invalid argument
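
The write error at 1104000 kHz fits the 1008 MHz cpufreq limit noted at the top: most likely the kernel rejects a value above the allowed maximum. Below is a hedged sketch of a guard that a test loop could use to skip such frequencies. `freq_allowed` is a hypothetical helper, and reading the limit from `scaling_max_freq` is an assumption about this kernel (the file belongs to the standard cpufreq sysfs interface, but which file stabilityTester.sh writes to is not shown here).

```shell
# Succeed only if the requested frequency (kHz) does not exceed the limit (kHz).
freq_allowed() {
  [ "$1" -le "$2" ]
}

# Assumed path; fall back to 0 if it is missing so nothing is reported as allowed.
limit_file=/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
limit=$(cat "$limit_file" 2>/dev/null || echo 0)

for khz in 1008000 1104000; do
  if freq_allowed "$khz" "$limit"; then
    echo "$khz: within limit"
  else
    echo "$khz: above limit ($limit), skipping"
  fi
done
```

What the loop prints depends on the limit of the machine it runs on, so no output is shown here.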


root@Orangepi:~/StabilityTester# cat /sys/devices/1c62000.dramfreq/devfreq/dramfreq/cur_freq
624000

root@Orangepi:~/StabilityTester# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
1008000