shupeif

llamacpp-m2

Nov 7th, 2024
| model | size | params | backend | threads | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -: | ------------: | -------------------: |
| llama 1B Q4_0 | 727.75 MiB | 1.24 B | BLAS | 1 | 1 | pp64 | 153.97 ± 6.55 |
| llama 1B Q4_0 | 727.75 MiB | 1.24 B | BLAS | 1 | 1 | pp128 | 232.21 ± 0.77 |
| llama 1B Q4_0 | 727.75 MiB | 1.24 B | BLAS | 1 | 1 | pp256 | 272.61 ± 0.27 |
| llama 1B Q4_0 | 727.75 MiB | 1.24 B | BLAS | 1 | 1 | tg64 | 26.45 ± 0.00 |
| llama 1B Q4_0 | 727.75 MiB | 1.24 B | BLAS | 2 | 1 | pp64 | 176.57 ± 0.26 |
| llama 1B Q4_0 | 727.75 MiB | 1.24 B | BLAS | 2 | 1 | pp128 | 265.21 ± 0.28 |
| llama 1B Q4_0 | 727.75 MiB | 1.24 B | BLAS | 2 | 1 | pp256 | 324.88 ± 0.43 |
| llama 1B Q4_0 | 727.75 MiB | 1.24 B | BLAS | 2 | 1 | tg64 | 46.52 ± 0.01 |
| llama 1B Q4_0 | 727.75 MiB | 1.24 B | BLAS | 4 | 1 | pp64 | 156.49 ± 1.24 |
| llama 1B Q4_0 | 727.75 MiB | 1.24 B | BLAS | 4 | 1 | pp128 | 244.99 ± 1.93 |
| llama 1B Q4_0 | 727.75 MiB | 1.24 B | BLAS | 4 | 1 | pp256 | 322.09 ± 3.23 |
| llama 1B Q4_0 | 727.75 MiB | 1.24 B | BLAS | 4 | 1 | tg64 | 70.68 ± 0.21 |
| llama 1B IQ4_NL - 4.5 bpw | 733.75 MiB | 1.24 B | BLAS | 1 | 1 | pp64 | 96.27 ± 3.51 |
| llama 1B IQ4_NL - 4.5 bpw | 733.75 MiB | 1.24 B | BLAS | 1 | 1 | pp128 | 161.08 ± 0.02 |
| llama 1B IQ4_NL - 4.5 bpw | 733.75 MiB | 1.24 B | BLAS | 1 | 1 | pp256 | 215.83 ± 0.04 |
| llama 1B IQ4_NL - 4.5 bpw | 733.75 MiB | 1.24 B | BLAS | 1 | 1 | tg64 | 22.52 ± 0.02 |
| llama 1B IQ4_NL - 4.5 bpw | 733.75 MiB | 1.24 B | BLAS | 2 | 1 | pp64 | 130.64 ± 0.38 |
| llama 1B IQ4_NL - 4.5 bpw | 733.75 MiB | 1.24 B | BLAS | 2 | 1 | pp128 | 211.95 ± 0.41 |
| llama 1B IQ4_NL - 4.5 bpw | 733.75 MiB | 1.24 B | BLAS | 2 | 1 | pp256 | 281.87 ± 0.31 |
| llama 1B IQ4_NL - 4.5 bpw | 733.75 MiB | 1.24 B | BLAS | 2 | 1 | tg64 | 40.52 ± 0.01 |
| llama 1B IQ4_NL - 4.5 bpw | 733.75 MiB | 1.24 B | BLAS | 4 | 1 | pp64 | 109.06 ± 0.16 |
| llama 1B IQ4_NL - 4.5 bpw | 733.75 MiB | 1.24 B | BLAS | 4 | 1 | pp128 | 190.29 ± 0.76 |
| llama 1B IQ4_NL - 4.5 bpw | 733.75 MiB | 1.24 B | BLAS | 4 | 1 | pp256 | 275.51 ± 0.88 |
| llama 1B IQ4_NL - 4.5 bpw | 733.75 MiB | 1.24 B | BLAS | 4 | 1 | tg64 | 62.50 ± 0.31 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | BLAS | 1 | 1 | pp64 | 130.07 ± 2.56 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | BLAS | 1 | 1 | pp128 | 130.55 ± 0.07 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | BLAS | 1 | 1 | pp256 | 126.01 ± 0.10 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | BLAS | 1 | 1 | tg64 | 40.52 ± 0.46 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | BLAS | 2 | 1 | pp64 | 253.15 ± 0.31 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | BLAS | 2 | 1 | pp128 | 252.51 ± 0.60 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | BLAS | 2 | 1 | pp256 | 243.98 ± 0.19 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | BLAS | 2 | 1 | tg64 | 66.56 ± 0.04 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | BLAS | 4 | 1 | pp64 | 468.86 ± 0.43 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | BLAS | 4 | 1 | pp128 | 454.57 ± 11.13 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | BLAS | 4 | 1 | pp256 | 444.80 ± 1.81 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | BLAS | 4 | 1 | tg64 | 77.89 ± 0.33 |

build: b6453c3a (4039)

| model | size | params | backend | threads | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -: | ------------: | -------------------: |
| llama 1B Q4_0 | 727.75 MiB | 1.24 B | CPU | 1 | 1 | pp64 | 62.57 ± 1.23 |
| llama 1B Q4_0 | 727.75 MiB | 1.24 B | CPU | 1 | 1 | pp128 | 62.23 ± 0.25 |
| llama 1B Q4_0 | 727.75 MiB | 1.24 B | CPU | 1 | 1 | pp256 | 60.40 ± 0.54 |
| llama 1B Q4_0 | 727.75 MiB | 1.24 B | CPU | 1 | 1 | tg64 | 26.58 ± 0.02 |
| llama 1B Q4_0 | 727.75 MiB | 1.24 B | CPU | 2 | 1 | pp64 | 120.63 ± 0.12 |
| llama 1B Q4_0 | 727.75 MiB | 1.24 B | CPU | 2 | 1 | pp128 | 119.39 ± 0.09 |
| llama 1B Q4_0 | 727.75 MiB | 1.24 B | CPU | 2 | 1 | pp256 | 118.31 ± 0.03 |
| llama 1B Q4_0 | 727.75 MiB | 1.24 B | CPU | 2 | 1 | tg64 | 46.76 ± 0.03 |
| llama 1B Q4_0 | 727.75 MiB | 1.24 B | CPU | 4 | 1 | pp64 | 227.80 ± 1.40 |
| llama 1B Q4_0 | 727.75 MiB | 1.24 B | CPU | 4 | 1 | pp128 | 222.03 ± 3.12 |
| llama 1B Q4_0 | 727.75 MiB | 1.24 B | CPU | 4 | 1 | pp256 | 225.47 ± 0.76 |
| llama 1B Q4_0 | 727.75 MiB | 1.24 B | CPU | 4 | 1 | tg64 | 70.94 ± 0.41 |
| llama 1B IQ4_NL - 4.5 bpw | 733.75 MiB | 1.24 B | CPU | 1 | 1 | pp64 | 31.05 ± 0.22 |
| llama 1B IQ4_NL - 4.5 bpw | 733.75 MiB | 1.24 B | CPU | 1 | 1 | pp128 | 31.14 ± 0.02 |
| llama 1B IQ4_NL - 4.5 bpw | 733.75 MiB | 1.24 B | CPU | 1 | 1 | pp256 | 30.86 ± 0.02 |
| llama 1B IQ4_NL - 4.5 bpw | 733.75 MiB | 1.24 B | CPU | 1 | 1 | tg64 | 22.53 ± 0.01 |
| llama 1B IQ4_NL - 4.5 bpw | 733.75 MiB | 1.24 B | CPU | 2 | 1 | pp64 | 60.70 ± 0.03 |
| llama 1B IQ4_NL - 4.5 bpw | 733.75 MiB | 1.24 B | CPU | 2 | 1 | pp128 | 60.64 ± 0.01 |
| llama 1B IQ4_NL - 4.5 bpw | 733.75 MiB | 1.24 B | CPU | 2 | 1 | pp256 | 60.02 ± 0.02 |
| llama 1B IQ4_NL - 4.5 bpw | 733.75 MiB | 1.24 B | CPU | 2 | 1 | tg64 | 40.74 ± 0.03 |
| llama 1B IQ4_NL - 4.5 bpw | 733.75 MiB | 1.24 B | CPU | 4 | 1 | pp64 | 116.31 ± 0.43 |
| llama 1B IQ4_NL - 4.5 bpw | 733.75 MiB | 1.24 B | CPU | 4 | 1 | pp128 | 116.29 ± 0.31 |
| llama 1B IQ4_NL - 4.5 bpw | 733.75 MiB | 1.24 B | CPU | 4 | 1 | pp256 | 115.24 ± 0.09 |
| llama 1B IQ4_NL - 4.5 bpw | 733.75 MiB | 1.24 B | CPU | 4 | 1 | tg64 | 66.17 ± 0.21 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | CPU | 1 | 1 | pp64 | 135.46 ± 3.95 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | CPU | 1 | 1 | pp128 | 136.77 ± 0.04 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | CPU | 1 | 1 | pp256 | 131.45 ± 0.24 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | CPU | 1 | 1 | tg64 | 40.74 ± 0.08 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | CPU | 2 | 1 | pp64 | 266.49 ± 0.17 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | CPU | 2 | 1 | pp128 | 263.95 ± 2.09 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | CPU | 2 | 1 | pp256 | 254.62 ± 0.83 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | CPU | 2 | 1 | tg64 | 66.56 ± 0.02 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | CPU | 4 | 1 | pp64 | 504.90 ± 0.23 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | CPU | 4 | 1 | pp128 | 495.84 ± 13.62 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | CPU | 4 | 1 | pp256 | 480.09 ± 3.04 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | CPU | 4 | 1 | tg64 | 77.70 ± 0.21 |

build: a9e8a9a0 (4033)

| model | size | params | backend | threads | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -: | ------------: | -------------------: |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | CPU | 1 | 1 | pp64 | 121.95 ± 3.38 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | CPU | 1 | 1 | pp128 | 122.95 ± 0.14 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | CPU | 1 | 1 | pp256 | 118.84 ± 0.04 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | CPU | 1 | 1 | tg64 | 40.83 ± 0.03 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | CPU | 2 | 1 | pp64 | 240.60 ± 0.08 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | CPU | 2 | 1 | pp128 | 239.15 ± 0.04 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | CPU | 2 | 1 | pp256 | 230.74 ± 0.13 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | CPU | 2 | 1 | tg64 | 66.48 ± 0.05 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | CPU | 4 | 1 | pp64 | 452.19 ± 5.66 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | CPU | 4 | 1 | pp128 | 453.66 ± 1.15 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | CPU | 4 | 1 | pp256 | 440.32 ± 0.05 |
| llama 1B Q4_0_4_4 | 727.75 MiB | 1.24 B | CPU | 4 | 1 | tg64 | 78.13 ± 0.71 |
| llama 1B IQ4_NL_4_4 - 4.5 bpw | 727.75 MiB | 1.24 B | CPU | 1 | 1 | pp64 | 108.30 ± 1.95 |
| llama 1B IQ4_NL_4_4 - 4.5 bpw | 727.75 MiB | 1.24 B | CPU | 1 | 1 | pp128 | 108.88 ± 0.03 |
| llama 1B IQ4_NL_4_4 - 4.5 bpw | 727.75 MiB | 1.24 B | CPU | 1 | 1 | pp256 | 105.11 ± 0.65 |
| llama 1B IQ4_NL_4_4 - 4.5 bpw | 727.75 MiB | 1.24 B | CPU | 1 | 1 | tg64 | 34.85 ± 0.03 |
| llama 1B IQ4_NL_4_4 - 4.5 bpw | 727.75 MiB | 1.24 B | CPU | 2 | 1 | pp64 | 212.50 ± 0.52 |
| llama 1B IQ4_NL_4_4 - 4.5 bpw | 727.75 MiB | 1.24 B | CPU | 2 | 1 | pp128 | 211.68 ± 0.12 |
| llama 1B IQ4_NL_4_4 - 4.5 bpw | 727.75 MiB | 1.24 B | CPU | 2 | 1 | pp256 | 205.00 ± 0.07 |
| llama 1B IQ4_NL_4_4 - 4.5 bpw | 727.75 MiB | 1.24 B | CPU | 2 | 1 | tg64 | 62.39 ± 0.06 |
| llama 1B IQ4_NL_4_4 - 4.5 bpw | 727.75 MiB | 1.24 B | CPU | 4 | 1 | pp64 | 404.10 ± 0.13 |
| llama 1B IQ4_NL_4_4 - 4.5 bpw | 727.75 MiB | 1.24 B | CPU | 4 | 1 | pp128 | 399.49 ± 4.20 |
| llama 1B IQ4_NL_4_4 - 4.5 bpw | 727.75 MiB | 1.24 B | CPU | 4 | 1 | pp256 | 388.26 ± 3.66 |
| llama 1B IQ4_NL_4_4 - 4.5 bpw | 727.75 MiB | 1.24 B | CPU | 4 | 1 | tg64 | 76.75 ± 1.06 |

build: 32e0862a (4037)