Guest User

Logs_Llama-3.1-8B-German-ORPO-8bpw-h8-exl2

a guest
Sep 22nd, 2024
41
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
text 13.70 KB | None | 0 0
  1. python run_openai.py --url http://127.0.0.1:5000/v1 --model Llama-3.1-8B-Instruct-exl2-8bpw-h8
  2. 2024-09
  3. {
  4. "comment": "",
  5. "server": {
  6. "url": "http://127.0.0.1:5000/v1",
  7. "model": "Llama-3.1-8B-Instruct-exl2-8bpw-h8",
  8. "timeout": 600.0
  9. },
  10. "inference": {
  11. "temperature": 0.0,
  12. "top_p": 1.0,
  13. "max_tokens": 2048,
  14. "system_prompt": "The following are multiple choice questions (with answers) about {subject}. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice.",
  15. "style": "multi_chat"
  16. },
  17. "test": {
  18. "parallel": 1
  19. },
  20. "log": {
  21. "verbosity": 0,
  22. "log_prompt": true
  23. }
  24. }
  25. assigned subjects ['biology', 'business', 'chemistry', 'computer science', 'economics', 'engineering', 'health', 'history', 'law', 'math', 'philosophy', 'physics', 'psychology', 'other']
  26. Testing biology...
  27. 100%|##############################################################################| 717/717 [4:54:58<00:00, 24.68s/it]
  28. Finished testing biology in 4 hours, 54 minutes, 58 seconds.
  29. Total, 457/717, 63.74%
  30. Random Guess Attempts, 20/717, 2.79%
  31. Correct Random Guesses, 2/20, 10.00%
  32. Adjusted Score Without Random Guesses, 455/697, 65.28%
  33. Testing business...
  34. 100%|##############################################################################| 789/789 [5:13:47<00:00, 23.86s/it]
  35. Finished testing business in 5 hours, 13 minutes, 47 seconds.
  36. Total, 392/789, 49.68%
  37. Random Guess Attempts, 78/789, 9.89%
  38. Correct Random Guesses, 8/78, 10.26%
  39. Adjusted Score Without Random Guesses, 384/711, 54.01%
  40. Testing chemistry...
  41. 100%|############################################################################| 1132/1132 [7:55:15<00:00, 25.19s/it]
  42. Finished testing chemistry in 7 hours, 55 minutes, 15 seconds.
  43. Total, 418/1132, 36.93%
  44. Random Guess Attempts, 87/1132, 7.69%
  45. Correct Random Guesses, 6/87, 6.90%
  46. Adjusted Score Without Random Guesses, 412/1045, 39.43%
  47. Testing computer science...
  48. 100%|##############################################################################| 410/410 [2:44:56<00:00, 24.14s/it]
  49. Finished testing computer science in 2 hours, 44 minutes, 56 seconds.
  50. Total, 198/410, 48.29%
  51. Random Guess Attempts, 9/410, 2.20%
  52. Correct Random Guesses, 1/9, 11.11%
  53. Adjusted Score Without Random Guesses, 197/401, 49.13%
  54. Testing economics...
  55. 100%|##############################################################################| 844/844 [5:19:23<00:00, 22.71s/it]
  56. Finished testing economics in 5 hours, 19 minutes, 24 seconds.
  57. Total, 471/844, 55.81%
  58. Random Guess Attempts, 17/844, 2.01%
  59. Correct Random Guesses, 1/17, 5.88%
  60. Adjusted Score Without Random Guesses, 470/827, 56.83%
  61. Testing engineering...
  62. 100%|##############################################################################| 969/969 [6:38:31<00:00, 24.68s/it]
  63. Finished testing engineering in 6 hours, 38 minutes, 31 seconds.
  64. Total, 277/969, 28.59%
  65. Random Guess Attempts, 47/969, 4.85%
  66. Correct Random Guesses, 10/47, 21.28%
  67. Adjusted Score Without Random Guesses, 267/922, 28.96%
  68. Testing health...
  69. 100%|##############################################################################| 818/818 [5:23:08<00:00, 23.70s/it]
  70. Finished testing health in 5 hours, 23 minutes, 8 seconds.
  71. Total, 432/818, 52.81%
  72. Random Guess Attempts, 7/818, 0.86%
  73. Correct Random Guesses, 1/7, 14.29%
  74. Adjusted Score Without Random Guesses, 431/811, 53.14%
  75. Testing history...
  76. 100%|##############################################################################| 381/381 [2:40:14<00:00, 25.23s/it]
  77. Finished testing history in 2 hours, 40 minutes, 14 seconds.
  78. Total, 174/381, 45.67%
  79. Random Guess Attempts, 1/381, 0.26%
  80. Correct Random Guesses, 0/1, 0.00%
  81. Adjusted Score Without Random Guesses, 174/380, 45.79%
  82. Testing law...
  83. 100%|############################################################################| 1101/1101 [7:33:56<00:00, 24.74s/it]
  84. Finished testing law in 7 hours, 33 minutes, 56 seconds.
  85. Total, 339/1101, 30.79%
  86. Random Guess Attempts, 7/1101, 0.64%
  87. Correct Random Guesses, 0/7, 0.00%
  88. Adjusted Score Without Random Guesses, 339/1094, 30.99%
  89. Testing math...
  90. 100%|############################################################################| 1351/1351 [8:59:26<00:00, 23.96s/it]
  91. Finished testing math in 8 hours, 59 minutes, 26 seconds.
  92. Total, 609/1351, 45.08%
  93. Random Guess Attempts, 219/1351, 16.21%
  94. Correct Random Guesses, 18/219, 8.22%
  95. Adjusted Score Without Random Guesses, 591/1132, 52.21%
  96. Testing philosophy...
  97. 100%|##############################################################################| 499/499 [3:15:43<00:00, 23.53s/it]
  98. Finished testing philosophy in 3 hours, 15 minutes, 43 seconds.
  99. Total, 202/499, 40.48%
  100. Random Guess Attempts, 11/499, 2.20%
  101. Correct Random Guesses, 2/11, 18.18%
  102. Adjusted Score Without Random Guesses, 200/488, 40.98%
  103. Testing physics...
  104. 100%|############################################################################| 1299/1299 [7:10:37<00:00, 19.89s/it]
  105. Finished testing physics in 7 hours, 10 minutes, 38 seconds.
  106. Total, 507/1299, 39.03%
  107. Random Guess Attempts, 80/1299, 6.16%
  108. Correct Random Guesses, 8/80, 10.00%
  109. Adjusted Score Without Random Guesses, 499/1219, 40.94%
  110. Testing psychology...
  111. 100%|##############################################################################| 798/798 [5:27:07<00:00, 24.60s/it]
  112. Finished testing psychology in 5 hours, 27 minutes, 7 seconds.
  113. Total, 486/798, 60.90%
  114. Random Guess Attempts, 0/798, 0.00%
  115. Correct Random Guesses, division by zero error
  116. Adjusted Score Without Random Guesses, 486/798, 60.90%
  117. Testing other...
  118. 100%|##############################################################################| 924/924 [6:26:46<00:00, 25.12s/it]
  119. Finished testing other in 6 hours, 26 minutes, 47 seconds.
  120. Total, 447/924, 48.38%
  121. Random Guess Attempts, 11/924, 1.19%
  122. Correct Random Guesses, 3/11, 27.27%
  123. Adjusted Score Without Random Guesses, 444/913, 48.63%
  124. Finished the benchmark in 7 hours, 43 minutes, 58 seconds.
  125. Total, 5409/12032, 44.96%
  126. Random Guess Attempts, 594/12032, 4.94%
  127. Correct Random Guesses, 60/594, 10.10%
  128. Adjusted Score Without Random Guesses, 5349/11438, 46.77%
  129. Token Usage:
  130. Prompt tokens: min 913, average 1394, max 2669, total 16778053, tk/s 58.45
  131. Completion tokens: min 41, average 1643, max 2049, total 19766691, tk/s 68.86
  132. Markdown Table:
  133. | overall | biology | business | chemistry | computer science | economics | engineering | health | history | law | math | philosophy | physics | psychology | other |
  134. | ------- | ------- | -------- | --------- | ---------------- | --------- | ----------- | ------ | ------- | --- | ---- | ---------- | ------- | ---------- | ----- |
  135. | 44.96 | 63.74 | 49.68 | 36.93 | 48.29 | 55.81 | 28.59 | 52.81 | 45.67 | 30.79 | 45.08 | 40.48 | 39.03 | 60.90 | 48.38 |
  136.  
  137.  
  138.  
  139. ---
  140.  
  141.  
  142. -python run_openai.py --url http://127.0.0.1:5000/v1 --model 1_Ll
  143. ama-3.1-8B-German-ORPO-8.0bpw-h8-exl2
  144. 2024-09
  145. {
  146. "comment": "",
  147. "server": {
  148. "url": "http://127.0.0.1:5000/v1",
  149. "model": "1_Llama-3.1-8B-German-ORPO-8.0bpw-h8-exl2",
  150. "timeout": 600.0
  151. },
  152. "inference": {
  153. "temperature": 0.0,
  154. "top_p": 1.0,
  155. "max_tokens": 2048,
  156. "system_prompt": "The following are multiple choice questions (with answers) about {subject}. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice.",
  157. "style": "multi_chat"
  158. },
  159. "test": {
  160. "parallel": 1
  161. },
  162. "log": {
  163. "verbosity": 0,
  164. "log_prompt": true
  165. }
  166. }
  167. assigned subjects ['biology', 'business', 'chemistry', 'computer science', 'economics', 'engineering', 'health', 'history', 'law', 'math', 'philosophy', 'physics', 'psychology', 'other']
  168. Testing biology...
  169. 100%|##############################################################################| 717/717 [4:55:16<00:00, 24.71s/it]
  170. Finished testing biology in 4 hours, 55 minutes, 16 seconds.
  171. Total, 436/717, 60.81%
  172. Random Guess Attempts, 15/717, 2.09%
  173. Correct Random Guesses, 1/15, 6.67%
  174. Adjusted Score Without Random Guesses, 435/702, 61.97%
  175. Testing business...
  176. 100%|##############################################################################| 789/789 [5:21:54<00:00, 24.48s/it]
  177. Finished testing business in 5 hours, 21 minutes, 54 seconds.
  178. Total, 294/789, 37.26%
  179. Random Guess Attempts, 23/789, 2.92%
  180. Correct Random Guesses, 4/23, 17.39%
  181. Adjusted Score Without Random Guesses, 290/766, 37.86%
  182. Testing chemistry...
  183. 100%|############################################################################| 1132/1132 [7:59:51<00:00, 25.43s/it]
  184. Finished testing chemistry in 7 hours, 59 minutes, 51 seconds.
  185. Total, 372/1132, 32.86%
  186. Random Guess Attempts, 29/1132, 2.56%
  187. Correct Random Guesses, 5/29, 17.24%
  188. Adjusted Score Without Random Guesses, 367/1103, 33.27%
  189. Testing computer science...
  190. 100%|##############################################################################| 410/410 [2:52:45<00:00, 25.28s/it]
  191. Finished testing computer science in 2 hours, 52 minutes, 45 seconds.
  192. Total, 159/410, 38.78%
  193. Random Guess Attempts, 4/410, 0.98%
  194. Correct Random Guesses, 0/4, 0.00%
  195. Adjusted Score Without Random Guesses, 159/406, 39.16%
  196. Testing economics...
  197. 100%|##############################################################################| 844/844 [6:02:45<00:00, 25.79s/it]
  198. Finished testing economics in 6 hours, 2 minutes, 45 seconds.
  199. Total, 391/844, 46.33%
  200. Random Guess Attempts, 21/844, 2.49%
  201. Correct Random Guesses, 4/21, 19.05%
  202. Adjusted Score Without Random Guesses, 387/823, 47.02%
  203. Testing engineering...
  204. 100%|##############################################################################| 969/969 [6:58:06<00:00, 25.89s/it]
  205. Finished testing engineering in 6 hours, 58 minutes, 7 seconds.
  206. Total, 226/969, 23.32%
  207. Random Guess Attempts, 34/969, 3.51%
  208. Correct Random Guesses, 3/34, 8.82%
  209. Adjusted Score Without Random Guesses, 223/935, 23.85%
  210. Testing health...
  211. 100%|##############################################################################| 818/818 [5:34:26<00:00, 24.53s/it]
  212. Finished testing health in 5 hours, 34 minutes, 26 seconds.
  213. Total, 372/818, 45.48%
  214. Random Guess Attempts, 12/818, 1.47%
  215. Correct Random Guesses, 1/12, 8.33%
  216. Adjusted Score Without Random Guesses, 371/806, 46.03%
  217. Testing history...
  218. 100%|##############################################################################| 381/381 [2:38:36<00:00, 24.98s/it]
  219. Finished testing history in 2 hours, 38 minutes, 36 seconds.
  220. Total, 152/381, 39.90%
  221. Random Guess Attempts, 2/381, 0.52%
  222. Correct Random Guesses, 1/2, 50.00%
  223. Adjusted Score Without Random Guesses, 151/379, 39.84%
  224. Testing law...
  225. 100%|############################################################################| 1101/1101 [7:27:12<00:00, 24.37s/it]
  226. Finished testing law in 7 hours, 27 minutes, 12 seconds.
  227. Total, 238/1101, 21.62%
  228. Random Guess Attempts, 13/1101, 1.18%
  229. Correct Random Guesses, 2/13, 15.38%
  230. Adjusted Score Without Random Guesses, 236/1088, 21.69%
  231. Testing math...
  232. 100%|############################################################################| 1351/1351 [9:31:54<00:00, 25.40s/it]
  233. Finished testing math in 9 hours, 31 minutes, 54 seconds.
  234. Total, 525/1351, 38.86%
  235. Random Guess Attempts, 36/1351, 2.66%
  236. Correct Random Guesses, 4/36, 11.11%
  237. Adjusted Score Without Random Guesses, 521/1315, 39.62%
  238. Testing philosophy...
  239. 100%|##############################################################################| 499/499 [3:20:12<00:00, 24.07s/it]
  240. Finished testing philosophy in 3 hours, 20 minutes, 12 seconds.
  241. Total, 173/499, 34.67%
  242. Random Guess Attempts, 1/499, 0.20%
  243. Correct Random Guesses, 1/1, 100.00%
  244. Adjusted Score Without Random Guesses, 172/498, 34.54%
  245. Testing physics...
  246. 100%|############################################################################| 1299/1299 [8:41:32<00:00, 24.09s/it]
  247. Finished testing physics in 8 hours, 41 minutes, 32 seconds.
  248. Total, 374/1299, 28.79%
  249. Random Guess Attempts, 57/1299, 4.39%
  250. Correct Random Guesses, 8/57, 14.04%
  251. Adjusted Score Without Random Guesses, 366/1242, 29.47%
  252. Testing psychology...
  253. 100%|##############################################################################| 798/798 [5:30:23<00:00, 24.84s/it]
  254. Finished testing psychology in 5 hours, 30 minutes, 24 seconds.
  255. Total, 404/798, 50.63%
  256. Random Guess Attempts, 8/798, 1.00%
  257. Correct Random Guesses, 1/8, 12.50%
  258. Adjusted Score Without Random Guesses, 403/790, 51.01%
  259. Testing other...
  260. 100%|##############################################################################| 924/924 [6:25:31<00:00, 25.03s/it]
  261. Finished testing other in 6 hours, 25 minutes, 31 seconds.
  262. Total, 409/924, 44.26%
  263. Random Guess Attempts, 4/924, 0.43%
  264. Correct Random Guesses, 1/4, 25.00%
  265. Adjusted Score Without Random Guesses, 408/920, 44.35%
  266. Finished the benchmark in 11 hours, 20 minutes, 37 seconds.
  267. Total, 4525/12032, 37.61%
  268. Random Guess Attempts, 259/12032, 2.15%
  269. Correct Random Guesses, 36/259, 13.90%
  270. Adjusted Score Without Random Guesses, 4489/11773, 38.13%
  271. Token Usage:
  272. Prompt tokens: min 871, average 1352, max 2627, total 16266109, tk/s 54.21
  273. Completion tokens: min 40, average 2017, max 2077, total 24265055, tk/s 80.87
  274. Markdown Table:
  275. | overall | biology | business | chemistry | computer science | economics | engineering | health | history | law | math | philosophy | physics | psychology | other |
  276. | ------- | ------- | -------- | --------- | ---------------- | --------- | ----------- | ------ | ------- | --- | ---- | ---------- | ------- | ---------- | ----- |
  277. | 37.61 | 60.81 | 37.26 | 32.86 | 38.78 | 46.33 | 23.32 | 45.48 | 39.90 | 21.62 | 38.86 | 34.67 | 28.79 | 50.63 | 44.26 |
Advertisement
Add Comment
Please, Sign In to add comment