Sep 15th, 2024
Inference Logs:

Mistral Large 2407 2.75BPW (4096 Context, Q4), Fasttensors enabled, Auto GPU Split: [23.3, 19.6]:
160 tokens generated in 17.28 seconds (Queue: 0.0 s, Process: 0 cached tokens and 2494 new tokens at 387.93 T/s, Generate: 14.75 T/s, Context: 2494 tokens)
188 tokens generated in 13.38 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 83.26 T/s, Generate: 14.07 T/s, Context: 2494 tokens)
260 tokens generated in 17.99 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 83.26 T/s, Generate: 14.46 T/s, Context: 2494 tokens)
204 tokens generated in 14.32 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 76.85 T/s, Generate: 14.26 T/s, Context: 2494 tokens)
231 tokens generated in 16.1 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 83.26 T/s, Generate: 14.36 T/s, Context: 2494 tokens)
= 14.38 tokens/s average

Mistral Large 2407 2.75BPW (4096 Context, Q4), Fasttensors and Tensor Parallelism enabled, Manual GPU Split: [22.4, 21.4]:
270 tokens generated in 27.19 seconds (Queue: 0.0 s, Process: 0 cached tokens and 2494 new tokens at 373.11 T/s, Generate: 13.17 T/s, Context: 2494 tokens)
286 tokens generated in 21.97 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 71.36 T/s, Generate: 13.03 T/s, Context: 2494 tokens)
178 tokens generated in 13.8 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 66.61 T/s, Generate: 12.91 T/s, Context: 2494 tokens)
270 tokens generated in 20.54 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 76.85 T/s, Generate: 13.16 T/s, Context: 2494 tokens)
183 tokens generated in 14.06 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 58.77 T/s, Generate: 13.03 T/s, Context: 2494 tokens)
= 13.06 tokens/s average

Mistral Large 2407 2.75BPW (4096 Context, Q4), Fasttensors enabled, Draft Model Mistral 7B Instruct v0.3 3BPW (Q4), Auto GPU Split: [23.3, 19.6]:
247 tokens generated in 30.98 seconds (Queue: 0.0 s, Process: 0 cached tokens and 2494 new tokens at 364.24 T/s, Generate: 10.23 T/s, Context: 2494 tokens)
196 tokens generated in 19.5 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 76.85 T/s, Generate: 10.06 T/s, Context: 2494 tokens)
196 tokens generated in 19.28 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 90.82 T/s, Generate: 10.17 T/s, Context: 2494 tokens)
210 tokens generated in 17.06 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 83.26 T/s, Generate: 12.32 T/s, Context: 2494 tokens)
247 tokens generated in 20.74 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 99.91 T/s, Generate: 11.92 T/s, Context: 2494 tokens)
= 10.94 tokens/s average

Qwen 2 72B Instruct 4BPW (4096 Context, Q4), Fasttensors enabled, Auto GPU Split: [19, 18.7]:
307 tokens generated in 26.83 seconds (Queue: 0.0 s, Process: 0 cached tokens and 3081 new tokens at 675.97 T/s, Generate: 13.78 T/s, Context: 3081 tokens)
271 tokens generated in 20.75 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 142.73 T/s, Generate: 13.07 T/s, Context: 3081 tokens)
235 tokens generated in 17.18 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 111.01 T/s, Generate: 13.68 T/s, Context: 3081 tokens)
354 tokens generated in 26.72 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 142.71 T/s, Generate: 13.25 T/s, Context: 3081 tokens)
400 tokens generated in 29.69 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 166.51 T/s, Generate: 13.47 T/s, Context: 3081 tokens)
= 13.45 tokens/s average

Qwen 2 72B Instruct 4BPW (4096 Context, Q4), Fasttensors and Tensor Parallelism enabled, Auto GPU Split: [21.5, 19.6]:
400 tokens generated in 32.72 seconds (Queue: 0.0 s, Process: 0 cached tokens and 3081 new tokens at 560.77 T/s, Generate: 14.69 T/s, Context: 3081 tokens)
266 tokens generated in 17.94 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 83.26 T/s, Generate: 14.83 T/s, Context: 3081 tokens)
348 tokens generated in 23.31 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 76.86 T/s, Generate: 14.94 T/s, Context: 3081 tokens)
400 tokens generated in 26.35 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 76.85 T/s, Generate: 15.19 T/s, Context: 3081 tokens)
400 tokens generated in 26.2 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 76.85 T/s, Generate: 15.27 T/s, Context: 3081 tokens)
= 14.98 tokens/s average

Qwen 2 72B Instruct 4BPW (4096 Context, Q4), Fasttensors enabled, Draft Model Qwen 2 7B Instruct 3.5BPW (Q6), Auto GPU Split: [23.1, 19.7]:
268 tokens generated in 35.0 seconds (Queue: 0.0 s, Process: 0 cached tokens and 3081 new tokens at 627.38 T/s, Generate: 8.91 T/s, Context: 3081 tokens)
396 tokens generated in 46.14 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 111.01 T/s, Generate: 8.58 T/s, Context: 3081 tokens)
343 tokens generated in 40.18 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 124.89 T/s, Generate: 8.54 T/s, Context: 3081 tokens)
364 tokens generated in 39.25 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 111.01 T/s, Generate: 9.28 T/s, Context: 3081 tokens)
241 tokens generated in 27.9 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 111.01 T/s, Generate: 8.64 T/s, Context: 3081 tokens)
= 8.79 tokens/s average

Qwen 2 72B Instruct 4BPW (4096 Context, Q4), Fasttensors and Tensor Parallelism enabled, Draft Model Qwen 2 7B Instruct 3.5BPW (Q6), Auto GPU Split: [22.4, 21.5]:
347 tokens generated in 31.58 seconds (Queue: 0.0 s, Process: 0 cached tokens and 3081 new tokens at 472.34 T/s, Generate: 13.85 T/s, Context: 3081 tokens)
396 tokens generated in 25.77 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 66.6 T/s, Generate: 15.37 T/s, Context: 3081 tokens)
396 tokens generated in 25.75 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 71.36 T/s, Generate: 15.39 T/s, Context: 3081 tokens)
344 tokens generated in 22.21 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 62.44 T/s, Generate: 15.5 T/s, Context: 3081 tokens)
396 tokens generated in 25.33 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 66.61 T/s, Generate: 15.64 T/s, Context: 3081 tokens)
= 15.15 tokens/s average
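
The per-run averages in these logs appear to be plain arithmetic means of the "Generate: ... T/s" figures. A minimal Python sketch for reproducing one of them is below. The regular expression and the helper names (generate_speeds, average_generate_speed) are assumptions made for illustration, not part of any inference server; the sample text is the first Mistral Large run copied from the logs.

```python
import re
from statistics import mean

# Assumed pattern: pulls the number out of "Generate: 14.75 T/s".
GENERATE_RE = re.compile(r"Generate:\s*([\d.]+)\s*T/s")


def generate_speeds(log_text: str) -> list[float]:
    """Return every 'Generate: X T/s' value found in the text, in order."""
    return [float(value) for value in GENERATE_RE.findall(log_text)]


def average_generate_speed(log_text: str) -> float:
    """Arithmetic mean of the generation speeds, rounded to two decimals."""
    speeds = generate_speeds(log_text)
    if not speeds:
        raise ValueError("no 'Generate: ... T/s' entries found")
    return round(mean(speeds), 2)


# First run from the logs: Mistral Large 2407, Fasttensors, auto GPU split.
SAMPLE = """\
160 tokens generated in 17.28 seconds (Queue: 0.0 s, Process: 0 cached tokens and 2494 new tokens at 387.93 T/s, Generate: 14.75 T/s, Context: 2494 tokens)
188 tokens generated in 13.38 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 83.26 T/s, Generate: 14.07 T/s, Context: 2494 tokens)
260 tokens generated in 17.99 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 83.26 T/s, Generate: 14.46 T/s, Context: 2494 tokens)
204 tokens generated in 14.32 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 76.85 T/s, Generate: 14.26 T/s, Context: 2494 tokens)
231 tokens generated in 16.1 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 83.26 T/s, Generate: 14.36 T/s, Context: 2494 tokens)
"""

if __name__ == "__main__":
    print(f"= {average_generate_speed(SAMPLE)} tokens/s average")  # = 14.38 tokens/s average
```

The pattern only matches the Generate field, so the prompt-processing speeds ("new tokens at ... T/s") are not mixed into the average.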