Inference Logs:

Mistral Large 2407 2.75BPW (4096 Context, Q4), Fasttensors enabled, Auto GPU Split: [23.3, 19.6]:
160 tokens generated in 17.28 seconds (Queue: 0.0 s, Process: 0 cached tokens and 2494 new tokens at 387.93 T/s, Generate: 14.75 T/s, Context: 2494 tokens)
188 tokens generated in 13.38 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 83.26 T/s, Generate: 14.07 T/s, Context: 2494 tokens)
260 tokens generated in 17.99 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 83.26 T/s, Generate: 14.46 T/s, Context: 2494 tokens)
204 tokens generated in 14.32 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 76.85 T/s, Generate: 14.26 T/s, Context: 2494 tokens)
231 tokens generated in 16.1 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 83.26 T/s, Generate: 14.36 T/s, Context: 2494 tokens)
= 14.38 tokens/s average

Mistral Large 2407 2.75BPW (4096 Context, Q4), Fasttensors and Tensor Parallelism enabled, Manual GPU Split: [22.4, 21.4]:
270 tokens generated in 27.19 seconds (Queue: 0.0 s, Process: 0 cached tokens and 2494 new tokens at 373.11 T/s, Generate: 13.17 T/s, Context: 2494 tokens)
286 tokens generated in 21.97 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 71.36 T/s, Generate: 13.03 T/s, Context: 2494 tokens)
178 tokens generated in 13.8 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 66.61 T/s, Generate: 12.91 T/s, Context: 2494 tokens)
270 tokens generated in 20.54 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 76.85 T/s, Generate: 13.16 T/s, Context: 2494 tokens)
183 tokens generated in 14.06 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 58.77 T/s, Generate: 13.03 T/s, Context: 2494 tokens)
= 13.06 tokens/s average

Mistral Large 2407 2.75BPW (4096 Context, Q4), Fasttensors enabled, Draft Model Mistral 7B Instruct v0.3 3BPW (Q4), Auto GPU Split: [23.3, 19.6]:
247 tokens generated in 30.98 seconds (Queue: 0.0 s, Process: 0 cached tokens and 2494 new tokens at 364.24 T/s, Generate: 10.23 T/s, Context: 2494 tokens)
196 tokens generated in 19.5 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 76.85 T/s, Generate: 10.06 T/s, Context: 2494 tokens)
196 tokens generated in 19.28 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 90.82 T/s, Generate: 10.17 T/s, Context: 2494 tokens)
210 tokens generated in 17.06 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 83.26 T/s, Generate: 12.32 T/s, Context: 2494 tokens)
247 tokens generated in 20.74 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 99.91 T/s, Generate: 11.92 T/s, Context: 2494 tokens)
= 10.94 tokens/s average

Qwen 2 72B Instruct 4BPW (4096 Context, Q4), Fasttensors enabled, Auto GPU Split: [19, 18.7]:
307 tokens generated in 26.83 seconds (Queue: 0.0 s, Process: 0 cached tokens and 3081 new tokens at 675.97 T/s, Generate: 13.78 T/s, Context: 3081 tokens)
271 tokens generated in 20.75 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 142.73 T/s, Generate: 13.07 T/s, Context: 3081 tokens)
235 tokens generated in 17.18 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 111.01 T/s, Generate: 13.68 T/s, Context: 3081 tokens)
354 tokens generated in 26.72 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 142.71 T/s, Generate: 13.25 T/s, Context: 3081 tokens)
400 tokens generated in 29.69 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 166.51 T/s, Generate: 13.47 T/s, Context: 3081 tokens)
= 13.45 tokens/s average

Qwen 2 72B Instruct 4BPW (4096 Context, Q4), Fasttensors and Tensor Parallelism enabled, Auto GPU Split: [21.5, 19.6]:
400 tokens generated in 32.72 seconds (Queue: 0.0 s, Process: 0 cached tokens and 3081 new tokens at 560.77 T/s, Generate: 14.69 T/s, Context: 3081 tokens)
266 tokens generated in 17.94 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 83.26 T/s, Generate: 14.83 T/s, Context: 3081 tokens)
348 tokens generated in 23.31 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 76.86 T/s, Generate: 14.94 T/s, Context: 3081 tokens)
400 tokens generated in 26.35 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 76.85 T/s, Generate: 15.19 T/s, Context: 3081 tokens)
400 tokens generated in 26.2 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 76.85 T/s, Generate: 15.27 T/s, Context: 3081 tokens)
= 14.98 tokens/s average

Qwen 2 72B Instruct 4BPW (4096 Context, Q4), Fasttensors enabled, Draft Model Qwen 2 7B Instruct 3.5BPW (Q6), Auto GPU Split: [23.1, 19.7]:
268 tokens generated in 35.0 seconds (Queue: 0.0 s, Process: 0 cached tokens and 3081 new tokens at 627.38 T/s, Generate: 8.91 T/s, Context: 3081 tokens)
396 tokens generated in 46.14 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 111.01 T/s, Generate: 8.58 T/s, Context: 3081 tokens)
343 tokens generated in 40.18 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 124.89 T/s, Generate: 8.54 T/s, Context: 3081 tokens)
364 tokens generated in 39.25 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 111.01 T/s, Generate: 9.28 T/s, Context: 3081 tokens)
241 tokens generated in 27.9 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 111.01 T/s, Generate: 8.64 T/s, Context: 3081 tokens)
= 8.79 tokens/s average

Qwen 2 72B Instruct 4BPW (4096 Context, Q4), Fasttensors and Tensor Parallelism enabled, Draft Model Qwen 2 7B Instruct 3.5BPW (Q6), Auto GPU Split: [22.4, 21.5]:
347 tokens generated in 31.58 seconds (Queue: 0.0 s, Process: 0 cached tokens and 3081 new tokens at 472.34 T/s, Generate: 13.85 T/s, Context: 3081 tokens)
396 tokens generated in 25.77 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 66.6 T/s, Generate: 15.37 T/s, Context: 3081 tokens)
396 tokens generated in 25.75 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 71.36 T/s, Generate: 15.39 T/s, Context: 3081 tokens)
344 tokens generated in 22.21 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 62.44 T/s, Generate: 15.5 T/s, Context: 3081 tokens)
396 tokens generated in 25.33 seconds (Queue: 0.0 s, Process: 3080 cached tokens and 1 new tokens at 66.61 T/s, Generate: 15.64 T/s, Context: 3081 tokens)
= 15.15 tokens/s average
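
Note on the averages: each "= ... tokens/s average" line above matches the plain arithmetic mean of the five "Generate" values in its block, not a token-weighted rate. A minimal Python sketch of how such a figure could be reproduced from the raw log lines follows. The function name average_generate_speed and the regular expression are illustrative assumptions, not part of any inference server's tooling; the sample lines are copied from the first Mistral Large block.

```python
import re
from statistics import mean

# Hypothetical helper: matches the "Generate: 14.75 T/s" field of a log line.
GENERATE_RE = re.compile(r"Generate:\s*([\d.]+)\s*T/s")


def average_generate_speed(log_lines):
    """Return the unweighted mean of all 'Generate: X T/s' values, rounded to 2 decimals."""
    speeds = []
    for line in log_lines:
        match = GENERATE_RE.search(line)
        if match:  # lines without a Generate field (headers, blanks) are skipped
            speeds.append(float(match.group(1)))
    if not speeds:
        raise ValueError("no 'Generate: ... T/s' entries found")
    return round(mean(speeds), 2)


# Log lines copied from the first configuration above.
sample = [
    "160 tokens generated in 17.28 seconds (Queue: 0.0 s, Process: 0 cached tokens and 2494 new tokens at 387.93 T/s, Generate: 14.75 T/s, Context: 2494 tokens)",
    "188 tokens generated in 13.38 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 83.26 T/s, Generate: 14.07 T/s, Context: 2494 tokens)",
    "260 tokens generated in 17.99 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 83.26 T/s, Generate: 14.46 T/s, Context: 2494 tokens)",
    "204 tokens generated in 14.32 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 76.85 T/s, Generate: 14.26 T/s, Context: 2494 tokens)",
    "231 tokens generated in 16.1 seconds (Queue: 0.0 s, Process: 2493 cached tokens and 1 new tokens at 83.26 T/s, Generate: 14.36 T/s, Context: 2494 tokens)",
]

if __name__ == "__main__":
    print(average_generate_speed(sample))
```

Because the mean is unweighted, a short run counts as much as a long one. Weighting by tokens generated would shift the figures slightly, but probably not enough to change the ranking of the configurations.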