Tests with the number of layers assigned to the CPU and GPU shown in parentheses:

tulpar 7b (35 layers) benchmark:
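The "Output generated ..." lines below come from text-generation-webui's log format, running on its llama.cpp loader. As a rough sketch only, a standalone llama.cpp run with the same layer split might look like the following; the binary name, model path, and prompt are assumptions, and `--low-vram` is a flag from mid-2023 builds (since removed) that the "Low vram" rows below presumably used:

```shell
# Hypothetical invocation for the CPU(25)/GPU(10) split of a 35-layer model.
# --n-gpu-layers controls how many layers are offloaded to the GPU;
# the remaining layers (35 - 10 = 25) stay on the CPU.
./main \
  -m models/tulpar-7b.gguf \
  --n-gpu-layers 10 \
  -n 100 \
  --seed 80807575 \
  -p "your prompt here"
# The "Low vram" configurations below would additionally pass --low-vram
# (on llama.cpp builds of that era).
```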
#Short-context tests (21 tokens)

CPU(35) GPU(0):
```
llama_print_timings: load time = 2776.49 ms
llama_print_timings: sample time = 28.02 ms / 100 runs ( 0.28 ms per token, 3569.52 tokens per second)
llama_print_timings: prompt eval time = 1225.50 ms / 11 tokens ( 111.41 ms per token, 8.98 tokens per second)
llama_print_timings: eval time = 19845.56 ms / 99 runs ( 200.46 ms per token, 4.99 tokens per second)
llama_print_timings: total time = 21449.92 ms
Output generated in 21.81 seconds (4.59 tokens/s, 100 tokens, context 21, seed 80807575)
```
Speed: 4.59 tokens per second
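The reported speed is simply tokens generated divided by wall-clock time, while the eval rate in the timings excludes sampling and other overhead. A quick sanity check of the first run's numbers (values copied from the log above):

```python
# Overall speed from the "Output generated" line: tokens / total seconds.
tokens, seconds = 100, 21.81
print(f"{tokens / seconds:.2f} tokens/s")  # 4.59 tokens/s, matching the report

# Eval-phase speed from the "eval time" line: runs / eval seconds.
# This is higher because it excludes sampling and bookkeeping overhead.
eval_ms, runs = 19845.56, 99
print(f"{runs / (eval_ms / 1000):.2f} tokens/s")  # 4.99 tokens/s, matching the report
```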
CPU(30) GPU(5):
```
llama_print_timings: load time = 2380.79 ms
llama_print_timings: sample time = 28.52 ms / 100 runs ( 0.29 ms per token, 3505.94 tokens per second)
llama_print_timings: prompt eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, inf tokens per second)
llama_print_timings: eval time = 17758.01 ms / 100 runs ( 177.58 ms per token, 5.63 tokens per second)
llama_print_timings: total time = 18136.91 ms
Output generated in 18.49 seconds (5.41 tokens/s, 100 tokens, context 21, seed 80807575)
```
Speed: 5.41 tokens per second
CPU(25) GPU(10):
```
llama_print_timings: load time = 2139.83 ms
llama_print_timings: sample time = 28.17 ms / 100 runs ( 0.28 ms per token, 3550.13 tokens per second)
llama_print_timings: prompt eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, inf tokens per second)
llama_print_timings: eval time = 16001.52 ms / 100 runs ( 160.02 ms per token, 6.25 tokens per second)
llama_print_timings: total time = 16382.37 ms
Output generated in 16.73 seconds (5.98 tokens/s, 100 tokens, context 21, seed 80807575)
```
Speed: 5.98 tokens per second
CPU(15) GPU(20) Low vram:
```
llama_print_timings: load time = 1455.39 ms
llama_print_timings: sample time = 28.39 ms / 100 runs ( 0.28 ms per token, 3522.74 tokens per second)
llama_print_timings: prompt eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, inf tokens per second)
llama_print_timings: eval time = 12303.83 ms / 100 runs ( 123.04 ms per token, 8.13 tokens per second)
llama_print_timings: total time = 12699.78 ms
Output generated in 13.07 seconds (7.65 tokens/s, 100 tokens, context 21, seed 80807575)
```
Speed: 7.65 tokens per second
#Long-context tests (1179 tokens)

CPU(35) GPU(0):
```
llama_print_timings: load time = 6006.68 ms
llama_print_timings: sample time = 28.43 ms / 100 runs ( 0.28 ms per token, 3517.66 tokens per second)
llama_print_timings: prompt eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, inf tokens per second)
llama_print_timings: eval time = 23526.07 ms / 100 runs ( 235.26 ms per token, 4.25 tokens per second)
llama_print_timings: total time = 23910.16 ms
Output generated in 24.28 seconds (4.12 tokens/s, 100 tokens, context 1179, seed 80807575)
```
Speed: 4.12 tokens per second
CPU(30) GPU(5):
```
llama_print_timings: load time = 5556.94 ms
llama_print_timings: sample time = 25.83 ms / 92 runs ( 0.28 ms per token, 3561.75 tokens per second)
llama_print_timings: prompt eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, inf tokens per second)
llama_print_timings: eval time = 19757.52 ms / 92 runs ( 214.76 ms per token, 4.66 tokens per second)
llama_print_timings: total time = 20113.70 ms
Output generated in 20.48 seconds (4.44 tokens/s, 91 tokens, context 1179, seed 80807575)
```
Speed: 4.44 tokens per second
CPU(25) GPU(10):
```
llama_print_timings: load time = 5161.97 ms
llama_print_timings: sample time = 25.74 ms / 92 runs ( 0.28 ms per token, 3574.20 tokens per second)
llama_print_timings: prompt eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, inf tokens per second)
llama_print_timings: eval time = 18213.97 ms / 92 runs ( 197.98 ms per token, 5.05 tokens per second)
llama_print_timings: total time = 18574.51 ms
Output generated in 18.93 seconds (4.81 tokens/s, 91 tokens, context 1179, seed 80807575)
```
Speed: 4.81 tokens per second
CPU(15) GPU(20) Low vram:
```
llama_print_timings: load time = 4627.16 ms
llama_print_timings: sample time = 11.01 ms / 39 runs ( 0.28 ms per token, 3542.23 tokens per second)
llama_print_timings: prompt eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, inf tokens per second)
llama_print_timings: eval time = 6192.65 ms / 39 runs ( 158.79 ms per token, 6.30 tokens per second)
llama_print_timings: total time = 6343.34 ms
Output generated in 6.72 seconds (5.66 tokens/s, 38 tokens, context 1179, seed 80807575)
```
Speed: 5.66 tokens per second
Mlewdchat L2 13b q4_K_S (43 layers) benchmark:

#Short-context tests (21 tokens)

CPU(43) GPU(0):
```
llama_print_timings: load time = 33123.68 ms
llama_print_timings: sample time = 28.14 ms / 100 runs ( 0.28 ms per token, 3554.29 tokens per second)
llama_print_timings: prompt eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, inf tokens per second)
llama_print_timings: eval time = 36035.70 ms / 100 runs ( 360.36 ms per token, 2.78 tokens per second)
llama_print_timings: total time = 36417.16 ms
Output generated in 36.79 seconds (2.72 tokens/s, 100 tokens, context 21, seed 80807575)
```
Speed: 2.72 tokens per second
CPU(38) GPU(5):
```
llama_print_timings: load time = 4640.32 ms
llama_print_timings: sample time = 28.01 ms / 100 runs ( 0.28 ms per token, 3570.66 tokens per second)
llama_print_timings: prompt eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, inf tokens per second)
llama_print_timings: eval time = 31952.95 ms / 100 runs ( 319.53 ms per token, 3.13 tokens per second)
llama_print_timings: total time = 32338.01 ms
Output generated in 32.70 seconds (3.06 tokens/s, 100 tokens, context 21, seed 80807575)
```
Speed: 3.06 tokens per second
CPU(33) GPU(10):
```
llama_print_timings: load time = 4213.47 ms
llama_print_timings: sample time = 28.12 ms / 100 runs ( 0.28 ms per token, 3556.19 tokens per second)
llama_print_timings: prompt eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, inf tokens per second)
llama_print_timings: eval time = 29386.01 ms / 100 runs ( 293.86 ms per token, 3.40 tokens per second)
llama_print_timings: total time = 29800.18 ms
Output generated in 30.16 seconds (3.32 tokens/s, 100 tokens, context 21, seed 80807575)
```
Speed: 3.32 tokens per second
CPU(23) GPU(20) Low vram:
```
llama_print_timings: load time = 3102.83 ms
llama_print_timings: sample time = 28.72 ms / 100 runs ( 0.29 ms per token, 3481.53 tokens per second)
llama_print_timings: prompt eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, inf tokens per second)
llama_print_timings: eval time = 23273.40 ms / 100 runs ( 232.73 ms per token, 4.30 tokens per second)
llama_print_timings: total time = 23662.53 ms
Output generated in 24.04 seconds (4.16 tokens/s, 100 tokens, context 21, seed 80807575)
```
Speed: 4.16 tokens per second
#Long-context tests (1179 tokens)

CPU(23) GPU(20) Low vram:
```
llama_print_timings: load time = 3102.83 ms
llama_print_timings: sample time = 29.02 ms / 100 runs ( 0.29 ms per token, 3446.26 tokens per second)
llama_print_timings: prompt eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, inf tokens per second)
llama_print_timings: eval time = 28811.66 ms / 100 runs ( 288.12 ms per token, 3.47 tokens per second)
llama_print_timings: total time = 29202.11 ms
Output generated in 29.57 seconds (3.38 tokens/s, 100 tokens, context 1179, seed 80807575)
```
Speed: 3.38 tokens per second
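Pulling the headline numbers together, the relative gain from offloading 20 layers versus running fully on CPU can be computed directly; the speeds below are hard-coded from the runs above (the Mlewdchat long-context series has no CPU-only run, so it is omitted):

```python
# Overall generation speed (tokens/s) keyed by GPU layer count,
# copied from the "Output generated" lines in the runs above.
tulpar_short = {0: 4.59, 5: 5.41, 10: 5.98, 20: 7.65}
tulpar_long  = {0: 4.12, 5: 4.44, 10: 4.81, 20: 5.66}
mlewd_short  = {0: 2.72, 5: 3.06, 10: 3.32, 20: 4.16}

for name, runs in [("tulpar 7b, short context", tulpar_short),
                   ("tulpar 7b, long context", tulpar_long),
                   ("Mlewdchat 13b, short context", mlewd_short)]:
    speedup = runs[20] / runs[0]
    print(f"{name}: {speedup:.2f}x faster with 20 GPU layers vs CPU-only")
```

In every series the speedup grows monotonically with the number of offloaded layers, with the largest relative gain (about 1.67x) on the 7b model at short context.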