Pasted by a guest, May 29th, 2023
root@2b45959ac2a3:/workspace/exllama# python test_benchmark_inference.py -d /workspace/exllama/models/samantha-33B-GPTQ -p
 -- Loading model
 -- Tokenizer: /workspace/exllama/models/samantha-33B-GPTQ/tokenizer.model
 -- Model config: /workspace/exllama/models/samantha-33B-GPTQ/config.json
 -- Model: /workspace/exllama/models/samantha-33B-GPTQ/Samantha-33B-GPTQ-4bit.act-order.safetensors
 -- Sequence length: 2048
 -- Options: ['attention: switched', 'matmul: switched', 'mlp: switched', 'perf']
 ** Time, Load model: 6.89 seconds
 -- Groupsize (inferred): None
 -- Act-order (inferred): no
 ** VRAM, Model: [cuda:0] 15,936.28 MB
 -- Inference, first pass.
 ** Time, Inference: 1.90 seconds
 ** Speed: 1012.72 tokens/second
 -- Generating 128 tokens, 1920 token prompt...
 ** Speed: 35.26 tokens/second
 -- Generating 128 tokens, 4 token prompt...
 ** Speed: 40.48 tokens/second
 ** VRAM, Inference: [cuda:0] 3,964.67 MB
 ** VRAM, Total: [cuda:0] 19,900.95 MB
root@2b45959ac2a3:/workspace/exllama# python test_benchmark_inference.py -d /workspace/exllama/models/guanaco-65B-GPTQ/ -p
 -- Loading model
 -- Tokenizer: /workspace/exllama/models/guanaco-65B-GPTQ/tokenizer.model
 -- Model config: /workspace/exllama/models/guanaco-65B-GPTQ/config.json
 -- Model: /workspace/exllama/models/guanaco-65B-GPTQ/Guanaco-65B-GPTQ-4bit.act-order.safetensors
 -- Sequence length: 2048
 -- Options: ['attention: switched', 'matmul: switched', 'mlp: switched', 'perf']
 ** Time, Load model: 8.21 seconds
 -- Groupsize (inferred): None
 -- Act-order (inferred): no
 ** VRAM, Model: [cuda:0] 31,399.77 MB
 -- Inference, first pass.
 ** Time, Inference: 3.57 seconds
 ** Speed: 537.69 tokens/second
 -- Generating 128 tokens, 1920 token prompt...
 ** Speed: 19.13 tokens/second
 -- Generating 128 tokens, 4 token prompt...
 ** Speed: 19.27 tokens/second
 ** VRAM, Inference: [cuda:0] 6,155.17 MB
 ** VRAM, Total: [cuda:0] 37,554.94 MB
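The `VRAM, Total` line in each run is simply the sum of the `VRAM, Model` and `VRAM, Inference` lines. A minimal sketch that cross-checks the figures copied from the log above (the dict keys are just labels for this check, not exllama output fields):

```python
# Cross-check the VRAM figures reported in the two benchmark runs above:
# total should equal model weights VRAM plus inference (activations/cache) VRAM.
runs = {
    "samantha-33B-GPTQ": {"model_mb": 15936.28, "inference_mb": 3964.67, "total_mb": 19900.95},
    "guanaco-65B-GPTQ":  {"model_mb": 31399.77, "inference_mb": 6155.17, "total_mb": 37554.94},
}

for name, r in runs.items():
    # Allow a small tolerance for the two-decimal rounding in the log.
    diff = abs(r["model_mb"] + r["inference_mb"] - r["total_mb"])
    assert diff < 0.01, f"{name}: VRAM lines do not add up (off by {diff:.2f} MB)"
    print(f"{name}: {r['total_mb']:,.2f} MB total checks out")
```

Both runs check out, which is a quick way to confirm nothing was mistranscribed when comparing the 33B and 65B memory footprints.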