root@2b45959ac2a3:/workspace/exllama# python test_benchmark_inference.py -d /workspace/exllama/models/samantha-33B-GPTQ -p
-- Loading model
-- Tokenizer: /workspace/exllama/models/samantha-33B-GPTQ/tokenizer.model
-- Model config: /workspace/exllama/models/samantha-33B-GPTQ/config.json
-- Model: /workspace/exllama/models/samantha-33B-GPTQ/Samantha-33B-GPTQ-4bit.act-order.safetensors
-- Sequence length: 2048
-- Options: ['attention: switched', 'matmul: switched', 'mlp: switched', 'perf']
** Time, Load model: 6.89 seconds
-- Groupsize (inferred): None
-- Act-order (inferred): no
** VRAM, Model: [cuda:0] 15,936.28 MB
-- Inference, first pass.
** Time, Inference: 1.90 seconds
** Speed: 1012.72 tokens/second
-- Generating 128 tokens, 1920 token prompt...
** Speed: 35.26 tokens/second
-- Generating 128 tokens, 4 token prompt...
** Speed: 40.48 tokens/second
** VRAM, Inference: [cuda:0] 3,964.67 MB
** VRAM, Total: [cuda:0] 19,900.95 MB
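The VRAM lines in the run above can be cross-checked, since the reported Total appears to be Model plus Inference (15,936.28 + 3,964.67 = 19,900.95 MB). Below is a minimal Python sketch that does this check. It assumes the single-GPU `** VRAM, <label>: [cuda:0] <n> MB` line format shown in this log. The regex and the `parse_vram` helper are hypothetical illustrations, not part of exllama.

```python
import re

# Hypothetical pattern for the single-GPU VRAM lines printed in this log,
# e.g. "** VRAM, Model: [cuda:0] 15,936.28 MB" (multi-GPU output is not handled).
VRAM_RE = re.compile(r"\*\* VRAM, (\w+): \[cuda:0\] ([\d,]+\.\d+) MB")


def parse_vram(log: str) -> dict[str, float]:
    """Return {label: megabytes} for every VRAM line found in the log text."""
    out = {}
    for label, mb in VRAM_RE.findall(log):
        # Drop the thousands separators before converting to float.
        out[label] = float(mb.replace(",", ""))
    return out


# VRAM lines copied from the samantha-33B-GPTQ run above.
samantha_log = """\
** VRAM, Model: [cuda:0] 15,936.28 MB
** VRAM, Inference: [cuda:0] 3,964.67 MB
** VRAM, Total: [cuda:0] 19,900.95 MB
"""

vram = parse_vram(samantha_log)
# The reported total should equal model weights plus inference buffers, up to rounding.
assert abs(vram["Model"] + vram["Inference"] - vram["Total"]) < 0.01
print(vram)
```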
root@2b45959ac2a3:/workspace/exllama# python test_benchmark_inference.py -d /workspace/exllama/models/guanaco-65B-GPTQ/ -p
-- Loading model
-- Tokenizer: /workspace/exllama/models/guanaco-65B-GPTQ/tokenizer.model
-- Model config: /workspace/exllama/models/guanaco-65B-GPTQ/config.json
-- Model: /workspace/exllama/models/guanaco-65B-GPTQ/Guanaco-65B-GPTQ-4bit.act-order.safetensors
-- Sequence length: 2048
-- Options: ['attention: switched', 'matmul: switched', 'mlp: switched', 'perf']
** Time, Load model: 8.21 seconds
-- Groupsize (inferred): None
-- Act-order (inferred): no
** VRAM, Model: [cuda:0] 31,399.77 MB
-- Inference, first pass.
** Time, Inference: 3.57 seconds
** Speed: 537.69 tokens/second
-- Generating 128 tokens, 1920 token prompt...
** Speed: 19.13 tokens/second
-- Generating 128 tokens, 4 token prompt...
** Speed: 19.27 tokens/second
** VRAM, Inference: [cuda:0] 6,155.17 MB
** VRAM, Total: [cuda:0] 37,554.94 MB
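The two runs can be compared side by side by dividing the 33B figures by the 65B figures. This sketch only does arithmetic on numbers copied from the output above. The `runs` dict, its metric names, and the helper functions are illustrative labels chosen here, not exllama output fields or APIs.

```python
# Figures copied from the two benchmark runs above (single GPU, cuda:0).
# The dict keys are illustrative names, not fields emitted by exllama.
runs = {
    "samantha-33B": {
        "prompt_tps": 1012.72,      # first-pass (prompt) speed, tokens/s
        "gen_long_tps": 35.26,      # 128 tokens after a 1920-token prompt
        "gen_short_tps": 40.48,     # 128 tokens after a 4-token prompt
        "vram_total_mb": 19900.95,
    },
    "guanaco-65B": {
        "prompt_tps": 537.69,
        "gen_long_tps": 19.13,
        "gen_short_tps": 19.27,
        "vram_total_mb": 37554.94,
    },
}


def ratio(metric: str) -> float:
    """33B value divided by 65B value for one metric."""
    return runs["samantha-33B"][metric] / runs["guanaco-65B"][metric]


def ms_per_token(tokens_per_second: float) -> float:
    """Convert a throughput figure into average latency per generated token."""
    return 1000.0 / tokens_per_second


for metric in ("prompt_tps", "gen_long_tps", "gen_short_tps", "vram_total_mb"):
    print(f"{metric:>14}: 33B/65B = {ratio(metric):.2f}")

for name, r in runs.items():
    print(f"{name}: {ms_per_token(r['gen_long_tps']):.1f} ms/token (1920-token prompt)")
```

On these numbers, the 65B model generates at roughly half the 33B model's speed and uses about 1.9x the total VRAM.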