- ***
- Welcome to KoboldCpp - Version 1.68
- Attempting to use CuBLAS library for faster prompt ingestion. A compatible CuBLAS will be required.
- Initializing dynamic library: koboldcpp_cublas.so
- ==========
- Namespace(model=None, model_param='/mnt/Orlando/gguf/Meta-Llama-3-70B-Instruct-Q4_K_M.gguf', port=5001, port_param=5001, host='', launch=False, config=None, threads=9, usecublas=['rowsplit', 'mmq'], usevulkan=None, useclblast=None, noblas=False, contextsize=8192, gpulayers=999, tensor_split=[8.0, 10.0, 5.0], ropeconfig=[0.0, 10000.0], blasbatchsize=2048, blasthreads=9, lora=None, noshift=False, nommap=False, usemlock=False, noavx2=False, debugmode=0, skiplauncher=False, onready='', benchmark='stdout', multiuser=0, remotetunnel=False, highpriority=False, foreground=False, preloadstory='', quiet=False, ssl=None, nocertify=False, mmproj='', password=None, ignoremissing=False, chatcompletionsadapter='', flashattention=True, quantkv=1, forceversion=0, smartcontext=False, hordemodelname='', hordeworkername='', hordekey='', hordemaxctx=0, hordegenlen=0, sdmodel='', sdthreads=0, sdclamped=0, sdvae='', sdvaeauto=False, sdquant=False, sdlora='', sdloramult=1.0, whispermodel='', hordeconfig=None, sdconfig=None)
- ==========
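The Namespace dump above is argparse output, so the original invocation can be reconstructed from it. A rough sketch of an equivalent launch (flag spellings are from koboldcpp's CLI of this era; verify against `python koboldcpp.py --help` for your version):

```python
import subprocess

# Approximate relaunch of the run captured above, rebuilt from the parsed
# Namespace. model=None with model_param set means the model path was given
# positionally. benchmark='stdout' corresponds to a bare --benchmark flag.
cmd = [
    "python", "koboldcpp.py",
    "/mnt/Orlando/gguf/Meta-Llama-3-70B-Instruct-Q4_K_M.gguf",
    "--port", "5001",
    "--threads", "9",
    "--usecublas", "rowsplit", "mmq",
    "--contextsize", "8192",
    "--gpulayers", "999",              # offload everything; clamped to the real layer count
    "--tensor_split", "8", "10", "5",  # ~35%/43%/22% of layers across three GPUs (8+10+5 = 23)
    "--blasbatchsize", "2048",
    "--blasthreads", "9",
    "--flashattention",
    "--quantkv", "1",                  # quantized KV cache (requires flash attention)
    "--benchmark",
]
subprocess.run(cmd, check=True)
```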
- Loading model: /mnt/Orlando/gguf/Meta-Llama-3-70B-Instruct-Q4_K_M.gguf
- The reported GGUF Arch is: llama
- ---
- Identified as GGUF model: (ver 6)
- Attempting to Load...
- ---
- Using automatic RoPE scaling. If the model has customized RoPE settings, they will be used directly instead!
- System Info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
- Applying Tensor Split...
- Automatic RoPE Scaling: Using (scale:1.000, base:500000.0).
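The scale/base pair follows from the requested context: 8192 tokens does not exceed Llama 3's trained context, so no stretching is applied and the model's native base of 500000 is kept. A minimal sketch of the usual NTK-aware base-adjustment heuristic (the exact formula koboldcpp applies internally is an assumption here):

```python
def ntk_rope_base(trained_base: float, trained_ctx: int,
                  target_ctx: int, head_dim: int = 128) -> float:
    """NTK-aware RoPE scaling: stretch the frequency base, not the positions."""
    ratio = max(1.0, target_ctx / trained_ctx)
    return trained_base * ratio ** (head_dim / (head_dim - 2))

print(ntk_rope_base(500_000.0, 8192, 8192))   # 500000.0, matching the log line above
print(ntk_rope_base(500_000.0, 8192, 16384))  # ~1.0e6, what a 16k extension would use
```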
- Processing Prompt [BLAS] (0 / 8092 tokens)
- Processing Prompt [BLAS] (2048 / 8092 tokens)
- Processing Prompt [BLAS] (4096 / 8092 tokens)
- Processing Prompt [BLAS] (6144 / 8092 tokens)
- Processing Prompt [BLAS] (8092 / 8092 tokens)
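Those checkpoints track the BLAS batch size: with blasbatchsize=2048, the 8092-token prompt is ingested in three full chunks plus a short final one. The chunk arithmetic (not koboldcpp's actual loop) reproduces the counter exactly:

```python
# Reproduce the progress checkpoints above: ingestion advances in
# blasbatchsize-sized steps, with a partial final chunk (6144..8092).
prompt_tokens, batch = 8092, 2048

done = 0
while done < prompt_tokens:
    print(f"Processing Prompt [BLAS] ({done} / {prompt_tokens} tokens)")
    done = min(done + batch, prompt_tokens)
print(f"Processing Prompt [BLAS] ({done} / {prompt_tokens} tokens)")
# Prints: 0, 2048, 4096, 6144, 8092
```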
- Generating (1 / 100 tokens)
- Generating (2 / 100 tokens)
- Generating (3 / 100 tokens)
- ...
- Generating (99 / 100 tokens)
- Generating (100 / 100 tokens)
- CtxLimit: 8192/8192, Process:70.86s (8.8ms/T = 114.20T/s), Generate:19.92s (199.2ms/T = 5.02T/s), Total:90.78s (1.10T/s)
- Load Text Model OK: True
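The totals in that line are internally consistent; dividing token counts by the reported wall times reproduces each speed:

```python
# Sanity-check the timing line: tokens / seconds should match the reported
# rates. Prompt was 8092 tokens, generation was 100 tokens (8092+100 = 8192).
prompt_tokens, gen_tokens = 8092, 100
process_s, generate_s = 70.858, 19.923

print(f"process:  {prompt_tokens / process_s:.2f} T/s")  # ~114.20
print(f"generate: {gen_tokens / generate_s:.2f} T/s")    # ~5.02
total_s = process_s + generate_s
print(f"total:    {gen_tokens / total_s:.2f} T/s")       # ~1.10 (counts generated tokens only)
```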
- Embedded KoboldAI Lite loaded.
- Embedded API docs loaded.
- Starting Kobold API on port 5001 at http://localhost:5001/api/
- Starting OpenAI Compatible API on port 5001 at http://localhost:5001/v1/
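Once the server is idling on port 5001, either endpoint accepts requests. A minimal sketch against the OpenAI-compatible route (the prompt and sampler values are arbitrary placeholders; "koboldcpp" as the model name is nominal, since the server serves whichever model it loaded):

```python
import requests

# Minimal completion request against koboldcpp's OpenAI-compatible endpoint.
# The payload follows the standard OpenAI completions schema.
resp = requests.post(
    "http://localhost:5001/v1/completions",
    json={
        "model": "koboldcpp",
        "prompt": "The quick brown fox",
        "max_tokens": 32,
        "temperature": 0.7,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```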
- Running benchmark (Not Saved)...
- Benchmark Completed - v1.68 Results:
- ======
- Flags: NoAVX2=False Threads=9 HighPriority=False NoBlas=False Cublas_Args=['rowsplit', 'mmq'] Tensor_Split=[8.0, 10.0, 5.0] BlasThreads=9 BlasBatchSize=2048 FlashAttention=True KvCache=1
- Timestamp: 2024-06-24 23:00:48.688551+00:00
- Backend: koboldcpp_cublas.so
- Layers: 999
- Model: Meta-Llama-3-70B-Instruct-Q4_K_M
- MaxCtx: 8192
- GenAmount: 100
- -----
- ProcessingTime: 70.858s
- ProcessingSpeed: 114.20T/s
- GenerationTime: 19.923s
- GenerationSpeed: 5.02T/s
- TotalTime: 90.781s
- Output:
- -----
- Server was not started, main function complete. Idling.