***
Welcome to KoboldCpp - Version 1.68
Attempting to use CuBLAS library for faster prompt ingestion. A compatible CuBLAS will be required.
Initializing dynamic library: koboldcpp_cublas.so
==========
Namespace(model=None, model_param='/mnt/Orlando/gguf/Meta-Llama-3-70B-Instruct-Q4_K_M.gguf', port=5001, port_param=5001, host='', launch=False, config=None, threads=9, usecublas=['rowsplit', 'mmq'], usevulkan=None, useclblast=None, noblas=False, contextsize=8192, gpulayers=999, tensor_split=[8.0, 10.0, 5.0], ropeconfig=[0.0, 10000.0], blasbatchsize=2048, blasthreads=9, lora=None, noshift=False, nommap=False, usemlock=False, noavx2=False, debugmode=0, skiplauncher=False, onready='', benchmark='stdout', multiuser=0, remotetunnel=False, highpriority=False, foreground=False, preloadstory='', quiet=False, ssl=None, nocertify=False, mmproj='', password=None, ignoremissing=False, chatcompletionsadapter='', flashattention=True, quantkv=1, forceversion=0, smartcontext=False, hordemodelname='', hordeworkername='', hordekey='', hordemaxctx=0, hordegenlen=0, sdmodel='', sdthreads=0, sdclamped=0, sdvae='', sdvaeauto=False, sdquant=False, sdlora='', sdloramult=1.0, whispermodel='', hordeconfig=None, sdconfig=None)
==========
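(The Namespace dump above is argparse output from the launch flags. A command along these lines would produce it; this is a minimal sketch reconstructed from the parsed values against koboldcpp v1.68's flags, so verify spellings with --help before reusing it. The tensor_split ratio 8:10:5 apportions the offloaded layers proportionally across three GPUs.)

python koboldcpp.py /mnt/Orlando/gguf/Meta-Llama-3-70B-Instruct-Q4_K_M.gguf \
    --port 5001 --threads 9 --usecublas rowsplit mmq \
    --contextsize 8192 --gpulayers 999 --tensor_split 8 10 5 \
    --blasbatchsize 2048 --blasthreads 9 --flashattention --quantkv 1 \
    --benchmark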
Loading model: /mnt/Orlando/gguf/Meta-Llama-3-70B-Instruct-Q4_K_M.gguf

The reported GGUF Arch is: llama

---
Identified as GGUF model: (ver 6)
Attempting to Load...
---
Using automatic RoPE scaling. If the model has customized RoPE settings, they will be used directly instead!
System Info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |

Applying Tensor Split...
Automatic RoPE Scaling: Using (scale:1.000, base:500000.0).

Processing Prompt [BLAS] (0 / 8092 tokens)
Processing Prompt [BLAS] (2048 / 8092 tokens)
Processing Prompt [BLAS] (4096 / 8092 tokens)
Processing Prompt [BLAS] (6144 / 8092 tokens)
Processing Prompt [BLAS] (8092 / 8092 tokens)

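(Worked check: the benchmark fills the context minus the generation budget, so the prompt is MaxCtx - GenAmount = 8192 - 100 = 8092 tokens, ingested in BlasBatchSize=2048 chunks; hence the progress counter steps 0, 2048, 4096, 6144, 8092.)
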
Generating (1 / 100 tokens)
Generating (2 / 100 tokens)
Generating (3 / 100 tokens)
[... identical per-token progress lines elided ...]
Generating (99 / 100 tokens)
Generating (100 / 100 tokens)
CtxLimit: 8192/8192, Process:70.86s (8.8ms/T = 114.20T/s), Generate:19.92s (199.2ms/T = 5.02T/s), Total:90.78s (1.10T/s)
Load Text Model OK: True
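(Worked check of the timing line: 8092 prompt tokens / 70.86 s ≈ 114.2 T/s prompt processing, i.e. 8.8 ms per token; 100 generated tokens / 19.92 s ≈ 5.02 T/s generation; the 1.10 T/s total counts only the 100 generated tokens against the full 90.78 s wall time, 100 / 90.78 ≈ 1.10.)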
Embedded KoboldAI Lite loaded.
Embedded API docs loaded.
Starting Kobold API on port 5001 at http://localhost:5001/api/
Starting OpenAI Compatible API on port 5001 at http://localhost:5001/v1/
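(When the server is actually serving requests, the OpenAI-compatible route announced above can be exercised with a plain completion call, for example the sketch below; the exact JSON fields accepted can vary by version, and koboldcpp serves whatever model it loaded regardless of any model field. Note that in this particular run the benchmark executes instead and the server never starts, per the final line of the log.)

curl http://localhost:5001/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"prompt": "Hello", "max_tokens": 16, "temperature": 0.7}'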

Running benchmark (Not Saved)...

Benchmark Completed - v1.68 Results:
======
Flags: NoAVX2=False Threads=9 HighPriority=False NoBlas=False Cublas_Args=['rowsplit', 'mmq'] Tensor_Split=[8.0, 10.0, 5.0] BlasThreads=9 BlasBatchSize=2048 FlashAttention=True KvCache=1
Timestamp: 2024-06-24 23:00:48.688551+00:00
Backend: koboldcpp_cublas.so
Layers: 999
Model: Meta-Llama-3-70B-Instruct-Q4_K_M
MaxCtx: 8192
GenAmount: 100
-----
ProcessingTime: 70.858s
ProcessingSpeed: 114.20T/s
GenerationTime: 19.923s
GenerationSpeed: 5.02T/s
TotalTime: 90.781s
Output:
```

-----
Server was not started, main function complete. Idling.

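(As the final line indicates, --benchmark mode only loads the model, runs the timed test, and idles; to actually serve the APIs announced earlier in the log, relaunch the same command without the --benchmark flag.)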