I currently have the Backyard.AI app, version 0.26.2, with the Experimental backend enabled.
Max Model Context is set to 8k.

[2024-08-12 10:52:43.619] [info] Spawning new model process...
[2024-08-12 10:52:43.620] [info] Entered init()
[2024-08-12 10:52:43.620] [info] Dispatching action "SPAWN"
[2024-08-12 10:52:43.620] [info] Handling side effects after entering state "spawning-px"
[2024-08-12 10:52:45.370] [info] Starting gpu detection for cublas-12.1.0
[2024-08-12 10:52:45.478] [info] Finished gpu detection for cublas-12.1.0 after 108 ms
[2024-08-12 10:52:45.478] [info] Using free VRAM if available
[2024-08-12 10:52:45.479] [info] Fetched GPU and available vRAM {
  cardName: 'NVIDIA GeForce RTX 4060 Ti',
  maxUsableVRamMiB: 15647.671875,
  gpuDeviceInfo: { index: 0, type: 'cublas' }
}
[2024-08-12 10:52:45.480] [info] Running auto kv quant detection.
[2024-08-12 10:52:45.480] [info] Found 23 layers: {
  maxLayers: 28,
  ctxAdjustment: 4,
  kvCacheSize: 540,
  printedVRam: 892.6299999999999,
  scratchBufferSize: 372.06,
  estimatedScratchBufferSize: 372.06,
  vRamBudget: 15447.671875,
  isNotCLBlast: true,
  vRamPerLayer: 500.5699999999999,
  vRamForLayerMiB: 15236.998499999996
}
[2024-08-12 10:52:45.481] [info] Found 23/28 layers for {"k":"f16","v":"f16"}
[2024-08-12 10:52:45.481] [info] Found 24 layers: {
  maxLayers: 28,
  ctxAdjustment: 4,
  kvCacheSize: 540,
  printedVRam: 892.6299999999999,
  scratchBufferSize: 372.06,
  estimatedScratchBufferSize: 372.06,
  vRamBudget: 15447.671875,
  isNotCLBlast: true,
  vRamPerLayer: 500.5699999999999,
  vRamForLayerMiB: 14901.596999999996
}
[2024-08-12 10:52:45.481] [info] Found 24/28 layers for {"k":"q8_0","v":"q8_0"}
[2024-08-12 10:52:45.481] [info] Found 25 layers: {
  maxLayers: 28,
  ctxAdjustment: 4,
  kvCacheSize: 540,
  printedVRam: 892.6299999999999,
  scratchBufferSize: 372.06,
  estimatedScratchBufferSize: 372.06,
  vRamBudget: 15447.671875,
  isNotCLBlast: true,
  vRamPerLayer: 500.5699999999999,
  vRamForLayerMiB: 14946.820499999996
}
[2024-08-12 10:52:45.481] [info] Found 25/28 layers for {"k":"q4_0","v":"q4_0"}
[2024-08-12 10:52:45.481] [info] Spawning model px: { gpuLayers: 25, isAtMaxLayers: false }
[2024-08-12 10:52:45.481] [info] Spawning llama server process...
[2024-08-12 10:52:46.038] [info] Parsing GGUF model header took 541 ms
[2024-08-12 10:52:46.039] [info] Detected model architecture: deepseek2
[2024-08-12 10:52:46.039] [info] Rope params: {
  ropeFreqBase: 10000,
  ropeFreqScale: 1,
  finetuneContextLength: 163840,
  ctxSize: 8192
}
[2024-08-12 10:52:46.039] [info] {
  model: 'mradermacher__DeepSeek-V2-Lite-Chat-i1-GGUF__DeepSeek-V2-Lite-Chat.i1-Q6_K.gguf',
  llamaBin: 'llama-cpp-binaries\\windows\\cublas-12.1.0\\v0.25.28\\noavx\\backyard.exe',
  flags: [
    '--host', '127.0.0.1',
    '--port', '62240',
    '--model', 'D:\\AI\\character\\models\\mradermacher__DeepSeek-V2-Lite-Chat-i1-GGUF__DeepSeek-V2-Lite-Chat.i1-Q6_K.gguf',
    '--ctx-size', '8192',
    '--rope-freq-base', '10000',
    '--rope-freq-scale', '1',
    '--batch-size', '512',
    '--log-disable',
    '--flash-attn',
    '--cache-type-k', 'q4_0',
    '--cache-type-v', 'q4_0',
    '--mlock',
    '--n-gpu-layers', '25'
  ]
}
[2024-08-12 10:52:46.040] [info] Attempting to start llama process { CUDA_VISIBLE_DEVICES: '0' }
[2024-08-12 10:52:46.042] [info] Spawned llama process, pid: 40228 GPU Acceleration: 25
[2024-08-12 10:52:46.042] [info] Dispatching action "SPAWN_DONE"
[2024-08-12 10:52:46.042] [info] Finished init()
[2024-08-12 10:52:46.042] [info] Handling side effects after entering state "starting-llama"
[2024-08-12 10:52:46.042] [info] Starting llama server...
[2024-08-12 10:52:48.581] [info] Tried to cancel when not streaming
[2024-08-12 10:52:51.695] [info] Unsupported file format. Try a different quantization for this model, or toggle the Experimental backend in the Advanced settings.
[2024-08-12 10:52:51.695] [info]
___STDERR___
llama_model_loader: loaded meta data with 49 key-value pairs and 377 tensors from D:\AI\character\models\mradermacher__DeepSeek-V2-Lite-Chat-i1-GGUF__DeepSeek-V2-Lite-Chat.i1-Q6_K.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv  0: general.architecture str = deepseek2
llama_model_loader: - kv  1: general.name str = DeepSeek-V2-Lite-Chat
llama_model_loader: - kv  2: deepseek2.block_count u32 = 27
llama_model_loader: - kv  3: deepseek2.context_length u32 = 163840
llama_model_loader: - kv  4: deepseek2.embedding_length u32 = 2048
llama_model_loader: - kv  5: deepseek2.feed_forward_length u32 = 10944
llama_model_loader: - kv  6: deepseek2.attention.head_count u32 = 16
llama_model_loader: - kv  7: deepseek2.attention.head_count_kv u32 = 16
llama_model_loader: - kv  8: deepseek2.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv  9: deepseek2.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 10: deepseek2.expert_used_count u32 = 6
llama_model_loader: - kv 11: general.file_type u32 = 18
llama_model_loader: - kv 12: deepseek2.leading_dense_block_count u32 = 1
llama_model_loader: - kv 13: deepseek2.vocab_size u32 = 102400
llama_model_loader: - kv 14: deepseek2.attention.kv_lora_rank u32 = 512
llama_model_loader: - kv 15: deepseek2.attention.key_length u32 = 192
llama_model_loader: - kv 16: deepseek2.attention.value_length u32 = 128
llama_model_loader: - kv 17: deepseek2.expert_feed_forward_length u32 = 1408
............
llm_load_print_meta: n_lora_q = 0
llm_load_print_meta: n_lora_kv = 512
llm_load_print_meta: n_ff_exp = 1408
llm_load_print_meta: n_expert_shared = 2
llm_load_print_meta: expert_weights_scale = 1.0
llm_load_print_meta: rope_yarn_log_mul = 0.0707
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
llm_load_tensors: ggml ctx size = 0.32 MiB
llm_load_tensors: offloading 25 repeating layers to GPU
llm_load_tensors: offloaded 25/28 layers to GPU
llm_load_tensors: CPU buffer size = 3424.92 MiB
llm_load_tensors: CUDA0 buffer size = 12514.25 MiB
......................................................................................
llama_new_context_with_model: flash_attn requires n_embd_head_k == n_embd_head_v - forcing off
llama_new_context_with_model: V cache quantization requires flash_attn
llama_init_from_gpt_params: error: failed to create context with model 'D:\AI\character\models\mradermacher__DeepSeek-V2-Lite-Chat-i1-GGUF__DeepSeek-V2-Lite-Chat.i1-Q6_K.gguf'

___STDERR___
[2024-08-12 10:52:51.696] [error] Unexpected error initializing server: Error: Unsupported file format. Try a different quantization for this model, or toggle the Experimental backend in the Advanced settings.
    at ChildProcess.<anonymous> (C:\Users\progmars\AppData\Local\faraday\app-0.26.2\resources\app.asar\dist\server\main.js:1095:13476)
    at ChildProcess.emit (node:events:519:28)
    at ChildProcess.emit (node:domain:488:12)
    at ChildProcess._handle.onexit (node:internal/child_process:294:12)
[2024-08-12 10:52:51.697] [info] Successfully terminated server px and removed listeners.
[2024-08-12 10:52:51.698] [info] Dispatching action "ERROR"
[2024-08-12 10:52:51.699] [info] Handling side effects after entering state "error"
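Reading the stderr, the "Unsupported file format" message looks misleading: the model header loads fine, but the deepseek2 architecture has different K and V head sizes (attention.key_length 192 vs attention.value_length 128), so llama.cpp forces flash attention off, and a quantized V cache (the app's auto-selected `--cache-type-v q4_0`) requires flash attention, so context creation fails. As a sketch only, under the assumption that this KV-cache quantization is the actual cause, the same launch command from the log could be retried manually with `--flash-attn` and the `q4_0` cache-type flags dropped (Backyard.AI builds these flags itself, so inside the app the equivalent would be disabling the Experimental backend or its KV quantization, as the error message suggests):

```shell
# Hypothetical manual relaunch of the binary and flags taken from the log above,
# minus --flash-attn, --cache-type-k, and --cache-type-v, which leaves the
# default f16 KV cache that deepseek2 can use without flash attention.
backyard.exe \
  --host 127.0.0.1 --port 62240 \
  --model "D:\AI\character\models\mradermacher__DeepSeek-V2-Lite-Chat-i1-GGUF__DeepSeek-V2-Lite-Chat.i1-Q6_K.gguf" \
  --ctx-size 8192 \
  --rope-freq-base 10000 --rope-freq-scale 1 \
  --batch-size 512 \
  --mlock \
  --n-gpu-layers 25
```

Note that an f16 KV cache needs more VRAM than q4_0, so `--n-gpu-layers` may need to be lowered from 25 to fit the 16 GB card.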