- I currently have Backyard.AI app version 0.26.2 with the Experimental backend enabled.
- Max Model Context is set to 8k.
- [2024-08-12 10:52:43.619] [info] Spawning new model process...
- [2024-08-12 10:52:43.620] [info] Entered init()
- [2024-08-12 10:52:43.620] [info] Dispatching action "SPAWN"
- [2024-08-12 10:52:43.620] [info] Handling side effects after entering state "spawning-px"
- [2024-08-12 10:52:45.370] [info] Starting gpu detection for cublas-12.1.0
- [2024-08-12 10:52:45.478] [info] Finished gpu detection for cublas-12.1.0 after 108 ms
- [2024-08-12 10:52:45.478] [info] Using free VRAM if available
- [2024-08-12 10:52:45.479] [info] Fetched GPU and available vRAM {
- cardName: 'NVIDIA GeForce RTX 4060 Ti',
- maxUsableVRamMiB: 15647.671875,
- gpuDeviceInfo: { index: 0, type: 'cublas' }
- }
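Side note: the probe above only needs the card name and the free VRAM on the device. A minimal sketch of how such a query can be done with nvidia-smi (an illustration only, not Backyard.AI's actual implementation; the field names simply mirror the log object above):

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

interface GpuInfo {
  cardName: string;
  maxUsableVRamMiB: number;
  gpuDeviceInfo: { index: number; type: string };
}

// Query the card name and free VRAM via nvidia-smi; assumes an NVIDIA GPU
// and nvidia-smi on PATH.
async function detectGpu(index = 0): Promise<GpuInfo> {
  const { stdout } = await execFileAsync("nvidia-smi", [
    "--query-gpu=name,memory.free",
    "--format=csv,noheader,nounits",
    `--id=${index}`,
  ]);
  const [name, freeMiB] = stdout.trim().split(", ");
  return {
    cardName: name,
    maxUsableVRamMiB: Number(freeMiB),
    gpuDeviceInfo: { index, type: "cublas" },
  };
}
```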
- [2024-08-12 10:52:45.480] [info] Running auto kv quant detection.
- [2024-08-12 10:52:45.480] [info] Found 23 layers: {
- maxLayers: 28,
- ctxAdjustment: 4,
- kvCacheSize: 540,
- printedVRam: 892.6299999999999,
- scratchBufferSize: 372.06,
- estimatedScratchBufferSize: 372.06,
- vRamBudget: 15447.671875,
- isNotCLBlast: true,
- vRamPerLayer: 500.5699999999999,
- vRamForLayerMiB: 15236.998499999996
- }
- [2024-08-12 10:52:45.481] [info] Found 23/28 layers for {"k":"f16","v":"f16"}
- [2024-08-12 10:52:45.481] [info] Found 24 layers: {
- maxLayers: 28,
- ctxAdjustment: 4,
- kvCacheSize: 540,
- printedVRam: 892.6299999999999,
- scratchBufferSize: 372.06,
- estimatedScratchBufferSize: 372.06,
- vRamBudget: 15447.671875,
- isNotCLBlast: true,
- vRamPerLayer: 500.5699999999999,
- vRamForLayerMiB: 14901.596999999996
- }
- [2024-08-12 10:52:45.481] [info] Found 24/28 layers for {"k":"q8_0","v":"q8_0"}
- [2024-08-12 10:52:45.481] [info] Found 25 layers: {
- maxLayers: 28,
- ctxAdjustment: 4,
- kvCacheSize: 540,
- printedVRam: 892.6299999999999,
- scratchBufferSize: 372.06,
- estimatedScratchBufferSize: 372.06,
- vRamBudget: 15447.671875,
- isNotCLBlast: true,
- vRamPerLayer: 500.5699999999999,
- vRamForLayerMiB: 14946.820499999996
- }
- [2024-08-12 10:52:45.481] [info] Found 25/28 layers for {"k":"q4_0","v":"q4_0"}
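The three passes above rerun the same fit with progressively cheaper KV-cache quantizations (f16 → q8_0 → q4_0), which is why the layer count climbs from 23 to 25. A hypothetical sketch of the general shape of such a budget calculation (the exact formula is not shown in the log, and fields like ctxAdjustment suggest extra adjustments this sketch does not reproduce):

```typescript
interface LayerBudgetInput {
  vRamBudgetMiB: number;        // usable VRAM minus a safety margin
  kvCacheSizeMiB: number;       // KV cache cost at the chosen quantization
  scratchBufferSizeMiB: number; // compute scratch buffer
  printedVRamMiB: number;       // VRAM already claimed elsewhere
  vRamPerLayerMiB: number;      // weight cost of one transformer layer
  maxLayers: number;            // total offloadable layers (here 28)
}

// Hypothetical layer fit: subtract fixed costs, divide by per-layer cost,
// cap at the model's layer count.
function fitGpuLayers(b: LayerBudgetInput): number {
  const free =
    b.vRamBudgetMiB - b.kvCacheSizeMiB - b.scratchBufferSizeMiB - b.printedVRamMiB;
  return Math.min(b.maxLayers, Math.max(0, Math.floor(free / b.vRamPerLayerMiB)));
}
```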
- [2024-08-12 10:52:45.481] [info] Spawning model px: { gpuLayers: 25, isAtMaxLayers: false }
- [2024-08-12 10:52:45.481] [info] Spawning llama server process...
- [2024-08-12 10:52:46.038] [info] Parsing GGUF model header took 541 ms
- [2024-08-12 10:52:46.039] [info] Detected model architecture: deepseek2
- [2024-08-12 10:52:46.039] [info] Rope params: {
- ropeFreqBase: 10000,
- ropeFreqScale: 1,
- finetuneContextLength: 163840,
- ctxSize: 8192
- }
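The rope values pass through unchanged because the requested context (8192) is far below the finetuned context length (163840), so no frequency scaling is needed. A sketch of that decision, assuming simple linear rope scaling as the fallback (an assumption; the log does not show the actual rule):

```typescript
// Hypothetical rope-scale decision: only compress frequencies when the
// requested context window exceeds what the model was finetuned for.
function ropeFreqScale(ctxSize: number, finetuneContextLength: number): number {
  if (ctxSize <= finetuneContextLength) return 1; // here: 8192 <= 163840 → 1
  return finetuneContextLength / ctxSize;         // linear scaling fallback
}
```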
- [2024-08-12 10:52:46.039] [info] {
- model: 'mradermacher__DeepSeek-V2-Lite-Chat-i1-GGUF__DeepSeek-V2-Lite-Chat.i1-Q6_K.gguf',
- llamaBin: 'llama-cpp-binaries\\windows\\cublas-12.1.0\\v0.25.28\\noavx\\backyard.exe',
- flags: [
- '--host',
- '127.0.0.1',
- '--port',
- '62240',
- '--model',
- 'D:\\AI\\character\\models\\mradermacher__DeepSeek-V2-Lite-Chat-i1-GGUF__DeepSeek-V2-Lite-Chat.i1-Q6_K.gguf',
- '--ctx-size',
- '8192',
- '--rope-freq-base',
- '10000',
- '--rope-freq-scale',
- '1',
- '--batch-size',
- '512',
- '--log-disable',
- '--flash-attn',
- '--cache-type-k',
- 'q4_0',
- '--cache-type-v',
- 'q4_0',
- '--mlock',
- '--n-gpu-layers',
- '25'
- ]
- }
- [2024-08-12 10:52:46.040] [info] Attempting to start llama process { CUDA_VISIBLE_DEVICES: '0' }
- [2024-08-12 10:52:46.042] [info] Spawned llama process, pid: 40228 GPU Acceleration: 25
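The flag array maps one-to-one onto a child-process spawn, with CUDA_VISIBLE_DEVICES pinning the server to the detected GPU (device index 0). A minimal Node.js sketch of that launch (illustrative; the binary path and flags are copied from the log, and the model path is truncated for brevity):

```typescript
import { spawn } from "node:child_process";

// Spawn the bundled llama.cpp-based server with the flags from the log.
const llama = spawn(
  "llama-cpp-binaries\\windows\\cublas-12.1.0\\v0.25.28\\noavx\\backyard.exe",
  [
    "--host", "127.0.0.1",
    "--port", "62240",
    "--model", "D:\\AI\\character\\models\\...Q6_K.gguf", // truncated for brevity
    "--ctx-size", "8192",
    "--flash-attn",
    "--cache-type-k", "q4_0",
    "--cache-type-v", "q4_0", // this pairing is what ultimately fails below
    "--n-gpu-layers", "25",
  ],
  { env: { ...process.env, CUDA_VISIBLE_DEVICES: "0" } }
);

llama.stderr.on("data", (chunk) => console.error(String(chunk)));
llama.on("exit", (code) => console.log(`llama server exited with code ${code}`));
```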
- [2024-08-12 10:52:46.042] [info] Dispatching action "SPAWN_DONE"
- [2024-08-12 10:52:46.042] [info] Finished init()
- [2024-08-12 10:52:46.042] [info] Handling side effects after entering state "starting-llama"
- [2024-08-12 10:52:46.042] [info] Starting llama server...
- [2024-08-12 10:52:48.581] [info] Tried to cancel when not streaming
- [2024-08-12 10:52:51.695] [info] Unsupported file format. Try a different quantization for this model, or toggle the Experimental backend in the Advanced settings.
- [2024-08-12 10:52:51.695] [info]
- ___STDERR___
- llama_model_loader: loaded meta data with 49 key-value pairs and 377 tensors from D:\AI\character\models\mradermacher__DeepSeek-V2-Lite-Chat-i1-GGUF__DeepSeek-V2-Lite-Chat.i1-Q6_K.gguf (version GGUF V3 (latest))
- llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
- llama_model_loader: - kv 0: general.architecture str = deepseek2
- llama_model_loader: - kv 1: general.name str = DeepSeek-V2-Lite-Chat
- llama_model_loader: - kv 2: deepseek2.block_count u32 = 27
- llama_model_loader: - kv 3: deepseek2.context_length u32 = 163840
- llama_model_loader: - kv 4: deepseek2.embedding_length u32 = 2048
- llama_model_loader: - kv 5: deepseek2.feed_forward_length u32 = 10944
- llama_model_loader: - kv 6: deepseek2.attention.head_count u32 = 16
- llama_model_loader: - kv 7: deepseek2.attention.head_count_kv u32 = 16
- llama_model_loader: - kv 8: deepseek2.rope.freq_base f32 = 10000.000000
- llama_model_loader: - kv 9: deepseek2.attention.layer_norm_rms_epsilon f32 = 0.000001
- llama_model_loader: - kv 10: deepseek2.expert_used_count u32 = 6
- llama_model_loader: - kv 11: general.file_type u32 = 18
- llama_model_loader: - kv 12: deepseek2.leading_dense_block_count u32 = 1
- llama_model_loader: - kv 13: deepseek2.vocab_size u32 = 102400
- llama_model_loader: - kv 14: deepseek2.attention.kv_lora_rank u32 = 512
- llama_model_loader: - kv 15: deepseek2.attention.key_length u32 = 192
- llama_model_loader: - kv 16: deepseek2.attention.value_length u32 = 128
- llama_model_loader: - kv 17: deepseek2.expert_feed_forward_length u32 = 1408
- ............
- llm_load_print_meta: n_lora_q = 0
- llm_load_print_meta: n_lora_kv = 512
- llm_load_print_meta: n_ff_exp = 1408
- llm_load_print_meta: n_expert_shared = 2
- llm_load_print_meta: expert_weights_scale = 1.0
- llm_load_print_meta: rope_yarn_log_mul = 0.0707
- ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
- ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
- ggml_cuda_init: found 1 CUDA devices:
- Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
- llm_load_tensors: ggml ctx size = 0.32 MiB
- llm_load_tensors: offloading 25 repeating layers to GPU
- llm_load_tensors: offloaded 25/28 layers to GPU
- llm_load_tensors: CPU buffer size = 3424.92 MiB
- llm_load_tensors: CUDA0 buffer size = 12514.25 MiB
- ......................................................................................
- llama_new_context_with_model: flash_attn requires n_embd_head_k == n_embd_head_v - forcing off
- llama_new_context_with_model: V cache quantization requires flash_attn
- llama_init_from_gpt_params: error: failed to create context with model 'D:\AI\character\models\mradermacher__DeepSeek-V2-Lite-Chat-i1-GGUF__DeepSeek-V2-Lite-Chat.i1-Q6_K.gguf'
- ___STDERR___
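Despite the generic "Unsupported file format" message, the stderr above shows the real failure chain: the deepseek2 architecture uses different key and value head sizes (attention.key_length = 192 vs attention.value_length = 128 in the metadata), so llama.cpp force-disables flash attention; a quantized V cache (--cache-type-v q4_0) in turn requires flash attention, so context creation fails. The likely workaround is to launch without V-cache quantization for this architecture. A hypothetical guard sketching that check (not Backyard.AI's code; the head sizes come straight from the GGUF metadata above):

```typescript
// Hypothetical guard: skip KV-cache quantization when flash attention cannot
// be enabled (llama.cpp forces it off when K/V head dims differ, as with
// deepseek2's 192/128 split seen in this log).
interface ModelHeadDims { keyLength: number; valueLength: number }

function cacheTypeFlags(dims: ModelHeadDims, kvQuant: string): string[] {
  const flashAttnOk = dims.keyLength === dims.valueLength;
  if (!flashAttnOk) {
    // Quantized V cache requires flash-attn; fall back to the default f16 cache.
    return [];
  }
  return ["--flash-attn", "--cache-type-k", kvQuant, "--cache-type-v", kvQuant];
}
```

With this model, cacheTypeFlags({ keyLength: 192, valueLength: 128 }, "q4_0") returns an empty list, so the server would start with the default f16 KV cache instead of failing.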
- [2024-08-12 10:52:51.696] [error] Unexpected error initializing server: Error: Unsupported file format. Try a different quantization for this model, or toggle the Experimental backend in the Advanced settings.
- at ChildProcess.<anonymous> (C:\Users\progmars\AppData\Local\faraday\app-0.26.2\resources\app.asar\dist\server\main.js:1095:13476)
- at ChildProcess.emit (node:events:519:28)
- at ChildProcess.emit (node:domain:488:12)
- at ChildProcess._handle.onexit (node:internal/child_process:294:12)
- [2024-08-12 10:52:51.697] [info] Successfully terminated server px and removed listeners.
- [2024-08-12 10:52:51.698] [info] Dispatching action "ERROR"
- [2024-08-12 10:52:51.699] [info] Handling side effects after entering state "error"