- Log start
- llama_model_loader: loaded meta data with 45 key-value pairs and 579 tensors from Qwen3-30B-A3B-Thinking-2507-Q4_K_S.gguf (version GGUF V3 (latest))
- llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
- llama_model_loader: - kv 0: general.architecture str = qwen3moe
- llama_model_loader: - kv 1: general.type str = model
- llama_model_loader: - kv 2: general.name str = Qwen3-30B-A3B-Thinking-2507
- llama_model_loader: - kv 3: general.version str = 2507
- llama_model_loader: - kv 4: general.finetune str = Thinking
- llama_model_loader: - kv 5: general.basename str = Qwen3-30B-A3B-Thinking-2507
- llama_model_loader: - kv 6: general.quantized_by str = Unsloth
- llama_model_loader: - kv 7: general.size_label str = 30B-A3B
- llama_model_loader: - kv 8: general.license str = apache-2.0
- llama_model_loader: - kv 9: general.license.link str = https://huggingface.co/Qwen/Qwen3-30B...
- llama_model_loader: - kv 10: general.repo_url str = https://huggingface.co/unsloth
- llama_model_loader: - kv 11: general.base_model.count u32 = 1
- llama_model_loader: - kv 12: general.base_model.0.name str = Qwen3 30B A3B Thinking 2507
- llama_model_loader: - kv 13: general.base_model.0.version str = 2507
- llama_model_loader: - kv 14: general.base_model.0.organization str = Qwen
- llama_model_loader: - kv 15: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen3-30B...
- llama_model_loader: - kv 16: general.tags arr[str,2] = ["unsloth", "text-generation"]
- llama_model_loader: - kv 17: qwen3moe.block_count u32 = 48
- llama_model_loader: - kv 18: qwen3moe.context_length u32 = 262144
- llama_model_loader: - kv 19: qwen3moe.embedding_length u32 = 2048
- llama_model_loader: - kv 20: qwen3moe.feed_forward_length u32 = 6144
- llama_model_loader: - kv 21: qwen3moe.attention.head_count u32 = 32
- llama_model_loader: - kv 22: qwen3moe.attention.head_count_kv u32 = 4
- llama_model_loader: - kv 23: qwen3moe.rope.freq_base f32 = 10000000.000000
- llama_model_loader: - kv 24: qwen3moe.attention.layer_norm_rms_epsilon f32 = 0.000001
- llama_model_loader: - kv 25: qwen3moe.expert_used_count u32 = 8
- llama_model_loader: - kv 26: qwen3moe.attention.key_length u32 = 128
- llama_model_loader: - kv 27: qwen3moe.attention.value_length u32 = 128
- llama_model_loader: - kv 28: qwen3moe.expert_count u32 = 128
- llama_model_loader: - kv 29: qwen3moe.expert_feed_forward_length u32 = 768
- llama_model_loader: - kv 30: tokenizer.ggml.model str = gpt2
- llama_model_loader: - kv 31: tokenizer.ggml.pre str = qwen2
- llama_model_loader: - kv 32: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
- llama_model_loader: - kv 33: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
- llama_model_loader: - kv 34: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
- llama_model_loader: - kv 35: tokenizer.ggml.eos_token_id u32 = 151645
- llama_model_loader: - kv 36: tokenizer.ggml.padding_token_id u32 = 151654
- llama_model_loader: - kv 37: tokenizer.ggml.add_bos_token bool = false
- llama_model_loader: - kv 38: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
- llama_model_loader: - kv 39: general.quantization_version u32 = 2
- llama_model_loader: - kv 40: general.file_type u32 = 14
- llama_model_loader: - kv 41: quantize.imatrix.file str = Qwen3-30B-A3B-Thinking-2507-GGUF/imat...
- llama_model_loader: - kv 42: quantize.imatrix.dataset str = unsloth_calibration_Qwen3-30B-A3B-Thi...
- llama_model_loader: - kv 43: quantize.imatrix.entries_count u32 = 384
- llama_model_loader: - kv 44: quantize.imatrix.chunks_count u32 = 684
- llama_model_loader: - type f32: 241 tensors
- llama_model_loader: - type q4_K: 327 tensors
- llama_model_loader: - type q5_K: 10 tensors
- llama_model_loader: - type q6_K: 1 tensors
- llm_load_vocab: special tokens cache size = 26
- llm_load_vocab: token to piece cache size = 0.9311 MB
- llm_load_print_meta: format = GGUF V3 (latest)
- llm_load_print_meta: arch = qwen3moe
- llm_load_print_meta: vocab type = BPE
- llm_load_print_meta: n_vocab = 151936
- llm_load_print_meta: n_merges = 151387
- llm_load_print_meta: vocab_only = 0
- llm_load_print_meta: n_ctx_train = 262144
- llm_load_print_meta: n_embd = 2048
- llm_load_print_meta: n_layer = 48
- llm_load_print_meta: n_head = 32
- llm_load_print_meta: n_head_kv = 4
- llm_load_print_meta: n_rot = 128
- llm_load_print_meta: n_swa = 0
- llm_load_print_meta: n_swa_pattern = 1
- llm_load_print_meta: n_embd_head_k = 128
- llm_load_print_meta: n_embd_head_v = 128
- llm_load_print_meta: n_gqa = 8
- llm_load_print_meta: n_embd_k_gqa = 512
- llm_load_print_meta: n_embd_v_gqa = 512
- llm_load_print_meta: f_norm_eps = 0.0e+00
- llm_load_print_meta: f_norm_rms_eps = 1.0e-06
- llm_load_print_meta: f_clamp_kqv = 0.0e+00
- llm_load_print_meta: f_max_alibi_bias = 0.0e+00
- llm_load_print_meta: f_logit_scale = 0.0e+00
- llm_load_print_meta: n_ff = 6144
- llm_load_print_meta: n_expert = 128
- llm_load_print_meta: n_expert_used = 8
- llm_load_print_meta: causal attn = 1
- llm_load_print_meta: pooling type = 0
- llm_load_print_meta: rope type = 2
- llm_load_print_meta: rope scaling = linear
- llm_load_print_meta: freq_base_train = 10000000.0
- llm_load_print_meta: freq_scale_train = 1
- llm_load_print_meta: n_ctx_orig_yarn = 262144
- llm_load_print_meta: rope_finetuned = unknown
- llm_load_print_meta: ssm_d_conv = 0
- llm_load_print_meta: ssm_d_inner = 0
- llm_load_print_meta: ssm_d_state = 0
- llm_load_print_meta: ssm_dt_rank = 0
- llm_load_print_meta: model type = ?B
- llm_load_print_meta: model ftype = Q4_K - Small
- llm_load_print_meta: model params = 30.532 B
- llm_load_print_meta: model size = 16.252 GiB (4.572 BPW)
- llm_load_print_meta: repeating layers = 15.851 GiB (4.552 BPW, 29.910 B parameters)
- llm_load_print_meta: general.name = Qwen3-30B-A3B-Thinking-2507
- llm_load_print_meta: BOS token = 11 ','
- llm_load_print_meta: EOS token = 151645 '<|im_end|>'
- llm_load_print_meta: PAD token = 151654 '<|vision_pad|>'
- llm_load_print_meta: LF token = 148848 'ÄĬ'
- llm_load_print_meta: EOT token = 151645 '<|im_end|>'
- llm_load_print_meta: max token length = 256
- llm_load_print_meta: n_ff_exp = 768
- llm_load_tensors: ggml ctx size = 0.25 MiB
- llm_load_tensors: CPU buffer size = 16641.65 MiB
- ....................................................................................................
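The printed model size and bits-per-weight figures are internally consistent, and the CPU buffer above is essentially the same number expressed in MiB. A rough cross-check of that arithmetic, using only values already reported by llm_load_print_meta (nothing here is measured independently):

# Cross-check of "model size = 16.252 GiB (4.572 BPW)" and the ~16642 MiB CPU buffer.
size_gib = 16.252          # model size reported above
params   = 30.532e9        # parameter count reported above
bits     = size_gib * 1024**3 * 8
print(f"{bits / params:.3f} BPW")     # ~4.572, matching the log
print(f"{size_gib * 1024:.1f} MiB")   # ~16642 MiB, close to the reported CPU buffer size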
- llama_new_context_with_model: n_ctx = 32768
- llama_new_context_with_model: n_batch = 2048
- llama_new_context_with_model: n_ubatch = 512
- llama_new_context_with_model: flash_attn = 0
- llama_new_context_with_model: mla_attn = 0
- llama_new_context_with_model: attn_max_b = 0
- llama_new_context_with_model: fused_moe = 0
- llama_new_context_with_model: ser = -1, 0
- llama_new_context_with_model: freq_base = 10000000.0
- llama_new_context_with_model: freq_scale = 1
- llama_kv_cache_init: CPU KV buffer size = 3072.00 MiB
- llama_new_context_with_model: KV self size = 3072.00 MiB, K (f16): 1536.00 MiB, V (f16): 1536.00 MiB
- llama_new_context_with_model: CPU output buffer size = 0.58 MiB
- llama_new_context_with_model: CPU compute buffer size = 2136.01 MiB
- llama_new_context_with_model: graph nodes = 2165
- llama_new_context_with_model: graph splits = 578
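The 3072 MiB KV cache follows from the attention geometry printed above (48 layers, n_embd_k_gqa = n_embd_v_gqa = 512) at the requested n_ctx = 32768 with an f16 cache. A rough sketch of that arithmetic, assuming the usual one K tensor plus one V tensor per layer:

# KV cache size from the hyperparameters printed above (f16 = 2 bytes per element).
def kv_cache_mib(n_ctx, n_layer, n_embd_gqa, bytes_per_elem=2):
    per_side = n_ctx * n_layer * n_embd_gqa * bytes_per_elem  # K or V alone
    return per_side / 2**20, 2 * per_side / 2**20             # (per side, total)

print(kv_cache_mib(32768, 48, 512))   # (1536.0, 3072.0) -> matches the K/V/total lines above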
- llama_model_loader: loaded meta data with 32 key-value pairs and 310 tensors from Qwen3-0.6B-UD-Q5_K_XL.gguf (version GGUF V3 (latest))
- llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
- llama_model_loader: - kv 0: general.architecture str = qwen3
- llama_model_loader: - kv 1: general.type str = model
- llama_model_loader: - kv 2: general.name str = Qwen3-0.6B
- llama_model_loader: - kv 3: general.basename str = Qwen3-0.6B
- llama_model_loader: - kv 4: general.quantized_by str = Unsloth
- llama_model_loader: - kv 5: general.size_label str = 0.6B
- llama_model_loader: - kv 6: general.repo_url str = https://huggingface.co/unsloth
- llama_model_loader: - kv 7: qwen3.block_count u32 = 28
- llama_model_loader: - kv 8: qwen3.context_length u32 = 40960
- llama_model_loader: - kv 9: qwen3.embedding_length u32 = 1024
- llama_model_loader: - kv 10: qwen3.feed_forward_length u32 = 3072
- llama_model_loader: - kv 11: qwen3.attention.head_count u32 = 16
- llama_model_loader: - kv 12: qwen3.attention.head_count_kv u32 = 8
- llama_model_loader: - kv 13: qwen3.rope.freq_base f32 = 1000000.000000
- llama_model_loader: - kv 14: qwen3.attention.layer_norm_rms_epsilon f32 = 0.000001
- llama_model_loader: - kv 15: qwen3.attention.key_length u32 = 128
- llama_model_loader: - kv 16: qwen3.attention.value_length u32 = 128
- llama_model_loader: - kv 17: tokenizer.ggml.model str = gpt2
- llama_model_loader: - kv 18: tokenizer.ggml.pre str = qwen2
- llama_model_loader: - kv 19: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
- llama_model_loader: - kv 20: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
- llama_model_loader: - kv 21: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
- llama_model_loader: - kv 22: tokenizer.ggml.eos_token_id u32 = 151645
- llama_model_loader: - kv 23: tokenizer.ggml.padding_token_id u32 = 151654
- llama_model_loader: - kv 24: tokenizer.ggml.add_bos_token bool = false
- llama_model_loader: - kv 25: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
- llama_model_loader: - kv 26: general.quantization_version u32 = 2
- llama_model_loader: - kv 27: general.file_type u32 = 17
- llama_model_loader: - kv 28: quantize.imatrix.file str = Qwen3-0.6B-GGUF/imatrix_unsloth.dat
- llama_model_loader: - kv 29: quantize.imatrix.dataset str = unsloth_calibration_Qwen3-0.6B.txt
- llama_model_loader: - kv 30: quantize.imatrix.entries_count u32 = 196
- llama_model_loader: - kv 31: quantize.imatrix.chunks_count u32 = 688
- llama_model_loader: - type f32: 113 tensors
- llama_model_loader: - type q8_0: 1 tensors
- llama_model_loader: - type q4_K: 20 tensors
- llama_model_loader: - type q5_K: 120 tensors
- llama_model_loader: - type q6_K: 56 tensors
- llm_load_vocab: special tokens cache size = 26
- llm_load_vocab: token to piece cache size = 0.9311 MB
- llm_load_print_meta: format = GGUF V3 (latest)
- llm_load_print_meta: arch = qwen3
- llm_load_print_meta: vocab type = BPE
- llm_load_print_meta: n_vocab = 151936
- llm_load_print_meta: n_merges = 151387
- llm_load_print_meta: vocab_only = 0
- llm_load_print_meta: n_ctx_train = 40960
- llm_load_print_meta: n_embd = 1024
- llm_load_print_meta: n_layer = 28
- llm_load_print_meta: n_head = 16
- llm_load_print_meta: n_head_kv = 8
- llm_load_print_meta: n_rot = 128
- llm_load_print_meta: n_swa = 0
- llm_load_print_meta: n_swa_pattern = 1
- llm_load_print_meta: n_embd_head_k = 128
- llm_load_print_meta: n_embd_head_v = 128
- llm_load_print_meta: n_gqa = 2
- llm_load_print_meta: n_embd_k_gqa = 1024
- llm_load_print_meta: n_embd_v_gqa = 1024
- llm_load_print_meta: f_norm_eps = 0.0e+00
- llm_load_print_meta: f_norm_rms_eps = 1.0e-06
- llm_load_print_meta: f_clamp_kqv = 0.0e+00
- llm_load_print_meta: f_max_alibi_bias = 0.0e+00
- llm_load_print_meta: f_logit_scale = 0.0e+00
- llm_load_print_meta: n_ff = 3072
- llm_load_print_meta: n_expert = 0
- llm_load_print_meta: n_expert_used = 0
- llm_load_print_meta: causal attn = 1
- llm_load_print_meta: pooling type = 0
- llm_load_print_meta: rope type = 2
- llm_load_print_meta: rope scaling = linear
- llm_load_print_meta: freq_base_train = 1000000.0
- llm_load_print_meta: freq_scale_train = 1
- llm_load_print_meta: n_ctx_orig_yarn = 40960
- llm_load_print_meta: rope_finetuned = unknown
- llm_load_print_meta: ssm_d_conv = 0
- llm_load_print_meta: ssm_d_inner = 0
- llm_load_print_meta: ssm_d_state = 0
- llm_load_print_meta: ssm_dt_rank = 0
- llm_load_print_meta: model type = ?B
- llm_load_print_meta: model ftype = Q5_K - Medium
- llm_load_print_meta: model params = 596.050 M
- llm_load_print_meta: model size = 420.026 MiB (5.911 BPW)
- llm_load_print_meta: general.name = Qwen3-0.6B
- llm_load_print_meta: BOS token = 11 ','
- llm_load_print_meta: EOS token = 151645 '<|im_end|>'
- llm_load_print_meta: PAD token = 151654 '<|vision_pad|>'
- llm_load_print_meta: LF token = 148848 'ÄĬ'
- llm_load_print_meta: EOT token = 151645 '<|im_end|>'
- llm_load_print_meta: max token length = 256
- llm_load_tensors: ggml ctx size = 0.14 MiB
- llm_load_tensors: CPU buffer size = 420.03 MiB
- ..........................................................
- llama_new_context_with_model: n_ctx = 32768
- llama_new_context_with_model: n_batch = 2048
- llama_new_context_with_model: n_ubatch = 512
- llama_new_context_with_model: flash_attn = 0
- llama_new_context_with_model: mla_attn = 0
- llama_new_context_with_model: attn_max_b = 0
- llama_new_context_with_model: fused_moe = 0
- llama_new_context_with_model: ser = -1, 0
- llama_new_context_with_model: freq_base = 1000000.0
- llama_new_context_with_model: freq_scale = 1
- llama_kv_cache_init: CPU KV buffer size = 3584.00 MiB
- llama_new_context_with_model: KV self size = 3584.00 MiB, K (f16): 1792.00 MiB, V (f16): 1792.00 MiB
- llama_new_context_with_model: CPU output buffer size = 0.58 MiB
- llama_new_context_with_model: CPU compute buffer size = 1100.01 MiB
- llama_new_context_with_model: graph nodes = 873
- llama_new_context_with_model: graph splits = 394
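The same arithmetic explains why the 0.6B draft model ends up with a larger KV cache than the 30B target: it has fewer layers (28) but keeps 8 KV heads of 128 dims each (n_embd_k_gqa = 1024), and it shares the -c 32768 context. A quick self-contained check:

# Draft model: 28 layers, n_embd_k_gqa = n_embd_v_gqa = 1024, f16 cache, n_ctx = 32768.
per_side = 32768 * 28 * 1024 * 2 / 2**20
print(per_side, 2 * per_side)   # 1792.0 MiB per side, 3584.0 MiB total -> matches the log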
- /build/source/src/llama.cpp:18273: GGML_ASSERT(n_tokens_all <= cparams.n_batch) failed
- /nix/store/ila9g3xmkicpfgpvyx9db2cwv23ng9ni-llama-cpp-blas-0.0.0/lib/libggml.so(+0x1e4eb)[0x7f58f561e4eb]
- /nix/store/ila9g3xmkicpfgpvyx9db2cwv23ng9ni-llama-cpp-blas-0.0.0/lib/libggml.so(ggml_abort+0x15f)[0x7f58f562010f]
- /nix/store/ila9g3xmkicpfgpvyx9db2cwv23ng9ni-llama-cpp-blas-0.0.0/lib/libllama.so(llama_decode+0x1976)[0x7f58f6185ab6]
- llama-speculative[0x41cc7c]
- /nix/store/0wydilnf1c9vznywsvxqnaing4wraaxp-glibc-2.39-52/lib/libc.so.6(+0x2a14e)[0x7f58f503314e]
- /nix/store/0wydilnf1c9vznywsvxqnaing4wraaxp-glibc-2.39-52/lib/libc.so.6(__libc_start_main+0x89)[0x7f58f5033209]
- llama-speculative[0x421515]
- Aborted (core dumped) llama-speculative -m Qwen3-30B-A3B-Thinking-2507-Q4_K_S.gguf -md Qwen3-0.6B-UD-Q5_K_XL.gguf -c 32768
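The abort itself is the GGML_ASSERT a few lines up: a single llama_decode call received more tokens (n_tokens_all) than the logical batch size the contexts were created with (n_batch = 2048, per the llama_new_context_with_model lines), even though n_ctx is 32768. In other words, something, most plausibly a prompt longer than 2048 tokens, was submitted in one decode call. Assuming this build exposes the usual -b/--batch-size option, raising it to cover the prompt (or keeping the prompt under 2048 tokens) is a plausible workaround, e.g.:

llama-speculative -m Qwen3-30B-A3B-Thinking-2507-Q4_K_S.gguf -md Qwen3-0.6B-UD-Q5_K_XL.gguf -c 32768 -b 32768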