remghoost

alpaca model error

Mar 22nd, 2023
Loading alpaca-native-4bit...
Loading model ...
Traceback (most recent call last):
  File "/mnt/c/stable-diffusion/text-generation-webui/server.py", line 241, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/mnt/c/stable-diffusion/text-generation-webui/modules/models.py", line 101, in load_model
    model = load_quantized(model_name)
  File "/mnt/c/stable-diffusion/text-generation-webui/modules/GPTQ_loader.py", line 64, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits)
  File "/mnt/c/stable-diffusion/text-generation-webui/repositories/GPTQ-for-LLaMa/llama.py", line 245, in load_quant
    model.load_state_dict(torch.load(checkpoint))
  File "/home/rem/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
Missing key(s) in state_dict: "model.layers.0.self_attn.q_proj.zeros", "model.layers.0.self_attn.k_proj.zeros", "model.layers.0.self_attn.v_proj.zeros", "model.layers.0.self_attn.o_proj.zeros", "model.layers.0.mlp.gate_proj.zeros", "model.layers.0.mlp.down_proj.zeros", "model.layers.0.mlp.up_proj.zeros", "model.layers.1.self_attn.q_proj.zeros", "model.layers.1.self_attn.k_proj.zeros", "model.layers.1.self_attn.v_proj.zeros", "model.layers.1.self_attn.o_proj.zeros", "model.layers.1.mlp.gate_proj.zeros", "model.layers.1.mlp.down_proj.zeros", "model.layers.1.mlp.up_proj.zeros", "model.layers.2.self_attn.q_proj.zeros", "model.layers.2.self_attn.k_proj.zeros", "model.layers.2.self_attn.v_proj.zeros", "model.layers.2.self_attn.o_proj.zeros", "model.layers.2.mlp.gate_proj.zeros", "model.layers.2.mlp.down_proj.zeros", "model.layers.2.mlp.up_proj.zeros", "model.layers.3.self_attn.q_proj.zeros", "model.layers.3.self_attn.k_proj.zeros", "model.layers.3.self_attn.v_proj.zeros", "model.layers.3.self_attn.o_proj.zeros", "model.layers.3.mlp.gate_proj.zeros", "model.layers.3.mlp.down_proj.zeros", "model.layers.3.mlp.up_proj.zeros", "model.layers.4.self_attn.q_proj.zeros", "model.layers.4.self_attn.k_proj.zeros", "model.layers.4.self_attn.v_proj.zeros", "model.layers.4.self_attn.o_proj.zeros", "model.layers.4.mlp.gate_proj.zeros", "model.layers.4.mlp.down_proj.zeros", "model.layers.4.mlp.up_proj.zeros", "model.layers.5.self_attn.q_proj.zeros", "model.layers.5.self_attn.k_proj.zeros", "model.layers.5.self_attn.v_proj.zeros", "model.layers.5.self_attn.o_proj.zeros", "model.layers.5.mlp.gate_proj.zeros", "model.layers.5.mlp.down_proj.zeros", "model.layers.5.mlp.up_proj.zeros", "model.layers.6.self_attn.q_proj.zeros", "model.layers.6.self_attn.k_proj.zeros", "model.layers.6.self_attn.v_proj.zeros", "model.layers.6.self_attn.o_proj.zeros", "model.layers.6.mlp.gate_proj.zeros", "model.layers.6.mlp.down_proj.zeros", "model.layers.6.mlp.up_proj.zeros",
"model.layers.7.self_attn.q_proj.zeros", "model.layers.7.self_attn.k_proj.zeros", "model.layers.7.self_attn.v_proj.zeros", "model.layers.7.self_attn.o_proj.zeros", "model.layers.7.mlp.gate_proj.zeros", "model.layers.7.mlp.down_proj.zeros", "model.layers.7.mlp.up_proj.zeros", "model.layers.8.self_attn.q_proj.zeros", "model.layers.8.self_attn.k_proj.zeros", "model.layers.8.self_attn.v_proj.zeros", "model.layers.8.self_attn.o_proj.zeros", "model.layers.8.mlp.gate_proj.zeros", "model.layers.8.mlp.down_proj.zeros", "model.layers.8.mlp.up_proj.zeros", "model.layers.9.self_attn.q_proj.zeros", "model.layers.9.self_attn.k_proj.zeros", "model.layers.9.self_attn.v_proj.zeros", "model.layers.9.self_attn.o_proj.zeros", "model.layers.9.mlp.gate_proj.zeros", "model.layers.9.mlp.down_proj.zeros", "model.layers.9.mlp.up_proj.zeros", "model.layers.10.self_attn.q_proj.zeros", "model.layers.10.self_attn.k_proj.zeros", "model.layers.10.self_attn.v_proj.zeros", "model.layers.10.self_attn.o_proj.zeros", "model.layers.10.mlp.gate_proj.zeros", "model.layers.10.mlp.down_proj.zeros", "model.layers.10.mlp.up_proj.zeros", "model.layers.11.self_attn.q_proj.zeros", "model.layers.11.self_attn.k_proj.zeros", "model.layers.11.self_attn.v_proj.zeros", "model.layers.11.self_attn.o_proj.zeros", "model.layers.11.mlp.gate_proj.zeros", "model.layers.11.mlp.down_proj.zeros", "model.layers.11.mlp.up_proj.zeros", "model.layers.12.self_attn.q_proj.zeros", "model.layers.12.self_attn.k_proj.zeros", "model.layers.12.self_attn.v_proj.zeros", "model.layers.12.self_attn.o_proj.zeros", "model.layers.12.mlp.gate_proj.zeros", "model.layers.12.mlp.down_proj.zeros", "model.layers.12.mlp.up_proj.zeros", "model.layers.13.self_attn.q_proj.zeros", "model.layers.13.self_attn.k_proj.zeros", "model.layers.13.self_attn.v_proj.zeros", "model.layers.13.self_attn.o_proj.zeros", "model.layers.13.mlp.gate_proj.zeros", "model.layers.13.mlp.down_proj.zeros", "model.layers.13.mlp.up_proj.zeros", 
"model.layers.14.self_attn.q_proj.zeros", "model.layers.14.self_attn.k_proj.zeros", "model.layers.14.self_attn.v_proj.zeros", "model.layers.14.self_attn.o_proj.zeros", "model.layers.14.mlp.gate_proj.zeros", "model.layers.14.mlp.down_proj.zeros", "model.layers.14.mlp.up_proj.zeros", "model.layers.15.self_attn.q_proj.zeros", "model.layers.15.self_attn.k_proj.zeros", "model.layers.15.self_attn.v_proj.zeros", "model.layers.15.self_attn.o_proj.zeros", "model.layers.15.mlp.gate_proj.zeros", "model.layers.15.mlp.down_proj.zeros", "model.layers.15.mlp.up_proj.zeros", "model.layers.16.self_attn.q_proj.zeros", "model.layers.16.self_attn.k_proj.zeros", "model.layers.16.self_attn.v_proj.zeros", "model.layers.16.self_attn.o_proj.zeros", "model.layers.16.mlp.gate_proj.zeros", "model.layers.16.mlp.down_proj.zeros", "model.layers.16.mlp.up_proj.zeros", "model.layers.17.self_attn.q_proj.zeros", "model.layers.17.self_attn.k_proj.zeros", "model.layers.17.self_attn.v_proj.zeros", "model.layers.17.self_attn.o_proj.zeros", "model.layers.17.mlp.gate_proj.zeros", "model.layers.17.mlp.down_proj.zeros", "model.layers.17.mlp.up_proj.zeros", "model.layers.18.self_attn.q_proj.zeros", "model.layers.18.self_attn.k_proj.zeros", "model.layers.18.self_attn.v_proj.zeros", "model.layers.18.self_attn.o_proj.zeros", "model.layers.18.mlp.gate_proj.zeros", "model.layers.18.mlp.down_proj.zeros", "model.layers.18.mlp.up_proj.zeros", "model.layers.19.self_attn.q_proj.zeros", "model.layers.19.self_attn.k_proj.zeros", "model.layers.19.self_attn.v_proj.zeros", "model.layers.19.self_attn.o_proj.zeros", "model.layers.19.mlp.gate_proj.zeros", "model.layers.19.mlp.down_proj.zeros", "model.layers.19.mlp.up_proj.zeros", "model.layers.20.self_attn.q_proj.zeros", "model.layers.20.self_attn.k_proj.zeros", "model.layers.20.self_attn.v_proj.zeros", "model.layers.20.self_attn.o_proj.zeros", "model.layers.20.mlp.gate_proj.zeros", "model.layers.20.mlp.down_proj.zeros", "model.layers.20.mlp.up_proj.zeros", 
"model.layers.21.self_attn.q_proj.zeros", "model.layers.21.self_attn.k_proj.zeros", "model.layers.21.self_attn.v_proj.zeros", "model.layers.21.self_attn.o_proj.zeros", "model.layers.21.mlp.gate_proj.zeros", "model.layers.21.mlp.down_proj.zeros", "model.layers.21.mlp.up_proj.zeros", "model.layers.22.self_attn.q_proj.zeros", "model.layers.22.self_attn.k_proj.zeros", "model.layers.22.self_attn.v_proj.zeros", "model.layers.22.self_attn.o_proj.zeros", "model.layers.22.mlp.gate_proj.zeros", "model.layers.22.mlp.down_proj.zeros", "model.layers.22.mlp.up_proj.zeros", "model.layers.23.self_attn.q_proj.zeros", "model.layers.23.self_attn.k_proj.zeros", "model.layers.23.self_attn.v_proj.zeros", "model.layers.23.self_attn.o_proj.zeros", "model.layers.23.mlp.gate_proj.zeros", "model.layers.23.mlp.down_proj.zeros", "model.layers.23.mlp.up_proj.zeros", "model.layers.24.self_attn.q_proj.zeros", "model.layers.24.self_attn.k_proj.zeros", "model.layers.24.self_attn.v_proj.zeros", "model.layers.24.self_attn.o_proj.zeros", "model.layers.24.mlp.gate_proj.zeros", "model.layers.24.mlp.down_proj.zeros", "model.layers.24.mlp.up_proj.zeros", "model.layers.25.self_attn.q_proj.zeros", "model.layers.25.self_attn.k_proj.zeros", "model.layers.25.self_attn.v_proj.zeros", "model.layers.25.self_attn.o_proj.zeros", "model.layers.25.mlp.gate_proj.zeros", "model.layers.25.mlp.down_proj.zeros", "model.layers.25.mlp.up_proj.zeros", "model.layers.26.self_attn.q_proj.zeros", "model.layers.26.self_attn.k_proj.zeros", "model.layers.26.self_attn.v_proj.zeros", "model.layers.26.self_attn.o_proj.zeros", "model.layers.26.mlp.gate_proj.zeros", "model.layers.26.mlp.down_proj.zeros", "model.layers.26.mlp.up_proj.zeros", "model.layers.27.self_attn.q_proj.zeros", "model.layers.27.self_attn.k_proj.zeros", "model.layers.27.self_attn.v_proj.zeros", "model.layers.27.self_attn.o_proj.zeros", "model.layers.27.mlp.gate_proj.zeros", "model.layers.27.mlp.down_proj.zeros", "model.layers.27.mlp.up_proj.zeros", 
"model.layers.28.self_attn.q_proj.zeros", "model.layers.28.self_attn.k_proj.zeros", "model.layers.28.self_attn.v_proj.zeros", "model.layers.28.self_attn.o_proj.zeros", "model.layers.28.mlp.gate_proj.zeros", "model.layers.28.mlp.down_proj.zeros", "model.layers.28.mlp.up_proj.zeros", "model.layers.29.self_attn.q_proj.zeros", "model.layers.29.self_attn.k_proj.zeros", "model.layers.29.self_attn.v_proj.zeros", "model.layers.29.self_attn.o_proj.zeros", "model.layers.29.mlp.gate_proj.zeros", "model.layers.29.mlp.down_proj.zeros", "model.layers.29.mlp.up_proj.zeros", "model.layers.30.self_attn.q_proj.zeros", "model.layers.30.self_attn.k_proj.zeros", "model.layers.30.self_attn.v_proj.zeros", "model.layers.30.self_attn.o_proj.zeros", "model.layers.30.mlp.gate_proj.zeros", "model.layers.30.mlp.down_proj.zeros", "model.layers.30.mlp.up_proj.zeros", "model.layers.31.self_attn.q_proj.zeros", "model.layers.31.self_attn.k_proj.zeros", "model.layers.31.self_attn.v_proj.zeros", "model.layers.31.self_attn.o_proj.zeros", "model.layers.31.mlp.gate_proj.zeros", "model.layers.31.mlp.down_proj.zeros", "model.layers.31.mlp.up_proj.zeros".
Unexpected key(s) in state_dict: "model.layers.0.self_attn.q_proj.qzeros", "model.layers.0.self_attn.k_proj.qzeros", "model.layers.0.self_attn.v_proj.qzeros", "model.layers.0.self_attn.o_proj.qzeros", "model.layers.0.mlp.gate_proj.qzeros", "model.layers.0.mlp.down_proj.qzeros", "model.layers.0.mlp.up_proj.qzeros", "model.layers.1.self_attn.q_proj.qzeros", "model.layers.1.self_attn.k_proj.qzeros", "model.layers.1.self_attn.v_proj.qzeros", "model.layers.1.self_attn.o_proj.qzeros", "model.layers.1.mlp.gate_proj.qzeros", "model.layers.1.mlp.down_proj.qzeros", "model.layers.1.mlp.up_proj.qzeros", "model.layers.2.self_attn.q_proj.qzeros", "model.layers.2.self_attn.k_proj.qzeros", "model.layers.2.self_attn.v_proj.qzeros", "model.layers.2.self_attn.o_proj.qzeros", "model.layers.2.mlp.gate_proj.qzeros", "model.layers.2.mlp.down_proj.qzeros", "model.layers.2.mlp.up_proj.qzeros", "model.layers.3.self_attn.q_proj.qzeros", "model.layers.3.self_attn.k_proj.qzeros", "model.layers.3.self_attn.v_proj.qzeros", "model.layers.3.self_attn.o_proj.qzeros", "model.layers.3.mlp.gate_proj.qzeros", "model.layers.3.mlp.down_proj.qzeros", "model.layers.3.mlp.up_proj.qzeros", "model.layers.4.self_attn.q_proj.qzeros", "model.layers.4.self_attn.k_proj.qzeros", "model.layers.4.self_attn.v_proj.qzeros", "model.layers.4.self_attn.o_proj.qzeros", "model.layers.4.mlp.gate_proj.qzeros", "model.layers.4.mlp.down_proj.qzeros", "model.layers.4.mlp.up_proj.qzeros", "model.layers.5.self_attn.q_proj.qzeros", "model.layers.5.self_attn.k_proj.qzeros", "model.layers.5.self_attn.v_proj.qzeros", "model.layers.5.self_attn.o_proj.qzeros", "model.layers.5.mlp.gate_proj.qzeros", "model.layers.5.mlp.down_proj.qzeros", "model.layers.5.mlp.up_proj.qzeros", "model.layers.6.self_attn.q_proj.qzeros", "model.layers.6.self_attn.k_proj.qzeros", "model.layers.6.self_attn.v_proj.qzeros", "model.layers.6.self_attn.o_proj.qzeros", "model.layers.6.mlp.gate_proj.qzeros", "model.layers.6.mlp.down_proj.qzeros",
"model.layers.6.mlp.up_proj.qzeros", "model.layers.7.self_attn.q_proj.qzeros", "model.layers.7.self_attn.k_proj.qzeros", "model.layers.7.self_attn.v_proj.qzeros", "model.layers.7.self_attn.o_proj.qzeros", "model.layers.7.mlp.gate_proj.qzeros", "model.layers.7.mlp.down_proj.qzeros", "model.layers.7.mlp.up_proj.qzeros", "model.layers.8.self_attn.q_proj.qzeros", "model.layers.8.self_attn.k_proj.qzeros", "model.layers.8.self_attn.v_proj.qzeros", "model.layers.8.self_attn.o_proj.qzeros", "model.layers.8.mlp.gate_proj.qzeros", "model.layers.8.mlp.down_proj.qzeros", "model.layers.8.mlp.up_proj.qzeros", "model.layers.9.self_attn.q_proj.qzeros", "model.layers.9.self_attn.k_proj.qzeros", "model.layers.9.self_attn.v_proj.qzeros", "model.layers.9.self_attn.o_proj.qzeros", "model.layers.9.mlp.gate_proj.qzeros", "model.layers.9.mlp.down_proj.qzeros", "model.layers.9.mlp.up_proj.qzeros", "model.layers.10.self_attn.q_proj.qzeros", "model.layers.10.self_attn.k_proj.qzeros", "model.layers.10.self_attn.v_proj.qzeros", "model.layers.10.self_attn.o_proj.qzeros", "model.layers.10.mlp.gate_proj.qzeros", "model.layers.10.mlp.down_proj.qzeros", "model.layers.10.mlp.up_proj.qzeros", "model.layers.11.self_attn.q_proj.qzeros", "model.layers.11.self_attn.k_proj.qzeros", "model.layers.11.self_attn.v_proj.qzeros", "model.layers.11.self_attn.o_proj.qzeros", "model.layers.11.mlp.gate_proj.qzeros", "model.layers.11.mlp.down_proj.qzeros", "model.layers.11.mlp.up_proj.qzeros", "model.layers.12.self_attn.q_proj.qzeros", "model.layers.12.self_attn.k_proj.qzeros", "model.layers.12.self_attn.v_proj.qzeros", "model.layers.12.self_attn.o_proj.qzeros", "model.layers.12.mlp.gate_proj.qzeros", "model.layers.12.mlp.down_proj.qzeros", "model.layers.12.mlp.up_proj.qzeros", "model.layers.13.self_attn.q_proj.qzeros", "model.layers.13.self_attn.k_proj.qzeros", "model.layers.13.self_attn.v_proj.qzeros", "model.layers.13.self_attn.o_proj.qzeros", "model.layers.13.mlp.gate_proj.qzeros", 
"model.layers.13.mlp.down_proj.qzeros", "model.layers.13.mlp.up_proj.qzeros", "model.layers.14.self_attn.q_proj.qzeros", "model.layers.14.self_attn.k_proj.qzeros", "model.layers.14.self_attn.v_proj.qzeros", "model.layers.14.self_attn.o_proj.qzeros", "model.layers.14.mlp.gate_proj.qzeros", "model.layers.14.mlp.down_proj.qzeros", "model.layers.14.mlp.up_proj.qzeros", "model.layers.15.self_attn.q_proj.qzeros", "model.layers.15.self_attn.k_proj.qzeros", "model.layers.15.self_attn.v_proj.qzeros", "model.layers.15.self_attn.o_proj.qzeros", "model.layers.15.mlp.gate_proj.qzeros", "model.layers.15.mlp.down_proj.qzeros", "model.layers.15.mlp.up_proj.qzeros", "model.layers.16.self_attn.q_proj.qzeros", "model.layers.16.self_attn.k_proj.qzeros", "model.layers.16.self_attn.v_proj.qzeros", "model.layers.16.self_attn.o_proj.qzeros", "model.layers.16.mlp.gate_proj.qzeros", "model.layers.16.mlp.down_proj.qzeros", "model.layers.16.mlp.up_proj.qzeros", "model.layers.17.self_attn.q_proj.qzeros", "model.layers.17.self_attn.k_proj.qzeros", "model.layers.17.self_attn.v_proj.qzeros", "model.layers.17.self_attn.o_proj.qzeros", "model.layers.17.mlp.gate_proj.qzeros", "model.layers.17.mlp.down_proj.qzeros", "model.layers.17.mlp.up_proj.qzeros", "model.layers.18.self_attn.q_proj.qzeros", "model.layers.18.self_attn.k_proj.qzeros", "model.layers.18.self_attn.v_proj.qzeros", "model.layers.18.self_attn.o_proj.qzeros", "model.layers.18.mlp.gate_proj.qzeros", "model.layers.18.mlp.down_proj.qzeros", "model.layers.18.mlp.up_proj.qzeros", "model.layers.19.self_attn.q_proj.qzeros", "model.layers.19.self_attn.k_proj.qzeros", "model.layers.19.self_attn.v_proj.qzeros", "model.layers.19.self_attn.o_proj.qzeros", "model.layers.19.mlp.gate_proj.qzeros", "model.layers.19.mlp.down_proj.qzeros", "model.layers.19.mlp.up_proj.qzeros", "model.layers.20.self_attn.q_proj.qzeros", "model.layers.20.self_attn.k_proj.qzeros", "model.layers.20.self_attn.v_proj.qzeros", "model.layers.20.self_attn.o_proj.qzeros", 
"model.layers.20.mlp.gate_proj.qzeros", "model.layers.20.mlp.down_proj.qzeros", "model.layers.20.mlp.up_proj.qzeros", "model.layers.21.self_attn.q_proj.qzeros", "model.layers.21.self_attn.k_proj.qzeros", "model.layers.21.self_attn.v_proj.qzeros", "model.layers.21.self_attn.o_proj.qzeros", "model.layers.21.mlp.gate_proj.qzeros", "model.layers.21.mlp.down_proj.qzeros", "model.layers.21.mlp.up_proj.qzeros", "model.layers.22.self_attn.q_proj.qzeros", "model.layers.22.self_attn.k_proj.qzeros", "model.layers.22.self_attn.v_proj.qzeros", "model.layers.22.self_attn.o_proj.qzeros", "model.layers.22.mlp.gate_proj.qzeros", "model.layers.22.mlp.down_proj.qzeros", "model.layers.22.mlp.up_proj.qzeros", "model.layers.23.self_attn.q_proj.qzeros", "model.layers.23.self_attn.k_proj.qzeros", "model.layers.23.self_attn.v_proj.qzeros", "model.layers.23.self_attn.o_proj.qzeros", "model.layers.23.mlp.gate_proj.qzeros", "model.layers.23.mlp.down_proj.qzeros", "model.layers.23.mlp.up_proj.qzeros", "model.layers.24.self_attn.q_proj.qzeros", "model.layers.24.self_attn.k_proj.qzeros", "model.layers.24.self_attn.v_proj.qzeros", "model.layers.24.self_attn.o_proj.qzeros", "model.layers.24.mlp.gate_proj.qzeros", "model.layers.24.mlp.down_proj.qzeros", "model.layers.24.mlp.up_proj.qzeros", "model.layers.25.self_attn.q_proj.qzeros", "model.layers.25.self_attn.k_proj.qzeros", "model.layers.25.self_attn.v_proj.qzeros", "model.layers.25.self_attn.o_proj.qzeros", "model.layers.25.mlp.gate_proj.qzeros", "model.layers.25.mlp.down_proj.qzeros", "model.layers.25.mlp.up_proj.qzeros", "model.layers.26.self_attn.q_proj.qzeros", "model.layers.26.self_attn.k_proj.qzeros", "model.layers.26.self_attn.v_proj.qzeros", "model.layers.26.self_attn.o_proj.qzeros", "model.layers.26.mlp.gate_proj.qzeros", "model.layers.26.mlp.down_proj.qzeros", "model.layers.26.mlp.up_proj.qzeros", "model.layers.27.self_attn.q_proj.qzeros", "model.layers.27.self_attn.k_proj.qzeros", "model.layers.27.self_attn.v_proj.qzeros", 
"model.layers.27.self_attn.o_proj.qzeros", "model.layers.27.mlp.gate_proj.qzeros", "model.layers.27.mlp.down_proj.qzeros", "model.layers.27.mlp.up_proj.qzeros", "model.layers.28.self_attn.q_proj.qzeros", "model.layers.28.self_attn.k_proj.qzeros", "model.layers.28.self_attn.v_proj.qzeros", "model.layers.28.self_attn.o_proj.qzeros", "model.layers.28.mlp.gate_proj.qzeros", "model.layers.28.mlp.down_proj.qzeros", "model.layers.28.mlp.up_proj.qzeros", "model.layers.29.self_attn.q_proj.qzeros", "model.layers.29.self_attn.k_proj.qzeros", "model.layers.29.self_attn.v_proj.qzeros", "model.layers.29.self_attn.o_proj.qzeros", "model.layers.29.mlp.gate_proj.qzeros", "model.layers.29.mlp.down_proj.qzeros", "model.layers.29.mlp.up_proj.qzeros", "model.layers.30.self_attn.q_proj.qzeros", "model.layers.30.self_attn.k_proj.qzeros", "model.layers.30.self_attn.v_proj.qzeros", "model.layers.30.self_attn.o_proj.qzeros", "model.layers.30.mlp.gate_proj.qzeros", "model.layers.30.mlp.down_proj.qzeros", "model.layers.30.mlp.up_proj.qzeros", "model.layers.31.self_attn.q_proj.qzeros", "model.layers.31.self_attn.k_proj.qzeros", "model.layers.31.self_attn.v_proj.qzeros", "model.layers.31.self_attn.o_proj.qzeros", "model.layers.31.mlp.gate_proj.qzeros", "model.layers.31.mlp.down_proj.qzeros", "model.layers.31.mlp.up_proj.qzeros".
size mismatch for model.layers.0.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.0.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.0.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.0.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.0.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
size mismatch for model.layers.0.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.0.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
size mismatch for model.layers.1.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.1.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.1.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.1.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.1.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
size mismatch for model.layers.1.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.1.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
size mismatch for model.layers.2.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.2.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.2.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.2.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.2.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
size mismatch for model.layers.2.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.2.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
size mismatch for model.layers.3.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.3.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.3.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.3.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.3.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
size mismatch for model.layers.3.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.3.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
size mismatch for model.layers.4.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.4.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.4.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.4.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.4.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
size mismatch for model.layers.4.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.4.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
size mismatch for model.layers.5.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.5.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.5.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.5.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.5.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
size mismatch for model.layers.5.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.5.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
size mismatch for model.layers.6.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.6.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.6.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.6.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.6.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
size mismatch for model.layers.6.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.6.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
size mismatch for model.layers.7.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.7.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.7.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.7.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.7.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
size mismatch for model.layers.7.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.7.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
size mismatch for model.layers.8.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.8.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.8.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.8.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.8.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
size mismatch for model.layers.8.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.8.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
size mismatch for model.layers.9.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.9.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.9.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.9.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.9.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
size mismatch for model.layers.9.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.9.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
size mismatch for model.layers.10.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.10.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.10.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.10.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.10.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
size mismatch for model.layers.10.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.10.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
size mismatch for model.layers.11.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.11.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.11.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  97. size mismatch for model.layers.11.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  98. size mismatch for model.layers.11.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  99. size mismatch for model.layers.11.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  100. size mismatch for model.layers.11.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  101. size mismatch for model.layers.12.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  102. size mismatch for model.layers.12.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  103. size mismatch for model.layers.12.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  104. size mismatch for model.layers.12.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  105. size mismatch for model.layers.12.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  106. size mismatch for model.layers.12.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  107. size mismatch for model.layers.12.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  108. size mismatch for model.layers.13.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  109. size mismatch for model.layers.13.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  110. size mismatch for model.layers.13.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  111. size mismatch for model.layers.13.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  112. size mismatch for model.layers.13.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  113. size mismatch for model.layers.13.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  114. size mismatch for model.layers.13.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  115. size mismatch for model.layers.14.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  116. size mismatch for model.layers.14.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  117. size mismatch for model.layers.14.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  118. size mismatch for model.layers.14.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  119. size mismatch for model.layers.14.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  120. size mismatch for model.layers.14.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  121. size mismatch for model.layers.14.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  122. size mismatch for model.layers.15.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  123. size mismatch for model.layers.15.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  124. size mismatch for model.layers.15.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  125. size mismatch for model.layers.15.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  126. size mismatch for model.layers.15.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  127. size mismatch for model.layers.15.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  128. size mismatch for model.layers.15.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  129. size mismatch for model.layers.16.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  130. size mismatch for model.layers.16.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  131. size mismatch for model.layers.16.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  132. size mismatch for model.layers.16.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  133. size mismatch for model.layers.16.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  134. size mismatch for model.layers.16.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  135. size mismatch for model.layers.16.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  136. size mismatch for model.layers.17.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  137. size mismatch for model.layers.17.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  138. size mismatch for model.layers.17.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  139. size mismatch for model.layers.17.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  140. size mismatch for model.layers.17.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  141. size mismatch for model.layers.17.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  142. size mismatch for model.layers.17.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  143. size mismatch for model.layers.18.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  144. size mismatch for model.layers.18.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  145. size mismatch for model.layers.18.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  146. size mismatch for model.layers.18.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  147. size mismatch for model.layers.18.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  148. size mismatch for model.layers.18.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  149. size mismatch for model.layers.18.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  150. size mismatch for model.layers.19.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  151. size mismatch for model.layers.19.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  152. size mismatch for model.layers.19.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  153. size mismatch for model.layers.19.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  154. size mismatch for model.layers.19.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  155. size mismatch for model.layers.19.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  156. size mismatch for model.layers.19.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  157. size mismatch for model.layers.20.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  158. size mismatch for model.layers.20.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  159. size mismatch for model.layers.20.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  160. size mismatch for model.layers.20.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  161. size mismatch for model.layers.20.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  162. size mismatch for model.layers.20.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  163. size mismatch for model.layers.20.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  164. size mismatch for model.layers.21.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  165. size mismatch for model.layers.21.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  166. size mismatch for model.layers.21.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  167. size mismatch for model.layers.21.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  168. size mismatch for model.layers.21.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  169. size mismatch for model.layers.21.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  170. size mismatch for model.layers.21.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  171. size mismatch for model.layers.22.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  172. size mismatch for model.layers.22.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  173. size mismatch for model.layers.22.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  174. size mismatch for model.layers.22.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  175. size mismatch for model.layers.22.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  176. size mismatch for model.layers.22.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  177. size mismatch for model.layers.22.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  178. size mismatch for model.layers.23.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  179. size mismatch for model.layers.23.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  180. size mismatch for model.layers.23.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  181. size mismatch for model.layers.23.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  182. size mismatch for model.layers.23.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  183. size mismatch for model.layers.23.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  184. size mismatch for model.layers.23.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  185. size mismatch for model.layers.24.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  186. size mismatch for model.layers.24.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  187. size mismatch for model.layers.24.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  188. size mismatch for model.layers.24.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  189. size mismatch for model.layers.24.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  190. size mismatch for model.layers.24.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  191. size mismatch for model.layers.24.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  192. size mismatch for model.layers.25.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  193. size mismatch for model.layers.25.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  194. size mismatch for model.layers.25.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  195. size mismatch for model.layers.25.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  196. size mismatch for model.layers.25.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  197. size mismatch for model.layers.25.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  198. size mismatch for model.layers.25.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  199. size mismatch for model.layers.26.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  200. size mismatch for model.layers.26.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  201. size mismatch for model.layers.26.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  202. size mismatch for model.layers.26.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  203. size mismatch for model.layers.26.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  204. size mismatch for model.layers.26.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  205. size mismatch for model.layers.26.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  206. size mismatch for model.layers.27.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  207. size mismatch for model.layers.27.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  208. size mismatch for model.layers.27.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  209. size mismatch for model.layers.27.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  210. size mismatch for model.layers.27.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  211. size mismatch for model.layers.27.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  212. size mismatch for model.layers.27.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  213. size mismatch for model.layers.28.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  214. size mismatch for model.layers.28.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  215. size mismatch for model.layers.28.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  216. size mismatch for model.layers.28.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  217. size mismatch for model.layers.28.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  218. size mismatch for model.layers.28.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  219. size mismatch for model.layers.28.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  220. size mismatch for model.layers.29.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  221. size mismatch for model.layers.29.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  222. size mismatch for model.layers.29.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  223. size mismatch for model.layers.29.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  224. size mismatch for model.layers.29.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  225. size mismatch for model.layers.29.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  226. size mismatch for model.layers.29.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  227. size mismatch for model.layers.30.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  228. size mismatch for model.layers.30.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  229. size mismatch for model.layers.30.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  230. size mismatch for model.layers.30.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  231. size mismatch for model.layers.30.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  232. size mismatch for model.layers.30.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  233. size mismatch for model.layers.30.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  234. size mismatch for model.layers.31.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  235. size mismatch for model.layers.31.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  236. size mismatch for model.layers.31.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  237. size mismatch for model.layers.31.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  238. size mismatch for model.layers.31.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
  239. size mismatch for model.layers.31.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
  240. size mismatch for model.layers.31.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).