- Unsloth: Merging 4bit and LoRA weights to 16bit...
- Unsloth: Will use up to 5.34 out of 12.67 RAM for saving.
- 100%|██████████| 28/28 [01:37<00:00, 3.49s/it]
- Unsloth: Saving tokenizer... Done.
- Unsloth: Saving model... This might take 5 minutes for Llama-7b...
- Unsloth: Saving BramNH/gemma-7b-bnb-4bit-homeassistant-nl/pytorch_model-00001-of-00004.bin...
- Unsloth: Saving BramNH/gemma-7b-bnb-4bit-homeassistant-nl/pytorch_model-00002-of-00004.bin...
- Unsloth: Saving BramNH/gemma-7b-bnb-4bit-homeassistant-nl/pytorch_model-00003-of-00004.bin...
- Unsloth: Saving BramNH/gemma-7b-bnb-4bit-homeassistant-nl/pytorch_model-00004-of-00004.bin...
- Done.
- ==((====))==  Unsloth: Conversion from QLoRA to GGUF information
-    \\   /|    [0] Installing llama.cpp will take 3 minutes.
- O^O/ \_/ \    [1] Converting HF to GGUF 16bits will take 3 minutes.
- \        /    [2] Converting GGUF 16bits to q4_k_m will take 20 minutes.
-  "-____-"     In total, you will have to wait around 26 minutes.
- Unsloth: [0] Installing llama.cpp. This will take 3 minutes...
- Unsloth: [1] Converting model at BramNH/gemma-7b-bnb-4bit-homeassistant-nl into f16 GGUF format.
- The output location will be ./BramNH/gemma-7b-bnb-4bit-homeassistant-nl-unsloth.F16.gguf
- This will take 3 minutes...
- ---------------------------------------------------------------------------
- RuntimeError Traceback (most recent call last)
- <ipython-input-27-20e4264a3018> in <cell line: 11>()
- 9 # Save to q4_k_m GGUF
- 10 if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
- ---> 11 if True: model.push_to_hub_gguf("BramNH/gemma-7b-bnb-4bit-homeassistant-nl", tokenizer, quantization_method = "q4_k_m", token = "")
- 1 frames
- /usr/local/lib/python3.10/dist-packages/unsloth/save.py in save_to_gguf(model_type, model_directory, quantization_method, first_conversion, _run_installer)
- 794 # Check if quantization succeeded!
- 795 if not os.path.isfile(final_location):
- --> 796 raise RuntimeError(
- 797 f"Unsloth: Quantization failed for {final_location}\n"\
- 798 "You might have to compile llama.cpp yourself, then run this again.\n"\
- RuntimeError: Unsloth: Quantization failed for ./BramNH/gemma-7b-bnb-4bit-homeassistant-nl-unsloth.F16.gguf
- You might have to compile llama.cpp yourself, then run this again.
- You do not need to close this Python program. Run the following commands in a new terminal:
- You must run this in the same folder as you're saving your model.
- git clone https://github.com/ggerganov/llama.cpp
- cd llama.cpp && make clean && LLAMA_CUBLAS=1 make all -j
- Once that's done, redo the quantization.
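If the rebuild succeeds but the error persists, the two conversion steps can also be redone by hand. A minimal sketch, assuming the merged 16-bit model was saved to ./BramNH/gemma-7b-bnb-4bit-homeassistant-nl (the path from the log above) and the convert-hf-to-gguf.py script and quantize binary that llama.cpp shipped at the time (newer builds rename quantize to llama-quantize):

# [1] HF shards -> F16 GGUF; output path taken from the log above
python llama.cpp/convert-hf-to-gguf.py BramNH/gemma-7b-bnb-4bit-homeassistant-nl --outfile BramNH/gemma-7b-bnb-4bit-homeassistant-nl-unsloth.F16.gguf --outtype f16
# [2] F16 GGUF -> q4_k_m GGUF
./llama.cpp/quantize BramNH/gemma-7b-bnb-4bit-homeassistant-nl-unsloth.F16.gguf BramNH/gemma-7b-bnb-4bit-homeassistant-nl-unsloth.Q4_K_M.gguf q4_k_m

The resulting .gguf file can then be uploaded to the Hub manually (e.g. with huggingface-cli upload) instead of re-running push_to_hub_gguf.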