Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 5.34 out of 12.67 RAM for saving.
100%|██████████| 28/28 [01:37<00:00, 3.49s/it]
Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b...
Unsloth: Saving BramNH/gemma-7b-bnb-4bit-homeassistant-nl/pytorch_model-00001-of-00004.bin...
Unsloth: Saving BramNH/gemma-7b-bnb-4bit-homeassistant-nl/pytorch_model-00002-of-00004.bin...
Unsloth: Saving BramNH/gemma-7b-bnb-4bit-homeassistant-nl/pytorch_model-00003-of-00004.bin...
Unsloth: Saving BramNH/gemma-7b-bnb-4bit-homeassistant-nl/pytorch_model-00004-of-00004.bin...
Done.
==((====))==  Unsloth: Conversion from QLoRA to GGUF information
   \\   /|    [0] Installing llama.cpp will take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF 16bits will take 3 minutes.
\        /    [2] Converting GGUF 16bits to q4_k_m will take 20 minutes.
 "-____-"     In total, you will have to wait around 26 minutes.

Unsloth: [0] Installing llama.cpp. This will take 3 minutes...
Unsloth: [1] Converting model at BramNH/gemma-7b-bnb-4bit-homeassistant-nl into f16 GGUF format.
The output location will be ./BramNH/gemma-7b-bnb-4bit-homeassistant-nl-unsloth.F16.gguf
This will take 3 minutes...
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-27-20e4264a3018> in <cell line: 11>()
      9 # Save to q4_k_m GGUF
     10 if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
---> 11 if True: model.push_to_hub_gguf("BramNH/gemma-7b-bnb-4bit-homeassistant-nl", tokenizer, quantization_method = "q4_k_m", token = "")

1 frames
/usr/local/lib/python3.10/dist-packages/unsloth/save.py in save_to_gguf(model_type, model_directory, quantization_method, first_conversion, _run_installer)
    794     # Check if quantization succeeded!
    795     if not os.path.isfile(final_location):
--> 796         raise RuntimeError(
    797             f"Unsloth: Quantization failed for {final_location}\n"\
    798             "You might have to compile llama.cpp yourself, then run this again.\n"\

RuntimeError: Unsloth: Quantization failed for ./BramNH/gemma-7b-bnb-4bit-homeassistant-nl-unsloth.F16.gguf
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUBLAS=1 make all -j
Once that's done, redo the quantization.
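
Note: "redo the quantization" here just means re-running the same notebook cell that raised the error, once llama.cpp has been rebuilt in the working directory. A minimal sketch of that cell, assuming the `model` and `tokenizer` objects from the Unsloth notebook are still loaded and that the empty `token = ""` in the failing call is replaced with a real write-enabled Hugging Face token (HF_TOKEN below is a hypothetical placeholder):

# Re-run after llama.cpp has been compiled in the same folder.
# `model` / `tokenizer` are the objects already loaded in the notebook;
# HF_TOKEN is a hypothetical placeholder for a write-enabled HF token.
HF_TOKEN = ""

# Save the q4_k_m GGUF locally under ./model:
model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")

# Or push it straight to the Hub, as the failing cell did:
model.push_to_hub_gguf(
    "BramNH/gemma-7b-bnb-4bit-homeassistant-nl",
    tokenizer,
    quantization_method = "q4_k_m",
    token = HF_TOKEN,
)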