
llama.cpp GGUF breaks [FIXED] #1376

Open
danielhanchen opened this issue Dec 4, 2024 · 5 comments
Labels: fixed, URGENT BUG

Comments

@danielhanchen
Contributor

danielhanchen commented Dec 4, 2024

As of 3rd December 2024, this is fixed.

Please update Unsloth via

pip install --upgrade --no-deps --no-cache-dir unsloth
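
To confirm the upgrade took effect, a quick version check (a sketch using only the standard library; it assumes unsloth is pip-installed):

import importlib.metadata
# Prints the installed unsloth version so you can verify the upgrade applied.
print(importlib.metadata.version("unsloth"))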
danielhanchen changed the title from llama.cpp GGUF breaks to llama.cpp GGUF breaks [FIXED] on Dec 4, 2024
danielhanchen pinned this issue on Dec 4, 2024
danielhanchen added the URGENT BUG and fixed labels on Dec 4, 2024
@criogennn

RuntimeError                              Traceback (most recent call last)
Cell In[13], line 12
      9 if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")
     11 # Save to q4_k_m GGUF
---> 12 if True: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
     13 if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")
     15 # Save to multiple GGUF options - much faster if you want multiple!

File ~/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/unsloth/save.py:1683, in unsloth_save_pretrained_gguf(self, save_directory, tokenizer, quantization_method, first_conversion, push_to_hub, token, private, is_main_process, state_dict, save_function, max_shard_size, safe_serialization, variant, save_peft_format, tags, temporary_location, maximum_memory_usage)
   1681 python_install = install_python_non_blocking(["gguf", "protobuf"])
   1682 git_clone.wait()
-> 1683 makefile = install_llama_cpp_make_non_blocking()
   1684 new_save_directory, old_username = unsloth_save_model(**arguments)
   1685 python_install.wait()

File ~/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/unsloth/save.py:778, in install_llama_cpp_make_non_blocking()
    776 check = os.system("cmake llama.cpp -B llama.cpp/build -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=OFF -DLLAMA_CURL=ON")
    777 if check != 0:
--> 778     raise RuntimeError(f"*** Unsloth: Failed compiling llama.cpp using os.system(...) with error {check}. Please report this ASAP!")
    779 pass
    780 # f"cmake --build llama.cpp/build --config Release -j{psutil.cpu_count()*2} --clean-first --target {' '.join(LLAMA_CPP_TARGETS)}",

RuntimeError: *** Unsloth: Failed compiling llama.cpp using os.system(...) with error 32512. Please report this ASAP!


The error occurs on:

if True: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")

Model: unsloth/Llama-3.2-3B-Instruct
GPU: RTX 3050 8GB
PyTorch: 2.5.1
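
For what it's worth, os.system returns the raw process wait status on Unix, not the exit code; the exit code sits in the high byte. Here 32512 >> 8 == 127, which is the shell's "command not found" code, so this particular failure means cmake was not on PATH. A minimal sketch of decoding it (Python 3.9+, Unix):

import os
# os.system returns the wait status; decode it to the real exit code.
check = os.system("cmake --version")
exit_code = os.waitstatus_to_exitcode(check)
print(exit_code)  # 127 here would mean the shell could not find cmake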

@criogennn

I installed cmake and now get this one:

RuntimeError                              Traceback (most recent call last)
File ~/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/unsloth/save.py:1689, in unsloth_save_pretrained_gguf(self, save_directory, tokenizer, quantization_method, first_conversion, push_to_hub, token, private, is_main_process, state_dict, save_function, max_shard_size, safe_serialization, variant, save_peft_format, tags, temporary_location, maximum_memory_usage)
   1688 try:
-> 1689     new_save_directory, old_username = unsloth_save_model(**arguments)
   1690     makefile = None

File ~/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    115 with ctx_factory():
--> 116     return func(*args, **kwargs)

File ~/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/unsloth/save.py:714, in unsloth_save_model(model, tokenizer, save_directory, save_method, push_to_hub, token, is_main_process, state_dict, save_function, max_shard_size, safe_serialization, variant, save_peft_format, use_temp_dir, commit_message, private, create_pr, revision, commit_description, tags, temporary_location, maximum_memory_usage)
    713 else:
--> 714     internal_model.save_pretrained(**save_pretrained_settings)
    715 pass

File ~/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/transformers/modeling_utils.py:2938, in PreTrainedModel.save_pretrained(self, save_directory, is_main_process, state_dict, save_function, push_to_hub, max_shard_size, safe_serialization, variant, token, save_peft_format, **kwargs)
   2937 for name in disjoint_names:
-> 2938     state_dict[name] = state_dict[name].clone()
   2940 # When not all duplicates have been cleaned, still remove those keys, but put a clear warning.
   2941 # If the link between tensors was done at runtime then `from_pretrained` will not get
   2942 # the key back leading to random tensor. A proper warning will be shown
   2943 # during reload (if applicable), but since the file is not necessarily compatible with
   2944 # the config, better show a proper warning.
...
--> 778     raise RuntimeError(f"*** Unsloth: Failed compiling llama.cpp using os.system(...) with error {check}. Please report this ASAP!")
    779 pass
    780 # f"cmake --build llama.cpp/build --config Release -j{psutil.cpu_count()*2} --clean-first --target {' '.join(LLAMA_CPP_TARGETS)}",

RuntimeError: *** Unsloth: Failed compiling llama.cpp using os.system(...) with error 256. Please report this ASAP!
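
Error 256 decodes to exit code 1, i.e. cmake was found but the configure step itself failed. A hedged way to surface the real error is to rerun the same command Unsloth issues (taken verbatim from save.py:776 in the traceback above) with captured output; -DLLAMA_CURL=ON is one known failure point at configure time when libcurl development headers are missing:

import subprocess
# Re-run the configure command from unsloth/save.py:776 so cmake's
# actual error message is visible instead of just the exit status.
# Assumes llama.cpp has already been cloned into the working directory
# (Unsloth does this before the configure step).
result = subprocess.run(
    ["cmake", "llama.cpp", "-B", "llama.cpp/build",
     "-DBUILD_SHARED_LIBS=OFF", "-DGGML_CUDA=OFF", "-DLLAMA_CURL=ON"],
    capture_output=True, text=True,
)
print(result.returncode)
print(result.stderr)  # e.g. a "Could NOT find CURL" message if libcurl headers are absent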

@criogennn

Before updating unsloth, I encountered the error "CUDA driver error: out of memory". After updating yesterday, the error changed to the one described above. This issue arises when I attempt to save a model in GGUF 4-bit format to run it later in Ollama. Is the 8GB memory of my RTX 3050 insufficient for this task? Training the model completed successfully; the problem occurs specifically during the saving process. I would greatly appreciate any advice or assistance.
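
One hedged thing to try on the memory side: the unsloth_save_pretrained_gguf signature in the tracebacks above exposes a maximum_memory_usage parameter, and lowering it may reduce the GPU memory spike during saving. A sketch only (the default value is not verified here):

# Sketch: maximum_memory_usage appears in the save signature above;
# a lower fraction caps how much GPU memory the save step tries to use.
model.save_pretrained_gguf(
    "model", tokenizer,
    quantization_method = "q4_k_m",
    maximum_memory_usage = 0.5,
)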

@shimmyshimmer
Collaborator

> Before updating unsloth, I encountered the error "CUDA driver error: out of memory". After updating yesterday, the error changed to the one described above. This issue arises when I attempt to save a model in GGUF 4-bit format to run it later in Ollama. Is the 8GB memory of my RTX 3050 insufficient for this task? Training the model completed successfully; the problem occurs specifically during the saving process. I would greatly appreciate any advice or assistance.

We're working on making the process easier.

You updated unsloth AND unsloth-zoo, correct? https://docs.unsloth.ai/get-started/install-update/updating
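
For reference, the upgrade command from the original post, extended to cover both packages (the exact form in the linked docs may differ):

pip install --upgrade --no-deps --no-cache-dir unsloth unsloth-zoo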

@Eleanorkong

Hi, how do I update through conda?
