
CUDA error upon attempting to change the loaded model while using HF 4bit #425

Open
Alephrin opened this issue Jul 24, 2023 · 1 comment

@Alephrin

Attempting to load a new model after the first while using HF 4bit results in a CUDA error:

ERROR      | modeling.inference_models.hf_torch:_get_model:402 - Lazyloader failed, falling back to stock HF load. You may run out of RAM here.
ERROR      | modeling.inference_models.hf_torch:_get_model:403 - CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

ERROR      | modeling.inference_models.hf_torch:_get_model:404 - Traceback (most recent call last):
  File "/home/***/AI/KoboldAI/modeling/inference_models/hf_torch.py", line 392, in _get_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/***/AI/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/hf_bleeding_edge/__init__.py", line 59, in from_pretrained
    return AM.from_pretrained(path, *args, **kwargs)
  File "/home/***/AI/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
    return model_class.from_pretrained(
  File "/home/***/AI/KoboldAI/modeling/patches.py", line 92, in new_from_pretrained
    return old_from_pretrained(
  File "/home/***/AI/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/***/AI/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3260, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/home/***/AI/KoboldAI/modeling/patches.py", line 302, in _load_state_dict_into_meta_model
    set_module_quantized_tensor_to_device(
  File "/home/***/AI/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/transformers/utils/bitsandbytes.py", line 109, in set_module_quantized_tensor_to_device
    new_value = value.to(device)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

If I launch with CUDA_LAUNCH_BLOCKING=1 instead, loading the second model simply hangs (no error) and never finishes.
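For reference, setting the variable only for a single launch (rather than exporting it globally) looks like this; the `play.sh` launcher name is an assumption based on KoboldAI's usual Linux startup script:

```shell
# Run CUDA kernels synchronously so the failing call is reported at its
# true location in the stack trace, not at a later API call.
CUDA_LAUNCH_BLOCKING=1 ./play.sh
```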

@henk717 (Owner) commented Jul 25, 2023

This is a known issue we are still trying to solve; restarting KoboldAI is currently the best workaround until we figure out a solution.
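Until a proper fix lands, one common pattern for this class of error is to drop every reference to the old model and reclaim GPU memory before loading the next one. The sketch below is a hypothetical helper, not KoboldAI's actual unload path, and is not confirmed to resolve this particular bitsandbytes issue:

```python
import gc


def unload_model(model):
    """Best-effort teardown of a loaded model before loading another.

    Hypothetical helper: drops the Python reference, forces garbage
    collection, and asks PyTorch to release cached CUDA memory.
    """
    del model
    gc.collect()
    try:
        import torch
        if torch.cuda.is_available():
            # Release cached blocks back to the driver and wait for all
            # pending kernels, so stale allocations from the old model
            # cannot collide with the next load.
            torch.cuda.empty_cache()
            torch.cuda.synchronize()
    except ImportError:
        pass  # torch not installed; nothing GPU-side to clean up
```

Even with a teardown like this, state held inside quantized modules or C extensions may survive, which is why a full process restart is the reliable workaround.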
