Running into CUDA out of memory on Colab #3
Comments
Has anyone found a solution to this? Same here.
I am still unable to merge the model because I am getting the same error as @smilinrobin. I have Colab Pro and am running on a V100. I have tried the recommendations that are readily available on the Internet, such as setting max_split_size_mb to 250, and none of them have made much difference. Can we make a team effort to improve this situation? If you have llm-trainer working on Colab, can you please share your configuration?
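For reference, here is a minimal sketch of how that allocator option is usually applied in a Colab cell, assuming it runs before torch initializes CUDA (the value 250 is just the one mentioned above, not a recommendation from the notebook):

```python
# Sketch: set the PyTorch CUDA allocator option before torch touches the GPU.
# max_split_size_mb:250 is the value mentioned above; it only mitigates
# fragmentation and cannot create VRAM the runtime does not have.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:250"

import torch
print(torch.cuda.get_device_name(0))
```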
Yes, I am having the same issue as others at the Merge section; it gets to about 50% before running out of memory, even though I am using a V100 GPU on Colab Pro. I hope this gets fixed soon, as this is a very useful project.
There is a comment in the code about needing authorization by Meta and passing a token to Hugging Face.
QUESTION 1: Does this mean that the default code assumes you have HF/Meta access to 2-7b-chat-hf? And if you don't have it, does that explain the memory error? QUESTION 2: Has anyone plugged anything else in here with success? If so, what?
UPDATE: I went through the sign-up and have verified that I have access to Meta, to NousResearch, and to Hugging Face, and am successfully passing the login. I also set max_split_size_mb to 32 and experimented with various values of memory fraction. I upgraded to Google Colab Pro+. No benefit, since they still only issue one 16GB GPU. At this point I have to conclude that the model can't be merged on standard Google Colab Pro accounts, unless I am missing something. Can anyone prove me wrong?
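For anyone trying to confirm that 16GB limit on their own runtime, a small check like this (a sketch, not part of the notebook) shows what Colab actually assigned:

```python
# Sketch: report the GPU Colab assigned and its free/total VRAM.
import torch

free, total = torch.cuda.mem_get_info()
print(torch.cuda.get_device_name(0))
print(f"free: {free / 1e9:.2f} GB / total: {total / 1e9:.2f} GB")
```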
Same conclusion here: I got access to Llama 2 from Meta and am still getting the "CUDA out of memory" error.
Yes, the issue continues; the GPT-3.5 llm-trainer does work well on Colab, though. But I would love to have a model that doesn't incur such outrageous costs once it is spun into a production environment.
Commenting to add that this problem still exists as of Jan. 31, 2024. I've managed to plug in CodeLlama-7b-Instruct-hf using gpt-3.5-turbo-1106. All works well right up to the point where we "Merge the model and store in Google Drive". At that point I receive the same error as above ("CUDA out of memory"). I'm only running this on a T4, but given every other comment in this thread, the problem seems to run deeper than allocated VRAM. This is a great project, and I would love to bump this thing forward. Any advice is much appreciated.

EDIT: I wanted to update my previous comment to say that I was able to move past this issue by increasing my GPU allocation inside Google Colab. With the T4 and V100 runtimes, I was limited to 16GB of VRAM. By switching to an A100 with 40GB of VRAM, I was able to complete my training operation, which maxed out at around 20GB. It's worth noting that the availability of the A100 is very limited right now. It took 4 hours of trying before I was finally able to use that runtime.
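If an A100 is not available, one commonly suggested alternative is to do the merge on the CPU so the 16GB GPU is never asked to hold the FP16 base model. This is a sketch under assumptions, not the notebook's code; `model_name` and `adapter_dir` are placeholders for whatever values the notebook uses:

```python
# Sketch: merge the LoRA adapter on the CPU instead of the GPU.
# model_name and adapter_dir are placeholders for the notebook's values.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map={"": "cpu"},  # keep the FP16 weights in system RAM
)
merged = PeftModel.from_pretrained(base_model, adapter_dir).merge_and_unload()
merged.save_pretrained("/content/drive/MyDrive/merged-model")
```

This trades GPU memory for system RAM and time, so a 7B model in FP16 (roughly 14GB of weights) may still need a high-RAM runtime.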
Hello @mshumer. I am trying to run the code on Colab and am running into a CUDA out-of-memory error, as below:
```
OutOfMemoryError                          Traceback (most recent call last)
in <cell line: 14>()
     12
     13 # Reload model in FP16 and merge it with LoRA weights
---> 14 base_model = AutoModelForCausalLM.from_pretrained(
     15     model_name,
     16     low_cpu_mem_usage=True,

4 frames
/usr/local/lib/python3.10/dist-packages/accelerate/utils/modeling.py in set_module_tensor_to_device(module, tensor_name, device, value, dtype, fp16_statistics)
    296         module._parameters[tensor_name] = param_cls(new_value, requires_grad=old_value.requires_grad)
    297     elif isinstance(value, torch.Tensor):
--> 298         new_value = value.to(device)
    299     else:
    300         new_value = torch.tensor(value, device=device)

OutOfMemoryError: CUDA out of memory. Tried to allocate 250.00 MiB (GPU 0; 14.75 GiB total capacity; 13.52 GiB already allocated; 48.81 MiB free; 13.63 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
It's happening at the "Merge the model and store in Google Drive" step.
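One thing worth trying, given that 13.52 GiB is already allocated before the merge cell runs: explicitly free the fine-tuned model and trainer first. This is a hedged sketch; the variable names `model` and `trainer` are assumptions about the notebook's namespace:

```python
# Sketch: release the training objects before reloading the base model in FP16.
# `model` and `trainer` are assumed names from earlier cells in the notebook.
import gc
import torch

del model, trainer        # drop references to the fine-tuned model and trainer
gc.collect()              # let Python reclaim the objects
torch.cuda.empty_cache()  # return cached blocks to the CUDA driver
print(f"{torch.cuda.memory_allocated() / 1e9:.2f} GB still allocated")
```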