
Running into CUDA out of memory on Colab #3

Open
smilinrobin opened this issue Aug 15, 2023 · 8 comments
@smilinrobin

Hello @mshumer. I am trying to run the code on Colab and am running into a CUDA out of memory error, as below:
OutOfMemoryError Traceback (most recent call last)
in <cell line: 14>()
12
13 # Reload model in FP16 and merge it with LoRA weights
---> 14 base_model = AutoModelForCausalLM.from_pretrained(
15 model_name,
16 low_cpu_mem_usage=True,

4 frames
/usr/local/lib/python3.10/dist-packages/accelerate/utils/modeling.py in set_module_tensor_to_device(module, tensor_name, device, value, dtype, fp16_statistics)
296 module._parameters[tensor_name] = param_cls(new_value, requires_grad=old_value.requires_grad)
297 elif isinstance(value, torch.Tensor):
--> 298 new_value = value.to(device)
299 else:
300 new_value = torch.tensor(value, device=device)

OutOfMemoryError: CUDA out of memory. Tried to allocate 250.00 MiB (GPU 0; 14.75 GiB total capacity; 13.52 GiB already allocated; 48.81 MiB free; 13.63 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

It's happening at the "Merge the model and store in Google Drive" step.
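
One workaround that is often suggested for this step is to perform the merge on the CPU rather than the GPU, since the FP16 base model plus the LoRA weights do not fit in 16 GB of VRAM. A minimal sketch, assuming a trained adapter saved in a hypothetical adapter_dir and a runtime with enough system RAM to hold the FP16 weights:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

model_name = "NousResearch/llama-2-7b-chat-hf"
adapter_dir = "final_checkpoint"  # hypothetical path to the trained LoRA adapter

# Reload the FP16 base model entirely into system RAM -- no CUDA allocation happens here.
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map={"": "cpu"},
)

# Attach the LoRA adapter and fold its weights into the base model.
model = PeftModel.from_pretrained(base_model, adapter_dir)
model = model.merge_and_unload()

# Save the merged model (e.g. to a mounted Google Drive folder).
model.save_pretrained("merged_model")
AutoTokenizer.from_pretrained(model_name).save_pretrained("merged_model")
```

Whether this fits depends on the runtime's system RAM rather than its VRAM; a 7B model in FP16 needs roughly 13-14 GB of CPU memory for the merge.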

@fredzannarbor

Has anyone found a solution to this? Same here.

@fredzannarbor

I am still unable to merge the model because I am getting the same error as @smilinrobin. I have Colab Pro and am running on a V100. I have implemented the recommendations that are readily available on the Internet, such as setting max_split_size_mb to 250, and none of them have made much difference.

Can we have a team effort to improve this situation? If you have llm-trainer working on Colab, can you please share your configuration?
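
For reference, the max_split_size_mb hint has to be set through the PYTORCH_CUDA_ALLOC_CONF environment variable before any CUDA memory is allocated, so it belongs at the very top of the notebook (or requires a runtime restart). A minimal sketch; note that it only reduces fragmentation and cannot make a model fit that is simply too large for the card:

```python
import os

# Must be set before torch allocates any CUDA memory (restart the runtime if it already has).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported only after the environment variable is in place
```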

@KabaTubare

Yes, I am having the same issue as others at the merge section; it gets to about 50% before running out of memory (even though I am using a V100 GPU on Colab Pro). Hope this gets fixed soon, as this is a very useful project.

@fredzannarbor

fredzannarbor commented Aug 19, 2023

There is a comment in the code about needing authorization by Meta and passing a token to Hugging Face.

model_name = "NousResearch/llama-2-7b-chat-hf" # use this if you have access to the official LLaMA 2 model "meta-llama/Llama-2-7b-chat-hf", though keep in mind you'll need to pass a Hugging Face key argument

QUESTION 1: Does this mean that the default code assumes that you have HF/Meta access to 2-7b-chat-hf? And if you don't have it, does that explain the memory error?

QUESTION 2: Has anyone plugged anything else in here with success? If so, what?
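
On QUESTION 1: the gating only affects whether the weights can be downloaded at all; a missing token would produce an authorization error at download time, not a CUDA OOM, and the NousResearch mirror needs no token. For completeness, passing a token for the official gated checkpoint typically looks something like this (a sketch; the keyword is token on recent transformers versions, use_auth_token on older ones):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    token="hf_...",  # placeholder; or call huggingface_hub.login() once instead
)
```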

@fredzannarbor

fredzannarbor commented Aug 20, 2023

UPDATE: I went through the sign-up and have verified that I have access to Meta, to NousResearch, to Hugging Face, and am successfully passing the login.

I also set max_split_size_mb to 32 and experimented with various values of memory fraction.

I updated to Google Colab Pro+. No benefit, since they still only issue one 16GB GPU.

At this point I have to conclude that the model can't be run on standard Google Colab Pro accounts, unless I am missing something. Can anyone prove me wrong?

@arielshaulov


Same conclusion here: I got access to Llama 2 from Meta and am still getting the "CUDA out of memory" error.
If anyone has plugged in anything other than Llama 2 with success, it would be very helpful to know.

@KabaTubare

Yes, the issue continues. The GPT-3.5 llm-trainer does work well on Colab, though. But I would love to have a model that doesn't incur such outrageous costs once it is spun into a production environment.

@alex-bluetec

alex-bluetec commented Jan 31, 2024

Commenting to add that this problem still exists as of Jan. 31, 2024.

I've managed to plug in CodeLlama-7b-Instruct-hf using gpt-3.5-turbo-1106. All works well right up to the point where we "Merge the model and store in Google Drive". At that point I receive the same error as above ("CUDA out of memory"). I'm only running this on a T4, but given every other comment in this thread, the problem seems to run deeper than allocated VRAM.

This is a great project, and I would love to bump this thing forward. Any advice is much appreciated.

EDIT: I wanted to update my previous comment to say that I was able to move past this issue by increasing my GPU allocation inside Google Colab. With the T4 and V100 runtimes, I was getting limited to 16GB of VRAM. By switching to an A100 with 40GB of VRAM, I was able to complete my training operation, which maxed out at around 20GB.

It's worth noting that availability of the A100 is very limited right now. It took 4 hours of trying before I was finally able to use that runtime.
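
A quick way to confirm which GPU a Colab session actually assigned (and whether it has the roughly 20 GB of headroom mentioned above) before starting the merge, as a small sketch:

```python
import torch

props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB of VRAM")
```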
