CUDA out of memory #161

EddyPianist opened this issue Dec 13, 2024 · 11 comments

@EddyPianist

Hi there,

I'm encountering a CUDA out of memory error while fine-tuning the stable-audio-open model, even with the batch size set to 1. I'm using an NVIDIA A10 GPU with 24 GB of memory, which I believe should be sufficient for this task.

I’m wondering if anyone else has encountered a similar issue and how you managed to resolve it. Thanks!
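Two generic levers that often decide whether a fine-tune fits in 24 GB are mixed precision and activation (gradient) checkpointing. Whether stable-audio-tools exposes these through its configs is a separate question, so the sketch below is plain PyTorch with a tiny stand-in model, not the repo's actual transformer or API:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Tiny stand-in model (the real stable-audio-open transformer is far larger).
model = nn.Sequential(*[nn.Sequential(nn.Linear(256, 256), nn.GELU())
                        for _ in range(8)])

x = torch.randn(2, 256, requires_grad=True)

# Activation checkpointing: store only segment-boundary activations during the
# forward pass and recompute the rest in backward, trading extra compute for a
# large cut in activation memory.
out = checkpoint_sequential(model, 4, x, use_reentrant=False)
out.sum().backward()  # gradients flow through the checkpointed segments
```

On a GPU you would typically combine this with `torch.autocast` mixed precision; neither change helps if the model weights plus optimizer state alone already exceed the card.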

@DarkAlchy

Can't be trained on 24GB as the code is just highly unoptimized to allow it. 2xA6000 will do it with ease. One can, maybe.

@DarkAlchy

Let me add that there is this PR to allow it - #162

@lyramakesmusic
Contributor

> Can't be trained on 24GB as the code is just highly unoptimized to allow it. 2xA6000 will do it with ease. One can, maybe.

One can for sure. VRAM use is ~27-30 GB depending on batch size and sample length (batch size 1 at 10 s was 27.6 GB; batch size 4 at 47 s was about 30 GB), and a single A6000 has 48 GB.

@DarkAlchy

Excuse you, but you just said what I said: 24GB can't train it, which is why the PR exists, so we can. Why disagree with me when the rest of your comment confirms what I was saying? Too much eggnog, perchance?

@lyramakesmusic
Contributor

> Why are you disagreeing with me

One A6000. I'm confirming, not disagreeing.

@DarkAlchy

My point was not about the A6000; it was that 24GB can't train it, and then you rolled in and said "one can." It was just the way you phrased it, leaving out that "one A6000 can." No biggie, but do realize each batch item is a song, so the larger the batch size you can give it, the better it learns for generality. Hence 2xA6000 being the sweet spot; anything past that is overkill. Even one H100 at 80GB would be enough.

@Frei-2

Frei-2 commented May 6, 2025

Hello, can the model be fine-tuned using 2 NVIDIA GPUs of 24GB each? @EddyPianist @DarkAlchy @lyramakesmusic

@DarkAlchy

That would be 48GB in total, and you would need Linux and Accelerate to link them.
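For reference, a typical two-GPU launch with Hugging Face Accelerate looks roughly like this. This is a command sketch only: stable-audio-tools' own trainer is Lightning-based, and the script name and flags here are placeholders, not the repo's actual entry point.

```shell
# One-time interactive setup: answer "multi-GPU", 2 processes, etc.
accelerate config

# Launch the training script across both GPUs.
# "train.py --your-args" stands in for the repo's real entry point and flags.
accelerate launch --num_processes 2 train.py --your-args
```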

@Frei-2

Frei-2 commented May 7, 2025

> That would be 48GB in total, and you would need Linux and Accelerate to link them.

Thank you for your answer, I'll try Accelerate!

  • I ran the fine-tuning command on a Linux server with 2 NVIDIA GPUs of 24GB each, but still hit the CUDA Out of Memory error. The model does not appear to be distributed across the 2 GPUs; instead a full copy is placed on each GPU, and the log is almost the same as when using a single 24GB GPU.
  • I then tried changing the 'strategy' config from auto (the default value) to ddp_find_unused_parameters_true in default.ini to achieve distributed training, but the same error occurred.

I wonder whether the code supports fine-tuning the model on 2 GPUs of 24GB each, and whether flash attention would help. Any tips or advice on this problem would be highly appreciated.
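One detail worth noting: DDP is data parallelism, so each GPU holds a complete copy of the model, and two 24 GB cards do not pool into 48 GB; that matches the "copy on each GPU" behaviour in the log above. Memory only pools under strategies that shard model and optimizer state, such as FSDP or DeepSpeed. Whether the repo's 'strategy' option accepts those values depends on its PyTorch Lightning version, so treat this as a hypothetical config sketch, not a verified fix:

```ini
; default.ini (hypothetical sketch: the option name comes from this thread,
; the value is a standard Lightning strategy string; verify against your version)
strategy = fsdp
```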

@DarkAlchy

I never had more than one card, and it was a nightmare. Supposedly things have drastically changed. Wish I could help with specifics.

@Taikakim

Taikakim commented May 7, 2025

There was discussion about this on the Discord recently; I was wondering the same. SAT uses data parallelism, not model parallelism: https://discord.com/channels/1001555636569509948/1162090696220606525/1355825741589118976
