CUDA out of memory #161

EddyPianist opened this issue Dec 13, 2024 · 11 comments

@EddyPianist

Hi there,

I'm encountering a CUDA out of memory error while fine-tuning the stable-audio-open model, even with the batch size set to 1. I'm using an NVIDIA A10 GPU with 24 GB of memory, which I believe should be sufficient for this task.

I’m wondering if anyone else has encountered a similar issue and how you managed to resolve it. Thanks!
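Two generic levers that often decide whether a fine-tune fits in 24 GB are mixed precision and activation (gradient) checkpointing. Whether stable-audio-tools exposes these through its configs is a separate question, so the sketch below is plain PyTorch with a tiny stand-in model, not the repo's actual transformer or API:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Tiny stand-in model (the real stable-audio-open transformer is far larger).
model = nn.Sequential(*[nn.Sequential(nn.Linear(256, 256), nn.GELU())
                        for _ in range(8)])

x = torch.randn(2, 256, requires_grad=True)

# Activation checkpointing: store only segment-boundary activations during the
# forward pass and recompute the rest in backward, trading extra compute for a
# large cut in activation memory.
out = checkpoint_sequential(model, 4, x, use_reentrant=False)
out.sum().backward()  # gradients flow through the checkpointed segments
```

On a GPU you would typically combine this with `torch.autocast` mixed precision; neither change helps if the model weights plus optimizer state alone already exceed the card.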

@DarkAlchy

Can't be trained on 24GB as the code is just highly unoptimized to allow it. 2xA6000 will do it with ease. One can, maybe.

@DarkAlchy

Let me add that there is this PR to allow it - #162

@lyramakesmusic
Contributor

> Can't be trained on 24GB as the code is just highly unoptimized to allow it. 2xA6000 will do it with ease. One can, maybe.

One can for sure. VRAM use is ~27-30 GB depending on batch size and sample length (batch size 1 at 10 s was 27.6 GB; batch size 4 at 47 s was about 30 GB), and a single A6000 has 48 GB.

@DarkAlchy

Excuse you, but you just said what I said: 24GB can't train it, which is why the PR exists, so we can. Why disagree with me when the rest of your comment confirms what I was saying? Too much eggnog, perchance?

@lyramakesmusic
Contributor

> Why are you disagreeing with me

One A6000. I'm confirming, not disagreeing.

@DarkAlchy

My point was not about the A6000; it was that 24GB can't train it, and then you rolled in and said "one can." It was just the way you phrased it, leaving out that "one A6000 can." No biggie, but do realize each batch item is a song, so the larger the batch size you can give it, the better it learns for generality. Hence 2xA6000 being the sweet spot; anything past that is overkill. Even one H100 at 80GB would be enough.

@Frei-2

Frei-2 commented May 6, 2025

Hello, can the model be fine-tuned using 2 NVIDIA GPUs of 24GB each? @EddyPianist @DarkAlchy @lyramakesmusic

@DarkAlchy

That would be 48GB in total, and you would need Linux and Accelerate to link them.
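For reference, a typical two-GPU launch with Hugging Face Accelerate looks roughly like this. This is a command sketch only: stable-audio-tools' own trainer is Lightning-based, and the script name and flags here are placeholders, not the repo's actual entry point.

```shell
# One-time interactive setup: answer "multi-GPU", 2 processes, etc.
accelerate config

# Launch the training script across both GPUs.
# "train.py --your-args" stands in for the repo's real entry point and flags.
accelerate launch --num_processes 2 train.py --your-args
```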

@Frei-2

Frei-2 commented May 7, 2025

> That would be 48GB in total, and you would need Linux and Accelerate to link them.

Thank you for your answer, I'll try Accelerate!

  • I ran the fine-tuning command on a Linux server with 2 NVIDIA GPUs of 24GB each, but still hit the CUDA Out of Memory error. The model does not appear to be distributed across the 2 GPUs; instead a full copy is placed on each GPU, and the log is almost the same as when using a single 24GB GPU.
  • I then tried changing the 'strategy' config from auto (the default value) to ddp_find_unused_parameters_true in default.ini to achieve distributed training, but the same error occurred.

I wonder whether the code supports fine-tuning the model on 2 GPUs of 24GB each, and whether flash attention would help. Any tips or advice on this problem would be highly appreciated.
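One detail worth noting: DDP is data parallelism, so each GPU holds a complete copy of the model, and two 24 GB cards do not pool into 48 GB; that matches the "copy on each GPU" behaviour in the log above. Memory only pools under strategies that shard model and optimizer state, such as FSDP or DeepSpeed. Whether the repo's 'strategy' option accepts those values depends on its PyTorch Lightning version, so treat this as a hypothetical config sketch, not a verified fix:

```ini
; default.ini (hypothetical sketch: the option name comes from this thread,
; the value is a standard Lightning strategy string; verify against your version)
strategy = fsdp
```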

@DarkAlchy

I never had more than one card, and it was a nightmare. Supposedly things have drastically changed. Wish I could help with specifics.

@Taikakim

Taikakim commented May 7, 2025

There was discussion about this on the Discord recently; I was wondering the same. SAT uses data parallelism, not model parallelism: https://discord.com/channels/1001555636569509948/1162090696220606525/1355825741589118976
