CUDA Memory Issues #227
Replies: 2 comments
-
I've experimented with using ESM-1b for classification by adding an additional output layer and fine-tuning all the weights, and I found that a batch size above 2 or 3 sequences would use up all the memory in Google Colab. My eventual solution was to use a server with more GPU memory, but depending on what you're doing, could you freeze some of the model's layers to reduce the number of trainable parameters? Some other alternatives could be model parallelism (which you may have tried already via FairScale) or gradient checkpointing.
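A minimal sketch of the layer-freezing idea, assuming the facebookresearch/esm package. The attribute name `model.layers`, the number of unfrozen blocks, and the classification head are illustrative and may need adjusting to your setup:

```python
import torch
import torch.nn as nn
import esm

# Load ESM-1b (33 layers, 1280-dim representations).
model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()

# Freeze everything first, then unfreeze only the top N transformer blocks.
for param in model.parameters():
    param.requires_grad = False

N_TRAINABLE = 4  # hypothetical: tune to fit your GPU memory budget
for layer in model.layers[-N_TRAINABLE:]:
    for param in layer.parameters():
        param.requires_grad = True

# Hypothetical classification head on top of the 1280-dim ESM-1b representations.
num_classes = 2
classifier = nn.Linear(1280, num_classes)

# Only trainable parameters go to the optimizer, so gradients (and optimizer
# state) are kept for a small fraction of the 650M weights.
optimizer = torch.optim.Adam(
    [p for p in list(model.parameters()) + list(classifier.parameters()) if p.requires_grad],
    lr=1e-4,
)
```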
-
Your model is too big to fit into GPU memory. One solution I usually use is to reduce the batch size, or the max token length when working with language models. If the problem still persists, try distributed training with model or data parallelism; it can improve your training time dramatically. I hope I was helpful :)
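A minimal sketch of the batch-size / max-token-length reduction suggested above, assuming sequences are tokenized with the batch converter from facebookresearch/esm; the cap values and the example sequences are illustrative:

```python
import esm

model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
batch_converter = alphabet.get_batch_converter()

MAX_LEN = 512     # hypothetical cap; ESM-1b accepts up to 1022 residues
BATCH_SIZE = 2    # small batches keep peak activation memory low

data = [("seq1", "MKTAYIAKQR" * 80), ("seq2", "GAVLIPFMWST" * 60)]
# Truncate long sequences before tokenization so the attention matrices stay small.
data = [(name, seq[:MAX_LEN]) for name, seq in data]

for i in range(0, len(data), BATCH_SIZE):
    labels, strs, tokens = batch_converter(data[i:i + BATCH_SIZE])
    out = model(tokens, repr_layers=[33])
    reps = out["representations"][33]  # (batch, seq_len, 1280)
    # In a fine-tuning loop you would compute a loss on these representations
    # and call backward() here.
```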
-
I'm running into OOM errors whenever I try to fine-tune ESM-1 with PyTorch DistributedDataParallel. I'm currently trying to use FairScale to help with the memory issues, but it doesn't seem to be enough. Does anyone have any other solutions?