
Gradient checkpointing #78

Merged
merged 7 commits into ai-forever:master on Dec 17, 2021

Conversation

@neverix (Contributor) commented Dec 5, 2021

This patch enables gradient checkpointing for ruDALLE.

It makes it possible to use up to 3x larger batch sizes in memory-limited environments during training.

Passing the gradient_checkpointing argument to model.forward makes a checkpoint every gradient_checkpointing layers; 6 is a good starting value.
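
A minimal sketch of the idea, not the actual ruDALLE implementation: the function and variable names (`run_layers`, `make_chunk`, `hidden_states`) are illustrative, and only `torch.utils.checkpoint` is assumed. Activations inside each chunk of `gradient_checkpointing` layers are recomputed during the backward pass instead of being stored.

```python
# Illustrative sketch only: checkpoint every `gradient_checkpointing`
# transformer layers with torch.utils.checkpoint.
from torch.utils.checkpoint import checkpoint


def run_layers(layers, hidden_states, gradient_checkpointing=6):
    """Apply `layers` in order, recomputing activations inside each chunk of
    `gradient_checkpointing` layers during backward instead of storing them."""
    if not gradient_checkpointing:
        # No checkpointing: every layer's activations stay in memory.
        for layer in layers:
            hidden_states = layer(hidden_states)
        return hidden_states

    def make_chunk(chunk):
        # Wrap a slice of layers into a single callable for checkpoint().
        def run_chunk(x):
            for layer in chunk:
                x = layer(x)
            return x
        return run_chunk

    for start in range(0, len(layers), gradient_checkpointing):
        chunk = layers[start:start + gradient_checkpointing]
        # Only the chunk's input is saved; its internal activations are
        # recomputed in the backward pass, which lowers peak memory.
        hidden_states = checkpoint(make_chunk(chunk), hidden_states)
    return hidden_states
```

This is the usual memory/compute trade-off: checkpointing saves memory at the cost of re-running the forward pass for each chunk during backward, which is what allows the larger batch sizes mentioned above.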

@neverix (Contributor, Author) commented Dec 14, 2021

:(

@shonenkov (Collaborator) commented:

@neverix, we have gradient checkpointing in the main training pipeline, but your version doesn't support DeepSpeed, so I can't merge your code right now; I'm wary of breaking compatibility between our internal and open-source models :(

I suggest creating a new branch with your version of gradient checkpointing, what do you think?

@neverix (Contributor, Author) commented Dec 15, 2021

What changes are needed for compatibility with DeepSpeed? Having this included in the default install would be useful for notebooks; if that's not possible, we should look for other solutions.

@shonenkov merged commit a6e01de into ai-forever:master on Dec 17, 2021