
Does global_train_batch_size support gradient accumulation? #672

Open
jinzhuoran opened this issue Jul 21, 2024 · 1 comment
Labels: type/question (An issue that's a question)

Comments

@jinzhuoran

❓ The question

Hello authors, thank you very much for your inspiring work. I have 8 A100s. If I want to continue pretraining the model from a certain checkpoint, can I keep global_train_batch_size at the original 2048 and set device_train_microbatch_size to 2? Is this equivalent to using more GPUs?
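For reference, the arithmetic behind this setup works out as follows. This is a minimal sketch assuming the usual meaning of these config options; the `grad_accum_steps` helper is purely illustrative and not part of the OLMo codebase:

```python
# Sketch of how the global batch is split under gradient accumulation.
# Variable names mirror the config options; the helper itself is hypothetical.

def grad_accum_steps(global_train_batch_size: int,
                     num_gpus: int,
                     device_train_microbatch_size: int) -> int:
    """Number of micro-batches each GPU accumulates before one optimizer step."""
    samples_per_pass = num_gpus * device_train_microbatch_size  # samples per forward/backward
    assert global_train_batch_size % samples_per_pass == 0, "global batch must divide evenly"
    return global_train_batch_size // samples_per_pass

# 8 x A100, global batch 2048, micro-batch 2 per GPU:
#   2048 / (8 * 2) = 128 accumulated micro-batches per optimizer update,
# i.e. each optimizer step still sees 2048 samples, just spread over more
# forward/backward passes than it would be with more GPUs.
print(grad_accum_steps(2048, 8, 2))  # -> 128
```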

jinzhuoran added the type/question label on Jul 21, 2024
@AkshitaB
Contributor

@jinzhuoran Yes, this should be possible. Have you faced an issue when trying this?
