
Does global_train_batch_size support gradient accumulation? #672

Open
jinzhuoran opened this issue Jul 21, 2024 · 1 comment
Labels: type/question (An issue that's a question)

Comments

@jinzhuoran

❓ The question

Hello authors, thank you very much for your inspiring work. I have 8 A100s. If I want to continue pretraining the model from a certain checkpoint, can I keep global_train_batch_size at the original 2048 and set device_train_microbatch_size to 2? Is this equivalent to using more GPUs?
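For reference, the arithmetic behind this setup works out as follows. This is a minimal sketch assuming the usual meaning of these config options; the `grad_accum_steps` helper is purely illustrative and not part of the OLMo codebase:

```python
# Sketch of how the global batch is split under gradient accumulation.
# Variable names mirror the config options; the helper itself is hypothetical.

def grad_accum_steps(global_train_batch_size: int,
                     num_gpus: int,
                     device_train_microbatch_size: int) -> int:
    """Number of micro-batches each GPU accumulates before one optimizer step."""
    samples_per_pass = num_gpus * device_train_microbatch_size  # samples per forward/backward
    assert global_train_batch_size % samples_per_pass == 0, "global batch must divide evenly"
    return global_train_batch_size // samples_per_pass

# 8 x A100, global batch 2048, micro-batch 2 per GPU:
#   2048 / (8 * 2) = 128 accumulated micro-batches per optimizer update,
# i.e. each optimizer step still sees 2048 samples, just spread over more
# forward/backward passes than it would be with more GPUs.
print(grad_accum_steps(2048, 8, 2))  # -> 128
```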

jinzhuoran added the type/question label on Jul 21, 2024
@AkshitaB
Contributor

@jinzhuoran Yes, this should be possible. Have you faced an issue when trying this?
