
multi gpu training #57

Open
aitalk opened this issue Aug 13, 2020 · 2 comments

Comments

@aitalk commented Aug 13, 2020

Is it possible to train it on multiple GPUs? On multiple nodes? Thanks.

@deepconsc

@aitalk Not recommended. Multiple batch sizes have been tested and reported: up to 32 it works well, but beyond that training can't keep up. The original authors reported a batch size of 16 as ideal.
I've personally tested batch sizes up to 1024; convergence takes roughly the same wall-clock time but uses at least 4x the resources.
Keep it at 16.
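
For reference, if someone still wanted to experiment with multi-GPU despite the advice above, the usual route in PyTorch is DistributedDataParallel. This is only a minimal sketch, assuming the repo's training loop is plain PyTorch (not confirmed in this thread); the model, shapes, and optimizer below are placeholders, and the per-GPU batch is kept at 16 as recommended:

```python
# Launch with: torchrun --nproc_per_node=NUM_GPUS train_ddp.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets MASTER_ADDR/PORT, RANK, WORLD_SIZE and LOCAL_RANK for us
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model standing in for the real network
    model = torch.nn.Linear(80, 80).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    # Per-GPU batch of 16; note the *effective* global batch becomes
    # 16 * WORLD_SIZE, which is exactly what the comment above warns about.
    for step in range(100):
        x = torch.randn(16, 80, device=local_rank)
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```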

@ioannist

I am a bit confused here. When I set the batch size to 50 I get 1 it/s (9 GB memory used), whereas when I leave it at 16 I get 2.6 it/s (7 GB memory used) on an Nvidia 1080 Ti. Should I still use batch size 16?
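
As a side note, it/s alone doesn't show raw throughput; multiplying by batch size gives samples/sec. Using the 1080 Ti numbers from the comment above:

```python
# Raw throughput = batch size * iterations per second
# (numbers taken from the 1080 Ti measurements above)
for batch_size, it_per_sec in [(16, 2.6), (50, 1.0)]:
    print(f"batch {batch_size}: {batch_size * it_per_sec:.1f} samples/sec")
# batch 16: 41.6 samples/sec
# batch 50: 50.0 samples/sec
```

So batch 50 actually pushes slightly more samples per second; the recommendation to stay at 16 is about convergence behaviour and resource use, not iteration speed.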
