openCL branch of caffe reports much higher speeds #12
My speeds are forward / backward times for an entire minibatch of 16 images; they divide by the minibatch size to compute a per-image time. You need to divide my times by 16 to be comparable to theirs, at which point the GTX 1080 is significantly faster than their R290X times. Another subtle issue is that they use a minibatch size of 128, while I used a minibatch size of 16 for a fair comparison across all models. Since AlexNet is a small model and GPUs are massively parallel, I'd expect the per-image time to decrease as the batch size increases, which gives their benchmark a slight advantage.
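The normalization described above can be sketched in a few lines. The timing values below are illustrative placeholders, not measured numbers from either benchmark:

```python
def per_image_ms(minibatch_ms, batch_size):
    """Convert a whole-minibatch time to a per-image time.

    Timings reported for an entire minibatch must be divided by the
    batch size before they can be compared with per-image figures.
    """
    return minibatch_ms / batch_size


# e.g. a hypothetical 320 ms forward+backward pass over a 16-image minibatch:
print(per_image_ms(320.0, 16))  # 20.0 ms per image
```

Note that this only makes raw numbers comparable; it does not correct for the batch-size effect itself, since larger batches tend to yield lower per-image times on highly parallel GPUs.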
Thanks, this clears it up :) On an unrelated note, I'd really love to see benchmarks of SqueezeNet 1.1, which should be much faster than all of these networks.
Hello, I'm getting the following error " ". Am I missing something here while running on CPU only?
On OpenCL-caffe, there are performance metrics claiming speeds of about 4 ms per image for training AlexNet with a Radeon R290X. Considering this GPU is much weaker than a GTX 1080, those figures seem very strange compared with the 20 ms in your tests.
What's your take on this?