More improvements to Conv2d speed #98

Open
wants to merge 2 commits into
base: master

@hunse (Collaborator) commented May 9, 2016

Trying to compute multiple filters per workgroup, to reduce the number of times the image has to be loaded. So far it doesn't seem to help much, though.

Maybe it makes sense that this doesn't help. For global filters (convolution), the limiting factor seems to be FLOPS, not memory access, so reducing image loads wouldn't make a difference.

For local filters, the filters take up much more memory (nf * ni * nj * nc * si * sj) than the image (ni * nj * nc) when the filter stride is 1, so reducing image loads won't make a difference there either. Only when the stride is about the size of the kernel width would the image data be on par with the filter data for one workgroup; in that case, computing multiple kernels per group might help.
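As a rough check of the sizes above, here is a minimal sketch that counts filter and image elements for local filters. The symbol meanings are my assumption, since they are not spelled out here: `nf` filters, an `ni` x `nj` image with `nc` channels, and `si` x `sj` kernels. The function names, the `stride` parameter, and the "one output position per stride step" rule are hypothetical, not taken from the code in this PR.

```python
def image_elements(ni, nj, nc):
    """Number of values in one image (assumed ni x nj pixels, nc channels)."""
    return ni * nj * nc


def local_filter_elements(nf, ni, nj, nc, si, sj, stride=1):
    """Number of values in a bank of local (unshared) filters.

    Assumption: one output position per stride step along each axis,
    so stride=1 reproduces nf * ni * nj * nc * si * sj.
    """
    oi = -(-ni // stride)  # ceil(ni / stride)
    oj = -(-nj // stride)  # ceil(nj / stride)
    return nf * oi * oj * nc * si * sj


def filter_to_image_ratio(nf, ni, nj, nc, si, sj, stride=1):
    """How many times larger the filter data is than the image data."""
    filters = local_filter_elements(nf, ni, nj, nc, si, sj, stride)
    return filters / image_elements(ni, nj, nc)


if __name__ == "__main__":
    # One filter, 30x30x3 image, 3x3 kernel.
    for stride in (1, 3):
        ratio = filter_to_image_ratio(1, 30, 30, 3, 3, 3, stride=stride)
        print("stride", stride, "filter/image ratio", ratio)
```

Under these assumptions the ratio for a single filter is about `si * sj / stride**2`, which only drops to roughly 1 once the stride reaches the kernel width.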

This should allow each patch to be used for many filters.
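To illustrate the patch reuse, here is a plain-Python sketch (not the OpenCL kernel in this PR) that loads a patch once per group of filters and counts the loads. `apply_filters_grouped` and `filters_per_group` are hypothetical names, and the list copy only stands in for reading the patch into local memory.

```python
def apply_filters_grouped(patch, filters, filters_per_group):
    """Dot a flattened patch with every filter, loading the patch once per group.

    Returns (outputs, patch_loads). The copy below is a stand-in for a
    workgroup reading the patch from global memory.
    """
    outputs = []
    patch_loads = 0
    for start in range(0, len(filters), filters_per_group):
        local_patch = list(patch)  # one "load" per group of filters
        patch_loads += 1
        for f in filters[start:start + filters_per_group]:
            outputs.append(sum(p * w for p, w in zip(local_patch, f)))
    return outputs, patch_loads


if __name__ == "__main__":
    patch = [1, 2, 3]
    filters = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
    for k in (1, 2, 4):
        outputs, loads = apply_filters_grouped(patch, filters, k)
        print("filters per group:", k, "patch loads:", loads, "outputs:", outputs)
```

The outputs are the same for every group size; only the number of patch loads changes, falling from one per filter to one per group.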

TODO:
- Play around with index order. Should filters be smallest?
- Generalize: take out hardcoded lsize