remove redundant memory traffic #7100

Open: 5 commits to be merged into master

FindHao commented Dec 10, 2020

In the make_convolutional_layer function, l.output is zero-initialized by xcalloc. It is then copied to l.output_gpu and, conditionally, to l.x_gpu and l.x_norm_gpu.

We can use cuda_memset to zero these three arrays on the GPU side instead of copying all zeros over from the CPU side. This removes a lot of memory traffic: in my simple test on dog.jpg, it saves about 20% of the memory copy traffic.
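To illustrate the idea, here is a minimal sketch using the plain CUDA runtime API. The helper names (check, make_zeroed_by_copy, make_zeroed_by_memset) are hypothetical and are not taken from this PR or from the darknet sources; the sketch only contrasts copying a zero-filled host buffer with zeroing the device buffer in place.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper (not from the PR): abort on any CUDA runtime error.
static void check(cudaError_t status)
{
    if (status != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(status));
        exit(EXIT_FAILURE);
    }
}

// Before: allocate on the device, then copy a zero-filled host buffer to it.
// This moves n * sizeof(float) bytes from host to device just to store zeros.
float *make_zeroed_by_copy(const float *host_zeros, size_t n)
{
    float *dev = NULL;
    check(cudaMalloc((void **)&dev, n * sizeof(float)));
    check(cudaMemcpy(dev, host_zeros, n * sizeof(float), cudaMemcpyHostToDevice));
    return dev;
}

// After: allocate on the device and zero it in place.
// No host-to-device transfer is needed; all-zero bytes represent 0.0f.
float *make_zeroed_by_memset(size_t n)
{
    float *dev = NULL;
    check(cudaMalloc((void **)&dev, n * sizeof(float)));
    check(cudaMemset(dev, 0, n * sizeof(float)));
    return dev;
}
```

Both functions leave the device buffer holding the same contents; only the second avoids the transfer. This is valid only because the host buffer is known to contain nothing but zeros at the time of the copy.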

FindHao commented Dec 10, 2020

Between the initialization of l.output and the copies, l.output is never modified, so the copies only transfer zeros.

FindHao commented Dec 14, 2020

I found more examples of the same issue, and I changed cudaMemset to the async version.
In the simple test on data/dog.jpg, this gives a 1.02x speedup on an RTX 2080 Ti. The speedup comes for free, with no loss in accuracy.
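As a rough sketch of what the async variant looks like with the CUDA runtime API: cudaMemsetAsync queues the memset on a stream and returns to the host without waiting for it to finish. The helper names (check, zero_device_async) and the explicit stream parameter are assumptions for illustration, not the code in this PR.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper (not from the PR): abort on any CUDA runtime error.
static void check(cudaError_t status)
{
    if (status != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(status));
        exit(EXIT_FAILURE);
    }
}

// Queue a zero-fill of a device buffer on `stream` and return immediately.
// Work submitted later to the same stream runs after the memset completes.
void zero_device_async(float *dev, size_t n, cudaStream_t stream)
{
    check(cudaMemsetAsync(dev, 0, n * sizeof(float), stream));
}
```

Kernels launched on the same stream afterwards see the zeroed buffer without extra synchronization. If the host or another stream needs the result, it should first wait on the stream, for example with cudaStreamSynchronize(stream).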
