Async + Mitigate host-device memory transfer bottlenecks #16

Open
jonysy opened this issue Feb 15, 2017 · 3 comments

jonysy commented Feb 15, 2017

An application is only as fast as its slowest part.

Taken from the SO question: mitigate host + device memory transfer bottlenecks in OpenCL/CUDA

There are a couple of things you can try to mitigate the PCIe bottleneck:

  • Asynchronous transfers - permits overlapping computation and bulk transfer
  • Mapped memory - allows a kernel to stream data to/from the GPU during execution
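The overlap idea in the first bullet can be sketched on the host side alone. This is only a simulation under stated assumptions: a worker thread stands in for the asynchronous bulk transfer, a bounded channel stands in for a double buffer, and the hypothetical function `pipelined_sum_of_squares` stands in for the kernel. A real backend would use CUDA streams or OpenCL events rather than threads.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical example: sums the squares of `data` chunk by chunk while a
/// worker thread "transfers" the next chunk in the background.
pub fn pipelined_sum_of_squares(data: Vec<f32>, chunk_len: usize) -> f32 {
    assert!(chunk_len > 0, "chunk_len must be non-zero");

    // Capacity 1: one chunk may be queued while another is being processed,
    // which mimics a double buffer.
    let (tx, rx) = mpsc::sync_channel::<Vec<f32>>(1);

    let transfer = thread::spawn(move || {
        for chunk in data.chunks(chunk_len) {
            // Stand-in for an asynchronous host-to-device copy.
            if tx.send(chunk.to_vec()).is_err() {
                break; // the consumer went away
            }
        }
        // `tx` is dropped here, which closes the channel.
    });

    // "Compute" on each chunk as soon as it arrives; the next transfer
    // proceeds concurrently on the worker thread.
    let mut total = 0.0f32;
    for chunk in rx {
        total += chunk.iter().map(|x| x * x).sum::<f32>();
    }

    transfer.join().expect("transfer thread panicked");
    total
}
```

Calling `pipelined_sum_of_squares(vec![1.0, 2.0, 3.0, 4.0], 2)` processes two chunks and returns the same value as a single-pass sum; the benefit appears only when transfer and compute each take meaningful time.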

Full answer.


jonysy commented Mar 6, 2017

L48:

Async operations: currently most time appears to be spent waiting for in/out transfers, even on mid-range GPU hardware, so async may help a lot. Async can be implemented by making transfer_in/transfer_out return an object that can be waited on until the transfer completes when sync is required, e.g. CUDA -> Host. Tensor::get_memory() could block until the transfer completes.
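A minimal sketch of that "object that can be waited on", using only the standard library. Everything here is an assumption for illustration: `TransferHandle` is a hypothetical type, this `transfer_out` is a free function rather than the project's actual method, and a worker thread stands in for a real asynchronous device-to-host copy.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Hypothetical handle returned by an asynchronous transfer.
pub struct TransferHandle<T> {
    rx: Receiver<T>,
}

impl<T> TransferHandle<T> {
    /// Blocks until the transfer has finished, then yields the data.
    pub fn wait(self) -> T {
        self.rx
            .recv()
            .expect("transfer thread terminated without sending data")
    }
}

/// Stand-in for a device-to-host copy: starts the copy on a worker thread
/// and returns immediately with a handle.
pub fn transfer_out(device_data: Vec<f32>) -> TransferHandle<Vec<f32>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // A real backend would enqueue an asynchronous copy here.
        let host_copy: Vec<f32> = device_data.iter().copied().collect();
        // Ignore the error if the handle was dropped before completion.
        let _ = tx.send(host_copy);
    });
    TransferHandle { rx }
}
```

With this shape, a caller can start a transfer, do unrelated work, and call `wait()` only at the point where host memory is actually read, which is where something like `Tensor::get_memory()` would block.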

@jonysy jonysy changed the title Mitigate host-device memory transfer bottlenecks Async + Mitigate host-device memory transfer bottlenecks Mar 17, 2017

jonysy commented Mar 24, 2017

@drahnr's point on OpenCL implementation via Gitter:

The way it is currently implemented lacks either a clFinish or a wait on the last event in the chain (forward propagation)
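The missing synchronization can be illustrated with a host-only simulation. The sketch assumes an in-order queue, where commands run in submission order, so waiting on the last event implies every earlier command has finished. `InOrderQueue`, `Event`, and `forward` are hypothetical names and not part of OpenCL or this project.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

type Command = Box<dyn FnOnce() + Send>;

/// Hypothetical completion event for one enqueued command.
pub struct Event {
    done: Receiver<()>,
}

impl Event {
    /// Blocks until the associated command has run.
    pub fn wait(self) {
        self.done.recv().expect("queue worker stopped before completion");
    }
}

/// Hypothetical in-order queue: one worker runs commands in submission order.
pub struct InOrderQueue {
    commands: Sender<(Command, Sender<()>)>,
}

impl InOrderQueue {
    pub fn new() -> Self {
        let (tx, rx) = mpsc::channel::<(Command, Sender<()>)>();
        thread::spawn(move || {
            for (command, done) in rx {
                command();
                let _ = done.send(()); // the event may already be dropped
            }
        });
        InOrderQueue { commands: tx }
    }

    /// Submits a command and returns immediately with its event.
    pub fn enqueue<F: FnOnce() + Send + 'static>(&self, f: F) -> Event {
        let (done_tx, done_rx) = mpsc::channel();
        let command: Command = Box::new(f);
        self.commands
            .send((command, done_tx))
            .expect("queue worker stopped");
        Event { done: done_rx }
    }
}

/// Three chained "layers", each doubling the value. Without the wait on the
/// last event, the result could be read before the chain has finished.
pub fn forward(input: f32) -> f32 {
    let queue = InOrderQueue::new();
    let value = Arc::new(Mutex::new(input));
    let mut last = None;
    for _layer in 0..3 {
        let v = Arc::clone(&value);
        last = Some(queue.enqueue(move || {
            *v.lock().unwrap() *= 2.0;
        }));
    }
    if let Some(event) = last {
        event.wait(); // plays the role of waiting on the last event in the chain
    }
    let result = *value.lock().unwrap();
    result
}
```

Waiting on only the last event keeps the earlier enqueues non-blocking; a full-queue finish would give the same guarantee at a coarser granularity.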


jonysy commented Mar 25, 2017
