Async operations: it looks like most time is currently spent waiting for in/out transfers, even on mid-range GPU hardware, so async transfers may help a lot. Async can be implemented by making transfer_in/transfer_out return an object that can be waited on until the transfer completes, for cases where synchronization is required (e.g. CUDA -> Host). Tensor::get_memory() could then block until the transfer completes.
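To make the idea concrete, here is a minimal sketch of a transfer call that returns a waitable handle. It uses only the Rust standard library and simulates the device with a plain `Vec<f32>` and a worker thread. `DeviceBuffer`, `PendingTransfer`, and `wait` are hypothetical names made up for illustration, not the library's actual API; a real backend would presumably wrap a CUDA stream/event or an OpenCL event instead of a thread.

```rust
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical handle for an in-flight device -> host copy.
/// (Assumption: a real backend would wrap a CUDA/OpenCL event here.)
pub struct PendingTransfer {
    worker: JoinHandle<Vec<f32>>,
}

impl PendingTransfer {
    /// Blocks the caller until the copy has finished, then yields the host data.
    pub fn wait(self) -> Vec<f32> {
        self.worker.join().expect("transfer thread panicked")
    }
}

/// Stand-in for device memory; here it is just a host-side vector.
pub struct DeviceBuffer {
    data: Vec<f32>,
}

impl DeviceBuffer {
    pub fn new(data: Vec<f32>) -> Self {
        DeviceBuffer { data }
    }

    /// Starts the transfer and returns immediately with a waitable handle.
    pub fn transfer_out(&self) -> PendingTransfer {
        // Snapshot the contents so the worker thread owns its input.
        let snapshot = self.data.clone();
        let worker = thread::spawn(move || {
            // Simulated transfer latency; purely illustrative.
            thread::sleep(Duration::from_millis(10));
            snapshot
        });
        PendingTransfer { worker }
    }
}
```

With this shape, a caller can start `transfer_out()`, continue with other work, and call `wait()` only at the point where the host data is actually needed, which is where something like Tensor::get_memory() would block.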
jonysy changed the title from "Mitigate host-device memory transfer bottlenecks" to "Async + Mitigate host-device memory transfer bottlenecks" on Mar 17, 2017
An application is only as fast as its slowest part.
Taken from the SO question: mitigate host + device memory tranfer bottlenecks in OpenCL/CUDA
Full answer.