Async + Mitigate host-device memory transfer bottlenecks #16

jonysy · 2017-02-15T19:13:40Z

An application is only as fast as its slowest part.

Taken from the SO question: mitigate host + device memory transfer bottlenecks in OpenCL/CUDA

There are a couple things you can try to mitigate the PCIe bottleneck:

Asynchronous transfers - permits overlapping computation and bulk transfer

Mapped memory - allows a kernel to stream data to/from the GPU during execution

Full answer.
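
To make the first suggestion concrete, here is a minimal host-only sketch of overlapping transfer with computation. It is a simulation, not real OpenCL/CUDA code: a worker thread stands in for the asynchronous copy engine, and `pipelined_sum_of_squares` and `compute_chunk` are hypothetical names invented for this illustration. The idea is simply that chunk N+1 is being copied while chunk N is being processed.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a kernel: sums the squares of one chunk.
fn compute_chunk(chunk: &[f32]) -> f32 {
    chunk.iter().map(|x| x * x).sum()
}

/// Processes `host_data` in chunks, copying later chunks on a worker thread
/// while the calling thread computes on the chunk that already arrived.
pub fn pipelined_sum_of_squares(host_data: Vec<f32>, chunk_len: usize) -> f32 {
    assert!(chunk_len > 0, "chunk_len must be non-zero");

    // A bounded channel keeps the copier from running far ahead of the
    // compute loop, roughly like a small pool of staging buffers.
    let (tx, rx) = mpsc::sync_channel::<Vec<f32>>(1);

    let transfer = thread::spawn(move || {
        for chunk in host_data.chunks(chunk_len) {
            // Simulated host-to-device copy of one chunk.
            let staged = chunk.to_vec();
            if tx.send(staged).is_err() {
                break; // the consumer went away
            }
        }
        // `tx` is dropped here, which ends the receiver loop below.
    });

    let mut total = 0.0;
    for staged in rx {
        // Computation on this chunk overlaps with the copy of the next one.
        total += compute_chunk(&staged);
    }

    transfer.join().expect("transfer thread panicked");
    total
}
```

With a real backend, the worker thread would be replaced by non-blocking copies on a separate stream or command queue, and the channel by events that the compute step waits on.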

jonysy · 2017-03-06T01:23:58Z

L48:

Async operations: currently, most of the time appears to be spent waiting for in/out transfers, even on mid-range GPU hardware, so async may help a lot. Async can be implemented by having transfer_in/transfer_out return an object that can be waited on until the transfer completes when synchronization is required, e.g. for CUDA -> Host. Tensor::get_memory() could block until the transfer completes.
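
One possible shape for this, as a rough sketch only: the types below are hypothetical and are not the project's actual API. A plain thread simulates the device-to-host copy, `TransferHandle` is the "object that can be waited on", and this simplified `Tensor::get_memory()` blocks only when a transfer is still outstanding.

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical handle returned by an asynchronous transfer.
pub struct TransferHandle {
    worker: JoinHandle<Vec<f32>>,
}

impl TransferHandle {
    /// Blocks until the copy has finished and returns the copied data.
    pub fn wait(self) -> Vec<f32> {
        self.worker.join().expect("transfer thread panicked")
    }
}

/// Hypothetical stand-in for `transfer_out`: starts the copy and returns
/// immediately instead of blocking the caller.
pub fn transfer_out(device_buffer: Vec<f32>) -> TransferHandle {
    let worker: JoinHandle<Vec<f32>> = thread::spawn(move || {
        // Simulated device-to-host copy.
        device_buffer.clone()
    });
    TransferHandle { worker }
}

/// Host memory is either ready or still in flight.
pub enum HostMemory {
    Ready(Vec<f32>),
    Pending(TransferHandle),
}

/// Simplified tensor holding only its host-side state.
pub struct Tensor {
    host: Option<HostMemory>,
}

impl Tensor {
    pub fn from_pending(handle: TransferHandle) -> Self {
        Tensor { host: Some(HostMemory::Pending(handle)) }
    }

    /// Waits for an outstanding transfer if there is one, then returns
    /// the host memory. Later calls return immediately.
    pub fn get_memory(&mut self) -> &[f32] {
        let data = match self.host.take().expect("host slot is always refilled") {
            HostMemory::Ready(v) => v,
            HostMemory::Pending(handle) => handle.wait(),
        };
        self.host = Some(HostMemory::Ready(data));
        match self.host.as_ref() {
            Some(HostMemory::Ready(v)) => v.as_slice(),
            _ => unreachable!("host memory was just set to Ready"),
        }
    }
}
```

The caller can queue further work after `transfer_out` returns and pays the synchronization cost only at the first `get_memory()` call.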

jonysy · 2017-03-24T18:06:59Z

@drahnr's point on OpenCL implementation via Gitter:

The way it is currently implemented lacks either a clFinish call or a wait on the last event in the chain (forward propagation)

jonysy · 2017-03-25T19:49:44Z

A look at GPU memory transfer

jonysy added discussion enhancement help wanted labels Feb 15, 2017

jonysy changed the title ~~Mitigate host-device memory transfer bottlenecks~~ Async + Mitigate host-device memory transfer bottlenecks Mar 17, 2017
