Avoid overwriting input tensors during backward pass
In previous versions, both the input/output tensor y and the gradient tensor dy were overwritten during the backward pass. For some network topologies this produced wrong gradients.
To fix this issue, a pair of temporary tensors is now allocated during the backward pass to hold the results of intermediate computations. This increases the amount of temporary memory required, so in cases where GPU memory utilization was already very close to the limit, out-of-memory (OOM) errors might now occur. An alternative, more complex fix that avoids the extra memory at the expense of additional computation is also possible. We are evaluating the impact of these changes and will provide updates in a future release.
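The failure mode and the fix can be illustrated with a minimal Python sketch. This is not the library's actual kernel code: the names y and dy follow the note above, and the elementwise product merely stands in for whatever intermediate quantity the real backward computation produces.

```python
def backward_inplace(y, dy):
    # Buggy pattern (old behavior): the intermediate result is written
    # back into the shared y buffer, destroying the saved activation.
    for i in range(len(y)):
        y[i] *= dy[i]
    return y

def backward_with_temp(y, dy):
    # Fixed pattern (new behavior): a freshly allocated temporary holds
    # the intermediate result, leaving y intact for any other branch of
    # the network that still needs to read it. The cost is the extra
    # temporary buffer, mirroring the memory trade-off described above.
    return [yi * dyi for yi, dyi in zip(y, dy)]

y = [1.0, 2.0, 3.0]   # saved forward output, shared by two branches
dy = [0.5, 0.5, 0.5]  # incoming gradient

grad = backward_with_temp(y, dy)
assert y == [1.0, 2.0, 3.0]  # y survives for the second consumer
```

With the in-place variant, a second branch reading y after the first branch's backward step would see the already-overwritten values, which is exactly the wrong-gradient symptom described above.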