Hi -- thanks for the benchmarks!

I noticed that you do a sparse-to-dense conversion and use `softmax_cross_entropy_with_logits`. Have you tried eliding the sparse-to-dense conversion and using `sparse_softmax_cross_entropy_with_logits`? In my experience the sparse version is faster.
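To make the suggestion concrete, here is a minimal NumPy sketch (not TensorFlow, so it runs anywhere) of why the sparse op can skip work: cross-entropy against a one-hot dense label is just the negative log-probability at the integer label, so the sparse version never needs to materialize the dense one-hot matrix. The shapes and the 1000-way class count are illustrative, not from the benchmark code.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, classes = 4, 1000
logits = rng.standard_normal((batch, classes))
labels = rng.integers(0, classes, size=batch)  # integer class ids

# Numerically plain log-softmax (fine for these small values).
log_probs = logits - np.log(np.sum(np.exp(logits), axis=1, keepdims=True))

# Dense path: materialize one-hot labels, then a full elementwise
# multiply-and-sum over all 1000 classes per example.
one_hot = np.zeros((batch, classes))
one_hot[np.arange(batch), labels] = 1.0
dense_loss = -np.sum(one_hot * log_probs, axis=1)

# Sparse path: gather the log-prob of the true class directly,
# skipping the sparse-to-dense (one-hot) conversion entirely.
sparse_loss = -log_probs[np.arange(batch), labels]

# Both paths compute the same per-example loss.
assert np.allclose(dense_loss, sparse_loss)
```

The two losses are identical; the sparse path just replaces an O(batch × classes) multiply-and-sum with an O(batch) gather.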
Also, `reduce_mean` does not have a GPU kernel. `reduce_sum` with a division would prevent a GPU -> CPU -> GPU round trip when calculating the loss.
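For reference, the proposed rewrite is a pure arithmetic identity, so it cannot change the computed loss -- a quick NumPy sketch (illustrative values, not benchmark data):

```python
import numpy as np

# Hypothetical per-example losses for one batch.
losses = np.array([0.7, 1.2, 0.3, 0.9], dtype=np.float32)

# What reduce_mean computes.
mean_loss = np.mean(losses)

# The suggested replacement: reduce_sum followed by a division
# by the (statically known) batch size.
sum_div = np.sum(losses) / losses.size

# Same value either way; only the choice of kernel differs.
assert np.isclose(mean_loss, sum_div)
```

Since the batch size is known ahead of time, the division is a cheap scalar op, and the sum can stay on the GPU.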
@rryan I don't think the sparse softmax will improve the overall benchmarks; the time contributed by a 1000-way softmax is very small. I'll definitely change `reduce_mean` to `reduce_sum` + division if you think it helps.
@soumith -- I can't say how much it would affect this benchmark. On a much simpler model -- the CIFAR-10 multi-GPU example -- replacing `reduce_mean` sped up my batches by about 5% on a K20.