Hi -- thanks for the benchmarks!

I noticed that you do a sparse-to-dense conversion and use `softmax_cross_entropy_with_logits`. Have you tried eliding the sparse-to-dense conversion and using `sparse_softmax_cross_entropy_with_logits`? In my experience the sparse version is faster.
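To make the suggestion concrete, here is a minimal NumPy sketch (not TensorFlow, so it runs anywhere) of why the sparse op can skip work: cross-entropy against a one-hot dense label is just the negative log-probability at the integer label, so the sparse version never needs to materialize the dense one-hot matrix. The shapes and the 1000-way class count are illustrative, not from the benchmark code.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, classes = 4, 1000
logits = rng.standard_normal((batch, classes))
labels = rng.integers(0, classes, size=batch)  # integer class ids

# Numerically plain log-softmax (fine for these small values).
log_probs = logits - np.log(np.sum(np.exp(logits), axis=1, keepdims=True))

# Dense path: materialize one-hot labels, then a full elementwise
# multiply-and-sum over all 1000 classes per example.
one_hot = np.zeros((batch, classes))
one_hot[np.arange(batch), labels] = 1.0
dense_loss = -np.sum(one_hot * log_probs, axis=1)

# Sparse path: gather the log-prob of the true class directly,
# skipping the sparse-to-dense (one-hot) conversion entirely.
sparse_loss = -log_probs[np.arange(batch), labels]

# Both paths compute the same per-example loss.
assert np.allclose(dense_loss, sparse_loss)
```

The two losses are identical; the sparse path just replaces an O(batch × classes) multiply-and-sum with an O(batch) gather.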
Also, `reduce_mean` does not have a GPU kernel. `reduce_sum` with a division would prevent a GPU -> CPU -> GPU round trip when calculating the loss.
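For reference, the proposed rewrite is a pure arithmetic identity, so it cannot change the computed loss -- a quick NumPy sketch (illustrative values, not benchmark data):

```python
import numpy as np

# Hypothetical per-example losses for one batch.
losses = np.array([0.7, 1.2, 0.3, 0.9], dtype=np.float32)

# What reduce_mean computes.
mean_loss = np.mean(losses)

# The suggested replacement: reduce_sum followed by a division
# by the (statically known) batch size.
sum_div = np.sum(losses) / losses.size

# Same value either way; only the choice of kernel differs.
assert np.isclose(mean_loss, sum_div)
```

Since the batch size is known ahead of time, the division is a cheap scalar op, and the sum can stay on the GPU.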
@rryan I don't think the sparse softmax will improve the overall benchmarks; the time contributed by a 1000-way softmax is very small. I'll definitely change `reduce_mean` to `reduce_sum` + division if you think it helps.
@soumith -- I can't say how much it would affect this benchmark. On a much simpler model -- the CIFAR-10 multi-GPU example -- replacing `reduce_mean` sped up my batches by about 5% on a K20.