
Multi-Utterance Mini-Batch #7

Open
mschonwe opened this issue Oct 30, 2015 · 2 comments

Comments

@mschonwe

Do you have a suggestion for supporting processing of mini-batches of multiple utterances at a time?

We have refactored our data into feature files of fixed frame length. We can have dataLoader load the .bin features as utterances (rows) × frame features (columns), but it seems we would need to modify ctc_fast.pyx to loop over the utterances and somehow combine the gradients. The loop over the utterances seems easy enough, but we're not sure how to combine the gradients.

Have you already tested multi-utterance mini-batches and decided that they are not appropriate for the task?

@amaas
Owner

amaas commented Oct 31, 2015

Updating from multiple utterances would likely help to smooth out the gradients, but it might not be better in terms of the overall final solution. The easiest way to parallelize computing the CTC loss function is to take a data-parallel, map-reduce-style approach rather than trying to vectorize the code in ctc_fast. You could look at the Parallel Python module or something similar to call a separate instance of ctc_fast for each utterance after forward-propagating all utterances through the RNN.
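
Roughly, the map-reduce shape I mean looks like the sketch below. It uses the standard multiprocessing module rather than Parallel Python, and `ctc_single_utt` is only a placeholder for the real per-utterance ctc_fast call, not the actual interface:

```python
from multiprocessing import Pool

import numpy as np


def ctc_single_utt(args):
    """CTC cost and gradient for one utterance.

    In the real code this would wrap a call into ctc_fast for a single
    utterance; the body here is a stand-in so the map-reduce shape is clear.
    """
    probs, labels = args
    cost = float(-np.log(probs).sum())   # placeholder for the CTC forward pass
    grad = np.zeros_like(probs)          # placeholder for the CTC gradient
    return cost, grad


def batch_ctc(utt_probs, utt_labels, num_workers=4):
    """Map: one worker per utterance. Reduce: sum costs, collect per-utt grads."""
    with Pool(num_workers) as pool:
        results = pool.map(ctc_single_utt, list(zip(utt_probs, utt_labels)))
    costs, grads = zip(*results)
    return sum(costs), list(grads)


if __name__ == "__main__":
    # Toy mini-batch: 8 utterances, each (num_classes=30, num_frames=100).
    utts = [np.random.rand(30, 100) for _ in range(8)]
    labs = [np.random.randint(0, 29, size=20) for _ in range(8)]
    total_cost, grads = batch_ctc(utts, labs)
    print(total_cost, len(grads))
```

Each worker runs in its own process, so the per-utterance CTC computations are not serialized by the GIL; the per-utterance arrays just need to be picklable.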


@mschonwe
Author

mschonwe commented Nov 2, 2015

My motivation is based on a lecture from Adam Coates describing the roofline model, and how to increase system throughput by saturating both computational and bandwidth resources.

I was looking to increase the arithmetic intensity (and therefore, hopefully, throughput) by feeding mini-batches into costAndGrad. The profiling I did indicates that nearly all the processing time is spent in cudamat calls (in costAndGrad), so I don't think the CTC calculation is substantially rate-limiting (runsnake image below: ctc_loss is the oval at the bottom right). I think we'll be OK looping over each utterance and averaging the gradients for each mini-batch; a rough sketch of that is below the image.

[runsnake profiler screenshot: runnet_profile]
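
For concreteness, here is roughly the loop-and-average scheme I have in mind. It is only a sketch: `ctc_cost_and_delta` and `rnn_backprop` are placeholder names standing in for the single-utterance CTC and backprop calls, not the actual costAndGrad/ctc_fast interface:

```python
import numpy as np


def minibatch_cost_and_grad(utt_probs, utt_labels, ctc_cost_and_delta, rnn_backprop):
    """Loop over the utterances in one mini-batch and average their gradients.

    ctc_cost_and_delta(probs, labels) -> (cost, delta) for a single utterance
        (this would wrap the existing single-utterance CTC code).
    rnn_backprop(delta) -> flat parameter-gradient vector for that utterance.
    """
    batch_size = len(utt_probs)
    total_cost = 0.0
    summed_grad = None
    for probs, labels in zip(utt_probs, utt_labels):
        cost, delta = ctc_cost_and_delta(probs, labels)  # per-utterance CTC pass
        grad = rnn_backprop(delta)                       # per-utterance parameter gradient
        total_cost += cost
        summed_grad = grad.copy() if summed_grad is None else summed_grad + grad
    # Average over the mini-batch so the update scale does not depend on
    # how many utterances we pack into each batch.
    return total_cost / batch_size, summed_grad / batch_size
```

Averaging (rather than summing) just keeps the effective learning rate independent of the mini-batch size.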
