
High values of WER on Libri Dataset (test-clean and dev-clean) #80

Open
ismorphism opened this issue Mar 2, 2017 · 5 comments
ismorphism commented Mar 2, 2017

Hi everyone! I have the following problem: when I first try to train the plain DeepSpeech net with the default parameters (7 layers, 1760 neurons, batch size 20), I always get
cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-3543/cutorch/lib/THC/generic/THCStorage.cu:66. (My dataset is the default ~600 MB Libri dataset described in the "Data preparation and running" wiki chapter of this repository. My GPU is an Nvidia GTX 1080.)
I assumed that was expected and that I would have to decrease the batch size and the number of layers/neurons. So far the only configuration that runs reliably is 6 layers, 1200 neurons, batch size 12; other architectures hit the out-of-memory error every time around epoch 5 or 6. At the same time, my best WER is about 76.5 and does not seem to drop. I tried changing the batch size a little, but 12 is my maximum working value right now. I also tried an LSTM architecture with 600 neurons and 6 layers, and it gave worse results. I tried changing the learning rate, rate annealing, max-norm, and momentum, and adding permuted batches, but I always end up around 76.5. Does anyone know what else I could try?
Maybe the answer is a bigger batch size and a deeper architecture, but that would require more computational power, which is not a good option for me.
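A rough back-of-envelope estimate makes the out-of-memory behaviour plausible: backpropagation through an RNN has to keep every layer's activations for every timestep of every sample in the batch. The sketch below (Python, purely illustrative; the timestep count and float size are assumptions, not measurements of this repo) shows how quickly that grows with batch size and utterance length:

```python
# Illustrative activation-memory estimate for an RNN acoustic model.
# Numbers here are assumptions for the sake of the arithmetic.

BYTES_PER_FLOAT = 4  # fp32

def activation_memory_gb(batch_size, timesteps, hidden, layers):
    """Memory to keep one hidden vector per layer, per timestep, per sample
    (what backprop needs for the forward activations alone)."""
    floats = batch_size * timesteps * hidden * layers
    return floats * BYTES_PER_FLOAT / 1024**3

# e.g. batch 20, ~1500 timesteps for a long utterance, 1760 units, 7 layers
print(round(activation_memory_gb(20, 1500, 1760, 7), 2), "GB")  # → 1.38 GB
```

That is only the forward activations; gradients, parameters, and workspace buffers come on top, so a batch of long utterances can easily exhaust an 8 GB card.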


mtanana commented Mar 2, 2017

Does the 1080 have 6GB? I'm not sure if that will be able to fit the full model.

If you look back at my responses to the thread on running out of memory, I found some tweaks to the code that drastically reduced the memory. (But, alas, I haven't had time to commit them...)

In the end, if you don't have much memory, you can't run large batches. (On a 6 GB testing card, I don't think I could run more than 3 or 4 in a batch.)

As the batch size changes, the ideal learning rate usually changes as well (in my experience). You may have to play with that (this is the real hard work of deep learning).
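A common starting heuristic for re-tuning after a batch-size change is to scale the learning rate proportionally to the batch size (the "linear scaling rule"). This is only a first guess, not something the thread or the repo prescribes; a minimal sketch:

```python
def scaled_lr(base_lr, base_batch, new_batch):
    """Linear scaling heuristic: scale the learning rate with batch size.
    A starting point for tuning, not a guarantee of stability."""
    return base_lr * new_batch / base_batch

# If some base_lr (3e-4 here is a made-up example) worked at batch 20,
# a first guess for batch 12:
print(scaled_lr(3e-4, 20, 12))  # → 1.8e-4
```

Smaller batches also give noisier gradient estimates, so the best rate after shrinking the batch is often somewhat lower than the original.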

@suhaspillai

Hi Boris,
You're right: you need a bigger batch size and more GPUs. I think the reason you get the out-of-memory error is that with permuted batches, after 5 or 6 epochs you suddenly draw a batch in which the speech files have many timesteps, and backpropagation has to store the intermediate layer values for every timestep. The only way to make that fit is a smaller batch size. The other way around is to accumulate gradients locally on your machine: if you want 30 samples per batch but can only run 10 at a time, sum the gradients over three batches of 10 samples and apply the update once after the third. This will require changes in the code.


mtanana commented Mar 2, 2017

#71
There's a comment from me near the bottom that helps with memory.


SeanNaren commented Mar 13, 2017

Sorry for the late response. The GTX 1080 is a great card but, as said above, it only has 8 GB of VRAM. Reduce the minibatch size if you want to train on this GPU!

Which dataset are you training on specifically?

SeanNaren reopened this Mar 13, 2017

mtanana commented Mar 24, 2017

Don't forget to downsize the minibatch for testing too.
