
loss value becomes NaN after 7 epochs in training phase #28

Open
faruk-ahmad opened this issue Aug 14, 2017 · 1 comment


faruk-ahmad commented Aug 14, 2017

We are using this implementation to train our own model. We preprocessed the dataset with the provided scripts, but after 7 epochs of training the loss becomes 'nan'. What could be the possible cause?
Here is the last part of the training log file:
2017-08-14 17:43:18,105 INFO (main) Epoch: 6, Iteration: 50, Loss: 79.07887268066406
2017-08-14 17:44:07,976 INFO (data_generator) Iters: 6
2017-08-14 17:44:25,415 INFO (utils) Checkpointing model to: ./model/
2017-08-14 17:44:25,805 INFO (data_generator) Iters: 54
2017-08-14 17:44:37,734 INFO (main) Epoch: 7, Iteration: 0, Loss: 86.4606704711914
2017-08-14 17:47:57,033 INFO (main) Epoch: 7, Iteration: 10, Loss: 79.75791931152344
2017-08-14 17:51:46,978 INFO (main) Epoch: 7, Iteration: 20, Loss: 81.86383819580078
2017-08-14 17:56:00,494 INFO (main) Epoch: 7, Iteration: 30, Loss: 83.92363739013672
2017-08-14 18:00:55,395 INFO (main) Epoch: 7, Iteration: 40, Loss: 71.31178283691406
2017-08-14 18:06:30,210 INFO (main) Epoch: 7, Iteration: 50, Loss: 85.3790054321289
2017-08-14 18:08:03,423 INFO (data_generator) Iters: 6
2017-08-14 18:08:27,113 INFO (utils) Checkpointing model to: ./model/
2017-08-14 18:08:27,578 INFO (data_generator) Iters: 54
2017-08-14 18:08:57,878 INFO (main) Epoch: 8, Iteration: 0, Loss: 61.189476013183594
2017-08-14 18:14:00,523 INFO (main) Epoch: 8, Iteration: 10, Loss: 98.21914672851562
2017-08-14 18:18:31,384 INFO (main) Epoch: 8, Iteration: 20, Loss: 84.95768819580078
2017-08-14 18:23:51,395 INFO (main) Epoch: 8, Iteration: 30, Loss: nan

N.B. We are training on a CPU machine (Core i5 with 32 GB of memory).

Any help would be appreciated.
Thanks in advance.
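
For anyone else hitting this: a loss that suddenly turns NaN mid-training usually points to exploding gradients or a numerically unstable batch, so two cheap safeguards are clipping the global gradient norm and skipping any update that contains non-finite values. Lowering the learning rate or inspecting the offending batch is also worth trying. Below is a minimal, framework-agnostic sketch in plain NumPy of what such a guard can look like; the function names and the plain SGD update are illustrative only, not this repo's actual training code.

```python
import numpy as np

def clip_gradient_norm(grads, max_norm=5.0):
    # Rescale all gradients so their combined L2 norm does not exceed max_norm.
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-6)
        grads = [g * scale for g in grads]
    return grads

def safe_sgd_step(params, grads, lr=1e-4, max_norm=5.0):
    # Skip the update entirely if any gradient is NaN/Inf, so the model
    # keeps its last good parameters instead of being poisoned.
    if any(not np.all(np.isfinite(g)) for g in grads):
        print("non-finite gradient detected, skipping update")
        return params
    grads = clip_gradient_norm(grads, max_norm)
    return [p - lr * g for p, g in zip(params, grads)]

if __name__ == "__main__":
    # Toy example: one weight matrix, one well-behaved gradient, one NaN gradient.
    params = [np.zeros((2, 2))]
    good_grad = [np.ones((2, 2))]
    bad_grad = [np.full((2, 2), np.nan)]
    params = safe_sgd_step(params, good_grad)  # applied; clipped only if its norm exceeded max_norm
    params = safe_sgd_step(params, bad_grad)   # skipped; params stay unchanged
```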

@zhangdapeng1207

I have the same problem. Do you know how to solve it?
Thanks!
