[Bug]: producing NAN values during training in MaskablePPO #221
Comments
Hi, could you please let me know whether this is a problem in my code or in the package? @araffin Thank you.
@araffin Thank you for your response.
The detailed error is: @araffin
@araffin More info if that helps:
Might be a duplicate of #81 or #195. Please note that we do not offer tech support, see #81 (comment).
🐛 Bug
During training, the algorithm produces NaN values. They originate in the neural network. I tried every fix proposed in related issues, but the error persists: changing np.float64 to np.float32 did not help; use_expln=True is not available in MaskablePPO; changing model parameters such as gamma gave the same error; and lowering the learning rate did not help either.
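One way to narrow this down is to rule out the environment before looking at the network. The following is only a minimal sketch using NumPy, not part of the original report: `find_non_finite` and `check_step` are hypothetical helper names, and the assumption is that the observation, reward, and action mask from each step are available as array-like values. It flags NaN or inf entries and a mask that allows no action.

```python
import numpy as np


def find_non_finite(name, array):
    """Describe the first NaN/inf entry in `array`, or return None if all are finite."""
    # atleast_1d lets scalar rewards be handled the same way as arrays.
    values = np.atleast_1d(np.asarray(array, dtype=np.float64))
    bad = np.argwhere(~np.isfinite(values))
    if bad.shape[0] == 0:
        return None
    index = tuple(int(i) for i in bad[0])
    return f"{name}: non-finite value {values[index]} at index {index}"


def check_step(obs, reward, action_mask):
    """Collect the problems found in one environment step (hypothetical helper)."""
    problems = []
    for name, array in (("obs", obs), ("reward", reward)):
        message = find_non_finite(name, array)
        if message is not None:
            problems.append(message)
    # Assumption: a mask with no valid action is treated as an error here.
    if not np.any(np.asarray(action_mask, dtype=bool)):
        problems.append("action_mask: no valid action")
    return problems
```

Calling `check_step` on the values returned by each environment step, and stopping at the first non-empty result, would show whether the bad values already exist before they reach the policy.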
To Reproduce
Relevant log output / Error message
System Info
No response
Checklist