GRU + Large recurrent_dropout
Bug
#20276
Comments
Hi @nokados - here are the points you mentioned:
A large recurrent_dropout leads to underfitting, so reducing recurrent_dropout lets the model learn the patterns of the input data more easily. The gist shows all the changes mentioned, with different sequence lengths. Let me know if anything more is required!
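The linked gist is not reproduced here, but a rough sketch of what the suggested changes might look like follows; the layer sizes, sequence length, and data shapes are assumptions, not the contents of the gist:

```python
import numpy as np
import keras
from keras import layers

# Sketch of the suggested mitigation (assumed sizes, not the original gist):
# use a lower recurrent_dropout and pass the GRU output through an extra tanh layer.
seq_len, features = 100, 8  # assumed values
model = keras.Sequential([
    layers.Input(shape=(seq_len, features)),
    layers.GRU(32, recurrent_dropout=0.1),   # reduced from the problematic 0.5
    layers.Activation("tanh"),               # extra tanh "clamp" on the GRU output
    layers.Dense(1),
])
model.compile(optimizer="rmsprop", loss="mse")

x = np.random.random((64, seq_len, features)).astype("float32")
y = np.random.random((64, 1)).astype("float32")
model.fit(x, y, epochs=1, verbose=0)
```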
This seems more like a workaround than a solution to the original problem. Adding an extra layer with a tanh activation doesn't address the issue; it merely "hides" the enormous outputs from the GRU by squashing them back into the range of -1 to 1. The problem is that these values should already be in that range after the GRU, since tanh is already built into it. Mathematically, it shouldn't produce values like -2.5e25. The same behavior is expected from Keras as well.
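The bounded-output argument can be checked directly on a standalone GRU layer; here is a minimal sketch (unit count and input shapes are assumptions), comparing an inference-mode call with a training-mode call where recurrent_dropout is active:

```python
import numpy as np
import keras
from keras import layers

# Assumed shapes, just to probe the output range of a standalone GRU layer.
x = np.random.random((4, 200, 8)).astype("float32")

gru = layers.GRU(32, recurrent_dropout=0.5)

# Inference mode: dropout is off, so the output should stay within the tanh range.
out_infer = gru(x, training=False)
print(np.abs(keras.ops.convert_to_numpy(out_infer)).max())  # expected <= 1.0

# Training mode: recurrent_dropout is active; the reported bug is that the output
# can blow up to magnitudes like 1e25 even though tanh should bound it.
out_train = gru(x, training=True)
print(np.abs(keras.ops.convert_to_numpy(out_train)).max())
```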
@nokados,
The
What happens is that the To avoid this, you have to adapt the
Keras version: 3.5.0
Backend: TensorFlow 2.17.0
I encountered a strange bug when working with the GRU layer. If you create a simple model with a GRU layer and set recurrent_dropout=0.5, very strange behavior occurs: despite the tanh activation, the GRU produces very large values in the range of ±1e25, even though its output should be constrained to [-1, 1]. This results in an extremely large loss. I was unable to reproduce this behavior in Colab; there, either the loss becomes inf, or it behaves similarly to the longer sequence lengths. With a sequence length of 200, it throws an error.

Key points:

- The problem occurs with recurrent_dropout=0.5; it works fine with smaller recurrent_dropout values, such as 0.1.

Irrelevant factors:

- Optimizer: rmsprop was used; adam did not throw errors but resulted in loss = nan.
- dropout does not affect the issue.

I have prepared a minimal reproducible example in Colab. Here is the link: https://colab.research.google.com/drive/1msGuYB5E_eg_IIU_YK4cJcWrkEm3o0NL?usp=sharing.
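For reference, here is a rough sketch of the kind of setup described above; the layer sizes, data shapes, and training data are assumptions, not the contents of the Colab notebook:

```python
import numpy as np
import keras
from keras import layers

# Sketch of a minimal reproduction (assumed shapes and sizes, not the original notebook).
seq_len, features = 200, 8
x = np.random.random((64, seq_len, features)).astype("float32")
y = np.random.random((64, 1)).astype("float32")

model = keras.Sequential([
    layers.Input(shape=(seq_len, features)),
    layers.GRU(32, recurrent_dropout=0.5),  # the setting that triggers the reported behavior
    layers.Dense(1),
])
model.compile(optimizer="rmsprop", loss="mse")

# With recurrent_dropout=0.5 the reported outcome is a huge loss, nan, or an error;
# with recurrent_dropout=0.1 training reportedly behaves normally.
model.fit(x, y, epochs=2, batch_size=16)
```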