S4 Listops have nan loss #138
Comments
I ran into the same problem, and decreasing the learning rate by a factor of 10 did not solve it.
Same problem here. I am using a completely different dataset, for audio processing. I extracted the S4ND and S4 layers into a different neural network architecture and also got NaN after one epoch, because self.log_dt in SSKernelNPLR is NaN. This must have happened during backpropagation, because it is not updated otherwise (I believe)?
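One way to narrow down where the NaN first appears is to scan every parameter and gradient right after the backward pass. The sketch below is only an illustration: `find_nonfinite` and `ToyKernel` are hypothetical names, and `ToyKernel` is a toy stand-in with a `log_dt` parameter, not the real SSKernelNPLR.

```python
from typing import List

import torch
from torch import nn


def find_nonfinite(model: nn.Module) -> List[str]:
    """Return names of parameters whose values or gradients contain NaN/Inf."""
    bad = []
    for name, p in model.named_parameters():
        if not torch.isfinite(p).all():
            bad.append(f"{name} (value)")
        if p.grad is not None and not torch.isfinite(p.grad).all():
            bad.append(f"{name} (grad)")
    return bad


class ToyKernel(nn.Module):
    """Hypothetical stand-in with a log-step parameter; NOT the real SSKernelNPLR."""

    def __init__(self):
        super().__init__()
        self.log_dt = nn.Parameter(torch.tensor([-3.0, -2.0]))

    def forward(self, x):
        return (torch.exp(self.log_dt) * x).sum()


model = ToyKernel()
# A NaN in the input poisons the loss, and therefore the gradient of log_dt.
loss = model(torch.tensor([1.0, float("nan")]))
loss.backward()

# Check BEFORE optimizer.step(): the parameter value is still finite here,
# but its gradient is not, so the next update would write NaN into log_dt.
report = find_nonfinite(model)
print(report)
```

Calling a check like this before each optimizer step shows whether a gradient goes bad first (as suspected above) or whether the parameter value is already NaN. PyTorch's `torch.autograd.set_detect_anomaly(True)` can additionally point at the backward operation that produced the NaN, at the cost of slower training.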
Sorry for not responding to this. I don't know why this is happening. I haven't revisited these experiments in a long time, but I'm quite confident that they were reproducible in the past. Perhaps something has changed in the libraries, or perhaps there are some numerical issues on certain hardware.
Same problem. Here is a workaround that changes the SSKernelNPLR class:

    with torch.no_grad():
        # Increase the internal length if needed
        while rate * L > self.L:
            self.double_length()
        dt = torch.exp(self.log_dt) * rate
        B = _r2c(self.B)
        C = _r2c(self.C)
        P = _r2c(self.P)
        Q = P.conj() if self.Q is None else _r2c(self.Q)
        w = self._w()

I don't know whether this is detrimental to performance, but at least no NaN has been reported since.
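Assuming all of the lines above sit inside the torch.no_grad() block, the tensors computed there are detached from the autograd graph, so the parameters they come from receive no gradient through this path. That would explain why the NaN disappears, and it also means those parameters are no longer trained here. A minimal sketch of the effect on a toy parameter (the names `log_dt`, `w`, and `x` below are illustrative, not the library's actual tensors):

```python
import torch

log_dt = torch.nn.Parameter(torch.tensor([-3.0]))  # toy stand-in for self.log_dt
w = torch.nn.Parameter(torch.tensor([1.0]))        # some other trainable weight
x = torch.tensor([2.0])

# Normal path: dt is tracked, so log_dt receives a gradient.
dt = torch.exp(log_dt)
(dt * x * w).sum().backward()
grad_tracked = log_dt.grad.clone()

# Reset gradients before the second experiment.
log_dt.grad = None
w.grad = None

# Workaround path: dt is computed under no_grad and is a plain constant.
with torch.no_grad():
    dt_frozen = torch.exp(log_dt)
(dt_frozen * x * w).sum().backward()

print(grad_tracked)             # a finite, non-zero gradient
print(dt_frozen.requires_grad)  # False
print(log_dt.grad)              # None: log_dt gets no update from this path
print(w.grad)                   # still populated: other weights keep training
```

Under that assumption, the workaround trades the NaN for a kernel whose affected parameters stay at their initial values, which may or may not cost accuracy on a given task.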
I have the same problem. @icannotnamemyself, could you comment a bit more on your solution? I am not sure I understand the reasoning, or where exactly to make the modifications.
First of all, thank you for the comprehensive code base for all variants of S4 models.
However, when I try to run the Listops experiments with S4 (HYYT version), the train, test and val losses all become NaN after 1 epoch.
I ran the following script:
python -m train experiment=lra/s4-listops wandb=null
The final accuracy is also way below the reported accuracy (train=0.17).
Is there something that I have done wrong?