KL-loss is too large #5
Comments
The exact cause of the issue is unclear given the limited information. Based on past experience, I suggest investigating the following potential causes:

- initialization
- learning rate
- data preprocessing

I also recommend setting breakpoints to inspect the outputs at critical junctures, particularly between lines 201 and 215.
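If it helps, here is a minimal, purely illustrative way to inspect those intermediate tensors without a debugger; the variable names `mu_dul` and `variance_dul` are taken from the snippet quoted later in this thread, and exactly where to call it is an assumption:

```python
def log_stats(name, t):
    """Print basic statistics of a tensor (hypothetical debugging helper)."""
    t = t.detach()  # keep logging out of the autograd graph
    print(f"{name}: shape={tuple(t.shape)} "
          f"min={t.min().item():.4f} max={t.max().item():.4f} mean={t.mean().item():.4f}")

# Assumed usage inside the forward pass, right where the tensors are produced:
# log_stats("mu_dul", mu_dul)
# log_stats("variance_dul", variance_dul)
```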
Based on your suggestions, I've checked the possible problems such as initialization, learning rate, and data preprocessing, and they were all fine. But the dimension of variance_dul is 512 (the outputs of mu_dul and variance_dul are below), so the KL loss calculated according to the formula can easily exceed 100, right? I wonder why your KL loss is so small. Maybe the range of my mu_dul and variance_dul is abnormal? Thank you for your patience!

```python
loss_kl = ((variance_dul + mu_dul ** 2 - torch.log(variance_dul) - 1) * 0.5).sum(dim=-1).mean()
```
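As an aside (not from the repository): with a 512-dimensional embedding, a sum-reduced diagonal-Gaussian KL grows roughly linearly with the dimension, so values of a few hundred early in training are plausible. A small sanity check under assumed, roughly unit-scale statistics:

```python
import torch

torch.manual_seed(0)
batch, dim = 32, 512
mu = torch.randn(batch, dim)         # assumed roughly unit-scale means
var = torch.rand(batch, dim) + 0.5   # assumed variances around 0.5-1.5

# Per-dimension KL of N(mu, var) against N(0, I), as in the formula above.
kl_per_dim = (var + mu ** 2 - torch.log(var) - 1) * 0.5
kl_sum = kl_per_dim.sum(dim=-1).mean()    # sum over dims, average over batch
kl_mean = kl_per_dim.mean(dim=-1).mean()  # also average over dims

print(f"sum over {dim} dims:  {kl_sum.item():.2f}")   # on the order of a few hundred
print(f"mean over {dim} dims: {kl_mean.item():.4f}")  # roughly kl_sum / dim
```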
I've looked over the code and the original paper. The big difference in our KL loss values could be because of using sum versus mean over the embedding dimensions. Honestly, I'm not sure which one I used in my final experiments; I don't remember. The paper doesn't say whether the loss should be summed or averaged over the mean and variance vectors. I think using mean might be better, so you can try changing that line to average over the last dimension instead of summing. Honestly, both summing and averaging should be fine, as long as the trend in the results makes sense.
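A minimal sketch of that change, based on the asker's snippet above rather than the repository's actual code (the helper name and signature are made up for illustration):

```python
import torch

def kl_loss(mu_dul: torch.Tensor, variance_dul: torch.Tensor, reduce: str = "mean") -> torch.Tensor:
    """KL of a diagonal Gaussian N(mu, var) against N(0, I); hypothetical helper."""
    per_dim = (variance_dul + mu_dul ** 2 - torch.log(variance_dul) - 1) * 0.5
    # "sum" reproduces the line quoted above; "mean" also averages over the embedding dimension.
    per_sample = per_dim.sum(dim=-1) if reduce == "sum" else per_dim.mean(dim=-1)
    return per_sample.mean()  # average over the batch in both cases

# kl_loss(mu_dul, variance_dul, reduce="mean")
```

With `reduce="mean"` the reported value is the summed version divided by the embedding size (512 here), so a value around 400 would drop below 1; the gradient direction is unchanged up to that constant factor.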
Thanks for your great work!
I noticed the KL-loss in your log is about 4. But when I run this code, the KL-loss is far too large: it is as high as 400 at the beginning of training. That doesn't look normal, but I haven't figured out why. Do you know a possible reason? Thanks a lot!