
KL-loss is too large #5

Open
zerooooone opened this issue Dec 5, 2023 · 3 comments

Comments

@zerooooone

Thanks for your great work!
I noticed the KL loss in your log is about 4, but when I run this code the KL loss is far larger, reaching about 400 at the beginning! That doesn't seem normal, and I haven't figured out why. Do you know the possible reason? Thanks a lot!

[Screenshot from 2023-12-05 15-44-42]

[Screenshot from 2023-12-05 15-45-20]

@MouxiaoHuang
Owner

Based on past experience, I suggest investigating the following potential causes:

  1. Initialization: Use Xavier or Kaiming initialization so the parameters start in a reasonable range.
  2. Learning Rate: Try reducing the learning rate; for debugging, you could even temporarily set it to zero.
  3. Data Preprocessing: Verify that the input data is correctly preprocessed and suitable for the model.
  4. Numerical Stability: If variance_dul approaches zero, use torch.log(variance_dul + 1e-8) to avoid instability.

It's hard to pin down the exact cause with the limited information, but I recommend setting breakpoints to inspect the outputs at critical points, particularly between lines 201 and 215 in train_dul.py; that may provide further insight. A minimal sketch of what to check follows.
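To make point 4 and the breakpoint suggestion concrete, here is a minimal sketch of a numerically stabilized KL term plus the kind of statistics worth printing at a breakpoint. The names mu_dul and variance_dul follow the repo's convention, but the shapes and random values below are placeholder assumptions, not actual model outputs:

```python
import torch

def kl_standard_normal(mu_dul: torch.Tensor,
                       variance_dul: torch.Tensor,
                       eps: float = 1e-8) -> torch.Tensor:
    """KL(N(mu, sigma^2) || N(0, 1)), summed over embedding dims, averaged over the batch.

    Adding eps inside the log keeps the loss finite if variance_dul collapses toward zero.
    """
    kl_per_dim = 0.5 * (variance_dul + mu_dul ** 2 - torch.log(variance_dul + eps) - 1)
    return kl_per_dim.sum(dim=-1).mean()

# Things worth checking at a breakpoint: variance_dul must be strictly positive,
# and neither tensor should have an extreme range right after initialization.
mu_dul = torch.randn(8, 512) * 0.1             # placeholder batch of predicted means
variance_dul = torch.rand(8, 512) * 0.5 + 0.5  # placeholder strictly positive variances
print("mu_dul  min/max :", mu_dul.min().item(), mu_dul.max().item())
print("var_dul min/max :", variance_dul.min().item(), variance_dul.max().item())
print("KL loss         :", kl_standard_normal(mu_dul, variance_dul).item())
```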

@zerooooone
Author

Based on your suggestions, I've checked the possible problems (initialization, learning rate, data preprocessing), and they all look fine.

But the dimension of variance_dul is 512 (the outputs of mu_dul and variance_dul are shown below), so the KL loss calculated with that formula can easily exceed 100, right? I wonder why your KL loss is so small. Maybe the range of my mu_dul and variance_dul is abnormal? Thank you for your patience!

loss_kl = ((variance_dul + mu_dul ** 2 - torch.log(variance_dul) - 1) * 0.5).sum(dim=-1).mean()

[image: mu_dul output]

[image2: variance_dul output]
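To make the scale argument concrete, here is a rough back-of-the-envelope check (the per-dimension value is purely illustrative, not taken from my run):

```python
dim = 512
kl_per_dim = 0.8           # illustrative per-dimension KL, not a measured value
print(kl_per_dim * dim)    # 409.6 -> summing over 512 dims already gives ~400
print(kl_per_dim)          # averaging instead would report a value below 1
```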

@MouxiaoHuang
Owner

I've looked over the code and the original paper. The large difference between our KL loss values is most likely due to whether the per-dimension terms are summed or averaged. Honestly, I'm not sure which one I used in my final experiments; I've forgotten.

The paper doesn't specify whether the loss should be summed or averaged over the mean and variance vectors. I think using the mean might be better, so you can try changing the line to loss_kl = ((variance_dul + mu_dul ** 2 - torch.log(variance_dul + 1e-8) - 1) * 0.5).mean().

In practice, both summing and averaging should be fine, as long as the loss trends in a sensible direction during training.
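For reference, a small self-contained sketch of the two reductions on random placeholder tensors, just to show that they differ by a factor of the embedding dimension (512), with the same eps as above:

```python
import torch

torch.manual_seed(0)
mu_dul = torch.randn(32, 512) * 0.5        # placeholder predicted means, shape (B, 512)
variance_dul = torch.rand(32, 512) + 0.1   # placeholder strictly positive variances

kl_per_dim = (variance_dul + mu_dul ** 2 - torch.log(variance_dul + 1e-8) - 1) * 0.5

loss_sum = kl_per_dim.sum(dim=-1).mean()   # sum over the 512 dims, then mean over the batch
loss_mean = kl_per_dim.mean()              # mean over every element

print(loss_sum.item(), loss_mean.item())   # loss_sum is exactly 512x loss_mean
```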
