
KL-loss is too large #5

Open
zerooooone opened this issue Dec 5, 2023 · 3 comments

Comments

@zerooooone

Thanks for your great work!
I noticed the KL loss in your log is about 4, but when I run this code the KL loss is far larger, reaching about 400 at the beginning! That doesn't seem normal, and I haven't figured out why. Do you know the possible reason? Thanks a lot!

[Screenshot from 2023-12-05 15-44-42]

[Screenshot from 2023-12-05 15-45-20]

@MouxiaoHuang
Owner

Based on past experience, I suggest investigating the following potential causes:

  1. Initialization: Use Xavier or Kaiming initialization so the parameters start in a reasonable range.
  2. Learning Rate: Try reducing the learning rate; for debugging, you could even temporarily set it to zero.
  3. Data Preprocessing: Verify that the input data is correctly preprocessed and suitable for the model.
  4. Numerical Stability: If variance_dul approaches zero, use torch.log(variance_dul + 1e-8) to avoid instability.

It's hard to pin down the exact cause with the limited information, but I recommend setting breakpoints to inspect the outputs at critical points, particularly between lines 201 and 215 in train_dul.py; that may provide further insight. A minimal sketch of what to check follows.
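To make point 4 and the breakpoint suggestion concrete, here is a minimal sketch of a numerically stabilized KL term plus the kind of statistics worth printing at a breakpoint. The names mu_dul and variance_dul follow the repo's convention, but the shapes and random values below are placeholder assumptions, not actual model outputs:

```python
import torch

def kl_standard_normal(mu_dul: torch.Tensor,
                       variance_dul: torch.Tensor,
                       eps: float = 1e-8) -> torch.Tensor:
    """KL(N(mu, sigma^2) || N(0, 1)), summed over embedding dims, averaged over the batch.

    Adding eps inside the log keeps the loss finite if variance_dul collapses toward zero.
    """
    kl_per_dim = 0.5 * (variance_dul + mu_dul ** 2 - torch.log(variance_dul + eps) - 1)
    return kl_per_dim.sum(dim=-1).mean()

# Things worth checking at a breakpoint: variance_dul must be strictly positive,
# and neither tensor should have an extreme range right after initialization.
mu_dul = torch.randn(8, 512) * 0.1             # placeholder batch of predicted means
variance_dul = torch.rand(8, 512) * 0.5 + 0.5  # placeholder strictly positive variances
print("mu_dul  min/max :", mu_dul.min().item(), mu_dul.max().item())
print("var_dul min/max :", variance_dul.min().item(), variance_dul.max().item())
print("KL loss         :", kl_standard_normal(mu_dul, variance_dul).item())
```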

@zerooooone
Author

Based on your suggestions, I've checked the possible problems (initialization, learning rate, data preprocessing), and they all look fine.

But the dimension of variance_dul is 512 (the outputs of mu_dul and variance_dul are shown below), so the KL loss calculated with that formula can easily exceed 100, right? I wonder why your KL loss is so small. Maybe the range of my mu_dul and variance_dul is abnormal? Thank you for your patience!

loss_kl = ((variance_dul + mu_dul ** 2 - torch.log(variance_dul) - 1) * 0.5).sum(dim=-1).mean()

[image: mu_dul output]

[image2: variance_dul output]
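To make the scale argument concrete, here is a rough back-of-the-envelope check (the per-dimension value is purely illustrative, not taken from my run):

```python
dim = 512
kl_per_dim = 0.8           # illustrative per-dimension KL, not a measured value
print(kl_per_dim * dim)    # 409.6 -> summing over 512 dims already gives ~400
print(kl_per_dim)          # averaging instead would report a value below 1
```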

@MouxiaoHuang
Owner

I've looked over the code and the original paper. The large difference between our KL loss values is most likely due to whether the per-dimension terms are summed or averaged. Honestly, I'm not sure which one I used in my final experiments; I've forgotten.

The paper doesn't specify whether the loss should be summed or averaged over the mean and variance vectors. I think using the mean might be better, so you can try changing the line to loss_kl = ((variance_dul + mu_dul ** 2 - torch.log(variance_dul + 1e-8) - 1) * 0.5).mean().

In practice, both summing and averaging should be fine, as long as the loss trends in a sensible direction during training.
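For reference, a small self-contained sketch of the two reductions on random placeholder tensors, just to show that they differ by a factor of the embedding dimension (512), with the same eps as above:

```python
import torch

torch.manual_seed(0)
mu_dul = torch.randn(32, 512) * 0.5        # placeholder predicted means, shape (B, 512)
variance_dul = torch.rand(32, 512) + 0.1   # placeholder strictly positive variances

kl_per_dim = (variance_dul + mu_dul ** 2 - torch.log(variance_dul + 1e-8) - 1) * 0.5

loss_sum = kl_per_dim.sum(dim=-1).mean()   # sum over the 512 dims, then mean over the batch
loss_mean = kl_per_dim.mean()              # mean over every element

print(loss_sum.item(), loss_mean.item())   # loss_sum is exactly 512x loss_mean
```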
