is NLL correct? #5
Comments
Maybe related to this: the magnitude of the trainable parameters. The preprint's loss (equation 6) includes an L2 regularisation term on the trainable parameters. I may be overlooking something, but I cannot find where this is applied in the code.
@psteinb Tip: L2 normalization is good for the invertible 1x1 convolution (inv1x1conv), which is discussed in OpenAI's Glow: openai/glow#40 (comment)
Thanks for the hint. I must confess that I think I understand the math, but maybe I am just too new to PyTorch and FrEIA. Can you or @ardizzone et al. give me a direct hint/pointer to where in this repo or its dependencies this L2 weight regularisation is performed? I'd appreciate that.
@ardizzone You should look at the hyper-parameter weight_decay in the Adam optimizer; see the explanation of weight decay in the documentation. (But note that L2 regularization is good for the kernel parameters, not for the biases...)
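For illustration, a minimal sketch of where weight_decay enters PyTorch's Adam optimizer and how biases could be excluded via parameter groups; the model here is a placeholder, not code from this repo:

```python
import torch

# placeholder model, only to show where weight_decay is configured
model = torch.nn.Linear(10, 10)

# weight_decay adds an L2-style penalty inside Adam's update step; it applies
# to every parameter in a group, so biases need their own group if they
# should not be decayed
optimizer = torch.optim.Adam(
    [
        {"params": [model.weight], "weight_decay": 1e-5},  # decay kernel weights
        {"params": [model.bias], "weight_decay": 0.0},     # leave biases untouched
    ],
    lr=1e-3,
)
```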
Alright, triggered by your reply I looked into this a bit. As mentioned, please excuse my novice expertise with PyTorch and feel free to correct me at any point. If I understand correctly, the L2 regularization term mentioned in equation (6) of the preprint accompanying this repo is assumed to be baked into lines like https://github.com/VLL-HD/conditional_invertible_neural_networks/blob/45dc7250ebfecf10d1a278edafde0fe899f30aa1/colorization_cINN/model.py#L249
But I am starting to believe that this assumption does not hold and that Adam's weight_decay should rather be used with caution.
So this supports my notion expressed above that the code does not do what the paper promises, i.e. perform L2 regularisation on the weights in the loss term. However, the only thing I can suggest to mitigate this is to add the L2 regularisation explicitly to the loss term, e.g. here https://github.com/VLL-HD/conditional_invertible_neural_networks/blob/45dc7250ebfecf10d1a278edafde0fe899f30aa1/colorization_cINN/train.py#L67
Or you use plain ...
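As a sketch of the suggestion above, the L2 penalty from equation (6) could be added explicitly to the loss instead of relying on Adam's weight_decay; the function name, argument names, and l2_lambda below are placeholders, not variables from this repo:

```python
import torch

def nll_with_l2(z, log_jac, params, l2_lambda=1e-5):
    # standard flow NLL per sample (up to an additive constant):
    # 0.5 * ||z||^2 - log|det J|
    nll = torch.mean(0.5 * torch.sum(z ** 2, dim=1) - log_jac)
    # explicit L2 regularisation over the trainable parameters, as in Eq. (6)
    l2 = sum(torch.sum(p ** 2) for p in params)
    return nll + l2_lambda * l2
```

It would be called along the lines of `loss = nll_with_l2(z, jac, [p for p in model.parameters() if p.requires_grad])` before `loss.backward()`.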
Yeah, you are correct. Adam is a black box for me, too, and we would have to read through all of its release notes, which I don't recommend. Alternatively: in OpenAI's Glow and many other flow-based model implementations, they don't use L2 regularization on all of the training parameters.
Hello, I wonder whether your NLL is correct.
In
https://github.com/VLL-HD/conditional_invertible_neural_networks/blob/master/mnist_minimal_example/eval.py#L44-L45
I think your z is the latent variable and jac is the log-det-Jacobian of the normalizing flow.
But I think you forgot the transformation cost for converting a discrete image into a continuous one.
In RealNVP, the author calculates this cost image by image:
https://github.com/tensorflow/models/blob/master/research/real_nvp/real_nvp_multiscale_dataset.py#L1063-L1077 (this cost function can be found in Glow and Flow++, too)
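For reference, a minimal sketch of what such a correction could look like, assuming 8-bit images scaled to [0, 1] and dequantized with uniform noise (the usual RealNVP/Glow convention); the function name and arguments are placeholders, not code from this repo:

```python
import math
import torch

def bits_per_dim(z, log_jac, num_dims, num_levels=256):
    # z:       latent codes, shape (batch, num_dims)
    # log_jac: log|det dz/dx| per sample, shape (batch,)
    # NLL of the continuous model in nats, under a standard-normal prior on z
    nll_nats = (0.5 * torch.sum(z ** 2, dim=1)
                + 0.5 * num_dims * math.log(2 * math.pi)
                - log_jac)
    # quantisation correction: each pixel came from a bin of width 1/num_levels,
    # so log P(discrete x) ~ log p(continuous x) - num_dims * log(num_levels)
    nll_nats = nll_nats + num_dims * math.log(num_levels)
    # convert nats per image to bits per dimension
    return nll_nats / (num_dims * math.log(2.0))
```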