About the gauge warning in svd AD #139
When running, I sometimes encounter a warning from the SVD reverse rule about gauge-dependent cotangents.

My understanding is that this warning indicates that the gradients (cotangents) produced by the SVD operation are extremely sensitive to the choice of gauge, due to the complex phase freedom of the singular vectors. Is it correct to assume that (near-)degeneracy of the singular values is also responsible for this gauge sensitivity? How can one work around this problem?
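To make the gauge freedom concrete, here is a small sketch using plain `LinearAlgebra` (illustrative only, not PEPSKit or TensorKit code): each singular vector pair can be rotated by a common phase without changing the decomposed matrix, so any cotangent component along these phase directions is not determined by the matrix itself.

```julia
using LinearAlgebra

A = randn(ComplexF64, 4, 4)
F = svd(A)

# Random diagonal phase rotation Λ = diag(exp(iφ₁), …, exp(iφ₄))
Λ = Diagonal(exp.(im .* 2π .* rand(4)))

# Rotating U and V by the same phases yields an equally valid SVD of A,
# since Λ commutes with the diagonal S and Λ * Λ' == I.
U′ = F.U * Λ
V′ = F.V * Λ
@assert U′ * Diagonal(F.S) * V′' ≈ A
```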
I'm not yet sure of the exact origin of this gauge sensitivity. I also suspect that it has to do with the (near-)degeneracy of the singular values. Typically, you work around this by applying Lorentzian broadening to the singular value differences in the SVD reverse rule. Currently, the TensorKit and KrylovKit SVD reverse rules do not support this broadening yet. For smaller gauge sensitivities I could imagine that they are not particularly problematic and might be projected out later in the backpropagation.

I've been wanting to investigate this further for a while now, since these gauge warnings really pop up in many cases. I have a custom SVD reverse rule which supports broadening, so perhaps we can implement that as a workaround and see if that improves things. @lkdvos do you have any ideas on how to incorporate Lorentzian broadening into the TensorKit/KrylovKit adjoints? I would have to think about how to do that for KrylovKit's vector-wise formulation, and also in the Arnoldi case.
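For reference, the standard Lorentzian broadening replaces the divergent inverse of the singular value differences in the adjoint by a regularized version that is bounded by $1/(2\varepsilon)$. A minimal plain-matrix sketch of that idea (the function names and the $F_{ij} = 1/(s_j^2 - s_i^2)$ sign convention are illustrative, not the custom reverse rule mentioned above):

```julia
using LinearAlgebra

# Broadened inverse: 1/x → x / (x² + ε²), bounded by 1/(2ε) as x → 0.
lorentzian_inv(x, ε) = x / (x^2 + ε^2)

# Off-diagonal F matrix entering the SVD adjoint, with broadening applied
# to the singular value differences.
function broadened_F(S::AbstractVector{<:Real}, ε::Real)
    k = length(S)
    F = zeros(k, k)
    for i in 1:k, j in 1:k
        i == j && continue          # the diagonal does not enter the adjoint
        F[i, j] = lorentzian_inv(S[j]^2 - S[i]^2, ε)
    end
    return F
end

S = [1.0, 1.0 + 1e-9, 0.5]
broadened_F(S, 0.0)    # unregularized: entries of order 5e8 for the near-degenerate pair
broadened_F(S, 1e-6)   # broadened: those entries are capped at about 1/(2ε) = 5e5
```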
I think if there are degeneracies in the spectrum, this warning will always fire, since then even our mathematical assumptions are wrong and I'm not sure the SVD rrule implementation is expected to work at all. Otherwise, this warning indeed means that the cost function depends on the choice of gauge, which should not happen. However, the actual implementation simply projects out this contribution, so this might not necessarily be a problem.

It also depends a lot on the configuration of all the algorithms. For example, if we are doing linear solves with random initial guesses, I think we are indeed giving it components that have contributions along these "gauge directions", but we really do want to project them out, so that would be completely okay, although it might lead to some loss of stability because of finite-precision effects. I'm a bit confused about the Lorentzian broadening helping here though, as that is typically designed to resolve the problem that changes in the smallest singular values get blown up by the $1/(s_i^2 - s_j^2)$ factors in the adjoint. Of course, the entire PEPS stack is riddled with things that seem to be sensitive to numerical stability issues, so it is not unreasonable that one might affect the other.

Practically though, this whole SVD thing has been something that I should have done for quite a while, and I just don't find the time to get to it. With some novel changes in TensorKit coming, based around […]
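To spell out what "projecting out" means in the plain-matrix case: a phase rotation $U \to U e^{i\Phi}$, $V \to V e^{i\Phi}$ leaves $A$ invariant, so a gauge-invariant cost function must have SVD cotangents satisfying $\mathrm{Im}\,\mathrm{diag}(U^\dagger \Delta U + V^\dagger \Delta V) = 0$. A sketch of measuring and removing that component (this follows the standard derivation; it is not the exact TensorKit/PEPSKit code):

```julia
using LinearAlgebra

# Per-singular-pair gauge components of the cotangents (ΔU, ΔV).
gauge_components(U, V, ΔU, ΔV) = imag.(diag(U' * ΔU) .+ diag(V' * ΔV))

# Size of the gauge-dependent part; a gauge warning corresponds to this
# exceeding some tolerance.
gauge_dependence(U, V, ΔU, ΔV) = norm(gauge_components(U, V, ΔU, ΔV))

# Remove the gauge-dependent component (assumes thin SVD: U'U == V'V == I).
function project_gauge(U, V, ΔU, ΔV)
    c = gauge_components(U, V, ΔU, ΔV)
    return ΔU - U * Diagonal(im .* c ./ 2), ΔV - V * Diagonal(im .* c ./ 2)
end
```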
I am also quite unsure about this. Certainly, for exact degeneracies the math of the SVD adjoint breaks down, but quasi-degeneracies may be equally problematic. For those we wouldn't get a warning in the CTMRG forward pass (we only check for essentially exact degeneracies), but they might still lead to instabilities in the reverse pass, and hence to issues further down the PEPS stack. In any case, we shouldn't forget about the Lorentzian broadening, because it seems that it really is a necessary thing to have in some cases.
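A quick numerical illustration of how a quasi-degeneracy can pass a forward-pass check while still blowing up the adjoint (the tolerance here is made up for illustration):

```julia
s = [1.0, 1.0 + 1e-9]             # quasi-degenerate, but not exactly equal
abs(s[1] - s[2]) < 1e-12          # false: an exact-degeneracy check does not fire
1 / (s[2]^2 - s[1]^2)             # ≈ 5e8: the adjoint factor is enormous regardless
```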
I wouldn't stress it :-) But perhaps in the meantime we can try to get a better grasp on where these gauge dependencies really come up and if/when they are relevant. I also find it hard to find time for these things currently, but perhaps I can allocate some time next week.
Many thanks for the helpful discussions. After several tests, I've gained some insight into these warnings. In line with the discussions above, my findings suggest that nearly degenerate singular values are not the primary cause, and the broadening may not help resolve the issue. I think the problem lies in the construction of PEPS states with specific symmetry constraints in my tests. My trial states are classified by both SU(2) and point-group symmetries, and some of them become highly specialized and consequently more fragile during optimization, resulting in potentially large gradients. I found that initializing from either random states or physically reasonable states significantly reduces the frequency of warnings. Although warnings still occasionally appear, their magnitudes are typically close to the specified tolerance value.
Thanks a lot for reporting!