
About the gauge warning in svd AD #139

Open
Confusio opened this issue Feb 22, 2025 · 5 comments
@Confusio

When running, I sometimes encounter the following warning:

Warning: `svd` cotangents sensitive to gauge choice: (|Δgauge| = 1.1322835304708984e10)

My understanding is that this warning indicates that the gradients (cotangents) produced by the SVD operation are extremely sensitive to the choice of gauge, due to the phase freedom of the complex SVD. Is it correct to assume that near-degeneracy (or degeneracy) of the singular values is also responsible for this gauge sensitivity? How can I work around this problem?

@pbrehmer (Collaborator)

I'm not yet sure of the exact origins of this gauge sensitivity. I also suspect that it has to do with the (near-)degeneracy of the singular values. Typically, one works around this by applying Lorentzian broadening to the singular value differences in the SVD reverse-rule. Currently, the TensorKit and KrylovKit SVD reverse-rules do not yet support this broadening. I could imagine that smaller gauge sensitivities are not too problematic and might be projected out later in the backpropagation.
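For reference, here is a minimal sketch of the kind of broadening I mean, written for a plain-matrix SVD reverse-rule (the function name and the placement of ε are illustrative, not the TensorKit/KrylovKit API): the divergent factor 1 / (s_j^2 - s_i^2) is replaced by the Lorentzian-regularized x / (x^2 + ε^2).

```julia
# Hypothetical sketch of Lorentzian broadening in an SVD reverse-rule.
# The bare factor F[i, j] = 1 / (s[j]^2 - s[i]^2) diverges for
# (near-)degenerate singular values; the broadened version
# x / (x^2 + ε^2) agrees with 1 / x whenever |x| >> ε.
function broadened_inv_differences(s::AbstractVector{<:Real}, ε::Real)
    n = length(s)
    F = zeros(float(eltype(s)), n, n)
    for j in 1:n, i in 1:n
        i == j && continue               # diagonal entries are unused
        x = s[j]^2 - s[i]^2
        F[i, j] = x / (x^2 + ε^2)        # Lorentzian-broadened 1 / x
    end
    return F
end
```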

I have wanted to investigate this further for a while now, since these gauge warnings really pop up in many cases. I have a custom SVD reverse-rule which supports broadening, so perhaps we can implement that as a workaround and see if it improves things.

@lkdvos Do you have any ideas on how to incorporate Lorentzian broadening into the TensorKit/KrylovKit adjoints? I would have to think about how to do that for KrylovKit's vector-wise formulation and also in the Arnoldi case.

@lkdvos (Member) commented Feb 23, 2025

I think if there are degeneracies in the spectrum, this warning will always fire, since then even our mathematical assumptions break down and I'm not sure the SVD rrule implementation is expected to work at all. Otherwise, this warning indeed means that the cost function depends on the choice of gauge, which should not happen. However, the actual implementation simply projects out this contribution, so this might not necessarily be a problem.
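For concreteness, here is a rough sketch of what such a check could look like on plain matrices, assuming the convention from the complex-SVD AD literature that a cotangent (ΔU, ΔV) is gauge invariant iff the imaginary diagonal of U' * ΔU + V' * ΔV vanishes; the names are illustrative, not the actual implementation:

```julia
using LinearAlgebra

# Illustrative only: the gauge-dependent component of an SVD cotangent
# (ΔU, ΔV), following the usual convention that gauge invariance requires
# Im(diag(U'ΔU + V'ΔV)) = 0. The warning would report the norm of this
# component as |Δgauge|.
gauge_component(U, V, ΔU, ΔV) = imag.(diag(U' * ΔU) .+ diag(V' * ΔV))

Δgauge_norm(U, V, ΔU, ΔV) = norm(gauge_component(U, V, ΔU, ΔV))
```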

It also depends a lot on the configuration of all the algorithms. For example, if we are doing linear solves with random initial guesses, I think we are indeed feeding in components that have contributions along these "gauge directions", but we really do want to project them out, so that would be completely okay, although it might lead to some loss of stability because of finite-precision effects.

I'm a bit confused about the Lorentzian broadening helping here though, as that is typically designed to resolve the problem that changes in the smallest singular values get blown up by the 1 / (s_i^2 - s_j^2) term. This does not actually alter the fact that your cost function depends on the gauge, and you can easily trigger this even with a completely well-behaved SVD: `f(x) = tr(tsvd(x)[1])` already has this problem, and is simply not a differentiable cost function since it is completely discontinuous: eps-sized changes in x can cause large changes in the phases, so the derivative is ill-defined.
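As a self-contained illustration of this phase freedom (using plain LinearAlgebra's dense `svd` rather than `tsvd`): multiplying U and V by the same diagonal phases leaves the decomposition intact but changes tr(U), so that cost function cannot be well-defined.

```julia
using LinearAlgebra

A = randn(ComplexF64, 4, 4)
U, S, V = svd(A)

# Any diagonal unitary Λ gives an equally valid SVD: (UΛ) S (VΛ)' == U S V'.
Λ = Diagonal(exp.(2π * im .* rand(4)))
U2, V2 = U * Λ, V * Λ

@show norm(U2 * Diagonal(S) * V2' - A)   # ≈ 0: same decomposition of A
@show abs(tr(U) - tr(U2))                # generically nonzero: tr(U) is gauge-dependent
```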

Of course, the entire PEPS stack is riddled with things that seem to be sensitive to numerical stability issues, so it is not unreasonable that one might affect the other.

Practically though, this whole SVD thing has been something that I should have done for quite a while, and I just don't find the time to get to it. With some novel changes in TensorKit coming, based around MatrixAlgebraKit, it also does not seem likely that I will find the time to get to it before that is fleshed out.

@pbrehmer (Collaborator)

> I'm a bit confused about the Lorentzian broadening helping here though, as that is typically designed to resolve the problem that changes in the smallest singular values get blown up by the 1 / (s_i^2 - s_j^2) term. This does not actually alter the fact that your cost function depends on the gauge

I am also quite unsure about this. Certainly, for exact degeneracies the math of the SVD adjoint breaks down, but quasi-degeneracies may be equally problematic. For those we wouldn't get a warning in the CTMRG forward pass (we only check for nearly exact degeneracies), but they might still lead to instabilities in the reverse pass, and hence to issues further down the PEPS stack. In any case, we shouldn't forget about the Lorentzian broadening, because it seems that it really is a necessary thing to have in some cases.
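A quick back-of-the-envelope example of why quasi-degeneracies can be harmless in the forward pass yet dangerous in the reverse pass (the broadening parameter ε here is just an illustrative choice):

```julia
# Two quasi-degenerate singular values that would pass a forward-pass
# degeneracy check, but inflate the adjoint factor 1 / (s_j^2 - s_i^2):
s₁, s₂ = 1.0, 1.0 - 1e-8
x = s₁^2 - s₂^2                  # ≈ 2e-8

@show 1 / x                      # ≈ 5e7: bare adjoint factor
ε = 1e-6
@show x / (x^2 + ε^2)            # ≈ 2e4: Lorentzian-broadened factor
```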

> Practically though, this whole SVD thing has been something that I should have done for quite a while, and I just don't find the time to get to it. With some novel changes in TensorKit coming, based around MatrixAlgebraKit, it also does not seem likely that I will find the time to get to it before that is fleshed out.

I wouldn't stress about it :-) But perhaps in the meantime we can try to get a better grasp on where these gauge dependencies really come from and if/when they are relevant. I also find it hard to find time for these things currently, but perhaps I can allocate some time next week.

@Confusio (Author)

Many thanks for the helpful discussions. After several tests, I've gained some insight into these warnings. In line with the discussion above, my findings suggest that nearly degenerate singular values are not the primary cause, and that the broadening may not help resolve the issue.

I think the problem lies in the construction of PEPS states with specific symmetry constraints in my tests. My trial states are classified by both SU(2) and point-group symmetries, and some of them become highly specialized and consequently more fragile during optimization, resulting in potentially large gradients.

I found that initializing from either random states or physically reasonable states significantly reduces the frequency of the warnings. Although warnings still occasionally appear, their magnitudes are typically close to the specified tolerance value `tol`, of order 1e-10 or smaller. In short, this appears to be a physical issue inherent to the symmetry-constrained parameter space.

@pbrehmer (Collaborator)

Thanks a lot for reporting!
