Problems with CG_METHOD=0 [encountered in many different use cases] #3
Comments
Here we start to unify and further develop the recent modifications to the fitting routines. More specifically, we start re-introducing the old distance definition

   \chi_{ij} = \sum_{iw} |FG(iw)_{ij} - FG(iw)_{ij}^Anderson|^q / W(iw),
   \chi = \sum_{ij} \chi_{ij}

i.e. a generalized chi-square, computed element by element and weighted with various schemes /on the matsubara axis/. We plan to extend this definition to include weighting on the matrix structure, too (more later...). This had been replaced with a global matricial norm (Frobenius) within commit 3ad6af7; we keep the Frobenius distance available through a new input flag:

   > CG_NORM={frobenius,elemental}

————————————————————————————————————————————————————————————————————————
RATIONALE

The reason for bringing back the old elemental distance is rooted in the need for flexibility in defining different weights for different matrix elements. A previous attempt at this can be found in the (unmerged) 'weighted_fit' branch, look here:

   > https://github.com/QcmPlab/CDMFT-LANC-ED/commits/weighted_fit

Hence we define a generic infrastructure to assign weights on an element-by-element basis, as

   \chi_{ij} = \sum_{iw} |FG(iw)_{ij} - FG(iw)_{ij}^Anderson|^q / Wmats(iw),
   \chi = \sum_{ij} \chi_{ij} / Wmtrx_{ij}

For now two choices are available for Wmtrx:

• CG_MATRIX=0, giving equal weights to all components ('flat')
• CG_MATRIX=1, normalizing on the total spectral weight ('spectral')

> More specifically, the spectral option defines

   Wmtrx_{ij} = - \sum_{iw} Im[FG(iw)_{ij}] / beta = ∫ A_{ij}(iw) diw
              = W_{diag} δ_{ij} + W_{off-diag} (1 - δ_{ij})

where in general we expect W_{off-diag} << W_{diag}, which makes the rationale behind this weighting choice abundantly clear.

————————————————————————————————————————————————————————————————————————
NOTES (in no particular order)

• The actual value of W_{diag} is ≈1d0 for the Weiss field (we sum over all the matsubara frequencies, not only the first Lfit ones). This way we can easily ensure the same normalization of the chi values when switching to the 'flat' matricial weights, thus allowing easier debugging & testing.

• The actual value of W_{diag} is NOT ≈1d0 for the bath hybridization: recall that ∆ = (D/2)^2 * Gloc, so for D=1 it would be ≈0.25d0, which is what we find in our Nlat=Nspin=Norb=1 test runs (see below) on the 2d Hubbard model. For now I've hardcoded Wflat=0.25d0, but we should find a way to define it in terms of the hopping (not so trivial, since the hopping value is model dependent and the name of the variable is not enforced by the solver, with the possibility of different choices in different drivers).

• Speaking of normalization conventions, I've also changed the last line of the Frobenius implementation(s), so as to divide there too by Nlso = Nlat * Nspin * Norb, which corresponds to count(Hmask) in the elemental case.

• The Hmask implementation is now totally different from what it was before the Frobenius update (which dropped Hmask altogether). This is because we need the FGmatrix structure to be a whole NNN-array, since the Frobenius norm cannot in any (easy) way operate on a logical mask, being a whole-matrix formula. So we just define the mask and pass it to the sum() fortran intrinsic when computing the final sum over matrix elements,

   \chi = \sum_{ij} \chi_{ij} / Wmtrx_{ij}

  (a minimal sketch is appended at the end of this comment).

• More on Hmask: for now I have just defined an internal ed_all_g=.true. flag and imported the current implementation from LIB_DMFT_ED, so all tests have been performed with Hmask=.true. (no mask).
  This is of course the safest option, thus appropriate for development. We should discuss the actual mask implementation for production, since I deem that to be the true reason for the improved fit quality of the Frobenius implementation over the old flat elemental chi-square: as far as I can tell, at least when CG_POW=2, the Frobenius norm has no way to produce different chi2 values wrt the old implementation, as long as you don't define a mask.

   > does it really make sense to build the mask based on the zeros in Hrepl?
   > why not just exploit the hermiticity of ∆ and g0, i.e. a naive uplo mask?
   > this has already been brought up a few times, e.g. I'm aware of
      a. 0e5c272b45eda6b7ff652e2473b9ecda09e5ba8b on LIB_DMFT_ED
      b. cb0af32 on CDMFT-LANC-ED
     so it might be time to discuss it all together.

• There are also many whitespace changes and new comments/printings, in line with https://github.com/QcmPlab/LIB_DMFT_ED/tree/0.5.2

————————————————————————————————————————————————————————————————————————
TESTING

For now all possible input flag combinations have been tested on the 2d Hubbard model driver only (cdn_hm_2dsquare), with Nlat=Nspin=Norb=1, so as to allow a cross-check with LIB_DMFT_ED. Everything has been tested with the minimize algorithm (CG_METHOD=1), since I have not yet written the gradients for the elemental implementation. I'll point out only a few crucial outcomes:

• The Frobenius norm and the 'flat-weighted' elemental norm give the same fit for CG_POW=2. I have not tested other powers; we might need to explore them.

• The Frobenius norm and the 'spectral-normalized' elemental norm give slightly different fits of the real part of the Weiss field. I could not catch the reason for now (I surely expected an exact match, given Nlso=1 and the same overall normalization of the chi-square…). It could just be that ∫A(iw)diw is not really 1d0 (something like 0.97d0), so we actually increase the chi-square values and the provided tolerance changes scale (but I thought it was a relative tolerance… I might return to it).

• MOST IMPORTANTLY: the Frobenius norm FAILS TO FIT the Weiss field if the analytic gradient is used (the hybridization works fine instead). More info is reported within issue #3.

   > As said, all cross-checks are evaluated with the numerical gradient, which is efficient only when using the minimize routine.
   > I've added an explicit warning in the code, so as to alert users if they enter the function (new lines 649-658 in ED_FIT_CHI2.f90).

————————————————————————————————————————————————————————————————————————
TODO

1. Write the analytical gradients for the elemental norm (ASAP).
2. Solve issue #3 for the Frobenius norm (I might defer it, sorry).
3. Test on true clusters (Nlat>1), where the Wmtrx choice is relevant.
4. Test on different models (I'd delegate to relevant people here).
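APPENDIX: for reference, a minimal sketch of how the masked elemental chi-square described above can be assembled. This is illustrative only: the function name chi2_elemental, the array names FGmats/FGand and the flattened Nlso indexing are my own shorthands, not the actual code, which works on the full [Nlat,Nlat,Nspin,Nspin,Norb,Norb,Lfit] structures in ED_FIT_CHI2.f90; Wmtrx is assumed to be precomputed according to CG_MATRIX.

   !Illustrative sketch, not the actual implementation.
   function chi2_elemental(FGmats,FGand,Wmats,Wmtrx,Hmask,cg_pow) result(chi2)
     complex(8),dimension(:,:,:),intent(in) :: FGmats,FGand ![Nlso,Nlso,Lfit]
     real(8),dimension(:),intent(in)        :: Wmats        ![Lfit] matsubara weights
     real(8),dimension(:,:),intent(in)      :: Wmtrx        ![Nlso,Nlso] matricial weights
     logical,dimension(:,:),intent(in)      :: Hmask        ![Nlso,Nlso] component mask
     integer,intent(in)                     :: cg_pow
     real(8)                                :: chi2
     real(8),dimension(size(FGmats,1),size(FGmats,2)) :: chi_ij
     integer                                :: i,j,Lfit
     Lfit = size(FGmats,3)
     !element-by-element distance, weighted on the matsubara axis:
     do j=1,size(FGmats,2)
        do i=1,size(FGmats,1)
           chi_ij(i,j) = sum( abs(FGmats(i,j,:)-FGand(i,j,:))**cg_pow / Wmats(:) )
        enddo
     enddo
     !final sum over matrix elements: the mask enters through the sum() intrinsic;
     !normalization follows the conventions discussed in the NOTES above
     chi2 = sum( chi_ij/Wmtrx, mask=Hmask ) / Lfit / count(Hmask)
   end function chi2_elemental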
+ add debug printing of \grad{\chi^2} to CG_NORM=frobenius (to compare; a minimal sketch of the printing is appended at the end of this comment)

————————————————————————————————————————————————————————————————————————
TESTING [cdn_hm_2dsquare, Nlat=Nspin=Norb=1]

• CG_SCHEME = WEISS
  We observe the very same problem reported in issue #3 for the Frobenius distance: the shape of the fitted function is not that of a Weiss field, but that of a hybridization function (there is a minimum, and it goes to zero for iw -> 0). Does this imply that we have a problem within grad_g0and_replica()?
   > I believe not, because it matches the DMFT_ED version quite literally.

• CG_SCHEME = DELTA
  Recall that with the Frobenius distance we got a correct fit... Now with CG_NORM=elemental we get... again a qualitatively wrong fit (a similar situation, really: the shape of the fitted function is not that of a hybridization function, but that of a Weiss field). This is becoming interesting... we call the same grad_delta_replica(), but with the Frobenius gradient we get the right fit, while the elemental gradient gives a qualitatively wrong result (the implementation being a literal porting of the DMFT_ED one, which works totally fine!). The qualitative change of the function hints, to me, at a wrong /sign/ in the gradients, as if we were finding a maximum instead of a minimum.
   > this appears to be indeed the case if we look at the printed dchi2 in two runs with everything equal but CG_NORM: dchi2(elemental) at the first print is exactly -dchi2(frobenius). We have a lead.

————————————————————————————————————————————————————————————————————————
>> TO BE FURTHER INVESTIGATED (todo: update the issue report)
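For the record, the debug printing mentioned above is nothing more than a dump of the analytic gradient at each call, along the lines of the sketch below (purely illustrative: output unit and format in the actual routines may differ):

   !purely illustrative: dump of the chi^2 gradient (one entry per bath
   !parameter), added to both the frobenius and the elemental grad routines
   !so that the two runs can be compared term by term
   write(*,"(A)") "dchi2:"
   write(*,"(*(ES21.12,1x))") dchi2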
Recap: with CG_SCHEME=delta and CG_GRAD=1 we had
   > a correct fit with CG_NORM=frobenius
   > a wrong fit with CG_NORM=elemental
   > dchi2(elemental) = -dchi2(frobenius) at the first call.

>> So we try changing the sign of dchi2(elemental).

What happens: we fix the fitted \Delta function with CG_NORM=elemental.
Why this is suspicious: in doing so we change the sign wrt the DMFT_ED code (which works just fine in this test!)

• DMFT_ED (grad_chi2_delta_replica, line 363 of ED_FIT_REPLICA.f90)

   dchi2 = - cg_pow*sum(df,1) / Ldelta / totNso

• CDMFT_ED (grad_chi2_delta_replica_elemental, lines 650-655)

   do ia=1,size(a)
      dchi2(ia) = + cg_pow * sum( df(:,:,:,:,:,:,ia)/Wmat, Hmask )
      dchi2(ia) = dchi2(ia) / Ldelta / count(Hmask)
   enddo

> The change in sign has no clear justification! (see also the sign check appended at the end of this comment)

————————————————————————————————————————————————————————————————————————
Similarly:

• changing the sign of the dchi2 expression in grad_chi2_weiss_replica_elemental leads to a much improved fit of the Weiss field: at least it has no minimum and correctly diverges for iw -> 0.
   > Again, there is no clear justification as to why the sign of dchi2 should change wrt the DMFT_ED implementation, which works fine.

• I actually found an analogous sign discrepancy between grad_chi2_weiss and grad_chi2_delta in the Frobenius implementation.
   > Commit 38bb300 swapped Delta and FGmatrix in the expression defining df, effectively changing its sign (and no abs is taken downstream), but it left the corresponding expression in grad_chi2_weiss_... untouched.
   > So I swapped G0and and FGmatrix too, and got the very same results as with grad_chi2_weiss_replica_elemental (meaning that the norm of the difference between the two fitted Weiss fields is of the order of 1d-15).

————————————————————————————————————————————————————————————————————————
WRAPPING UP

So here I have swapped a few signs and pragmatically recovered decent fits of both Weiss and Delta, with both the Frobenius and the elemental norm. But I find it very suspicious that these sign changes make the elemental implementation diverge from the analogous code in DMFT_ED, without a clear reason. One possibility is that the gradients of Delta and Weiss (not of chi2: the Anderson functions themselves) carry the wrong sign in their CDMFT_ED version, but I have looked quite thoroughly at them and could not find the discrepancy. [I actually touched grad_delta_replica a bit, only to make it formally identical to the DMFT_ED version, by just "compressing" some do loops...]

————————————————————————————————————————————————————————————————————————
NOTES

For both Weiss and Delta, with both codes (DMFT and CDMFT), the numerical gradients give *way better* fits. With numerical gradients DMFT_ED works fine with both CG_METHOD={0,1}, but CG_METHOD=0 does *consistently* freeze (it reaches CG_NITER without exiting) within CDMFT_ED. Since both codes call SciFortran for this, I cannot see why this happens. Again, it is not random: DMFT_ED consistently succeeds with the NR-CG and CDMFT_ED consistently fails with it (but everything goes well when calling the minimize-CG).

————————————————————————————————————————————————————————————————————————
TODO

We may change the title of issue #3, for its scope appears to be wider.
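APPENDIX: to make the sign bookkeeping explicit, here is the derivative of a single (elemental, CG_POW=2) term of the distance, written in the same notation as above; this is only a sanity check of the convention, not a statement about which routine currently carries the bug.

   \chi^2_{ij} = \sum_{iw} |FG(iw)_{ij} - FG(iw)_{ij}^Anderson(a)|^2 / Wmats(iw)

   d\chi^2_{ij}/da_k = -2 \sum_{iw} Re[ (FG(iw)_{ij} - FG(iw)_{ij}^Anderson(a))^*
                        * dFG(iw)_{ij}^Anderson/da_k ] / Wmats(iw)

The overall -cg_pow prefactor of the DMFT_ED expression is therefore tied to building the residual inside df as (FG - FG^Anderson); if the residual is assembled the other way around (which is what the Delta/FGmatrix swap of 38bb300 amounts to), the prefactor must become +cg_pow. The matricial weights and the mask cannot affect the sign, so a mismatch of this convention between the weiss/delta or frobenius/elemental routines turns the computed descent direction into an ascent one, consistent with the maximum-like fits described above. Whether this, or a sign in the Anderson-function gradients themselves, is the actual culprit still has to be verified.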
NEWS

Commits 1bab32a and 3240d40 have shown that the issue is wider (hence the title change). Here I report some evidence and try to wrap up a brief recap.

RECAP
DETAILS
What I mean by "🔴" is
What I mean by "🟡" is
What I mean by "🟢" is
Relevant update in SciFortran: QcmPlab/SciFortran@c742471. Its effects on this issue are still to be tested (it could probably fix the freezing with CG_METHOD=0 and CG_GRAD=1).
Please be aware that issue #3 is still open and details some instances of the serious problems we still have with CG_METHOD=0. For that reason I switch the default here to CG_METHOD=1 (the legacy minimize implementation), which has proven very reliable for many different clusters on the single-band square lattice. We'll return to the newer CG and to the analytic derivatives, but for now let's move on and merge the branch, which provides the new CG_NORM input parameter, either "elemental" or "frobenius".

The latter amounts to what the latest master implemented; the former generalizes the old implementation, with the key difference of allowing different weights on different matrix elements in the chi evaluation. You can control which weights to use with the CG_MATRIX input variable: 'flat' for the legacy way, 'spectral' for a new definition that has proven very robust in all our test cases.

Note that the 'flat' CG_MATRIX weights lead to the very same chi2 as CG_NORM="frobenius" if CG_POW=2 (and it should be 2 for a Frobenius norm); otherwise the two norms give different chi values. Eventually restoring a mask to select which Weiss/Delta components to exclude entirely from the chi would make the two norms differ a priori (there is no way to apply a mask to a whole-matrix operation such as the Frobenius norm). Note that we raise a warning if you request the Frobenius norm with CG_POW /= 2, and we will do so for the mask too, once implemented (maybe even an error in that case).
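For completeness, the fit-related block of an input file would then look something like the sketch below (values and inline descriptions are purely illustrative; only the variables discussed here are shown, everything else is unchanged):

   CG_SCHEME=delta          !fit target: 'delta' (hybridization) or 'weiss'
   CG_METHOD=1              !1 = legacy minimize routine (new default), 0 = NR-CG
   CG_NORM=elemental        !'elemental' or 'frobenius' (previous master behaviour)
   CG_MATRIX=1              !matricial weights for the elemental norm: 0=flat, 1=spectral
   CG_POW=2                 !distance power; the frobenius norm expects (and warns about) 2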
What's up
As I was performing some initial tests on the incoming changes on the fit_overhaul branch, I noticed that something is wrong with the Weiss field analytical gradient as it currently stands on master. By wrong I mean that the fitted field is qualitatively different from the original one, so seriously wrong.

Note that this happens only if I request the analytic gradient: switching to CG_METHOD=1 (the minimize routine) solves everything, while switching only CG_GRAD=1 but remaining with the NR routine gives the correct fit, though it takes forever to compute, basically freezing for several minutes (✨holy minimize, mighty va10a✨).

Also note that the hybridization gradient appears to work fine, so maybe the actual problem lies within grad_g0and_replica. If that's the case I'll hit the wall very soon, when testing the gradients for the new 'elemental' norm. We'll see...

Evidence
I did not include the real part in the figure for it is basically correct (but not totally: even with the numerical gradient I see some unwanted weight near the origin). Don't know if this could help, but I report it.
No restart file has been used, and here we are at one loop: we do not expect a brilliant fit, but a fair one, yes.
I attach the input file down here, so that the problem can be reproduced with the current master head (dc046d8).
Notes
I've already added a warning for the user in commit b6351ff (fit_overhaul); we might want to copy those lines onto master without waiting for the merge, if this is considered urgent.