Problems with CG_METHOD=0 [encountered in many different use cases] #3

Open
beddalumia opened this issue Jun 13, 2022 · 2 comments
beddalumia commented Jun 13, 2022

What's up

While running some initial tests on the incoming changes on the fit_overhaul branch, I noticed that something is wrong with the analytic gradient for the Weiss field, as it currently stands on master. By wrong I mean that the fitted field is qualitatively different from the original one, so seriously wrong.

Note that this happens only if I request the analytic gradient. Switching to CG_METHOD=1 (minimize routine) solves everything. Setting only CG_GRAD=1 while staying with the NR routine gives the correct fit but takes forever to compute, basically freezing for several minutes (✨holy minimize, mighty va10a✨).

Also note that the hybridization gradient appears to work fine, so maybe the actual problem lies within grad_g0and_replica. If that's the case I'll hit the wall very soon, when testing the gradients for the new 'elemental' norm. We'll see...

Evidence

  • I did not include the real part in the figure because it is basically correct (though not entirely: even with the numerical gradient I see some unwanted weight near the origin). I don't know whether this helps, but I report it.

  • No restart file has been used, and here we are at one loop: we do not expect a brilliant fit, but we do expect a fair one.

  • I attach the input file below, so that the problem can be reproduced with the current master head (dc046d8).

Notes

I've already added a warning for the user in commit b6351ff (fit_overhaul). We might want to copy those lines to master without waiting for the merge, if this is considered urgent.

Link to the lines


inputHM2D.conf

 WMIXING=0.500000000                           !Mixing bath parameter
 TS=2.500000000E-01                            !hopping parameter
 NX=1                                          !Number of cluster sites in x direction
 NY=1                                          !Number of cluster sites in y direction
 NKX=30                                        !Number of kx point for BZ integration
 NKY=30                                        !Number of ky point for BZ integration
 NLAT=1                                        !Number of cluster sites
 NORB=1                                        !Number of impurity orbitals (max 5).
 NBATH=7                                       !Number of bath sites:(normal=>Nbath per orb)(hybrid=>Nbath total)(replica=>Nbath=Nreplica)
 NSPIN=1                                       !Number of spin degeneracy (max 2)
 ULOC=1.000000000,0.d0,0.d0,0.d0,0.d0          !Values of the local interaction per orbital (max 5)
 UST=0.d0                                      !Value of the inter-orbital interaction term
 JH=0.d0                                       !Hunds coupling
 JX=0.d0                                       !S-E coupling
 JP=0.d0                                       !P-H coupling
 BETA=300.000000000                            !Inverse temperature, at T=0 is used as a IR cut-off.
 XMU=0.d0                                      !Chemical potential. If HFMODE=T, xmu=0 indicates half-filling condition.
 NLOOP=1                                       !Max number of DMFT iterations.
 DMFT_ERROR=1.000000000E-05                    !Error threshold for DMFT convergence
 SB_FIELD=1.000000000E-01                      !Value of a symmetry breaking field for magnetic solutions.
 GF_FLAG=T                                     !flag to evaluate GFs and related quantities.
 DM_FLAG=F                                     !flag to evaluate the cluster density matrix \rho_IMP = Tr_BATH(\rho))
 ED_TWIN=F                                     !flag to reduce (T) or not (F,default) the number of visited sector using twin symmetry.
 ED_SECTORS=F                                  !flag to reduce sector scan for the spectrum to specific sectors +/- ed_sectors_shift.
 ED_SECTORS_SHIFT=1                            !shift to ed_sectors
 ED_SPARSE_H=T                                 !flag to select  storage of sparse matrix H (mem--, cpu++) if TRUE, or direct on-the-fly H*v product (mem++, cpu--) if FALSE
 ED_GF_SYMMETRIC=F                             !flag to assume Gij = Gji
 ED_PRINT_SIGMA=T                              !flag to print impurity Self-energies
 ED_PRINT_G=T                                  !flag to print impurity Greens function
 ED_PRINT_G0=T                                 !flag to print non-interacting impurity Greens function
 ED_VERBOSE=5                                  !Verbosity level: 0=almost nothing --> 5:all. Really: all
 NSUCCESS=1                                    !Number of successive iterations below threshold for convergence
 LMATS=5000                                    !Number of Matsubara frequencies.
 LREAL=5000                                    !Number of real-axis frequencies.
 LTAU=1024                                     !Number of imaginary time points.
 LFIT=1000                                     !Number of Matsubara frequencies used in the \Chi2 fit.
 NREAD=0.d0                                    !Objective density for fixed density calculations.
 NERR=1.000000000E-04                          !Error threshold for fixed density calculations.
 NDELTA=1.000000000E-01                        !Initial step for fixed density calculations.
 NCOEFF=1.000000000                            !multiplier for the initial ndelta read from a file (ndelta-->ndelta*ncoeff).
 WINI=-5.000000000                             !Smallest real-axis frequency
 WFIN=5.000000000                              !Largest real-axis frequency
 HFMODE=T                                      !Flag to set the Hartree form of the interaction (n-1/2). see xmu.
 EPS=1.000000000E-02                           !Broadening on the real-axis.
 CUTOFF=1.000000000E-09                        !Spectrum cut-off, used to determine the number states to be retained.
 GS_THRESHOLD=1.000000000E-09                  !Energy threshold for ground state degeneracy loop up
 HWBAND=2.000000000                            !half-bandwidth for the bath initialization: flat in -hwband:hwband
 LANC_METHOD=arpack                            !select the lanczos method to be used in the determination of the spectrum. ARPACK (default), LANCZOS (T=0 only), DVDSON (no MPI)
 LANC_NSTATES_SECTOR=2                         !Initial number of states per sector to be determined.
 LANC_NSTATES_TOTAL=1                          !Initial number of total states to be determined.
 LANC_NSTATES_STEP=2                           !Number of states added to the spectrum at each step.
 LANC_NCV_FACTOR=10                            !Set the size of the block used in Lanczos-Arpack by multiplying the required Neigen (Ncv=lanc_ncv_factor*Neigen+lanc_ncv_add)
 LANC_NCV_ADD=0                                !Adds up to the size of the block to prevent it to become too small (Ncv=lanc_ncv_factor*Neigen+lanc_ncv_add)
 LANC_NITER=512                                !Number of Lanczos iteration in spectrum determination.
 LANC_NGFITER=200                              !Number of Lanczos iteration in GF determination. Number of momenta.
 LANC_TOLERANCE=1.000000000E-12                !Tolerance for the Lanczos iterations as used in Arpack and plain lanczos.
 LANC_DIM_THRESHOLD=1024                       !Min dimension threshold to use Lanczos determination of the spectrum rather than Lapack based exact diagonalization.
 CG_METHOD=0                                   !Conjugate-Gradient method: 0=NR, 1=minimize.
 CG_GRAD=0                                     !Gradient evaluation method: 0=analytic (default), 1=numeric.
 CG_FTOL=1.000000000E-05                       !Conjugate-Gradient tolerance.
 CG_STOP=0                                     !Conjugate-Gradient stopping condition: 0-3, 0=C1.AND.C2, 1=C1, 2=C2 with C1=|F_n-1 -F_n|<tol*(1+F_n), C2=||x_n-1 -x_n||<tol*(1+||x_n||).
 CG_NITER=500                                  !Max. number of Conjugate-Gradient iterations.
 CG_WEIGHT=1                                   !Conjugate-Gradient weight form: 1=1.0, 2=1/n , 3=1/w_n.
 CG_SCHEME=weiss                               !Conjugate-Gradient fit scheme: delta or weiss.
 CG_POW=2                                      !Fit power for the calculation of the Chi distance function as 1/L*|G0 - G0and|**cg_pow
 CG_MINIMIZE_VER=F                             !Flag to pick old/.false. (Krauth) or new/.true. (Lichtenstein) version of the minimize CG routine
 CG_MINIMIZE_HH=1.000000000E-04                !Unknown parameter used in the CG minimize procedure.
 HFILE=hamiltonian                             !File where to retrieve/store the bath parameters.
 IMPHFILE=inputHLOC.in                         !File read the input local H.
 LOGFILE=6                                     !LOG unit.
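
For reference, the CG_STOP rule quoted in the input file can be read as the small check below. This is only a sketch of my reading of that comment, written in Python for brevity: the function name `cg_should_stop` and its arguments are made up for illustration and are not SciFortran routines, and since the comment mentions values 0-3 but only describes 0, 1 and 2, the value 3 is deliberately left unhandled.

```python
import math

def cg_should_stop(f_prev, f_curr, x_prev, x_curr, tol=1e-5, cg_stop=0):
    """Return True when the stopping rule selected by cg_stop is met.

    Hypothetical helper: it only mirrors the formulas in the CG_STOP comment.
    """
    # C1: |F_{n-1} - F_n| < tol * (1 + F_n), change of the cost function.
    c1 = abs(f_prev - f_curr) < tol * (1.0 + f_curr)
    # C2: ||x_{n-1} - x_n|| < tol * (1 + ||x_n||), change of the parameters.
    step = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_prev, x_curr)))
    size = math.sqrt(sum(b * b for b in x_curr))
    c2 = step < tol * (1.0 + size)
    if cg_stop == 0:
        return c1 and c2
    if cg_stop == 1:
        return c1
    if cg_stop == 2:
        return c2
    # The input file mentions 0-3 but describes only 0, 1, 2.
    raise ValueError("only cg_stop = 0, 1, 2 are described in the input file")
```

Under this reading, CG_STOP=0 needs both the cost and the parameters to settle, so a run in which the cost is flat while the parameters keep moving would go on until CG_NITER. Whether that is what happens in the freezing runs is not established here.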
beddalumia added a commit that referenced this issue Jun 13, 2022
Here we start to unify and further develop the recent modifications to
the fitting routines. More specifically we start re-introducing the old
distance definition:

\chi_{ij} = |FG_{ij}-FG_{ij}^Anderson|^q / W, \chi = \sum_{ij} \chi_{ij}

i.e. a generalized chi-square, computed element by element, and weighted
with various schemes /on the matsubara axis/. We plan to extend this def
to include weighting on the matrix structure, too. (more later...)

This has been replaced with a global matricial norm (Frobenius) within
commit 3ad6af7.

We keep the Frobenius distance available through a new input flag:
> CG_NORM={frobenius,elemental}

————————————————————————————————————————————————————————————————————————

RATIONALE

The reason for bringing back the old elemental distance is rooted in the
need for flexibility in defining different weights for different matrix
elements. A previous attempt at this can be found in the 'weighted_fit'
(unmerged) branch, look here:
> https://github.com/QcmPlab/CDMFT-LANC-ED/commits/weighted_fit

Hence we define a generic infrastructure to assign weights on an element-
by-element basis, as

\chi_{ij} = \sum_{iw} |FG(iw)_{ij}-FG(iw)_{ij}^Anderson|^q / Wmats(iw),
\chi = \sum_{ij} \chi_{ij} / Wmtrx_{ij}

For now two choices are available for Wmtrx:

• CG_MATRIX=0, giving equal weights to all components ('flat')

• CG_MATRIX=1, normalizing on the total spectral weight ('spectral')

> More specifically the spectral option defines:

  Wmtrx_{ij} = - \sum_{iw} Im[FG(iw)_{ij}] / beta = ∫A_{ij}(iw)diw

             = W_{diag}δ_{ij} + W_{off-diag}(1-δ_{ij})

  where in general we expect W_{off-diag} << W_{diag}, making abundantly
  clear the rationale behind this weighting choice.
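
A minimal sketch of the two formulas above, in plain Python rather than Fortran so that it can be run standalone. The names `spectral_weights` and `elemental_chi` are made up for this note, the functions are stored as nested lists `fg[n][i][j]`, and the overall normalizations (number of fitted frequencies, count(Hmask)) are left out on purpose.

```python
def spectral_weights(fg, beta):
    """Wmtrx[i][j] = -sum_n Im FG(iw_n)[i][j] / beta, as in the text above."""
    nso = len(fg[0])
    return [[-sum(f[i][j].imag for f in fg) / beta for j in range(nso)]
            for i in range(nso)]

def elemental_chi(fg, fg_and, wmats, wmtrx, q=2):
    """chi = sum_ij ( sum_n |FG - FG_and|^q / Wmats[n] ) / Wmtrx[i][j]."""
    nso = len(fg[0])
    chi = 0.0
    for i in range(nso):
        for j in range(nso):
            # Element-by-element distance, weighted along the Matsubara axis.
            chi_ij = sum(abs(fg[n][i][j] - fg_and[n][i][j]) ** q / wmats[n]
                         for n in range(len(fg)))
            # Weight on the matrix structure: flat (all ones) or spectral.
            chi += chi_ij / wmtrx[i][j]
    return chi

# Two frequencies, 2x2 matrices, purely illustrative numbers.
fg = [[[-1.0j, -0.1j], [-0.1j, -1.0j]],
      [[-0.5j, -0.05j], [-0.05j, -0.5j]]]
fg_and = [[[z + 0.1 for z in row] for row in mat] for mat in fg]
flat = [[1.0, 1.0], [1.0, 1.0]]
spectral = spectral_weights(fg, beta=2.0)
print(elemental_chi(fg, fg_and, [1.0, 1.0], flat))
print(elemental_chi(fg, fg_and, [1.0, 1.0], spectral))
```

With W_{off-diag} << W_{diag} the spectral choice boosts the off-diagonal contributions relative to the flat one. Note that the sketch divides by Wmtrx directly, so an exactly vanishing element would need separate handling.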

————————————————————————————————————————————————————————————————————————

NOTES (in no particular order)

• The actual value for W_{diag} is ≈1d0 for the Weiss field (we sum over
  all the matsubara frequencies, not only the first Lfit ones). This way
  we can easily ensure same normalization of the chi values if switching
  to the 'flat' matricial weights, thus allowing easier debug & testing.

• The actual value for W_{diag} is NOT ≈1d0 for the bath hybridization,
  recall that ∆=(D/2)^2*Gloc, so for D=1 it would be ≈0.25d0, which is
  what we find in our Nlat=Nspin=Norb=1 test-runs (see below) on the 2d
  Hubbard model. For now I've hardcoded Wflat=0.25d0 but we should find
  a way to define it in terms of the hopping (not so trivial since the
  hopping value is model dependent and the name of the variable is not
  enforced by the solver, with the possibility of different choices in
  different drivers).

• Speaking of normalization conventions I've actually changed the last
  line of the Frobenius implementation(s), so to divide also therein by
  Nlso = Nlat * Nspin * Norb, which corresponds to count(Hmask) in the
  elemental case.

• Actually the Hmask implementation is totally different now, wrt what
  used to be before the Frobenius update (which totally dropped Hmask).
  This is because we need the FGmatrix structure to be a whole NNN-array,
  since the Frobenius norm cannot in any (easy) way operate on a logical
  mask, it being a whole-matrix formula. So we just define the mask and
  pass it to the sum() fortran intrinsic when computing the final sum
  over matrix elements: \chi = \sum_{ij} \chi_{ij} / Wmtrx_{ij}

• More on Hmask: for now I just defined an internal ed_all_g=.true. flag
  and imported the current implementation from LIB_DMFT_ED, so all tests
  have been performed with Hmask=.true. (no mask). This is of course the
  safest option, thus appropriate for development. We should discuss the
  actual mask implementation for production, since I deem that to be the
  true reason for the Frobenius implementation improved fit-quality over
  the old flat elemental chi-square: as far as I can tell, at least when
  CG_POW=2, the Frobenius norm has no way to produce different chi2 vals
  wrt the old implementation, as long as you don't define a mask.

  > does it really make sense to build the mask based on zeros in Hrepl?

  > why not just exploit hermiticity of ∆ and g0, so a naive uplo mask?

  > this has been already brought out a few times, e.g. I'm aware of
    a. 0e5c272b45eda6b7ff652e2473b9ecda09e5ba8b on LIB_DMFT_ED
    b. cb0af32 on CDMFT-LANC-ED
    so it might be time to discuss it all together.

• There are also many whitespace changes and new comments/printings, in 
  line with https://github.com/QcmPlab/LIB_DMFT_ED/tree/0.5.2

————————————————————————————————————————————————————————————————————————

TESTING

For now all possible input flag combinations have been tested on the 2d
Hubbard model driver only (cdn_hm_2dsquare) with Nlat=Nspin=Norb=1, so
to allow a cross-check with LIB_DMFT_ED. Everything tested with minimize
algorithm (CG_METHOD=1) since I've still not written the gradients for
the elemental implementation.

I'll point out only a few crucial outcomes:

• Frobenius norm and 'flat-weighted' elemental norm give the same fit,
  for CG_POW=2. I've not tested other powers, we might need to explore.

• Frobenius norm and 'spectral-normalized' elemental norm give slightly
  different fits of the real part of the Weiss field. I could not catch
  the reason for now (I surely expected exact match with Nlso=1 and same
  overall normalizations of the chi-square…). It could just be that
  ∫A(iw)diw is not really 1d0 (something like 0.97d0), so we actually
  increase chi-square values and the provided tolerance changes scale.
  (but I thought it was a relative tolerance… I might return to it).

• MOST IMPORTANTLY: Frobenius norm FAILS TO FIT the Weiss field, if the
  analytic gradient is used (the hybridization works fine instead).
  More info reported within issue #3.
  > As I said, all cross-checks are evaluated with numerical gradient,
    which is efficient only if using the minimize routine.
  > I've added an explicit warning in the code, so to alert users if
    they enter the function. (new lines 649-658 in ED_FIT_CHI2.f90)

————————————————————————————————————————————————————————————————————————

TODO

1. Write the analytical gradients for the elemental norm (ASAP).
2. Solve issue #3 for the Frobenius norm (I might defer it, sorry).
3. Test on true clusters (Nlat>1), where Wmtrx choice is relevant.
4. Test on different models (I'd delegate to relevant people here).
beddalumia added a commit that referenced this issue Jun 17, 2022
+ add debug printing of \grad{\chi^2} to CG_NORM=frobenius (to compare)

————————————————————————————————————————————————————————————————————————

TESTING [cdn_hm_2dsquare, Nlat=Nspin=Norb=1]

• CG_SCHEME = WEISS

  We observe the very same problem reported in issue #3 for Frobenius
  distance: the shape of the fitted function is not that of a Weiss
  field, but that of a hybridization function (there's a minimum, it
  goes to zero for iw -> 0).

  Does this imply that we have a problem within grad_g0and_replica()?
  > I believe not, because it matches quite literally the DMFT_ED version.

• CG_SCHEME = DELTA

  Recall that with Frobenius distance we got a correct fit... Now with
  CG_NORM=elemental we get... again a qualitatively wrong fit (similar
  situation really: the shape of the fitted function is not that of
  a hybridization function, but that of a Weiss field).

  This is becoming interesting... we call the same grad_delta_replica()
  but with Frobenius gradient we get the right fit, while elemental grad
  makes for a qualitatively wrong result (with the implementation being
  a literal porting of the DMFT_ED one, which works totally fine!).
  The qualitative change of the function hints, to me, at a wrong /sign/
  in the gradients, as if we were finding a maximum instead of a min.

  > this appears to be indeed the case if we look at the printed dchi2
    in two runs with everything equal but CG_norm: dchi2(elemental) at
    first print is exactly -dchi2(frobenius). We have a lead.
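
  A quick way to tell which of the two signs is the right one is to compare
  the analytic gradient against central finite differences. The sketch below
  does so on a toy problem, not on the CDMFT code: a single bath level with
  Delta(iw) = V^2/(iw - eps), CG_POW=2 and flat weights. All names
  (`delta_and`, `grad_chi2`, `numeric_grad`, `cosine`) are made up here.

```python
import math

def matsubara(beta, n_freq):
    """Fermionic Matsubara frequencies i*pi*(2n+1)/beta."""
    return [1j * math.pi * (2 * n + 1) / beta for n in range(n_freq)]

def delta_and(iw, eps, v):
    """Toy hybridization with a single bath level (assumption of this sketch)."""
    return v * v / (iw - eps)

def chi2(params, target, freqs):
    eps, v = params
    return sum(abs(delta_and(iw, eps, v) - t) ** 2
               for iw, t in zip(freqs, target)) / len(freqs)

def grad_chi2(params, target, freqs, sign=+1.0):
    """Analytic gradient of chi2; sign=-1 mimics a flipped-sign bug."""
    eps, v = params
    g_eps = g_v = 0.0
    for iw, t in zip(freqs, target):
        diff = delta_and(iw, eps, v) - t      # model minus target
        d_eps = v * v / (iw - eps) ** 2       # d(Delta)/d(eps)
        d_v = 2.0 * v / (iw - eps)            # d(Delta)/d(V)
        g_eps += 2.0 * (diff.conjugate() * d_eps).real
        g_v += 2.0 * (diff.conjugate() * d_v).real
    n = len(freqs)
    return [sign * g_eps / n, sign * g_v / n]

def numeric_grad(params, target, freqs, h=1e-6):
    """Central finite differences, one parameter at a time."""
    grad = []
    for k in range(len(params)):
        up, dn = list(params), list(params)
        up[k] += h
        dn[k] -= h
        grad.append((chi2(up, target, freqs) - chi2(dn, target, freqs)) / (2 * h))
    return grad

def cosine(a, b):
    """Cosine of the angle between two gradient vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

freqs = matsubara(beta=300.0, n_freq=200)
target = [delta_and(iw, 0.3, 0.5) for iw in freqs]
guess = [0.1, 0.7]
num = numeric_grad(guess, target, freqs)
print(cosine(grad_chi2(guess, target, freqs), num))             # close to +1
print(cosine(grad_chi2(guess, target, freqs, sign=-1.0), num))  # close to -1
```

  A cosine close to -1 at the starting point would single out the gradient
  that has the wrong sign, independently of what the other code does.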

————————————————————————————————————————————————————————————————————————

>> TO BE FURTHER INVESTIGATED (todo: update the issue report)
beddalumia added a commit that referenced this issue Jun 17, 2022
Recap: with CG_SCHEME=delta and CG_GRAD=0 (analytic gradient) we had
       > a correct fit with CG_NORM=frobenius
       > a wrong fit with CG_NORM=elemental
       > dchi2(elemental) = -dchi2(frobenius) at first call.

       >> So we try changing sign to dchi2(elemental).

What happens: we fix the fitted \Delta function with CG_NORM=elemental.

Why this is suspicious: doing so we change sign wrt DMFT_ED code (which
                        works just fine in this test!)

  • DMFT_ED (grad_chi2_delta_replica, line 363 of ED_FIT_REPLICA.f90)
    dchi2 = - cg_pow*sum(df,1) / Ldelta / totNso

  • CDMFT_ED (grad_chi2_delta_replica_elemental, lines 650-655)
    do ia=1,size(a)
      dchi2(ia) = + cg_pow * sum( df(:,:,:,:,:,:,ia) / Wmat, Hmask)
      dchi2(ia) = dchi2(ia) / Ldelta / count(Hmask)
    enddo

  > The change in sign has no clear justification!

————————————————————————————————————————————————————————————————————————

Similarly:

• changing the sign of dchi2 in grad_chi2_weiss_replica_elemental
  leads to much improved fit of the Weiss field: at least it has no min
  and correctly diverges for iw -> 0.

  > Again, there is no clear justification as to why the sign of dchi2
    should change wrt the DMFT_ED implementation, which works fine.

• I actually found an analogous sign discrepancy between grad_chi2_weiss
  and grad_chi2_delta in the Frobenius implementation.

  > Commit 38bb300 did swap Delta and
    FGmatrix in the expression defining df, effectively changing its
    sign (and no abs is taken downstream). But it left untouched the
    corresponding expression in grad_chi2_weiss_...

  > So I swapped G0and and FGmatrix too and got the very same results as
    with grad_chi2_weiss_replica_elemental (meaning that the norm of the
    difference between the two fitted Weiss fields is in the d-15 order)
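
For a single matrix element the expected sign can be worked out directly from the distance definition. The notation below is mine and the weights are dropped: F is the function to fit, F^And its Anderson counterpart depending on a bath parameter a, and d_n is their difference. How df is actually assembled in either code is an assumption here, not something checked line by line.

```latex
\chi^2 = \frac{1}{L}\sum_{n} |d_n|^{q},
\qquad d_n = F(i\omega_n) - F^{\mathrm{And}}(i\omega_n; a)

\frac{\partial \chi^2}{\partial a}
  = \frac{q}{L}\sum_{n} |d_n|^{q-2}\,
    \mathrm{Re}\!\left[ d_n^{*}\,\frac{\partial d_n}{\partial a} \right]
  = -\frac{q}{L}\sum_{n} |d_n|^{q-2}\,
    \mathrm{Re}\!\left[ d_n^{*}\,
    \frac{\partial F^{\mathrm{And}}}{\partial a} \right]

\tilde d_n = F^{\mathrm{And}} - F = -d_n
\;\Longrightarrow\;
\frac{\partial \chi^2}{\partial a}
  = +\frac{q}{L}\sum_{n} |\tilde d_n|^{q-2}\,
    \mathrm{Re}\!\left[ \tilde d_n^{*}\,
    \frac{\partial F^{\mathrm{And}}}{\partial a} \right]
```

So the sign of the prefactor has to follow the order of the operands in the difference: swapping the operands without flipping the prefactor (or the other way round) negates the gradient.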

————————————————————————————————————————————————————————————————————————

WRAPPING UP)

So here I have swapped a few signs and pragmatically recovered decent
fits of both Weiss and Delta, with both Frobenius and Elemental norm.

But I find it very suspicious that these sign changes make the elemental
implementation diverge with respect to the analogous code in DMFT_ED,
without a clear reason. One thing could be that the gradients for Delta
and Weiss (not chi2, the Anderson functions themselves) introduce the
wrong sign in their CDMFT_ED version, but I looked quite thoroughly at
them and could not find the discrepancy.
[Actually touched a bit grad_delta_replica, only to make it formally
identical to DMFT_ED version, by just "compressing" some do loops...]

————————————————————————————————————————————————————————————————————————

NOTES)

For both Weiss and Delta, with both codes (DMFT and CDMFT) the numerical
gradients give *way better* fits.

For numerical gradients DMFT_ED works fine with both CG_METHOD={0,1} but
CG_METHOD=0 does *consistently* freeze (reach CG_NITER without exiting)
within CDMFT_ED. Since both codes call SciFortran for this I cannot get
why this happens. Again, it's not random: DMFT_ED consistently succeeds
with NR-CG and CDMFT_ED consistently fails with it (but all goes well if
calling minimize-CG).

————————————————————————————————————————————————————————————————————————

TODO)

We may change title for issue #3, for its scope appears to be wider.
beddalumia changed the title from "Wrong Frobenius gradient for the Weiss field [Nlat=Nspin=Norb=1, 2d Hubbard model]" to "Problems with analytical gradients [Nlat=Nspin=Norb=1, 2d Hubbard model]" on Jun 17, 2022
@beddalumia
Member Author

NEWS (relative to fit_overhaul branch)

Commits 1bab32a and 3240d40 have shown that the issue is wider (hence the title change). Here I report some evidence and try to wrap up with a brief recap.


RECAP

  1. With numerical gradients everything works, with both the "Frobenius" and "Elemental" definitions of the $\chi^2$ distance.

| | Deltaˆ | Weiss |
|---|---|---|
| Elemental | 🟡 | 🟢 |
| Frobenius | 🟡 | 🟢 |

ˆThe Delta fits are tagged "yellow" because they have worse quality wrt the Weiss ones, if compared with the NR-CG results obtained with the DMFT_ED code (NR-CG freezes with the CDMFT code, consistently; I don't know why). Yet the minimize-CG results are all on the same level across the two codes, and very similar to the "yellow" ones (so we can take them as "fairly good"). More info on the freezing is in the 3240d40 commit message; plots of the fitted functions are reported below. Note that minimize and NR instead give the very same results for the Weiss field, hence the green tag.

  2. With analytical gradients we see some problems if we a) leave the Frobenius gradients unchanged (wrt the master branch) and b) port the analytic gradients for the elemental $\chi^2$ from the current version in LIB_DMFT_ED. The situation is:

| | Delta | Weiss |
|---|---|---|
| Elemental | 🔴 | 🔴 |
| Frobenius | 🟡 | 🔴 |

  3. If we swap the sign in the elemental implementation of $\nabla\chi^2$ (thus diverging from LIB_DMFT_ED!) we get:

| | Delta | Weiss |
|---|---|---|
| Elemental | 🟡 | 🟡 |
| Frobenius | 🟡 | 🔴 |

  4. If we further note that commit 38bb300 fixed the sign of $\nabla\chi^2(\Delta)$ but left the sign of $\nabla\chi^2(g_0)$ untouched, and we apply the missing fix, we get:

| | Delta | Weiss |
|---|---|---|
| Elemental | 🟡 | 🟡 |
| Frobenius | 🟡 | 🟡 |

DETAILS

All plots with same input other than CG options, and at one loop.
Solid Line: FG
Dash-Dot: Fit

What I mean with "🔴" is

[Figures: Delta, Weiss]

What I mean with "🟡" is

[Figures: Delta, Weiss]

What I mean with "🟢" is

[Figures: Deltaˆ, Weiss]

ˆThis plot is the only one generated with DMFT_ED code, for this quality appears to be unreachable with minimize-CG, and NR-CG is de facto unavailable within CDMFT_ED.


beddalumia commented Jun 28, 2022

Relevant update in SciFortran: QcmPlab/SciFortran@c742471

Effects on this issue to be tested (it could probably fix the freezing with CG_METHOD=0 and CG_GRAD=1).

edit: it does not.

beddalumia changed the title from "Problems with analytical gradients [Nlat=Nspin=Norb=1, 2d Hubbard model]" to "Problems with CG_METHOD=0 [encountered in many different use cases]" on Nov 28, 2022
beddalumia added a commit that referenced this issue Nov 28, 2022
Please be aware that issue #3 is still open and details some instances of
the serious problems we still have with CG_METHOD=0. For that reason here
I switch the default to CG_METHOD=1 (the legacy minimize implementations),
which has proven to be very much reliable for many different clusters, on
the single-band square lattice.

We'll return to the newer CG and to the analytic derivatives, but for now
let's move on and merge the branch, which provides the new CG_NORM input
parameter, either "elemental" or "frobenius". The latter amounts to what
latest master implemented, the former generalizes the old implementation,
with the key difference of allowing different weights on different matrix
elements for the chi evaluation. You can control which weights to use with
the CG_MATRIX input variable, 'flat' for the legacy way, 'spectral' for a
new definition that has proven very robust in all our test cases.

Note that 'flat' CG_MATRIX weights would lead to the very same chi2 as with
CG_NORM="frobenius", if CG_POW=2 (and it should be 2 for a Frobenius norm).
Otherwise the two norms give different chi values. Restoring a mask to
select which Weiss/Delta components to totally exclude from the chi would
make the two norms completely different a priori (there is no way to apply
a mask to a whole-matrix operation such as the Frobenius norm). Note that we
raise a warning if you request the Frobenius norm with CG_POW /= 2 and will
do so for the mask too, when implemented (maybe even an error in that case).
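
The equivalence at CG_POW=2 can be checked on a single frequency with a few lines of plain Python. This is only a sketch: the function names are made up, normalizations are omitted, and reading the Frobenius distance at a generic power as "Frobenius norm raised to q" is my assumption, not necessarily what the code does.

```python
import math

def frobenius_chi(diff, q=2):
    """Frobenius norm of the difference matrix, raised to the power q."""
    norm = math.sqrt(sum(abs(z) ** 2 for row in diff for z in row))
    return norm ** q

def flat_elemental_chi(diff, q=2):
    """Sum over matrix elements of |diff_ij|^q with unit (flat) weights."""
    return sum(abs(z) ** q for row in diff for z in row)

# Illustrative 2x2 difference FG - FG_Anderson at one Matsubara frequency.
diff = [[0.3 + 0.4j, 0.1j],
        [-0.1j, 0.2 + 0.0j]]

print(frobenius_chi(diff, 2), flat_elemental_chi(diff, 2))  # same value
print(frobenius_chi(diff, 1), flat_elemental_chi(diff, 1))  # different values
```

For q=2 the two expressions are the same sum of squared moduli, which is why they can only differ through weights, masks or normalizations.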