
Quantum Espresso Geometry Optimization Error Code 3 #22

Open
ghost opened this issue Jul 6, 2022 · 18 comments
Comments

@ghost

ghost commented Jul 6, 2022

Hi all,
when running a geometry optimization with Quantum Espresso via the calculation = 'relax' setting and a small maximum number of relaxation steps (e.g. 20), the calculation ends with error code 3.
The QE output still terminates with

=------------------------------------------------------------------------------=
   JOB DONE.
=------------------------------------------------------------------------------=

as it should and includes all necessary information (i.e. energy and forces).
However, wfl will throw an error and stop the iterative training.

Is it possible to prevent certain error messages from halting the whole program?

Here is an example QE_run.tar.gz
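
For reference, the relevant part of the setup, written as ASE Espresso settings (the values here are illustrative, not copied from the archive):

# QE-internal relaxation with a small cap on ionic steps; hitting the cap is
# what makes pw.x exit with code 3 even though the output ends in "JOB DONE."
input_data = {
    'control': {
        'calculation': 'relax',
        'nstep': 20,
    },
}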

@bernstei
Contributor

bernstei commented Jul 6, 2022

There are various ways to control how remote jobs deal with errors, and this is a bit tricky. The current intended behavior is that if a remote job raises an exception, the python Exception object will be reraised by the local process. Since ExPyRe is often called from wfl, and both of them have to be told how to deal with failures, it gets extra tricky. Are you calling ExPyRe directly, or via wfl?

@bernstei
Contributor

bernstei commented Jul 6, 2022

[edited]
I agree that this needs at least better documentation, and quite probably more thinking about how to deal with it, but this is why it's complex: the initial issue is that the espresso executable returns an error code, and the ASE interface decides that this means it should raise a python exception. Then the wfl calculator needs to decide how to deal with this error from the ASE Calculator. Then the pool parallelization running inside the remote job has to decide how to deal with the error from the wfl calculator wrapper. Then the remote job wrapper has to deal with the remote error when it retrieves the output of the job.

Note that if the ASE calculator refuses to try to read the QE output because the executable gave a non-zero return status, there's nothing ExPyRe/wfl can do about that.

Let me look over everything it's doing, make some suggestions for your particular case, and think about whether anything can be changed to make the overall error handling more intuitive.

@ghost
Author

ghost commented Jul 6, 2022

I just tracked down my script :)
I basically call the evaluate_dft function with some remote_info within the wfl/calculators/dft.py script; that then goes via espresso.py and wfl/autoparallelize/base.py, and eventually do_remotely, which then loads ExPyRe.

@bernstei
Contributor

bernstei commented Jul 6, 2022

Please reread the message above, since I was editing it to give more info while you were writing your followup.

@ghost
Author

ghost commented Jul 6, 2022

Totally agree that this should not produce such an aggressive QE error code, or at least that there should be a flag to avoid it...
I see, I will take a closer look at the ASE calculator.

@bernstei
Contributor

bernstei commented Jul 6, 2022

If you know how to do it, I'd suggest setting up a little python script using the underlying ASE QE calculator, no expyre or wfl, to confirm whether the ASE interface is even returning any useful data. Just set up a small system that you can run in serial, with a tiny max number of steps. If it does return data but gives an error, then we can think about how to get expyre/wfl to handle that cleanly.
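
Something along these lines, maybe (pseudopotential names, cutoff and the output filename are guesses; adjust to whatever you actually use):

# bare ASE Espresso, no wfl or expyre: QE-internal relax with a tiny step cap,
# then check what, if anything, survives the failure
import ase.io
from ase.build import molecule
from ase.calculators.espresso import Espresso
from ase.calculators.calculator import CalculationFailed

at = molecule('H2O', vacuum=6.0)
at.pbc = True
at.calc = Espresso(
    pseudopotentials={'H': 'H.pbe.UPF', 'O': 'O.pbe.UPF'},
    input_data={'control': {'calculation': 'relax', 'nstep': 3},
                'system': {'ecutwfc': 40.0}},
)

try:
    print('energy:', at.get_potential_energy())
    print('forces:', at.get_forces())
except CalculationFailed as exc:
    print('ASE raised:', exc)
    # is the output file complete despite the non-zero exit status?
    final = ase.io.read('espresso.pwo')
    print('parsed anyway:', final.get_potential_energy())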

@ghost
Author

ghost commented Jul 6, 2022

It is indeed an ASE calculator issue. :/
Let's see whether there is a simple fix in the ASE code or not.

@ghost
Author

ghost commented Jul 6, 2022

I went through the ASE documentation and realized that ASE does not do the optimization the way I had assumed. Instead of running one QE job with calculation='relax', the relaxation is done via the ase.optimize module (e.g. BFGS), stringing together several QE jobs with calculation='scf'.
As far as I know this is not yet integrated in the wfl/calculators/espresso.py package. Unfortunately I will be on vacation until Sunday and cannot address this directly. I will start drafting some code changes by next Monday.
For now, I have created a (very) simple example that runs a QE relaxation of a water molecule within ASE via the run.py script below, to give a first impression and maybe get feedback on whether this should be included.
All files can be found here (without the pw.x binary; its path has to be provided):
ase_geo_opt.tar.gz

import os
import pickle

import ase, ase.io
from ase.calculators.espresso import Espresso
from ase.optimize import BFGS

## Change to actual path
qe_binary = '/home/daisy/code/qe-7.0/bin/pw.x'
if not os.path.exists(qe_binary):
    raise FileNotFoundError('QE binary %s does not exist' % qe_binary)

# starting structure and calculator settings shipped in the tarball
at = ase.io.read('water.xyz')
with open('QE_settings.pkl', 'rb') as handle:
    kwargs_this_calc = pickle.load(handle)
kwargs_this_calc['command'] = 'mpirun -np 4 %s -in PREFIX.pwi > PREFIX.pwo' % qe_binary

run_dir = './'
# not used below
all_changes = ['positions', 'numbers', 'cell', 'pbc', 'initial_charges', 'initial_magmoms']
properties_use = ['energy', 'forces']

# relax with the ASE optimizer; each step is a single-point QE run
at.calc = Espresso(directory=run_dir, **kwargs_this_calc)
opt = BFGS(at)
opt.run(fmax=0.001, steps=4)

ase.io.write('final.xyz', at)

@bernstei
Contributor

bernstei commented Jul 6, 2022

wfl wraps the ASE optimizer (only preconlbfgs for now). In principle it can use any ASE calculator, but I don't know if it's been tested with any DFT calculator (they're slightly different because they write lots of files).
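
As a sketch, swapping in the ASE optimizer that wfl wraps would just mean replacing the BFGS lines of your run.py above (note that PreconLBFGS turns the preconditioner off by itself for small systems):

from ase.optimize.precon import PreconLBFGS

# `at` with its Espresso calculator attached, as in the run.py above
opt = PreconLBFGS(at)
opt.run(fmax=0.001, steps=4)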

@ghost
Author

ghost commented Jul 6, 2022

I will do some testing on Monday 💪

@gelzinyte
Contributor

I think the way to do DFT relaxations is to use the DFT code itself, not an ASE optimiser with DFT single-point evaluations; it's much more efficient. For the one or two examples I tested, ASE-LBFGS with ORCA single-point evaluations took over 100 steps, whereas ORCA itself took about 20 steps, and each step was more efficient because ORCA re-uses some things from previous optimisation steps.

Relaxing structures with ORCA directly is implemented in workflow's ORCA calculator. If there's enough interest in implementing that for QE, it might be useful to look at that?

The gist: set task="opt" when initialising the wfl.calculators.orca.ORCA calculator (instead of the default task="engrad", which corresponds to single-point evaluations). The input file is then written with the Opt keyword, which is all ORCA needs to relax the geometry. Once ORCA itself is done, wfl.calculators.orca.ORCA reads the relaxed positions from the output file into the wfl.calculators.orca.ORCA.extra_results dictionary, in addition to the usual relaxed energies and forces. If workflow's ORCA calculator is called via the generic calculator, the relaxed positions are set on the returned Atoms objects.
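
A rough sketch of what that looks like from the user's side; task="opt" and extra_results are as described above, while the remaining keyword names follow the ASE ORCA calculator and are assumptions about this wfl version:

from ase.build import molecule
from wfl.calculators.orca import ORCA

at = molecule('H2O')
at.calc = ORCA(
    task='opt',                        # ORCA runs its own geometry optimisation
    orcasimpleinput='BLYP def2-SVP',   # illustrative level of theory
)
e = at.get_potential_energy()          # energy of the ORCA-relaxed structure
# the relaxed positions end up in the calculator's extra_results dictionary
print(at.calc.extra_results.keys())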

@bernstei
Contributor

bernstei commented Jul 6, 2022

There's no good reason for the number of geometry steps to be different, unless the DFT calculator's minimization algorithm is better than ASE's. If there really are much better minimizers, they should be added to ASE. [added] With VASP I'm confident that its minimizers are not much better than PreconLBFGS, with the possible exception of being less sensitive to noisy forces.

The step-to-step efficiency depends on the code, and on how the ASE interface works. For VASP it will reuse wavefunctions (if you don't turn off writing them), which helps a lot, but it won't do what I implemented for QUIP, which is to keep VASP running and just repeatedly feed it configurations, so you don't even have the program restart overhead.

There's also no guarantee that any particular ASE DFT calculator will read the new positions at the end. In fact, it sort of violates the concept of the ASE Calculator object, although I think in practice at least some calculators do it correctly. But there's no consistency between calculators, and apparently little appetite in the ASE community to make them more consistent (despite the huge number of hoops Ask makes contributors jump through for practically anything new).
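
For QE specifically, the relaxed geometry can at least be pulled straight from the output file, independent of what the Calculator object chooses to do (the filename is an assumption):

import ase.io

# last ionic configuration of a 'relax' run, with energy and forces attached
final = ase.io.read('espresso.pwo', index=-1)
print(final.positions)
print(final.get_potential_energy(), final.get_forces())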

@gabor1
Contributor

gabor1 commented Jul 6, 2022

ORCA could be doing wavefunction extrapolation, but could that really change the number of geometry steps? Maybe it's using force-field preconditioning. We have that in ASE as well, but it's a bit of a hack to set up.

@gabor1
Contributor

gabor1 commented Jul 6, 2022

Look at the example in preconlbfgs

@bernstei
Contributor

bernstei commented Jul 6, 2022

Yes, WF extrapolation could be making a difference (I do that, although not as well as VASP's internal algorithm, with my hacked interactive VASP driver for QUIP), but again, it only changes the number of SCF steps, not geometry steps. For the geometry, I'm guessing it's perhaps doing something that makes sense for molecules, like redundant internal coordinates, which ASE indeed does not support, at least not in PreconLBFGS.

@bernstei
Contributor

bernstei commented Jul 6, 2022

We could wrap https://databases.fysik.dtu.dk/ase/ase/optimize.html#pyberny, if it's useful

@gabor1
Contributor

gabor1 commented Jul 6, 2022 via email

@bernstei
Contributor

bernstei commented Jul 6, 2022

It might. I wonder if the current heuristic for turning it off for small systems is too conservative.
