Merge branch 'main' into tut
jmmshn committed May 14, 2024
2 parents c595647 + 832fdbe commit 2c2dab5
Showing 518 changed files with 47,562 additions and 1,569 deletions.
35 changes: 24 additions & 11 deletions .github/workflows/testing.yml
@@ -6,6 +6,8 @@ on:
tags: ["v*"]
pull_request:
workflow_dispatch:
repository_dispatch:
types: [pymatgen-ci-trigger]

jobs:
lint:
@@ -39,24 +41,35 @@ jobs:
- uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
cache: pip
cache-dependency-path: pyproject.toml

- name: Install enumlib
run: |
cd ..
git clone --recursive https://github.com/msg-byu/enumlib.git
cd enumlib/symlib/src
export F90=gfortran
make
cd ../../src
make enum.x
sudo mv enum.x /usr/local/bin/
cd ..
sudo cp aux_src/makeStr.py /usr/local/bin/
continue-on-error: true # This is not critical to succeed.

- name: Install dependencies
# ERROR: Cannot install atomate2 and atomate2[strict,tests]==0.0.1 because these package versions have conflicting dependencies.
# The conflict is caused by:
# atomate2[strict,tests] 0.0.1 depends on pymatgen>=2023.10.4
# atomate2[strict,tests] 0.0.1 depends on pymatgen==2023.10.4; extra == "strict"
# ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
# chgnet 0.2.1 depends on pymatgen>=2023.5.31
# emmet-core 0.70.0 depends on pymatgen>=2023.10.11
run: |
python -m pip install --upgrade pip
mkdir -p ~/.abinit/pseudos
cp -r tests/test_data/abinit/pseudos/ONCVPSP-PBE-SR-PDv0.4 ~/.abinit/pseudos
# ase needed to get FrechetCellFilter used by ML force fields
pip install git+https://gitlab.com/ase/ase
pip install .[strict,tests]
pip install .[strict,tests,abinit]
pip install torch-runstats
pip install --no-deps nequip
pip install --no-deps nequip==0.5.6
- name: Install pymatgen from master if triggered by pymatgen repo dispatch
if: github.event_name == 'repository_dispatch' && github.event.action == 'pymatgen-ci-trigger'
run: pip install --upgrade 'git+https://github.com/materialsproject/pymatgen@${{ github.event.client_payload.pymatgen_ref }}'

- name: Test Notebooks
run: pytest --nbmake ./tutorials
7 changes: 4 additions & 3 deletions .pre-commit-config.yaml
@@ -1,14 +1,15 @@
default_language_version:
python: python3
exclude: ^(.github/|tests/test_data/abinit/)
repos:
- repo: https://github.com/charliermarsh/ruff-pre-commit
rev: v0.3.0
rev: v0.4.2
hooks:
- id: ruff
args: [--fix]
- id: ruff-format
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
rev: v4.6.0
hooks:
- id: check-yaml
- id: fix-encoding-pragma
@@ -29,7 +30,7 @@ repos:
- id: rst-directive-colons
- id: rst-inline-touching-normal
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.8.0
rev: v1.10.0
hooks:
- id: mypy
files: ^src/
4 changes: 2 additions & 2 deletions docs/about/contributors.md
@@ -90,8 +90,8 @@ Lawrence Berkeley National Laboratory
[0000-0003-3439-4856]: https://orcid.org/0000-0003-3439-4856

**Matthew McDermott** [![gh]][mattmcdermott] [![orc]][0000-0002-4071-3000] \
PhD student \
University of California, Berkeley
Postdoctoral Researcher \
Lawrence Berkeley National Laboratory

[mattmcdermott]: https://github.com/mattmcdermott
[0000-0002-4071-3000]: https://orcid.org/0000-0002-4071-3000
165 changes: 165 additions & 0 deletions docs/dev/abinit_tests.md
@@ -0,0 +1,165 @@

# Writing ABINIT Tests

## Considerations

Atomate2 includes tools to help write tests for ABINIT workflows. The primary
considerations with the atomate2 testing environment are listed below.

### Pseudopotentials

ABINIT relies heavily on pseudopotential tables accessible through abipy. These
tables are large and would otherwise have to be downloaded on the fly at input
creation time. Therefore, a smaller pseudopotential table covering just a few
elements is included. Structures used for testing should be based on these
elements; any missing elements should be added to this pseudopotential table.

Note that information from the real pseudopotential files is used in the creation
of the jobs and flows, hence fake pseudopotentials are not an option here.


### File sizes

The files produced by ABINIT are generally large and would overwhelm the
atomate2 repository if not managed carefully. For example, density (DEN) and
wavefunction (WFK) files can easily be tens of megabytes, which quickly adds up.

To overcome this, we only include essential ABINIT output files in the atomate2 test
folder. For example, DEN, WFK, and other density information is not needed in most
instances. For output files that can be required inputs for some jobs, fake files are
generated in the test folder, and the linking and copying of the files is checked
using these fake files. Each fake file records whether it is a regular file or a
symbolic link to another regular file.
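
A minimal sketch of the idea, assuming a hypothetical helper (the actual utility in
atomate2 may differ):

```py
from pathlib import Path


def write_fake_file(original: Path, dest: Path) -> None:
    """Replace a large ABINIT file with a small placeholder in the test data.

    Only the file type (regular file vs. symbolic link) is recorded, which is
    all a mocked test run needs to verify the linking/copying behavior.
    """
    if original.is_symlink():
        dest.write_text(f"FAKE: symlink to {original.resolve().name}\n")
    else:
        dest.write_text("FAKE: regular file\n")
```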

### ABINIT execution

We cannot run ABINIT on the testing server due to the computational expense. Furthermore,
different versions/compilations of ABINIT may yield slightly different total energies,
which are not important for our tests; we only test that (i) inputs are written
correctly, (ii) outputs are parsed correctly, and (iii) jobs are connected together
properly.

This is achieved by "mocking" ABINIT execution. Instead of running ABINIT, we copy reference
output files into the current directory and then proceed with running the workflow.

Note that it is still possible to run integration tests where ABINIT is executed by
passing the `--abinit-integration` option to pytest:

```bash
pytest --abinit-integration
```

When executing tests with the real ABINIT, larger deviations are expected depending on
the ABINIT version, compilation options, etc.
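
Conceptually, the mocking follows the sketch below; the fixture shape and the patched
import path are illustrative assumptions, not atomate2's exact API:

```py
import shutil
from pathlib import Path

import pytest


@pytest.fixture
def mock_abinit(monkeypatch):
    """Replace ABINIT execution with copying of reference output files."""

    def _mock(ref_dir: Path) -> None:
        def fake_run(*args, **kwargs) -> None:
            # Instead of running ABINIT, copy the reference outputs into the
            # current run directory and proceed as if the run succeeded.
            for file in (ref_dir / "outputs").iterdir():
                if file.is_file():
                    shutil.copy(file, Path.cwd() / file.name)

        # Hypothetical import path for the function that launches ABINIT.
        monkeypatch.setattr("atomate2.abinit.run.run_abinit", fake_run)

    return _mock
```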

## Generation of new tests

Atomate2 provides an automatic procedure to prepare ABINIT data (reference files)
for use in atomate2 tests. It does this by:

- Preparing a standard maker file that will be used to generate the reference files as
well as to run the tests.
- Creating the flow or job using the maker file and a structure file (CIF file or other).
- Copying ABINIT inputs and outputs into the correct directory structure and creating
  fake input and output files in place of large files when relevant.
- Providing a template unit test that is configured for the specific workflow.

There are four stages to generating the test data:

### 1. Create the maker file

The `atm dev abinit-script-maker` command prepares a template `create_maker.py`
script in the current directory. You should adapt this file for the maker you intend
to test. Try to use parameters that allow the generation to execute relatively fast.
This also benefits the integration testing capability for ABINIT workflows: the
faster the workflows can run, the better.

After adapting the `create_maker.py` script for the maker to be tested, you should run
it:

```bash
python create_maker.py
```

This will generate a `maker.json` file containing the serialized version of the maker
together with additional metadata, including the contents of the `create_maker.py`
script itself, the author name and email (extracted from the git config), the date of
generation, and so on.
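
For instance, the generated file can be inspected with monty; the key names shown here
are illustrative assumptions, not a guaranteed schema:

```py
from monty.serialization import loadfn

# Load the serialized maker together with its metadata.
maker_info = loadfn("maker.json")
print(sorted(maker_info))  # e.g. ["author", "author_mail", "date", "maker", "script"]
```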

### 2. Generate the reference files

The `atm dev abinit-generate-reference` command runs the workflow for a given structure
in the current directory using `jobflow`'s `run_locally` option. This will execute the
different ABINIT jobs of the flow in separate run folders and dump an `outputs.json`
file with all the outputs of the flow.

Note that the structure is specified either implicitly in an `initial_structure.json`
file:

```bash
atm dev abinit-generate-reference
```

or explicitly, e.g. as a path to a CIF file:

```bash
atm dev abinit-generate-reference /path/to/structure.cif
```

When an explicit structure file is passed to the `atm dev abinit-generate-reference`
command, the structure is dumped to an `initial_structure.json` file.

### 3. Copy files to the test_data folder

Now that the flow has been executed, the generated input and output files have to be
copied to the `tests/test_data/abinit` folder. This is achieved using the
`atm dev abinit-test-data` command:

```bash
atm dev abinit-test-data TEST_NAME
```

You should change `TEST_NAME` to a name for the workflow test. Note that `TEST_NAME`
should not contain spaces or punctuation. For example, test data for a band structure
workflow could be generated using `atm dev abinit-test-data Si_band_structure`.

This will automatically detect whether the Maker is a Job Maker or a Flow Maker and
copy the files into the corresponding `tests/test_data/abinit/jobs/NameOfMaker/TEST_NAME`
or `tests/test_data/abinit/flows/NameOfMaker/TEST_NAME` directory. It will create
the `NameOfMaker/TEST_NAME` directory structure and copy the information about the
Maker and initial structure, i.e. `maker.json`, `initial_structure.json`, and
`make_info.json` if present.

Each job of the flow has its own directory in the `TEST_NAME` directory,
with one directory for each "restart" (i.e. index of the job). The directory
for a given ABINIT run, hereafter referenced as `REF_RUN_FOLDER`, thus has the
following structure:

`tests/test_data/abinit/jobs_OR_flows/NameOfMaker/TEST_NAME/JOB_NAME/JOB_INDEX`

where `JOB_NAME` is the name of the job and `JOB_INDEX` is the index of the job
(usually "1" unless the job is restarted).

**Note:** For the script to run successfully, every job in the workflow must have
a unique name. For example, there cannot be two calculations called "relax".
Instead, you should ensure they are named something like "relax 1" and "relax 2".

Each `REF_RUN_FOLDER` contains:

- A folder called `inputs` with `run.abi` and `abinit_input.json`, as well as the
  `indata`, `outdata`, and `tmpdata` directories. The `indata` directory potentially
  contains the reference fake input files needed for the job to be executed (e.g. a
  fake link to a previous DEN file).
- A folder called `outputs` with `run.abo`, `run.err`, and `run.log`, as well as the
  `indata`, `outdata`, and `tmpdata` directories. In these three directories, the
  large files are replaced by fake reference files, while the files necessary for the
  workflow test execution are present.
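
As an illustration, a `REF_RUN_FOLDER` for a hypothetical relaxation job might look
like this (the maker, test, and job names are made up):

```
tests/test_data/abinit/flows/RelaxFlowMaker/Si_relax/relaxation/1/
├── inputs/
│   ├── run.abi
│   ├── abinit_input.json
│   ├── indata/    # may contain fake input files, e.g. a fake link to a DEN file
│   ├── outdata/
│   └── tmpdata/
└── outputs/
    ├── run.abo
    ├── run.err
    ├── run.log
    ├── indata/
    ├── outdata/   # large files replaced by fake reference files
    └── tmpdata/
```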

### 4. Write the test

The `atm dev abinit-test-data` command also generates a template test method that is
configured to use the test data that was just generated. In this template test method,
the maker, the `initial_structure`, and the reference paths (i.e. the mapping from job
name and job index to the reference job folder) are automatically loaded from the test
folder.

Add `assert` statements to validate the workflow outputs.
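
A rough sketch of what the filled-in test might look like; the fixture names
(`mock_abinit`, `abinit_test_dir`) and the `ref_paths` mapping are illustrative
assumptions, not the exact generated template:

```py
from pathlib import Path

from jobflow import run_locally
from monty.serialization import loadfn


def test_relax_si(mock_abinit, abinit_test_dir: Path, clean_dir):
    test_dir = abinit_test_dir / "flows" / "RelaxFlowMaker" / "Si_relax"
    maker = loadfn(test_dir / "maker.json")["maker"]  # hypothetical key
    structure = loadfn(test_dir / "initial_structure.json")

    # Map job name/index to reference folders so ABINIT execution is mocked.
    mock_abinit(ref_paths={("relaxation", 1): test_dir / "relaxation" / "1"})

    flow = maker.make(structure)
    responses = run_locally(flow, create_folders=True, ensure_success=True)

    # Validate the workflow outputs with assert statements.
    output = responses[flow.jobs[-1].uuid][1].output
    assert output.structure.composition.reduced_formula == "Si"
```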
21 changes: 11 additions & 10 deletions docs/dev/workflow_tutorial.md
@@ -2,14 +2,14 @@

## Anatomy of an `atomate2` computational workflow (i.e., what do I need to write?)

Every `atomate2` workflow is an instance of jobflow's `Flow ` class, which is a collection of Job and/or other `Flow` objects. So your end goal is to produce a `Flow `.
Every `atomate2` workflow is an instance of jobflow's `Flow` class, which is a collection of Job and/or other `Flow` objects. So your end goal is to produce a `Flow`.

In the context of computational materials science, `Flow ` objects are most easily created by a `Maker`, which contains a factory method make() that produces a `Flow `, given certain inputs. Typically, the input to `Maker`.make() includes atomic coordinate information in the form of a `pymatgen` `Structure` or `Molecule` object. So the basic signature looks like this:
In the context of computational materials science, `Flow` objects are most easily created by a `Maker`, which contains a factory method `make()` that produces a `Flow`, given certain inputs. Typically, the input to `Maker.make()` includes atomic coordinate information in the form of a `pymatgen` `Structure` or `Molecule` object. So the basic signature looks like this:

```py
class ExampleMaker(Maker):
    def make(self, coordinates: Structure) -> Flow:
        # take the input coordinates and return a `Flow `
        # take the input coordinates and return a `Flow`
        return Flow(...)
```

@@ -49,15 +49,16 @@ Finally, most `atomate2` workflows return structured output in the form of "Task
**TODO - extend code block above to illustrate TaskDoc usage**
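
A rough sketch of the idea (the task document class and field names are illustrative
assumptions, not an existing atomate2 schema):

```py
from typing import Optional

from emmet.core.structure import StructureMetadata
from jobflow import job


class ExampleTaskDoc(StructureMetadata):
    """Hypothetical task document storing the output of an example job."""

    energy: Optional[float] = None


@job(output_schema=ExampleTaskDoc)
def example_job(structure) -> ExampleTaskDoc:
    # ... run the calculation and parse its outputs here ...
    energy = -1.23  # placeholder result
    return ExampleTaskDoc.from_structure(meta_structure=structure, energy=energy)
```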

In summary, a new `atomate2` workflow consists of the following components:
- A `Maker` that actually generates the workflow
- One or more `Job` and/or `Flow ` classes that define the discrete steps in the workflow
- (optionally) an `InputGenerator` that produces a `pymatgen` `InputSet` for writing calculation input files
- (optionally) a `TaskDocument` that defines a schema for storing the output data

- A `Maker` that actually generates the workflow
- One or more `Job` and/or `Flow` classes that define the discrete steps in the workflow
- (optionally) an `InputGenerator` that produces a `pymatgen` `InputSet` for writing calculation input files
- (optionally) a `TaskDocument` that defines a schema for storing the output data

## Where do I put my code?

Because of the distributed design of the MP Software Ecosystem, writing a complete new workflow may involve making contributions to more than one GitHub repository. The following guidelines should help you understand where to put your contribution.

- All workflow code (`Job`, `Flow `, `Maker`) belongs in `atomate2`
- `InputSet` and `InputGenerator` code belongs in `pymatgen`. However, if you need to create these classes from scratch (i.e., you are working with a code that is not already supported in`pymatgen`), then it is recommended to include them in `atomate2` at first to facilitate rapid iteration. Once mature, they can be moved to `pymatgen` or to a `pymatgen` [addon package](https://pymatgen.org/addons).
- `TaskDocument` schemas should generally be developed in `atomate2` alongside the workflow code. We recommend that you first check emmet to see if there is an existing schema that matches what you need. If so, you can import it. If not, check [`cclib`](https://cclib.github.io/). `cclib` output can be imported via [`atomate2.common.schemas.TaskDocument`](https://github.com/materialsproject/atomate2/blob/main/src/atomate2/common/schemas/cclib.py). If neither code has what you need, then new schemas should be developed within `atomate2` (or `cclib`).
- All workflow code (`Job`, `Flow`, `Maker`) belongs in `atomate2`
- `InputSet` and `InputGenerator` code belongs in `pymatgen`. However, if you need to create these classes from scratch (i.e., you are working with a code that is not already supported in `pymatgen`), then it is recommended to include them in `atomate2` at first to facilitate rapid iteration. Once mature, they can be moved to `pymatgen` or to a `pymatgen` [addon package](https://pymatgen.org/addons).
- `TaskDocument` schemas should generally be developed in `atomate2` alongside the workflow code. We recommend that you first check emmet to see if there is an existing schema that matches what you need. If so, you can import it. If not, check [`cclib`](https://cclib.github.io/). `cclib` output can be imported via [`atomate2.common.schemas.TaskDocument`](https://github.com/materialsproject/atomate2/blob/main/src/atomate2/common/schemas/cclib.py). If neither code has what you need, then new schemas should be developed within `atomate2` (or `cclib`).
39 changes: 38 additions & 1 deletion docs/user/codes/vasp.md
@@ -247,7 +247,7 @@ adjust them if necessary. The default might not be strict enough
for your specific case.
```

### Lobster
### LOBSTER

Perform bonding analysis with [LOBSTER](http://cohp.de/) and [LobsterPy](https://github.com/jageo/lobsterpy)

@@ -367,6 +367,43 @@ for number, (key, cohp) in enumerate(
plotter.add_cohp(key, cohp)
plotter.save_plot(f"plots_cation_anion_bonds{number}.pdf")
```
#### Running the LOBSTER workflow without a database and with one job script only

It is also possible to run the VASP-LOBSTER workflow with a minimal setup.
In this case, you will run the VASP calculations on the same node as the LOBSTER calculations.
In between the different computations, you will switch from MPI to OpenMP parallelization.

For example, for a node with 48 cores, you could use an adapted version of the following SLURM script:

```bash
#!/bin/bash
#SBATCH -J vasplobsterjob
#SBATCH -o ./%x.%j.out
#SBATCH -e ./%x.%j.err
#SBATCH -D ./
#SBATCH --mail-type=END
#SBATCH [email protected]
#SBATCH --time=24:00:00
#SBATCH --nodes=1
# This needs to be adapted if you run with a different number of cores
#SBATCH --ntasks=48

# ensure you load the modules to run VASP, e.g., module load vasp
module load my_vasp_module
# please activate the required conda environment
conda activate my_environment
cd my_folder
# the following script needs to contain the workflow
python xyz.py
```
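
A minimal sketch of what `xyz.py` might contain (the structure file and maker settings
are assumptions; adapt them to your project):

```py
from atomate2.vasp.flows.lobster import VaspLobsterMaker
from jobflow import run_locally
from pymatgen.core import Structure

# Hypothetical input structure; replace with your own file.
structure = Structure.from_file("POSCAR")

# Build the combined VASP + LOBSTER flow.
flow = VaspLobsterMaker().make(structure)

# Execute all jobs sequentially within this job script, without a database.
run_locally(flow, create_folders=True)
```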

The `LOBSTER_CMD` now needs an additional export of the number of threads.

```yaml
VASP_CMD: <<VASP_CMD>>
LOBSTER_CMD: OMP_NUM_THREADS=48 <<LOBSTER_CMD>>
```
(modifying_input_sets)=
Modifying input sets
10 changes: 10 additions & 0 deletions docs/user/install.md
@@ -175,6 +175,16 @@ To install the packages run:
pip install atomate2
```

If you would like to use more specialized capabilities of `atomate2`, such as the phonon, LOBSTER, or force field workflows, you will need to run one of:

```bash
pip install atomate2[phonons]
pip install atomate2[lobster]
pip install atomate2[forcefields]
```

See [`pyproject.toml`](https://github.com/materialsproject/atomate2/blob/main/pyproject.toml) for all available optional dependency sets. More detailed instructions can be found under [dev installation](../dev/dev_install.md).

## Configure calculation output database

The next step is to configure your MongoDB database that will be used to store