Merge branch 'main' into tut
jmmshn committed May 14, 2024
2 parents c595647 + 832fdbe commit 2c2dab5
Showing 518 changed files with 47,562 additions and 1,569 deletions.
35 changes: 24 additions & 11 deletions .github/workflows/testing.yml
@@ -6,6 +6,8 @@ on:
tags: ["v*"]
pull_request:
workflow_dispatch:
repository_dispatch:
types: [pymatgen-ci-trigger]

jobs:
lint:
@@ -39,24 +41,35 @@ jobs:
- uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
cache: pip
cache-dependency-path: pyproject.toml

- name: Install enumlib
run: |
cd ..
git clone --recursive https://github.com/msg-byu/enumlib.git
cd enumlib/symlib/src
export F90=gfortran
make
cd ../../src
make enum.x
sudo mv enum.x /usr/local/bin/
cd ..
sudo cp aux_src/makeStr.py /usr/local/bin/
continue-on-error: true # This is not critical to succeed.

- name: Install dependencies
# ERROR: Cannot install atomate2 and atomate2[strict,tests]==0.0.1 because these package versions have conflicting dependencies.
# The conflict is caused by:
# atomate2[strict,tests] 0.0.1 depends on pymatgen>=2023.10.4
# atomate2[strict,tests] 0.0.1 depends on pymatgen==2023.10.4; extra == "strict"
# ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
# chgnet 0.2.1 depends on pymatgen>=2023.5.31
# emmet-core 0.70.0 depends on pymatgen>=2023.10.11
run: |
python -m pip install --upgrade pip
mkdir -p ~/.abinit/pseudos
cp -r tests/test_data/abinit/pseudos/ONCVPSP-PBE-SR-PDv0.4 ~/.abinit/pseudos
# ase needed to get FrechetCellFilter used by ML force fields
pip install git+https://gitlab.com/ase/ase
pip install .[strict,tests]
pip install .[strict,tests,abinit]
pip install torch-runstats
pip install --no-deps nequip
pip install --no-deps nequip==0.5.6
- name: Install pymatgen from master if triggered by pymatgen repo dispatch
if: github.event_name == 'repository_dispatch' && github.event.action == 'pymatgen-ci-trigger'
run: pip install --upgrade 'git+https://github.com/materialsproject/pymatgen@${{ github.event.client_payload.pymatgen_ref }}'

- name: Test Notebooks
run: pytest --nbmake ./tutorials
7 changes: 4 additions & 3 deletions .pre-commit-config.yaml
@@ -1,14 +1,15 @@
default_language_version:
python: python3
exclude: ^(.github/|tests/test_data/abinit/)
repos:
- repo: https://github.com/charliermarsh/ruff-pre-commit
rev: v0.3.0
rev: v0.4.2
hooks:
- id: ruff
args: [--fix]
- id: ruff-format
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
rev: v4.6.0
hooks:
- id: check-yaml
- id: fix-encoding-pragma
@@ -29,7 +30,7 @@ repos:
- id: rst-directive-colons
- id: rst-inline-touching-normal
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.8.0
rev: v1.10.0
hooks:
- id: mypy
files: ^src/
4 changes: 2 additions & 2 deletions docs/about/contributors.md
@@ -90,8 +90,8 @@ Lawrence Berkeley National Laboratory
[0000-0003-3439-4856]: https://orcid.org/0000-0003-3439-4856

**Matthew McDermott** [![gh]][mattmcdermott] [![orc]][0000-0002-4071-3000] \
PhD student \
University of California, Berkeley
Postdoctoral Researcher \
Lawrence Berkeley National Laboratory

[mattmcdermott]: https://github.com/mattmcdermott
[0000-0002-4071-3000]: https://orcid.org/0000-0002-4071-3000
165 changes: 165 additions & 0 deletions docs/dev/abinit_tests.md
@@ -0,0 +1,165 @@

# Writing ABINIT Tests

## Considerations

Atomate2 includes tools to help write tests for ABINIT workflows. The primary
considerations with the atomate2 testing environment are listed below.

### Pseudopotentials

ABINIT relies heavily on pseudopotential tables accessible through abipy. These
tables are large and would otherwise have to be downloaded on the fly at input
creation time. Therefore, a smaller pseudopotential table covering just a few
elements is included. Structures used for testing should be based on these
elements; any missing elements should be added to this pseudopotential table.

Note that information from the real pseudopotential files is used in the creation
of the jobs and flows, hence fake pseudopotentials are not an option here.


### File sizes

The files produced by ABINIT are generally large and would overwhelm the
atomate2 repository if not managed carefully. For example, density (DEN) and
wavefunction (WFK) files can easily be tens of megabytes, which quickly adds up.

To overcome this, we only include essential ABINIT output files in the atomate2 test
folder. For example, DEN, WFK, and other density information is not needed in most
instances. For output files that can be required inputs for some jobs, fake files are
generated in the test folder, and the linking and copying of the files is checked
using these fake files. Each fake file records whether it is a regular file or a
symbolic link to another regular file.
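
A minimal sketch of the idea, assuming a hypothetical helper (the actual utility in
atomate2 may differ):

```py
from pathlib import Path


def write_fake_file(original: Path, dest: Path) -> None:
    """Replace a large ABINIT file with a small placeholder in the test data.

    Only the file type (regular file vs. symbolic link) is recorded, which is
    all a mocked test run needs to verify the linking/copying behavior.
    """
    if original.is_symlink():
        dest.write_text(f"FAKE: symlink to {original.resolve().name}\n")
    else:
        dest.write_text("FAKE: regular file\n")
```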

### ABINIT execution

We cannot run ABINIT on the testing server due to the computational expense. Furthermore,
different versions/compilations of ABINIT may yield slightly different total energies,
which are not important for our tests; we only test that (i) inputs are written
correctly, (ii) outputs are parsed correctly, and (iii) jobs are connected together
properly.

This is achieved by "mocking" ABINIT execution. Instead of running ABINIT, we copy reference
output files into the current directory and then proceed with running the workflow.

Note that it is still possible to run integration tests where ABINIT is executed by
passing the `--abinit-integration` option to pytest:

```bash
pytest --abinit-integration
```

When executing tests with the real ABINIT, larger deviations are expected depending on
the ABINIT version, compilation options, etc.
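
Conceptually, the mocking follows the sketch below; the fixture shape and the patched
import path are illustrative assumptions, not atomate2's exact API:

```py
import shutil
from pathlib import Path

import pytest


@pytest.fixture
def mock_abinit(monkeypatch):
    """Replace ABINIT execution with copying of reference output files."""

    def _mock(ref_dir: Path) -> None:
        def fake_run(*args, **kwargs) -> None:
            # Instead of running ABINIT, copy the reference outputs into the
            # current run directory and proceed as if the run succeeded.
            for file in (ref_dir / "outputs").iterdir():
                if file.is_file():
                    shutil.copy(file, Path.cwd() / file.name)

        # Hypothetical import path for the function that launches ABINIT.
        monkeypatch.setattr("atomate2.abinit.run.run_abinit", fake_run)

    return _mock
```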

## Generation of new tests

Atomate2 provides an automatic procedure to prepare ABINIT data (reference files)
for use in atomate2 tests. It does this by:

- Preparing a standard maker file that will be used to generate the reference files as
well as to run the tests.
- Creating the flow or job using the maker file and a structure file (CIF file or other).
- Copying ABINIT inputs and outputs into the correct directory structure and creating
  fake input and output files in place of large files when relevant.
- Providing a template unit test that is configured for the specific workflow.

There are four stages to generating the test data:

### 1. Create the maker file

The `atm dev abinit-script-maker` command prepares a template `create_maker.py`
script in the current directory. You should adapt this file for the maker you intend
to test. Try to use parameters that allow the generation to execute relatively fast.
This also benefits the integration testing capability for ABINIT workflows: the
faster the workflows can run, the better.

After adapting the `create_maker.py` script for the maker to be tested, you should run
it:

```bash
python create_maker.py
```

This will generate a `maker.json` file containing the serialized version of the maker
together with additional metadata, including the contents of the `create_maker.py`
script itself, the author name and email (extracted from the git config), the date of
generation, and so on.
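
For instance, the generated file can be inspected with monty; the key names shown here
are illustrative assumptions, not a guaranteed schema:

```py
from monty.serialization import loadfn

# Load the serialized maker together with its metadata.
maker_info = loadfn("maker.json")
print(sorted(maker_info))  # e.g. ["author", "author_mail", "date", "maker", "script"]
```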

### 2. Generate the reference files

The `atm dev abinit-generate-reference` command runs the workflow for a given structure
in the current directory using `jobflow`'s `run_locally` option. This will execute the
different ABINIT jobs of the flow in separate run folders and dump an `outputs.json`
file with all the outputs of the flow.

Note that the structure is specified either implicitly in an `initial_structure.json`
file:

```bash
atm dev abinit-generate-reference
```

or explicitly, e.g. as a path to a CIF file:

```bash
atm dev abinit-generate-reference /path/to/structure.cif
```

When an explicit structure file is passed to the `atm dev abinit-generate-reference`
command, the structure is dumped to an `initial_structure.json` file.

### 3. Copy files to the test_data folder

Now that the flow has been executed, the generated input and output files have to be
copied to the `tests/test_data/abinit` folder. This is achieved using the
`atm dev abinit-test-data` command:

```bash
atm dev abinit-test-data TEST_NAME
```

You should change `TEST_NAME` to a name for the workflow test. Note that `TEST_NAME`
should not contain spaces or punctuation. For example, test data for a band structure
workflow could be generated using `atm dev abinit-test-data Si_band_structure`.

This will automatically detect whether the Maker is a Job Maker or a Flow Maker and
copy the files into the corresponding `tests/test_data/abinit/jobs/NameOfMaker/TEST_NAME`
or `tests/test_data/abinit/flows/NameOfMaker/TEST_NAME` directory. It will create
the `NameOfMaker/TEST_NAME` directory structure and copy the information about the
Maker and initial structure, i.e. `maker.json`, `initial_structure.json`, and
`make_info.json` if present.

Each job of the flow has its own directory in the `TEST_NAME` directory,
with one directory for each "restart" (i.e. index of the job). The directory
for a given ABINIT run, hereafter referenced as `REF_RUN_FOLDER`, thus has the
following structure:

`tests/test_data/abinit/jobs_OR_flows/NameOfMaker/TEST_NAME/JOB_NAME/JOB_INDEX`

where `JOB_NAME` is the name of the job and `JOB_INDEX` is the index of the job
(usually "1" unless the job is restarted).

**Note:** For the script to run successfully, every job in the workflow must have
a unique name. For example, there cannot be two calculations called "relax".
Instead, you should ensure they are named something like "relax 1" and "relax 2".

Each `REF_RUN_FOLDER` contains:

- A folder called `inputs` with `run.abi` and `abinit_input.json`, as well as the
  `indata`, `outdata`, and `tmpdata` directories. The `indata` directory potentially
  contains the reference fake input files needed for the job to be executed (e.g. a
  fake link to a previous DEN file).
- A folder called `outputs` with `run.abo`, `run.err`, and `run.log`, as well as the
  `indata`, `outdata`, and `tmpdata` directories. In these three directories, the
  large files are replaced by fake reference files, while the files necessary for the
  workflow test execution are present.
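
As an illustration, a `REF_RUN_FOLDER` for a hypothetical relaxation job might look
like this (the maker, test, and job names are made up):

```
tests/test_data/abinit/flows/RelaxFlowMaker/Si_relax/relaxation/1/
├── inputs/
│   ├── run.abi
│   ├── abinit_input.json
│   ├── indata/    # may contain fake input files, e.g. a fake link to a DEN file
│   ├── outdata/
│   └── tmpdata/
└── outputs/
    ├── run.abo
    ├── run.err
    ├── run.log
    ├── indata/
    ├── outdata/   # large files replaced by fake reference files
    └── tmpdata/
```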

### 4. Write the test

The `atm dev abinit-test-data` command also generates a template test method that is
configured to use the test data that was just generated. In this template test method,
the maker, the `initial_structure`, and the reference paths (i.e. the mapping from job
name and job index to the reference job folder) are automatically loaded from the test
folder.

Add `assert` statements to validate the workflow outputs.
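
A rough sketch of what the filled-in test might look like; the fixture names
(`mock_abinit`, `abinit_test_dir`) and the `ref_paths` mapping are illustrative
assumptions, not the exact generated template:

```py
from pathlib import Path

from jobflow import run_locally
from monty.serialization import loadfn


def test_relax_si(mock_abinit, abinit_test_dir: Path, clean_dir):
    test_dir = abinit_test_dir / "flows" / "RelaxFlowMaker" / "Si_relax"
    maker = loadfn(test_dir / "maker.json")["maker"]  # hypothetical key
    structure = loadfn(test_dir / "initial_structure.json")

    # Map job name/index to reference folders so ABINIT execution is mocked.
    mock_abinit(ref_paths={("relaxation", 1): test_dir / "relaxation" / "1"})

    flow = maker.make(structure)
    responses = run_locally(flow, create_folders=True, ensure_success=True)

    # Validate the workflow outputs with assert statements.
    output = responses[flow.jobs[-1].uuid][1].output
    assert output.structure.composition.reduced_formula == "Si"
```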
21 changes: 11 additions & 10 deletions docs/dev/workflow_tutorial.md
@@ -2,14 +2,14 @@

## Anatomy of an `atomate2` computational workflow (i.e., what do I need to write?)

Every `atomate2` workflow is an instance of jobflow's `Flow ` class, which is a collection of Job and/or other `Flow` objects. So your end goal is to produce a `Flow `.
Every `atomate2` workflow is an instance of jobflow's `Flow` class, which is a collection of Job and/or other `Flow` objects. So your end goal is to produce a `Flow`.

In the context of computational materials science, `Flow ` objects are most easily created by a `Maker`, which contains a factory method make() that produces a `Flow `, given certain inputs. Typically, the input to `Maker`.make() includes atomic coordinate information in the form of a `pymatgen` `Structure` or `Molecule` object. So the basic signature looks like this:
In the context of computational materials science, `Flow` objects are most easily created by a `Maker`, which contains a factory method `make()` that produces a `Flow`, given certain inputs. Typically, the input to `Maker.make()` includes atomic coordinate information in the form of a `pymatgen` `Structure` or `Molecule` object. So the basic signature looks like this:

```py
class ExampleMaker(Maker):
    def make(self, coordinates: Structure) -> Flow:
        # take the input coordinates and return a `Flow `
        # take the input coordinates and return a `Flow`
        return Flow(...)
```

@@ -49,15 +49,16 @@ Finally, most `atomate2` workflows return structured output in the form of "Task
**TODO - extend code block above to illustrate TaskDoc usage**
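
A rough sketch of the idea (the task document class and field names are illustrative
assumptions, not an existing atomate2 schema):

```py
from typing import Optional

from emmet.core.structure import StructureMetadata
from jobflow import job


class ExampleTaskDoc(StructureMetadata):
    """Hypothetical task document storing the output of an example job."""

    energy: Optional[float] = None


@job(output_schema=ExampleTaskDoc)
def example_job(structure) -> ExampleTaskDoc:
    # ... run the calculation and parse its outputs here ...
    energy = -1.23  # placeholder result
    return ExampleTaskDoc.from_structure(meta_structure=structure, energy=energy)
```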

In summary, a new `atomate2` workflow consists of the following components:
- A `Maker` that actually generates the workflow
- One or more `Job` and/or `Flow ` classes that define the discrete steps in the workflow
- (optionally) an `InputGenerator` that produces a `pymatgen` `InputSet` for writing calculation input files
- (optionally) a `TaskDocument` that defines a schema for storing the output data

- A `Maker` that actually generates the workflow
- One or more `Job` and/or `Flow` classes that define the discrete steps in the workflow
- (optionally) an `InputGenerator` that produces a `pymatgen` `InputSet` for writing calculation input files
- (optionally) a `TaskDocument` that defines a schema for storing the output data

## Where do I put my code?

Because of the distributed design of the MP Software Ecosystem, writing a complete new workflow may involve making contributions to more than one GitHub repository. The following guidelines should help you understand where to put your contribution.

- All workflow code (`Job`, `Flow `, `Maker`) belongs in `atomate2`
- `InputSet` and `InputGenerator` code belongs in `pymatgen`. However, if you need to create these classes from scratch (i.e., you are working with a code that is not already supported in`pymatgen`), then it is recommended to include them in `atomate2` at first to facilitate rapid iteration. Once mature, they can be moved to `pymatgen` or to a `pymatgen` [addon package](https://pymatgen.org/addons).
- `TaskDocument` schemas should generally be developed in `atomate2` alongside the workflow code. We recommend that you first check emmet to see if there is an existing schema that matches what you need. If so, you can import it. If not, check [`cclib`](https://cclib.github.io/). `cclib` output can be imported via [`atomate2.common.schemas.TaskDocument`](https://github.com/materialsproject/atomate2/blob/main/src/atomate2/common/schemas/cclib.py). If neither code has what you need, then new schemas should be developed within `atomate2` (or `cclib`).
- All workflow code (`Job`, `Flow`, `Maker`) belongs in `atomate2`
- `InputSet` and `InputGenerator` code belongs in `pymatgen`. However, if you need to create these classes from scratch (i.e., you are working with a code that is not already supported in `pymatgen`), then it is recommended to include them in `atomate2` at first to facilitate rapid iteration. Once mature, they can be moved to `pymatgen` or to a `pymatgen` [addon package](https://pymatgen.org/addons).
- `TaskDocument` schemas should generally be developed in `atomate2` alongside the workflow code. We recommend that you first check emmet to see if there is an existing schema that matches what you need. If so, you can import it. If not, check [`cclib`](https://cclib.github.io/). `cclib` output can be imported via [`atomate2.common.schemas.TaskDocument`](https://github.com/materialsproject/atomate2/blob/main/src/atomate2/common/schemas/cclib.py). If neither code has what you need, then new schemas should be developed within `atomate2` (or `cclib`).
39 changes: 38 additions & 1 deletion docs/user/codes/vasp.md
@@ -247,7 +247,7 @@ adjust them if necessary. The default might not be strict enough
for your specific case.
```

### Lobster
### LOBSTER

Perform bonding analysis with [LOBSTER](http://cohp.de/) and [LobsterPy](https://github.com/jageo/lobsterpy)

@@ -367,6 +367,43 @@ for number, (key, cohp) in enumerate(
plotter.add_cohp(key, cohp)
plotter.save_plot(f"plots_cation_anion_bonds{number}.pdf")
```
#### Running the LOBSTER workflow without a database and with one job script only

It is also possible to run the VASP-LOBSTER workflow with a minimal setup.
In this case, you will run the VASP calculations on the same node as the LOBSTER calculations.
In between the different computations, you will switch from MPI to OpenMP parallelization.

For example, for a node with 48 cores, you could use an adapted version of the following SLURM script:

```bash
#!/bin/bash
#SBATCH -J vasplobsterjob
#SBATCH -o ./%x.%j.out
#SBATCH -e ./%x.%j.err
#SBATCH -D ./
#SBATCH --mail-type=END
#SBATCH [email protected]
#SBATCH --time=24:00:00
#SBATCH --nodes=1
# This needs to be adapted if you run with a different number of cores
#SBATCH --ntasks=48

# ensure you load the modules to run VASP, e.g., module load vasp
module load my_vasp_module
# please activate the required conda environment
conda activate my_environment
cd my_folder
# the following script needs to contain the workflow
python xyz.py
```
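
A minimal sketch of what `xyz.py` might contain (the structure file and maker settings
are assumptions; adapt them to your project):

```py
from atomate2.vasp.flows.lobster import VaspLobsterMaker
from jobflow import run_locally
from pymatgen.core import Structure

# Hypothetical input structure; replace with your own file.
structure = Structure.from_file("POSCAR")

# Build the combined VASP + LOBSTER flow.
flow = VaspLobsterMaker().make(structure)

# Execute all jobs sequentially within this job script, without a database.
run_locally(flow, create_folders=True)
```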

The `LOBSTER_CMD` now needs an additional export of the number of threads.

```yaml
VASP_CMD: <<VASP_CMD>>
LOBSTER_CMD: OMP_NUM_THREADS=48 <<LOBSTER_CMD>>
```
(modifying_input_sets)=
Modifying input sets
10 changes: 10 additions & 0 deletions docs/user/install.md
@@ -175,6 +175,16 @@ To install the packages run:
pip install atomate2
```

If you would like to use more specialized capabilities of `atomate2`, such as the phonon, LOBSTER, or force field workflows, you will need to run one of:

```bash
pip install atomate2[phonons]
pip install atomate2[lobster]
pip install atomate2[forcefields]
```

See [`pyproject.toml`](https://github.com/materialsproject/atomate2/blob/main/pyproject.toml) for all available optional dependency sets. More detailed instructions can be found under [dev installation](../dev/dev_install.md).

## Configure calculation output database

The next step is to configure your MongoDB database that will be used to store