Update tutorial, use YAML options, set up linter (#2)
jwa7 authored Oct 4, 2024
1 parent b87b7b2 commit 8f09e41
Showing 101 changed files with 3,918 additions and 123,089 deletions.
4 changes: 3 additions & 1 deletion .gitignore
@@ -1,5 +1,7 @@
**__pycache__/
**.DS_Store
**.tox/
**.pytest_cache/
build/
rholearn.egg-info

@@ -9,7 +11,7 @@ rholearn.egg-info
example/rholearn-aims-tutorial/part-1-dft/data
example/rholearn-aims-tutorial/part-2-ml/checkpoint
example/rholearn-aims-tutorial/part-2-ml/evaluation
example/rholearn-aims-tutorial/part-2-ml/logs
example/rholearn-aims-tutorial/part-2-ml/outputs

tests/rholearn/generate_example_data/*/data
tests/rholearn/generate_example_data/*/processed_data
2 changes: 2 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,2 @@
include rholearn/options/*.yaml
exclude tox.ini
35 changes: 19 additions & 16 deletions README.md
@@ -43,22 +43,28 @@ Leveraging the speed- and memory-efficient operations of `torch`, and using buil

### Installing `rholearn`

With a working `conda` installation, install as follows:
With a working `conda` installation, first set up an environment:
```bash
git clone https://github.com/lab-cosmo/rholearn
cd rholearn
conda env create --file install/environment.yaml
conda create -n rho python==3.11
conda activate rho
./install/extra-pip-packages.sh
pip install .
```
Then clone and install `rholearn`:
```bash
git clone https://github.com/lab-cosmo/rholearn.git
cd rholearn
# Specify CPU-only torch
pip install --extra-index-url https://download.pytorch.org/whl/cpu .
```

Run a few (currently limited) tests on loss functions with: `pytest tests/rholearn/loss.py`
Running `tox` from the top directory will run linting and formatting.
To run some tests (currently limited to testing `rholearn.loss`), run `pytest tests/rholearn/loss.py`.

### Installing `FHI-aims`

To generate reference data using the `aims_interface` of `rholearn`, a working installation of **`FHI-aims >= 240926`** is required. FHI-aims is not open source but is free for academic use. Follow the instructions on their website [fhi-aims.org/get-the-code](https://fhi-aims.org/get-the-code/) to get and build the code. The end result should be an executable, compiled for your specific system.

There are also useful tutorials on the basics of running `FHI-aims` [here](https://fhi-aims-club.gitlab.io/tutorials/basics-of-running-fhi-aims/).


### Basic usage

@@ -67,19 +73,16 @@ User defined settings are specified in settings modules that are locally importe
Basic usage is as follows:

```python
# Specify user options "dft-options.yaml", "hpc-options.yaml", and "ml-options.yaml"
# ...
# then:
import rholearn

# User settings
from dft_settings import DFT_SETTINGS
from hpc_settings import HPC_SETTINGS
from ml_settings import ML_SETTINGS # training settings
from net import NET # custom NN architecture

# Train a model
rholearn.train(DFT_SETTINGS, ML_SETTINGS, NET)
rholearn.train()

# Evaluate
rholearn.eval(DFT_SETTINGS, ML_SETTINGS, HPC_SETTINGS)
rholearn.eval()
```

**Tutorial:** for a more in-depth walkthrough of the functionality, see this [tutorial](example/rholearn-aims-tutorial/) on data generation using `FHI-aims` and model training using `rholearn`.
**Tutorial:** for a more in-depth walkthrough of the functionality, see this [tutorial](example/rholearn-aims-tutorial/README.md) on data generation using `FHI-aims` and model training using `rholearn`.
19 changes: 19 additions & 0 deletions example/rholearn-aims-tutorial/README.md
@@ -0,0 +1,19 @@
# Tutorial: predicting electron densities with `rholearn` and `FHI-aims`

## Overview

This tutorial has two parts: 1) data generation with `FHI-aims` and 2) model training with `rholearn`. Follow the instructions in the README files in subdirectories [`part-1-dft`](part-1-dft/README.md) and [`part-2-ml`](part-2-ml/README.md). The data used is a 128-molecule subset of the QM7 database containing the atom types H, C, O, and N.

First, data is generated with `FHI-aims` in a two-step process: a) converging SCF calculations to compute the self-consistent electron density for each frame, then b) decomposing the electron density scalar field onto a fitted basis set.

Second, the reference data output from the first step, in the form of fitting coefficients, projections, and overlap matrices, form the dataset for training a machine learning model. In `rholearn`, arbitrary descriptor-based equivariant neural networks can be used to learn the mapping from nuclear coordinates to basis set expansion coefficients.
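
For reference, the quantities named above are related by the standard density-fitting (RI) expressions (textbook relations, not specific to this codebase):

$$\rho(\mathbf{r}) \approx \sum_i c_i \phi_i(\mathbf{r}), \qquad w_i = \int \phi_i(\mathbf{r}) \, \rho(\mathbf{r}) \, \mathrm{d}\mathbf{r}, \qquad S_{ij} = \int \phi_i(\mathbf{r}) \, \phi_j(\mathbf{r}) \, \mathrm{d}\mathbf{r},$$

where minimising the squared residual of the fit gives the coefficients as $\mathbf{c} = \mathbf{S}^{-1} \mathbf{w}$ - exactly the coefficients, projections, and overlap matrices listed above.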

Typically, the descriptor is an equivariant power spectrum (or $\lambda$-SOAP), which is passed through a linear layer or small multi-layer perceptron to transform it into a vector of predicted coefficients. A model is trained iteratively over a number of epochs, optimizing the NN weights by backpropagation and gradient descent.
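
To make the shape of this pipeline concrete, here is a minimal sketch in plain `torch` (not `rholearn`'s actual API; all sizes are hypothetical placeholders, and the equivariant block structure of a real $\lambda$-SOAP descriptor is ignored for brevity):

```python
# Minimal sketch: map a fixed-size descriptor vector to predicted
# basis-expansion coefficients with a small MLP.
import torch

n_features, n_coeffs = 256, 64  # hypothetical descriptor / coefficient sizes

mlp = torch.nn.Sequential(
    torch.nn.Linear(n_features, 128),
    torch.nn.SiLU(),
    torch.nn.Linear(128, n_coeffs),
)

descriptor = torch.randn(10, n_features)  # 10 dummy structures
target = torch.randn(10, n_coeffs)        # dummy reference coefficients

optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-3)
for _ in range(100):  # iterate over "epochs"
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(mlp(descriptor), target)
    loss.backward()   # backpropagation
    optimizer.step()  # gradient-descent update
```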

## Supporting notebooks

Some basic and optional extras for each section of each tutorial README are provided in Jupyter notebooks of the same name. These are intended to aid visualization and inspection of outputs.

## Setup

Follow the `rholearn` and `FHI-aims` installation instructions in the README of the main repository, [here](../../README.md).
86 changes: 48 additions & 38 deletions example/rholearn-aims-tutorial/part-1-dft/README.md
@@ -2,56 +2,57 @@

## 1.0: TLDR of required commands

After modifying the appropriate user-settings files, the commands needed to generate data for training a model are below. For a full explanation of each, read on to the following sections.
After modifying the user options in `dft-options.yaml` and `hpc-options.yaml`, the commands needed to generate data for training a model are below. For a full explanation of each, read on to the following sections.

```bash
# Modify dft_settings.py and hpc_settings.py as appropriate
# Modify dft-options.yaml and hpc-options.yaml as appropriate
# ...

# Run SCF
python -c 'from rholearn.aims_interface import scf; from dft_settings import DFT_SETTINGS; from hpc_settings import HPC_SETTINGS; scf.run_scf(DFT_SETTINGS, HPC_SETTINGS);'
python -c 'from rholearn.aims_interface import scf; scf.run_scf()'

# Process SCF
python -c 'from rholearn.aims_interface import scf; from dft_settings import DFT_SETTINGS; from hpc_settings import HPC_SETTINGS; scf.process_scf(DFT_SETTINGS, HPC_SETTINGS);'
python -c 'from rholearn.aims_interface import scf; scf.process_scf()'

# Setup RI
python -c 'from rholearn.aims_interface import ri_fit; from dft_settings import DFT_SETTINGS; from hpc_settings import HPC_SETTINGS; ri_fit.set_up_ri_fit_sbatch(DFT_SETTINGS, HPC_SETTINGS);'
python -c 'from rholearn.aims_interface import ri_fit; ri_fit.set_up_ri_fit_sbatch()'

# Run RI
python -c 'from rholearn.aims_interface import ri_fit; from dft_settings import DFT_SETTINGS; from hpc_settings import HPC_SETTINGS; ri_fit.run_ri_fit(DFT_SETTINGS, HPC_SETTINGS);'
python -c 'from rholearn.aims_interface import ri_fit; ri_fit.run_ri_fit()'

# Process RI
python -c 'from rholearn.aims_interface import ri_fit; from dft_settings import DFT_SETTINGS; from hpc_settings import HPC_SETTINGS; ri_fit.process_ri_fit(DFT_SETTINGS, HPC_SETTINGS)'
python -c 'from rholearn.aims_interface import ri_fit; ri_fit.process_ri_fit()'

# Optional: for a consistency check
python -c 'from rholearn.aims_interface import ri_rebuild; ri_rebuild.run_ri_rebuild()'
```

## 1.1: Specify DFT and HPC settings
## 1.1: Specify DFT and HPC options

Inspect the file `dft_settings.py` and edit the variables found there specific for your set up. `FRAME_IDXS` can be edited, though in the interest of brevity of the demonstration this can left alone.
Inspect the file `dft-options.yaml` and edit the variables found there as appropriate for your setup.

You can also inspect the default DFT settings, which can be printed with:
You can also inspect the default DFT options, which can be printed with:
```python
import pprint
from rholearn.settings.defaults import dft_defaults
from rholearn.options import get_defaults

pprint.pprint(dft_defaults.DFT_DEFAULTS)
pprint.pprint(get_defaults("dft"))
```
Any of these can be modified by specification in the local file `dft_settings.py`.
Any of these can be modified by specification in the local file `dft-options.yaml`.

**Note**: the settings in `hpc_settings.py` will also need to be changed, depending on your cluster. The way that `rholearn.aims_interface` creates run scripts for HPC resources has only been tested on a handful of clusters, all with slurm schedulers. It is certainly not general and may require some hacking if not compatible with your systems. The `"load_modules"` and `"export_vars"` attempt to allows generic loading of modules and exporting of environment variables, respectively, but something may be missing.
**Note**: the options in `hpc-options.yaml` will also need to be changed, depending on your cluster. The way that `rholearn.aims_interface` creates run scripts for HPC resources has only been tested on a handful of clusters, all with slurm schedulers. It is not completely general and may require some hacking if not compatible with your systems. The `"LOAD_MODULES"` and `"EXPORT_VARIABLES"` options attempt to allow generic loading of modules and exporting of environment variables, respectively, but something may be missing.
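
By analogy with the DFT defaults shown above, the HPC defaults can presumably be printed the same way (a sketch - it assumes `"hpc"` is an accepted key of `get_defaults`, which is not confirmed here):

```python
import pprint
from rholearn.options import get_defaults

# Assumption: "hpc" is a valid key, by analogy with get_defaults("dft")
pprint.pprint(get_defaults("hpc"))
```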

## 1.2: Converge SCF

Run the SCF procedure. This submits a parallel array of SCF calculations for each structure in the dataset.

```python
from rholearn.aims_interface import scf
from dft_settings import DFT_SETTINGS
from hpc_settings import HPC_SETTINGS

scf.run_scf(DFT_SETTINGS, HPC_SETTINGS)
scf.run_scf()

# Alternatively: a one-liner for the command line
python -c 'from rholearn.aims_interface import scf; from dft_settings import DFT_SETTINGS; from hpc_settings import HPC_SETTINGS; scf.run_scf(DFT_SETTINGS, HPC_SETTINGS);'
python -c 'from rholearn.aims_interface import scf; scf.run_scf();'
```
After the calculation has finished, the run directory for each structure contains the following files:

@@ -62,26 +63,25 @@ raw/ # Raw data directory
├── control.in # Input control file for FHI-aims SCF step
├── cube_001_total_density.cube # Cube file containing total electron density
├── D_spin_01_kpt_000001.csc # Density matrix restart file
├── dft_settings.py # Copy of python script with DFT settings
├── dft-options.yaml # Copy of DFT options
├── geometry.in # Input file with atomic coordinates and species
├── hpc-options.yaml # Copy of HPC options
└── slurm_*.out # Output file from SLURM job scheduler

└── 1/
...
```

The calculation has (hopefully) converged to the SCF solution for the given input settings, and saved the converged solution to the checkpoint density matrix file `D_*.csc`.
The calculation has (hopefully) converged to the SCF solution for the given input options, and saved the converged solution to the checkpoint density matrix file `D_*.csc`.

Now process the SCF outputs - this essentially just parses `aims.out` to extract various information and pickles it to the file `calc_info.pickle`.
```python
from rholearn.aims_interface import scf
from dft_settings import DFT_SETTINGS
from hpc_settings import HPC_SETTINGS

scf.process_scf(DFT_SETTINGS)
scf.process_scf()

# Alternatively: a one-liner for the command line
python -c 'from rholearn.aims_interface import scf; from dft_settings import DFT_SETTINGS; from hpc_settings import HPC_SETTINGS; scf.process_scf(DFT_SETTINGS, HPC_SETTINGS);'
python -c 'from rholearn.aims_interface import scf; scf.process_scf()'
```
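
For a quick sanity check of this step, the pickled output can be inspected directly (a sketch - the path below is a guess based on the directory layout shown earlier, and the pickle's contents are not documented here):

```python
# Sketch: inspect the parsed SCF information for one frame.
# "raw/0/calc_info.pickle" is an assumed path; adapt it to your run.
import pickle

with open("raw/0/calc_info.pickle", "rb") as f:
    calc_info = pickle.load(f)

# Print top-level keys if it is a dict, otherwise just its type
print(list(calc_info.keys()) if isinstance(calc_info, dict) else type(calc_info))
```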

In the supporting notebook [part-1-dft](./part-1-dft.ipynb), SCF convergence can be checked and reference SCF electron densities visualised.
@@ -97,25 +97,21 @@ Now RI fitting can be performed. In `FHI-aims`, the following steps are executed
First, **create the input files** for the RI calculation.
```python
from rholearn.aims_interface import ri_fit
from dft_settings import DFT_SETTINGS
from hpc_settings import HPC_SETTINGS

ri_fit.set_up_ri_fit_sbatch(DFT_SETTINGS, HPC_SETTINGS)
ri_fit.set_up_ri_fit_sbatch()

# Alternatively: a one-liner for the command line
python -c 'from rholearn.aims_interface import ri_fit; from dft_settings import DFT_SETTINGS; from hpc_settings import HPC_SETTINGS; ri_fit.set_up_ri_fit_sbatch(DFT_SETTINGS, HPC_SETTINGS);'
python -c 'from rholearn.aims_interface import ri_fit; ri_fit.set_up_ri_fit_sbatch()'
```

Next, **run the RI fitting** procedure.
```python
from rholearn.aims_interface import ri_fit
from dft_settings import DFT_SETTINGS
from hpc_settings import HPC_SETTINGS

ri_fit.run_ri_fit(DFT_SETTINGS, HPC_SETTINGS)
ri_fit.run_ri_fit()

# Alternatively: a one-liner for the command line
python -c 'from rholearn.aims_interface import ri_fit; from dft_settings import DFT_SETTINGS; from hpc_settings import HPC_SETTINGS; ri_fit.run_ri_fit(DFT_SETTINGS, HPC_SETTINGS);'
python -c 'from rholearn.aims_interface import ri_fit; ri_fit.run_ri_fit()'
```

After the calculation has completed, the directory structure looks like:
@@ -127,8 +123,9 @@ raw/ # Raw data directory
├── basis_info.out # The RI basis set definition
├── control.in # Input control file for FHI-aims RI step
├── D_*.csc # Symlink to the density matrix restart file
├── dft_settings.py # Copy of python script with DFT settings
├── dft-options.yaml # Copy of DFT options
├── geometry.in # Input file with atomic coordinates and species
├── hpc-options.yaml # Copy of HPC options
├── partition_tab.out # Output file with partitioning information
├── rho_rebuilt_ri.out # Reconstructed electron density from RI fitting
├── rho_scf.out # Electron density from SCF calculation
@@ -147,13 +144,11 @@ Finally, **process the RI outputs**.

```python
from rholearn.aims_interface import ri_fit
from dft_settings import DFT_SETTINGS
from hpc_settings import HPC_SETTINGS

ri_fit.process_ri_fit(DFT_SETTINGS, HPC_SETTINGS)
ri_fit.process_ri_fit()

# Alternatively: a one-liner for the command line
python -c 'from rholearn.aims_interface import ri_fit; from dft_settings import DFT_SETTINGS; from hpc_settings import HPC_SETTINGS; ri_fit.process_ri_fit(DFT_SETTINGS, HPC_SETTINGS)'
python -c 'from rholearn.aims_interface import ri_fit; ri_fit.process_ri_fit()'
```

This creates a set of subdirectories, one for each frame, containing the following processed data:
@@ -172,4 +167,19 @@ processed/ # Processed data directory
...
```

The processed data contained in `processed/`, along with the `.xyz` file in `data/`, will be used as the reference data to train a surrogate model in the next step.
The processed data contained in `processed/`, along with the `.xyz` file in `data/`, will be used as the reference data to train a surrogate model in the next step, the instructions for which can be found in [the next README](../part-2-ml/README.md).
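
For instance, the reference structures can be loaded for inspection using `ase` (a sketch - the filename `data/qm7.xyz` is a hypothetical placeholder for whatever `.xyz` file sits in `data/`):

```python
# Sketch: load and inspect the reference structures.
# "data/qm7.xyz" is an assumed filename; use the actual .xyz in data/.
import ase.io

frames = ase.io.read("data/qm7.xyz", ":")  # ":" reads all frames
print(len(frames), "frames; first:", frames[0].get_chemical_formula())
```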

## 1.4: [Optional] Check the rebuild consistency

This step isn't required to generate data, but can be used as a consistency check. One can take the vector of RI coefficients `ri_restart_coeffs.out` and perform an RI rebuild calculation in `FHI-aims`. The field constructed should be exactly equivalent to the field `rho_rebuilt_ri.out` output in the RI step above.

```python
from rholearn.aims_interface import ri_rebuild

ri_rebuild.run_ri_rebuild()

# Alternatively: a one-liner for the command line
python -c 'from rholearn.aims_interface import ri_rebuild; ri_rebuild.run_ri_rebuild()'
```

One can check this consistency in the [supporting notebook](part-1-dft.ipynb).
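
To quantify the agreement numerically, something like the following could be used (a sketch - it assumes both fields are written as plain-text numeric columns and that the rebuild run writes to a parallel directory, neither of which is confirmed by the layout above):

```python
# Sketch: compare the RI-fitted field with the rebuilt field for one frame.
# Both paths are assumptions; point them at your actual output files.
import numpy as np

field_ri = np.loadtxt("raw/0/rho_rebuilt_ri.out")
field_rebuild = np.loadtxt("rebuild/0/rho_rebuilt_ri.out")

print("max abs difference:", np.max(np.abs(field_ri - field_rebuild)))
```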