Update tutorial, use YAML options, set up linter (#2)
jwa7 authored Oct 4, 2024
1 parent b87b7b2 commit 8f09e41
Showing 101 changed files with 3,918 additions and 123,089 deletions.
4 changes: 3 additions & 1 deletion .gitignore
@@ -1,5 +1,7 @@
**__pycache__/
**.DS_Store
**.tox/
**.pytest_cache/
build/
rholearn.egg-info

@@ -9,7 +11,7 @@ rholearn.egg-info
example/rholearn-aims-tutorial/part-1-dft/data
example/rholearn-aims-tutorial/part-2-ml/checkpoint
example/rholearn-aims-tutorial/part-2-ml/evaluation
example/rholearn-aims-tutorial/part-2-ml/logs
example/rholearn-aims-tutorial/part-2-ml/outputs

tests/rholearn/generate_example_data/*/data
tests/rholearn/generate_example_data/*/processed_data
2 changes: 2 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,2 @@
include rholearn/options/*.yaml
exclude tox.ini
35 changes: 19 additions & 16 deletions README.md
@@ -43,22 +43,28 @@ Leveraging the speed- and memory-efficient operations of `torch`, and using buil

### Installing `rholearn`

With a working `conda` installation, install as follows:
With a working `conda` installation, first set up an environment:
```bash
git clone https://github.com/lab-cosmo/rholearn
cd rholearn
conda env create --file install/environment.yaml
conda create -n rho python==3.11
conda activate rho
./install/extra-pip-packages.sh
pip install .
```
Then clone and install `rholearn`:
```bash
git clone https://github.com/lab-cosmo/rholearn.git
cd rholearn
# Specify CPU-only torch
pip install --extra-index-url https://download.pytorch.org/whl/cpu .
```

Run a few (currently limited) tests on loss functions with: `pytest tests/rholearn/loss.py`
Running `tox` from the top directory will run linting and formatting.
To run some tests (currently limited to testing `rholearn.loss`), run `pytest tests/rholearn/loss.py`.

### Installing `FHI-aims`

To generate reference data using the `aims_interface` of `rholearn`, a working installation of **`FHI-aims >= 240926`** is required. FHI-aims is not open source but is free for academic use. Follow the instructions on their website [fhi-aims.org/get-the-code](https://fhi-aims.org/get-the-code/) to get and build the code. The end result should be an executable, compiled for your specific system.

There are also useful tutorials on the basics of running `FHI-aims` [here](https://fhi-aims-club.gitlab.io/tutorials/basics-of-running-fhi-aims/).


### Basic usage

@@ -67,19 +73,16 @@ User defined settings are specified in settings modules that are locally importe
Basic usage is as follows:

```python
# Specify user options "dft-options.yaml", "hpc-options.yaml", and "ml-options.yaml"
# ...
# then:
import rholearn

# User settings
from dft_settings import DFT_SETTINGS
from hpc_settings import HPC_SETTINGS
from ml_settings import ML_SETTINGS # training settings
from net import NET # custom NN architecture

# Train a model
rholearn.train(DFT_SETTINGS, ML_SETTINGS, NET)
rholearn.train()

# Evaluate
rholearn.eval(DFT_SETTINGS, ML_SETTINGS, HPC_SETTINGS)
rholearn.eval()
```

**Tutorial:** for a more in-depth walkthrough of the functionality, see this [tutorial](example/rholearn-aims-tutorial/) on data generation using `FHI-aims` and model training using `rholearn`.
**Tutorial:** for a more in-depth walkthrough of the functionality, see this [tutorial](example/rholearn-aims-tutorial/README.md) on data generation using `FHI-aims` and model training using `rholearn`.
19 changes: 19 additions & 0 deletions example/rholearn-aims-tutorial/README.md
@@ -0,0 +1,19 @@
# Tutorial: predicting electron densities with `rholearn` and `FHI-aims`

## Overview

This tutorial has two parts: 1) data generation with `FHI-aims` and 2) model training with `rholearn`. Follow the instructions in the README files in subdirectories [`part-1-dft`](part-1-dft/README.md) and [`part-2-ml`](part-2-ml/README.md). The data used is a 128-molecule subset of the QM7 database containing the atom types H, C, O, and N.

First, data is generated with `FHI-aims` in a two-step process: a) converging SCF calculations to compute the self-consistent electron density for each frame, then b) decomposing the electron density scalar field onto a fitted basis set.

Second, the reference data output from the first step, in the form of fitting coefficients, projections, and overlap matrices, form the dataset for training a machine learning model. In `rholearn`, arbitrary descriptor-based equivariant neural networks can be used to learn the mapping from nuclear coordinates to basis set expansion coefficients.
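
For reference, the quantities named above are related by the standard density-fitting (RI) expressions (textbook relations, not specific to this codebase):

$$\rho(\mathbf{r}) \approx \sum_i c_i \phi_i(\mathbf{r}), \qquad w_i = \int \phi_i(\mathbf{r}) \, \rho(\mathbf{r}) \, \mathrm{d}\mathbf{r}, \qquad S_{ij} = \int \phi_i(\mathbf{r}) \, \phi_j(\mathbf{r}) \, \mathrm{d}\mathbf{r},$$

where minimising the squared residual of the fit gives the coefficients as $\mathbf{c} = \mathbf{S}^{-1} \mathbf{w}$ - exactly the coefficients, projections, and overlap matrices listed above.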

Typically, the descriptor is an equivariant power spectrum (or $\lambda$-SOAP), which is passed through a linear layer or small multi-layer perceptron to transform it into a vector of predicted coefficients. A model is trained iteratively over a number of epochs, optimizing the NN weights by backpropagation and gradient descent.
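
To make the shape of this pipeline concrete, here is a minimal sketch in plain `torch` (not `rholearn`'s actual API; all sizes are hypothetical placeholders, and the equivariant block structure of a real $\lambda$-SOAP descriptor is ignored for brevity):

```python
# Minimal sketch: map a fixed-size descriptor vector to predicted
# basis-expansion coefficients with a small MLP.
import torch

n_features, n_coeffs = 256, 64  # hypothetical descriptor / coefficient sizes

mlp = torch.nn.Sequential(
    torch.nn.Linear(n_features, 128),
    torch.nn.SiLU(),
    torch.nn.Linear(128, n_coeffs),
)

descriptor = torch.randn(10, n_features)  # 10 dummy structures
target = torch.randn(10, n_coeffs)        # dummy reference coefficients

optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-3)
for _ in range(100):  # iterate over "epochs"
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(mlp(descriptor), target)
    loss.backward()   # backpropagation
    optimizer.step()  # gradient-descent update
```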

## Supporting notebooks

Some basic and optional extras for each section of each tutorial README are provided in Jupyter notebooks of the same name. These are intended to aid visualization and inspection of outputs.

## Setup

Follow the `rholearn` and `FHI-aims` installation instructions in the README of the main repository, [here](../../README.md).
86 changes: 48 additions & 38 deletions example/rholearn-aims-tutorial/part-1-dft/README.md
@@ -2,56 +2,57 @@

## 1.0: TLDR of required commands

After modifying the appropriate user-settings files, the commands needed to generate data for training a model are below. For a full explanation of each, read on to the following sections.
After modifying the user options in `dft-options.yaml` and `hpc-options.yaml`, the commands needed to generate data for training a model are below. For a full explanation of each, read on to the following sections.

```bash
# Modify dft_settings.py and hpc_settings.py as appropriate
# Modify dft-options.yaml and hpc-options.yaml as appropriate
# ...

# Run SCF
python -c 'from rholearn.aims_interface import scf; from dft_settings import DFT_SETTINGS; from hpc_settings import HPC_SETTINGS; scf.run_scf(DFT_SETTINGS, HPC_SETTINGS);'
python -c 'from rholearn.aims_interface import scf; scf.run_scf()'

# Process SCF
python -c 'from rholearn.aims_interface import scf; from dft_settings import DFT_SETTINGS; from hpc_settings import HPC_SETTINGS; scf.process_scf(DFT_SETTINGS, HPC_SETTINGS);'
python -c 'from rholearn.aims_interface import scf; scf.process_scf()'

# Setup RI
python -c 'from rholearn.aims_interface import ri_fit; from dft_settings import DFT_SETTINGS; from hpc_settings import HPC_SETTINGS; ri_fit.set_up_ri_fit_sbatch(DFT_SETTINGS, HPC_SETTINGS);'
python -c 'from rholearn.aims_interface import ri_fit; ri_fit.set_up_ri_fit_sbatch()'

# Run RI
python -c 'from rholearn.aims_interface import ri_fit; from dft_settings import DFT_SETTINGS; from hpc_settings import HPC_SETTINGS; ri_fit.run_ri_fit(DFT_SETTINGS, HPC_SETTINGS);'
python -c 'from rholearn.aims_interface import ri_fit; ri_fit.run_ri_fit()'

# Process RI
python -c 'from rholearn.aims_interface import ri_fit; from dft_settings import DFT_SETTINGS; from hpc_settings import HPC_SETTINGS; ri_fit.process_ri_fit(DFT_SETTINGS, HPC_SETTINGS)'
python -c 'from rholearn.aims_interface import ri_fit; ri_fit.process_ri_fit()'

# Optional: for a consistency check
python -c 'from rholearn.aims_interface import ri_rebuild; ri_rebuild.run_ri_rebuild()'
```

## 1.1: Specify DFT and HPC settings
## 1.1: Specify DFT and HPC options

Inspect the file `dft_settings.py` and edit the variables found there specific for your set up. `FRAME_IDXS` can be edited, though in the interest of brevity of the demonstration this can left alone.
Inspect the file `dft-options.yaml` and edit the variables found there as appropriate for your setup.

You can also inspect the default DFT settings, which can be printed with:
You can also inspect the default DFT options, which can be printed with:
```python
import pprint
from rholearn.settings.defaults import dft_defaults
from rholearn.options import get_defaults

pprint.pprint(dft_defaults.DFT_DEFAULTS)
pprint.pprint(get_defaults("dft"))
```
Any of these can be modified by specification in the local file `dft_settings.py`.
Any of these can be modified by specification in the local file `dft-options.yaml`.

**Note**: the settings in `hpc_settings.py` will also need to be changed, depending on your cluster. The way that `rholearn.aims_interface` creates run scripts for HPC resources has only been tested on a handful of clusters, all with slurm schedulers. It is certainly not general and may require some hacking if not compatible with your systems. The `"load_modules"` and `"export_vars"` attempt to allows generic loading of modules and exporting of environment variables, respectively, but something may be missing.
**Note**: the options in `hpc-options.yaml` will also need to be changed, depending on your cluster. The way that `rholearn.aims_interface` creates run scripts for HPC resources has only been tested on a handful of clusters, all with slurm schedulers. It is not completely general and may require some hacking if not compatible with your systems. The `"LOAD_MODULES"` and `"EXPORT_VARIABLES"` options attempt to allow generic loading of modules and exporting of environment variables, respectively, but something may be missing.
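
By analogy with the DFT defaults shown above, the HPC defaults can presumably be printed the same way (a sketch - it assumes `"hpc"` is an accepted key of `get_defaults`, which is not confirmed here):

```python
import pprint
from rholearn.options import get_defaults

# Assumption: "hpc" is a valid key, by analogy with get_defaults("dft")
pprint.pprint(get_defaults("hpc"))
```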

## 1.2: Converge SCF

Run the SCF procedure. This submits a parallel array of SCF calculations for each structure in the dataset.

```python
from rholearn.aims_interface import scf
from dft_settings import DFT_SETTINGS
from hpc_settings import HPC_SETTINGS

scf.run_scf(DFT_SETTINGS, HPC_SETTINGS)
scf.run_scf()

# Alternatively: a one-liner for the command line
python -c 'from rholearn.aims_interface import scf; from dft_settings import DFT_SETTINGS; from hpc_settings import HPC_SETTINGS; scf.run_scf(DFT_SETTINGS, HPC_SETTINGS);'
python -c 'from rholearn.aims_interface import scf; scf.run_scf();'
```
After the calculation has finished, the run directory for each structure contains the following files:

@@ -62,26 +63,25 @@ raw/ # Raw data directory
├── control.in # Input control file for FHI-aims SCF step
├── cube_001_total_density.cube # Cube file containing total electron density
├── D_spin_01_kpt_000001.csc # Density matrix restart file
├── dft_settings.py # Copy of python script with DFT settings
├── dft-options.yaml # Copy of DFT options
├── geometry.in # Input file with atomic coordinates and species
├── hpc-options.yaml # Copy of HPC options
└── slurm_*.out # Output file from SLURM job scheduler

└── 1/
...
```

The calculation has (hopefully) converged to the SCF solution for the given input settings, and saved the converged solution to the checkpoint density matrix file `D_*.csc`.
The calculation has (hopefully) converged to the SCF solution for the given input options, and saved the converged solution to the checkpoint density matrix file `D_*.csc`.

Now process the SCF outputs - this essentially just parses `aims.out` to extract various information and pickles it to the file `calc_info.pickle`.
```python
from rholearn.aims_interface import scf
from dft_settings import DFT_SETTINGS
from hpc_settings import HPC_SETTINGS

scf.process_scf(DFT_SETTINGS)
scf.process_scf()

# Alternatively: a one-liner for the command line
python -c 'from rholearn.aims_interface import scf; from dft_settings import DFT_SETTINGS; from hpc_settings import HPC_SETTINGS; scf.process_scf(DFT_SETTINGS, HPC_SETTINGS);'
python -c 'from rholearn.aims_interface import scf; scf.process_scf()'
```
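
For a quick sanity check of this step, the pickled output can be inspected directly (a sketch - the path below is a guess based on the directory layout shown earlier, and the pickle's contents are not documented here):

```python
# Sketch: inspect the parsed SCF information for one frame.
# "raw/0/calc_info.pickle" is an assumed path; adapt it to your run.
import pickle

with open("raw/0/calc_info.pickle", "rb") as f:
    calc_info = pickle.load(f)

# Print top-level keys if it is a dict, otherwise just its type
print(list(calc_info.keys()) if isinstance(calc_info, dict) else type(calc_info))
```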

In the supporting notebook [part-1-dft](./part-1-dft.ipynb), SCF convergence can be checked and reference SCF electron densities visualised.
@@ -97,25 +97,21 @@ Now RI fitting can be performed. In `FHI-aims`, the following steps are executed
First, **create the input files** for the RI calculation.
```python
from rholearn.aims_interface import ri_fit
from dft_settings import DFT_SETTINGS
from hpc_settings import HPC_SETTINGS

ri_fit.set_up_ri_fit_sbatch(DFT_SETTINGS, HPC_SETTINGS)
ri_fit.set_up_ri_fit_sbatch()

# Alternatively: a one-liner for the command line
python -c 'from rholearn.aims_interface import ri_fit; from dft_settings import DFT_SETTINGS; from hpc_settings import HPC_SETTINGS; ri_fit.set_up_ri_fit_sbatch(DFT_SETTINGS, HPC_SETTINGS);'
python -c 'from rholearn.aims_interface import ri_fit; ri_fit.set_up_ri_fit_sbatch()'
```

Next, **run the RI fitting** procedure.
```python
from rholearn.aims_interface import ri_fit
from dft_settings import DFT_SETTINGS
from hpc_settings import HPC_SETTINGS

ri_fit.run_ri_fit(DFT_SETTINGS, HPC_SETTINGS)
ri_fit.run_ri_fit()

# Alternatively: a one-liner for the command line
python -c 'from rholearn.aims_interface import ri_fit; from dft_settings import DFT_SETTINGS; from hpc_settings import HPC_SETTINGS; ri_fit.run_ri_fit(DFT_SETTINGS, HPC_SETTINGS);'
python -c 'from rholearn.aims_interface import ri_fit; ri_fit.run_ri_fit()'
```

After the calculation has completed, the directory structure looks like:
@@ -127,8 +123,9 @@ raw/ # Raw data directory
├── basis_info.out # The RI basis set definition
├── control.in # Input control file for FHI-aims RI step
├── D_*.csc # Symlink to the density matrix restart file
├── dft_settings.py # Copy of python script with DFT settings
├── dft-options.yaml # Copy of DFT options
├── geometry.in # Input file with atomic coordinates and species
├── hpc-options.yaml # Copy of HPC options
├── partition_tab.out # Output file with partitioning information
├── rho_rebuilt_ri.out # Reconstructed electron density from RI fitting
├── rho_scf.out # Electron density from SCF calculation
@@ -147,13 +144,11 @@ Finally, **process the RI outputs**.

```python
from rholearn.aims_interface import ri_fit
from dft_settings import DFT_SETTINGS
from hpc_settings import HPC_SETTINGS

ri_fit.process_ri_fit(DFT_SETTINGS, HPC_SETTINGS)
ri_fit.process_ri_fit()

# Alternatively: a one-liner for the command line
python -c 'from rholearn.aims_interface import ri_fit; from dft_settings import DFT_SETTINGS; from hpc_settings import HPC_SETTINGS; ri_fit.process_ri_fit(DFT_SETTINGS, HPC_SETTINGS)'
python -c 'from rholearn.aims_interface import ri_fit; ri_fit.process_ri_fit()'
```

This creates a set of subdirectories, one for each frame, containing the following processed data:
@@ -172,4 +167,19 @@ processed/ # Processed data directory
...
```

The processed data contained in `processed/`, along with the `.xyz` file in `data/`, will be used as the reference data to train a surrogate model in the next step.
The processed data contained in `processed/`, along with the `.xyz` file in `data/`, will be used as the reference data to train a surrogate model in the next step, the instructions for which can be found in [the next README](../part-2-ml/README.md).
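
For instance, the reference structures can be loaded for inspection using `ase` (a sketch - the filename `data/qm7.xyz` is a hypothetical placeholder for whatever `.xyz` file sits in `data/`):

```python
# Sketch: load and inspect the reference structures.
# "data/qm7.xyz" is an assumed filename; use the actual .xyz in data/.
import ase.io

frames = ase.io.read("data/qm7.xyz", ":")  # ":" reads all frames
print(len(frames), "frames; first:", frames[0].get_chemical_formula())
```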

## 1.4: [Optional] Check the rebuild consistency

This step isn't required to generate data, but can be used as a consistency check. One can take the vector of RI coefficients `ri_restart_coeffs.out` and perform an RI rebuild calculation in `FHI-aims`. The field constructed should be exactly equivalent to the field `rho_rebuilt_ri.out` output in the RI step above.

```python
from rholearn.aims_interface import ri_rebuild

ri_rebuild.run_ri_rebuild()

# Alternatively: a one-liner for the command line
python -c 'from rholearn.aims_interface import ri_rebuild; ri_rebuild.run_ri_rebuild()'
```

One can check this consistency in the [supporting notebook](part-1-dft.ipynb).
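
To quantify the agreement numerically, something like the following could be used (a sketch - it assumes both fields are written as plain-text numeric columns and that the rebuild run writes to a parallel directory, neither of which is confirmed by the layout above):

```python
# Sketch: compare the RI-fitted field with the rebuilt field for one frame.
# Both paths are assumptions; point them at your actual output files.
import numpy as np

field_ri = np.loadtxt("raw/0/rho_rebuilt_ri.out")
field_rebuild = np.loadtxt("rebuild/0/rho_rebuilt_ri.out")

print("max abs difference:", np.max(np.abs(field_ri - field_rebuild)))
```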