Ep/qmml #666

Status: Closed (wants to merge 57 commits)

Commits (57)
fb188e1
initial commit
epens94 Feb 7, 2024
384f778
commit includes all files for
epens94 Feb 7, 2024
09fb039
electron embed for painn
epens94 Feb 12, 2024
bb27705
add several evaluation scripts but later refactor and remove if neces…
epens94 Feb 13, 2024
c1cfd98
cleanup
epens94 Feb 13, 2024
6c524c9
clean up and comments added
epens94 Feb 19, 2024
4266da0
add docstring to electron configuration py
epens94 Mar 6, 2024
b433573
clean up gitignore
epens94 Mar 6, 2024
4d4cc5c
fixing docstring in electronic embedding
epens94 Mar 6, 2024
f8494fd
adding further description to electron configuration
epens94 Mar 6, 2024
2d23890
add docstring to electronic embedding fix unclear naming
epens94 Mar 6, 2024
cd06b83
revert Z back to 100
epens94 Mar 6, 2024
800c3b0
fix docstring nuclear embedding
epens94 Mar 6, 2024
aff25bf
fix naming in nuclear embedding
epens94 Mar 6, 2024
f4ca4ee
move ssp to activations module and add docstring
epens94 Mar 6, 2024
c465ce4
change order to be equal in args in nn embedding
epens94 Mar 6, 2024
2156399
clear naming of vars and remove redundant code
epens94 Mar 6, 2024
c86b404
move all embedding classes into one module and delete not needed modules
epens94 Mar 6, 2024
1ebad7a
fix of init
epens94 Mar 6, 2024
f99a432
activation ssp trainable implement,pass nuclear embedding directly
epens94 Mar 6, 2024
3a399fa
bugfix nuclear embedding
epens94 Mar 6, 2024
64b5d2e
missed one replace string activation function
epens94 Mar 7, 2024
9517bd2
missed one replace string activation function in elec embedding
epens94 Mar 7, 2024
c503f6b
fix docstring, problem with NaN in activation fn, write docstring mor…
epens94 Mar 7, 2024
ff969cf
update save model fn to work with wandb
epens94 Mar 7, 2024
68dcf26
add electronic embedding to so3 net and bugfix painn and schnet rep
epens94 Mar 12, 2024
16ea5ca
Merge pull request #6 from epens94/ep/electronicEmbeeding
epens94 Mar 23, 2024
ba6883c
added sampler.py file
jnsLs May 3, 2023
eb70cb8
stratified sampler works technically
jnsLs May 11, 2023
fcd5e2d
sampler is adapted to MOMONANO data
jnsLs May 11, 2023
916ccdd
updating qmml branch with master branch stuff
epens94 Sep 2, 2024
34c1fc1
Merge branch 'ep/qmml'
epens94 Sep 2, 2024
77571cb
add bernstein rbf to qmml branch
epens94 Sep 2, 2024
88c2064
adding adaptive loss fn module needed for qcml dataset
epens94 Sep 2, 2024
9706526
taken from branch https://github.com/atomistic-machine-learning/schne…
epens94 Sep 2, 2024
024d27d
just added for qcml dataset, because i wrote the dipole moment in wit…
epens94 Sep 2, 2024
9eafe7b
to use pretrained model weights
epens94 Sep 5, 2024
cea9927
simple callback to write out embeds
epens94 Sep 5, 2024
5097565
bugfix
epens94 Sep 5, 2024
0e51fbc
added for embedding analysis
epens94 Sep 6, 2024
5a57b74
add total charge and spin as key to rmd17 dataset, should be done for…
epens94 Sep 15, 2024
1bf6c56
adding needed stuff
epens94 Sep 20, 2024
f8f294e
splitting strategy logic implemented to filter out atomtype, visually…
epens94 Oct 7, 2024
92a795e
adding how many occurrences of atomtype in splitting to keep percentag…
epens94 Oct 7, 2024
edb1491
bugfix
epens94 Oct 7, 2024
5289b03
bugfix to correctly account for groups of atomtypes. Right now logic …
epens94 Oct 7, 2024
23e4285
clean up of adaptive loss fn
epens94 Oct 14, 2024
9f65452
rewrite of AtomTypeSplit
epens94 Oct 14, 2024
4732a44
updated configs to work out of the box with cli
epens94 Oct 14, 2024
99911e1
bugfix
epens94 Oct 15, 2024
be62272
include package data
epens94 Oct 15, 2024
74f9388
add package data
epens94 Oct 15, 2024
242c883
add first draft qcml dataclass
epens94 Oct 24, 2024
1e1f2bf
add qcml config
epens94 Oct 24, 2024
0f6fc8b
changes which came up during PR
epens94 Oct 24, 2024
ca4acd7
changes after PR review
epens94 Oct 24, 2024
fe4cdb1
misc
epens94 Oct 24, 2024
4 changes: 3 additions & 1 deletion .gitignore
@@ -125,4 +125,6 @@ interfaces/lammps/examples/*/*.dat
interfaces/lammps/examples/*/deployed_model

# batchwise optimizer examples
examples/howtos/howto_batchwise_relaxations_outputs/*
examples/howtos/howto_batchwise_relaxations_outputs/*
.vscode/launch.json
.vscode/*
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -0,0 +1 @@
include src/schnetpack/train/ressources/partition_spline_for_robust_loss.npz
Collaborator: What is this?

Collaborator (Author): From this file, values are loaded that are used to approximate the partition function for the adaptive loss function.
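
For context, a minimal sketch of the kind of loss this file supports. This assumes the module follows J. Barron, "A General and Adaptive Robust Loss Function" (CVPR 2019), whose negative log-likelihood requires the partition function Z(alpha); the spline stored in the .npz would approximate it so that alpha can be learned:

import torch

def general_robust_loss(x: torch.Tensor, alpha: float, c: float) -> torch.Tensor:
    # valid for alpha not in {0, 2}; those cases need separate limit forms
    b = abs(alpha - 2.0)
    return (b / alpha) * (((x / c) ** 2 / b + 1.0) ** (alpha / 2.0) - 1.0)

residuals = torch.randn(8)
loss = general_robust_loss(residuals, alpha=1.0, c=0.5).mean()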

3 changes: 3 additions & 0 deletions pyproject.toml
@@ -57,5 +57,8 @@ script-files = [
"src/scripts/spkdeploy",
]

# Ensure package data such as resources are included
package-data = { "schnetpack.train" = ["ressources/partition_spline_for_robust_loss.npz"] }
Collaborator: same here

Collaborator (Author): This is needed to include the file when building the package.


[tool.setuptools.dynamic]
version = {attr = "schnetpack.__version__"}
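
As a side note, a quick post-install check (not part of this PR) that the spline file actually ships with the built package, using only the standard library and assuming Python >= 3.9:

from importlib.resources import files

res = files("schnetpack.train") / "ressources" / "partition_spline_for_robust_loss.npz"  # the repo spells it "ressources"
print(res.is_file())  # True if MANIFEST.in and package-data are set up correctly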
12 changes: 12 additions & 0 deletions src/schnetpack/configs/data/qcml.yaml
@@ -0,0 +1,12 @@
defaults:
  - custom

_target_: schnetpack.datasets.QCML

datapath: ${run.data_dir}/qcml.db # data_dir is specified in train.yaml
batch_size: 50
num_train: 0.90
num_val: 0.05
load_properties: [formation_energy,forces,charge,multiplicity]
Collaborator: I don't think it is possible to pass a list like this. If I remember correctly, it should work like

load_properties:
    - formation_energy
    - forces
    - ...

Collaborator (Author): It is possible to pass a list like this.
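
Both notations are indeed equivalent YAML; a quick check with PyYAML, comparing the flow style used in the config against the block style suggested above:

import yaml

flow = yaml.safe_load("load_properties: [formation_energy,forces,charge,multiplicity]")
block = yaml.safe_load(
    "load_properties:\n"
    "  - formation_energy\n"
    "  - forces\n"
    "  - charge\n"
    "  - multiplicity\n"
)
assert flow == block  # flow style and block style yield identical data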

version: 0.0.3

101 changes: 101 additions & 0 deletions src/schnetpack/configs/experiment/qcml.yaml
@@ -0,0 +1,101 @@
# @package _global_

defaults:
  - override /data: qcml
  - override /model/representation: painn
  - override /model/representation/radial_basis: bernstein ### NEWLY ADDED FEATURE
  - override /task/scheduler: multistep ### NEWLY ADDED CONFIG

run:
  experiment: qcml

seed: 0

globals:
  cutoff: 10.
  lr: 1e-3
  energy_key: formation_energy
  forces_key: forces
  total_charge_key: charge ### NEWLY ADDED FEATURE
  spin_key: multiplicity ### NEWLY ADDED FEATURE

data:
  datapath: ???
  load_properties: [formation_energy,forces,charge,multiplicity]
  batch_size: 50
  num_train: 0.90
  num_val: 0.05
  num_workers: 4
  num_val_workers: 4
  distance_unit: Bohr
  property_units:
    energy: Hartree
    forces: Hartree/Bohr
  transforms:
    - _target_: schnetpack.transform.SubtractCenterOfMass
    - _target_: schnetpack.transform.RemoveOffsets ### NEWLY ADDED FEATURE
      property: ${globals.energy_key}
      remove_mean: True
    - _target_: schnetpack.transform.MatScipyNeighborList
      cutoff: ${globals.cutoff}
    - _target_: schnetpack.transform.CastTo32

model:
  representation:
    nuclear_embedding:
      _target_: schnetpack.nn.embedding.NuclearEmbedding
      max_z: 101
      num_features: ${globals.representation_features} # same as n_atom_basis
    electronic_embeddings: ### NEWLY ADDED FEATURE
      - _target_: schnetpack.nn.embedding.ElectronicEmbedding
        property_key: ${globals.total_charge_key}
        num_features: ${model.representation.n_atom_basis}
        is_charged: true
        num_residual: 1
      - _target_: schnetpack.nn.embedding.ElectronicEmbedding ### NEWLY ADDED FEATURE
        property_key: ${globals.spin_key}
        num_features: ${model.representation.n_atom_basis}
        is_charged: false
        num_residual: 1
  output_modules:
    - _target_: schnetpack.atomistic.Atomwise
      output_key: ${globals.energy_key}
      n_in: ${model.representation.n_atom_basis}
      aggregation_mode: sum
    - _target_: schnetpack.atomistic.Forces
      energy_key: ${globals.energy_key}
      force_key: ${globals.forces_key}
  postprocessors:
    - _target_: schnetpack.transform.CastTo64
    - _target_: schnetpack.transform.AddOffsets
      property: ${globals.energy_key}
      add_mean: True

task:
  scheduler_args:
    milestones: [3,9,15,18,24,30,36]
  outputs:
    - _target_: schnetpack.task.ModelOutput
      name: ${globals.energy_key}
      loss_fn:
        _target_: schnetpack.train.AdaptiveLossFunction ### NEWLY ADDED FEATURE
        num_dims: 1
      metrics:
        mae:
          _target_: torchmetrics.regression.MeanAbsoluteError
        rmse:
          _target_: torchmetrics.regression.MeanSquaredError
          squared: False
      loss_weight: 0.05
    - _target_: schnetpack.task.ModelOutput
      name: ${globals.forces_key}
      loss_fn:
        _target_: schnetpack.train.AdaptiveLossFunction ### NEWLY ADDED FEATURE
        num_dims: 3
      metrics:
        mae:
          _target_: torchmetrics.regression.MeanAbsoluteError
        rmse:
          _target_: torchmetrics.regression.MeanSquaredError
          squared: False
      loss_weight: 0.95
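
For readers of this config: a hedged sketch of what Hydra instantiates from the model.representation section above, assuming the constructor keywords match the config keys (the n_atom_basis value is illustrative):

from schnetpack.nn.embedding import NuclearEmbedding, ElectronicEmbedding

n_atom_basis = 128  # illustrative; resolved from model.representation.n_atom_basis

nuclear_embedding = NuclearEmbedding(max_z=101, num_features=n_atom_basis)
electronic_embeddings = [
    # embedding conditioned on the total molecular charge
    ElectronicEmbedding(
        property_key="charge",
        num_features=n_atom_basis,
        is_charged=True,
        num_residual=1,
    ),
    # embedding conditioned on the spin multiplicity
    ElectronicEmbedding(
        property_key="multiplicity",
        num_features=n_atom_basis,
        is_charged=False,
        num_residual=1,
    ),
]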
3 changes: 2 additions & 1 deletion src/schnetpack/configs/model/representation/painn.yaml
@@ -8,4 +8,5 @@ shared_interactions: False
shared_filters: False
cutoff_fn:
  _target_: schnetpack.nn.cutoff.CosineCutoff
  cutoff: ${globals.cutoff}
  cutoff: ${globals.cutoff}
nuclear_embedding: null
4 changes: 4 additions & 0 deletions src/schnetpack/configs/model/representation/radial_basis/bernstein.yaml
@@ -0,0 +1,4 @@
_target_: schnetpack.nn.radial.BernsteinRBF
n_rbf: 32
cutoff: ${globals.cutoff}
init_alpha: 0.95
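
Since BernsteinRBF is a new feature here, a hedged sketch of an exponential Bernstein basis in the spirit of PhysNet/SpookyNet; the actual schnetpack.nn.radial.BernsteinRBF may differ in detail, and init_alpha is assumed to set the exponential decay rate:

import numpy as np
from scipy.special import binom

def exp_bernstein_basis(r, n_rbf=32, alpha=0.95):
    # expand distances r (shape [...]) into n_rbf features (shape [..., n_rbf])
    k = np.arange(n_rbf)
    x = np.clip(np.exp(-alpha * np.asarray(r))[..., None], 1e-10, 1.0 - 1e-10)
    # b_k(r) = C(n-1, k) * x^k * (1 - x)^(n-1-k), evaluated in log space for stability
    logb = np.log(binom(n_rbf - 1, k)) + k * np.log(x) + (n_rbf - 1 - k) * np.log(1.0 - x)
    return np.exp(logb)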
7 changes: 7 additions & 0 deletions src/schnetpack/configs/task/scheduler/multistep.yaml
@@ -0,0 +1,7 @@
# @package task
scheduler_cls: torch.optim.lr_scheduler.MultiStepLR
scheduler_monitor: val_loss
scheduler_args:
  milestones: ???
  gamma: 0.5
  last_epoch: -1
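
For illustration (not part of the diff), the schedule this config produces: MultiStepLR multiplies the learning rate by gamma=0.5 at every milestone epoch.

import torch

opt = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=1e-3)
sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[3, 9, 15], gamma=0.5)
for epoch in range(18):
    opt.step()
    sched.step()
# lr: 1e-3 for epochs 0-2, 5e-4 for 3-8, 2.5e-4 for 9-14, 1.25e-4 afterwards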
2 changes: 1 addition & 1 deletion src/schnetpack/data/atoms.py
@@ -345,7 +345,7 @@ def _get_properties(
properties[structure.idx] = torch.tensor([idx])
for pname in load_properties:
properties[pname] = (
torch.tensor(row.data[pname].copy()) * self.conversions[pname]
torch.tensor(row.data[pname].copy()) * self.conversions[pname]
)

Z = row["numbers"].copy()
18 changes: 17 additions & 1 deletion src/schnetpack/data/datamodule.py
@@ -16,6 +16,7 @@
BaseAtomsData,
AtomsLoader,
calculate_stats,
estimate_atomrefs,
SplittingStrategy,
RandomSplit,
)
@@ -127,6 +128,7 @@ def __init__(
self.property_units = property_units
self.distance_unit = distance_unit
self._stats = {}
self._atomrefs = {}
self._is_setup = False
self.data_workdir = data_workdir
self.cleanup_workdir_stage = cleanup_workdir_stage
@@ -359,6 +361,20 @@ def get_stats(
self._stats[key] = stats
return stats

def get_atomrefs(
    self, property: str, is_extensive: bool
) -> Dict[str, torch.Tensor]:
    """Estimate atomic reference values (atomrefs) for a property from the
    training data; results are cached per (property, is_extensive) key."""
    key = (property, is_extensive)
    if key in self._atomrefs:
        return {property: self._atomrefs[key]}

    atomrefs = estimate_atomrefs(
        self.train_dataloader(),
        is_extensive={property: is_extensive},
    )[property]
    self._atomrefs[key] = atomrefs
    return {property: atomrefs}

@property
def train_dataset(self) -> BaseAtomsData:
return self._train_dataset
@@ -408,4 +424,4 @@ def test_dataloader(self) -> AtomsLoader:
num_workers=self.num_test_workers,
pin_memory=self._pin_memory,
)
return self._test_dataloader
return self._test_dataloader
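
A hedged usage sketch of the new get_atomrefs API (dm stands for an already set-up AtomsDataModule; the property name is taken from the qcml experiment config):

atomrefs = dm.get_atomrefs(property="formation_energy", is_extensive=True)
# -> {"formation_energy": per-element reference values estimated from the
#     training dataloader; repeated calls hit the (property, is_extensive) cache}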
87 changes: 86 additions & 1 deletion src/schnetpack/data/splitting.py
@@ -2,8 +2,9 @@
import math
import torch
import numpy as np
import scipy.sparse as sp

__all__ = ["SplittingStrategy", "RandomSplit", "SubsamplePartitions"]
__all__ = ["SplittingStrategy", "RandomSplit", "SubsamplePartitions","AtomTypeSplit"]


def absolute_split_sizes(dsize: int, split_sizes: List[int]) -> List[int]:
@@ -96,6 +97,90 @@ def split(self, dataset, *split_sizes) -> List[torch.tensor]:
return partition_sizes_idx


class AtomTypeSplit(SplittingStrategy):
    """
    Strategy that filters out one or more atom types from the database and then
    performs the split on the filtered dataset. The remaining dataset contains
    all molecules except those that include the atom type(s) to be filtered out.

    The atom-type counts are read from the dataset metadata. They should be
    stored as a sparse CSR array whose data, indices, indptr, and shape are
    provided under metadata keys of the form "atom_type_count_{data,indices,indptr,shape}".
    The derived filter array is binary: 1 means the atom type is present in a
    molecule, 0 means it is not.
    """

    def __init__(
        self,
        atomtypes: List[int],
        num_keep: Union[int, float] = None,
    ):
        """
        Args:
            atomtypes: list of atom types to be filtered out.
            num_keep: fraction (or absolute number) of the filtered-out
                molecules to keep anyway; for now the same value is applied to
                all atom types. Values below 1 are interpreted as a fraction,
                values of 1 and above as an absolute number; the conversion is
                done automatically.
        """
        self.atomtypes = atomtypes
        self.num_keep = num_keep

    def split(self, dataset, *split_sizes):
        # Binary array of shape N x Z, where N is the number of molecules and
        # Z is the number of atom types: 1 means the atom type is present in
        # the molecule, 0 means it is not. The counts can be computed with the
        # estimate_atomrefs code.
        atom_type_count = sp.csr_matrix(
            (
                dataset.conn.metadata["atom_type_count_data"],
                dataset.conn.metadata["atom_type_count_indices"],
                dataset.conn.metadata["atom_type_count_indptr"],
            ),
            shape=dataset.conn.metadata["atom_type_count_shape"],
        ).toarray()

        # mask that keeps all molecules without the requested atom types
        keep = (atom_type_count[:, self.atomtypes] == 0).all(axis=1)
        indices = np.where(keep)[0]
        # mask that excludes all molecules with the requested atom types
        exclude = ~keep
        exclude_indices = np.where(exclude)[0]
        # random order in which excluded molecules may be added back
        random_iter_indices = np.random.permutation(len(exclude_indices)).tolist()

        # Add the requested fraction or absolute number of excluded molecules
        # back to the kept set. Keeping is cumulative: e.g. with 3% num_keep in
        # a first run and 5% in a second run, the 3% of the first run are
        # contained in the 5% of the second run (given the same permutation).
        if self.num_keep:
            if self.num_keep < 1:
                num_keep = int(math.floor(self.num_keep * exclude_indices.shape[0]))
            else:
                num_keep = self.num_keep
            indices = np.concatenate(
                [indices, exclude_indices[random_iter_indices[:num_keep]]]
            )

        # split the filtered dataset
        partition_sizes_idx = self.random_split(np.array(indices), *split_sizes)
        return partition_sizes_idx

    def random_split(self, indices, *split_sizes: Union[int, float]) -> List[torch.tensor]:
        """
        Randomly split the given indices.

        Args:
            indices: indices of the filtered dataset to split.
            split_sizes: sizes for each split. One can be set to -1 to assign
                all remaining data. Values in [0, 1] can be used to give
                relative partition sizes.
        """
        dsize = len(indices)
        split_sizes = absolute_split_sizes(dsize, split_sizes)
        offsets = torch.cumsum(torch.tensor(split_sizes), dim=0)
        indices = indices[torch.randperm(len(indices)).tolist()].tolist()
        partition_sizes_idx = [
            indices[offset - length : offset]
            for offset, length in zip(offsets, split_sizes)
        ]
        return partition_sizes_idx
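
A hedged usage sketch (values are illustrative): hold out all fluorine-containing molecules (Z = 9), add 3% of them back, then split 90/5/5. Here dataset is assumed to be a BaseAtomsData whose metadata carries the atom_type_count_* arrays:

import numpy as np

np.random.seed(0)  # makes the "cumulative keeping" across runs reproducible
strategy = AtomTypeSplit(atomtypes=[9], num_keep=0.03)
train_idx, val_idx, test_idx = strategy.split(dataset, 0.9, 0.05, 0.05)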


class SubsamplePartitions(SplittingStrategy):
"""
Strategy that splits the atoms dataset into predefined partitions as defined in the