From f74704df6f46b16300fef68a72479025a4962f0b Mon Sep 17 00:00:00 2001
From: Kin Long Kelvin Lee
Date: Mon, 30 Sep 2024 10:57:00 -0700
Subject: [PATCH 01/11] docs: added note on using overfit_batches

Signed-off-by: Kin Long Kelvin Lee
---
 docs/source/best-practices.rst | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/docs/source/best-practices.rst b/docs/source/best-practices.rst
index 194fab6d..3dd1c858 100644
--- a/docs/source/best-practices.rst
+++ b/docs/source/best-practices.rst
@@ -223,6 +223,20 @@ inspired by observations made in LLM training research, where the breakdown of
 assumptions in the convergent properties of ``Adam``-like optimizers causes large
 spikes in the training loss. This callback can help identify these occurrences.
 
+The ``devset``/``fast_dev_run`` approach detailed above is also useful for testing
+engineering/infrastructure (e.g. accelerator offload and logging), but not necessarily
+for probing training dynamics. Instead, we recommend using the ``overfit_batches``
+argument in ``pl.Trainer``:
+
+.. code-block:: python
+
+    import pytorch_lightning as pl
+
+    trainer = pl.Trainer(overfit_batches=100)
+
+
+This will disable shuffling in the training and validation splits (per the PyTorch Lightning
+documentation), and ensure that the same batches are reused every epoch.
+
 .. _e3nn documentation: https://docs.e3nn.org/en/latest/
 .. _IPEX installation: https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu

From e8a21aef212fa1786c0b6b3c72474c6d931ac919 Mon Sep 17 00:00:00 2001
From: Kin Long Kelvin Lee
Date: Mon, 30 Sep 2024 13:00:40 -0700
Subject: [PATCH 02/11] docs: added note on predict versus forward

Signed-off-by: Kin Long Kelvin Lee
---
 docs/source/inference.rst | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/docs/source/inference.rst b/docs/source/inference.rst
index 08032fcc..4b100d0e 100644
--- a/docs/source/inference.rst
+++ b/docs/source/inference.rst
@@ -4,6 +4,24 @@ Inference
 "Inference" can be a bit of an overloaded term, and this page is broken down into
 different possible downstream use cases for trained models.
 
+Task ``predict`` and ``forward`` methods
+----------------------------------------
+
+``matsciml`` tasks implement separate ``forward`` and ``predict`` methods. Both take a
+``BatchDict`` as input, and the latter wraps the former. The difference, however, is that
+``predict`` is intended primarily for inference: it will take care of reversing the
+normalization procedure (if normalization values were provided during training), *and*
+perhaps more importantly, will ensure that the exponential moving average weights are
+used instead of the training ones.
+
+In the special case of force prediction tasks (where forces are derivatives of the
+energy), you should only need to specify normalization ``kwargs`` for the energy: the
+scale value is taken automatically from the energy normalization and applied to the forces.
+
+In short, if you are writing functionality that requires unnormalized outputs (e.g. ``ase``
+calculators), please ensure you are using ``predict`` instead of ``forward`` directly.
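+
+A minimal usage sketch follows. Here ``task`` is assumed to be a trained ``matsciml``
+task instance and ``batch`` a ``BatchDict`` produced by one of the toolkit's
+datamodules; neither name is part of the API:
+
+.. code-block:: python
+
+    # denormalized outputs, computed with the exponential moving average weights
+    outputs = task.predict(batch)
+    # normalized (training-space) outputs from the current training weights;
+    # calling the task directly dispatches to ``forward``
+    raw_outputs = task(batch)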
+
+
 
 Parity plots and model evaluations
 ----------------------------------

From 896af346180b3f888d6bcdf915fef0f42fc03b69 Mon Sep 17 00:00:00 2001
From: Kin Long Kelvin Lee
Date: Mon, 30 Sep 2024 13:45:02 -0700
Subject: [PATCH 03/11] docs: adding training documentation

Signed-off-by: Kin Long Kelvin Lee
---
 docs/source/training.rst | 153 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 147 insertions(+), 6 deletions(-)

diff --git a/docs/source/training.rst b/docs/source/training.rst
index da4c6a7a..35375a3f 100644
--- a/docs/source/training.rst
+++ b/docs/source/training.rst
@@ -1,14 +1,155 @@
 Training pipeline
 =================
 
-Training with the Open MatSci ML Toolkit utilizes—for the most part—the
-PyTorch Lightning abstractions.
+Task abstraction
+================
+
+The Open MatSciML Toolkit uses PyTorch Lightning abstractions to manage the flow
+of training: how data from a datamodule is consumed, which loss terms are calculated,
+and what gets logged are all defined in a base task class. From start to finish, this
+module will take in the definition of an encoding architecture (through the ``encoder_class``
+and ``encoder_kwargs`` keyword arguments), construct it, and in concrete task implementations,
+initialize the respective output heads for a set of provided or task-specific target keys.
+The ``encoder_kwargs`` specification makes things a bit more verbose, but it ensures
+that the hyperparameters are saved appropriately per the ``save_hyperparameters`` method
+in PyTorch Lightning.
+
+
+``BaseTaskModule`` API reference
+--------------------------------
+
+.. autoclass:: matsciml.models.base.BaseTaskModule
+   :members:
+
+
+Multi-task reference
+--------------------------------
+
+One core piece of ``matsciml`` functionality is the ability to compose multiple tasks
+together, (almost) seamlessly relative to the single-task case.
+
+.. important::
+   The ``MultiTaskLitModule`` is not written in a particularly friendly way at
+   the moment, and may be subject to a significant refactor later!
+
+
+.. autoclass:: matsciml.models.base.MultiTaskLitModule
+   :members:
+
+
+``OutputHead`` API reference
+----------------------------
+
+While there is a singular ``OutputHead`` definition, the blocks that constitute
+an ``OutputHead`` can be specified depending on the type of model architecture
+being used. The default stack is based on simple ``nn.Linear`` layers; however,
+for architectures like MACE, which may depend on preserving irreducible representations,
+the ``IrrepOutputBlock`` allows users to specify transformations per-representation.
+
+.. autoclass:: matsciml.models.common.OutputHead
+   :members:
+
+
+.. autoclass:: matsciml.models.common.OutputBlock
+   :members:
+
+
+.. autoclass:: matsciml.models.common.IrrepOutputBlock
+   :members:
+
 
 Task API reference
 ##################
 
-.. autosummary::
-   :toctree: generated
-   :recursive:
+Scalar regression
+-----------------
+
+This task is primarily designed for property prediction and adjacent problems: you can
+predict an arbitrary number of properties (one per output head), based on a shared
+embedding (i.e. one structure maps to a single embedding, which is used by each head).
+
+A special case for using this class would be in tandem (as a multitask setup) with
+the :ref:`gradfree_force` task, which treats energy/force prediction as two
+separate output heads, albeit with the same shared embedding.
+
+Please use continuous-valued loss metrics (e.g. ``nn.MSELoss``) for this task.
+
+
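+A minimal construction sketch is shown below. The encoder class and its keyword
+arguments are illustrative placeholders rather than part of the API; substitute
+whichever ``matsciml`` encoder and hyperparameters you actually use (the argument
+names follow the single-task examples in the repository):
+
+.. code-block:: python
+
+   from matsciml.models.base import ScalarRegressionTask
+
+   task = ScalarRegressionTask(
+       encoder_class=MyEncoder,  # placeholder: any matsciml-compatible encoder class
+       encoder_kwargs={"hidden_dim": 128},  # saved via ``save_hyperparameters``
+       task_keys=["band_gap"],  # one output head is created per target key
+   )
+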
+.. autoclass:: matsciml.models.base.ScalarRegressionTask
+   :members:
+
+
+Binary classification
+-----------------------
+
+This task, as the name suggests, uses the embedding to perform one or more binary
+classifications with a shared embedding. This can be something like a ``stability``
+label, as in the Materials Project. Keep in mind, however, that a special class
+exists for crystal symmetry classification.
+
+.. _crystal_symmetry:
+
+Crystal symmetry classification
+-------------------------------
+
+This task is a specialized class for what is essentially multiclass classification,
+where, given an embedding, we predict which crystal space group the structure belongs
+to, using ``nn.CrossEntropyLoss``. This can be a good pretraining task.
+
+
+.. note::
+   This task expects that your data includes a ``spacegroup`` target key.
+
+.. autoclass:: matsciml.models.base.CrystalSymmetryClassificationTask
+   :members:
+
+
+Force regression task
+---------------------
+
+This task implements energy/force regression, where an ``OutputHead`` is used to first
+predict the energy, which is then differentiated with respect to the input coordinates
+to obtain the forces. From a developer perspective, this task is mechanically quite
+different due to the need for manual ``autograd``, which is not normally part of
+PyTorch Lightning workflows.
+
+
+.. note::
+   This task expects that your data includes a ``force`` target key.
+
+.. autoclass:: matsciml.models.base.ForceRegressionTask
+   :members:
+
+
+.. _gradfree_force:
+
+Gradient-free force regression task
+-----------------------------------
+
+This task implements force prediction, albeit as a direct output head property
+prediction as opposed to the derivative of an energy value using ``autograd``.
+
+.. note::
+   This task expects that your data includes a ``force`` target key.
+
+.. autoclass:: matsciml.models.base.GradFreeForceRegression
+   :members:
+
+
+Node denoising task
+-------------------
+
+This task implements a powerful and increasingly popular pre-training strategy
+for graph neural networks. The premise is quite simple: the encoder is trained as a
+denoising autoencoder, taking in a perturbed structure and attempting to predict the
+noise added to the 3D coordinates.
+
+This task requires the following data transform. You can specify the scale of the
+noise added to the positions; intuitively, the larger the scale, the more difficult
+the task becomes.
+
+.. autoclass:: matsciml.datasets.transforms.pretraining.NoisyPositions
+   :members:
+
 
-   matsciml.models.base
+.. autoclass:: matsciml.models.base.NodeDenoisingTask
+   :members:

From a6b9b2c1c7d258faeb39d6cad309783a95ffa166 Mon Sep 17 00:00:00 2001
From: Kin Long Kelvin Lee
Date: Mon, 30 Sep 2024 15:12:04 -0700
Subject: [PATCH 04/11] docs: added note on target normalization

Signed-off-by: Kin Long Kelvin Lee
---
 docs/source/best-practices.rst | 74 ++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/docs/source/best-practices.rst b/docs/source/best-practices.rst
index 3dd1c858..293d8ec6 100644
--- a/docs/source/best-practices.rst
+++ b/docs/source/best-practices.rst
@@ -181,6 +181,80 @@ the accelerator.
 
 Training
 --------
 
+Target normalization
+^^^^^^^^^^^^^^^^^^^^
+
+Tasks can be provided with ``normalize_kwargs``, which are key/value mappings
+that specify the mean and standard deviation of a target; an example is given below.
+
+.. code-block:: python
+
+   Task(
+       ...,
+       normalize_kwargs={
+           "energy_mean": 0.0,
+           "energy_std": 1.0,
+       }
+   )
+
+The example above will normalize ``energy`` labelsm and can be substituted with
+any target key of interest (e.g. ``force``, ``bandgap``, etc.).
+
+Target loss scaling
+^^^^^^^^^^^^^^^^^^^
+
+A common practice is to scale some target losses relative to others (e.g. force over
+energy). To specify this, you can pass a ``task_loss_scaling`` dictionary to
+any task module, which maps target keys to a floating point value that will be used
+to multiply the corresponding target loss value before summation and backpropagation.
+
+.. code-block:: python
+
+   Task(
+       ...,
+       task_loss_scaling={
+           "energy": 1.0,
+           "force": 10.0,
+       }
+   )
+
+
+A related but alternative way to specify target scaling is to apply a *schedule* to
+the training loss contributions: essentially, this provides a way to smoothly ramp
+different targets up (or down), allowing for more complex training curricula.
+To achieve this, you will need to use the ``LossScalingScheduler`` callback,
+
+.. autoclass:: matsciml.lightning.callbacks.LossScalingScheduler
+   :members:
+
+
+To specify this callback, you must pass subclasses of ``BaseScalingSchedule`` as arguments.
+Each schedule type implements the functional form of a schedule, and currently
+there are two concrete schedules;
+
+.. autoclass:: matsciml.lightning.loss_scaling.BaseScalingSchedule
+   :members:
+   :inherited-members:
+
+
+Composed together, an example would look like this
+
+.. code-block:: python
+
+   import pytorch_lightning as pl
+   from matsciml.lightning.callbacks import LossScalingScheduler
+   from matsciml.lightning.loss_scaling import LinearScalingSchedule
+
+   scheduler = LossScalingScheduler(
+       LinearScalingSchedule("energy", initial_value=1.0, end_value=5.0, step_frequency="epoch")
+   )
+   trainer = pl.Trainer(callbacks=[scheduler])
+
+
+The stepping schedule is determined during ``setup`` (as training begins), where the callback will
+inspect ``Trainer`` arguments to determine how many steps will be taken. The ``step_frequency``
+just specifies how often the scaling value is updated.
+
+
 Quick debugging
 ^^^^^^^^^^^^^^^

From ea4d23f0985775122346143c8a1a78cb7eb8db35 Mon Sep 17 00:00:00 2001
From: Kin Long Kelvin Lee
Date: Mon, 30 Sep 2024 15:20:04 -0700
Subject: [PATCH 05/11] docs: correcting class reference to GradFreeForceRegression

Signed-off-by: Kin Long Kelvin Lee
---
 docs/source/training.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/training.rst b/docs/source/training.rst
index 35375a3f..5b95057b 100644
--- a/docs/source/training.rst
+++ b/docs/source/training.rst
@@ -131,7 +131,7 @@ prediction as opposed to the derivative of an energy value using ``autograd``.
 
 .. note::
    This task expects that your data includes a ``force`` target key.
 
-.. autoclass:: matsciml.models.base.GradFreeForceRegression
+.. autoclass:: matsciml.models.base.GradFreeForceRegressionTask
    :members:

From aa49a96c7f2e1a0a5797a617a8e2923a1e6f8242 Mon Sep 17 00:00:00 2001
From: Kin Long Kelvin Lee
Date: Mon, 30 Sep 2024 15:58:29 -0700
Subject: [PATCH 06/11] docs: updating loss scaling docs

Signed-off-by: Kin Long Kelvin Lee
---
 docs/source/best-practices.rst | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/docs/source/best-practices.rst b/docs/source/best-practices.rst
index 293d8ec6..f80ab497 100644
--- a/docs/source/best-practices.rst
+++ b/docs/source/best-practices.rst
@@ -229,14 +229,7 @@ To achieve this, you will need to use the ``LossScalingScheduler`` callback,
 
 To specify this callback, you must pass subclasses of ``BaseScalingSchedule`` as arguments.
 Each schedule type implements the functional form of a schedule, and currently
-there are two concrete schedules;
-
-.. autoclass:: matsciml.lightning.loss_scaling.BaseScalingSchedule
-   :members:
-   :inherited-members:
-
-
-Composed together, an example would look like this
+there are two concrete schedules. Composed together, an example would look like this:
 
 .. code-block:: python
 
    import pytorch_lightning as pl
    from matsciml.lightning.callbacks import LossScalingScheduler
    from matsciml.lightning.loss_scaling import LinearScalingSchedule
 
    scheduler = LossScalingScheduler(
        LinearScalingSchedule("energy", initial_value=1.0, end_value=5.0, step_frequency="epoch")
    )
    trainer = pl.Trainer(callbacks=[scheduler])
 
 
 The stepping schedule is determined during ``setup`` (as training begins), where the callback will
 inspect ``Trainer`` arguments to determine how many steps will be taken. The ``step_frequency``
 just specifies how often the scaling value is updated.
 
 
+.. autoclass:: matsciml.lightning.loss_scaling.LinearScalingSchedule
+   :members:
+
+
+.. autoclass:: matsciml.lightning.loss_scaling.SigmoidScalingSchedule
+   :members:
+
+
 Quick debugging
 ^^^^^^^^^^^^^^^

From 0ceac7a442e15d197d54bfa0421ddb8ac7e9ea0f Mon Sep 17 00:00:00 2001
From: Kin Long Kelvin Lee
Date: Mon, 30 Sep 2024 16:01:43 -0700
Subject: [PATCH 07/11] docs: add missing binary classification API

Signed-off-by: Kin Long Kelvin Lee
---
 docs/source/training.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/source/training.rst b/docs/source/training.rst
index 5b95057b..ab469273 100644
--- a/docs/source/training.rst
+++ b/docs/source/training.rst
@@ -87,6 +87,9 @@ classifications with a shared embedding. This can be something like a ``stabilit
 label, as in the Materials Project. Keep in mind, however, that a special class
 exists for crystal symmetry classification.
 
+.. autoclass:: matsciml.models.base.BinaryClassificationTask
+   :members:
+
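+As with the scalar regression sketch earlier, a minimal construction might look like
+the following; the encoder and the ``is_stable`` target key are illustrative
+placeholders rather than part of the API:
+
+.. code-block:: python
+
+   from matsciml.models.base import BinaryClassificationTask
+
+   task = BinaryClassificationTask(
+       encoder_class=MyEncoder,  # placeholder: any matsciml-compatible encoder class
+       encoder_kwargs={"hidden_dim": 128},
+       task_keys=["is_stable"],  # one binary label is predicted per key
+   )
+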
 .. _crystal_symmetry:
 
 Crystal symmetry classification
 -------------------------------

From ecb2946f5dc5b7a3819ce7dedd16c691e5167139 Mon Sep 17 00:00:00 2001
From: Kin Long Kelvin Lee
Date: Mon, 30 Sep 2024 16:02:34 -0700
Subject: [PATCH 08/11] docs: removing erroneous header

Signed-off-by: Kin Long Kelvin Lee
---
 docs/source/training.rst | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/docs/source/training.rst b/docs/source/training.rst
index ab469273..edb962ae 100644
--- a/docs/source/training.rst
+++ b/docs/source/training.rst
@@ -1,6 +1,3 @@
-Training pipeline
-=================
-
 Task abstraction
 ================
 

From 590948c590604e71f9501647e6457932a7f418a7 Mon Sep 17 00:00:00 2001
From: Kin Long Kelvin Lee
Date: Mon, 30 Sep 2024 16:05:03 -0700
Subject: [PATCH 09/11] docs: removing erroneous task API reference

Signed-off-by: Kin Long Kelvin Lee
---
 docs/source/training.rst | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/docs/source/training.rst b/docs/source/training.rst
index edb962ae..339d8be6 100644
--- a/docs/source/training.rst
+++ b/docs/source/training.rst
@@ -55,9 +55,6 @@ the ``IrrepOutputBlock`` allows users to specify transformations per-representat
    :members:
 
 
-Task API reference
-##################
-
 Scalar regression
 -----------------

From 923164122cee903ace01d17db2d28f2b2b11d8c3 Mon Sep 17 00:00:00 2001
From: Kin Long Kelvin Lee
Date: Mon, 30 Sep 2024 16:32:34 -0700
Subject: [PATCH 10/11] docs: fixing labels typo

Signed-off-by: Kin Long Kelvin Lee
---
 docs/source/best-practices.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/best-practices.rst b/docs/source/best-practices.rst
index f80ab497..a619dedb 100644
--- a/docs/source/best-practices.rst
+++ b/docs/source/best-practices.rst
@@ -197,7 +197,7 @@ that specify the mean and standard deviation of a target; an example is given be
 
        }
    )
 
-The example above will normalize ``energy`` labelsm and can be substituted with
+The example above will normalize ``energy`` labels and can be substituted with
 any target key of interest (e.g. ``force``, ``bandgap``, etc.).

From 17d75825146bd8ebb9966184a80d0604a58ba2de Mon Sep 17 00:00:00 2001
From: Kin Long Kelvin Lee
Date: Mon, 30 Sep 2024 16:33:11 -0700
Subject: [PATCH 11/11] docs: removing statement about logging in fast_dev_run

Signed-off-by: Kin Long Kelvin Lee
---
 docs/source/best-practices.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/best-practices.rst b/docs/source/best-practices.rst
index a619dedb..4f0b5525 100644
--- a/docs/source/best-practices.rst
+++ b/docs/source/best-practices.rst
@@ -299,7 +299,7 @@ assumptions in the convergent properties of ``Adam``-like optimizers causes larg
 spikes in the training loss. This callback can help identify these occurrences.
 
 The ``devset``/``fast_dev_run`` approach detailed above is also useful for testing
-engineering/infrastructure (e.g. accelerator offload and logging), but not necessarily
+engineering/infrastructure (e.g. accelerator offload), but not necessarily
 for probing training dynamics. Instead, we recommend using the ``overfit_batches``
 argument in ``pl.Trainer``: