diff --git a/docs/source/best-practices.rst b/docs/source/best-practices.rst
index 194fab6d..4f0b5525 100644
--- a/docs/source/best-practices.rst
+++ b/docs/source/best-practices.rst
@@ -181,6 +181,81 @@ the accelerator.
 Training
 --------
 
+Target normalization
+^^^^^^^^^^^^^^^^^^^^
+
+Tasks can be provided with ``normalize_kwargs``, which are key/value mappings
+that specify the mean and standard deviation of a target; an example is given below.
+
+.. code-block:: python
+
+    Task(
+        ...,
+        normalize_kwargs={
+            "energy_mean": 0.0,
+            "energy_std": 1.0,
+        }
+    )
+
+The example above will normalize ``energy`` labels; ``energy`` can be substituted with
+any target key of interest (e.g. ``force``, ``bandgap``, etc.).
+
+Target loss scaling
+^^^^^^^^^^^^^^^^^^^
+
+A common practice is to scale some target losses relative to others (e.g. force over
+energy). To specify this, you can pass a ``task_loss_scaling`` dictionary to any task
+module, which maps target keys to floating point values that are used to multiply the
+corresponding loss terms before summation and backpropagation.
+
+.. code-block:: python
+
+    Task(
+        ...,
+        task_loss_scaling={
+            "energy": 1.0,
+            "force": 10.0
+        }
+    )
+
+
+A related, but alternative, way to specify target scaling is to apply a *schedule* to
+the training loss contributions: this provides a way to smoothly ramp different targets
+up (or down), allowing for more complex training curricula.
+To achieve this, you will need to use the ``LossScalingScheduler`` callback.
+
+.. autoclass:: matsciml.lightning.callbacks.LossScalingScheduler
+   :members:
+
+
+To configure this callback, you pass instances of ``BaseScalingSchedule`` subclasses as
+arguments. Each schedule type implements the functional form of a schedule, and there are
+currently two concrete schedules. Composed together, an example would look like this:
+
+.. code-block:: python
+
+    import pytorch_lightning as pl
+    from matsciml.lightning.callbacks import LossScalingScheduler
+    from matsciml.lightning.loss_scaling import LinearScalingSchedule
+
+    scheduler = LossScalingScheduler(
+        LinearScalingSchedule("energy", initial_value=1.0, end_value=5.0, step_frequency="epoch")
+    )
+    trainer = pl.Trainer(callbacks=[scheduler])
+
+
+The stepping schedule is determined during ``setup`` (as training begins), where the callback
+inspects the ``Trainer`` arguments to determine how many steps will be taken. The
+``step_frequency`` argument simply specifies how often the scaling value is updated.
+
+
+.. autoclass:: matsciml.lightning.loss_scaling.LinearScalingSchedule
+   :members:
+
+
+.. autoclass:: matsciml.lightning.loss_scaling.SigmoidScalingSchedule
+   :members:
+
+
 Quick debugging
 ^^^^^^^^^^^^^^^
 
@@ -223,6 +298,20 @@ inspired by observations made in LLM training research, where the breakdown of
 assumptions in the convergent properties of ``Adam``-like optimizers causes large
 spikes in the training loss. This callback can help identify these occurrences.
 
+The ``devset``/``fast_dev_run`` approach detailed above is also useful for testing
+engineering/infrastructure (e.g. accelerator offload), but not necessarily
+for probing training dynamics. Instead, we recommend using the ``overfit_batches``
+argument in ``pl.Trainer``:
+
+.. code-block:: python
+
+    import pytorch_lightning as pl
+
+    trainer = pl.Trainer(overfit_batches=100)
+
+
+This will disable shuffling in the training and validation splits (per the PyTorch Lightning
+documentation), and ensures that the same batches are reused every epoch.
+
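+Putting the pieces from this section together, a minimal sanity-check sketch might look like
+the following. It follows the same ``Task( ... )`` placeholder convention as the snippets
+above, so the concrete task, encoder, and datamodule arguments are elided and should be
+replaced with your own choices; the loss scaling schedule is shown here on the ``energy``
+target, while the static ``task_loss_scaling`` weights the ``force`` target.
+
+.. code-block:: python
+
+    import pytorch_lightning as pl
+
+    from matsciml.lightning.callbacks import LossScalingScheduler
+    from matsciml.lightning.loss_scaling import LinearScalingSchedule
+
+    # task with normalized energy labels and a larger static weight on the force loss
+    task = Task(
+        ...,
+        normalize_kwargs={"energy_mean": 0.0, "energy_std": 1.0},
+        task_loss_scaling={"force": 10.0},
+    )
+    # smoothly ramp the energy loss contribution from 1.0 to 5.0 over training
+    scheduler = LossScalingScheduler(
+        LinearScalingSchedule("energy", initial_value=1.0, end_value=5.0, step_frequency="epoch")
+    )
+    # reuse the same 100 batches every epoch to probe training dynamics
+    trainer = pl.Trainer(overfit_batches=100, callbacks=[scheduler])
+    trainer.fit(task, datamodule=...)
+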
 
 .. _e3nn documentation: https://docs.e3nn.org/en/latest/
 .. _IPEX installation: https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu
diff --git a/docs/source/inference.rst b/docs/source/inference.rst
index 08032fcc..4b100d0e 100644
--- a/docs/source/inference.rst
+++ b/docs/source/inference.rst
@@ -4,6 +4,24 @@ Inference
 
 "Inference" can be a bit of an overloaded term, and this page is broken down
 into different possible downstream use cases for trained models.
 
+Task ``predict`` and ``forward`` methods
+----------------------------------------
+
+``matsciml`` tasks implement separate ``forward`` and ``predict`` methods. Both take a
+``BatchDict`` as input, and the latter wraps the former. The difference is that ``predict``
+is intended for inference: it takes care of reversing the normalization procedure (if
+normalization values were provided during training) and, perhaps more importantly, ensures
+that the exponential moving average weights are used instead of the training weights.
+
+In the special case of force prediction tasks (where forces are computed as a derivative of
+the energy), you should only need to specify normalization ``kwargs`` for the energy: the
+scale value is taken automatically from the energy normalization and applied to the forces.
+
+In short, if you are writing functionality that requires unnormalized outputs (e.g. ``ase``
+calculators), please ensure you are using ``predict`` rather than calling ``forward`` directly.
+
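+As a minimal sketch of the difference, assuming ``task`` is a trained task instance (e.g.
+restored from a checkpoint) and ``batch`` is a ``BatchDict`` produced by a ``matsciml``
+data pipeline:
+
+.. code-block:: python
+
+    # training-style call: outputs are in the normalized (training) space
+    outputs = task(batch)  # equivalent to task.forward(batch)
+
+    # inference call: uses the exponential moving average weights and
+    # reverses any normalization configured during training
+    predictions = task.predict(batch)
+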
+
 Parity plots and model evaluations
 ----------------------------------
diff --git a/docs/source/training.rst b/docs/source/training.rst
index da4c6a7a..339d8be6 100644
--- a/docs/source/training.rst
+++ b/docs/source/training.rst
@@ -1,14 +1,152 @@
-Training pipeline
-=================
+Task abstraction
+================
 
-Training with the Open MatSci ML Toolkit utilizes—for the most part—the
-PyTorch Lightning abstractions.
+The Open MatSciML Toolkit uses PyTorch Lightning abstractions to manage the flow of
+training: how data from a datamodule is mapped to model outputs, which loss terms are
+calculated, and what gets logged are all defined in a base task class. From start to
+finish, this base class takes in the definition of an encoding architecture (through the
+``encoder_class`` and ``encoder_kwargs`` keyword arguments), constructs it, and, in
+concrete task implementations, initializes the respective output heads for a set of
+provided or task-specific target keys. The ``encoder_kwargs`` specification makes things
+a bit more verbose, but it ensures that the hyperparameters are saved appropriately per
+the ``save_hyperparameters`` method in PyTorch Lightning.
 
-Task API reference
-##################
-
-.. autosummary::
-   :toctree: generated
-   :recursive:
+``BaseTaskModule`` API reference
+--------------------------------
 
-   matsciml.models.base
+.. autoclass:: matsciml.models.base.BaseTaskModule
+   :members:
+
+
+Multi-task reference
+--------------------
+
+One core piece of functionality in ``matsciml`` is the ability to compose multiple tasks
+together, (almost) seamlessly from the single-task case.
+
+.. important::
+   The ``MultiTaskLitModule`` is not written in a particularly friendly way at
+   the moment, and may be subject to a significant refactor later!
+
+
+.. autoclass:: matsciml.models.base.MultiTaskLitModule
+   :members:
+
+
+``OutputHead`` API reference
+----------------------------
+
+While there is a singular ``OutputHead`` definition, the blocks that constitute an
+``OutputHead`` can be specified depending on the type of model architecture being used.
+The default stack is based on simple ``nn.Linear`` layers; however, for architectures like
+MACE, which may depend on preserving irreducible representations, the ``IrrepOutputBlock``
+allows users to specify transformations per representation.
+
+.. autoclass:: matsciml.models.common.OutputHead
+   :members:
+
+
+.. autoclass:: matsciml.models.common.OutputBlock
+   :members:
+
+
+.. autoclass:: matsciml.models.common.IrrepOutputBlock
+   :members:
+
+
+Scalar regression
+-----------------
+
+This task is primarily designed for problems adjacent to property prediction: you can
+predict an arbitrary number of properties (one per output head) based on a shared
+embedding (i.e. one structure maps to a single embedding, which is used by each head).
+
+A special use case for this class is in tandem (as a multi-task setup) with the
+:ref:`gradfree_force`, which treats energy/force prediction as two separate output heads,
+albeit with the same shared embedding.
+
+Please use continuous-valued loss functions (e.g. ``nn.MSELoss``) for this task.
+
+
+.. autoclass:: matsciml.models.base.ScalarRegressionTask
+   :members:
+
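+
+As an illustrative sketch of constructing this task (following the ``encoder_class`` and
+``encoder_kwargs`` pattern described at the top of this page): ``MyEncoder`` and its
+``hidden_dim`` keyword are hypothetical stand-ins for whichever ``matsciml`` encoder you
+use, and the ``task_keys`` argument name and ``band_gap`` target key are assumptions to
+be replaced with your own targets.
+
+.. code-block:: python
+
+    from matsciml.models.base import ScalarRegressionTask
+
+    # the task constructs the encoder from encoder_class/encoder_kwargs and
+    # initializes one output head per requested target key
+    task = ScalarRegressionTask(
+        encoder_class=MyEncoder,
+        encoder_kwargs={"hidden_dim": 128},
+        task_keys=["band_gap"],
+    )
+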
+
+Binary classification
+---------------------
+
+This task, as the name suggests, uses a shared embedding to perform one or more binary
+classifications. This can be something like the ``stability`` label in the Materials
+Project. Keep in mind, however, that a special class exists for crystal symmetry
+classification (see :ref:`crystal_symmetry`).
+
+.. autoclass:: matsciml.models.base.BinaryClassificationTask
+   :members:
+
+.. _crystal_symmetry:
+
+Crystal symmetry classification
+-------------------------------
+
+This task is a specialized class for what is essentially multiclass classification: given
+an embedding, we predict which crystal space group the structure belongs to, using
+``nn.CrossEntropyLoss``. This can be a good pretraining task.
+
+
+.. note::
+   This task expects that your data includes a ``spacegroup`` target key.
+
+.. autoclass:: matsciml.models.base.CrystalSymmetryClassificationTask
+   :members:
+
+
+Force regression task
+---------------------
+
+This task implements energy/force regression, where an ``OutputHead`` first predicts the
+energy, and the forces are then obtained by taking its derivative with respect to the
+input coordinates. From a developer perspective, this task is mechanically quite different
+due to the need for manual ``autograd``, which is not normally required in PyTorch
+Lightning workflows.
+
+
+.. note::
+   This task expects that your data includes a ``force`` target key.
+
+.. autoclass:: matsciml.models.base.ForceRegressionTask
+   :members:
+
+
+.. _gradfree_force:
+
+Gradient-free force regression task
+-----------------------------------
+
+This task implements force prediction as a direct output head property, as opposed to
+taking the derivative of an energy value with ``autograd``.
+
+.. note::
+   This task expects that your data includes a ``force`` target key.
+
+.. autoclass:: matsciml.models.base.GradFreeForceRegressionTask
+   :members:
+
+
+Node denoising task
+-------------------
+
+This task implements a powerful and increasingly popular pre-training strategy for graph
+neural networks. The premise is quite simple: the encoder learns as a denoising
+autoencoder, taking in a perturbed structure and attempting to predict the amount of noise
+added to the 3D coordinates.
+
+This task requires the following data transform. You are able to specify the scale of the
+noise added to the positions; intuitively, the larger the scale, the more difficult the
+task becomes.
+
+.. autoclass:: matsciml.datasets.transforms.pretraining.NoisyPositions
+   :members:
+
+
+.. autoclass:: matsciml.models.base.NodeDenoisingTask
+   :members:
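+
+
+As a brief sketch of how the pieces fit together: the ``scale`` keyword name and the
+elided ``encoder_class``/``encoder_kwargs`` placeholders below are assumptions, so please
+check the class signatures above for the exact parameters.
+
+.. code-block:: python
+
+    from matsciml.datasets.transforms.pretraining import NoisyPositions
+    from matsciml.models.base import NodeDenoisingTask
+
+    # add noise to the 3D coordinates; larger scales make the denoising task harder
+    noise_transform = NoisyPositions(scale=0.05)
+
+    # the transform is then included in your dataset/datamodule transform list, and the
+    # task is constructed like any other (encoder arguments elided here)
+    task = NodeDenoisingTask(
+        encoder_class=...,
+        encoder_kwargs={...},
+    )
+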