Skip to content

Commit

Permalink
Merge branch 'develop' into pre-training-check
Browse files Browse the repository at this point in the history
  • Loading branch information
HCookie authored Nov 14, 2024
2 parents c5ce9ba + 76d3ef6 commit 7b9d359
Show file tree
Hide file tree
Showing 44 changed files with 863 additions and 319 deletions.
14 changes: 4 additions & 10 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ repos:
- id: python-check-blanket-noqa # Check for # noqa: all
- id: python-no-log-warn # Check for log.warn
- repo: https://github.com/psf/black-pre-commit-mirror
rev: 24.8.0
rev: 24.10.0
hooks:
- id: black
args: [--line-length=120]
Expand All @@ -40,16 +40,15 @@ repos:
- --force-single-line-imports
- --profile black
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.6.9
rev: v0.7.2
hooks:
- id: ruff
# Next line if for documenation cod snippets
exclude: '^[^_].*_\.py$'
args:
- --line-length=120
- --fix
- --exit-non-zero-on-fix
- --preview
- --exclude=docs/**/*_.py
- repo: https://github.com/sphinx-contrib/sphinx-lint
rev: v1.0.0
hooks:
Expand All @@ -60,13 +59,8 @@ repos:
hooks:
- id: rstfmt
exclude: 'cli/.*' # Because we use argparse
- repo: https://github.com/b8raoult/pre-commit-docconvert
rev: "0.1.5"
hooks:
- id: docconvert
args: ["numpy"]
- repo: https://github.com/tox-dev/pyproject-fmt
rev: "2.2.4"
rev: "v2.5.0"
hooks:
- id: pyproject-fmt
- repo: https://github.com/jshwi/docsig # Check docstrings against function sig
Expand Down
12 changes: 12 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,21 +11,30 @@ Keep it human-readable, your future self will thank you!
## [Unreleased](https://github.com/ecmwf/anemoi-training/compare/0.2.2...HEAD)

### Fixed
- Rename loss_scaling to variable_loss_scaling [#138](https://github.com/ecmwf/anemoi-training/pull/138)
- Refactored callbacks. [#60](https://github.com/ecmwf/anemoi-training/pulls/60)
- Updated docs [#115](https://github.com/ecmwf/anemoi-training/pull/115)
- Fix enabling LearningRateMonitor [#119](https://github.com/ecmwf/anemoi-training/pull/119)
- Refactored rollout [#87](https://github.com/ecmwf/anemoi-training/pulls/87)
- Enable longer validation rollout than training
- Expand iterables in logging [#91](https://github.com/ecmwf/anemoi-training/pull/91)
- Save entire config in mlflow
### Added
- Included more loss functions and allowed configuration [#70](https://github.com/ecmwf/anemoi-training/pull/70)
- Fix that applies the metric_ranges in the post-processed variable space [#116](https://github.com/ecmwf/anemoi-training/pull/116)
- Allow updates to scalars [#137](https://github.com/ecmwf/anemoi-training/pulls/137)
- Add without subsetting in ScaleTensor
- Sub-hour datasets [#63](https://github.com/ecmwf/anemoi-training/pull/63)
- Add synchronisation workflow [#92](https://github.com/ecmwf/anemoi-training/pull/92)
- Feat: Anemoi Profiler compatible with mlflow and using Pytorch (Kineto) Profiler for memory report [38](https://github.com/ecmwf/anemoi-training/pull/38/)
- Added a check for the variable sorting on pre-trained/finetuned models [#120](https://github.com/ecmwf/anemoi-training/pull/120)
- New limited area config file added, limited_area.yaml. [#134](https://github.com/ecmwf/anemoi-training/pull/134/)
- New stretched grid config added, stretched_grid.yaml [#133](https://github.com/ecmwf/anemoi-training/pull/133)

### Changed
- Renamed frequency keys in callbacks configuration. [#118](https://github.com/ecmwf/anemoi-training/pull/118)
- Modified training configuration to support max_steps and tied lr iterations to max_steps by default [#67](https://github.com/ecmwf/anemoi-training/pull/67)
- Merged node & edge trainable feature callbacks into one. [#135](https://github.com/ecmwf/anemoi-training/pull/135)

### Removed
- Removed the resolution config entry [#120](https://github.com/ecmwf/anemoi-training/pull/120)
Expand All @@ -49,6 +58,8 @@ Keep it human-readable, your future self will thank you!
- Feature: New `Boolean1DMask` class. Enables rollout training for limited area models. [#79](https://github.com/ecmwf/anemoi-training/pulls/79)

### Fixed

- Fix pre-commit regex
- Mlflow-sync to handle creation of new experiments in the remote server [#83] (https://github.com/ecmwf/anemoi-training/pull/83)
- Fix for multi-gpu when using mlflow due to refactoring of _get_mlflow_run_params function [#99] (https://github.com/ecmwf/anemoi-training/pull/99)
- ci: fix pyshtools install error (#100) https://github.com/ecmwf/anemoi-training/pull/100
Expand Down Expand Up @@ -104,6 +115,7 @@ Keep it human-readable, your future self will thank you!
- Updated configuration examples in documentation and corrected links - [#46](https://github.com/ecmwf/anemoi-training/pull/46)
- Remove credential prompt from mlflow login, replace with seed refresh token via web - [#78](https://github.com/ecmwf/anemoi-training/pull/78)
- Update CODEOWNERS
- Change how mlflow measures CPU Memory usage - [94](https://github.com/ecmwf/anemoi-training/pull/94)

## [0.1.0 - Anemoi training - First release](https://github.com/ecmwf/anemoi-training/releases/tag/0.1.0) - 2024-08-16

Expand Down
21 changes: 15 additions & 6 deletions docs/conf.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,12 @@
# (C) Copyright 2024 Anemoi contributors.
#
# This software is licensed under the terms of the Apache Licence Version 2.0
# which can be obtained at http://www.apache.org/licenses/LICENSE-2.0.
#
# In applying this licence, ECMWF does not waive the privileges and immunities
# granted to it by virtue of its status as an intergovernmental organisation
# nor does it submit to any jurisdiction.

# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
Expand All @@ -13,10 +22,11 @@
import datetime
import os
import sys
from pathlib import Path

read_the_docs_build = os.environ.get("READTHEDOCS", None) == "True"

sys.path.insert(0, os.path.join(os.path.abspath(".."), "src"))
sys.path.insert(0, Path("..").absolute() / "src")


source_suffix = ".rst"
Expand All @@ -30,13 +40,12 @@

project = "Anemoi Training"

author = "ECMWF"
author = "Anemoi contributors"

year = datetime.datetime.now().year
year = datetime.datetime.now(tz="UTC").year
years = "2024" if year == 2024 else f"2024-{year}"

copyright = f"{years}, ECMWF"

copyright = f"{years}, Anemoi contributors" # noqa: A001

try:
from anemoi.training._version import __version__
Expand Down Expand Up @@ -64,7 +73,7 @@
]

# Add any paths that contain templates here, relative to this directory.
# templates_path = ["_templates"]
# templates_path = ["_templates"] # noqa: ERA001

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
Expand Down
2 changes: 1 addition & 1 deletion docs/modules/losses.rst
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,7 @@ define whether to include them in the loss function by setting
Currently, the following scalars are available for use:

- ``variable``: Scale by the feature/variable weights as defined in the
config ``config.training.loss_scaling``.
config ``config.training.variable_loss_scaling``.

********************
Validation Metrics
Expand Down
4 changes: 2 additions & 2 deletions docs/user-guide/training.rst
Original file line number Diff line number Diff line change
Expand Up @@ -172,8 +172,8 @@ by setting ``config.data.normaliser``, such that:

It is possible to change the weighting given to each of the variables in
the loss function by changing
``config.training.loss_scaling.pl.<pressure level variable>`` and
``config.training.loss_scaling.sfc.<surface variable>``.
``config.training.variable_loss_scaling.pl.<pressure level variable>``
and ``config.training.variable_loss_scaling.sfc.<surface variable>``.

It is also possible to change the scaling given to the pressure levels
using ``config.training.pressure_level_scaler``. For almost all
Expand Down
8 changes: 3 additions & 5 deletions pyproject.toml
Original file line number Diff line number Diff line change
@@ -1,14 +1,12 @@
#!/usr/bin/env python
# (C) Copyright 2024 ECMWF.
# (C) Copyright 2024 Anemoi contributors.
#
# This software is licensed under the terms of the Apache Licence Version 2.0
# which can be obtained at http://www.apache.org/licenses/LICENSE-2.0.
#
# In applying this licence, ECMWF does not waive the privileges and immunities
# granted to it by virtue of its status as an intergovernmental organisation
# nor does it submit to any jurisdiction.

# https://packaging.python.org/en/latest/guides/writing-pyproject-toml/

[build-system]
requires = [ "setuptools>=60", "setuptools-scm>=8" ]

Expand Down Expand Up @@ -43,7 +41,7 @@ dynamic = [ "version" ]

dependencies = [
"anemoi-datasets>=0.4",
"anemoi-graphs",
"anemoi-graphs>=0.4",
"anemoi-models>=0.3",
"anemoi-utils[provenance]>=0.4.4",
"einops>=0.6.1",
Expand Down
6 changes: 4 additions & 2 deletions src/anemoi/training/__init__.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
# (C) Copyright 2024 European Centre for Medium-Range Weather Forecasts.
# (C) Copyright 2024 Anemoi contributors.
#
# This software is licensed under the terms of the Apache Licence Version 2.0
# which can be obtained at http://www.apache.org/licenses/LICENSE-2.0.
#
# In applying this licence, ECMWF does not waive the privileges and immunities
# granted to it by virtue of its status as an intergovernmental organisation
# nor does it submit to any jurisdiction.
Expand All @@ -9,7 +11,7 @@
try:
# NOTE: the `_version.py` file must not be present in the git repository
# as it is generated by setuptools at install time
from ._version import __version__ # type: ignore
from ._version import __version__
except ImportError: # pragma: no cover
# Local copy or not installed with setuptools
__version__ = "999"
4 changes: 2 additions & 2 deletions src/anemoi/training/__main__.py
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@
# (C) Copyright 2024 ECMWF.
# (C) Copyright 2024 Anemoi contributors.
#
# This software is licensed under the terms of the Apache Licence Version 2.0
# which can be obtained at http://www.apache.org/licenses/LICENSE-2.0.
#
# In applying this licence, ECMWF does not waive the privileges and immunities
# granted to it by virtue of its status as an intergovernmental organisation
# nor does it submit to any jurisdiction.
#

from anemoi.utils.cli import cli_main
from anemoi.utils.cli import make_parser
Expand Down
4 changes: 2 additions & 2 deletions src/anemoi/training/commands/__init__.py
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@
# (C) Copyright 2024 ECMWF.
# (C) Copyright 2024 Anemoi contributors.
#
# This software is licensed under the terms of the Apache Licence Version 2.0
# which can be obtained at http://www.apache.org/licenses/LICENSE-2.0.
#
# In applying this licence, ECMWF does not waive the privileges and immunities
# granted to it by virtue of its status as an intergovernmental organisation
# nor does it submit to any jurisdiction.
#

from pathlib import Path

Expand Down
4 changes: 3 additions & 1 deletion src/anemoi/training/config/__init__.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
# (C) Copyright 2024 European Centre for Medium-Range Weather Forecasts.
# (C) Copyright 2024 Anemoi contributors.
#
# This software is licensed under the terms of the Apache Licence Version 2.0
# which can be obtained at http://www.apache.org/licenses/LICENSE-2.0.
#
# In applying this licence, ECMWF does not waive the privileges and immunities
# granted to it by virtue of its status as an intergovernmental organisation
# nor does it submit to any jurisdiction.
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Add callbacks here
- _target_: anemoi.training.diagnostics.callbacks.evaluation.RolloutEval
rollout: ${dataloader.validation_rollout}
frequency: 20
every_n_batches: 20
2 changes: 2 additions & 0 deletions src/anemoi/training/config/diagnostics/evaluation.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,8 @@ log:
terminal: True
run_name: null # If set to null, the run name will be the a random UUID
on_resume_create_child: True
expand_hyperparams: # Which keys in hyperparams to expand
- config
interval: 100

enable_progress_bar: True
Expand Down
7 changes: 3 additions & 4 deletions src/anemoi/training/config/diagnostics/plot/detailed.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -24,9 +24,8 @@ precip_and_related_fields: [tp, cp]

callbacks:
# Add plot callbacks here
- _target_: anemoi.training.diagnostics.callbacks.plot.GraphNodeTrainableFeaturesPlot
- _target_: anemoi.training.diagnostics.callbacks.plot.GraphEdgeTrainableFeaturesPlot
epoch_frequency: 5
- _target_: anemoi.training.diagnostics.callbacks.plot.GraphTrainableFeaturesPlot
every_n_epochs: 5
- _target_: anemoi.training.diagnostics.callbacks.plot.PlotLoss
# group parameters by categories when visualizing contributions to the loss
# one-parameter groups are possible to highlight individual parameters
Expand All @@ -43,7 +42,7 @@ callbacks:
precip_and_related_fields: ${diagnostics.plot.precip_and_related_fields}

- _target_: anemoi.training.diagnostics.callbacks.plot.PlotSpectrum
# batch_frequency: 100 # Override for batch frequency
# every_n_batches: 100 # Override for batch frequency
sample_idx: ${diagnostics.plot.sample_idx}
parameters:
- z_500
Expand Down
9 changes: 4 additions & 5 deletions src/anemoi/training/config/diagnostics/plot/rollout_eval.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -24,9 +24,8 @@ precip_and_related_fields: [tp, cp]

callbacks:
# Add plot callbacks here
- _target_: anemoi.training.diagnostics.callbacks.plot.GraphNodeTrainableFeaturesPlot
- _target_: anemoi.training.diagnostics.callbacks.plot.GraphEdgeTrainableFeaturesPlot
epoch_frequency: 5
- _target_: anemoi.training.diagnostics.callbacks.plot.GraphTrainableFeaturesPlot
every_n_epochs: 5
- _target_: anemoi.training.diagnostics.callbacks.plot.PlotLoss
# group parameters by categories when visualizing contributions to the loss
# one-parameter groups are possible to highlight individual parameters
Expand All @@ -43,7 +42,7 @@ callbacks:
precip_and_related_fields: ${diagnostics.plot.precip_and_related_fields}

- _target_: anemoi.training.diagnostics.callbacks.plot.PlotSpectrum
# batch_frequency: 100 # Override for batch frequency
# every_n_batches: 100 # Override for batch frequency
sample_idx: ${diagnostics.plot.sample_idx}
parameters:
- z_500
Expand All @@ -63,6 +62,6 @@ callbacks:
- _target_: anemoi.training.diagnostics.callbacks.plot.LongRolloutPlots
rollout:
- ${dataloader.validation_rollout}
epoch_frequency: 20
every_n_epochs: 20
sample_idx: ${diagnostics.plot.sample_idx}
parameters: ${diagnostics.plot.parameters}
60 changes: 60 additions & 0 deletions src/anemoi/training/config/graph/limited_area.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
---
overwrite: True

data: "data"
hidden: "hidden"

nodes:
# Data nodes
data:
node_builder:
_target_: anemoi.graphs.nodes.ZarrDatasetNodes
dataset: ${dataloader.training.dataset}
attributes: ${graph.attributes.nodes}
# Hidden nodes
hidden:
node_builder:
_target_: anemoi.graphs.nodes.LimitedAreaTriNodes # options: ZarrDatasetNodes, NPZFileNodes, TriNodes
resolution: 5 # grid resolution for npz (o32, o48, ...)
reference_node_name: ${graph.data}
mask_attr_name: cutout

edges:
# Encoder configuration
- source_name: ${graph.data}
target_name: ${graph.hidden}
edge_builder:
_target_: anemoi.graphs.edges.CutOffEdges # options: KNNEdges, CutOffEdges
cutoff_factor: 0.6 # only for cutoff method
attributes: ${graph.attributes.edges}
# Processor configuration
- source_name: ${graph.hidden}
target_name: ${graph.hidden}
edge_builder:
_target_: anemoi.graphs.edges.MultiScaleEdges
x_hops: 1
attributes: ${graph.attributes.edges}
# Decoder configuration
- source_name: ${graph.hidden}
target_name: ${graph.data}
target_mask_attr_name: cutout
edge_builder:
_target_: anemoi.graphs.edges.KNNEdges # options: KNNEdges, CutOffEdges
num_nearest_neighbours: 3 # only for knn method
attributes: ${graph.attributes.edges}


attributes:
nodes:
area_weight:
_target_: anemoi.graphs.nodes.attributes.AreaWeights # options: Area, Uniform
norm: unit-max # options: l1, l2, unit-max, unit-sum, unit-std
cutout:
_target_: anemoi.graphs.nodes.attributes.CutOutMask
edges:
edge_length:
_target_: anemoi.graphs.edges.attributes.EdgeLength
norm: unit-std
edge_dirs:
_target_: anemoi.graphs.edges.attributes.EdgeDirection
norm: unit-std
Loading

0 comments on commit 7b9d359

Please sign in to comment.