Update Iris benchmarking to align with templating #6421


Merged: 2 commits, Apr 25, 2025
2 changes: 1 addition & 1 deletion .github/workflows/benchmarks_report.yml
@@ -80,4 +80,4 @@ jobs:
- name: Post reports
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: python benchmarks/bm_runner.py _gh_post
run: benchmarks/bm_runner.py _gh_post
17 changes: 16 additions & 1 deletion .github/workflows/benchmarks_run.yml
@@ -21,6 +21,8 @@ on:

jobs:
pre-checks:
# This workflow supports two different scenarios (overnight and branch).
# The pre-checks job determines which scenario is being run.
runs-on: ubuntu-latest
if: github.repository == 'SciTools/iris'
outputs:
@@ -36,9 +38,11 @@ jobs:
# SEE ALSO .github/labeler.yml .
paths: requirements/locks/*.lock setup.py
- id: overnight
name: Check overnight scenario
if: github.event_name != 'pull_request'
run: echo "check=true" >> "$GITHUB_OUTPUT"
- id: branch
name: Check branch scenario
if: >
github.event_name == 'pull_request'
&&
@@ -67,7 +71,8 @@

steps:
# Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
- uses: actions/checkout@v4
- name: Checkout repo
uses: actions/checkout@v4
with:
fetch-depth: 0

@@ -107,6 +112,8 @@ jobs:
echo "OVERRIDE_TEST_DATA_REPOSITORY=${GITHUB_WORKSPACE}/${IRIS_TEST_DATA_PATH}/test_data" >> $GITHUB_ENV

- name: Benchmark this pull request
# If the 'branch' condition(s) are met: use the bm_runner to compare
# the proposed merge with the base branch.
if: needs.pre-checks.outputs.branch == 'true'
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
@@ -115,10 +122,14 @@ jobs:
nox -s benchmarks -- branch origin/${{ github.base_ref }}

- name: Run overnight benchmarks
# If the 'overnight' condition(s) are met: use the bm_runner to compare
# each of the last 24 hours' commits to their parents.
id: overnight
if: needs.pre-checks.outputs.overnight == 'true'
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# The first_commit argument allows a custom starting point - useful
# for manual re-running.
run: |
first_commit=${{ inputs.first_commit }}
if [ "$first_commit" == "" ]
@@ -132,6 +143,8 @@
fi

- name: Warn of failure
# The overnight run is not on a pull request, so a failure could go
# unnoticed without being actively advertised.
if: >
failure() &&
steps.overnight.outcome == 'failure'
@@ -143,13 +156,15 @@
gh issue create --title "$title" --body "$body" --label "Bot" --label "Type: Performance" --repo $GITHUB_REPOSITORY

- name: Upload any benchmark reports
# Uploading enables more downstream processing e.g. posting a PR comment.
if: success() || steps.overnight.outcome == 'failure'
uses: actions/upload-artifact@v4
with:
name: benchmark_reports
path: .github/workflows/benchmark_reports

- name: Archive asv results
# Store the raw ASV database(s) to help manual investigations.
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
70 changes: 41 additions & 29 deletions benchmarks/README.md
@@ -1,6 +1,6 @@
# Iris Performance Benchmarking
# SciTools Performance Benchmarking

Iris uses an [Airspeed Velocity](https://github.com/airspeed-velocity/asv)
SciTools uses an [Airspeed Velocity](https://github.com/airspeed-velocity/asv)
(ASV) setup to benchmark performance. This is primarily designed to check for
performance shifts between commits using statistical analysis, but can also
be easily repurposed for manual comparative and scalability analyses.
@@ -21,25 +21,30 @@ by the PR. (This run is managed by
[the aforementioned GitHub Action](../.github/workflows/benchmark.yml)).

To run locally: the **benchmark runner** provides conveniences for
common benchmark setup and run tasks, including replicating the automated
overnight run locally. This is accessed via the Nox `benchmarks` session - see
`nox -s benchmarks -- --help` for detail (_see also:
[bm_runner.py](./bm_runner.py)_). Alternatively you can directly run `asv ...`
commands from this directory (you will still need Nox installed - see
[Benchmark environments](#benchmark-environments)).
common benchmark setup and run tasks, including replicating the benchmarking
performed by GitHub Actions workflows. This can be accessed by:

- The Nox `benchmarks` session (use
`nox -s benchmarks -- --help` for details).
- `benchmarks/bm_runner.py` (use the `--help` argument for details).
- Directly running `asv` commands from the `benchmarks/` directory (check
whether environment setup has any extra dependencies - see
[Benchmark environments](#benchmark-environments)).

### Reducing run time

A significant portion of benchmark run time is environment management. Run-time
can be reduced by placing the benchmark environment on the same file system as
your
[Conda package cache](https://conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html#specify-pkg-directories),
if it is not already. You can achieve this by either:

- Temporarily reconfiguring `ENV_PARENT` in `delegated_env_commands`
in [asv.conf.json](asv.conf.json) to reference a location on the same file
system as the Conda package cache.
can be reduced by co-locating the benchmark environment and your
[Conda package cache](https://docs.conda.io/projects/conda/en/latest/user-guide/configuration/custom-env-and-pkg-locations.html)
on the same [file system](https://en.wikipedia.org/wiki/File_system), if they
are not already. This can be done in several ways:

- Temporarily reconfiguring `env_parent` in
[`_asv_delegated_abc`](_asv_delegated_abc.py) to reference a location on the same
file system as the Conda package cache.
- Using an alternative Conda package cache location during the benchmark run,
e.g. via the `$CONDA_PKGS_DIRS` environment variable.
- Moving your Iris repo to the same file system as the Conda package cache.
- Moving your repo checkout to the same file system as the Conda package cache.

### Environment variables

@@ -73,7 +78,8 @@ requirements will not be delayed by repeated environment setup - especially
relevant given the [benchmark runner](bm_runner.py)'s use of
[--interleave-rounds](https://asv.readthedocs.io/en/stable/commands.html?highlight=interleave-rounds#asv-run),
or any time you know you will repeatedly benchmark the same commit. **NOTE:**
Iris environments are large so this option can consume a lot of disk space.
SciTools environments tend to be large so this option can consume a lot of
disk space.

## Writing benchmarks

@@ -97,6 +103,7 @@ for manual investigations; and consider committing any useful benchmarks as
[on-demand benchmarks](#on-demand-benchmarks) for future developers to use.

### Data generation

**Important:** be sure not to use the benchmarking environment to generate any
test objects/files, as this environment changes with each commit being
benchmarked, creating inconsistent benchmark 'conditions'. The
@@ -106,7 +113,7 @@ solution; read more detail there.
### ASV re-run behaviour

Note that ASV re-runs a benchmark multiple times between calls to its `setup()` routine.
This is a problem for benchmarking certain Iris operations such as data
This is a problem for benchmarking certain SciTools operations such as data
realisation, since the data will no longer be lazy after the first run.
Consider writing extra steps to restore objects' original state _within_ the
benchmark itself.
@@ -117,10 +124,13 @@ maintain result accuracy this should be accompanied by increasing the number of
repeats _between_ `setup()` calls using the `repeat` attribute.
`warmup_time = 0` is also advisable since ASV performs independent re-runs to
estimate run-time, and these will still be subject to the original problem.
The `@disable_repeat_between_setup` decorator in
[`benchmarks/__init__.py`](benchmarks/__init__.py) offers a convenience for
all this.
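
For illustration, here is a minimal sketch of a hypothetical benchmark class
applying these attributes by hand (the lazy-data stand-in is not taken from
this PR):

```python
import numpy as np


class LazyRealisation:
    """Hypothetical benchmark timing an operation that must stay 'fresh'."""

    # Run the timed routine only once per setup() call, so repeated runs
    # cannot reuse already-realised data ...
    number = 1
    # ... and recover statistical accuracy with more setup()/run cycles.
    repeat = 10
    # Skip ASV's warm-up runs, which would hit the same staleness problem.
    warmup_time = 0

    def setup(self):
        # Stand-in for building a lazy object (e.g. a lazy array or cube).
        self.data = np.arange(1_000_000, dtype=np.float64)

    def time_realise(self):
        # The operation under test.
        self.data.sum()
```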

### Custom benchmarks

Iris benchmarking implements custom benchmark types, such as a `tracemalloc`
SciTools benchmarking implements custom benchmark types, such as a `tracemalloc`
benchmark to measure memory growth. See [custom_bms/](./custom_bms) for more
detail.
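
As a rough, generic illustration of the idea behind the `tracemalloc`
benchmark type (a plain-Python sketch, not the repo's custom plugin API):

```python
import tracemalloc

import numpy as np


def peak_memory_growth(func):
    """Return the peak traced memory (bytes) while running ``func``."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# Example: memory growth of allocating and doubling a 1000x1000 array.
print(peak_memory_growth(lambda: np.ones((1000, 1000)) * 2))
```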

@@ -131,10 +141,10 @@ limited available runtime and risk of false-positives. It remains useful for
manual investigations).**

When comparing performance between commits/file-type/whatever it can be helpful
to know if the differences exist in scaling or non-scaling parts of the Iris
functionality in question. This can be done using a size parameter, setting
one value to be as small as possible (e.g. a scalar `Cube`), and the other to
be significantly larger (e.g. a 1000x1000 `Cube`). Performance differences
to know if the differences exist in scaling or non-scaling parts of the
operation under test. This can be done using a size parameter, setting
one value to be as small as possible (e.g. a scalar value), and the other to
be significantly larger (e.g. a 1000x1000 array). Performance differences
might only be seen for the larger value, or the smaller, or both, getting you
closer to the root cause.
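
A minimal sketch of such a size parameter using ASV's `params` mechanism
(the class and workload here are hypothetical):

```python
import numpy as np


class ScalingSuite:
    """Hypothetical benchmark separating scaling from non-scaling costs."""

    # One as-small-as-possible case and one significantly larger case.
    params = [1, 1000]
    param_names = ["side_length"]

    def setup(self, side_length):
        self.data = np.zeros((side_length, side_length))

    def time_operation(self, side_length):
        # Replace with the real operation under test.
        self.data + 1
```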

@@ -151,13 +161,15 @@ suites for the UK Met Office NG-VAT project.
## Benchmark environments

We have disabled ASV's standard environment management, instead using an
environment built using the same Nox scripts as Iris' test environments. This
is done using ASV's plugin architecture - see
[asv_delegated_conda.py](asv_delegated_conda.py) and the extra config items in
[asv.conf.json](asv.conf.json).
environment built using the same scripts that set up the package test
environments.
This is done using ASV's plugin architecture - see
[`asv_delegated.py`](asv_delegated.py) and associated
references in [`asv.conf.json`](asv.conf.json) (`environment_type` and
`plugins`).

(ASV is written to control the environment(s) that benchmarks are run in -
minimising external factors and also allowing it to compare between a matrix
of dependencies (each in a separate environment). We have chosen to sacrifice
these features in favour of testing each commit with its intended dependencies,
controlled by Nox + lock-files).
controlled by the test environment setup script(s)).