Commit ab6a255

TomNicholas, pre-commit-ci[bot], Zac-HD and keewis authored
Hypothesis strategy for generating Variable objects (#8404)
* copied files defining strategies over to this branch
* placed testing functions in their own directory
* moved hypothesis strategies into new testing directory
* begin type hinting strategies
* renamed strategies for consistency with hypothesis conventions
* added strategies to public API (with experimental warning)
* strategies for chunking patterns
* rewrote variables strategy to have same signature as Variable constructor
* test variables strategy
* fixed most tests
* added helpers so far to API docs
* add hypothesis to docs CI env
* add todo about attrs
* draft of new user guide page on testing
* types for dataarrays strategy
* draft for chained chunking example
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* only accept strategy objects
* fixed failure with passing in two custom strategies that must be compatible
* syntax error in example
* allow sizes dict as argument to variables
* copied subsequences_of strategy
* coordinate_variables generates non-dimensional coords
* dataarrays strategy given nothing working!
* improved docstrings
* datasets strategy works (given nothing)
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* pass dims or data to dataarrays() strategy
* importorskip hypothesis in tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* added warning about inefficient example generation
* remove TODO about deterministic examples in docs
* un-restrict names strategy
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* removed convert kwarg
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* avoid using subsequences_of
* refactored into separate function for unique subset of dims
* removed subsequences_of
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix draw(st.booleans())
* remove all references to chunking until chunks strategy merged upstream in dask
* added example of complicated strategy for dims dict
* remove superfluous utils file
* removed elements strategy
* removed np_arrays strategy from public API
* min_ndims -> min_dims
* forbid non-matching dims and data completely
* simple test for data_variables strategy
* passing arguments to datasets strategy
* whatsnew
* add attrs strategy
* autogenerate attrs for all objects
* attempt to make attrs strategy quicker
* extend deadline
* attempt to speed up attrs strategy
* promote all strategies to be functions
* valid_dtypes -> numeric_dtypes
* changed hypothesis error type
* make all strategies keyword-arg only
* min_length -> min_side
* correct error type
* remove coords kwarg
* test different types of coordinates are sometimes generated
* zip dict
  Co-authored-by: Zac Hatfield-Dodds <[email protected]>
* add dim_names kwarg to dimension_sizes strategy
* return a dict from _alignable_variables
* add coord_names arg to coordinate_variables strategy
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* change typing of dims arg
* support dims as list to datasets strat when data not given
* put coord and data var generation in optional branch to try to improve shrinking
* improve simple test example
* add documentation on creating duck arrays
* okexcept for sparse examples
* fix sparse dataarrays example
* todo about building a duck array dataset
* fix imports and cross-links
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add hypothesis library to intersphinx mapping
* fix many links
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fixed all local mypy errors
* move numpy strategies import
* reduce sizes
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix some api links in docs
* remove every strategy beyond variables
* variable strategy now accepts callable generating array strategies
* use only readable unicode characters in names
* examples
* only use unicode characters that docs can deal with
* docs: dataarrays -> variables
* update tests for variables strategy
* test values in attrs dict
* duck array type examples
* altered whatsnew
* maybe fix mypy
* fix some mypy errors
* more typing changes
* fix import
* skip doctests in docstrings
* fix link to duckarrays page
* don't actually try to run cupy in docs env
* missed a skip
* okwarning
* just remove the cupy example
* ensure shape is always passed to array_strategy_fn
* test using make_strategies_namespace
* test catching array_strategy_fn that returns different dtype
* test catching array_strategy_fn that returns different shape
* generalise test of attrs strategy
* remove misguided comments
* save working version of test_mean
* expose unique_subset_of
* generalize unique_subset_of to handle iterables
* type hint unique_subset_of using overloads
* use iterables in test_mean example
* test_mean example in docs now uses iterable of dimension_names
* fix some warnings in docs build
* example of passing list to unique_subset_of
* fix import in docs page
* try to satisfy sphinx
* Minor corrections to docs
* Add supported_dtypes to list of public strategies in docs
* Generate number of dimensions in test_given_arbitrary_dims_list
  Co-authored-by: Zac Hatfield-Dodds <[email protected]>
* Update minimum version of hypothesis
  Co-authored-by: Zac Hatfield-Dodds <[email protected]>
* fix incorrect indentation in autosummary
* link to docs page on testing
* use warning imperative for array API non-compliant dtypes
* fix bugs in sparse examples
* add tag for array API standard info
* move no-dependencies-on-other-values-inputs to given decorator
* generate everything that can be generated
* fix internal link to page on strategies
* split up TypeError messages for each arg
* use hypothesis.errors.InvalidArgument
* generalize tests for generating specific number of dimensions
* fix some typing errors
* test that reduction example in docs actually works
* fix typing errors
* simply generation of sparse arrays in example
* fix impot in docs example
* correct type hints in sparse example
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Use .copy in convert_to_sparse
  Co-authored-by: Justus Magin <[email protected]>
* Use st.builds in sparse example
  Co-authored-by: Justus Magin <[email protected]>
* correct intersphinx link in whatsnew
* rename module containing assertion functions
* clarify sentence
* add general ImportError if hypothesis not installed
* add See Also link to strategies docs page from docstring of every strategy
* typo in ImportError message
* remove extra blank lines in examples
* remove smallish_arrays

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Zac Hatfield-Dodds <[email protected]>
Co-authored-by: Justus Magin <[email protected]>
1 parent 1f94829 commit ab6a255

14 files changed, +1077 -10 lines changed

ci/requirements/doc.yml

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ dependencies:
   - cartopy
   - cfgrib
   - dask-core>=2022.1
+  - hypothesis>=6.75.8
   - h5netcdf>=0.13
   - ipykernel
   - ipywidgets # silence nbsphinx warning

doc/api.rst

Lines changed: 21 additions & 0 deletions
@@ -1069,6 +1069,27 @@ Testing
    testing.assert_allclose
    testing.assert_chunks_equal

+Hypothesis Testing Strategies
+=============================
+
+.. currentmodule:: xarray
+
+See the :ref:`documentation page on testing <testing.hypothesis>` for a guide on how to use these strategies.
+
+.. warning::
+    These strategies should be considered highly experimental, and liable to change at any time.
+
+.. autosummary::
+   :toctree: generated/
+
+   testing.strategies.supported_dtypes
+   testing.strategies.names
+   testing.strategies.dimension_names
+   testing.strategies.dimension_sizes
+   testing.strategies.attrs
+   testing.strategies.variables
+   testing.strategies.unique_subset_of
+
 Exceptions
 ==========

doc/conf.py

Lines changed: 1 addition & 0 deletions
@@ -326,6 +326,7 @@
     "dask": ("https://docs.dask.org/en/latest", None),
     "cftime": ("https://unidata.github.io/cftime", None),
     "sparse": ("https://sparse.pydata.org/en/latest/", None),
+    "hypothesis": ("https://hypothesis.readthedocs.io/en/latest/", None),
     "cubed": ("https://tom-e-white.com/cubed/", None),
     "datatree": ("https://xarray-datatree.readthedocs.io/en/latest/", None),
     "xarray-tutorial": ("https://tutorial.xarray.dev/", None),

doc/internals/duck-arrays-integration.rst

Lines changed: 2 additions & 0 deletions
@@ -31,6 +31,8 @@ property needs to obey `numpy's broadcasting rules <https://numpy.org/doc/stable
 (see also the `Python Array API standard's explanation <https://data-apis.org/array-api/latest/API_specification/broadcasting.html>`_
 of these same rules).

+.. _internals.duckarrays.array_api_standard:
+
 Python Array API standard support
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

doc/user-guide/index.rst

Lines changed: 1 addition & 0 deletions
@@ -25,4 +25,5 @@ examples that describe many common tasks that you can accomplish with xarray.
    dask
    plotting
    options
+   testing
    duckarrays

doc/user-guide/testing.rst

Lines changed: 303 additions & 0 deletions
@@ -0,0 +1,303 @@
+.. _testing:
+
+Testing your code
+=================
+
+.. ipython:: python
+    :suppress:
+
+    import numpy as np
+    import pandas as pd
+    import xarray as xr
+
+    np.random.seed(123456)
+
+.. _testing.hypothesis:
+
+Hypothesis testing
+------------------
+
+.. note::
+
+  Testing with hypothesis is a fairly advanced topic. Before reading this section it is recommended that you take a look
+  at our guide to xarray's :ref:`data structures`, are familiar with conventional unit testing in
+  `pytest <https://docs.pytest.org/>`_, and have seen the
+  `hypothesis library documentation <https://hypothesis.readthedocs.io/>`_.
+
+`The hypothesis library <https://hypothesis.readthedocs.io/>`_ is a powerful tool for property-based testing.
+Instead of writing tests for one example at a time, it allows you to write tests parameterized by a source of many
+dynamically generated examples. For example you might have written a test which you wish to be parameterized by the set
+of all possible integers via :py:func:`hypothesis.strategies.integers()`.
+
+Property-based testing is extremely powerful, because (unlike more conventional example-based testing) it can find bugs
+that you did not even think to look for!
+
+Strategies
+~~~~~~~~~~
+
+Each source of examples is called a "strategy", and xarray provides a range of custom strategies which produce xarray
+data structures containing arbitrary data. You can use these to efficiently test downstream code,
+quickly ensuring that your code can handle xarray objects of all possible structures and contents.
+
+These strategies are accessible in the :py:mod:`xarray.testing.strategies` module, which provides
+
+.. currentmodule:: xarray
+
+.. autosummary::
+
+   testing.strategies.supported_dtypes
+   testing.strategies.names
+   testing.strategies.dimension_names
+   testing.strategies.dimension_sizes
+   testing.strategies.attrs
+   testing.strategies.variables
+   testing.strategies.unique_subset_of
+
+These build upon the numpy and array API strategies offered in :py:mod:`hypothesis.extra.numpy` and :py:mod:`hypothesis.extra.array_api`:
+
+.. ipython:: python
+
+    import hypothesis.extra.numpy as npst
+
+Generating Examples
+~~~~~~~~~~~~~~~~~~~
+
+To see an example of what each of these strategies might produce, you can call one followed by the ``.example()`` method,
+which is a general hypothesis method valid for all strategies.
+
+.. ipython:: python
+
+    import xarray.testing.strategies as xrst
+
+    xrst.variables().example()
+    xrst.variables().example()
+    xrst.variables().example()
+
+You can see that calling ``.example()`` multiple times will generate different examples, giving you an idea of the wide
+range of data that the xarray strategies can generate.
+
+In your tests however you should not use ``.example()`` - instead you should parameterize your tests with the
+:py:func:`hypothesis.given` decorator:
+
+.. ipython:: python
+
+    from hypothesis import given
+
+.. ipython:: python
+
+    @given(xrst.variables())
+    def test_function_that_acts_on_variables(var):
+        assert func(var) == ...
+
+
+Chaining Strategies
+~~~~~~~~~~~~~~~~~~~
+
+Xarray's strategies can accept other strategies as arguments, allowing you to customise the contents of the generated
+examples.
+
+.. ipython:: python
+
+    # generate a Variable containing an array with a complex number dtype, but all other details still arbitrary
+    from hypothesis.extra.numpy import complex_number_dtypes
+
+    xrst.variables(dtype=complex_number_dtypes()).example()
+
+This also works with custom strategies, or strategies defined in other packages.
+For example you could imagine creating a ``chunks`` strategy to specify particular chunking patterns for a dask-backed array.
+
+Fixing Arguments
+~~~~~~~~~~~~~~~~
+
+If you want to fix one aspect of the data structure, whilst allowing variation in the generated examples
+over all other aspects, then use :py:func:`hypothesis.strategies.just()`.
+
+.. ipython:: python
+
+    import hypothesis.strategies as st
+
+    # Generates only variable objects with dimensions ["x", "y"]
+    xrst.variables(dims=st.just(["x", "y"])).example()
+
+(This is technically another example of chaining strategies - :py:func:`hypothesis.strategies.just()` is simply a
+special strategy that just contains a single example.)
+
+To fix the length of dimensions you can instead pass ``dims`` as a mapping of dimension names to lengths
+(i.e. following xarray objects' ``.sizes`` property), e.g.
+
+.. ipython:: python
+
+    # Generates only variables with dimensions ["x", "y"], of lengths 2 & 3 respectively
+    xrst.variables(dims=st.just({"x": 2, "y": 3})).example()
+
+You can also use this to specify that you want examples which are missing some part of the data structure, for instance
+
+.. ipython:: python
+
+    # Generates a Variable with no attributes
+    xrst.variables(attrs=st.just({})).example()
+
+Through a combination of chaining strategies and fixing arguments, you can specify quite complicated requirements on the
+objects your chained strategy will generate.
+
+.. ipython:: python
+
+    fixed_x_variable_y_maybe_z = st.fixed_dictionaries(
+        {"x": st.just(2), "y": st.integers(3, 4)}, optional={"z": st.just(2)}
+    )
+    fixed_x_variable_y_maybe_z.example()
+
+    special_variables = xrst.variables(dims=fixed_x_variable_y_maybe_z)
+
+    special_variables.example()
+    special_variables.example()
+
+Here we have used one of hypothesis' built-in strategies :py:func:`hypothesis.strategies.fixed_dictionaries` to create a
+strategy which generates mappings of dimension names to lengths (i.e. the ``sizes`` of the xarray object we want).
+This particular strategy will always generate an ``x`` dimension of length 2, and a ``y`` dimension of
+length either 3 or 4, and will sometimes also generate a ``z`` dimension of length 2.
+By feeding this strategy for dictionaries into the ``dims`` argument of xarray's :py:func:`~testing.strategies.variables` strategy,
+we can generate arbitrary :py:class:`~xarray.Variable` objects whose dimensions will always match these specifications.
+
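
The claims made about the ``fixed_dictionaries`` strategy above can themselves be written as a property test. This is a minimal sketch that only needs hypothesis; the strategy is copied from the docs example, while the test name is hypothetical.

```python
import hypothesis.strategies as st
from hypothesis import given

# same strategy as in the docs example: x is fixed, y varies, z is optional
fixed_x_variable_y_maybe_z = st.fixed_dictionaries(
    {"x": st.just(2), "y": st.integers(3, 4)}, optional={"z": st.just(2)}
)


@given(fixed_x_variable_y_maybe_z)
def test_dim_sizes_match_specification(sizes):
    # "x" is always present with length 2
    assert sizes["x"] == 2
    # "y" is always present with length 3 or 4
    assert sizes["y"] in (3, 4)
    # "z" may be absent, but when present it has length 2
    assert set(sizes) <= {"x", "y", "z"}
    assert sizes.get("z", 2) == 2


test_dim_sizes_match_specification()
```
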
+Generating Duck-type Arrays
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Xarray objects don't have to wrap numpy arrays; in fact they can wrap any array type which presents the same API as a
+numpy array (so-called "duck array wrapping", see :ref:`wrapping numpy-like arrays <internals.duckarrays>`).
+
+Imagine we want to write a strategy which generates arbitrary ``Variable`` objects, each of which wraps a
+:py:class:`sparse.COO` array instead of a ``numpy.ndarray``. How could we do that? There are two ways:
+
+1. Create an xarray object with numpy data and use hypothesis' ``.map()`` method to convert the underlying array to a
+   different type:
+
+.. ipython:: python
+
+    import sparse
+
+.. ipython:: python
+
+    def convert_to_sparse(var):
+        return var.copy(data=sparse.COO.from_numpy(var.to_numpy()))
+
+.. ipython:: python
+
+    sparse_variables = xrst.variables(dims=xrst.dimension_names(min_dims=1)).map(
+        convert_to_sparse
+    )
+
+    sparse_variables.example()
+    sparse_variables.example()
+
+2. Pass a function which returns a strategy which generates the duck-typed arrays directly to the ``array_strategy_fn`` argument of the xarray strategies:
+
+.. ipython:: python
+
+    def sparse_random_arrays(shape: tuple[int]) -> sparse._coo.core.COO:
+        """Strategy which generates random sparse.COO arrays"""
+        if shape is None:
+            shape = npst.array_shapes()
+        else:
+            shape = st.just(shape)
+        density = st.integers(min_value=0, max_value=1)
+        # note sparse.random does not accept a dtype kwarg
+        return st.builds(sparse.random, shape=shape, density=density)
+
+
+    def sparse_random_arrays_fn(
+        *, shape: tuple[int, ...], dtype: np.dtype
+    ) -> st.SearchStrategy[sparse._coo.core.COO]:
+        return sparse_random_arrays(shape=shape)
+
+
+.. ipython:: python
+
+    sparse_random_variables = xrst.variables(
+        array_strategy_fn=sparse_random_arrays_fn, dtype=st.just(np.dtype("float64"))
+    )
+    sparse_random_variables.example()
+
+Either approach is fine, but one may be more convenient than the other depending on the type of the duck array which you
+want to wrap.
+
+Compatibility with the Python Array API Standard
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Xarray aims to be compatible with any duck-array type that conforms to the `Python Array API Standard <https://data-apis.org/array-api/latest/>`_
+(see our :ref:`docs on Array API Standard support <internals.duckarrays.array_api_standard>`).
+
+.. warning::
+
+    The strategies defined in :py:mod:`testing.strategies` are **not** guaranteed to use array API standard-compliant
+    dtypes by default.
+    For example arrays with the dtype ``np.dtype('float16')`` may be generated by :py:func:`testing.strategies.variables`
+    (assuming the ``dtype`` kwarg was not explicitly passed), despite ``np.dtype('float16')`` not being in the
+    array API standard.
+
+If the array type you want to generate has an array API-compliant top-level namespace
+(e.g. that which is conventionally imported as ``xp`` or similar),
+you can use this neat trick:
+
+.. ipython:: python
+    :okwarning:
+
+    from numpy import array_api as xp  # available in numpy 1.26.0
+
+    from hypothesis.extra.array_api import make_strategies_namespace
+
+    xps = make_strategies_namespace(xp)
+
+    xp_variables = xrst.variables(
+        array_strategy_fn=xps.arrays,
+        dtype=xps.scalar_dtypes(),
+    )
+    xp_variables.example()
+
+Another array API-compliant duck array library would replace the import, e.g. ``import cupy as cp`` instead.
+
+Testing over Subsets of Dimensions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A common task when testing xarray user code is checking that your function works for all valid input dimensions.
+We can chain strategies to achieve this, for which the helper strategy :py:func:`~testing.strategies.unique_subset_of`
+is useful.
+
+It works for lists of dimension names
+
+.. ipython:: python
+
+    dims = ["x", "y", "z"]
+    xrst.unique_subset_of(dims).example()
+    xrst.unique_subset_of(dims).example()
+
+as well as for mappings of dimension names to sizes
+
+.. ipython:: python
+
+    dim_sizes = {"x": 2, "y": 3, "z": 4}
+    xrst.unique_subset_of(dim_sizes).example()
+    xrst.unique_subset_of(dim_sizes).example()
+
+This is useful because operations like reductions can be performed over any subset of the xarray object's dimensions.
+For example we can write a pytest test that tests that a reduction gives the expected result when applying that reduction
+along any possible valid subset of the Variable's dimensions.
+
+.. code-block:: python
+
+    import numpy.testing as npt
+
+
+    @given(st.data(), xrst.variables(dims=xrst.dimension_names(min_dims=1)))
+    def test_mean(data, var):
+        """Test that the mean of an xarray Variable is always equal to the mean of the underlying array."""
+
+        # specify arbitrary reduction along at least one dimension
+        reduction_dims = data.draw(xrst.unique_subset_of(var.dims, min_size=1))
+
+        # create expected result (using nanmean because arrays with Nans will be generated)
+        reduction_axes = tuple(var.get_axis_num(dim) for dim in reduction_dims)
+        expected = np.nanmean(var.data, axis=reduction_axes)
+
+        # assert property is always satisfied
+        result = var.mean(dim=reduction_dims).data
+        npt.assert_equal(expected, result)

doc/whats-new.rst

Lines changed: 4 additions & 0 deletions
@@ -23,6 +23,10 @@ v2023.11.1 (unreleased)
 New Features
 ~~~~~~~~~~~~

+- Added hypothesis strategies for generating :py:class:`xarray.Variable` objects containing arbitrary data, useful for parametrizing downstream tests.
+  Accessible under :py:mod:`testing.strategies`, and documented in a new page on testing in the User Guide.
+  (:issue:`6911`, :pull:`8404`)
+  By `Tom Nicholas <https://github.com/TomNicholas>`_.
 - :py:meth:`rolling` uses numbagg <https://github.com/numbagg/numbagg>`_ for
   most of its computations by default. Numbagg is up to 5x faster than bottleneck
   where parallelization is possible. Where parallelization isn't possible — for

xarray/core/types.py

Lines changed: 2 additions & 1 deletion
@@ -173,7 +173,8 @@ def copy(

 # Temporary placeholder for indicating an array api compliant type.
 # hopefully in the future we can narrow this down more:
-T_DuckArray = TypeVar("T_DuckArray", bound=Any)
+T_DuckArray = TypeVar("T_DuckArray", bound=Any, covariant=True)
+

 ScalarOrArray = Union["ArrayLike", np.generic, np.ndarray, "DaskArray"]
 VarCompatible = Union["Variable", "ScalarOrArray"]
