| 1 | +.. _testing: |
| 2 | + |
| 3 | +Testing your code |
| 4 | +================= |
| 5 | + |
| 6 | +.. ipython:: python |
| 7 | + :suppress: |
| 8 | +
|
| 9 | + import numpy as np |
| 10 | + import pandas as pd |
| 11 | + import xarray as xr |
| 12 | +
|
| 13 | + np.random.seed(123456) |
| 14 | +
|
| 15 | +.. _testing.hypothesis: |
| 16 | + |
| 17 | +Hypothesis testing |
| 18 | +------------------ |
| 19 | + |
| 20 | +.. note:: |
| 21 | + |
| 22 | + Testing with hypothesis is a fairly advanced topic. Before reading this section, we recommend that you read |
| 23 | + our guide to xarray's :ref:`data structures`, become familiar with conventional unit testing in |
| 24 | + `pytest <https://docs.pytest.org/>`_, and look through the |
| 25 | + `hypothesis library documentation <https://hypothesis.readthedocs.io/>`_. |
| 26 | + |
| 27 | +`The hypothesis library <https://hypothesis.readthedocs.io/>`_ is a powerful tool for property-based testing. |
| 28 | +Instead of writing tests for one example at a time, it allows you to write tests parameterized by a source of many |
| 29 | +dynamically generated examples. For example, you might want to parameterize a test over the set |
| 30 | +of all possible integers via :py:func:`hypothesis.strategies.integers()`. |
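As a minimal sketch of what parameterizing a test over integers can look like (the test name and the property checked are illustrative assumptions, not part of xarray):

```python
from hypothesis import given, strategies as st


@given(st.integers())
def test_double_negation_is_identity(n):
    # the property: negating an integer twice returns the original value
    assert -(-n) == n


# calling the decorated function runs the property against many generated integers
test_double_negation_is_identity()
```

Under pytest you would not call the function yourself; the test runner collects and runs it like any other test.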
| 31 | + |
| 32 | +Property-based testing is extremely powerful, because (unlike more conventional example-based testing) it can find bugs |
| 33 | +that you did not even think to look for! |
| 34 | + |
| 35 | +Strategies |
| 36 | +~~~~~~~~~~ |
| 37 | + |
| 38 | +Each source of examples is called a "strategy", and xarray provides a range of custom strategies which produce xarray |
| 39 | +data structures containing arbitrary data. You can use these to test downstream code efficiently, |
| 40 | +checking that it can handle xarray objects with a wide variety of structures and contents. |
| 41 | + |
| 42 | +These strategies are accessible in the :py:mod:`xarray.testing.strategies` module, which provides |
| 43 | + |
| 44 | +.. currentmodule:: xarray |
| 45 | + |
| 46 | +.. autosummary:: |
| 47 | + |
| 48 | + testing.strategies.supported_dtypes |
| 49 | + testing.strategies.names |
| 50 | + testing.strategies.dimension_names |
| 51 | + testing.strategies.dimension_sizes |
| 52 | + testing.strategies.attrs |
| 53 | + testing.strategies.variables |
| 54 | + testing.strategies.unique_subset_of |
| 55 | + |
| 56 | +These build upon the numpy and array API strategies offered in :py:mod:`hypothesis.extra.numpy` and :py:mod:`hypothesis.extra.array_api`: |
| 57 | + |
| 58 | +.. ipython:: python |
| 59 | +
|
| 60 | + import hypothesis.extra.numpy as npst |
| 61 | +
|
| 62 | +Generating Examples |
| 63 | +~~~~~~~~~~~~~~~~~~~ |
| 64 | + |
| 65 | +To see an example of what each of these strategies might produce, you can call one followed by the ``.example()`` method, |
| 66 | +which is a general hypothesis method valid for all strategies. |
| 67 | + |
| 68 | +.. ipython:: python |
| 69 | +
|
| 70 | + import xarray.testing.strategies as xrst |
| 71 | +
|
| 72 | + xrst.variables().example() |
| 73 | + xrst.variables().example() |
| 74 | + xrst.variables().example() |
| 75 | +
|
| 76 | +You can see that calling ``.example()`` multiple times will generate different examples, giving you an idea of the wide |
| 77 | +range of data that the xarray strategies can generate. |
| 78 | + |
| 79 | +In your tests, however, you should not use ``.example()``, which is intended only for interactive exploration. Instead, parameterize your tests with the |
| 80 | +:py:func:`hypothesis.given` decorator: |
| 81 | + |
| 82 | +.. ipython:: python |
| 83 | +
|
| 84 | + from hypothesis import given |
| 85 | +
|
| 86 | +.. ipython:: python |
| 87 | +
|
| 88 | + @given(xrst.variables()) |
| 89 | + def test_function_that_acts_on_variables(var): |
| 90 | + assert func(var) == ... |
| 91 | +
|
| 92 | +
|
| 93 | +Chaining Strategies |
| 94 | +~~~~~~~~~~~~~~~~~~~ |
| 95 | + |
| 96 | +Xarray's strategies can accept other strategies as arguments, allowing you to customise the contents of the generated |
| 97 | +examples. |
| 98 | + |
| 99 | +.. ipython:: python |
| 100 | +
|
| 101 | + # generate a Variable containing an array with a complex number dtype, but all other details still arbitrary |
| 102 | + from hypothesis.extra.numpy import complex_number_dtypes |
| 103 | +
|
| 104 | + xrst.variables(dtype=complex_number_dtypes()).example() |
| 105 | +
|
| 106 | +This also works with custom strategies, or strategies defined in other packages. |
| 107 | +For example you could imagine creating a ``chunks`` strategy to specify particular chunking patterns for a dask-backed array. |
| 108 | + |
| 109 | +Fixing Arguments |
| 110 | +~~~~~~~~~~~~~~~~ |
| 111 | + |
| 112 | +If you want to fix one aspect of the data structure, whilst allowing variation in the generated examples |
| 113 | +over all other aspects, then use :py:func:`hypothesis.strategies.just()`. |
| 114 | + |
| 115 | +.. ipython:: python |
| 116 | +
|
| 117 | + import hypothesis.strategies as st |
| 118 | +
|
| 119 | + # Generates only variable objects with dimensions ["x", "y"] |
| 120 | + xrst.variables(dims=st.just(["x", "y"])).example() |
| 121 | +
|
| 122 | +(This is technically another example of chaining strategies - :py:func:`hypothesis.strategies.just()` is simply a |
| 123 | +special strategy that just contains a single example.) |
| 124 | + |
| 125 | +To fix the length of dimensions you can instead pass ``dims`` as a mapping of dimension names to lengths |
| 126 | +(i.e. following xarray objects' ``.sizes`` property), e.g. |
| 127 | + |
| 128 | +.. ipython:: python |
| 129 | +
|
| 130 | + # Generates only variables with dimensions ["x", "y"], of lengths 2 & 3 respectively |
| 131 | + xrst.variables(dims=st.just({"x": 2, "y": 3})).example() |
| 132 | +
|
| 133 | +You can also use this to specify that you want examples which are missing some part of the data structure, for instance |
| 134 | + |
| 135 | +.. ipython:: python |
| 136 | +
|
| 137 | + # Generates a Variable with no attributes |
| 138 | + xrst.variables(attrs=st.just({})).example() |
| 139 | +
|
| 140 | +Through a combination of chaining strategies and fixing arguments, you can specify quite complicated requirements on the |
| 141 | +objects your chained strategy will generate. |
| 142 | + |
| 143 | +.. ipython:: python |
| 144 | +
|
| 145 | + fixed_x_variable_y_maybe_z = st.fixed_dictionaries( |
| 146 | + {"x": st.just(2), "y": st.integers(3, 4)}, optional={"z": st.just(2)} |
| 147 | + ) |
| 148 | + fixed_x_variable_y_maybe_z.example() |
| 149 | +
|
| 150 | + special_variables = xrst.variables(dims=fixed_x_variable_y_maybe_z) |
| 151 | +
|
| 152 | + special_variables.example() |
| 153 | + special_variables.example() |
| 154 | +
|
| 155 | +Here we have used one of hypothesis' built-in strategies, :py:func:`hypothesis.strategies.fixed_dictionaries`, to create a |
| 156 | +strategy which generates mappings of dimension names to lengths (i.e. the ``sizes`` of the xarray object we want). |
| 157 | +This particular strategy will always generate an ``x`` dimension of length 2 and a ``y`` dimension of |
| 158 | +length either 3 or 4, and will sometimes also generate a ``z`` dimension of length 2. |
| 159 | +By feeding this strategy for dictionaries into the ``dims`` argument of xarray's :py:func:`~testing.strategies.variables` strategy, |
| 160 | +we can generate arbitrary :py:class:`~xarray.Variable` objects whose dimensions will always match these specifications. |
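The claims about this dictionary strategy can themselves be written as a property test. This sketch uses only hypothesis and checks the mapping on its own, before it is passed to any xarray strategy; the variable and test names are illustrative assumptions.

```python
from hypothesis import given, strategies as st

dim_sizes = st.fixed_dictionaries(
    {"x": st.just(2), "y": st.integers(3, 4)}, optional={"z": st.just(2)}
)


@given(dim_sizes)
def test_dim_sizes_match_specification(sizes):
    # "x" and "y" are always present, "z" only sometimes
    assert {"x", "y"} <= set(sizes) <= {"x", "y", "z"}
    assert sizes["x"] == 2
    assert sizes["y"] in (3, 4)
    # if "z" was generated, its length must be 2
    assert sizes.get("z", 2) == 2


test_dim_sizes_match_specification()
```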
| 161 | + |
| 162 | +Generating Duck-type Arrays |
| 163 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 164 | + |
| 165 | +Xarray objects don't have to wrap numpy arrays; in fact, they can wrap any array type which presents the same API as a |
| 166 | +numpy array (so-called "duck array wrapping", see :ref:`wrapping numpy-like arrays <internals.duckarrays>`). |
| 167 | + |
| 168 | +Imagine we want to write a strategy which generates arbitrary ``Variable`` objects, each of which wraps a |
| 169 | +:py:class:`sparse.COO` array instead of a ``numpy.ndarray``. How could we do that? There are two ways: |
| 170 | + |
| 171 | +1. Create an xarray object with numpy data and use hypothesis' ``.map()`` method to convert the underlying array to a |
| 172 | +different type: |
| 173 | + |
| 174 | +.. ipython:: python |
| 175 | +
|
| 176 | + import sparse |
| 177 | +
|
| 178 | +.. ipython:: python |
| 179 | +
|
| 180 | + def convert_to_sparse(var): |
| 181 | + return var.copy(data=sparse.COO.from_numpy(var.to_numpy())) |
| 182 | +
|
| 183 | +.. ipython:: python |
| 184 | +
|
| 185 | + sparse_variables = xrst.variables(dims=xrst.dimension_names(min_dims=1)).map( |
| 186 | + convert_to_sparse |
| 187 | + ) |
| 188 | +
|
| 189 | + sparse_variables.example() |
| 190 | + sparse_variables.example() |
| 191 | +
|
| 192 | +2. Generate the duck-typed arrays directly, by passing a function which returns a strategy for them to the ``array_strategy_fn`` argument of the xarray strategies: |
| 193 | + |
| 194 | +.. ipython:: python |
| 195 | +
|
| 196 | + def sparse_random_arrays(shape: tuple[int, ...]) -> st.SearchStrategy[sparse._coo.core.COO]: |
| 197 | + """Strategy which generates random sparse.COO arrays""" |
| 198 | + if shape is None: |
| 199 | + shape = npst.array_shapes() |
| 200 | + else: |
| 201 | + shape = st.just(shape) |
| 202 | + density = st.floats(min_value=0, max_value=1) |
| 203 | + # note sparse.random does not accept a dtype kwarg |
| 204 | + return st.builds(sparse.random, shape=shape, density=density) |
| 205 | +
|
| 206 | +
|
| 207 | + def sparse_random_arrays_fn( |
| 208 | + *, shape: tuple[int, ...], dtype: np.dtype |
| 209 | + ) -> st.SearchStrategy[sparse._coo.core.COO]: |
| 210 | + return sparse_random_arrays(shape=shape) |
| 211 | +
|
| 212 | +
|
| 213 | +.. ipython:: python |
| 214 | +
|
| 215 | + sparse_random_variables = xrst.variables( |
| 216 | + array_strategy_fn=sparse_random_arrays_fn, dtype=st.just(np.dtype("float64")) |
| 217 | + ) |
| 218 | + sparse_random_variables.example() |
| 219 | +
|
| 220 | +Either approach is fine, but one may be more convenient than the other depending on the type of the duck array which you |
| 221 | +want to wrap. |
| 222 | + |
| 223 | +Compatibility with the Python Array API Standard |
| 224 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 225 | + |
| 226 | +Xarray aims to be compatible with any duck-array type that conforms to the `Python Array API Standard <https://data-apis.org/array-api/latest/>`_ |
| 227 | +(see our :ref:`docs on Array API Standard support <internals.duckarrays.array_api_standard>`). |
| 228 | + |
| 229 | +.. warning:: |
| 230 | + |
| 231 | + The strategies defined in :py:mod:`testing.strategies` are **not** guaranteed to use array API standard-compliant |
| 232 | + dtypes by default. |
| 233 | + For example arrays with the dtype ``np.dtype('float16')`` may be generated by :py:func:`testing.strategies.variables` |
| 234 | + (assuming the ``dtype`` kwarg was not explicitly passed), despite ``np.dtype('float16')`` not being in the |
| 235 | + array API standard. |
| 236 | + |
| 237 | +If the array type you want to generate has an array API-compliant top-level namespace |
| 238 | +(e.g. that which is conventionally imported as ``xp`` or similar), |
| 239 | +you can build strategies for it with :py:func:`hypothesis.extra.array_api.make_strategies_namespace`: |
| 240 | + |
| 241 | +.. ipython:: python |
| 242 | + :okwarning: |
| 243 | +
|
| 244 | + from numpy import array_api as xp # requires numpy 1.22-1.26; numpy.array_api was removed in numpy 2.0 |
| 245 | +
|
| 246 | + from hypothesis.extra.array_api import make_strategies_namespace |
| 247 | +
|
| 248 | + xps = make_strategies_namespace(xp) |
| 249 | +
|
| 250 | + xp_variables = xrst.variables( |
| 251 | + array_strategy_fn=xps.arrays, |
| 252 | + dtype=xps.scalar_dtypes(), |
| 253 | + ) |
| 254 | + xp_variables.example() |
| 255 | +
|
| 256 | +Another array API-compliant duck array library would replace the import, e.g. ``import cupy as cp`` instead. |
| 257 | + |
| 258 | +Testing over Subsets of Dimensions |
| 259 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 260 | + |
| 261 | +A common task when testing xarray user code is checking that your function works for all valid input dimensions. |
| 262 | +We can chain strategies to achieve this, for which the helper strategy :py:func:`~testing.strategies.unique_subset_of` |
| 263 | +is useful. |
| 264 | + |
| 265 | +It works for lists of dimension names |
| 266 | + |
| 267 | +.. ipython:: python |
| 268 | +
|
| 269 | + dims = ["x", "y", "z"] |
| 270 | + xrst.unique_subset_of(dims).example() |
| 271 | + xrst.unique_subset_of(dims).example() |
| 272 | +
|
| 273 | +as well as for mappings of dimension names to sizes |
| 274 | + |
| 275 | +.. ipython:: python |
| 276 | +
|
| 277 | + dim_sizes = {"x": 2, "y": 3, "z": 4} |
| 278 | + xrst.unique_subset_of(dim_sizes).example() |
| 279 | + xrst.unique_subset_of(dim_sizes).example() |
| 280 | +
|
| 281 | +This is useful because operations like reductions can be performed over any subset of the xarray object's dimensions. |
| 282 | +For example, we can write a pytest test which checks that a reduction gives the expected result when applied |
| 283 | +along any valid subset of the Variable's dimensions. |
| 284 | + |
| 285 | +.. code-block:: python |
| 286 | +
|
| 287 | + import numpy.testing as npt |
| 288 | +
|
| 289 | +
|
| 290 | + @given(st.data(), xrst.variables(dims=xrst.dimension_names(min_dims=1))) |
| 291 | + def test_mean(data, var): |
| 292 | + """Test that the mean of an xarray Variable is always equal to the mean of the underlying array.""" |
| 293 | +
|
| 294 | + # specify arbitrary reduction along at least one dimension |
| 295 | + reduction_dims = data.draw(xrst.unique_subset_of(var.dims, min_size=1)) |
| 296 | +
|
| 297 | + # create expected result (using nanmean because arrays with NaNs will be generated) |
| 298 | + reduction_axes = tuple(var.get_axis_num(dim) for dim in reduction_dims) |
| 299 | + expected = np.nanmean(var.data, axis=reduction_axes) |
| 300 | +
|
| 301 | + # assert property is always satisfied |
| 302 | + result = var.mean(dim=reduction_dims).data |
| 303 | + npt.assert_equal(expected, result) |