Skip to content

Commit

Permalink
Fix map_blocks for DataArray with cftime and sdba_encode_cf (#1674)
Browse files Browse the repository at this point in the history
### What kind of change does this PR introduced
**tl;dr  : me fix issue.**

I think this issue stems from a change in recent xarray, but the case
was untested by xclim. The solution was to iterate over a list that is
"disconnected" from the dictionary.

The issue occurs when a _DataArray_ that has a coordinate made of
`cftime` objects is passed to a `map_blocks`-wrapped function and
xclim's global option `sdba_encode_cf` is True.

The previous encoding code was iterating over the coordinates with
`coords.items()`. When it would modify a the cftime coord, this would
fail as you can't modify a dictionary while it is being iterated upon.

Only happens if the option to encode is activated, only happens if there
is a cftime coord and only happens on `DataArray` objects. I'm not 100%
sure why, but I think `Dataset.coords.items` does not map directly to a
dict. We usually use `Dataset` objects in function wrapped by
`map_blocks` (or `map_groups`). The culprit here was the polynomial's
trend computation which uses a `DataArray`.

### Reason for the apparent dask-dependency
The interesting part is that this bug does not depend on the fact that
the data is using dask or not. But it led me to discover another "issue"
or, rather, "edgecase" .

In `xscen.adjust` we wrap the xclim call with options:
```python3
        with xc.set_options(sdba_encode_cf=True, sdba_extra_output=False):
            out = ADJ.adjust(sim_sel, **xclim_adjust_args)
```
Here, for DQM, the `adjust` method is wrapped by `map_blocks`. When
`sim_sel` uses dask, the actual adjustment is, of course, not executed
here. Only when the computation is triggered will the adjustment
function be run, and only then will it call the polynomial detrending.
But at that moment, the `sdba_encode_cf` option is not set anymore! So
the issue was not raised for the "dask" case.

When `sim_sel` was not using dask, all the code was executed under the
option statement and thus the issue was raised. Which is a bit dumb
because the encoding is a micro-optimization made for dask, it's no use
otherwise.

### Does this PR introduce a breaking change?
No
  • Loading branch information
Zeitsperre authored Mar 12, 2024
2 parents c16a8dc + b733b04 commit bea276a
Show file tree
Hide file tree
Showing 3 changed files with 26 additions and 4 deletions.
5 changes: 5 additions & 0 deletions CHANGES.rst
Original file line number Diff line number Diff line change
Expand Up @@ -10,10 +10,15 @@ Announcements
^^^^^^^^^^^^^
* `xclim` has migrated its development branch name from `master` to `main`. (:issue:`1667`, :pull:`1669`).

Bug fixes
^^^^^^^^^
* Fixed an bug in sdba's ``map_groups`` that prevented passing DataArrays with cftime coordinates if the ``sdba_encode_cf`` option was True. (:issue:`1673`, :pull:`1674`).

Internal changes
^^^^^^^^^^^^^^^^
* Added "doymin" and "doymax" to the possible operations of ``generic.stats``. Fixed a warning issue when ``op`` was "integral". (:pull:`1672`).


v0.48.2 (2024-02-26)
--------------------
Contributors to this version: Juliette Lavoie (:user:`juliettelavoie`).
Expand Down
15 changes: 15 additions & 0 deletions tests/test_sdba/test_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
import pytest
import xarray as xr

from xclim import set_options
from xclim.sdba.base import Grouper, Parametrizable, map_blocks, map_groups


Expand Down Expand Up @@ -234,3 +235,17 @@ def func(ds, *, group, lon=None):

with pytest.raises(ValueError, match="cannot be chunked"):
func(xr.Dataset(dict(tas=tas)), group="time")

@pytest.mark.parametrize("use_dask", [True, False])
def test_dataarray_cfencode(self, use_dask, open_dataset):
ds = open_dataset("sdba/CanESM2_1950-2100.nc")
if use_dask:
ds = ds.chunk()

@map_blocks(reduces=["location"], data=[])
def func(ds, *, group):
d = ds.mean("location")
return d.rename("data").to_dataset()

with set_options(sdba_encode_cf=True):
func(ds.convert_calendar("noleap").tasmax, group=Grouper("time"))
10 changes: 6 additions & 4 deletions xclim/sdba/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -591,7 +591,6 @@ def _map_blocks(ds, **kwargs): # noqa: C901
f"Dimension {dim} is meant to be added by the "
"computation but it is already on one of the inputs."
)

if uses_dask(ds):
# Use dask if any of the input is dask-backed.
chunks = (
Expand Down Expand Up @@ -673,9 +672,12 @@ def _map_blocks(ds, **kwargs): # noqa: C901
if OPTIONS[SDBA_ENCODE_CF]:
ds = ds.copy()
# Optimization to circumvent the slow pickle.dumps(cftime_array)
for name, crd in ds.coords.items():
if xr.core.common._contains_cftime_datetimes(crd.variable): # noqa
ds[name] = xr.conventions.encode_cf_variable(crd.variable)
# List of the keys to avoid changing the coords dict while iterating over it.
for crd in list(ds.coords.keys()):
if xr.core.common._contains_cftime_datetimes(
ds[crd].variable
): # noqa
ds[crd] = xr.conventions.encode_cf_variable(ds[crd].variable)

def _call_and_transpose_on_exit(dsblock, **f_kwargs):
"""Call the decorated func and transpose to ensure the same dim order as on the template."""
Expand Down

0 comments on commit bea276a

Please sign in to comment.