
Commit 299abd6

Deprecate ds.dims returning dict (#8500)
* raise FutureWarning
* change some internal instances of ds.dims -> ds.sizes
* improve clarity of which unexpected errors were raised
* whatsnew
* return a class which warns if treated like a Mapping
* fix failing tests
* avoid some warnings in the docs
* silence warning caused by #8491
* fix another warning
* typing of .get
* fix various uses of ds.dims in tests
* fix some warnings
* add test that FutureWarnings are correctly raised
* more fixes to avoid warnings
* update tests to avoid warnings
* yet more fixes to avoid warnings
* also warn in groupby.dims
* change groupby tests to match
* update whatsnew to include groupby deprecation
* filter warning when we actually test ds.dims
* remove error I used for debugging

---------

Co-authored-by: Deepak Cherian <[email protected]>
1 parent 3fc0ee5 commit 299abd6

20 files changed, +170 -65 lines changed
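For readers skimming the diff, a minimal sketch of the user-facing migration this commit asks for; the toy dataset below is illustrative and not taken from the changed files:

import numpy as np
import xarray as xr

# Hypothetical two-dimensional dataset, purely for illustration.
ds = xr.Dataset({"air": (("lat", "lon"), np.zeros((3, 4)))})

# Length lookups through ds.dims still work after this commit but emit a
# FutureWarning, because Dataset.dims will eventually return only the names:
#     nlon = ds.dims["lon"]
# The forward-compatible spelling uses Dataset.sizes instead:
nlon = ds.sizes["lon"]  # -> 4, no warning

# Iterating over dimension names remains warning-free either way:
assert set(ds.dims) == {"lat", "lon"}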

doc/gallery/plot_cartopy_facetgrid.py

+1 -1

@@ -30,7 +30,7 @@
transform=ccrs.PlateCarree(),  # the data's projection
col="time",
col_wrap=1,  # multiplot settings
-aspect=ds.dims["lon"] / ds.dims["lat"],  # for a sensible figsize
+aspect=ds.sizes["lon"] / ds.sizes["lat"],  # for a sensible figsize
subplot_kws={"projection": map_proj},  # the plot's projection
)

doc/user-guide/interpolation.rst

+2 -2

@@ -292,8 +292,8 @@ Let's see how :py:meth:`~xarray.DataArray.interp` works on real data.
axes[0].set_title("Raw data")

# Interpolated data
-new_lon = np.linspace(ds.lon[0], ds.lon[-1], ds.dims["lon"] * 4)
-new_lat = np.linspace(ds.lat[0], ds.lat[-1], ds.dims["lat"] * 4)
+new_lon = np.linspace(ds.lon[0], ds.lon[-1], ds.sizes["lon"] * 4)
+new_lat = np.linspace(ds.lat[0], ds.lat[-1], ds.sizes["lat"] * 4)
dsi = ds.interp(lat=new_lat, lon=new_lon)
dsi.air.plot(ax=axes[1])
@savefig interpolation_sample3.png width=8in

doc/user-guide/terminology.rst

+4 -5

@@ -47,9 +47,9 @@ complete examples, please consult the relevant documentation.*
all but one of these degrees of freedom is fixed. We can think of each
dimension axis as having a name, for example the "x dimension". In
xarray, a ``DataArray`` object's *dimensions* are its named dimension
-axes, and the name of the ``i``-th dimension is ``arr.dims[i]``. If an
-array is created without dimension names, the default dimension names are
-``dim_0``, ``dim_1``, and so forth.
+axes ``da.dims``, and the name of the ``i``-th dimension is ``da.dims[i]``.
+If an array is created without specifying dimension names, the default dimension
+names will be ``dim_0``, ``dim_1``, and so forth.

Coordinate
An array that labels a dimension or set of dimensions of another
@@ -61,8 +61,7 @@ complete examples, please consult the relevant documentation.*
``arr.coords[x]``. A ``DataArray`` can have more coordinates than
dimensions because a single dimension can be labeled by multiple
coordinate arrays. However, only one coordinate array can be a assigned
-as a particular dimension's dimension coordinate array. As a
-consequence, ``len(arr.dims) <= len(arr.coords)`` in general.
+as a particular dimension's dimension coordinate array.

Dimension coordinate
A one-dimensional coordinate array assigned to ``arr`` with both a name
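A short, hypothetical illustration of the dimension-name behaviour described in the updated glossary entry (array contents are arbitrary):

import numpy as np
import xarray as xr

# Created without dimension names, so the defaults are assigned.
da = xr.DataArray(np.zeros((2, 3)))
assert da.dims == ("dim_0", "dim_1")
assert da.dims[0] == "dim_0"  # the name of the i-th dimension is da.dims[i]

# With explicit names, plus a coordinate labelling the "x" dimension.
da2 = xr.DataArray(np.zeros((2, 3)), dims=("x", "y"), coords={"x": [10, 20]})
assert da2.dims == ("x", "y")
assert "x" in da2.coords  # the dimension coordinate for "x"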

doc/whats-new.rst

+10 -3

@@ -66,10 +66,17 @@ Deprecations
currently ``PendingDeprecationWarning``, which are silenced by default. We'll
convert these to ``DeprecationWarning`` in a future release.
By `Maximilian Roos <https://github.com/max-sixty>`_.
-- :py:meth:`Dataset.drop` &
-:py:meth:`DataArray.drop` are now deprecated, since pending deprecation for
+- Raise a ``FutureWarning`` warning that the type of :py:meth:`Dataset.dims` will be changed
+from a mapping of dimension names to lengths to a set of dimension names.
+This is to increase consistency with :py:meth:`DataArray.dims`.
+To access a mapping of dimension names to lengths please use :py:meth:`Dataset.sizes`.
+The same change also applies to `DatasetGroupBy.dims`.
+(:issue:`8496`, :pull:`8500`)
+By `Tom Nicholas <https://github.com/TomNicholas>`_.
+- :py:meth:`Dataset.drop` & :py:meth:`DataArray.drop` are now deprecated, since pending deprecation for
several years. :py:meth:`DataArray.drop_sel` & :py:meth:`DataArray.drop_var`
-replace them for labels & variables respectively.
+replace them for labels & variables respectively. (:pull:`8497`)
+By `Maximilian Roos <https://github.com/max-sixty>`_.

Bug fixes
~~~~~~~~~
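As a hedged sketch of what this deprecation means for downstream code once a release containing the change is installed (the dataset here is made up):

import warnings

import numpy as np
import xarray as xr

ds = xr.Dataset({"t": ("time", np.arange(5))})

# Mapping-style access through ds.dims now triggers a FutureWarning ...
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    _ = ds.dims["time"]
assert any(issubclass(w.category, FutureWarning) for w in caught)

# ... while the recommended replacement stays silent and gives the same length.
assert ds.sizes["time"] == 5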

xarray/core/common.py

+1 -1

@@ -1167,7 +1167,7 @@ def _dataset_indexer(dim: Hashable) -> DataArray:
cond_wdim = cond.drop_vars(
var for var in cond if dim not in cond[var].dims
)
-keepany = cond_wdim.any(dim=(d for d in cond.dims.keys() if d != dim))
+keepany = cond_wdim.any(dim=(d for d in cond.dims if d != dim))
return keepany.to_dataarray().any("variable")

_get_indexer = (

xarray/core/concat.py

+2 -2

@@ -315,7 +315,7 @@ def _calc_concat_over(datasets, dim, dim_names, data_vars: T_DataVars, coords, c
if dim in ds:
ds = ds.set_coords(dim)
concat_over.update(k for k, v in ds.variables.items() if dim in v.dims)
-concat_dim_lengths.append(ds.dims.get(dim, 1))
+concat_dim_lengths.append(ds.sizes.get(dim, 1))

def process_subset_opt(opt, subset):
if isinstance(opt, str):
@@ -431,7 +431,7 @@ def _parse_datasets(
variables_order: dict[Hashable, Variable] = {}  # variables in order of appearance

for ds in datasets:
-dims_sizes.update(ds.dims)
+dims_sizes.update(ds.sizes)
all_coord_names.update(ds.coords)
data_vars.update(ds.data_vars)
variables_order.update(ds.variables)

xarray/core/dataset.py

+21 -19

@@ -105,6 +105,7 @@
from xarray.core.utils import (
Default,
Frozen,
+FrozenMappingWarningOnValuesAccess,
HybridMappingProxy,
OrderedSet,
_default,
@@ -778,14 +779,15 @@ def dims(self) -> Frozen[Hashable, int]:

Note that type of this object differs from `DataArray.dims`.
See `Dataset.sizes` and `DataArray.sizes` for consistently named
-properties.
+properties. This property will be changed to return a type more consistent with
+`DataArray.dims` in the future, i.e. a set of dimension names.

See Also
--------
Dataset.sizes
DataArray.dims
"""
-return Frozen(self._dims)
+return FrozenMappingWarningOnValuesAccess(self._dims)

@property
def sizes(self) -> Frozen[Hashable, int]:
@@ -800,7 +802,7 @@ def sizes(self) -> Frozen[Hashable, int]:
--------
DataArray.sizes
"""
-return self.dims
+return Frozen(self._dims)

@property
def dtypes(self) -> Frozen[Hashable, np.dtype]:
@@ -1411,7 +1413,7 @@ def _copy_listed(self, names: Iterable[Hashable]) -> Self:
variables[name] = self._variables[name]
except KeyError:
ref_name, var_name, var = _get_virtual_variable(
-self._variables, name, self.dims
+self._variables, name, self.sizes
)
variables[var_name] = var
if ref_name in self._coord_names or ref_name in self.dims:
@@ -1426,7 +1428,7 @@ def _copy_listed(self, names: Iterable[Hashable]) -> Self:
for v in variables.values():
needed_dims.update(v.dims)

-dims = {k: self.dims[k] for k in needed_dims}
+dims = {k: self.sizes[k] for k in needed_dims}

# preserves ordering of coordinates
for k in self._variables:
@@ -1448,7 +1450,7 @@ def _construct_dataarray(self, name: Hashable) -> DataArray:
try:
variable = self._variables[name]
except KeyError:
-_, name, variable = _get_virtual_variable(self._variables, name, self.dims)
+_, name, variable = _get_virtual_variable(self._variables, name, self.sizes)

needed_dims = set(variable.dims)

@@ -1475,7 +1477,7 @@ def _item_sources(self) -> Iterable[Mapping[Hashable, Any]]:
yield HybridMappingProxy(keys=self._coord_names, mapping=self.coords)

# virtual coordinates
-yield HybridMappingProxy(keys=self.dims, mapping=self)
+yield HybridMappingProxy(keys=self.sizes, mapping=self)

def __contains__(self, key: object) -> bool:
"""The 'in' operator will return true or false depending on whether
@@ -2569,7 +2571,7 @@ def info(self, buf: IO | None = None) -> None:
lines = []
lines.append("xarray.Dataset {")
lines.append("dimensions:")
-for name, size in self.dims.items():
+for name, size in self.sizes.items():
lines.append(f"\t{name} = {size} ;")
lines.append("\nvariables:")
for name, da in self.variables.items():
@@ -2697,10 +2699,10 @@ def chunk(
else:
chunks_mapping = either_dict_or_kwargs(chunks, chunks_kwargs, "chunk")

-bad_dims = chunks_mapping.keys() - self.dims.keys()
+bad_dims = chunks_mapping.keys() - self.sizes.keys()
if bad_dims:
raise ValueError(
-f"chunks keys {tuple(bad_dims)} not found in data dimensions {tuple(self.dims)}"
+f"chunks keys {tuple(bad_dims)} not found in data dimensions {tuple(self.sizes.keys())}"
)

chunkmanager = guess_chunkmanager(chunked_array_type)
@@ -3952,7 +3954,7 @@ def maybe_variable(obj, k):
try:
return obj._variables[k]
except KeyError:
-return as_variable((k, range(obj.dims[k])))
+return as_variable((k, range(obj.sizes[k])))

def _validate_interp_indexer(x, new_x):
# In the case of datetimes, the restrictions placed on indexers
@@ -4176,7 +4178,7 @@ def _rename_vars(
return variables, coord_names

def _rename_dims(self, name_dict: Mapping[Any, Hashable]) -> dict[Hashable, int]:
-return {name_dict.get(k, k): v for k, v in self.dims.items()}
+return {name_dict.get(k, k): v for k, v in self.sizes.items()}

def _rename_indexes(
self, name_dict: Mapping[Any, Hashable], dims_dict: Mapping[Any, Hashable]
@@ -5168,7 +5170,7 @@ def _get_stack_index(
if dim in self._variables:
var = self._variables[dim]
else:
-_, _, var = _get_virtual_variable(self._variables, dim, self.dims)
+_, _, var = _get_virtual_variable(self._variables, dim, self.sizes)
# dummy index (only `stack_coords` will be used to construct the multi-index)
stack_index = PandasIndex([0], dim)
stack_coords = {dim: var}
@@ -5195,7 +5197,7 @@ def _stack_once(
if any(d in var.dims for d in dims):
add_dims = [d for d in dims if d not in var.dims]
vdims = list(var.dims) + add_dims
-shape = [self.dims[d] for d in vdims]
+shape = [self.sizes[d] for d in vdims]
exp_var = var.set_dims(vdims, shape)
stacked_var = exp_var.stack(**{new_dim: dims})
new_variables[name] = stacked_var
@@ -6351,15 +6353,15 @@ def dropna(
if subset is None:
subset = iter(self.data_vars)

-count = np.zeros(self.dims[dim], dtype=np.int64)
+count = np.zeros(self.sizes[dim], dtype=np.int64)
size = np.int_(0)  # for type checking

for k in subset:
array = self._variables[k]
if dim in array.dims:
dims = [d for d in array.dims if d != dim]
count += np.asarray(array.count(dims))
-size += math.prod([self.dims[d] for d in dims])
+size += math.prod([self.sizes[d] for d in dims])

if thresh is not None:
mask = count >= thresh
@@ -7136,7 +7138,7 @@ def _normalize_dim_order(
f"Dataset: {list(self.dims)}"
)

-ordered_dims = {k: self.dims[k] for k in dim_order}
+ordered_dims = {k: self.sizes[k] for k in dim_order}

return ordered_dims

@@ -7396,7 +7398,7 @@ def to_dask_dataframe(
var = self.variables[name]
except KeyError:
# dimension without a matching coordinate
-size = self.dims[name]
+size = self.sizes[name]
data = da.arange(size, chunks=size, dtype=np.int64)
var = Variable((name,), data)

@@ -7469,7 +7471,7 @@ def to_dict(
d: dict = {
"coords": {},
"attrs": decode_numpy_dict_values(self.attrs),
-"dims": dict(self.dims),
+"dims": dict(self.sizes),
"data_vars": {},
}
for k in self.coords:
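The refactor above is mechanical: ds.dims is kept only where dimension names are needed, and ds.sizes is used wherever lengths are needed. A small sketch of the same rule for code outside xarray (the helper name is invented for illustration):

from collections.abc import Hashable

import xarray as xr


def ordered_sizes(ds: xr.Dataset, dim_order: list[Hashable]) -> dict[Hashable, int]:
    # Iterating ds.dims touches only the names, which stays warning-free.
    names = set(ds.dims)
    missing = [d for d in dim_order if d not in names]
    if missing:
        raise ValueError(f"dims {missing} not found in dataset dims {list(names)}")
    # Length lookups go through ds.sizes, the forward-compatible mapping.
    return {d: ds.sizes[d] for d in dim_order}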

xarray/core/formatting.py

+1 -1

@@ -739,7 +739,7 @@ def dataset_repr(ds):


def diff_dim_summary(a, b):
-if a.dims != b.dims:
+if a.sizes != b.sizes:
return f"Differing dimensions:\n ({dim_summary(a)}) != ({dim_summary(b)})"
else:
return ""

xarray/core/formatting_html.py

+6 -5

@@ -37,17 +37,18 @@ def short_data_repr_html(array) -> str:
return f"<pre>{text}</pre>"


-def format_dims(dims, dims_with_index) -> str:
-if not dims:
+def format_dims(dim_sizes, dims_with_index) -> str:
+if not dim_sizes:
return ""

dim_css_map = {
-dim: " class='xr-has-index'" if dim in dims_with_index else "" for dim in dims
+dim: " class='xr-has-index'" if dim in dims_with_index else ""
+for dim in dim_sizes
}

dims_li = "".join(
f"<li><span{dim_css_map[dim]}>" f"{escape(str(dim))}</span>: {size}</li>"
-for dim, size in dims.items()
+for dim, size in dim_sizes.items()
)

return f"<ul class='xr-dim-list'>{dims_li}</ul>"
@@ -204,7 +205,7 @@ def _mapping_section(


def dim_section(obj) -> str:
-dim_list = format_dims(obj.dims, obj.xindexes.dims)
+dim_list = format_dims(obj.sizes, obj.xindexes.dims)

return collapsible_section(
"Dimensions", inline_details=dim_list, enabled=False, collapsed=True

xarray/core/groupby.py

+2 -1

@@ -36,6 +36,7 @@
from xarray.core.pycompat import integer_types
from xarray.core.types import Dims, QuantileMethods, T_DataArray, T_Xarray
from xarray.core.utils import (
+FrozenMappingWarningOnValuesAccess,
either_dict_or_kwargs,
hashable,
is_scalar,
@@ -1519,7 +1520,7 @@ def dims(self) -> Frozen[Hashable, int]:
if self._dims is None:
self._dims = self._obj.isel({self._group_dim: self._group_indices[0]}).dims

-return self._dims
+return FrozenMappingWarningOnValuesAccess(self._dims)

def map(
self,

xarray/core/utils.py

+54

@@ -50,12 +50,15 @@
Collection,
Container,
Hashable,
+ItemsView,
Iterable,
Iterator,
+KeysView,
Mapping,
MutableMapping,
MutableSet,
Sequence,
+ValuesView,
)
from enum import Enum
from typing import (
@@ -473,6 +476,57 @@ def FrozenDict(*args, **kwargs) -> Frozen:
return Frozen(dict(*args, **kwargs))


+class FrozenMappingWarningOnValuesAccess(Frozen[K, V]):
+"""
+Class which behaves like a Mapping but warns if the values are accessed.
+
+Temporary object to aid in deprecation cycle of `Dataset.dims` (see GH issue #8496).
+`Dataset.dims` is being changed from returning a mapping of dimension names to lengths to just
+returning a frozen set of dimension names (to increase consistency with `DataArray.dims`).
+This class retains backwards compatibility but raises a warning only if the return value
+of ds.dims is used like a dictionary (i.e. it doesn't raise a warning if used in a way that
+would also be valid for a FrozenSet, e.g. iteration).
+"""
+
+__slots__ = ("mapping",)
+
+def _warn(self) -> None:
+warnings.warn(
+"The return type of `Dataset.dims` will be changed to return a set of dimension names in future, "
+"in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, "
+"please use `Dataset.sizes`.",
+FutureWarning,
+)
+
+def __getitem__(self, key: K) -> V:
+self._warn()
+return super().__getitem__(key)
+
+@overload
+def get(self, key: K, /) -> V | None:
+...
+
+@overload
+def get(self, key: K, /, default: V | T) -> V | T:
+...
+
+def get(self, key: K, default: T | None = None) -> V | T | None:
+self._warn()
+return super().get(key, default)
+
+def keys(self) -> KeysView[K]:
+self._warn()
+return super().keys()
+
+def items(self) -> ItemsView[K, V]:
+self._warn()
+return super().items()
+
+def values(self) -> ValuesView[V]:
+self._warn()
+return super().values()
+
+
class HybridMappingProxy(Mapping[K, V]):
"""Implements the Mapping interface. Uses the wrapped mapping for item lookup
and a separate wrapped keys collection for iteration.
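Assuming the class behaves as its docstring describes, its effect can be demonstrated directly; this is a sketch rather than one of the tests added in the PR:

import warnings

from xarray.core.utils import FrozenMappingWarningOnValuesAccess

wrapped = FrozenMappingWarningOnValuesAccess({"x": 2, "y": 3})

# Set-like use (iteration, length) is silent.
assert list(wrapped) == ["x", "y"]
assert len(wrapped) == 2

# Mapping-like use (item access, .get, .keys, .items, .values) warns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    assert wrapped["x"] == 2
    assert dict(wrapped.items()) == {"x": 2, "y": 3}
assert caught and all(issubclass(w.category, FutureWarning) for w in caught)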
