Commit 41654ef

Benoit Bovy authored and shoyer committed
Multi-index levels as coordinates (#947)
* make multi-index levels visible as coordinates
* make levels also visible for Dataset
* fix unnamed levels
* allow providing multi-index levels in .sel
* refactored _get_valid_indexers to get_dim_indexers
* fix broken tests
* refactored accessibility and repr of index levels
* do not allow providing both level and dim indexers in .sel
* cosmetic changes
* change signature of Coordinate.__init__
* check for uniqueness of multi-index level names
* no need to check for uniqueness of level names in _level_coords
* rewritten checking uniqueness of multi-index level names
* fix adding coords/vars with the same name than a multi-index level
* check for level/var name conflicts in one place
* cosmetic changes
* fix Coordinate -> IndexVariable
* fix col width when formatting multi-index levels
* add tests for IndexVariable new methods and indexing
* fix bug in assert_unique_multiindex_level_names
* add tests for Dataset
* fix appveyor tests
* add tests for DataArray
* add docs
* review changes
* remove name argument of IndexVariable
1 parent 64b4f35 commit 41654ef

15 files changed: +451 -61 lines changed

doc/data-structures.rst (+35 -4)

--- a/doc/data-structures.rst
+++ b/doc/data-structures.rst
@@ -115,10 +115,6 @@ If you create a ``DataArray`` by supplying a pandas
     df
     xr.DataArray(df)
 
-Xarray supports labeling coordinate values with a :py:class:`pandas.MultiIndex`.
-While it handles multi-indexes with unnamed levels, it is recommended that you
-explicitly set the names of the levels.
-
 DataArray properties
 ~~~~~~~~~~~~~~~~~~~~
 
@@ -532,6 +528,41 @@ dimension and whose the values are ``Index`` objects:
 
     ds.indexes
 
+MultiIndex coordinates
+~~~~~~~~~~~~~~~~~~~~~~
+
+Xarray supports labeling coordinate values with a :py:class:`pandas.MultiIndex`:
+
+.. ipython:: python
+
+    midx = pd.MultiIndex.from_arrays([['R', 'R', 'V', 'V'], [.1, .2, .7, .9]],
+                                     names=('band', 'wn'))
+    mda = xr.DataArray(np.random.rand(4), coords={'spec': midx}, dims='spec')
+    mda
+
+For convenience multi-index levels are directly accessible as "virtual" or
+"derived" coordinates (marked by ``-`` when printing a dataset or data array):
+
+.. ipython:: python
+
+    mda['band']
+    mda.wn
+
+Indexing with multi-index levels is also possible using the ``sel`` method
+(see :ref:`multi-level indexing`).
+
+Unlike other coordinates, "virtual" level coordinates are not stored in
+the ``coords`` attribute of ``DataArray`` and ``Dataset`` objects
+(although they are shown when printing the ``coords`` attribute).
+Consequently, most of the coordinates related methods don't apply for them.
+It also can't be used to replace one particular level.
+
+Because in a ``DataArray`` or ``Dataset`` object each multi-index level is
+accessible as a "virtual" coordinate, its name must not conflict with the names
+of the other levels, coordinates and data variables of the same object.
+Even though Xarray set default names for multi-indexes with unnamed levels,
+it is recommended that you explicitly set the names of the levels.
+
 .. [1] Latitude and longitude are 2D arrays because the dataset uses
    `projected coordinates`__. ``reference_time`` refers to the reference time
    at which the forecast was made, rather than ``time`` which is the valid time
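
Note on the naming rule documented in this file: a level name must not clash with another level, a coordinate, or a data variable of the same object. The sketch below illustrates that rule on plain dictionaries. It is not the commit's `assert_unique_multiindex_level_names`, whose body does not appear in this diff; the function `check_level_names` and its input layout are hypothetical stand-ins chosen for illustration.

```python
from collections import defaultdict


def check_level_names(level_names_by_coord, other_names=()):
    """Raise ValueError if a level name is reused or shadows another name.

    level_names_by_coord: hypothetical mapping of coordinate name to a list
        of multi-index level names (None for a plain, single-level index).
    other_names: names of further coordinates and data variables.
    """
    # Record which coordinate(s) each level name comes from.
    owners = defaultdict(list)
    for coord, levels in level_names_by_coord.items():
        for level in levels or []:
            owners[level].append(coord)

    # Rule 1: a level name may appear only once across all multi-indexes.
    duplicated = {lvl: crds for lvl, crds in owners.items() if len(crds) > 1}
    if duplicated:
        raise ValueError(
            'conflicting multi-index level names: {!r}'.format(duplicated))

    # Rule 2: a level name may not equal a coordinate or variable name.
    all_names = set(level_names_by_coord) | set(other_names)
    shadowed = sorted(set(owners) & all_names)
    if shadowed:
        raise ValueError(
            'level names clash with variable names: {!r}'.format(shadowed))


# Distinct level names pass silently.
check_level_names({'spec': ['band', 'wn'], 'time': None})
```

Checking both rules in one function mirrors the commit message entry "check for level/var name conflicts in one place", though the real check operates on xarray variables rather than on lists of strings.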

doc/indexing.rst (+17 -3)

--- a/doc/indexing.rst
+++ b/doc/indexing.rst
@@ -325,11 +325,25 @@ Additionally, xarray supports dictionaries:
 .. ipython:: python
 
     mda.sel(x={'one': 'a', 'two': 0})
-    mda.loc[{'one': 'a'}, ...]
+
+For convenience, ``sel`` also accepts multi-index levels directly
+as keyword arguments:
+
+.. ipython:: python
+
+    mda.sel(one='a', two=0)
+
+Note that using ``sel`` it is not possible to mix a dimension
+indexer with level indexers for that dimension
+(e.g., ``mda.sel(x={'one': 'a'}, two=0)`` will raise a ``ValueError``).
 
 Like pandas, xarray handles partial selection on multi-index (level drop).
-As shown in the last example above, it also renames the dimension / coordinate
-when the multi-index is reduced to a single index.
+As shown below, it also renames the dimension / coordinate when the
+multi-index is reduced to a single index.
+
+.. ipython:: python
+
+    mda.loc[{'one': 'a'}, ...]
 
 Unlike pandas, xarray does not guess whether you provide index levels or
 dimensions when using ``loc`` in some ambiguous cases. For example, for
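
Note on the ``sel`` rule documented in this file: level indexers are accepted as keyword arguments, but they cannot be mixed with a dimension indexer for the same dimension. The sketch below shows one way such a rule could be enforced on plain dictionaries. The commit message mentions a `get_dim_indexers` helper, but its body is not in this diff, so `group_level_indexers` and its signature are hypothetical and only illustrate the idea.

```python
def group_level_indexers(indexers, dims, level_coords):
    """Fold level-name indexers into one dict indexer per dimension.

    indexers: mapping passed to a ``sel``-like method.
    dims: dimension names of the object.
    level_coords: mapping of level name -> dimension name.
    """
    dim_indexers = {}
    level_indexers = {}
    for key, label in indexers.items():
        if key in dims:
            dim_indexers[key] = label
        elif key in level_coords:
            # Group level indexers under the dimension that owns the level.
            level_indexers.setdefault(level_coords[key], {})[key] = label
        else:
            raise ValueError(
                '{!r} is neither a dimension nor a level'.format(key))

    for dim, by_level in level_indexers.items():
        if dim in dim_indexers:
            # Same situation as mda.sel(x={'one': 'a'}, two=0).
            raise ValueError(
                'cannot combine a dimension indexer and level indexers '
                'for dimension {!r}'.format(dim))
        dim_indexers[dim] = by_level
    return dim_indexers


levels = {'one': 'x', 'two': 'x'}
print(group_level_indexers({'one': 'a', 'two': 0}, ['x'], levels))
# prints {'x': {'one': 'a', 'two': 0}}
```

Under these assumptions, ``sel(one='a', two=0)`` reduces to the dictionary form ``sel(x={'one': 'a', 'two': 0})`` already shown in the documentation.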

doc/whats-new.rst (+7)

--- a/doc/whats-new.rst
+++ b/doc/whats-new.rst
@@ -50,6 +50,13 @@ By `Robin Wilson <https://github.com/robintw>`_.
   and deals with names to ensure a perfect 'roundtrip' capability.
   By `Robin Wilson <https://github.com/robintw`_.
 
+- Multi-index levels are now accessible as "virtual" coordinate variables,
+  e.g., ``ds['time']`` can pull out the ``'time'`` level of a multi-index
+  (see :ref:`coordinates`). ``sel`` also accepts providing multi-index levels
+  as keyword arguments, e.g., ``ds.sel(time='2000-01')``
+  (see :ref:`multi-level indexing`).
+  By `Benoit Bovy <https://github.com/benbovy>`_.
+
 Bug fixes
 ~~~~~~~~~

xarray/core/coordinates.py (+21)

--- a/xarray/core/coordinates.py
+++ b/xarray/core/coordinates.py
@@ -215,6 +215,27 @@ def __delitem__(self, key):
         del self._data._coords[key]
 
 
+class DataArrayLevelCoordinates(AbstractCoordinates):
+    """Dictionary like container for DataArray MultiIndex level coordinates.
+
+    Used for attribute style lookup. Not returned directly by any
+    public methods.
+    """
+    def __init__(self, dataarray):
+        self._data = dataarray
+
+    @property
+    def _names(self):
+        return set(self._data._level_coords)
+
+    @property
+    def variables(self):
+        level_coords = OrderedDict(
+            (k, self._data[v].variable.get_level_variable(k))
+            for k, v in self._data._level_coords.items())
+        return Frozen(level_coords)
+
+
 class Indexes(Mapping, formatting.ReprMixin):
     """Ordered Mapping[str, pandas.Index] for xarray objects.
     """

xarray/core/dataarray.py (+23 -4)

--- a/xarray/core/dataarray.py
+++ b/xarray/core/dataarray.py
@@ -14,11 +14,13 @@
 from . import utils
 from .alignment import align
 from .common import AbstractArray, BaseDataObject, squeeze
-from .coordinates import DataArrayCoordinates, Indexes
+from .coordinates import (DataArrayCoordinates, DataArrayLevelCoordinates,
+                          Indexes)
 from .dataset import Dataset
 from .pycompat import iteritems, basestring, OrderedDict, zip
 from .variable import (as_variable, Variable, as_compatible_data, IndexVariable,
-                       default_index_coordinate)
+                       default_index_coordinate,
+                       assert_unique_multiindex_level_names)
 from .formatting import format_item
 
 
@@ -82,6 +84,8 @@ def _infer_coords_and_dims(shape, coords, dims):
                                  'length %s on the data but length %s on '
                                  'coordinate %r' % (d, sizes[d], s, k))
 
+    assert_unique_multiindex_level_names(new_coords)
+
     return new_coords, dims
 
 
@@ -421,14 +425,29 @@ def _item_key_to_dict(self, key):
             key = indexing.expanded_indexer(key, self.ndim)
             return dict(zip(self.dims, key))
 
+    @property
+    def _level_coords(self):
+        """Return a mapping of all MultiIndex levels and their corresponding
+        coordinate name.
+        """
+        level_coords = OrderedDict()
+        for cname, var in self._coords.items():
+            if var.ndim == 1:
+                level_names = var.to_index_variable().level_names
+                if level_names is not None:
+                    dim, = var.dims
+                    level_coords.update({lname: dim for lname in level_names})
+        return level_coords
+
     def __getitem__(self, key):
         if isinstance(key, basestring):
             from .dataset import _get_virtual_variable
 
             try:
                 var = self._coords[key]
             except KeyError:
-                _, key, var = _get_virtual_variable(self._coords, key)
+                _, key, var = _get_virtual_variable(
+                    self._coords, key, self._level_coords)
 
             return self._replace_maybe_drop_dims(var, name=key)
         else:
@@ -448,7 +467,7 @@ def __delitem__(self, key):
     @property
     def _attr_sources(self):
         """List of places to look-up items for attribute-style access"""
-        return [self.coords, self.attrs]
+        return [self.coords, DataArrayLevelCoordinates(self), self.attrs]
 
     def __contains__(self, key):
         return key in self._coords
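
Note on the `_attr_sources` change: the level container is placed between `coords` and `attrs`, which suggests that attribute-style access consults the sources in list order, so a level name would take precedence over an entry of the same name in `attrs`. The sketch below models that ordering with plain dictionaries. The class `AttrLookup` is hypothetical, and the assumption that lookup stops at the first source containing the name is mine; the lookup code itself is not part of this diff.

```python
class AttrLookup(object):
    """Hypothetical stand-in for an object with attribute-style access."""

    def __init__(self, coords, level_coords, attrs):
        # Same order as the new _attr_sources list in the diff.
        self._attr_sources = [coords, level_coords, attrs]

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails. Private names are
        # rejected early so that _attr_sources itself never recurses here.
        if name.startswith('_'):
            raise AttributeError(name)
        for source in self._attr_sources:
            try:
                return source[name]
            except KeyError:
                pass
        raise AttributeError(name)


obj = AttrLookup(
    coords={'spec': 'multi-index coordinate'},
    level_coords={'band': 'level of spec'},
    attrs={'band': 'shadowed attribute', 'units': 'K'},
)
print(obj.band)   # prints "level of spec"
print(obj.units)  # prints "K"
```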

xarray/core/dataset.py (+51 -20)

--- a/xarray/core/dataset.py
+++ b/xarray/core/dataset.py
@@ -33,34 +33,48 @@
                              'quarter']
 
 
-def _get_virtual_variable(variables, key):
-    """Get a virtual variable (e.g., 'time.year') from a dict of
-    xarray.Variable objects (if possible)
+def _get_virtual_variable(variables, key, level_vars={}):
+    """Get a virtual variable (e.g., 'time.year' or a MultiIndex level)
+    from a dict of xarray.Variable objects (if possible)
     """
     if not isinstance(key, basestring):
         raise KeyError(key)
 
     split_key = key.split('.', 1)
-    if len(split_key) != 2:
+    if len(split_key) == 2:
+        ref_name, var_name = split_key
+    elif len(split_key) == 1:
+        ref_name, var_name = key, None
+    else:
         raise KeyError(key)
 
-    ref_name, var_name = split_key
-    ref_var = variables[ref_name]
-    if ref_var.ndim == 1:
-        date = ref_var.to_index()
-    elif ref_var.ndim == 0:
-        date = pd.Timestamp(ref_var.values)
+    if ref_name in level_vars:
+        dim_var = variables[level_vars[ref_name]]
+        ref_var = dim_var.to_index_variable().get_level_variable(ref_name)
     else:
-        raise KeyError(key)
+        ref_var = variables[ref_name]
 
-    if var_name == 'season':
-        # TODO: move 'season' into pandas itself
-        seasons = np.array(['DJF', 'MAM', 'JJA', 'SON'])
-        month = date.month
-        data = seasons[(month // 3) % 4]
+    if var_name is None:
+        virtual_var = ref_var
+        var_name = key
     else:
-        data = getattr(date, var_name)
-    return ref_name, var_name, Variable(ref_var.dims, data)
+        if ref_var.ndim == 1:
+            date = ref_var.to_index()
+        elif ref_var.ndim == 0:
+            date = pd.Timestamp(ref_var.values)
+        else:
+            raise KeyError(key)
+
+        if var_name == 'season':
+            # TODO: move 'season' into pandas itself
+            seasons = np.array(['DJF', 'MAM', 'JJA', 'SON'])
+            month = date.month
+            data = seasons[(month // 3) % 4]
+        else:
+            data = getattr(date, var_name)
+        virtual_var = Variable(ref_var.dims, data)
+
+    return ref_name, var_name, virtual_var
 
 
 def calculate_dimensions(variables):
@@ -411,6 +425,21 @@ def _subset_with_all_valid_coords(self, variables, coord_names, attrs):
 
         return self._construct_direct(variables, coord_names, dims, attrs)
 
+    @property
+    def _level_coords(self):
+        """Return a mapping of all MultiIndex levels and their corresponding
+        coordinate name.
+        """
+        level_coords = OrderedDict()
+        for cname in self._coord_names:
+            var = self.variables[cname]
+            if var.ndim == 1:
+                level_names = var.to_index_variable().level_names
+                if level_names is not None:
+                    dim, = var.dims
+                    level_coords.update({lname: dim for lname in level_names})
+        return level_coords
+
     def _copy_listed(self, names):
         """Create a new Dataset with the listed variables from this dataset and
         the all relevant coordinates. Skips all validation.
@@ -423,7 +452,7 @@ def _copy_listed(self, names):
                 variables[name] = self._variables[name]
             except KeyError:
                 ref_name, var_name, var = _get_virtual_variable(
-                    self._variables, name)
+                    self._variables, name, self._level_coords)
                 variables[var_name] = var
                 if ref_name in self._coord_names:
                     coord_names.add(var_name)
@@ -439,7 +468,8 @@ def _construct_dataarray(self, name):
         try:
             variable = self._variables[name]
         except KeyError:
-            _, name, variable = _get_virtual_variable(self._variables, name)
+            _, name, variable = _get_virtual_variable(
+                self._variables, name, self._level_coords)
 
         coords = OrderedDict()
         needed_dims = set(variable.dims)
@@ -508,6 +538,7 @@ def __setitem__(self, key, value):
         if utils.is_dict_like(key):
             raise NotImplementedError('cannot yet use a dictionary as a key '
                                       'to set Dataset values')
+
         self.update({key: value})
 
     def __delitem__(self, key):
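
Note on the reworked `_get_virtual_variable`: a key is now either a dotted datetime component such as `'time.year'` or a bare name that may match a multi-index level. The sketch below reproduces only the key-splitting and season arithmetic on plain Python values, so it runs without xarray. The names `resolve_key` and `season_of` are hypothetical, and the returned `source` string is an illustrative label rather than anything the real function returns.

```python
def resolve_key(key, level_vars=None):
    """Split a lookup key the way the diff does and classify its source.

    level_vars: mapping of level name -> dimension name (assumed layout).
    """
    # None is used as the default here instead of a shared {} literal.
    level_vars = {} if level_vars is None else level_vars

    # 'time.year' -> ('time', 'year'); 'band' -> ('band', None)
    parts = key.split('.', 1)
    ref_name = parts[0]
    var_name = parts[1] if len(parts) == 2 else None

    if ref_name in level_vars:
        source = 'level of {}'.format(level_vars[ref_name])
    else:
        source = 'variable'
    return ref_name, var_name, source


def season_of(month):
    """Map a month number (1-12) to a season label, as in the diff."""
    seasons = ['DJF', 'MAM', 'JJA', 'SON']
    # December gives 12 // 3 = 4, and 4 % 4 = 0, so it wraps back to 'DJF'.
    return seasons[(month // 3) % 4]


print(resolve_key('time.year'))               # ('time', 'year', 'variable')
print(resolve_key('band', {'band': 'spec'}))  # ('band', None, 'level of spec')
print(season_of(12), season_of(7))            # DJF JJA
```

The diff declares `level_vars={}` as a default argument. Because the function only reads from that dictionary, the shared default behaves the same as the `None` pattern used in this sketch; the sketch simply avoids the mutable default as a matter of habit.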
