Commit eb70506

Indexes are now optional
1 parent 95bca62 commit eb70506

30 files changed

+1048
-712
lines changed

doc/api.rst

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -44,6 +44,7 @@ Attributes
4444
Dataset.coords
4545
Dataset.attrs
4646
Dataset.indexes
47+
Dataset.get_index
4748

4849
Dictionary interface
4950
--------------------
@@ -193,6 +194,7 @@ Attributes
193194
DataArray.attrs
194195
DataArray.encoding
195196
DataArray.indexes
197+
DataArray.get_index
196198

197199
**ndarray attributes**:
198200
:py:attr:`~DataArray.ndim`

doc/combining.rst

Lines changed: 1 addition & 0 deletions
@@ -205,6 +205,7 @@ have ``NaN`` values. This can be used to combine data with overlapping
205205
coordinates as long as any non-missing values agree or are disjoint:
206206

207207
.. ipython:: python
208+
208209
ds1 = xr.Dataset({'a': ('x', [10, 20, 30, np.nan])}, {'x': [1, 2, 3, 4]})
209210
ds2 = xr.Dataset({'a': ('x', [np.nan, 30, 40, 50])}, {'x': [2, 3, 4, 5]})
210211
xr.merge([ds1, ds2], compat='no_conflicts')
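
The ``compat='no_conflicts'`` rule shown in this hunk can be sketched without xarray. The snippet below is only an illustration under stated assumptions: `merge_no_conflicts` is a hypothetical helper, plain dicts stand in for labelled arrays, and `NaN` is treated as "missing". It is not how xarray implements merging.

```python
import math


def merge_no_conflicts(a, b):
    """Merge two label -> value mappings, treating NaN as a missing value.

    Hypothetical helper for illustration only; not xarray's implementation.
    """
    merged = {}
    for label in sorted(set(a) | set(b)):
        x = a.get(label, math.nan)
        y = b.get(label, math.nan)
        if math.isnan(x):
            # Missing on the left: take whatever the right side has.
            merged[label] = y
        elif math.isnan(y) or x == y:
            # Missing on the right, or both sides agree.
            merged[label] = x
        else:
            raise ValueError(
                f"conflicting values at label {label!r}: {x} != {y}")
    return merged


# Same numbers as ds1 and ds2 in the documentation example.
ds1_a = {1: 10.0, 2: 20.0, 3: 30.0, 4: math.nan}
ds2_a = {2: math.nan, 3: 30.0, 4: 40.0, 5: 50.0}
merged = merge_no_conflicts(ds1_a, ds2_a)
print(merged)
```

Overlapping labels are accepted as long as the non-missing values agree or only one side has a value; two different non-missing values at the same label raise an error.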

doc/computation.rst

Lines changed: 11 additions & 6 deletions
@@ -196,7 +196,9 @@ This means, for example, that you always subtract an array from its transpose:
196196
You can explicitly broadcast xarray data structures by using the
197197
:py:func:`~xarray.broadcast` function:
198198

199-
a2, b2 = xr.broadcast(a, b2)
199+
.. ipython:: python
200+
201+
a2, b2 = xr.broadcast(a, b)
200202
a2
201203
b2
202204
@@ -215,15 +217,18 @@ operations. The default result of a binary operation is by the *intersection*
215217

216218
.. ipython:: python
217219
218-
arr + arr[:1]
220+
arr = xr.DataArray(np.arange(3), [('x', range(3))])
221+
arr + arr[:-1]
219222
220-
If the result would be empty, an error is raised instead:
223+
If coordinate values for a dimension are missing on either argument, all
224+
matching dimensions must have the same size:
221225

222-
.. ipython::
226+
.. ipython:: python
223227
224228
@verbatim
225-
In [1]: arr[:2] + arr[2:]
226-
ValueError: no overlapping labels for some dimensions: ['x']
229+
In [1]: arr + xr.DataArray([1, 2], dims='x')
230+
ValueError: arguments without labels along dimension 'x' cannot be aligned because they have different dimension size(s) {2} than the size of the aligned dimension labels: 3
231+
227232
228233
However, one can explicitly change this default automatic alignment type ("inner")
229234
via :py:func:`~xarray.set_options()` in context manager:
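
The size rule described in this hunk (inner join on labels, and unlabeled arguments must match the aligned size) can be sketched in plain Python. This is a simplified model under assumptions: `aligned_size` is a hypothetical helper, each argument is reduced to a `(size, labels)` pair for a single dimension, and the error text is paraphrased rather than copied from xarray.

```python
def aligned_size(dim, args):
    """Return the size of `dim` after an inner-join alignment.

    `args` is a non-empty list of (size, labels) pairs, one per argument;
    `labels` is a list of coordinate labels, or None if the argument has
    no labels along `dim`. Hypothetical helper, not xarray's code.
    """
    labeled = [set(labels) for _, labels in args if labels is not None]
    unlabeled_sizes = {size for size, labels in args if labels is None}

    if labeled:
        # Inner join: keep only labels present in every labeled argument.
        joined = set.intersection(*labeled)
        if unlabeled_sizes and unlabeled_sizes != {len(joined)}:
            raise ValueError(
                f"arguments without labels along dimension {dim!r} have "
                f"sizes {sorted(unlabeled_sizes)}, but the aligned labels "
                f"have size {len(joined)}")
        return len(joined)

    # No argument has labels: all sizes must simply agree.
    if len(unlabeled_sizes) > 1:
        raise ValueError(
            f"arguments without labels along dimension {dim!r} have "
            f"different sizes: {sorted(unlabeled_sizes)}")
    return next(iter(unlabeled_sizes))


# arr + arr[:-1]: both labeled, intersection of [0, 1, 2] and [0, 1].
print(aligned_size("x", [(3, [0, 1, 2]), (2, [0, 1])]))
```

The case from the ``@verbatim`` block, a labeled array of size 3 plus an unlabeled array of size 2, corresponds to `aligned_size("x", [(3, [0, 1, 2]), (2, None)])`, which raises `ValueError` in this sketch.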

doc/data-structures.rst

Lines changed: 36 additions & 20 deletions
@@ -67,18 +67,33 @@ in with default values:
6767
6868
xr.DataArray(data)
6969
70-
As you can see, dimensions and coordinate arrays corresponding to each
71-
dimension are always present. This behavior is similar to pandas, which fills
72-
in index values in the same way.
70+
As you can see, dimension names are always present in the xarray data model: if
71+
you do not provide them, defaults of the form ``dim_N`` will be created.
72+
73+
.. note::
74+
75+
Prior to xarray v0.9, coordinates corresponding to dimensions were *also*
76+
always present in xarray: xarray would create default coordinates of the form
77+
``range(dim_size)`` if coordinates were not supplied explicitly. This is no
78+
longer the case.
7379

7480
Coordinates can take the following forms:
7581

76-
- A list of ``(dim, ticks[, attrs])`` pairs with length equal to the number of dimensions
77-
- A dictionary of ``{coord_name: coord}`` where the values are each a scalar value,
78-
a 1D array or a tuple. Tuples are be in the same form as the above, and
79-
multiple dimensions can be supplied with the form ``(dims, data[, attrs])``.
80-
Supplying as a tuple allows other coordinates than those corresponding to
81-
dimensions (more on these later).
82+
- A list of values with length equal to the number of dimensions, providing
83+
coordinate labels for each dimension. Each value must be of one of the
84+
following forms:
85+
86+
* A :py:class:`~xarray.DataArray` or :py:class:`~xarray.Variable`
87+
* A tuple of the form ``(dims, data[, attrs])``, which is converted into
88+
arguments for :py:class:`~xarray.Variable`
89+
* A pandas object or scalar value, which is converted into a ``DataArray``
90+
* A 1D array or list, which is interpreted as values for a one-dimensional
91+
coordinate variable along the same dimension as its name
92+
93+
- A dictionary of ``{coord_name: coord}`` where values are of the same form
94+
as the list. Supplying coordinates as a dictionary allows other coordinates
95+
than those corresponding to dimensions (more on these later). If you supply
96+
``coords`` as a dictionary, you must explicitly provide ``dims``.
8297

8398
As a list of tuples:
8499

@@ -128,7 +143,7 @@ Let's take a look at the important properties on our array:
128143
foo.attrs
129144
print(foo.name)
130145
131-
You can even modify ``values`` inplace:
146+
You can modify ``values`` inplace:
132147

133148
.. ipython:: python
134149
@@ -228,14 +243,19 @@ Creating a Dataset
228243
To make an :py:class:`~xarray.Dataset` from scratch, supply dictionaries for any
229244
variables (``data_vars``), coordinates (``coords``) and attributes (``attrs``).
230245

231-
``data_vars`` are supplied as a dictionary with each key as the name of the variable and each
246+
- ``data_vars`` should be a dictionary with each key as the name of the variable and each
232247
value as one of:
233-
- A :py:class:`~xarray.DataArray`
234-
- A tuple of the form ``(dims, data[, attrs])``
235-
- A pandas object
236248

237-
``coords`` are supplied as dictionary of ``{coord_name: coord}`` where the values are scalar values,
238-
arrays or tuples in the form of ``(dims, data[, attrs])``.
249+
* A :py:class:`~xarray.DataArray` or :py:class:`~xarray.Variable`
250+
* A tuple of the form ``(dims, data[, attrs])``, which is converted into
251+
arguments for :py:class:`~xarray.Variable`
252+
* A pandas object, which is converted into a ``DataArray``
253+
* A 1D array or list, which is interpreted as values for a one-dimensional
254+
coordinate variable along the same dimension as its name
255+
256+
- ``coords`` should be a dictionary of the same form as ``data_vars``.
257+
258+
- ``attrs`` should be a dictionary.
239259

240260
Let's create some fake data for the example we show above:
241261

@@ -256,10 +276,6 @@ Let's create some fake data for the example we show above:
256276
'reference_time': pd.Timestamp('2014-09-05')})
257277
ds
258278
259-
Notice that we did not explicitly include coordinates for the "x" or "y"
260-
dimensions, so they were filled in array of ascending integers of the proper
261-
length.
262-
263279
Here we pass :py:class:`xarray.DataArray` objects or a pandas object as values
264280
in the dictionary:
265281

doc/examples/quick-overview.rst

Lines changed: 37 additions & 13 deletions
@@ -23,7 +23,7 @@ array or list, with optional *dimensions* and *coordinates*:
2323
.. ipython:: python
2424
2525
xr.DataArray(np.random.randn(2, 3))
26-
data = xr.DataArray(np.random.randn(2, 3), [('x', ['a', 'b']), ('y', [-2, 0, 2])])
26+
data = xr.DataArray(np.random.randn(2, 3), coords={'x': ['a', 'b']}, dims=('x', 'y'))
2727
data
2828
2929
If you supply a pandas :py:class:`~pandas.Series` or
@@ -121,31 +121,55 @@ xarray supports grouped operations using a very similar API to pandas:
121121
data.groupby(labels).mean('y')
122122
data.groupby(labels).apply(lambda x: x - x.min())
123123
124-
Convert to pandas
125-
-----------------
124+
pandas
125+
------
126126

127-
A key feature of xarray is robust conversion to and from pandas objects:
127+
Xarray objects can be easily converted to and from pandas objects:
128128

129129
.. ipython:: python
130130
131-
data.to_series()
132-
data.to_pandas()
131+
series = data.to_series()
132+
series
133133
134-
Datasets and NetCDF
135-
-------------------
134+
# convert back
135+
series.to_xarray()
136136
137-
:py:class:`xarray.Dataset` is a dict-like container of ``DataArray`` objects that share
138-
index labels and dimensions. It looks a lot like a netCDF file:
137+
Datasets
138+
--------
139+
140+
:py:class:`xarray.Dataset` is a dict-like container of aligned ``DataArray``
141+
objects. You can think of it as a multi-dimensional generalization of the
142+
:py:class:`pandas.DataFrame`:
139143

140144
.. ipython:: python
141145
142-
ds = data.to_dataset(name='foo')
146+
ds = xr.Dataset({'foo': data, 'bar': ('x', [1, 2]), 'baz': np.pi})
143147
ds
144148
149+
Use dictionary indexing to pull out ``Dataset`` variables as ``DataArray``
150+
objects:
151+
152+
.. ipython:: python
153+
154+
ds['foo']
155+
156+
Variables in datasets can have different ``dtype`` and even different
157+
dimensions, but all dimensions are assumed to refer to points in the same shared
158+
coordinate system.
159+
145160
You can do almost everything you can do with ``DataArray`` objects with
146-
``Dataset`` objects if you prefer to work with multiple variables at once.
161+
``Dataset`` objects (including indexing and arithmetic) if you prefer to work
162+
with multiple variables at once.
163+
164+
NetCDF
165+
------
166+
167+
NetCDF is the recommended binary serialization format for xarray objects. Users
168+
from the geosciences will recognize that the :py:class:`~xarray.Dataset` data
169+
model looks very similar to a netCDF file (which, in fact, inspired it).
147170

148-
Datasets also let you easily read and write netCDF files:
171+
You can directly read and write xarray objects to disk using :py:meth:`~xarray.Dataset.to_netcdf`, :py:func:`~xarray.open_dataset` and
172+
:py:func:`~xarray.open_dataarray`:
149173

150174
.. ipython:: python
151175

doc/indexing.rst

Lines changed: 33 additions & 1 deletion
@@ -221,7 +221,7 @@ enabling nearest neighbor (inexact) lookups by use of the methods ``'pad'``,
221221

222222
.. ipython:: python
223223
224-
data = xr.DataArray([1, 2, 3], dims='x')
224+
data = xr.DataArray([1, 2, 3], [('x', [0, 1, 2])])
225225
data.sel(x=[1.1, 1.9], method='nearest')
226226
data.sel(x=0.1, method='backfill')
227227
data.reindex(x=[0.5, 1, 1.5, 2, 2.5], method='pad')
@@ -478,6 +478,30 @@ Both ``reindex_like`` and ``align`` work interchangeably between
478478
# this is a no-op, because there are no shared dimension names
479479
ds.reindex_like(other)
480480
481+
.. _indexing.missing_coordinates:
482+
483+
Missing coordinate labels
484+
-------------------------
485+
486+
Coordinate labels for each dimension are optional (as of xarray v0.9). Label
487+
based indexing with ``.sel`` and ``.loc`` uses standard positional,
488+
integer-based indexing as a fallback for dimensions without a coordinate label:
489+
490+
.. ipython:: python
491+
492+
array = xr.DataArray([1, 2, 3], dims='x')
493+
array.sel(x=[0, -1])
494+
495+
Alignment between xarray objects where one or both do not have coordinate labels
496+
succeeds only if all dimensions of the same name have the same length.
497+
Otherwise, it raises an informative error:
498+
499+
.. ipython::
500+
:verbatim:
501+
502+
In [62]: xr.align(array, array[:2])
503+
ValueError: arguments without labels along dimension 'x' cannot be aligned because they have different dimension sizes: {2, 3}
504+
481505
Underlying Indexes
482506
------------------
483507

@@ -491,3 +515,11 @@ through the :py:attr:`~xarray.DataArray.indexes` attribute.
491515
arr.indexes
492516
arr.indexes['time']
493517
518+
Use :py:meth:`~xarray.DataArray.get_index` to get an index for a dimension,
519+
falling back to a default :py:class:`pandas.RangeIndex` if it has no coordinate
520+
labels:
521+
522+
.. ipython:: python
523+
524+
array
525+
array.get_index('x')
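
The two fallbacks documented in this file (positional indexing in ``.sel`` and a default index from ``get_index``) can be modelled with a small toy class. Everything here is an assumption made for illustration: `ToyArray` is hypothetical, it handles a single dimension, and the built-in `range` stands in for `pandas.RangeIndex`.

```python
class ToyArray:
    """One-dimensional stand-in for a DataArray (illustration only)."""

    def __init__(self, values, dim, labels=None):
        self.values = list(values)
        self.dim = dim
        # Only dimensions with explicit labels get an entry in `indexes`.
        self.indexes = {} if labels is None else {dim: list(labels)}

    def get_index(self, dim):
        """Return the index for `dim`, or a default range if unlabeled."""
        if dim != self.dim:
            raise KeyError(dim)
        if dim in self.indexes:
            return self.indexes[dim]
        return range(len(self.values))

    def sel(self, labels):
        """Label-based lookup, falling back to positional indexing."""
        if self.dim in self.indexes:
            index = self.indexes[self.dim]
            positions = [index.index(label) for label in labels]
        else:
            # No coordinate labels: treat the keys as integer positions,
            # so negative values count from the end.
            positions = list(labels)
        return [self.values[p] for p in positions]


unlabeled = ToyArray([1, 2, 3], "x")
labeled = ToyArray([1, 2, 3], "x", labels=[10, 20, 30])

print(unlabeled.sel([0, -1]), unlabeled.get_index("x"))
print(labeled.sel([30, 10]), labeled.get_index("x"))
```

Note that `indexes` stays empty for the unlabeled array, which mirrors the point that ``indexes`` no longer contains every dimension while ``get_index`` always returns something usable.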

doc/whats-new.rst

Lines changed: 30 additions & 2 deletions
@@ -21,10 +21,31 @@ v0.9.0 (unreleased)
2121
Breaking changes
2222
~~~~~~~~~~~~~~~~
2323

24+
- Index coordinates for each dimension are now optional and are no longer created
25+
by default. This has a number of implications:
26+
27+
- :py:func:`~align` and :py:meth:`~Dataset.reindex` can now raise an error if
28+
dimension labels are missing and dimensions have different sizes.
29+
- Because pandas does not support missing indexes, methods such as
30+
``to_dataframe``/``from_dataframe`` and ``stack``/``unstack`` no longer
31+
roundtrip faithfully on all inputs. Use :py:meth:`~Dataset.reset_index` to
32+
remove undesired indexes.
33+
- ``Dataset.__delitem__`` and :py:meth:`~Dataset.drop` no longer delete/drop
34+
variables that have dimensions matching a deleted/dropped variable.
35+
- ``DataArray.coords.__delitem__`` is now allowed on variables matching
36+
dimension names.
37+
- ``.sel`` and ``.loc`` now handle indexing along a dimension without
38+
coordinate labels by doing integer based indexing. See
39+
:ref:`indexing.missing_coordinates` for an example.
40+
- :py:attr:`~Dataset.indexes` is no longer guaranteed to include all
41+
dimension names as keys. The new method :py:meth:`~Dataset.get_index`
42+
always returns an index for a dimension, falling back to a default
43+
``RangeIndex`` if necessary.
44+
2445
- The default behavior of ``merge`` is now ``compat='no_conflicts'``, so some
2546
merges will now succeed in cases that previously raised
2647
``xarray.MergeError``. Set ``compat='broadcast_equals'`` to restore the
27-
previous default.
48+
previous default. See :ref:`combining.no_conflicts` for more details.
2849

2950
Deprecations
3051
~~~~~~~~~~~~
@@ -123,6 +144,13 @@ Bug fixes
123144
should be computed or not.
124145
By `Fabien Maussion <https://github.com/fmaussion>`_.
125146

147+
- Grouping over a dimension with non-unique values with ``groupby`` gives
148+
correct groups.
149+
By `Stephan Hoyer <https://github.com/shoyer>`_.
150+
151+
- Fixed accessing coordinate variables with non-string names from ``.coords``.
152+
By `Stephan Hoyer <https://github.com/shoyer>`_.
153+
126154
.. _whats-new.0.8.2:
127155

128156
v0.8.2 (18 August 2016)
@@ -1242,7 +1270,7 @@ Enhancements
12421270

12431271
.. ipython:: python
12441272
1245-
data = xray.DataArray([1, 2, 3], dims='x')
1273+
data = xray.DataArray([1, 2, 3], [('x', range(3))])
12461274
data.reindex(x=[0.5, 1, 1.5, 2, 2.5], method='pad')
12471275
12481276
This will be especially useful once pandas 0.16 is released, at which point

xarray/backends/common.py

Lines changed: 0 additions & 25 deletions
@@ -33,25 +33,6 @@ def _decode_variable_name(name):
3333
return name
3434

3535

36-
def is_trivial_index(var):
37-
"""
38-
Determines if in index is 'trivial' meaning that it is
39-
equivalent to np.arange(). This is determined by
40-
checking if there are any attributes or encodings,
41-
if ndims is one, dtype is int and finally by comparing
42-
the actual values to np.arange()
43-
"""
44-
# if either attributes or encodings are defined
45-
# the index is not trivial.
46-
if len(var.attrs) or len(var.encoding):
47-
return False
48-
# if the index is not a 1d integer array
49-
if var.ndim > 1 or not var.dtype.kind == 'i':
50-
return False
51-
arange = np.arange(var.size, dtype=var.dtype)
52-
return np.all(var.values == arange)
53-
54-
5536
def robust_getitem(array, key, catch=Exception, max_retries=6,
5637
initial_delay=500):
5738
"""
@@ -203,12 +184,6 @@ def store_dataset(self, dataset):
203184

204185
def store(self, variables, attributes, check_encoding_set=frozenset()):
205186
self.set_attributes(attributes)
206-
neccesary_dims = [v.dims for v in variables.values()]
207-
neccesary_dims = set(itertools.chain(*neccesary_dims))
208-
# set all non-indexes and any index which is not trivial.
209-
variables = OrderedDict((k, v) for k, v in iteritems(variables)
210-
if not (k in neccesary_dims and
211-
is_trivial_index(v)))
212187
self.set_variables(variables, check_encoding_set)
213188

214189
def set_attributes(self, attributes):

xarray/conventions.py

Lines changed: 2 additions & 2 deletions
@@ -913,7 +913,7 @@ def decode_cf(obj, concat_characters=True, mask_and_scale=True,
913913
identify coordinates.
914914
drop_variables: string or iterable, optional
915915
A variable or list of variables to exclude from being parsed from the
916-
dataset.This may be useful to drop variables with problems or
916+
dataset. This may be useful to drop variables with problems or
917917
inconsistent values.
918918
919919
Returns
@@ -939,7 +939,7 @@ def decode_cf(obj, concat_characters=True, mask_and_scale=True,
939939
vars, attrs, concat_characters, mask_and_scale, decode_times,
940940
decode_coords, drop_variables=drop_variables)
941941
ds = Dataset(vars, attrs=attrs)
942-
ds = ds.set_coords(coord_names.union(extra_coords))
942+
ds = ds.set_coords(coord_names.union(extra_coords).intersection(vars))
943943
ds._file_obj = file_obj
944944
return ds
945945
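
The effect of the added ``.intersection(vars)`` call can be shown with plain sets. This is a sketch under assumptions: `coords_to_set` is a hypothetical helper and the variable names are made up. The motivation stated in the comments is a guess (a name listed as a coordinate may no longer exist as a variable now that default indexes are not created); the commit itself does not say why the change was made.

```python
def coords_to_set(coord_names, extra_coords, variables):
    """Keep only the coordinate names that exist as decoded variables.

    Hypothetical helper mirroring the expression in the diff:
    coord_names.union(extra_coords).intersection(vars)
    """
    # Intersecting with a dict uses its keys.
    return coord_names.union(extra_coords).intersection(variables)


# Made-up example: "time" is named as a coordinate but has no variable.
variables = {"temperature": [280.0, 281.5], "lat": [10.0, 20.0]}
coord_names = {"lat", "time"}
extra_coords = set()

selected = coords_to_set(coord_names, extra_coords, variables)
print(selected)
```

Without the intersection, the set passed on would still contain `"time"`, a name with no matching variable.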
