Commit 3865ff2

Merge pull request #512 from shoyer/slice-indexing-never-copies
Clarify rules for copies vs views when indexing
2 parents 511da36 + 5814a12 commit 3865ff2

4 files changed: 137 additions and 122 deletions

doc/indexing.rst

Lines changed: 118 additions & 118 deletions
@@ -53,10 +53,12 @@ DataArray:
     arr[0, 0]
     arr[:, [2, 1]]

+Attributes are persisted in all indexing operations.
+
 .. warning::

     Positional indexing deviates from NumPy when indexing with multiple
-    arrays like ``arr[[0, 1], [0, 1]]``, as described in :ref:`indexing details`.
+    arrays like ``arr[[0, 1], [0, 1]]``, as described in :ref:`orthogonal`.
     See :ref:`pointwise indexing` for how to achieve this functionality in xray.

 xray also supports label-based indexing, just like pandas. Because
@@ -81,6 +83,7 @@ Setting values with label based indexing is also supported:
     arr.loc['2000-01-01', ['IL', 'IN']] = -10
     arr

+
 Indexing with labeled dimensions
 --------------------------------

@@ -204,39 +207,132 @@ index labels along a dimension dropped:

 ``drop`` is both a ``Dataset`` and ``DataArray`` method.

-.. _indexing details:
+.. _nearest neighbor lookups:

-Indexing details
-----------------
+Nearest neighbor lookups
+------------------------
+
+The label based selection methods :py:meth:`~xray.Dataset.sel`,
+:py:meth:`~xray.Dataset.reindex` and :py:meth:`~xray.Dataset.reindex_like` all
+support a ``method`` keyword argument. The method parameter allows for
+enabling nearest neighbor (inexact) lookups by use of the methods ``'pad'``,
+``'backfill'`` or ``'nearest'``:
+
+.. ipython:: python
+
+    data = xray.DataArray([1, 2, 3], dims='x')
+    data.sel(x=[1.1, 1.9], method='nearest')
+    data.sel(x=0.1, method='backfill')
+    data.reindex(x=[0.5, 1, 1.5, 2, 2.5], method='pad')
+
+Using ``method='nearest'`` or a scalar argument with ``.sel()`` requires pandas
+version 0.16 or newer.
+
+The method parameter is not yet supported if any of the arguments
+to ``.sel()`` is a ``slice`` object:
+
+.. ipython::
+    :verbatim:
+
+    In [1]: data.sel(x=slice(1, 3), method='nearest')
+    NotImplementedError
+
+However, you don't need to use ``method`` to do inexact slicing. Slicing
+already returns all values inside the range (inclusive), as long as the index
+labels are monotonic increasing:
+
+.. ipython:: python
+
+    data.sel(x=slice(0.9, 3.1))
+
+Indexing axes with monotonic decreasing labels also works, as long as the
+``slice`` or ``.loc`` arguments are also decreasing:

-Like pandas, whether array indexing returns a view or a copy of the underlying
-data depends entirely on numpy:
+.. ipython:: python

-* Indexing with a single label or a slice returns a view.
-* Indexing with a vector of array labels returns a copy.
+    reversed_data = data[::-1]
+    reversed_data.loc[3.1:0.9]

-Attributes are persisted in array indexing:
+.. _masking with where:
+
+Masking with ``where``
+----------------------
+
+Indexing methods on xray objects generally return a subset of the original data.
+However, it is sometimes useful to select an object with the same shape as the
+original data, but with some elements masked. To do this type of selection in
+xray, use :py:meth:`~xray.DataArray.where`:

 .. ipython:: python

-    arr2 = arr.copy()
-    arr2.attrs['units'] = 'meters'
-    arr2[0, 0].attrs
+    arr = xray.DataArray(np.arange(16).reshape(4, 4), dims=['x', 'y'])
+    arr.where(arr.x + arr.y < 4)
+
+This is particularly useful for ragged indexing of multi-dimensional data,
+e.g., to apply a 2D mask to an image. Note that ``where`` follows all the
+usual xray broadcasting and alignment rules for binary operations (e.g.,
+``+``) between the object being indexed and the condition, as described in
+:ref:`comput`:
+
+.. ipython:: python
+
+    arr.where(arr.y < 2)
+
+Multi-dimensional indexing
+--------------------------
+
+Xray does not yet support efficient routines for generalized multi-dimensional
+indexing or regridding. However, we are definitely interested in adding support
+for this in the future (see :issue:`475` for the ongoing discussion).
+
+.. _copies vs views:
+
+Copies vs. views
+----------------
+
+Whether array indexing returns a view or a copy of the underlying
+data depends on the nature of the labels. For positional (integer)
+indexing, xray follows the same rules as NumPy:
+
+* Positional indexing with only integers and slices returns a view.
+* Positional indexing with arrays or lists returns a copy.
+
+The rules for label based indexing are more complex:
+
+* Label-based indexing with only slices returns a view.
+* Label-based indexing with arrays returns a copy.
+* Label-based indexing with scalars returns a view or a copy, depending
+  upon if the corresponding positional indexer can be represented as an
+  integer or a slice object. The exact rules are determined by pandas.
+
+Whether data is a copy or a view is more predictable in xray than in pandas, so
+unlike pandas, xray does not produce `SettingWithCopy warnings`_. However, you
+should still avoid assignment with chained indexing.
+
+.. _SettingWithCopy warnings: http://pandas.pydata.org/pandas-docs/stable/indexing.html#returning-a-view-versus-a-copy
+
+.. _orthogonal:
+
+Orthogonal (outer) vs. vectorized indexing
+------------------------------------------

 Indexing with xray objects has one important difference from indexing numpy
 arrays: you can only use one-dimensional arrays to index xray objects, and
 each indexer is applied "orthogonally" along independent axes, instead of
-using numpy's advanced broadcasting. This means you can do indexing like this,
-which would require slightly more awkward syntax with numpy arrays:
+using numpy's broadcasting rules to vectorize indexers. This means you can do
+indexing like this, which would require slightly more awkward syntax with
+numpy arrays:

 .. ipython:: python

     arr[arr['time.day'] > 1, arr['space'] != 'IL']

-This is a much simpler model than numpy's `advanced indexing`__,
-and is basically the only model that works for labeled arrays. If you would
-like to do array indexing, you can always index ``.values`` directly
-instead:
+This is a much simpler model than numpy's `advanced indexing`__. If you would
+like to do advanced-style array indexing in xray, you have several options:
+
+* :ref:`pointwise indexing`
+* :ref:`masking with where`
+* Index the underlying NumPy array directly using ``.values``:

 __ http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
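The difference between orthogonal (outer) and vectorized indexing described in this hunk can be seen in plain NumPy. The sketch below is not part of the commit: the array and index lists are made up for illustration, and ``np.ix_`` is used only to emulate the orthogonal behaviour that the documentation attributes to xray.

```python
import numpy as np

# Illustrative data only: a 3x4 array with distinct values.
arr = np.arange(12).reshape(3, 4)
rows = [0, 2]
cols = [1, 3]

# NumPy's default ("vectorized") behaviour pairs the indexers up element by
# element, so it selects the two points (0, 1) and (2, 3).
pointwise = arr[rows, cols]

# Orthogonal ("outer") indexing applies each indexer along its own axis;
# np.ix_ builds the equivalent NumPy indexer.
outer = arr[np.ix_(rows, cols)]

print(pointwise.tolist())  # [1, 11]
print(outer.tolist())      # [[1, 3], [9, 11]]
```

The extra ``np.ix_`` call is the "slightly more awkward syntax" that the documentation text refers to.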

@@ -255,6 +351,10 @@ original values are subset to the index labels still found in the new labels,
 and values corresponding to new labels not found in the original object are
 in-filled with `NaN`.

+Xray operations that combine multiple objects generally automatically align
+their arguments to share the same indexes. However, manual alignment can be
+useful for greater control and for increased performance.
+
 To reindex a particular dimension, use :py:meth:`~xray.DataArray.reindex`:

 .. ipython:: python
@@ -302,103 +402,3 @@ Both ``reindex_like`` and ``align`` work interchangeably between
     other = xray.DataArray(['a', 'b', 'c'], dims='other')
     # this is a no-op, because there are no shared dimension names
     ds.reindex_like(other)
-
-.. _nearest neighbor lookups:
-
-Nearest neighbor lookups
-------------------------
-
-The label based selection methods :py:meth:`~xray.Dataset.sel`,
-:py:meth:`~xray.Dataset.reindex` and :py:meth:`~xray.Dataset.reindex_like` all
-support a ``method`` keyword argument. The method parameter allows for
-enabling nearest neighbor (inexact) lookups by use of the methods ``'pad'``,
-``'backfill'`` or ``'nearest'``:
-
-.. use verbatim because I can't seem to install pandas 0.16.1 on RTD :(
-
-.. .. ipython::
-    :verbatim:
-    In [35]: data = xray.DataArray([1, 2, 3], dims='x')
-    In [36]: data.sel(x=[1.1, 1.9], method='nearest')
-    Out[36]:
-    <xray.DataArray (x: 2)>
-    array([2, 3])
-    Coordinates:
-    * x (x) int64 1 2
-    In [37]: data.sel(x=0.1, method='backfill')
-    Out[37]:
-    <xray.DataArray ()>
-    array(2)
-    Coordinates:
-    x int64 1
-    In [38]: data.reindex(x=[0.5, 1, 1.5, 2, 2.5], method='pad')
-    Out[38]:
-    <xray.DataArray (x: 5)>
-    array([1, 2, 2, 3, 3])
-    Coordinates:
-    * x (x) float64 0.5 1.0 1.5 2.0 2.5
-
-.. ipython:: python
-
-    data = xray.DataArray([1, 2, 3], dims='x')
-    data.sel(x=[1.1, 1.9], method='nearest')
-    data.sel(x=0.1, method='backfill')
-    data.reindex(x=[0.5, 1, 1.5, 2, 2.5], method='pad')
-
-Using ``method='nearest'`` or a scalar argument with ``.sel()`` requires pandas
-version 0.16 or newer.
-
-The method parameter is not yet supported if any of the arguments
-to ``.sel()`` is a ``slice`` object:
-
-.. ipython::
-    :verbatim:
-
-    In [1]: data.sel(x=slice(1, 3), method='nearest')
-    NotImplementedError
-
-However, you don't need to use ``method`` to do inexact slicing. Slicing
-already returns all values inside the range (inclusive), as long as the index
-labels are monotonic increasing:
-
-.. ipython:: python
-
-    data.sel(x=slice(0.9, 3.1))
-
-Indexing axes with monotonic decreasing labels also works, as long as the
-``slice`` or ``.loc`` arguments are also decreasing:
-
-.. ipython:: python
-
-    reversed_data = data[::-1]
-    reversed_data.loc[3.1:0.9]
-
-Masking with ``where``
-----------------------
-
-Indexing methods on xray objects generally return a subset of the original data.
-However, it is sometimes useful to select an object with the same shape as the
-original data, but with some elements masked. To do this type of selection in
-xray, use :py:meth:`~xray.DataArray.where`:
-
-.. ipython:: python
-
-    arr = xray.DataArray(np.arange(16).reshape(4, 4), dims=['x', 'y'])
-    arr.where(arr.x + arr.y < 4)
-
-This is particularly useful for ragged indexing of multi-dimensional data,
-e.g., to apply a 2D mask to an image. Note that ``where`` follows all the
-usual xray broadcasting and alignment rules for binary operations (e.g.,
-``+``) between the object being indexed and the condition, as described in
-:ref:`comput`:
-
-.. ipython:: python
-
-    arr.where(arr.y < 2)
-
-Multi-dimensional indexing
---------------------------
-
-Xray does not yet support efficient routines for generalized multi-dimensional
-indexing or regridding. However, we are definitely interested in adding support
-for this in the future (see :issue:`475` for the ongoing discussion).
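To make the meaning of ``'pad'``, ``'backfill'`` and ``'nearest'`` in the moved section concrete, here is a minimal pure-Python sketch. It is not how xray or pandas implement these lookups: ``lookup`` is a hypothetical helper, the tie-breaking rule for ``'nearest'`` is an arbitrary choice, and the index is assumed to be the default integer index 0, 1, 2 of the ``data`` array from the documentation example.

```python
from bisect import bisect_left, bisect_right


def lookup(index, target, method):
    """Return the position in sorted `index` matched to `target`, or -1."""
    if method == 'pad':
        # Last label that is <= target (-1 if target precedes every label).
        return bisect_right(index, target) - 1
    if method == 'backfill':
        # First label that is >= target (-1 if target follows every label).
        pos = bisect_left(index, target)
        return pos if pos < len(index) else -1
    if method == 'nearest':
        left = lookup(index, target, 'pad')
        right = lookup(index, target, 'backfill')
        if left == -1 or right == -1:
            return max(left, right)
        # Ties go to the larger label; this is a choice made for the sketch.
        return left if target - index[left] < index[right] - target else right
    raise ValueError('unknown method %r' % method)


index = [0, 1, 2]   # assumed default index of the example array
values = [1, 2, 3]  # the data from the documentation example

nearest = [values[lookup(index, x, 'nearest')] for x in [1.1, 1.9]]
backfill = values[lookup(index, 0.1, 'backfill')]
pad = [values[lookup(index, x, 'pad')] for x in [0.5, 1, 1.5, 2, 2.5]]

print(nearest)   # [2, 3]
print(backfill)  # 2
print(pad)       # [1, 2, 2, 3, 3]
```

These values agree with the arrays shown in the commented-out verbatim output removed in this hunk.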

doc/whats-new.rst

Lines changed: 2 additions & 1 deletion
@@ -28,7 +28,8 @@ v0.5.3 (unreleased)

 - Variables in netCDF files with multiple missing values are now decoded as NaN
   after issuing a warning if open_dataset is called with mask_and_scale=True.
-
+- We clarified our rules for when the result from an xray operation is a copy
+  vs. a view (see :ref:`copies vs views` for more details).
 - Dataset variables are now written to netCDF files in order of appearance
   when using the netcdf4 backend (:issue:`479`).
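The positional half of the copy-versus-view rules mentioned in this entry follows NumPy, so it can be checked directly on a NumPy array. The snippet below is an illustrative sketch rather than part of the commit; the array and variable names are made up.

```python
import numpy as np

base = np.arange(16).reshape(4, 4)

# Integers and slices only: NumPy returns a view onto the same memory.
view = base[1:3, 0]
# A list (or array) indexer: NumPy returns a copy.
copy = base[[1, 2], 0]

print(np.shares_memory(base, view))  # True
print(np.shares_memory(base, copy))  # False

# Writing through the view changes the original; writing to the copy does not.
view[0] = -1
copy[1] = -2
print(base[1, 0], base[2, 0])  # -1 8
```

This is also why assignment through chained indexing is fragile: whether the write reaches the original depends on which kind of indexer came first.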

xray/core/indexing.py

Lines changed: 8 additions & 2 deletions
@@ -140,6 +140,12 @@ def convert_label_indexer(index, label, index_name='', method=None):
         indexer = index.slice_indexer(_try_get_item(label.start),
                                       _try_get_item(label.stop),
                                       _try_get_item(label.step))
+        if not isinstance(indexer, slice):
+            # unlike pandas, in xray we never want to silently convert a slice
+            # indexer into an array indexer
+            raise KeyError('cannot represent label-based slice indexer for '
+                           'dimension %r with a slice over integer positions; '
+                           'the index is unsorted or non-unique' % index_name)
     else:
         label = np.asarray(label)
         if label.ndim == 0:
@@ -149,8 +155,8 @@ def convert_label_indexer(index, label, index_name='', method=None):
         else:
             indexer = index.get_indexer(label, method=method)
             if np.any(indexer < 0):
-                raise ValueError('not all values found in index %r'
-                                 % index_name)
+                raise KeyError('not all values found in index %r'
+                               % index_name)
     return indexer

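The guard added in the first hunk can be illustrated in isolation. The following is a minimal sketch, not the xray implementation: ``require_slice`` is a hypothetical helper and the dimension name ``'time'`` is invented. It shows only the idea that a positional indexer derived from a label-based slice is accepted when it is itself a ``slice``, and rejected with ``KeyError`` otherwise.

```python
def require_slice(indexer, index_name=''):
    """Hypothetical helper: pass slice objects through, reject anything else."""
    if not isinstance(indexer, slice):
        raise KeyError('cannot represent label-based slice indexer for '
                       'dimension %r with a slice over integer positions'
                       % index_name)
    return indexer


# A positional slice passes through unchanged, so indexing stays a view.
accepted = require_slice(slice(0, 2), 'time')
print(accepted)  # slice(0, 2, None)

# An array-like positional indexer (here a plain list) is rejected.
try:
    require_slice([1, 0, 2], 'time')
    rejected = False
except KeyError:
    rejected = True
print(rejected)  # True
```

Raising instead of falling back to an array indexer is what lets the documentation promise that label-based slicing returns a view.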

xray/test/test_indexing.py

Lines changed: 9 additions & 1 deletion
@@ -70,11 +70,19 @@ def test_orthogonal_indexer(self):
     def test_convert_label_indexer(self):
         # TODO: add tests that aren't just for edge cases
         index = pd.Index([1, 2, 3])
-        with self.assertRaisesRegexp(ValueError, 'not all values found'):
+        with self.assertRaisesRegexp(KeyError, 'not all values found'):
             indexing.convert_label_indexer(index, [0])
         with self.assertRaises(KeyError):
             indexing.convert_label_indexer(index, 0)

+    def test_convert_unsorted_datetime_index_raises(self):
+        index = pd.to_datetime(['2001', '2000', '2002'])
+        with self.assertRaises(KeyError):
+            # pandas will try to convert this into an array indexer. We should
+            # raise instead, so we can be sure the result of indexing with a
+            # slice is always a view.
+            indexing.convert_label_indexer(index, slice('2001', '2002'))
+
     def test_remap_label_indexers(self):
         # TODO: fill in more tests!
         data = Dataset({'x': ('x', [1, 2, 3])})
