document the duck array integration status #4530

Merged
merged 22 commits into from
Nov 20, 2020
65 changes: 65 additions & 0 deletions doc/duckarrays.rst
@@ -0,0 +1,65 @@
.. currentmodule:: xarray

Working with numpy-like arrays
==============================

.. warning::

    This feature should be considered experimental. Please report any bugs you find on
    xarray's GitHub repository.

Numpy-like arrays (:term:`duck array`) extend :py:class:`numpy.ndarray` with
additional features, such as propagating physical units or using a different memory layout.

:py:class:`DataArray` and :py:class:`Dataset` objects can wrap these duck arrays, as
long as they satisfy certain conditions (see :ref:`internals.duck_arrays`).
Comment on lines +14 to +15
Collaborator Author

would it make sense to point to the explicitly tested duck arrays here (pint, sparse)? We could also add a user-maintained list of duck array libraries, just like the current "related projects" list.

I did think about adding usage examples, but maybe it's better to leave that to the extension packages?

Contributor

I think a list would be nice.


.. note::

    For ``dask`` support see :ref:`dask`.


Missing features
----------------
Most of the API supports :term:`duck array` objects, but a few areas of the code
still cast to ``numpy`` arrays:

- dimension coordinates, and thus all indexing operations:

  * :py:meth:`Dataset.sel` and :py:meth:`DataArray.sel`
  * :py:attr:`Dataset.loc` and :py:attr:`DataArray.loc`
  * :py:meth:`Dataset.drop_sel` and :py:meth:`DataArray.drop_sel`
  * :py:meth:`Dataset.reindex`, :py:meth:`Dataset.reindex_like`,
    :py:meth:`DataArray.reindex` and :py:meth:`DataArray.reindex_like`: duck arrays in
    data variables and non-dimension coordinates won't be cast

- functions and methods that depend on external libraries or features of ``numpy`` not
  covered by ``__array_function__`` / ``__array_ufunc__``:

  * :py:meth:`Dataset.ffill` and :py:meth:`DataArray.ffill` (uses ``bottleneck``)
  * :py:meth:`Dataset.bfill` and :py:meth:`DataArray.bfill` (uses ``bottleneck``)
  * :py:meth:`Dataset.interp`, :py:meth:`Dataset.interp_like`,
    :py:meth:`DataArray.interp` and :py:meth:`DataArray.interp_like` (uses ``scipy``):
    duck arrays in data variables and non-dimension coordinates will be cast, in
    addition to duck arrays not being supported in dimension coordinates
  * :py:meth:`Dataset.rolling_exp` and :py:meth:`DataArray.rolling_exp` (uses
    ``numbagg``)
  * :py:meth:`Dataset.rolling` and :py:meth:`DataArray.rolling` (uses internal functions
    of ``numpy``)
  * :py:meth:`Dataset.interpolate_na` and :py:meth:`DataArray.interpolate_na` (uses
    :py:class:`numpy.vectorize`)
  * :py:func:`apply_ufunc` with ``vectorize=True`` (uses :py:class:`numpy.vectorize`)

- incompatibilities between different :term:`duck array` libraries:

  * :py:meth:`Dataset.chunk` and :py:meth:`DataArray.chunk`: this fails if the data is
    not already chunked and the :term:`duck array` (e.g. a ``pint`` quantity) would have
    to wrap the new ``dask`` array; changing the chunk sizes of already chunked data works.
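
The difference between operations that dispatch through the protocols and operations
that cast can be illustrated with ``numpy`` alone. The following is a minimal sketch,
not xarray code: ``WrappedArray`` is a hypothetical class invented for this
illustration, and it only handles the simplest cases of each protocol.

```python
import numpy as np


class WrappedArray:
    """Hypothetical minimal duck array wrapping a numpy array (illustration only)."""

    def __init__(self, data):
        self.data = np.asarray(data)

    shape = property(lambda self: self.data.shape)
    dtype = property(lambda self: self.data.dtype)
    ndim = property(lambda self: self.data.ndim)

    def __array__(self, dtype=None, copy=None):
        # Explicit conversion: this is the "cast to numpy" path.
        return np.asarray(self.data, dtype=dtype)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain ufunc calls are handled in this sketch.
        if method != "__call__":
            return NotImplemented
        unwrapped = [x.data if isinstance(x, WrappedArray) else x for x in inputs]
        return WrappedArray(ufunc(*unwrapped, **kwargs))

    def __array_function__(self, func, types, args, kwargs):
        # Unwrap top-level arguments, call numpy, and rewrap array results.
        unwrapped = [x.data if isinstance(x, WrappedArray) else x for x in args]
        result = func(*unwrapped, **kwargs)
        return WrappedArray(result) if isinstance(result, np.ndarray) else result


wrapped = WrappedArray([[1.0, 2.0], [3.0, 4.0]])
added = np.add(wrapped, 1)           # dispatched via __array_ufunc__
averaged = np.mean(wrapped, axis=0)  # dispatched via __array_function__
converted = np.asarray(wrapped)      # not dispatched: falls back to __array__

print(type(added).__name__, type(averaged).__name__, type(converted).__name__)
# prints: WrappedArray WrappedArray ndarray
```

Code paths that convert their input explicitly, as ``np.asarray`` does here, lose the
wrapper in the same way, which is the kind of cast the list above describes.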


Extensions using duck arrays
----------------------------
Here's a list of libraries extending ``xarray`` to make working with wrapped duck arrays
easier:

- `pint-xarray <https://github.com/xarray-contrib/pint-xarray>`_
2 changes: 2 additions & 0 deletions doc/index.rst
@@ -60,6 +60,7 @@ Documentation
* :doc:`io`
* :doc:`dask`
* :doc:`plotting`
* :doc:`duckarrays`

.. toctree::
:maxdepth: 1
@@ -80,6 +81,7 @@ Documentation
io
dask
plotting
duckarrays

**Help & reference**

15 changes: 8 additions & 7 deletions doc/internals.rst
@@ -42,21 +42,24 @@ xarray objects via the (readonly) :py:attr:`Dataset.variables
<xarray.Dataset.variables>` and
:py:attr:`DataArray.variable <xarray.DataArray.variable>` attributes.

Duck arrays
-----------

.. _internals.duck_arrays:

Integrating with duck arrays
----------------------------

.. warning::

    This is an experimental feature.

xarray can wrap custom `duck array`_ objects as long as they define numpy's
xarray can wrap custom :term:`duck array` objects as long as they define numpy's
``shape``, ``dtype`` and ``ndim`` properties and the ``__array__``,
``__array_ufunc__`` and ``__array_function__`` methods.
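
As a rough illustration of these requirements, the attribute check can be sketched
in plain Python. ``looks_like_duck_array`` and ``MinimalDuck`` are hypothetical names
made up for this example; this is not necessarily how xarray performs the check
internally, and it only tests that the names exist, not that they behave correctly.

```python
REQUIRED_ATTRIBUTES = (
    "shape",
    "dtype",
    "ndim",
    "__array__",
    "__array_ufunc__",
    "__array_function__",
)


def looks_like_duck_array(obj):
    """Return True if ``obj`` exposes every attribute named in the text above."""
    return all(hasattr(obj, name) for name in REQUIRED_ATTRIBUTES)


class MinimalDuck:
    """Hypothetical class that defines the required names but no real behavior."""

    shape = (2,)
    dtype = "float64"
    ndim = 1

    def __array__(self, dtype=None, copy=None):
        raise NotImplementedError

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return NotImplemented

    def __array_function__(self, func, types, args, kwargs):
        return NotImplemented


print(looks_like_duck_array(MinimalDuck()))  # prints: True
print(looks_like_duck_array([1.0, 2.0]))     # prints: False
```

A plain list fails the check because it has none of ``shape``, ``dtype`` or ``ndim``.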

In certain situations (e.g. when printing the collapsed preview of
variables of a ``Dataset``), xarray will display the repr of a `duck array`_
variables of a ``Dataset``), xarray will display the repr of a :term:`duck array`
in a single line, truncating it to a certain number of characters. If that
would drop too much information, the `duck array`_ may define a
would drop too much information, the :term:`duck array` may define a
``_repr_inline_`` method that takes ``max_width`` (number of characters) as an
argument:

@@ -71,8 +74,6 @@ argument:

...
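
For illustration, one possible way to honor ``max_width`` is sketched below.
``LabeledValues`` is a hypothetical class that is not part of xarray or of any duck
array library, and the truncation strategy (cut and append an ellipsis) is an
assumption of this sketch rather than something xarray prescribes.

```python
class LabeledValues:
    """Hypothetical array-like used only to show a ``_repr_inline_`` method."""

    def __init__(self, values, label):
        self.values = list(values)
        self.label = label

    def _repr_inline_(self, max_width):
        # Build the full one-line representation first.
        text = f"[{self.label}] " + " ".join(str(v) for v in self.values)
        if len(text) <= max_width:
            return text
        # Otherwise truncate and mark the cut with an ellipsis,
        # never exceeding ``max_width`` characters.
        return text[: max(max_width - 3, 0)] + "..."


short = LabeledValues([1, 2, 3], "m")
long = LabeledValues(range(100), "m")

print(short._repr_inline_(40))  # prints: [m] 1 2 3
print(len(long._repr_inline_(20)))  # prints: 20
```

Putting the label first keeps the most important information visible even when the
values are cut off.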

.. _duck array: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html


Extending xarray
----------------
8 changes: 8 additions & 0 deletions doc/terminology.rst
@@ -104,3 +104,11 @@ complete examples, please consult the relevant documentation.*
one, it has 0 dimensions. That means that, e.g., :py:class:`int`,
:py:class:`float`, and :py:class:`str` objects are "scalar" while
:py:class:`list` or :py:class:`tuple` are not.

duck array
`Duck arrays`__ are array implementations that behave
like numpy arrays. They have to define the ``shape``, ``dtype`` and
``ndim`` properties. For integration with ``xarray``, the ``__array__``,
``__array_ufunc__`` and ``__array_function__`` protocols are also required.

__ https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
2 changes: 2 additions & 0 deletions doc/whats-new.rst
@@ -61,6 +61,8 @@ Bug fixes

Documentation
~~~~~~~~~~~~~
- Document the API not supported with duck arrays (:pull:`4530`).
By `Justus Magin <https://github.com/keewis>`_.

- Update the docstring of :py:class:`DataArray` and :py:class:`Dataset`.
(:pull:`4532`);