From c46ee529cd3356c2dee04b8d3079251ed1fa46ad Mon Sep 17 00:00:00 2001
From: Keewis
Date: Fri, 23 Oct 2020 23:16:00 +0200
Subject: [PATCH 01/20] document the missing features of duck array integration

---
 doc/duckarrays.rst | 36 ++++++++++++++++++++++++++++++++++++
 doc/internals.rst  |  3 +++
 2 files changed, 39 insertions(+)
 create mode 100644 doc/duckarrays.rst

diff --git a/doc/duckarrays.rst b/doc/duckarrays.rst
new file mode 100644
index 00000000000..325ed1daf34
--- /dev/null
+++ b/doc/duckarrays.rst
@@ -0,0 +1,36 @@
+.. currentmodule:: xarray
+
+Duck arrays
+===========
+This is a high-level overview, for the technical details of integrating duck arrays with
+``xarray``, see :ref:`internals.duck_arrays`.
+
+Missing features
+----------------
+Most of the API does support duck arrays, but there are a areas where the code
+will still cast to ``numpy`` arrays:
+
+- dimension coordinates, and thus all indexing operations:
+
+  * :py:meth:`Dataset.sel` and :py:meth:`DataArray.sel`
+  * :py:meth:`Dataset.loc` and :py:meth:`DataArray.loc`
+  * :py:meth:`Dataset.drop_sel` and :py:meth:`DataArray.drop_sel`
+  * :py:meth:`Dataset.reindex`, :py:meth:`Dataset.reindex_like`,
+    :py:meth:`DataArray.reindex` and :py:meth:`DataArray.reindex_like`: duck arrays in
+    data variables and non-dimension coordinates won't be casted
+
+- functions and methods that depend on external libraries or features of ``numpy`` not
+  covered by ``__array_function__`` / ``__array_ufunc__``:
+
+  * :py:meth:`Dataset.ffill` and :py:meth:`DataArray.ffill` (uses ``bottleneck``)
+  * :py:meth:`Dataset.bfill` and :py:meth:`DataArray.bfill` (uses ``bottleneck``)
+  * :py:meth:`Dataset.interp`, :py:meth:`Dataset.interp_like`,
+    :py:meth:`DataArray.interp` and :py:meth:`DataArray.interp_like` (uses ``scipy``):
+    duck arrays in data variables and non-dimension coordinates will be casted in
+    addition to not supporting duck arrays in dimension coordinates
+  * :py:meth:`Dataset.rolling_exp` and :py:meth:`DataArray.rolling_exp` (uses
+    ``numbagg``)
+  * :py:meth:`Dataset.rolling` and :py:meth:`DataArray.rolling` (uses internal functions
+    of ``numpy``)
+  * :py:meth:`Dataset.interpolate_na` and :py:meth:`DataArray.interpolate_na` (uses
+    :py:func:`numpy.vectorize`)
\ No newline at end of file
diff --git a/doc/internals.rst b/doc/internals.rst
index aa9e1dedc68..a2dd26c3370 100644
--- a/doc/internals.rst
+++ b/doc/internals.rst
@@ -42,6 +42,9 @@ xarray objects via the (readonly) :py:attr:`Dataset.variables
 ` and
 :py:attr:`DataArray.variable ` attributes.
 
+
+.. _duck_arrays:
+
 Duck arrays
 -----------
 
From 516b25babe11225c51496517eb83870f9830792c Mon Sep 17 00:00:00 2001
From: Keewis
Date: Sat, 24 Oct 2020 00:13:53 +0200
Subject: [PATCH 02/20] add a list of extension libraries

---
 doc/duckarrays.rst | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/doc/duckarrays.rst b/doc/duckarrays.rst
index 325ed1daf34..c1c4a515dc8 100644
--- a/doc/duckarrays.rst
+++ b/doc/duckarrays.rst
@@ -33,4 +33,12 @@ will still cast to ``numpy`` arrays:
   * :py:meth:`Dataset.rolling` and :py:meth:`DataArray.rolling` (uses internal functions
     of ``numpy``)
   * :py:meth:`Dataset.interpolate_na` and :py:meth:`DataArray.interpolate_na` (uses
-    :py:func:`numpy.vectorize`)
\ No newline at end of file
+    :py:func:`numpy.vectorize`)
+
+
+Extensions using duck arrays
+----------------------------
+Here's a list of libraries extending ``xarray`` to make working with wrapped duck arrays
+easier:
+
+- `pint-xarray `_
From f2edbca8b439a8c6c900f2ff7f110f098f743019 Mon Sep 17 00:00:00 2001
From: Keewis
Date: Sat, 24 Oct 2020 00:14:13 +0200
Subject: [PATCH 03/20] some rewording

---
 doc/duckarrays.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/duckarrays.rst b/doc/duckarrays.rst
index c1c4a515dc8..10df45ae015 100644
--- a/doc/duckarrays.rst
+++ b/doc/duckarrays.rst
@@ -3,11 +3,11 @@
 Duck arrays
 ===========
 This is a high-level overview, for the technical details of integrating duck arrays with
-``xarray``, see :ref:`internals.duck_arrays`.
+``xarray`` see :ref:`internals.duck_arrays`.
 
 Missing features
 ----------------
-Most of the API does support duck arrays, but there are a areas where the code
+Most of the API does support duck arrays, but there are a few areas where the code
 will still cast to ``numpy`` arrays:
 
 - dimension coordinates, and thus all indexing operations:
From 8aa499182639d4ccdfc9fcdbdcc48db900bd2c39 Mon Sep 17 00:00:00 2001
From: Keewis
Date: Sat, 24 Oct 2020 00:15:04 +0200
Subject: [PATCH 04/20] include in the toctree

---
 doc/index.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/doc/index.rst b/doc/index.rst
index e3cbb331285..ee44d0ad4d9 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -60,6 +60,7 @@ Documentation
 * :doc:`io`
 * :doc:`dask`
 * :doc:`plotting`
+* :doc:`duckarrays`
 
 .. toctree::
    :maxdepth: 1
@@ -80,6 +81,7 @@ Documentation
    io
    dask
    plotting
+   duckarrays
 
 **Help & reference**
 
From 3c37a2d2ae90ccf452d281589699065735a8a121 Mon Sep 17 00:00:00 2001
From: Keewis
Date: Sat, 24 Oct 2020 00:36:34 +0200
Subject: [PATCH 05/20] rename the label

---
 doc/internals.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/internals.rst b/doc/internals.rst
index a2dd26c3370..ef2bd46915e 100644
--- a/doc/internals.rst
+++ b/doc/internals.rst
@@ -43,7 +43,7 @@ xarray objects via the (readonly) :py:attr:`Dataset.variables
 :py:attr:`DataArray.variable ` attributes.
 
 
-.. _duck_arrays:
+.. _internals.duck_arrays:
 
 Duck arrays
 -----------
From 483566b537e16b2920166826a69fc2a7c47a1594 Mon Sep 17 00:00:00 2001
From: Keewis
Date: Sat, 24 Oct 2020 00:55:36 +0200
Subject: [PATCH 06/20] change the heading

---
 doc/duckarrays.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/duckarrays.rst b/doc/duckarrays.rst
index 10df45ae015..e68ec1f79bb 100644
--- a/doc/duckarrays.rst
+++ b/doc/duckarrays.rst
@@ -1,7 +1,7 @@
 .. currentmodule:: xarray
 
-Duck arrays
-===========
+Using duck arrays with xarray
+=============================
 This is a high-level overview, for the technical details of integrating duck arrays with
 ``xarray`` see :ref:`internals.duck_arrays`.
 
From 710f8c6d67572c1be448fbfdbc4d12455f29721a Mon Sep 17 00:00:00 2001
From: Keewis
Date: Sat, 24 Oct 2020 00:55:52 +0200
Subject: [PATCH 07/20] properly reference numpy.vectorize

---
 doc/duckarrays.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/duckarrays.rst b/doc/duckarrays.rst
index e68ec1f79bb..85b840bdeff 100644
--- a/doc/duckarrays.rst
+++ b/doc/duckarrays.rst
@@ -33,7 +33,7 @@ will still cast to ``numpy`` arrays:
   * :py:meth:`Dataset.rolling` and :py:meth:`DataArray.rolling` (uses internal functions
     of ``numpy``)
   * :py:meth:`Dataset.interpolate_na` and :py:meth:`DataArray.interpolate_na` (uses
-    :py:func:`numpy.vectorize`)
+    :py:class:`numpy.vectorize`)
 
 
 Extensions using duck arrays
From 304684b84bd09b44fef751e50354130c4a08bdfa Mon Sep 17 00:00:00 2001
From: Keewis
Date: Sat, 24 Oct 2020 15:07:06 +0200
Subject: [PATCH 08/20] rewrite a few headings

---
 doc/duckarrays.rst | 4 ++--
 doc/internals.rst  | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/duckarrays.rst b/doc/duckarrays.rst
index 85b840bdeff..6605e349e9c 100644
--- a/doc/duckarrays.rst
+++ b/doc/duckarrays.rst
@@ -1,7 +1,7 @@
 .. currentmodule:: xarray
 
-Using duck arrays with xarray
-=============================
+Working with duck arrays
+========================
 This is a high-level overview, for the technical details of integrating duck arrays with
 ``xarray`` see :ref:`internals.duck_arrays`.
 
diff --git a/doc/internals.rst b/doc/internals.rst
index ef2bd46915e..cc6d118ce61 100644
--- a/doc/internals.rst
+++ b/doc/internals.rst
@@ -45,8 +45,8 @@ xarray objects via the (readonly) :py:attr:`Dataset.variables
 
 .. _internals.duck_arrays:
 
-Duck arrays
------------
+Integrating with duck arrays
+----------------------------
 
 .. warning::
 
From 944894018b0faad23b3eb6414179f201e557d8d7 Mon Sep 17 00:00:00 2001
From: Keewis
Date: Sat, 24 Oct 2020 15:07:23 +0200
Subject: [PATCH 09/20] add apply_ufunc with vectorize=True to the unsupported features

---
 doc/duckarrays.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/duckarrays.rst b/doc/duckarrays.rst
index 6605e349e9c..f53a4433c90 100644
--- a/doc/duckarrays.rst
+++ b/doc/duckarrays.rst
@@ -34,6 +34,7 @@ will still cast to ``numpy`` arrays:
     of ``numpy``)
   * :py:meth:`Dataset.interpolate_na` and :py:meth:`DataArray.interpolate_na` (uses
     :py:class:`numpy.vectorize`)
+  * :py:func:`apply_ufunc` with ``vectorize=True`` (uses :py:class:`numpy.vectorize`)
 
 
 Extensions using duck arrays
From 21fefa42de84ecd7c90577e8bdebdd1b6be250e3 Mon Sep 17 00:00:00 2001
From: Keewis
Date: Sat, 24 Oct 2020 15:09:02 +0200
Subject: [PATCH 10/20] update whats-new.rst

---
 doc/whats-new.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/whats-new.rst b/doc/whats-new.rst
index f6f087cce53..b4881949608 100644
--- a/doc/whats-new.rst
+++ b/doc/whats-new.rst
@@ -53,7 +53,8 @@ Bug fixes
 
 Documentation
 ~~~~~~~~~~~~~
-
+- document the API not supported with duck arrays (:pull:`4530`).
+  By `Justus Magin `_.
 
 Internal Changes
 ~~~~~~~~~~~~~~~~
From 7cd5160ec0abd71bd9736921c36a39a10b9e89f5 Mon Sep 17 00:00:00 2001
From: Keewis
Date: Mon, 26 Oct 2020 12:25:35 +0100
Subject: [PATCH 11/20] move the definition of a duck array to the terminology page

---
 doc/duckarrays.rst  | 4 ++--
 doc/internals.rst   | 6 ++----
 doc/terminology.rst | 8 ++++++++
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/doc/duckarrays.rst b/doc/duckarrays.rst
index f53a4433c90..7d7810eca45 100644
--- a/doc/duckarrays.rst
+++ b/doc/duckarrays.rst
@@ -7,8 +7,8 @@ This is a high-level overview, for the technical details of integrating duck arr
 
 Missing features
 ----------------
-Most of the API does support duck arrays, but there are a few areas where the code
-will still cast to ``numpy`` arrays:
+Most of the API does support :term:`duck array` objects, but there are a few areas where
+the code will still cast to ``numpy`` arrays:
 
 - dimension coordinates, and thus all indexing operations:
 
diff --git a/doc/internals.rst b/doc/internals.rst
index cc6d118ce61..9de5e5acd1a 100644
--- a/doc/internals.rst
+++ b/doc/internals.rst
@@ -52,14 +52,14 @@ Integrating with duck arrays
 
     This is a experimental feature.
 
-xarray can wrap custom `duck array`_ objects as long as they define numpy's
+xarray can wrap custom :term:`duck array` objects as long as they define numpy's
 ``shape``, ``dtype`` and ``ndim`` properties and the ``__array__``,
 ``__array_ufunc__`` and ``__array_function__`` methods.
 
 In certain situations (e.g. when printing the collapsed preview of
 variables of a ``Dataset``), xarray will display the repr of a `duck array`_
 in a single line, truncating it to a certain number of characters. If that
-would drop too much information, the `duck array`_ may define a
+would drop too much information, the :term:`duck array` may define a
 ``_repr_inline_`` method that takes ``max_width`` (number of characters) as an
 argument:
 
@@ -74,8 +74,6 @@ argument:
 
     ...
 
-.. _duck array: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
-
 Extending xarray
 ----------------
 
diff --git a/doc/terminology.rst b/doc/terminology.rst
index a85837bafbc..df1c8dae85c 100644
--- a/doc/terminology.rst
+++ b/doc/terminology.rst
@@ -104,3 +104,11 @@ complete examples, please consult the relevant documentation.*
         one, it has 0 dimensions. That means that, e.g., :py:class:`int`,
         :py:class:`float`, and :py:class:`str` objects are "scalar" while
         :py:class:`list` or :py:class:`tuple` are not.
+
+    duck array
+        `Duck arrays `_ are array implementations that behave
+        like numpy arrays. They have to define the ``shape``, ``dtype`` and
+        ``ndim`` properties. For integration with ``xarray``, the ``__array__``,
+        ``__array_ufunc__`` and ``__array_function__`` protocols are also required.
+
+        .. _duck array: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
From e09f507f54d8dfce8affba297885e04af0c98447 Mon Sep 17 00:00:00 2001
From: Keewis
Date: Mon, 26 Oct 2020 12:31:08 +0100
Subject: [PATCH 12/20] use a less technical heading and rewrite the introduction

---
 doc/duckarrays.rst | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/doc/duckarrays.rst b/doc/duckarrays.rst
index 7d7810eca45..99e7b40971d 100644
--- a/doc/duckarrays.rst
+++ b/doc/duckarrays.rst
@@ -1,9 +1,17 @@
 .. currentmodule:: xarray
 
-Working with duck arrays
-========================
-This is a high-level overview, for the technical details of integrating duck arrays with
-``xarray`` see :ref:`internals.duck_arrays`.
+Working with numpy-like arrays
+==============================
+
+.. warning::
+
+    This is a experimental feature, please report any bugs you might find.
+
+Numpy-like arrays (:term:`duck array`) extend the :py:class:`numpy.ndarray` with
+additional features, like propagating physical units or a different layout in memory.
+
+:py:class:`DataArray` and :py:class:`Dataset` objects can wrap these duck arrays, as
+long as they satisfy certain conditions (see :ref:`internals.duck_arrays`).
 
 Missing features
 ----------------
From 63e07f9f9c83740b8dd8116357457c9791fa990e Mon Sep 17 00:00:00 2001
From: Keewis
Date: Mon, 26 Oct 2020 13:25:59 +0100
Subject: [PATCH 13/20] fix a broken link

---
 doc/internals.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/internals.rst b/doc/internals.rst
index 9de5e5acd1a..b1678f00bdd 100644
--- a/doc/internals.rst
+++ b/doc/internals.rst
@@ -57,7 +57,7 @@ xarray can wrap custom :term:`duck array` objects as long as they define numpy's
 ``__array_ufunc__`` and ``__array_function__`` methods.
 
 In certain situations (e.g. when printing the collapsed preview of
-variables of a ``Dataset``), xarray will display the repr of a `duck array`_
+variables of a ``Dataset``), xarray will display the repr of a :term:`duck array`
 in a single line, truncating it to a certain number of characters. If that
 would drop too much information, the :term:`duck array` may define a
 ``_repr_inline_`` method that takes ``max_width`` (number of characters) as an
From f04012e028e86a9f43fd23a3dd336922fffdb288 Mon Sep 17 00:00:00 2001
From: Keewis
Date: Mon, 26 Oct 2020 14:39:04 +0100
Subject: [PATCH 14/20] reword the warning

---
 doc/duckarrays.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/duckarrays.rst b/doc/duckarrays.rst
index 99e7b40971d..bd8c6beeaab 100644
--- a/doc/duckarrays.rst
+++ b/doc/duckarrays.rst
@@ -5,7 +5,8 @@ Working with numpy-like arrays
 
 .. warning::
 
-    This is a experimental feature, please report any bugs you might find.
+    This feature should be considered experimental. Please report any bug you may find on
+    xarray’s github repository.
 
 Numpy-like arrays (:term:`duck array`) extend the :py:class:`numpy.ndarray` with
 additional features, like propagating physical units or a different layout in memory.
From 74e243b7128e777773d6b909b4aee48509fac788 Mon Sep 17 00:00:00 2001
From: Keewis
Date: Mon, 26 Oct 2020 22:23:41 +0100
Subject: [PATCH 15/20] mention that dask is handled differently

---
 doc/duckarrays.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/duckarrays.rst b/doc/duckarrays.rst
index bd8c6beeaab..ed21f521f1e 100644
--- a/doc/duckarrays.rst
+++ b/doc/duckarrays.rst
@@ -14,6 +14,11 @@ additional features, like propagating physical units or a different layout in me
 :py:class:`DataArray` and :py:class:`Dataset` objects can wrap these duck arrays, as
 long as they satisfy certain conditions (see :ref:`internals.duck_arrays`).
 
+.. note::
+
+    For ``dask`` support see :ref:`dask`.
+
+
 Missing features
 ----------------
 Most of the API does support :term:`duck array` objects, but there are a few areas where
From 6af4344162ad355086e31609cf4bb250ae2990cd Mon Sep 17 00:00:00 2001
From: Keewis
Date: Thu, 29 Oct 2020 16:23:07 +0100
Subject: [PATCH 16/20] also note that chunk does not working with some duck arrays

i.e. those which, like pint, are higher in the type hierarchy than dask.
---
 doc/duckarrays.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/doc/duckarrays.rst b/doc/duckarrays.rst
index ed21f521f1e..ff0eb5c221a 100644
--- a/doc/duckarrays.rst
+++ b/doc/duckarrays.rst
@@ -50,6 +50,12 @@ the code will still cast to ``numpy`` arrays:
     :py:class:`numpy.vectorize`)
   * :py:func:`apply_ufunc` with ``vectorize=True`` (uses :py:class:`numpy.vectorize`)
 
+- incompatibilities between different :term:`duck array` libraries:
+
+  * :py:meth:`Dataset.chunk` and :py:meth:`DataArray.chunk`: this fails if the data was
+    not already chunked and the :term:`duck array` should wrap the new ``dask`` array;
+    changing the chunk sizes works.
+
 
 Extensions using duck arrays
 ----------------------------
From c346f96f3afcc04cce92960f790b25461a86cec9 Mon Sep 17 00:00:00 2001
From: Keewis
Date: Sat, 31 Oct 2020 02:01:24 +0100
Subject: [PATCH 17/20] add pint as an example for duck arrays for which chunk fails

---
 doc/duckarrays.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/duckarrays.rst b/doc/duckarrays.rst
index ff0eb5c221a..ba13d5160ae 100644
--- a/doc/duckarrays.rst
+++ b/doc/duckarrays.rst
@@ -53,8 +53,8 @@ the code will still cast to ``numpy`` arrays:
 - incompatibilities between different :term:`duck array` libraries:
 
   * :py:meth:`Dataset.chunk` and :py:meth:`DataArray.chunk`: this fails if the data was
-    not already chunked and the :term:`duck array` should wrap the new ``dask`` array;
-    changing the chunk sizes works.
+    not already chunked and the :term:`duck array` (e.g. a ``pint`` quantity) should
+    wrap the new ``dask`` array; changing the chunk sizes works.
 
 
 Extensions using duck arrays
From 261a4a447008d3158f2b7d8ee9d3637280f2b0e7 Mon Sep 17 00:00:00 2001
From: Keewis
Date: Thu, 19 Nov 2020 12:10:54 +0100
Subject: [PATCH 18/20] rename a link label

---
 doc/terminology.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/terminology.rst b/doc/terminology.rst
index df1c8dae85c..79669e3e4a6 100644
--- a/doc/terminology.rst
+++ b/doc/terminology.rst
@@ -106,9 +106,9 @@ complete examples, please consult the relevant documentation.*
         :py:class:`list` or :py:class:`tuple` are not.
 
     duck array
-        `Duck arrays `_ are array implementations that behave
+        `Duck arrays `_ are array implementations that behave
         like numpy arrays. They have to define the ``shape``, ``dtype`` and
         ``ndim`` properties. For integration with ``xarray``, the ``__array__``,
         ``__array_ufunc__`` and ``__array_function__`` protocols are also required.
 
-        .. _duck array: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
+        .. _duck-array: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
From d8b20e485844ef4aa65125607dcb270cd97a3665 Mon Sep 17 00:00:00 2001
From: Keewis
Date: Thu, 19 Nov 2020 14:14:31 +0100
Subject: [PATCH 19/20] remove the indirection

---
 doc/terminology.rst | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/doc/terminology.rst b/doc/terminology.rst
index 79669e3e4a6..39b0af15990 100644
--- a/doc/terminology.rst
+++ b/doc/terminology.rst
@@ -106,9 +106,10 @@ complete examples, please consult the relevant documentation.*
         :py:class:`list` or :py:class:`tuple` are not.
 
     duck array
-        `Duck arrays `_ are array implementations that behave
+        `Duck arrays`_ are array implementations that behave
         like numpy arrays. They have to define the ``shape``, ``dtype`` and
         ``ndim`` properties. For integration with ``xarray``, the ``__array__``,
         ``__array_ufunc__`` and ``__array_function__`` protocols are also required.
 
-        .. _duck-array: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
+
+.. _Duck arrays: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
From 56228986b6fb65eaf42bce334047e0417acc3691 Mon Sep 17 00:00:00 2001
From: Keewis
Date: Thu, 19 Nov 2020 14:34:20 +0100
Subject: [PATCH 20/20] use the double underscore syntax instead

---
 doc/terminology.rst | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/doc/terminology.rst b/doc/terminology.rst
index 39b0af15990..3cfc211593f 100644
--- a/doc/terminology.rst
+++ b/doc/terminology.rst
@@ -106,10 +106,9 @@ complete examples, please consult the relevant documentation.*
         :py:class:`list` or :py:class:`tuple` are not.
 
     duck array
-        `Duck arrays`_ are array implementations that behave
+        `Duck arrays`__ are array implementations that behave
         like numpy arrays. They have to define the ``shape``, ``dtype`` and
         ``ndim`` properties. For integration with ``xarray``, the ``__array__``,
         ``__array_ufunc__`` and ``__array_function__`` protocols are also required.
-
-.. _Duck arrays: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
+        __ https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html