-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Support extension array indexes #9671
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ore/variable.py to use any-precision datetime/timedelta with autmatic inferring of resolution
…ocessing, raise now early
…t resolution, fix code and tests to allow this
for more information, see https://pre-commit.ci
…-resolution
… more carefully, for now using pd.Series to covert `OMm` type datetimes/timedeltas (will result in ns precision)
…rray` series creating an extension array when `.array` is accessed
@dcherian Could you give a bit of background into the changes you pushed? I'm not really following.
Did I do something wrong in the PR without knowing it i.e., bypassing the tests? It would be great to understand! |
No you didn't do anything wrong per-se.
|
* main: Fix sparse dask repr test (pydata#10200) Apply ruff preview rule RUF046 (pydata#10199) DOC: Remove mention of netcdf pypi package (pydata#10197)
@@ -102,7 +102,7 @@ def replace_duck_with_extension_array(args) -> list: | |||
return type(self)[type(res)](res) | |||
return res | |||
|
|||
def __array_ufunc__(ufunc, method, *inputs, **kwargs): | |||
def __array_ufunc__(self, ufunc, method, *inputs, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the test failure in test_units
has to do with this implementation for the equals ufunc.
@ilan-gold can you take a look please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The signature is correct https://numpy.org/devdocs/user/basics.subclassing.html
I have seen that error before but I can't remember what it means now, maybe a segfault...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw that, but we have self
included for __array_ufunc__
inside Xarray 🤷🏾♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok! hopefully my fix helps, but it was hard to make heads or tails of what changed because the diff was super dirty (there was a merge from main
or two). please open a PR into mine in the future, would make things easier :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the missing self
in the declared signature is a documentation bug: just a couple of lines later in the example the self
is included (which makes sense because __array_ufunc__
is supposed to be a method)
xarray/core/formatting.py
Outdated
try: | ||
nbytes_str = f" {render_human_readable_nbytes(variable.nbytes)}" | ||
except TypeError: | ||
nbytes_str = " ?" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit ugly, but we'd need to define nbytes
on pandas wrappers, and it breaks many reprs in our tests & doctests :/
@@ -1902,6 +1926,10 @@ def copy(self, deep: bool = True) -> Self: | |||
array = self.array.copy(deep=True) if deep else self.array | |||
return type(self)(array, self._dtype) | |||
|
|||
@property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change breaks a lot of tests. I would do it in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have something for this :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xarray/core/variable.py
Outdated
@@ -192,8 +192,6 @@ def _maybe_wrap_data(data): | |||
if isinstance(data, pd.Index): | |||
return PandasIndexingAdapter(data) | |||
if isinstance(data, pd.api.extensions.ExtensionArray): | |||
if isinstance(data.dtype, pd.Int64Dtype | pd.Float64Dtype | pd.StringDtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This casting was intentional. We'd like these dtypes to be converted to numpy arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I had lost track of who did this or not, I thought I checked the blame but must not have. Could you explain a bit why so I can add a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its mostly backwards compatibility, and we aren't sure how to handle the possibility of both numpy and pandas handling very similar dtypes. So we are opting to not change behaviour
I think the failing test is coming from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unless we want to special-case it in .data
or anywhere else that gives access to the underlying array I don't think there's a way around exposing the extension array wrapper (I think the special-casing would make the code a bit more complicated, though). I believe that's fine, though, since that would just be a special kind of array that is compatible with the array API.
@@ -102,7 +102,7 @@ def replace_duck_with_extension_array(args) -> list: | |||
return type(self)[type(res)](res) | |||
return res | |||
|
|||
def __array_ufunc__(ufunc, method, *inputs, **kwargs): | |||
def __array_ufunc__(self, ufunc, method, *inputs, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the missing self
in the declared signature is a documentation bug: just a couple of lines later in the example the self
is included (which makes sense because __array_ufunc__
is supposed to be a method)
…y into ig/fix_extension_indexer
It duck types but is not array-api compatible. That being said I would not be opposed to removing the visible wrapper from |
@keewis Excellent suggestion: ilan-gold#1. It seems to be crazy straightforward |
Identical to kmuehlbauer#1 - probably not very helpful in terms of changes since https://github.com/kmuehlbauer/xarray/tree/any-time-resolution-2 contains most of it....
whats-new.rst
api.rst