-
Notifications
You must be signed in to change notification settings - Fork 928
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix to_pandas
writable bug for datetime
and timedelta
types
#17913
Conversation
Curious how #17743 as a bug affects subsequent operations (i.e. is this a bug we really need to worry about?)
|
It's not serious. But AFAIK it is because |
Yeah so I'm curious what subsequent issue a user would run into if a |
This is the user-facing case I'm talking about: In [1]: import pandas as pd
i
In [2]: i = pd.Index([1, 2, 3], dtype="datetime64[ns]")
In [4]: i.to_numpy()
Out[4]:
array(['1970-01-01T00:00:00.000000001', '1970-01-01T00:00:00.000000002',
'1970-01-01T00:00:00.000000003'], dtype='datetime64[ns]')
In [5]: x = i.to_numpy()
In [6]: x[0] = 1
In [7]: x
Out[7]:
array(['1970-01-01T00:00:00.000000001', '1970-01-01T00:00:00.000000002',
'1970-01-01T00:00:00.000000003'], dtype='datetime64[ns]')
In [8]: x[0] = 2
In [9]: x
Out[9]:
array(['1970-01-01T00:00:00.000000002', '1970-01-01T00:00:00.000000002',
'1970-01-01T00:00:00.000000003'], dtype='datetime64[ns]')
In [10]: import cudf
In [11]: i = cudf.Index([1, 2, 3], dtype="datetime64[ns]")
In [13]: x = i.to_pandas().to_numpy()
In [14]: x
Out[14]:
array(['1970-01-01T00:00:00.000000001', '1970-01-01T00:00:00.000000002',
'1970-01-01T00:00:00.000000003'], dtype='datetime64[ns]')
In [15]: x[0] = 2
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[15], line 1
----> 1 x[0] = 2
ValueError: assignment destination is read-only |
nullable: bool = False, | ||
arrow_type: bool = False, | ||
) -> pd.Index: | ||
if arrow_type and nullable: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we do something like
if not (arrow_type and nullable):
return pd.Index(self.data_array_view(mode="read").copy_to_host())
return super().to_pandas(nullable=nullable, arrow_type=arrow_type)
instead? I think that should avoid the extra CPU copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would but this doesn't support nulls:
[left]: TimedeltaIndex([NaT, NaT, NaT, NaT, NaT, NaT, NaT], dtype='timedelta64[ns]', freq=None)
[right]: TimedeltaIndex(['0 days 00:00:00.000000012', '0 days 00:00:00.000000011',
'0 days 00:00:00.000000002', '0 days 00:00:00.000002234',
'0 days 00:00:00.000002343', '0 days 00:00:00.000023432',
'0 days 00:00:00.000023234'],
dtype='timedelta64[ns]', freq=None)
|
||
|
||
def test_writable_numpy_array(): | ||
gi = cudf.Index([1, 2, 3], dtype="datetime64[ns]") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just do
gi = cudf.Index(...).to_numpy()
pi = pd.Index(...).to_numpy()
assert gi.flags == pi.flags
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the ownable
flag is not the same for both so we will not be able to this assertion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK. Darn yeah I forgot about to_numpy
. Left some other feedback on the implementation
/merge |
18533b2
into
rapidsai:branch-25.04
Description
Fixes: #17743
This PR fixes writable flag for numpy arrays. This bug is only specific to these two types.
Checklist