Fix `to_pandas` writable bug for `datetime` and `timedelta` types #17913

galipremsagar · 2025-02-04T17:44:31Z

Description

This PR fixes the writable flag on the NumPy arrays backing the objects returned by `to_pandas`. The bug is specific to the `datetime` and `timedelta` types.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

mroeschke · 2025-02-04T21:37:02Z

Curious how #17743 as a bug affects subsequent operations (i.e. is this a bug we really need to worry about?)

pandas.Index is not supposed to be publicly mutable, so the underlying numpy array not being writable doesn't seem that big of a deal, especially since the fix requires making a CPU copy of the data

galipremsagar · 2025-02-04T22:50:44Z

> Curious how #17743 as a bug affects subsequent operations (i.e. is this a bug we really need to worry about?)
>
> pandas.Index is not supposed to be publicly mutable, so the underlying numpy array not being writable doesn't seem that big of a deal, especially since the fix requires making a CPU copy of the data

It's not serious. But AFAIK the problem is that `._data` yields a `DatetimeArray`, which is mutated in the pandas pytests, leading to failures in the test suite. Without this fix, we will need to address this for the cudf.pandas case at least. But since users can run into this issue with cudf classic as well, I went ahead and opened a short-term fix until the arrow bug is fixed.

mroeschke · 2025-02-05T01:32:50Z

> But since users can run into this issue with cudf classic

Yeah, so I'm curious what subsequent issue a user would run into with a pandas.Index returned from to_pandas() if the underlying numpy array was not writable (i.e. if no public APIs break when using this Index, I'm tempted to say this is a won't-fix to avoid the CPU copy).

galipremsagar · 2025-02-05T02:04:00Z

> But since users can run into this issue with cudf classic
>
> Yeah, so I'm curious what subsequent issue a user would run into with a pandas.Index returned from to_pandas() if the underlying numpy array was not writable (i.e. if no public APIs break when using this Index, I'm tempted to say this is a won't-fix to avoid the CPU copy).

This is the user-facing case I'm talking about:

In [1]: import pandas as pd

In [2]: i = pd.Index([1, 2, 3], dtype="datetime64[ns]")

In [4]: i.to_numpy()
Out[4]: 
array(['1970-01-01T00:00:00.000000001', '1970-01-01T00:00:00.000000002',
       '1970-01-01T00:00:00.000000003'], dtype='datetime64[ns]')

In [5]: x = i.to_numpy()

In [6]: x[0] = 1

In [7]: x
Out[7]: 
array(['1970-01-01T00:00:00.000000001', '1970-01-01T00:00:00.000000002',
       '1970-01-01T00:00:00.000000003'], dtype='datetime64[ns]')

In [8]: x[0] = 2

In [9]: x
Out[9]: 
array(['1970-01-01T00:00:00.000000002', '1970-01-01T00:00:00.000000002',
       '1970-01-01T00:00:00.000000003'], dtype='datetime64[ns]')

In [10]: import cudf

In [11]: i = cudf.Index([1, 2, 3], dtype="datetime64[ns]")

In [13]: x = i.to_pandas().to_numpy()

In [14]: x
Out[14]: 
array(['1970-01-01T00:00:00.000000001', '1970-01-01T00:00:00.000000002',
       '1970-01-01T00:00:00.000000003'], dtype='datetime64[ns]')

In [15]: x[0] = 2
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[15], line 1
----> 1 x[0] = 2

ValueError: assignment destination is read-only
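
The same failure mode can be reproduced without a GPU. This is only a sketch: it assumes a NumPy array with its `writeable` flag cleared is a fair stand-in for what `to_pandas().to_numpy()` returned before this fix, and it does not show how cudf itself produces the array.

```python
import numpy as np

# Stand-in (assumption) for the read-only array from the traceback above.
x = np.array([1, 2, 3], dtype="datetime64[ns]")
x.setflags(write=False)

failed = False
message = ""
try:
    x[0] = np.datetime64(2, "ns")
except ValueError as err:
    # NumPy refuses in-place writes to a read-only array.
    failed = True
    message = str(err)

# A host-side copy owns fresh memory and is writable again.
y = x.copy()
y[0] = np.datetime64(2, "ns")
```

The copy is what makes the result writable, which is why the cost of an extra CPU copy comes up in the review.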

mroeschke · 2025-02-05T18:32:03Z

python/cudf/cudf/core/column/datetime.py

+        nullable: bool = False,
+        arrow_type: bool = False,
+    ) -> pd.Index:
+        if arrow_type and nullable:


Could we do something like

    if not (arrow_type and nullable):
        return pd.Index(self.data_array_view(mode="read").copy_to_host())
    return super().to_pandas(nullable=nullable, arrow_type=arrow_type)

instead? I think that should avoid the extra CPU copy.

It would, but this doesn't support nulls:

[left]:  TimedeltaIndex([NaT, NaT, NaT, NaT, NaT, NaT, NaT], dtype='timedelta64[ns]', freq=None)
[right]: TimedeltaIndex(['0 days 00:00:00.000000012', '0 days 00:00:00.000000011', '0 days 00:00:00.000000002', '0 days 00:00:00.000002234', '0 days 00:00:00.000002343', '0 days 00:00:00.000023432', '0 days 00:00:00.000023234'], dtype='timedelta64[ns]', freq=None)
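
A rough illustration of why copying only the data buffer loses nulls, assuming (as a simplification) that null positions live in a separate validity mask rather than in the values themselves. The names `raw` and `valid` are hypothetical and do not correspond to cudf internals.

```python
import numpy as np

# Hypothetical raw buffer and validity mask for a timedelta column.
raw = np.array([12, 11, 2], dtype="int64")
valid = np.array([True, False, True])

# Reinterpreting the buffer alone yields values everywhere, nulls included.
without_mask = raw.view("timedelta64[ns]").copy()

# Applying the mask restores NaT at the null positions.
with_mask = without_mask.copy()
with_mask[~valid] = np.timedelta64("NaT")
```

Here `without_mask` has no `NaT` at all, which mirrors the mismatch in the assertion output above.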

mroeschke · 2025-02-05T18:35:22Z

python/cudf/cudf/tests/test_datetime.py

+
+
+def test_writable_numpy_array():
+    gi = cudf.Index([1, 2, 3], dtype="datetime64[ns]")


Could we just do

    gi = cudf.Index(...).to_numpy()
    pi = pd.Index(...).to_numpy()
    assert gi.flags == pi.flags

instead?

The ownable (`OWNDATA`) flag is not the same for both, so we will not be able to make this assertion.

mroeschke

Ah OK. Darn yeah I forgot about to_numpy. Left some other feedback on the implementation
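
A small sketch of the point about flags in this thread: two arrays can agree on `writeable` while differing on `owndata`, so a test may need to compare the single flag it cares about. The view below is only a stand-in for an array that does not own its memory; it is not how cudf or pandas builds theirs.

```python
import numpy as np

owner = np.array([1, 2, 3], dtype="datetime64[ns]")
view = owner.view()  # shares memory with `owner`, so it does not own its data

# The ownership flags differ...
owndata_differs = owner.flags.owndata != view.flags.owndata

# ...while the flag relevant to this PR matches.
writeable_matches = owner.flags.writeable == view.flags.writeable
```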

galipremsagar · 2025-02-11T17:00:24Z

/merge

fix datetime and timedelta ndarray writable bug

50ebf25

galipremsagar added the bug (Something isn't working) and non-breaking (Non-breaking change) labels Feb 4, 2025

galipremsagar self-assigned this Feb 4, 2025

galipremsagar requested a review from a team as a code owner February 4, 2025 17:44

galipremsagar requested review from bdice, mroeschke and vyasr February 4, 2025 17:44

github-actions bot added the Python (Affects Python cuDF API.) label Feb 4, 2025

galipremsagar mentioned this pull request Feb 4, 2025

[BUG] DatetimeArray is not writable in cudf.pandas #17743

Closed

mroeschke reviewed Feb 5, 2025

mroeschke requested changes Feb 5, 2025

galipremsagar requested a review from mroeschke February 6, 2025 17:28

Merge branch 'branch-25.04' into 17743

f15584d

mroeschke approved these changes Feb 11, 2025

galipremsagar added the 5 - Ready to Merge (Testing and reviews complete, ready to merge) label Feb 11, 2025

rapids-bot bot merged commit 18533b2 into rapidsai:branch-25.04 Feb 11, 2025
108 of 109 checks passed
