ENH(string dtype): Implement cumsum for Python-backed strings #60938

rhshadrach · 2025-02-15T23:06:45Z

Follow-up on #60633

jorisvandenbossche

Looks good, thanks!

jorisvandenbossche · 2025-02-17T08:41:16Z

pandas/core/arrays/string_.py

+                    # We can retain the running min/max by forward/backward filling.
+                    ndarray = ndarray.copy()
+                    missing.pad_or_backfill_inplace(
+                        ndarray.T,


Is the .T needed? (I would think that ndarray is 1D)
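The question above turns on how NumPy treats a transpose of a 1D array. A minimal sketch, using a made-up object array rather than the PR's actual `ndarray`, shows that `.T` on a 1D array is a no-op view; whether `missing.pad_or_backfill_inplace` expects the transposed form for other reasons is not established here.

```python
import numpy as np

# Illustrative data only; not taken from the PR.
values = np.array(["b", None, "a"], dtype=object)

# Transposing a 1D array returns a view with the same shape and memory.
transposed = values.T
print(transposed.shape == values.shape)      # same shape for 1D input
print(np.shares_memory(transposed, values))  # no copy is made

# For comparison, a 2D array does have its axes swapped.
block = values.reshape(1, 3)
print(block.T.shape)
```

So if `ndarray` is always 1D at this point, dropping `.T` should not change behavior.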

jorisvandenbossche · 2025-02-17T08:43:36Z

pandas/core/arrays/string_.py

+                # the first NA value onward.
+                idx = np.argmax(na_mask)
+                tail = np.empty(len(ndarray) - idx, dtype="object")
+                tail[:] = np.nan


Suggested change

-                tail[:] = np.nan
+                tail[:] = self.dtype.na_value

So we directly fill it with the appropriate NA value (although I assume the constructor would fix it up anyway)
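The snippet under review can be sketched outside pandas as follows. This is a simplified illustration, not the PR's implementation: `cumsum_until_first_na` and its `na_value` parameter are hypothetical names, and the missing-value check is a stand-in for whatever mask the real code uses. It concatenates strings cumulatively up to the first missing entry and fills everything from that entry onward with the chosen NA value.

```python
import itertools

import numpy as np


def cumsum_until_first_na(values, na_value=np.nan):
    """Hypothetical helper: cumulative string concatenation, NA from the first NA onward."""
    arr = np.asarray(values, dtype=object)
    # Treat None and NaN (the only value not equal to itself) as missing.
    na_mask = np.array([v is None or v != v for v in arr], dtype=bool)

    tail = None
    if na_mask.any():
        idx = int(np.argmax(na_mask))  # position of the first missing value
        tail = np.empty(len(arr) - idx, dtype=object)
        tail[:] = na_value  # fill directly with the intended NA marker
        arr = arr[:idx]

    head = np.empty(len(arr), dtype=object)
    head[:] = list(itertools.accumulate(arr))  # "a", "a" + "b", ...
    return head if tail is None else np.hstack((head, tail))


result = cumsum_until_first_na(["a", "b", np.nan, "c"])
print(result)
```

Passing the dtype's own NA value as `na_value` mirrors the suggestion: the tail then already holds the right marker instead of relying on a later constructor to normalize it.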

jorisvandenbossche · 2025-02-17T08:45:30Z

pandas/core/arrays/string_.py

+        if tail is not None:
+            np_result = np.hstack((np_result, tail))
+        elif na_mask is not None:
+            np_result = np.where(na_mask, np.nan, np_result)


Suggested change

-            np_result = np.where(na_mask, np.nan, np_result)
+            np_result = np.where(na_mask, self.dtype.na_value, np_result)
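For the branch that masks individual positions rather than a whole tail, a rough sketch might look like the code below. It assumes missing entries are temporarily replaced with an empty string before accumulating, which is an assumption of this example and not confirmed by the diff; `cumsum_skipping_na` and `na_value` are hypothetical names.

```python
import itertools

import numpy as np


def cumsum_skipping_na(values, na_value=np.nan):
    """Hypothetical helper: accumulate strings, keeping NA only where the input was NA."""
    arr = np.asarray(values, dtype=object)
    na_mask = np.array([v is None or v != v for v in arr], dtype=bool)
    # Replace missing entries with "" so they do not affect the running concatenation.
    filled = np.where(na_mask, "", arr)
    running = np.empty(len(arr), dtype=object)
    running[:] = list(itertools.accumulate(filled))
    # Put the missing-value marker back at the original NA positions.
    return np.where(na_mask, na_value, running)


result = cumsum_skipping_na(["a", np.nan, "b"])
print(result)
```

Because `np.where` inserts whatever scalar it is given, passing the dtype's NA value keeps the intermediate array consistent with the dtype, which is the point of the suggested change.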

mroeschke · 2025-02-19T02:35:49Z

Thanks @rhshadrach

lumberbot-app · 2025-02-19T02:36:08Z

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict; please backport manually. Here are approximate instructions:

Checkout backport branch and update it.

git checkout 2.3.x
git pull

Cherry-pick the first parent branch of this PR on top of the older branch:

git cherry-pick -x -m1 4e20195086e5cdd5bde56da7d95cf672b795b32e

You will likely have some merge/cherry-pick conflicts here; fix them and commit:

git commit -am 'Backport PR #60938: ENH(string dtype): Implement cumsum for Python-backed strings'

Push to a named branch:

git push YOURFORK 2.3.x:auto-backport-of-pr-60938-on-2.3.x

Create a PR against branch 2.3.x, I would have named this PR:

"Backport PR #60938 on branch 2.3.x (ENH(string dtype): Implement cumsum for Python-backed strings)"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

rhshadrach added 4 commits February 15, 2025 18:06

ENH(string dtype): Implement cumsum for Python-backed strings

51b363e

cleanups

188f92e

cleanups

c04969e

type-hint fixup

7e116ef

rhshadrach requested review from jorisvandenbossche and WillAyd February 16, 2025 17:46

rhshadrach added 2 commits February 16, 2025 13:16

More type fixes

0629fcf

Use quotes for cast

d0f6673

jorisvandenbossche added this to the 2.3 milestone Feb 17, 2025

jorisvandenbossche added the Strings String extension data type and string data label Feb 17, 2025

jorisvandenbossche reviewed Feb 17, 2025


rhshadrach added 2 commits February 17, 2025 16:21

Refinements

7f9571d

type-ignore

2fd9779

WillAyd approved these changes Feb 18, 2025


Merge branch 'main' into enh_cumsum_for_np_str

85093a6

mroeschke approved these changes Feb 19, 2025


mroeschke merged commit 4e20195 into pandas-dev:main Feb 19, 2025
42 checks passed

lumberbot-app bot added the Still Needs Manual Backport label Feb 19, 2025
