-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API (string): str.center with pyarrow-backed string dtype #59624
Conversation
I won't argue for ArrowDtype if there is a preference to not do object fallback there, but for StringDtype, we definitely want to fallback to object dtype, I think (as I proposed in #54807 (comment)). Otherwise this gives a breaking change for people with older pyarrow, and inconsistent behaviour depending on the pyarrow version you have installed. While we can very easily avoid this breaking change, so then we should just do that IMO |
Using the
The first explanation that comes to mind is that I misunderstood something and should be passing
|
ee6561b
to
ac3c969
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Added a few small comments
doc/source/whatsnew/v3.0.0.rst
Outdated
@@ -277,6 +277,7 @@ Other API changes | |||
- Made ``dtype`` a required argument in :meth:`ExtensionArray._from_sequence_of_strings` (:issue:`56519`) | |||
- Passing a :class:`Series` input to :func:`json_normalize` will now retain the :class:`Series` :class:`Index`, previously output had a new :class:`RangeIndex` (:issue:`51452`) | |||
- Removed :meth:`Index.sort` which always raised a ``TypeError``. This attribute is not defined and will raise an ``AttributeError`` (:issue:`59283`) | |||
- The ``center`` method on :class:`Series` and :class:`Index` object ``str`` accessors with pyarrow-backed dtype now matches the python behavior in corner cases with an odd number of fill characters when using pyarrow versions 17.0 and above (:issue:`54792`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move to 2.3.0 file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
from pandas import array | ||
|
||
obj_arr = self.astype(object, copy=False) # type: ignore[attr-defined] | ||
obj = array(obj_arr, dtype=object) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could also do obj_arr.array._str_pad(..)
and then don't need this explicit construction?
obj_arr = self.astype(object, copy=False) # type: ignore[attr-defined] | ||
obj = array(obj_arr, dtype=object) | ||
result = obj._str_pad(width, side, fillchar) # type: ignore[attr-defined] | ||
return type(self)._from_sequence(result, dtype=self.dtype) # type: ignore[attr-defined] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how significant this is, but one small performance disadvantage of this instead of type(self)(pa.array(result))
is that _from_sequence
has an (in this case) unnecessary call to lib.ensure_string_array
.
The problem is that we don't know here which pyarrow type to use? But then could do something like pa.array(result, type=self.dtype.pyarrow_dtype)
to preserve the type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im not inclined to worry about this too much since its in a branch that will be going away eventually
ac3c969
to
1f2902d
Compare
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.Two bad options for older-pyarrow: different behavior or object fallback. I guessed that @mroeschke would prefer different behavior here, would be happy to be proved wrong.