feat: add Series|Expr.rank
#1342
# crazy workaround for the case of `na_option="keep"` and nullable
# integer dtypes. This should be supported in pandas > 3.0
# https://github.com/pandas-dev/pandas/issues/56976
Here is the workaround.
@MarcoGorelli I was not able to properly use the pandas-like util function `get_dtype_backend` to figure out the nullable backend. It should not really matter, since the non-nullable backend would not produce an integer dtype if the series contains nulls anyway.
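The idea behind a workaround of this kind can be sketched as follows. This is a minimal illustration and not the code in this PR: the helper name `rank_keep_nulls` is hypothetical, and it assumes the series has a unique index so that `reindex` can put the nulls back in place. It ranks only the non-null values, so the result does not depend on how `na_option="keep"` behaves for nullable integer dtypes.

```python
import pandas as pd


def rank_keep_nulls(
    s: pd.Series, method: str = "average", descending: bool = False
) -> pd.Series:
    """Rank the non-null values of `s` and leave nulls as nulls.

    Hypothetical helper for illustration; assumes `s` has a unique index.
    """
    null_mask = s.isna()
    # Rank only the non-null subset, so no null ever reaches `rank`.
    ranked = s[~null_mask].rank(method=method, ascending=not descending)
    # Reindexing restores the original positions; dropped rows come back as nulls.
    return ranked.reindex(s.index)


s = pd.Series([3, None, 1, 3], dtype="Int64")
print(rank_keep_nulls(s))
```

The two tied values share the average of the positions they occupy, and the null entry stays null instead of receiving a rank.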
Hey @adamblake, this is an initial implementation to support
I will need to come back to the failing tests: it seems that `rank` for pandas 2.0.x with the pyarrow backend fails for integer dtypes, but it works fine for floats.
Nice, thanks for sticking with this! Does it risk being annoying if the default ('average') isn't available in several backends? Could we not set any default, so the user is 'forced' to specify a method (like we do in
Of course! Adding the possibility of using it in
For now this is just pyarrow, right?
I don't have a strong opinion on this; I am fine with giving more responsibility to the user.
You're right, sorry, I thought it would be the same for pyarrow-backed pandas. Thinking ahead, I think duckdb doesn't support it either.
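To make the "no default" option concrete, here is a rough pure-Python sketch of a rank function whose `method` is a required keyword-only argument. It is not the API added in this PR: the function name, the signature, and every method name other than 'average' are assumptions made for illustration. Nulls (`None`) are kept as nulls.

```python
from typing import List, Literal, Optional, Sequence

# Hypothetical set of method names; only "average" is mentioned in this thread.
RankMethod = Literal["average", "min", "max", "dense", "ordinal"]


def rank(
    values: Sequence[Optional[float]],
    *,
    method: RankMethod,  # no default: the caller has to choose explicitly
    descending: bool = False,
) -> List[Optional[float]]:
    # Indices of non-null values, sorted by value (stable for ties).
    order = sorted(
        (i for i, v in enumerate(values) if v is not None),
        key=lambda i: values[i],
        reverse=descending,
    )
    out: List[Optional[float]] = [None] * len(values)
    pos = 0  # 0-based start of the current group of ties
    dense = 0  # number of distinct values seen so far
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        dense += 1
        for offset, i in enumerate(order[pos : end + 1]):
            if method == "average":
                out[i] = (pos + 1 + end + 1) / 2
            elif method == "min":
                out[i] = float(pos + 1)
            elif method == "max":
                out[i] = float(end + 1)
            elif method == "dense":
                out[i] = float(dense)
            elif method == "ordinal":
                out[i] = float(pos + 1 + offset)
            else:
                raise ValueError(f"unsupported method: {method!r}")
        pos = end + 1
    return out


print(rank([3, None, 1, 3], method="average"))  # [2.5, None, 1.0, 2.5]
```

With this shape, calling `rank([3, None, 1, 3])` without `method` raises a `TypeError`, so a backend that lacks a given method fails on an explicit user choice and not on a silent default.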
thanks @FBruzzesi !
What type of PR is this? (check all applicable)
Related issues
Checklist
If you have comments or can explain your changes, please do so below.
This PR provides initial support for the `rank` method. I will start it as a draft due to a bunch of shortcomings:
- `over` context is available
- `group_by` context is not available.
- `pyarrow.compute.rank`, which is available but not exposed/documented (?)