feat: add `Series|Expr.rank` #1342

FBruzzesi · 2024-11-09T20:41:41Z

What type of PR is this? (check all applicable)

Related issues

Closes [Enh]: Support polars.Expr.rank #1323

Checklist

Code follows style guide (ruff)
Tests added
Documented the changes

If you have comments or can explain your changes, please do so below.

This PR provides initial support for the `rank` method. I am opening it as a draft because of several shortcomings:

pandas:
- there is a (nasty) trick to make it work with nulls/NaNs and nullable dtypes (related: BUG: rank does not respect na_option='keep' for numpy nullable integer dtypes, which should be fixed in pandas 3). I will comment on the trick in the code
- support in the `over` context is available
- support in the `group_by` context is not available
pyarrow:
- does not support the polars default method (namely, `"average"`); therefore, if `rank` is called without specifying another method, it raises an error
- does not implement `rank` in a `group_by` context
- I am using `pyarrow.compute.rank`, which is available but not exposed/documented (?)
dask: does not support ranking at all
polars:
- the `group_by` context always returns an aggregate, in this case a list of ranks, which is fairly useless since it is just an increasing/decreasing range up to the size of the group
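To make the per-backend gaps above concrete, here is a minimal pure-Python sketch of five of polars' ranking methods (`"random"` is left out), with nulls kept as nulls. It only illustrates the semantics; the function name `rank` and its signature are hypothetical and this is not the narwhals or polars implementation. It also shows why `"average"` is the odd one out: it is the only method that can produce non-integer ranks, and for each group of tied values it equals the mean of the `"min"` and `"max"` ranks.

```python
from typing import List, Literal, Optional, Sequence, get_args

RankMethod = Literal["average", "min", "max", "dense", "ordinal"]


def rank(
    values: Sequence[Optional[float]], method: RankMethod = "average"
) -> List[Optional[float]]:
    """Ascending ranks of `values`; None stays None (like na_option="keep")."""
    if method not in get_args(RankMethod):
        raise ValueError(f"unsupported method: {method!r}")
    # Indices of non-null values sorted by value; sorted() is stable, so tied
    # values keep their order of appearance (this is what "ordinal" relies on).
    order = sorted(
        (i for i, v in enumerate(values) if v is not None), key=lambda i: values[i]
    )
    result: List[Optional[float]] = [None] * len(values)
    dense = 0
    start = 0
    while start < len(order):
        # Extend `end` over the run of values tied with order[start].
        end = start
        while end < len(order) and values[order[end]] == values[order[start]]:
            end += 1
        dense += 1
        lo, hi = start + 1, end  # 1-based "min" and "max" rank of this tie group
        for offset, i in enumerate(order[start:end]):
            if method == "average":
                result[i] = (lo + hi) / 2  # mean of the "min" and "max" ranks
            elif method == "min":
                result[i] = lo
            elif method == "max":
                result[i] = hi
            elif method == "dense":
                result[i] = dense
            else:  # "ordinal"
                result[i] = lo + offset
        start = end
    return result
```

For example, `rank([3, None, 1, 3, 2])` returns `[3.5, None, 1.0, 3.5, 2.0]`, while `method="min"` gives `[3, None, 1, 3, 2]`.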

FBruzzesi · 2024-11-09T20:45:33Z

narwhals/_pandas_like/series.py

+            # crazy workaround for the case of `na_option="keep"` and nullable
+            # integer dtypes. This should be supported in pandas > 3.0
+            # https://github.com/pandas-dev/pandas/issues/56976


Here is the workaround.

@MarcoGorelli I was not able to properly use the pandas-like utility function `get_dtype_backend` to figure out the nullable backend. It should not really matter, since with the non-nullable backend a series that contains nulls would not have an integer dtype anyway.
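The actual trick lives in the PR diff and is not reproduced here. As a hedged illustration of the general idea only, one way to sidestep the nullable-integer issue is to rank just the non-null values and realign the result on the original index. The helper name `rank_keep_nulls` is hypothetical, the sketch assumes a unique index, and `method` is passed straight to pandas (which spells polars' `"ordinal"` as `"first"`).

```python
import pandas as pd


def rank_keep_nulls(s: pd.Series, method: str = "average") -> pd.Series:
    """Hypothetical helper: rank a (possibly nullable-dtype) Series, keeping nulls null.

    Only the non-null values take part in the ranking; the result is realigned
    on the original index (assumed unique), so null positions come back as NaN.
    """
    non_null = s[s.notna()]
    # Cast to float64 first so the ranking itself never sees a nullable dtype.
    ranks = non_null.astype("float64").rank(method=method)
    # Realign on the original index: positions that were null become NaN.
    return ranks.reindex(s.index)
```

For `pd.Series([3, None, 1, 3], dtype="Int64")` this gives `[2.5, NaN, 1.0, 2.5]`. The trade-off is that the result is a plain `float64` Series with `NaN` rather than a nullable dtype with `<NA>`.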

tests/expr_and_series/rank_test.py

FBruzzesi · 2024-11-09T20:56:41Z

Hey @adamblake, this is an initial implementation to support rank. In the description I tried to explain all the shortcomings and the challenges I am facing.

FBruzzesi · 2024-12-26T19:28:02Z

I will need to come back to the failing tests: it seems that `rank` with pandas 2.0.x and the pyarrow backend fails for integer dtypes but works fine for floats.
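The commits "fail pandas_pyarrow for pandas < (2,1)" and "xfail int only" suggest the version/dtype condition was encoded in the tests. A minimal sketch of such a condition as a pure predicate might look like this; the function name and its arguments are hypothetical and not the helpers used in the narwhals test suite.

```python
from typing import Tuple


def rank_expected_to_fail(
    pandas_version: Tuple[int, ...], backend: str, dtype: str
) -> bool:
    """Hypothetical predicate: should a pandas rank test be marked xfail?

    Encodes the reported behaviour: on pandas < 2.1 (i.e. 2.0.x) with the
    pyarrow backend, rank fails for integer dtypes but works for floats.
    """
    is_old_pandas = pandas_version < (2, 1)
    is_integer = dtype.lower().startswith(("int", "uint"))
    return is_old_pandas and backend == "pyarrow" and is_integer
```

In a pytest test, a predicate like this could guard a call such as `request.applymarker(pytest.mark.xfail)`, so float cases keep running on pandas 2.0.x.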

MarcoGorelli · 2024-12-29T17:54:11Z

Nice, thanks for sticking with this!

Does it risk being annoying if the default ('average') isn't available in several backends?

Could we not set any default, so the user is 'forced' to specify a method (like we do in quantile)?
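A minimal sketch of the "no default" option: making `method` keyword-only without a default means that forgetting it raises a `TypeError` at call time, rather than silently picking a method that some backend cannot handle. The function `rank_no_default` is hypothetical and pure Python, not the narwhals signature; it computes ranks by counting how many values are smaller or equal, which is quadratic and only meant for illustration.

```python
from typing import List, Literal, Sequence, get_args

RankMethod = Literal["average", "min", "max", "dense", "ordinal"]


def rank_no_default(values: Sequence[float], *, method: RankMethod) -> List[float]:
    """Hypothetical signature: `method` is keyword-only and has no default.

    Omitting `method` is a TypeError at call time, so the user must pick a
    method explicitly instead of silently getting one a backend may not support.
    """
    if method not in get_args(RankMethod):
        raise ValueError(f"`method` must be one of {get_args(RankMethod)}, got {method!r}")
    out: List[float] = []
    for i, x in enumerate(values):
        n_less = sum(v < x for v in values)
        n_equal = sum(v == x for v in values)
        lo, hi = n_less + 1, n_less + n_equal  # "min" and "max" rank of x's tie group
        if method == "average":
            out.append((lo + hi) / 2)
        elif method == "min":
            out.append(lo)
        elif method == "max":
            out.append(hi)
        elif method == "dense":
            out.append(len({v for v in values if v < x}) + 1)
        else:  # "ordinal": ties broken by order of appearance
            out.append(lo + sum(v == x for v in values[:i]))
    return out
```

Calling `rank_no_default([3, 1, 3, 2])` fails immediately, while `rank_no_default([3, 1, 3, 2], method="average")` returns `[3.5, 1.0, 3.5, 2.0]`.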

FBruzzesi · 2024-12-29T19:38:04Z

> Nice, thanks for sticking with this!

Of course! Being able to use it in the `over` context is a great addition for the use case originally reported in the issue!
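The difference between the `over` context and the `group_by` context described in the PR can be sketched in pure Python: a window-style rank returns one value per input row, in the original row order, instead of one list of ranks per group. The helper `rank_over` is hypothetical and only implements the `"average"` method.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def rank_over(values: Sequence[float], keys: Sequence[Hashable]) -> List[float]:
    """Average-rank `values` within each partition of `keys`, window-style.

    Hypothetical helper: unlike a group_by aggregation (one list of ranks per
    group), the result has exactly one rank per input row, in the original order.
    """
    if len(values) != len(keys):
        raise ValueError("values and keys must have the same length")
    # Collect the row positions belonging to each partition.
    partitions: Dict[Hashable, List[int]] = defaultdict(list)
    for row, key in enumerate(keys):
        partitions[key].append(row)
    out = [0.0] * len(values)
    for rows in partitions.values():
        group = [values[r] for r in rows]
        for r in rows:
            n_less = sum(v < values[r] for v in group)
            n_equal = sum(v == values[r] for v in group)
            # "average" rank: mean of min rank (n_less + 1) and max rank (n_less + n_equal)
            out[r] = n_less + (n_equal + 1) / 2
    return out
```

For instance, `rank_over([10, 20, 10, 5, 30], ["a", "a", "b", "b", "a"])` returns `[1.0, 2.0, 2.0, 1.0, 3.0]`. In polars, the same shape comes from a window expression along the lines of `pl.col("v").rank().over("k")`.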

> Does it risk being annoying if the default ('average') isn't available in several backends?

For now this is just pyarrow, right?

> Could we not set any default, so the user is 'forced' to specify a method (like we do in quantile)?

I don't have a strong opinion on this; I am fine with giving more responsibility to the user.

MarcoGorelli · 2024-12-29T19:53:11Z

> For now this is just pyarrow, right?

You're right, sorry; I thought it would be the same for pyarrow-backed pandas. Thinking ahead, I think DuckDB doesn't support it either.


MarcoGorelli

Thanks @FBruzzesi!

FBruzzesi added 6 commits November 6, 2024 16:01

WIP

1e0d4ae

WIP

ebf4321

WIPWIP

e60214d

merge main

ea13f0c

pandas int workaround

cbc13b5

comma?

8b492d5

github-actions bot added the enhancement New feature or request label Nov 9, 2024

FBruzzesi commented Nov 9, 2024


tests/expr_and_series/rank_test.py (outdated, resolved)

FBruzzesi changed the title ~~feat: ass Series|Expr.rank~~ feat: add Series|Expr.rank Nov 10, 2024

FBruzzesi added 8 commits November 10, 2024 10:49

Merge branch 'main' into feat/expr-rank

cafed4b

merge main, test invalid method

4c8cc1b

old pyarrow

ec0f8a7

merge main

95e0bc5

WIP

e8989e3

merge main

bb458f8

support in over context

165507a

fail pandas_pyarrow for pandas < (2,1)

07571c8

FBruzzesi marked this pull request as ready for review December 26, 2024 19:26

FBruzzesi added 3 commits December 27, 2024 08:54

xfail int only

96520ae

fix options in over

6ad961e

forgot a file

5c565a4

FBruzzesi requested a review from MarcoGorelli December 27, 2024 08:03

Merge branch 'main' into feat/expr-rank

b798255

Merge branch 'main' into feat/expr-rank

a83d8cb

FBruzzesi and others added 6 commits December 30, 2024 16:29

merge main and better return docstring

1b550e4

Merge branch 'feat/expr-rank' of https://github.com/narwhals-dev/narwhals into feat/expr-rank

4bbc5a0

docstrings

54c4a81

float(nan) -> None

b68f575

Merge branch 'main' into feat/expr-rank

585d0d6

test eager only for rank

6c72df7

MarcoGorelli approved these changes Jan 7, 2025


FBruzzesi merged commit 5c0a33a into main Jan 7, 2025
24 checks passed

FBruzzesi deleted the feat/expr-rank branch January 7, 2025 09:19
