fix(python): DataFrame `rows_by_key` returning key tuples with elements in wrong order #19486

lukemanley · 2024-10-27T23:44:45Z

Fixes the bug shown below, where `rows_by_key` ignores the requested key order when building key tuples, and simplifies the implementation a bit.

In [1]: import polars as pl

In [2]: df = pl.DataFrame(
   ...:     {
   ...:         "a": ["a", "a"],
   ...:         "b": ["b", "b"],
   ...:         "c": [1, 2],
   ...:     }
   ...: )

In [3]: df.rows_by_key(["a", "b"])
Out[3]: defaultdict(list, {('a', 'b'): [1, 2]})   # <-- key tuple ordered as (a, b) as requested

In [4]: df.rows_by_key(["b", "a"])
Out[4]: defaultdict(list, {('a', 'b'): [1, 2]})   # <-- key tuple ordered as (a, b) rather than (b, a) as requested
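
For reference, here is a minimal pure-Python sketch of the behaviour I would expect after the fix. It is not the polars implementation: `rows_by_key_sketch` is a hypothetical helper that works on a plain dict of columns, and unwrapping a single remaining value column to a scalar is an assumption made only to mirror the output shown above.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Sequence


def rows_by_key_sketch(
    columns: dict[str, list[Any]], key: Sequence[str]
) -> defaultdict[Any, list[Any]]:
    """Group rows by `key`, building each key tuple in the order requested.

    Hypothetical stand-in for illustration; not the polars API.
    """
    key = list(key)
    # Every column that is not part of the key contributes to the row values.
    value_cols = [name for name in columns if name not in key]
    n_rows = len(next(iter(columns.values()), []))

    grouped: defaultdict[Any, list[Any]] = defaultdict(list)
    for i in range(n_rows):
        # Key elements follow the order of `key`, not the frame's column order.
        k = tuple(columns[name][i] for name in key)
        values = tuple(columns[name][i] for name in value_cols)
        # Assumption: a single value column is unwrapped, mirroring Out[3] above.
        grouped[k].append(values[0] if len(values) == 1 else values)
    return grouped


data = {"a": ["a", "a"], "b": ["b", "b"], "c": [1, 2]}
print(dict(rows_by_key_sketch(data, ["a", "b"])))  # {('a', 'b'): [1, 2]}
print(dict(rows_by_key_sketch(data, ["b", "a"])))  # {('b', 'a'): [1, 2]}
```

The second call is the case that was wrong before this PR: the key tuple should come back as `('b', 'a')`, matching the order passed in.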

codecov · 2024-10-28T00:05:03Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.55%. Comparing base (5f11dd9) to head (c378400).
Report is 32 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff             @@
##             main   #19486       +/-   ##
===========================================
+ Coverage   59.87%   79.55%   +19.68%     
===========================================
  Files        1545     1545               
  Lines      213433   213405       -28     
  Branches     2442     2429       -13     
===========================================
+ Hits       127791   169782    +41991     
+ Misses      85092    43075    -42017     
+ Partials      550      548        -2


alexander-beedie · 2024-10-28T15:50:56Z

Thanks; I'll take a look at this one later tonight (or tomorrow).
Looks much simplified indeed 👍

lukemanley · 2024-11-04T23:58:19Z

@alexander-beedie - sorry for the ping, just checking in on this.

alexander-beedie · 2024-11-18T12:38:57Z

Apologies, got distracted and let this slip through the cracks - taking a look now.

alexander-beedie

Tested various parameter permutations on a large-scale frame, and 5 of 7 combinations were faster with this new approach. Really like the simplification you made here.

I might take a look later to see if we can get the two permutations that were marginally slower (~15%) back to the same speed, but as the other 5 parameter combinations were about the same amount faster, this PR looks like a clear win to me (in addition to the correctness fix). Great job 👌

fix DataFrame.rows_by_key returning keys in wrong order

8aaff53

lukemanley requested review from ritchie46, c-peters, alexander-beedie, MarcoGorelli and reswqa as code owners October 27, 2024 23:44

github-actions bot added fix Bug fix python Related to Python Polars rust Related to Rust Polars labels Oct 27, 2024

lukemanley added 2 commits October 28, 2024 05:57

mypy

016196a

formatting

2124a28

alexander-beedie self-assigned this Oct 28, 2024

Merge remote-tracking branch 'upstream/main' into fix-rows-by-key

0d9a5e6

lukemanley added 3 commits November 7, 2024 19:59

Merge remote-tracking branch 'upstream/main' into fix-rows-by-key

c591da0

Merge remote-tracking branch 'upstream/main' into fix-rows-by-key

ab405cd

Merge remote-tracking branch 'upstream/main' into fix-rows-by-key

c378400

alexander-beedie approved these changes Nov 18, 2024


alexander-beedie merged commit f893f75 into pola-rs:main Nov 18, 2024
14 checks passed

alexander-beedie changed the title ~~fix: DataFrame.rows_by_key returning key tuples with elements in wrong order~~ fix(python): DataFrame rows_by_key returning key tuples with elements in wrong order Nov 18, 2024

alexander-beedie removed the rust Related to Rust Polars label Nov 18, 2024

cmdlineluser mentioned this pull request Nov 26, 2024

Add new option in rows_by_key to return scalar values #19994

Open

lukemanley deleted the fix-rows-by-key branch December 6, 2024 11:55
