feat(python): auto-determine `index`/`columns`/`values` columns in `pivot` if one is left out, deprecate passing arguments positionally #12125

mcrumiller · 2023-10-31T02:07:19Z

THIS IS A BREAKING CHANGE

Resolves #12087

Since index and columns are both required, there's no reason why values can't be optional. To do so requires moving values to the 3rd argument, which is a breaking change unless we wanted to make all arguments keyword args, or have the user supply None manually. It makes sense to me that values should be optional, as I think the most common usage is to use all remaining columns.

MarcoGorelli · 2023-11-05T19:39:15Z

Thanks for working on this

I think this is going to break too much code, especially without a deprecation period. Not totally sure about what to suggest instead though, other than keyword-only args. Maybe they could all be optional, and if you specify 2, the third is implied?

mcrumiller · 2023-11-05T20:49:58Z

@MarcoGorelli no problem. I've moved the values parameter back to the front and made it required, but if the user passes None, all available columns are used.

MarcoGorelli · 2023-11-12T11:17:51Z

Not sure about values=None having to be passed explicitly

Why not

> keyword-only args. Maybe they could all be optional, and if you specify 2, the third is implied?

mcrumiller · 2023-11-12T19:58:05Z

@MarcoGorelli yeah that's perfectly reasonable, just implemented.
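For illustration, the "specify two, infer the third" rule could look roughly like the sketch below. `infer_pivot_args` is a hypothetical helper written for this example, not the code from this PR; it assumes that the omitted argument simply takes every column not named by the other two.

```python
from __future__ import annotations


def _as_list(arg: str | list[str] | None) -> list[str] | None:
    """Normalize a single column name or a list of names to a list."""
    if arg is None:
        return None
    return [arg] if isinstance(arg, str) else list(arg)


def infer_pivot_args(
    all_columns: list[str],
    *,
    index: str | list[str] | None = None,
    columns: str | list[str] | None = None,
    values: str | list[str] | None = None,
) -> dict[str, list[str] | None]:
    """Hypothetical helper: fill in whichever of the three arguments is missing."""
    args = {
        "index": _as_list(index),
        "columns": _as_list(columns),
        "values": _as_list(values),
    }
    missing = [name for name, arg in args.items() if arg is None]
    if len(missing) > 1:
        # With fewer than two arguments there is nothing to infer from.
        raise ValueError(
            f"at least two of index/columns/values are required; missing: {missing}"
        )
    if missing:
        # Assumption: the omitted argument gets all columns not already used.
        used = {col for arg in args.values() if arg is not None for col in arg}
        args[missing[0]] = [col for col in all_columns if col not in used]
    return args
```

With columns `["id", "key", "a", "b"]`, passing `index="id"` and `columns="key"` would leave `values` as `["a", "b"]`, while passing only one argument raises a `ValueError`.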

MarcoGorelli · 2023-11-13T15:40:18Z

You'll need a test for when fewer than 2 are specified, and deprecate_nonkeyword_arguments

mcrumiller · 2023-11-13T18:51:46Z

Hi @MarcoGorelli,

Sorry for not being thorough on this, been busy. The new commit adds a deprecation warning and also an error test for when fewer than 2 of the 3 args are supplied.
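As a rough picture of the two pieces mentioned here, the sketch below uses only the standard library. `warn_on_positional` and the toy `Frame` class are hypothetical stand-ins made up for this example; the decorator used in the polars codebase may behave differently, and the positional order `values, index, columns` is assumed from the discussion above.

```python
import functools
import warnings


def warn_on_positional(func):
    """Hypothetical decorator: warn when any argument besides `self` is positional."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if args:
            warnings.warn(
                f"passing arguments positionally to {func.__name__!r} is "
                "deprecated; use keyword arguments instead",
                DeprecationWarning,
                stacklevel=2,
            )
        return func(self, *args, **kwargs)

    return wrapper


class Frame:
    """Toy stand-in for a DataFrame, only here to exercise the decorator."""

    @warn_on_positional
    def pivot(self, values=None, index=None, columns=None):
        # Error path: fewer than two of the three arguments were given.
        given = sum(arg is not None for arg in (values, index, columns))
        if given < 2:
            raise ValueError("at least two of values/index/columns must be given")
        return (values, index, columns)
```

Existing positional calls keep working during the deprecation period but emit a `DeprecationWarning`; keyword calls are silent, and a call with a single argument raises.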

py-polars/polars/dataframe/frame.py

mcrumiller · 2023-11-14T14:08:24Z

Hi @MarcoGorelli thanks for taking a close look at this and catching my stupid mistakes.

@alexander-beedie I made a very minor change to _expand_selectors where Nones are simply ignored instead of being added as elements to the list output. This doesn't cause any problems in any tests but I wanted your input on this since that's your domain.
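The behavior change described here can be pictured with the small sketch below. `flatten_skipping_none` is a hypothetical stand-in written for this example; the real `_expand_selectors` also resolves selector objects against the frame, which this sketch does not attempt.

```python
from __future__ import annotations

from typing import Any


def flatten_skipping_none(*items: Any) -> list[Any]:
    """Hypothetical stand-in: flatten one level of lists/tuples, dropping None."""
    out: list[Any] = []
    for item in items:
        if item is None:
            # Previously a None would have been appended as an element.
            continue
        if isinstance(item, (list, tuple)):
            out.extend(x for x in item if x is not None)
        else:
            out.append(item)
    return out
```

Under this assumption, an omitted (None) pivot argument contributes nothing to the expanded column list instead of showing up as a `None` entry.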

MarcoGorelli

Personally, I'm on-board with the suggestion - it's not actually breaking, as there would be a deprecation period, and the arguments would become keyword-only (so, nobody's code would silently produce something different)

If I read df.pivot("a", "b", "c"), I have no idea what to expect (esp. as the order of arguments differs from pandas.DataFrame.pivot!), so I welcome keyword-only args

And determining the third argument from the 2 specified ones seems like a good ergonomic improvement

So, I'll approve, but this is pending:

- OK from at least @stinodego on the API decision
- OK from @alexander-beedie on the selectors change

py-polars/tests/unit/operations/test_pivot.py

mcrumiller · 2024-01-15T14:09:34Z

@stinodego this one has been sitting around for a while. Are you still on-board with this one and, if so, do you have any suggested changes?

MarcoGorelli · 2024-01-25T13:58:47Z

Looks like there's not much appetite for this one, and on the pandas side I've seen complaints that keyword-only arguments here are cumbersome (pandas-dev/pandas#51359 (comment))

I'd suggest closing for now then, it can always be reconsidered in the future

stinodego · 2024-01-25T14:02:04Z

I admit I haven't really looked at this yet. First I'd have to figure out what pivot does exactly 😄 but I can come back to this soon.

mcrumiller · 2024-01-25T15:57:11Z

> First I'd have to figure out what pivot does exactly 😄

Pivot is a fancier unstack operation with aggregation, if that helps. I think the documentation could be improved with better column names that align directly with the keywords, to show what's happening. Here is my improved example, which I think adds clarity:

import polars as pl

df = pl.DataFrame({
    # The unique values of this column determine the output "index" column
    # Think of this as the "group_by" column
    "index": ["one", "one", "two", "two", "three", "three",],

    # Each unique value becomes a new output column, so our output columns
    # will be "x", "y", and "z"
    "columns1": ["x", "x", "x", "y", "y", "y"],
    "columns2": ["z", "z", "z", "z", "z", "z"],

    # The values in this column must be aggregated somehow
    "values": [1, 2, 3, 4, 5, 6],
})

# df:
# shape: (6, 4)
# ┌───────┬──────────┬──────────┬────────┐
# │ index ┆ columns1 ┆ columns2 ┆ values │
# │ ---   ┆ ---      ┆ ---      ┆ ---    │
# │ str   ┆ str      ┆ str      ┆ i64    │
# ╞═══════╪══════════╪══════════╪════════╡
# │ one   ┆ x        ┆ z        ┆ 1      │
# │ one   ┆ x        ┆ z        ┆ 2      │
# │ two   ┆ x        ┆ z        ┆ 3      │
# │ two   ┆ y        ┆ z        ┆ 4      │
# │ three ┆ y        ┆ z        ┆ 5      │
# │ three ┆ y        ┆ z        ┆ 6      │
# └───────┴──────────┴──────────┴────────┘

out = df.pivot(
    values="values",
    columns=["columns1", "columns2"],
    index=["index"],
    aggregate_function="sum",
)

# out:
# shape: (3, 4)
# ┌───────┬──────┬──────┬─────┐
# │ index ┆ x    ┆ y    ┆ z   │
# │ ---   ┆ ---  ┆ ---  ┆ --- │
# │ str   ┆ i64  ┆ i64  ┆ i64 │
# ╞═══════╪══════╪══════╪═════╡
# │ one   ┆ 3*   ┆ null ┆ 3   │   * aggregation where index = "one" and columns1 = "x"
# │ two   ┆ 3    ┆ 4*   ┆ 7   │   * aggregation where index = "two" and columns1 = "y"
# │ three ┆ null ┆ 11   ┆ 11* │   * aggregation where index = "three" and columns2 = "z"
# └───────┴──────┴──────┴─────┘
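To make the "group, then aggregate into new columns" idea concrete without polars, here is a minimal pure-Python sketch. `pivot_sum` is a hypothetical helper for this illustration only; it handles a single `columns` column (the `columns1` values from the example above) and always sums.

```python
from collections import defaultdict

# (index, columns1, values) triples taken from the example frame above.
rows = [
    ("one", "x", 1),
    ("one", "x", 2),
    ("two", "x", 3),
    ("two", "y", 4),
    ("three", "y", 5),
    ("three", "y", 6),
]


def pivot_sum(rows):
    """Group by (index, column) and sum the values, like a tiny pivot."""
    sums = defaultdict(int)
    index_names = []   # output rows, in order of first appearance
    column_names = []  # output columns, in order of first appearance
    for index, column, value in rows:
        sums[(index, column)] += value
        if index not in index_names:
            index_names.append(index)
        if column not in column_names:
            column_names.append(column)
    # Missing (index, column) combinations become None, mirroring the nulls.
    return {
        index: {col: sums.get((index, col)) for col in column_names}
        for index in index_names
    }
```

For the rows above, `pivot_sum(rows)` gives `{"one": {"x": 3, "y": None}, "two": {"x": 3, "y": 4}, "three": {"x": None, "y": 11}}`, which matches the `x` and `y` columns of the output table.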

FYI this just led me to find a bug, whereby pivot can create a dataframe with duplicate column names (#13994).

Make values optional

fc55cd4

mcrumiller requested review from ritchie46, stinodego, alexander-beedie and MarcoGorelli as code owners October 31, 2023 02:07

github-actions bot added the enhancement and python labels Oct 31, 2023

stinodego changed the title ~~feat(python): make values optional parameter to pivot~~ feat(python)!: make values optional parameter to pivot Oct 31, 2023

github-actions bot added the breaking label Oct 31, 2023

Update as_dict positions as per pola-rs#12131

3c7d950

mcrumiller added 2 commits November 5, 2023 15:22

Move values back to 1st arg, make required

defe957

Merge branch 'main' into pivot-no-values

d56f537

Fix arg order to missed test

5e72f99

mcrumiller added 2 commits November 12, 2023 14:26

Merge branch 'main' into pivot-no-values

ebd26b5

Make args keyword and only require two of three

016bc4e

mcrumiller changed the title ~~feat(python)!: make values optional parameter to pivot~~ feat(python)!: auto-determine index/columns/values columns in pivot if one is left out Nov 12, 2023

Add nonkeyword deprecation and neg unit tests

3151323

MarcoGorelli reviewed Nov 14, 2023

py-polars/polars/dataframe/frame.py (outdated, resolved)

Simplify 3-arg check

d8da733

MarcoGorelli approved these changes Dec 4, 2023

MarcoGorelli reviewed Dec 4, 2023

py-polars/tests/unit/operations/test_pivot.py (outdated, resolved)

MarcoGorelli changed the title ~~feat(python)!: auto-determine index/columns/values columns in pivot if one is left out~~ feat(python): auto-determine index/columns/values columns in pivot if one is left out, make arguments keyword-only Dec 4, 2023

Revert old test parameter order

2e55f74

mcrumiller force-pushed the pivot-no-values branch from 27fd0a3 to 2e55f74 Compare December 4, 2023 17:52

mcrumiller requested a review from c-peters as a code owner January 15, 2024 18:48

mcrumiller closed this Feb 11, 2024
