-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(python): auto-determine index
/columns
/values
columns in pivot
if one is left out, deprecate passing arguments positionally
#12125
Conversation
values
optional parameter to pivot
values
optional parameter to pivot
Thanks for working on this I think this is going to break too much code, especially without a deprecation period. Not totally sure about what to suggest instead though, other than keyword-only args. Maybe they could all be optional, and if you specify 2, the third is implied? |
@MarcoGorelli no problem. I've moved the |
Not sure about Why not
|
@MarcoGorelli yeah that's perfectly reasonable, just implemented. |
values
optional parameter to pivot
index
/columns
/values
columns in pivot
if one is left out
you'll need test for if less than 2 are specified, and deprecate_nonkeyword_arguments |
Hi @MarcoGorelli, Sorry for not being thorough on this, been busy. New commit adds deprecation warning and also an error test for when < 2/3 args are supplied. |
Hi @MarcoGorelli thanks for taking a close look at this and catching my stupid mistakes. @alexander-beedie I made a very minor change to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I'm on-board with the suggestion - it's not actually breaking, as there would be a deprecation period, and the arguments would become keyword-only (so, nobody's code would silently produce something different)
If I read df.pivot("a", "b", "c")
, I have no idea what to expect (esp. as the order of arguments differs from pandas.DataFrame.pivot
!), so I welcome keyword-only args
And determining the third argument from the 2 specified ones seems like a good ergonomic improvement
So, I'll approve, but this is pending
- OK from at least @stinodego on the API decision
- OK from @alexander-beedie on the selectors change
index
/columns
/values
columns in pivot
if one is left outindex
/columns
/values
columns in pivot
if one is left out, make arguments keyword-only
index
/columns
/values
columns in pivot
if one is left out, make arguments keyword-onlyindex
/columns
/values
columns in pivot
if one is left out, deprecate passing arguments positionally
27fd0a3
to
2e55f74
Compare
@stinodego this one has been sitting around for a while. Are you still on-board with this one and, if so, do you have any suggested changes? |
Looks like there's not much appetite for this one, and on the pandas side I've seen complaints that keyword-only arguments here are cumbersome (pandas-dev/pandas#51359 (comment)) I'd suggest closing for now then, it can always be reconsidered in the future |
I admit I haven't really looked at this yet. First I'd have to figure out what pivot does exactly 😄 but I can come back to this soon. |
Pivot is a fancier import polars as pl
df = pl.DataFrame({
# The unique values of this column determine the output "index" column
# Think of this as the "group_by" column
"index": ["one", "one", "two", "two", "three", "three",],
# Each unique value becomes a new output column, so our output columns
# will be "x", "y", and "z"
"columns1": ["x", "x", "x", "y", "y", "y"],
"columns2": ["z", "z", "z", "z", "z", "z"],
# The values in this column must be aggregated somehow
"values": [1, 2, 3, 4, 5, 6],
})
# df:
# shape: (6, 4)
# ┌───────┬──────────┬──────────┬────────┐
# │ index ┆ columns1 ┆ columns2 ┆ values │
# │ --- ┆ --- ┆ --- ┆ --- │
# │ str ┆ str ┆ str ┆ i64 │
# ╞═══════╪══════════╪══════════╪════════╡
# │ one ┆ x ┆ z ┆ 1 │
# │ one ┆ x ┆ z ┆ 2 │
# │ two ┆ x ┆ z ┆ 3 │
# │ two ┆ y ┆ z ┆ 4 │
# │ three ┆ y ┆ z ┆ 5 │
# │ three ┆ y ┆ z ┆ 6 │
# └───────┴──────────┴──────────┴────────┘
out = df.pivot(
values="values",
columns=["columns1", "columns2"],
index=["index"],
aggregate_function="sum",
)
# out:
# shape: (3, 4)
# ┌───────┬──────┬──────┬─────┐
# │ index ┆ x ┆ y ┆ z │
# │ --- ┆ --- ┆ --- ┆ --- │
# │ str ┆ i64 ┆ i64 ┆ i64 │
# ╞═══════╪══════╪══════╪═════╡
# │ one ┆ 3* ┆ null ┆ 3 │ * aggregation where index = "one" and columns1 = "x"
# │ two ┆ 3 ┆ 4* ┆ 7 │ * aggregation where index = "two" and columns1 = "y"
# │ three ┆ null ┆ 11 ┆ 11* │ * aggregation where index = "three" and columns2 = "z"
# └───────┴──────┴──────┴─────┘ FYI this just led me to find a bug, whereby |
THIS IS A BREAKING CHANGE
Resolves #12087
Since
index
andcolumns
are both required, there's no reason whyvalues
can't be optional. To do so requires movingvalues
to the 3rd argument, which is a breaking change unless we wanted to make all arguments keyword args, or have the user supplyNone
manually. It makes sense to me thatvalues
should be optional, as I think the most common usage is to use all remaining columns.