Drop_nulls in group_by creates list now? #12030
Comments
The key point is that for group( If you add another non-None element with
df = pl.DataFrame({
    "id": ["1", "1", "3", "3", "2", "2"],
    "text": ["a", "b", None, "e", "c", "d"],
})
df.group_by("id").agg(
    pl.col("text").drop_nulls().str.concat()
)
┌─────┬──────┐
│ id  ┆ text │
│ --- ┆ ---  │
│ str ┆ str  │
╞═════╪══════╡
│ 2   ┆ c-d  │
│ 1   ┆ a-b  │
│ 3   ┆ e    │
└─────┴──────┘
Hmm, but that's new behavior then. In v0.18.15 it always auto-exploded.
@reswqa
Closing this since it was to be expected due to change in
Ugh, I hope this is still open for discussion :O AFAIK @ritchie46 always wanted a single return type per expression and no "surprises" depending on the data, which makes total sense.
df = pl.DataFrame(
    {
        "id": ["1", "1", "2"],
        "text": ["a", "b", "c"],  # df1
        # "text": ["a", "b", None],  # df2: None instead of "c"
    }
)
df.group_by("id").agg(
    list=pl.col("text").drop_nulls(),
    concat=pl.col("text").drop_nulls().str.concat(),
)
# df1: expected behaviour
┌─────┬────────────┬────────┐
│ id  ┆ list       ┆ concat │
│ --- ┆ ---        ┆ ---    │
│ str ┆ list[str]  ┆ str    │
╞═════╪════════════╪════════╡
│ 1   ┆ ["a", "b"] ┆ a-b    │
│ 2   ┆ ["c"]      ┆ c      │
└─────┴────────────┴────────┘
# df2: whhhyyy? =)
┌─────┬────────────┬───────────┐
│ id  ┆ list       ┆ concat    │
│ --- ┆ ---        ┆ ---       │
│ str ┆ list[str]  ┆ list[str] │  # >>>>> expect "str"
╞═════╪════════════╪═══════════╡
│ 1   ┆ ["a", "b"] ┆ ["a-b"]   │  # >>>>> expect "a-b" like above
│ 2   ┆ []         ┆ []        │  # >>>>> expect null because `str.concat` on empty list is null
└─────┴────────────┴───────────┘
I honestly think this is very dangerous for data transformations because imagine if your following transformation does some
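For reference, the expectation annotated in the df2 table (one string per group, and null when nothing is left after dropping nulls) can be modelled in plain Python. This is only a sketch of the expected semantics, not Polars code: the helper name `expected_concat` is hypothetical, and the "-" delimiter is assumed from the outputs shown in this thread.

```python
from collections import defaultdict


def expected_concat(ids, texts, delimiter="-"):
    """Model of the expected result: one scalar (str or None) per group."""
    groups = defaultdict(list)
    for key, text in zip(ids, texts):
        groups[key]  # register the group even if all of its values are null
        if text is not None:  # the drop_nulls step
            groups[key].append(text)
    # Join what is left; an empty group yields None instead of a list.
    return {
        key: (delimiter.join(values) if values else None)
        for key, values in groups.items()
    }


out_df1 = expected_concat(["1", "1", "2"], ["a", "b", "c"])
out_df2 = expected_concat(["1", "1", "2"], ["a", "b", None])
print(out_df1)  # {'1': 'a-b', '2': 'c'}
print(out_df2)  # {'1': 'a-b', '2': None}
```

In this model the result type per group does not depend on whether the input happens to contain nulls, which is the property asked for above.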
@JulianCologne @reswqa maybe it should also just auto-explode when something is null, then you don't get unexpected results.
No, it's an empty series indeed. This is also the reason why it cannot be exploded now. If you get an aggregated list like
For aggregated list like
@JulianCologne we can close this now, right?
I think we can. Closed as complete in #12066.
Checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Reproducible example
Log output
No response
Issue description
Currently, .drop_nulls() before str.concat creates a list column, but it didn't do this in v0.18.15.
I do like that I can now remove .drop_nulls and that str.concat ignores the nulls while concatenating, so I can get the same and improved behaviour by doing this:
But I don't think it should create a list when you do drop_nulls, but I may be wrong :D
Expected behavior
Don't create a list column but give same output as if you did:
Installed versions