-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Return the truth values of ne_missing
and eq_missing
operations for struct instead of null
#18930
fix: Return the truth values of ne_missing
and eq_missing
operations for struct instead of null
#18930
Conversation
I saw the failures, will work on a fix. |
ne_missing
and eq_missing
operations for struct instead of null
ne_missing
and eq_missing
operations for struct instead of null
@ritchie46 I found the problem - the root of the problem is not in my code: For instance, this code won't fail: import polars as pl
from polars.testing import assert_frame_equal
df1 = pl.DataFrame({
"a": [None, {"foo": 0, "bar": "1"}],
})
df2 = pl.DataFrame({
"a": [{"foo": 0, "bar": "1"}, {"foo": 0, "bar": "1"}],
})
assert_frame_equal(df1, df2) So although the code in EDIT: |
It looks like a regression bug (took code from my failed test): import polars as pl
dtype = pl.Struct({"foo": pl.Int32, "bar": pl.String})
len = 2
broadcast = (False, False, True)
mask = [False, True]
if_true = [None, {'foo': 0, 'bar': '1'}]
if_false = [{'foo': 0, 'bar': '1'}, {'foo': 0, 'bar': '1'}]
py_mask, py_true, py_false = (
[c[0]] * len if b else c
for b, c in zip(broadcast, [mask, if_true, if_false])
)
pl_mask, pl_true, pl_false = (
c.first() if b else c
for b, c in zip(broadcast, [pl.col.mask, pl.col.if_true, pl.col.if_false])
)
expected = pl.DataFrame(
{"if_true": [t if m else f for m, t, f in zip(py_mask, py_true, py_false)]},
schema={"if_true": dtype},
)
df = pl.DataFrame(
{
"mask": mask,
"if_true": if_true,
"if_false": if_false,
},
schema={"mask": pl.Boolean, "if_true": dtype, "if_false": dtype},
)
ans = df.select(pl.when(pl.col.mask).then(pl.col.if_true).otherwise(pl.col.if_false.first()))
print(expected)
print(ans) On
On
Can someone reproduce? |
To wrap all things up, this is what happend:
I plan tommorow to open an issue of the regression bug, and when it will get fixed (I hope I will have time to), come back to this PR. |
I can reproduce the test behaviour. On 1.8.1 I get On 1.8.2 I get Update: This seems to be where the behaviour changes: #18886 It seems there is a problem once you add another expression e.g. first/last/etc df = pl.DataFrame({"foo": [dict(a=1), dict(a=1)]})
df.select(pl.col.foo == pl.col.foo)
# shape: (2, 1)
# ┌──────┐
# │ foo │
# │ --- │
# │ bool │
# ╞══════╡
# │ true │
# │ true │
# └──────┘ df.select(pl.col.foo == pl.col.foo.last())
# shape: (2, 1)
# ┌───────┐
# │ foo │
# │ --- │
# │ bool │
# ╞═══════╡
# │ false │
# │ false │
# └───────┘ |
@coastalwhite can you take a look? |
…truct_missing_comparision_ops
…truct_missing_comparision_ops
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #18930 +/- ##
==========================================
- Coverage 79.89% 79.88% -0.01%
==========================================
Files 1524 1524
Lines 207655 207661 +6
Branches 2904 2904
==========================================
- Hits 165898 165896 -2
- Misses 41210 41218 +8
Partials 547 547 ☔ View full report in Codecov by Sentry. |
@@ -801,6 +804,7 @@ impl ChunkCompareEq<&StructChunked> for StructChunked { | |||
|l, r| l.equal(r).unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this is not what this PR is trying to solve, but we made a policy that nested equality is always eq_missing. Maybe, this should changed here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think not all types support eq_missing
, so the code reaches panic
.
Maybe it is better to get fixed in separate PR?
@@ -759,6 +759,7 @@ fn struct_helper<F, R>( | |||
op: F, | |||
reduce: R, | |||
value: bool, | |||
apply_null_validity: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this is named is_missing
in other helpers. Maybe that should be here as well. Not really a blocker though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I will rename to is_missing
Thank you @barak1412 and thanks for the review @coastalwhite |
Failure is unrelated. |
@ritchie46 Thanks! (by the way |
Fixes #18870.