pl.when(...).otherwise(...) fails for predicate involving struct fields #19304

Closed
2 tasks done
nsfinkelstein opened this issue Oct 18, 2024 · 2 comments
Labels: bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Comments

@nsfinkelstein

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

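import polars as pl
df = pl.DataFrame({'json': [{'a': 1}, {'a': None}]})
# True for rows where every field of the struct is null.
condition = pl.all_horizontal(pl.col('json').struct.field(f).is_null() for f in df['json'].struct.fields)
# Intended: null out all-null structs, keep every other row unchanged.
fixed_df = df.with_columns(pl.when(condition).then(None).otherwise(pl.col('json')))
print(fixed_df)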

Log output

shape: (2, 2)
┌───────────┬───────────┐
│ json      ┆ literal   │
│ ---       ┆ ---       │
│ struct[1] ┆ struct[1] │
╞═══════════╪═══════════╡
│ {1}       ┆ null      │
│ {null}    ┆ {null}    │
└───────────┴───────────┘

Issue description

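The output above is inverted: the row {1} becomes null, while the all-null row {null} is left unchanged. Replacing otherwise(...) with a second when(...).then(...) on the explicitly negated condition produces the correct result. The two forms should be equivalent here, but they are not.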

import polars as pl
df = pl.DataFrame({'json': [{'a': 1}, {'a': None}]})
condition = pl.all_horizontal(pl.col('json').struct.field(f).is_null() for f in df['json'].struct.fields)
# Workaround: use a second when/then on the negated condition instead of otherwise.
fixed_df = df.with_columns(pl.when(condition).then(None).when(~condition).then(pl.col('json')))
print(fixed_df)

This yields the correct output.

shape: (2, 2)
┌───────────┬───────────┐
│ json      ┆ literal   │
│ ---       ┆ ---       │
│ struct[1] ┆ struct[1] │
╞═══════════╪═══════════╡
│ {1}       ┆ {1}       │
│ {null}    ┆ null      │
└───────────┴───────────┘

Expected behavior

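The otherwise version should produce the same dataframe as the when/then version shown above: {1} is kept in the first row, and {null} is replaced by null in the second row.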
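
For reference, the expected row-wise semantics can be modeled in plain Python, independent of Polars. This is only a sketch under stated assumptions: dicts stand in for struct values, and the helper names all_fields_null and null_out_empty_structs are hypothetical, not part of the Polars API. It says nothing about how Polars evaluates the expression internally.

```python
from typing import Optional

# Assumption: a plain dict stands in for one struct value of the 'json' column.
Struct = dict


def all_fields_null(row: Struct) -> bool:
    """Row-wise analogue of all_horizontal(field.is_null() for each field)."""
    return all(value is None for value in row.values())


def null_out_empty_structs(rows: list[Struct]) -> list[Optional[Struct]]:
    """Row-wise analogue of when(condition).then(None).otherwise(row)."""
    return [None if all_fields_null(row) else row for row in rows]


rows = [{"a": 1}, {"a": None}]
result = null_out_empty_structs(rows)
print(result)  # [{'a': 1}, None]
```

The first row has a non-null field, so it is kept; the second row has only null fields, so it is replaced by None. That matches the correct dataframe shown above.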

Installed versions

--------Version info---------
Polars:              1.9.0
Index type:          UInt32
Platform:            Linux-5.10.219-208.866.amzn2.x86_64-x86_64-with-glibc2.35
Python:              3.11.9 (main, Sep 23 2024, 17:32:51) [GCC 11.4.0]

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
cloudpickle          3.0.0
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               2024.9.0
gevent               <not installed>
great_tables         0.13.0
matplotlib           3.9.2
nest_asyncio         1.6.0
numpy                1.26.4
openpyxl             <not installed>
pandas               2.2.3
pyarrow              17.0.0
pydantic             2.9.2
pyiceberg            <not installed>
sqlalchemy           2.0.35
torch                2.3.1+cu121
xlsx2csv             <not installed>
xlsxwriter           <not installed>

nsfinkelstein added the labels bug, needs triage, and python on Oct 18, 2024.

@cmdlineluser (Contributor)

I believe this is fixed on main. #19148

>>> fixed_df
shape: (2, 2)
┌───────────┬───────────┐
│ json      ┆ literal   │
│ ---       ┆ ---       │
│ struct[1] ┆ struct[1] │
╞═══════════╪═══════════╡
│ {1}       ┆ {1}       │
│ {null}    ┆ null      │
└───────────┴───────────┘

@ritchie46 (Member)

I will close as this is fixed on main. Will release this weekend.
