pl.when produces wrong result, depending on the values in .then(..). Used to work in 1.5.0 #19122

Closed
2 tasks done
Elvynzs opened this issue Oct 7, 2024 · 5 comments · Fixed by #19148
Labels: accepted (Ready for implementation), bug (Something isn't working), P-high (Priority: high), python (Related to Python Polars)

Comments

@Elvynzs

Elvynzs commented Oct 7, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import numpy as np
import polars as pl

df = pl.Series(np.random.randn(100, 2)).arr.to_struct(fields=["real", "imag"])
df = pl.DataFrame(df, ["values"])

# Works as expected: output is equal to input
out_df = df.select(pl.when(pl.col("values").struct.field("real").is_nan()).then(1).otherwise(pl.col("values")).alias("values"))
assert out_df.equals(df)

# Wrong output: output is full of nulls
out_df = df.select(pl.when(pl.col("values").struct.field("real").is_nan()).then(None).otherwise(pl.col("values")).alias("values"))
assert out_df.equals(df)  # Fails

Log output

N/A

Issue description

Initially, I was converting NumPy complex data to a Polars format by using a struct with 2 fields (real, imag).
When I upgraded from Polars 1.5.0 to 1.9.0, I noticed a regression in my unit tests (see above for a reproducible example).

When I check whether any of the fields contain NaNs (in order to replace them with null), I get an output that is entirely null, instead of one that is null only where my input data contains NaNs.

This behavior seems to depend on the content in the .then(<values>) part (!), as can be seen in my example code.

Expected behavior

The assert should not fail in my reproducible example.

The output of pl.when(A).then(B).otherwise(C) should not depend on B if A is always False.

Installed versions

--------Version info---------
Polars:              1.9.0
Index type:          UInt32
Platform:            Windows-10-10.0.22000-SP0
Python:              3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:40:50) [MSC v.1937 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               5.3.0
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               <not installed>
gevent               <not installed>
great_tables         <not installed>
matplotlib           3.8.4
nest_asyncio         1.6.0
numpy                1.26.4
openpyxl             3.1.2
pandas               2.2.2
pyarrow              15.0.2
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                2.4.0
xlsx2csv             <not installed>
xlsxwriter           <not installed>
@Elvynzs Elvynzs added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Oct 7, 2024
@Elvynzs Elvynzs changed the title pl.when(...) produces wrong result, depending on the values in .then(..). Used to work in 1.5.0 pl.when produces wrong result, depending on the values in .then(..). Used to work in 1.5.0 Oct 7, 2024
@orlp orlp added accepted Ready for implementation P-high Priority: high and removed needs triage Awaiting prioritization by a maintainer labels Oct 7, 2024
@github-project-automation github-project-automation bot moved this to Ready in Backlog Oct 7, 2024
@barak1412
Contributor

The regression started at 1.8.2.

@coastalwhite
Collaborator

Yeah, it is a regression started by #18886. I did a bisect.

@Elvynzs
Author

Elvynzs commented Oct 7, 2024

> Yeah, it is a regression started by #18886. I did a bisect.

Looking at the comments, it seems my issue may be a duplicate of #19122?

@orlp
Collaborator

orlp commented Oct 7, 2024

@Elvynzs You linked this issue. An issue is not a duplicate of itself.

@Elvynzs
Author

Elvynzs commented Oct 8, 2024

> @Elvynzs You linked this issue. An issue is not a duplicate of itself.

Oops my bad, I must have been tired, I did not notice it was my own issue 😅

4 participants