SQL query with WHERE clause that evaluates to true/false gives ShapeError on DataFrame with null columns #18786

Open
2 tasks done
cmadlener opened this issue Sep 17, 2024 · 0 comments
Labels
bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars

cmadlener commented Sep 17, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df = pl.DataFrame({"null_column": [None, None]})

ctx = pl.SQLContext({"df": df}, eager=True)
ctx.execute("SELECT * FROM df WHERE (true)")
# ctx.execute("SELECT * FROM df WHERE true OR false")
# ctx.execute("SELECT * FROM df WHERE false AND false")
# ctx.execute("SELECT * FROM df WHERE 'a' = 'a' and 'b' = 'b'")

Log output

Traceback (most recent call last):
  File "/home/christoph/polars-select-where-true/main.py", line 9, in <module>
    ctx.execute("SELECT * FROM df WHERE 'a' = 'a' and 'b' = 'b'")
  File "/home/christoph/polars-select-where-true/.venv/lib/python3.11/site-packages/polars/sql/context.py", line 440, in execute
    return res.collect() if (eager or self._eager_execution) else res
           ^^^^^^^^^^^^^
  File "/home/christoph/polars-select-where-true/.venv/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 2032, in collect
    return wrap_df(ldf.collect(callback))
                   ^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.ShapeError: filter's length: 1 differs from that of the series: 2

Issue description

This is a follow-up to #18373. I commented there as well, but since then I have run into the behavior I described there in our application: I would like to use a WHERE clause of the form 'a' = 'a' OR 'b' = 'c', but it raises the ShapeError above.

Note that this still only happens when the DataFrame contains null columns.

My crude understanding of the situation is that parse_sql_expr reduces these WHERE clauses to essentially pl.lit(True)/pl.lit(False); the same behavior can be observed with a plain filter:

import polars as pl

df = pl.DataFrame({"null_column": [None, None]})
df.filter(pl.lit(True))

This example works for DataFrames without null columns. In versions prior to 1.0, it worked fine even with null columns.

Expected behavior

The query returns the same DataFrame when the WHERE clause evaluates to true, and an empty DataFrame when it evaluates to false.

Installed versions

--------Version info---------
Polars:              1.7.1
Index type:          UInt32
Platform:            Linux-6.8.0-40-generic-x86_64-with-glibc2.35
Python:              3.11.3 (main, May  8 2023, 10:18:23) [GCC 11.3.0]
----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               <not installed>
gevent               <not installed>
great_tables         <not installed>
matplotlib           <not installed>
nest_asyncio         <not installed>
numpy                2.1.0
openpyxl             <not installed>
pandas               <not installed>
pyarrow              17.0.0
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>
@cmadlener cmadlener added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Sep 17, 2024