The result of filtering df with Expr.str.starts_with changes depending on the suffix. #20059

Closed
2 tasks done
NMZ0429 opened this issue Nov 28, 2024 · 2 comments
Labels: bug, needs triage, python


NMZ0429 commented Nov 28, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df = pl.DataFrame({
    "x": ["CS00621400001"]
})
print(df)

filtered_df = df.filter(pl.col("x").str.starts_with("C"))
print(filtered_df)  # NOTE: no row is printed (unexpected: the value starts with "C")

df = pl.DataFrame({
    "x": ["CS0062140000"]
})
print(df)

filtered_df = df.filter(pl.col("x").str.starts_with("C"))
print(filtered_df)  # NOTE: one row is printed (as expected)

Log output

No response

Issue description

Given a DataFrame containing the value CS00621400001, filtering it with str.starts_with("C") does not match the row. However, removing the trailing 1 (i.e., using the value CS0062140000) changes the result of the filter, and the row is matched.

Expected behavior

Both CS00621400001 and CS0062140000 should be matched by the filter condition str.starts_with("C").
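
For reference, the expected result can be cross-checked against Python's built-in str.startswith, independent of Polars. This is only a minimal sketch of the expected mask (the variable names are illustrative, not from the report); it does not reproduce the bug itself.

```python
# Reference check using plain Python strings (no Polars involved).
# Both values from the report begin with "C", so both should pass the filter.
values = ["CS00621400001", "CS0062140000"]

# Build the boolean mask that a prefix filter on "C" is expected to produce.
expected_mask = [v.startswith("C") for v in values]

print(expected_mask)  # [True, True]
```

Comparing this mask with the rows returned by df.filter(pl.col("x").str.starts_with("C")) shows the discrepancy on an affected version.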

Installed versions

--------Version info---------
Polars:              1.15.0
Index type:          UInt32
Platform:            macOS-14.6.1-arm64-arm-64bit
Python:              3.12.7 | packaged by conda-forge | (main, Oct  4 2024, 15:57:01) [Clang 17.0.6 ]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
boto3                <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               <not installed>
gevent               <not installed>
google.auth          <not installed>
great_tables         <not installed>
matplotlib           3.9.2
nest_asyncio         1.6.0
numpy                2.1.3
openpyxl             <not installed>
pandas               2.2.3
pyarrow              18.0.0
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>
@cmdlineluser
Contributor

I believe this is fixed on main: #20003

@alexander-beedie
Collaborator

Yup, this was fixed by #20006; I believe the 1.16 release will be coming shortly and that contains this patch.
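
If it helps, here is a rough sketch for checking whether an installed version is at or past 1.16, the release the comment above expects to contain the patch. The helper names version_tuple and likely_has_fix are hypothetical (not part of Polars), and the 1.16 threshold is taken from that comment rather than verified against release notes.

```python
def version_tuple(version: str) -> tuple[int, ...]:
    """Parse the leading numeric components of a version string, e.g. "1.15.0" -> (1, 15, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break  # stop at suffixes such as "rc1"
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def likely_has_fix(version: str) -> bool:
    """Hypothetical helper: True if the version is >= 1.16 (assumed threshold from the thread)."""
    return version_tuple(version) >= (1, 16)


print(likely_has_fix("1.15.0"))  # False: the version in the report
print(likely_has_fix("1.16.0"))  # True
```

In practice this could be called as likely_has_fix(pl.__version__) after importing polars as pl.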
