False values from CategoricalSeries.unique() #19409

Closed
2 tasks done
s-banach opened this issue Oct 23, 2024 · 3 comments · Fixed by #19417
Labels: accepted (Ready for implementation), bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Comments

@s-banach
Contributor

s-banach commented Oct 23, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
# 50 distinct strings cast to Categorical; the column contains no nulls.
df = pl.DataFrame({"x": [str(n) for n in range(50)]}).cast(pl.Categorical)
print(df.unique().null_count())

shape: (1, 1)
┌─────┐
│ x   │
│ --- │
│ u32 │
╞═════╡
│ 2   │
└─────┘

Log output

No response

Issue description

Notice that nulls appear in a series that contains no nulls after calling unique().
Depending on the categorical series I test with, Python sometimes quits without printing anything.
The bug does not seem to be present in version 1.10.

Expected behavior

Don't add nulls, don't crash.

Installed versions

--------Version info---------
Polars:              1.11.0
Index type:          UInt32
Platform:            Windows-11-10.0.22631-SP0
Python:              3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:42:31) [MSC v.1937 64 bit (AMD64)]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            0.10.4
fsspec               2024.6.1
gevent               <not installed>
great_tables         0.9.0
matplotlib           3.8.4
nest_asyncio         1.6.0
numpy                2.1.0
openpyxl             3.1.5
pandas               2.2.2
pyarrow              17.0.0
pydantic             2.9.2
pyiceberg            <not installed>
sqlalchemy           2.0.35
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           3.2.0
@s-banach s-banach added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Oct 23, 2024
@cmdlineluser
Contributor

cmdlineluser commented Oct 23, 2024

I haven't been able to get a crash, but do get a null_count() of 2 each time.

Just some debugging notes:

It seems to have changed after #19359

It also seems to go away if I set POLARS_MAX_THREADS=1
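
For anyone who wants to compare thread counts, here is a rough sketch. It assumes that Polars reads POLARS_MAX_THREADS when the process starts, so each setting runs in a fresh subprocess. The helper names `env_with_threads` and `run_repro` are made up for this illustration and are not part of Polars.

```python
import os
import subprocess
import sys

# The reproducer from the issue, run in a child process so the env var
# is already set when polars is imported.
REPRO = """
import polars as pl
df = pl.DataFrame({"x": [str(n) for n in range(50)]}).cast(pl.Categorical)
print(df.unique().null_count().item())
"""


def env_with_threads(n: int) -> dict:
    """Copy the current environment and pin POLARS_MAX_THREADS to n."""
    env = dict(os.environ)
    env["POLARS_MAX_THREADS"] = str(n)
    return env


def run_repro(n: int) -> tuple:
    """Run the reproducer with n threads; return (exit code, output)."""
    result = subprocess.run(
        [sys.executable, "-c", REPRO],
        env=env_with_threads(n),
        capture_output=True,
        text=True,
    )
    # A non-zero exit code with empty stdout would match the silent crash
    # described in the issue; stderr is returned in that case.
    return result.returncode, result.stdout.strip() or result.stderr.strip()


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(f"POLARS_MAX_THREADS={n}: {run_repro(n)}")
```

On an affected version, a thread count that triggers the bug should report a non-zero null count; on a fixed version every run should report 0.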

@ritchie46
Member

@orlp can you take a look?

@orlp
Collaborator

orlp commented Oct 24, 2024

I can reproduce, but only with POLARS_MAX_THREADS=2.

@c-peters c-peters added the accepted Ready for implementation label Oct 28, 2024
@s-banach s-banach changed the title Crashes and false values from CategoricalSeries.unique() False values from CategoricalSeries.unique() Oct 28, 2024