Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Implement string, boolean and binary dtype in top_k #15488

Merged
merged 13 commits into from
Apr 8, 2024

Conversation

CanglongCl
Copy link
Contributor

This allows more dtype to use top_k which only supported numeric dtypes before.

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Apr 5, 2024
Copy link

codecov bot commented Apr 5, 2024

Codecov Report

Attention: Patch coverage is 56.83453% with 60 lines in your changes are missing coverage. Please review.

Project coverage is 81.13%. Comparing base (31df06d) to head (0bc66cc).
Report is 14 commits behind head on main.

Files Patch % Lines
crates/polars-ops/src/chunked_array/top_k.rs 64.22% 44 Missing ⚠️
crates/polars-arrow/src/array/boolean/mutable.rs 0.00% 16 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15488      +/-   ##
==========================================
- Coverage   81.14%   81.13%   -0.01%     
==========================================
  Files        1362     1362              
  Lines      174951   175115     +164     
  Branches     2533     2532       -1     
==========================================
+ Hits       141960   142083     +123     
- Misses      32508    32551      +43     
+ Partials      483      481       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@stinodego stinodego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, but this is not a valid solution. The whole point of top_k is that it is more efficient than sort followed by head.

If we want to support more datatypes in top_k, we should implement the algorithm for those datatypes. Booleans can have a dedicated fast implementation because we know there are only two possible values. For strings, I don't know the correct approach.

@CanglongCl CanglongCl changed the title feat(python, rust): use Series.sort in top_k feat(python, rust): Implement string, boolean and binary dtype in top_k Apr 5, 2024
@CanglongCl
Copy link
Contributor Author

Thanks for the PR, but this is not a valid solution. The whole point of top_k is that it is more efficient than sort followed by head.

If we want to support more datatypes in top_k, we should implement the algorithm for those datatypes. Booleans can have a dedicated fast implementation because we know there are only two possible values. For strings, I don't know the correct approach.

Thanks for review. I've rewrite the pr. binary and string now using the same algorithm with numeric types.

crates/polars-ops/src/chunked_array/top_k.rs Outdated Show resolved Hide resolved
crates/polars-ops/src/chunked_array/top_k.rs Outdated Show resolved Hide resolved
crates/polars-ops/src/chunked_array/top_k.rs Outdated Show resolved Hide resolved
@CanglongCl CanglongCl requested a review from ritchie46 April 6, 2024 22:25
Copy link
Member

@ritchie46 ritchie46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there. Only one remark on the string path.

crates/polars-ops/src/chunked_array/top_k.rs Outdated Show resolved Hide resolved
crates/polars-ops/src/chunked_array/top_k.rs Show resolved Hide resolved
@CanglongCl CanglongCl requested a review from ritchie46 April 8, 2024 00:24
@ritchie46 ritchie46 changed the title feat(python, rust): Implement string, boolean and binary dtype in top_k feat: Implement string, boolean and binary dtype in top_k Apr 8, 2024
Copy link
Member

@ritchie46 ritchie46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. Going in. Thank you!

@ritchie46 ritchie46 merged commit 9c73063 into pola-rs:main Apr 8, 2024
24 checks passed
@CanglongCl CanglongCl deleted the top_k_bool branch April 12, 2024 07:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants