Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(python) Series.index_of() and equivalent expression - Request for architectural review, take 2 #19408

Closed
wants to merge 11 commits into from

Conversation

itamarst
Copy link
Contributor

Goal is to fix a decent chunk of #5503 when finished. At this point it is more in the "am I in the right direction?" stage, though it does work for the implemented cases!

  1. Is this more what you were thinking architecturally @ritchie46?
  2. I assume this should all be behind a cargo feature like the other ops?
  3. Is there a way to turn a Column into an AnyValue? Passing in a length-1 Series as the value to search by makes me a little sad. Although I guess that can grow into supporting the functionality search_sorted() has where it can search for multiple values in a single pass, so actually maybe better to leave it as is.

Out of scope for this pass:

  • List, Array, and Struct dtypes.
  • Possibly other dtypes if they turn out to be difficult.
  • Searching for multiple values at once.

input: vec![self, element],
function: FunctionExpr::IndexOf,
options: FunctionOptions {
// TODO which ApplyOptions, if any?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific choice that makes sense here? I don't yet understand the enum options' meanings.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think wildcard expansion should be set to 2..

Copy link

codecov bot commented Oct 23, 2024

Codecov Report

Attention: Patch coverage is 2.22222% with 88 lines in your changes missing coverage. Please review.

Project coverage is 80.00%. Comparing base (20ead46) to head (3738dff).
Report is 224 commits behind head on main.

Files with missing lines Patch % Lines
crates/polars-ops/src/series/ops/index_of.rs 0.00% 41 Missing ⚠️
crates/polars-plan/src/dsl/mod.rs 0.00% 16 Missing ⚠️
...ates/polars-plan/src/dsl/function_expr/index_of.rs 0.00% 13 Missing ⚠️
crates/polars-python/src/series/scatter.rs 0.00% 5 Missing ⚠️
py-polars/polars/series/series.py 20.00% 4 Missing ⚠️
crates/polars-plan/src/dsl/function_expr/mod.rs 0.00% 3 Missing ⚠️
crates/polars-python/src/expr/general.rs 0.00% 3 Missing ⚠️
py-polars/polars/expr/expr.py 33.33% 2 Missing ⚠️
crates/polars-plan/src/dsl/function_expr/schema.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #19408      +/-   ##
==========================================
- Coverage   80.03%   80.00%   -0.03%     
==========================================
  Files        1528     1534       +6     
  Lines      209657   210840    +1183     
  Branches     2416     2443      +27     
==========================================
+ Hits       167791   168675     +884     
- Misses      41317    41610     +293     
- Partials      549      555       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ritchie46
Copy link
Member

Searching for multiple values at once.

That start's to feel like a join.

crates/polars-plan/src/dsl/mod.rs Show resolved Hide resolved
input: vec![self, element],
function: FunctionExpr::IndexOf,
options: FunctionOptions {
// TODO which ApplyOptions, if any?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think wildcard expansion should be set to 2..

crates/polars-ops/src/series/ops/index_of.rs Show resolved Hide resolved
T: PolarsNumericType,
{
fn index_of(&'a self, value: Option<T::Native>) -> Option<usize> {
// A NaN is never equal to anything, including itself. But we still want
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be implemented over the chunks and not using an iterator over ChunkedArray's as they are really slow.

In the case we are looking for a None we can also fast path to first null.

@itamarst itamarst closed this Nov 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants