-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(rust): get_dtype handles input node schema and CSE #16582
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #16582 +/- ##
========================================
Coverage 81.49% 81.49%
========================================
Files 1414 1414
Lines 185561 185714 +153
Branches 2997 3008 +11
========================================
+ Hits 151219 151345 +126
- Misses 33826 33853 +27
Partials 516 516 ☔ View full report in Codecov by Sentry. |
py-polars/src/lazyframe/visit.rs
Outdated
let schema = ir_node.schema(&lp_arena); | ||
let schema = match ir_node { | ||
// Select operates on the input schema | ||
IR::Select { .. } => ir_node.input_schema(&lp_arena).unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything operates on it's input schema? Shouldn't we take input_schema
for all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I think you are right. I confused myself thinking through various scenarios.
Specifically, join had me worried because the left (right) keys have dtypes that depend on the schema context of the left (right) input, but the correct way to treat that is to ensure that the relevant input is "active" when asking for the dtypes of the relevant set of keys.
Pushed an update.
71728ec
to
001628f
Compare
Both Select and HStack need to augment the schema for get_dtype with the dtypes of the CSE expressions, and everyone should look at their input to determine the schema in which the expressions are evaluated.
Also plumb through Reduce to provide that information to the outside world.