-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error "entered unreachable code: NamedStructField should be rewritten in OperatorToFunction" after upgrade to 37 #10181
Comments
Example from @ion-elgreco @alamb this is the code:
Which comes from here: https://github.com/delta-io/delta-rs/blob/main/crates%2Fcore%2Fsrc%2Foperations%2Fdelete.rs#L770-L774 |
I see a few options:
I would be happy to work on 2 if someone could point me at how people are creating Physical exprs today |
I think I'd be happy with 2. The example you linked is how we are using datafusion. Here is an updated example that fails with the error:
|
@alamb this is how we create expressions: /// Parse a string predicate into an `Expr`
pub(crate) fn parse_predicate_expression(
schema: &DFSchema,
expr: impl AsRef<str>,
df_state: &SessionState,
) -> DeltaResult<Expr> {
let dialect = &GenericDialect {};
let mut tokenizer = Tokenizer::new(dialect, expr.as_ref());
let tokens = tokenizer
.tokenize()
.map_err(|err| DeltaTableError::GenericError {
source: Box::new(err),
})?;
let sql = Parser::new(dialect)
.with_tokens(tokens)
.parse_expr()
.map_err(|err| DeltaTableError::GenericError {
source: Box::new(err),
})?;
let context_provider = DeltaContextProvider { state: df_state };
let sql_to_rel =
SqlToRel::new_with_options(&context_provider, DeltaParserOptions::default().into());
Ok(sql_to_rel.sql_to_expr(sql, schema, &mut Default::default())?)
} |
I'll work on creating an example shortly |
This also leads to a sort of "variant problem" for any code we write that handles It would be nicer I think if there was a single canonical way to represent a nested field access in |
Here is an example of how to make Expr::struct work in 37.1.0: #10183 I think we need a better API to do this for real (in 38.0.0 and going forward). I will think about this -- maybe @jayzhan211 has some thoughts |
I think this can be controlled by the consumer -- for example if you are walking
I think the idea is people might want to override |
I agree (and thanks for the example). This works for us. All of our expr from the user start as
What's the motivation for doing this at the logical level instead of doing this as part of the conversion from logical to physical? |
I don't think there is any particular motivation (or any reason that the conversion needs to be done at either spot) 🤔 |
I think, for me, it's just a general unease with having multiple ways of expressing the same thing. I feel like this can lead to "implicit layers" of the plan. For example, there is already some notion of "parse plan", "unoptimized logical plan" and "optimized logical plan", and "physical plan". The middle two are both represented by Another way to tackle it could be to leave the concept of a That being said, my needs are met (thanks again for your help), and perfect is the enemy of the good, so I'm happy to leave well enough alone. |
I had also thought about deprecating
One good reason is that we don't need to care about physical expr if we converted it to functions in logical level. I think the early we optimize, the less duplicated things we leave. if we move this rewrite to
If we have functions after parsing, they can rewrite functions to their expected one through register their own rewrite rules, so I think it is also not a problem
"rewritten / simplified logical plan" is actually "optimized logical plan" to me, "optimized" is more general term.
I think we are talking about better API design of |
I agree this would be clearest (basically remove
I was actually thinking about an API for the usecase of "I created an Currently there is an example of how to do this in datafusion/datafusion-examples/examples/expr_api.rs Lines 251 to 252 in 711239f
Maybe it is time to "promote" that function to a real API in
Those are interesting functions -- now that we have the notion of function packages, adding better support for the Map datatype would be sweet. |
I am thinking I'll try and make a PR with such an API over the next day or two to see how it might look |
@alamb just checking in, is there something we need to refactor? Or are you simplifying the API and automatically handling this within datafusion on all code paths? |
I suggest for the time being you use the pattern here (is this possible)?
For the next release (38.0.0) I was thinking would move the Thoughts? |
I am not entirely sure, in delta-rs we just parse SQL strings into logical exprs and then convert it to physical exprs without doing any adjustments there. I think it makes sense that these expr conversion are automatically done in core. Because logically between v36, 37 nothing changed in the intent of the operation. That the physical impl is different should be abstracted. |
I think in 37.0.0 you'll need to update the code that converts to a physical expr to call the code in #10183
Yes I agree. I will make a PR over the next few days with a proposed API |
Here is a PR with a proposed new API: #10330 |
NOTE -- Here is an example of how to make
Expr::NamedStructField
work in 37.1.0: #10183Is your feature request related to a problem or challenge?
In 37.0.0 many of the built in functions have been migrated to
UDF
s as described on #8045 . The migration is completed in 38.0.0One part of this change is that now certain
Expr
s must be rewritten into the appropriate functions. Most notablyget_field
that extracts a field from aStruct
Among other things this allows people to customize how Expr behaves: #7845 (comment) or in slack to return NULLs for rows that don't pass in maps
The rewrite happens automatically as part of the logical planner (in the Analyzer pass)
However if you bypass those passes it will not happen
Yeah you need to use the FunctionRewriter here (with the relevant rewriter registered) https://github.com/apache/arrow-datafusion/blob/0573f78c7e7a4d94c3204cee464b3860479e0afb/datafusion/optimizer/src/analyzer/function_rewrite.rs#L33
Example
An example from discord: link is:
This returns an error "NamedStructField should be rewritten in OperatorToFunction"
Describe the solution you'd like
No response
Describe alternatives you've considered
One potential workaround is to call
get_field
directly rather thanExpr::field
So instead of
call like
Additional context
@ion-elgreco is seeing the same issue in Delta-rs: #9904 (comment)
@westonpace brings it up in discord link
Another report in discord: link
The text was updated successfully, but these errors were encountered: