Plan AsyncFuncExpr by physical planner #6
Comments
Hi @alamb, what do you think about this concept? I also wonder whether it makes sense to implement it directly in DataFusion. I think it could solve apache/datafusion#6518.
I found that representing the async function exec as a logical plan doesn't make sense. The logical plan should describe what we need to do (execute a scalar function), not how we do it (execute the scalar function asynchronously). Even without a logical plan node for it, we can still recognize async functions when planning federation queries. I'm closing this issue, but I'll try to port the async function exec to the DataFusion core.
Thanks @goldmedal, and sorry for my late response; I have been on vacation this past week. I think porting the async functions idea to the DataFusion core would likely be useful for many others as well. Thank you!
Description
There are several reasons why I want to implement this at the logical plan level:
Go through all the expressions
When working on #4, I noticed that it's not convenient to visit all the expressions of the physical plan (to check whether each one is an `AsyncFuncExpr`). In the logical plan phase, we can use something like `LogicalPlan::map_expression` to go through all the expressions of the plan. I also considered implementing a similar tree node method for the physical plan. However, `ExecutionPlan` isn't an enum, so such an API would be hard to maintain whenever someone adds a new `ExecutionPlan` implementation. 🤔
Decouple from the optimization rule
In my opinion, the optimization phase isn't required for SQL execution: a physical plan should be executable even if no optimization rule is applied. The physical planner can guarantee that a logical `AsyncExecute` is planned into an `AsyncFuncExec`, and the optimizer can then coalesce batches if necessary.
On the other hand, the order of the optimization rules matters. If we do this planning during the optimization phase, I suspect it may interfere with the effects of other optimizations.
Keep compatibility with the federation scenario
datafusion-federation attempts to unparse the logical plan into SQL and push it down to the external database. If we can't identify async scalar functions in the logical plan phase, we may generate incorrect SQL. Conversely, if async scalar functions can be recognized during logical planning or unparsing, we can push down only valid plans to the data source and apply the async scalar function to the results from the external database.
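The federation argument can be sketched as follows, assuming hypothetical, simplified plan types and a toy unparser; this is not the datafusion-federation API or DataFusion's real unparser. Any subtree that contains no async scalar function is unparsed into SQL for the remote database, while nodes that call an async function (here a hypothetical `ask_llm`) stay local and run on the rows the remote database returns. The query in `main` is illustrative and is not the example from this issue.

```rust
// Hypothetical, simplified types; NOT the datafusion-federation API.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    /// A scalar function call; `is_async` marks async UDFs.
    Call { name: String, args: Vec<Expr>, is_async: bool },
}

#[derive(Debug, Clone, PartialEq)]
enum Plan {
    TableScan { table: String },
    Filter { predicate: Expr, input: Box<Plan> },
    Projection { exprs: Vec<Expr>, input: Box<Plan> },
    /// A subtree that was pushed down as SQL to the external database.
    RemoteSql { sql: String },
}

fn contains_async(expr: &Expr) -> bool {
    match expr {
        Expr::Column(_) => false,
        Expr::Call { args, is_async, .. } => *is_async || args.iter().any(contains_async),
    }
}

/// A subtree can be pushed down only if no node in it uses an async function.
fn is_pushable(plan: &Plan) -> bool {
    match plan {
        Plan::TableScan { .. } | Plan::RemoteSql { .. } => true,
        Plan::Filter { predicate, input } => !contains_async(predicate) && is_pushable(input),
        Plan::Projection { exprs, input } => !exprs.iter().any(contains_async) && is_pushable(input),
    }
}

fn expr_sql(expr: &Expr) -> String {
    match expr {
        Expr::Column(name) => name.clone(),
        Expr::Call { name, args, .. } => {
            let args: Vec<String> = args.iter().map(expr_sql).collect();
            format!("{}({})", name, args.join(", "))
        }
    }
}

/// Toy unparser: wraps every step in a subquery for simplicity.
fn unparse(plan: &Plan) -> String {
    match plan {
        Plan::TableScan { table } => format!("SELECT * FROM {table}"),
        Plan::Filter { predicate, input } => {
            format!("SELECT * FROM ({}) AS sub WHERE {}", unparse(input), expr_sql(predicate))
        }
        Plan::Projection { exprs, input } => {
            let cols: Vec<String> = exprs.iter().map(expr_sql).collect();
            format!("SELECT {} FROM ({}) AS sub", cols.join(", "), unparse(input))
        }
        Plan::RemoteSql { sql } => sql.clone(),
    }
}

/// Push down the largest async-free subtrees; keep async nodes local.
fn federate(plan: Plan) -> Plan {
    if is_pushable(&plan) {
        return Plan::RemoteSql { sql: unparse(&plan) };
    }
    match plan {
        Plan::Filter { predicate, input } => Plan::Filter { predicate, input: Box::new(federate(*input)) },
        Plan::Projection { exprs, input } => Plan::Projection { exprs, input: Box::new(federate(*input)) },
        // Leaves are always pushable, so they were handled above.
        leaf => leaf,
    }
}

fn main() {
    // Illustrative query: SELECT ask_llm(name) FROM pg_items WHERE active
    let plan = Plan::Projection {
        exprs: vec![Expr::Call {
            name: "ask_llm".into(),
            args: vec![Expr::Column("name".into())],
            is_async: true,
        }],
        input: Box::new(Plan::Filter {
            predicate: Expr::Column("active".into()),
            input: Box::new(Plan::TableScan { table: "pg_items".into() }),
        }),
    };
    println!("{:#?}", federate(plan));
}
```

Here the filter is sent to Postgres as SQL, and the `ask_llm` projection is evaluated locally on its results; a real unparser would generate flatter SQL than this toy one.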
Consider the following case, where `pg_items` is provided by an external Postgres database:
It will be planned as:
If we apply the datafusion-federation approach, we get a plan like:
What I may implement
- A logical `AsyncExec` node.
- Plan `AsyncExec` in the logical planner (maybe as an analyzer rule).
- Plan the logical `AsyncExec` into the physical `AsyncExec` (maybe in the physical planner).
If we plan the logical plan mentioned above into a physical plan, we get:
Then, apply the optimization rule
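As a rough sketch of the first two steps in the list above (a logical `AsyncExec` node, inserted by an analyzer-style rule), the following Rust code uses hypothetical, simplified enums rather than DataFusion's real `LogicalPlan`. Because the plan is an enum, the rule can match every variant exhaustively and rewrite each projection's expressions, which is the convenience described for `LogicalPlan::map_expression`; adding a variant makes the compiler flag this `match`, which is not possible with a trait such as `ExecutionPlan`. The `__async_fn_N` aliases and the `ask_llm` function are illustrative assumptions.

```rust
// Hypothetical, simplified types; NOT DataFusion's `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Call { name: String, args: Vec<Expr>, is_async: bool },
}

#[derive(Debug, Clone, PartialEq)]
enum Plan {
    TableScan { table: String },
    Projection { exprs: Vec<Expr>, input: Box<Plan> },
    /// Hypothetical logical node: evaluates `calls` in order and appends each
    /// result as a new column named by its alias, so a later call may refer
    /// to an earlier alias.
    AsyncExec { calls: Vec<(String, Expr)>, input: Box<Plan> },
}

/// Replace every async call in `expr` with a column reference, collecting
/// the extracted (alias, call) pairs in `out`.
fn extract_async(expr: Expr, out: &mut Vec<(String, Expr)>, next_id: &mut usize) -> Expr {
    match expr {
        Expr::Call { name, args, is_async } => {
            // Rewrite arguments first, so nested async calls are extracted too.
            let args: Vec<Expr> = args.into_iter().map(|a| extract_async(a, out, next_id)).collect();
            let call = Expr::Call { name, args, is_async };
            if is_async {
                let alias = format!("__async_fn_{}", *next_id);
                *next_id += 1;
                out.push((alias.clone(), call));
                Expr::Column(alias)
            } else {
                call
            }
        }
        column => column,
    }
}

/// Analyzer-style rule: pull async calls out of each projection into a
/// logical `AsyncExec` placed directly beneath it.
fn plan_async_exec(plan: Plan, next_id: &mut usize) -> Plan {
    match plan {
        Plan::Projection { exprs, input } => {
            let input = plan_async_exec(*input, next_id);
            let mut calls = Vec::new();
            let exprs: Vec<Expr> = exprs
                .into_iter()
                .map(|e| extract_async(e, &mut calls, next_id))
                .collect();
            let input = if calls.is_empty() {
                input
            } else {
                Plan::AsyncExec { calls, input: Box::new(input) }
            };
            Plan::Projection { exprs, input: Box::new(input) }
        }
        Plan::AsyncExec { calls, input } => Plan::AsyncExec {
            calls,
            input: Box::new(plan_async_exec(*input, next_id)),
        },
        scan => scan,
    }
}

/// Entry point; the counter keeps aliases unique across the whole plan.
fn analyze(plan: Plan) -> Plan {
    let mut next_id = 0;
    plan_async_exec(plan, &mut next_id)
}

fn main() {
    // Illustrative query: SELECT upper(ask_llm(name)) FROM items
    let plan = Plan::Projection {
        exprs: vec![Expr::Call {
            name: "upper".into(),
            args: vec![Expr::Call {
                name: "ask_llm".into(),
                args: vec![Expr::Column("name".into())],
                is_async: true,
            }],
            is_async: false,
        }],
        input: Box::new(Plan::TableScan { table: "items".into() }),
    };
    println!("{:#?}", analyze(plan));
}
```

In the output, the projection computes `upper(__async_fn_0)` over an `AsyncExec` that evaluates `ask_llm(name)`; the third step would then map this logical node one-to-one to a physical `AsyncExec` in the physical planner.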