Description
What is the problem the feature request solves?
Hey, we're looking to use DataFusion to replace some Spark workflows, but in a somewhat different way than Comet - we convert Spark logical plans into Substrait and that into DataFusion. Our goal is still similar to Comet: to get equivalent-but-faster execution compared to Spark.
We've noticed some differences in the Spark and DataFusion native expressions, which you guys have already addressed in Comet by adding custom expressions that match closer to Spark's, and so we'd like to be able to reuse the expressions from Comet where possible.
While that works today in theory, in practice it's a bit difficult given dependency issues between Comet's version of DataFusion and the version we use (just later commit on datafusion main atm, but still), and also pulling all of Comet for just the expressions is a bit unnecessary. So my question is - would you be open to separating the expressions into e.g. datafusion-comet-exprs
crate, which could have a reduced set of dependencies and so be easier to reuse downstream? I'd be happy to write a PR if you'd find it acceptable.
I might need to also change some functions to be exported from that crate to reuse them, as Comet operates mainly on the physical expr level but for our use it's easier to have the expressions as ScalarUDFs, if that's okay.
I tested already through a fork that reusing the Comet expressions is possible, and it does give better compatibility with Spark :)
Describe the potential solution
Rename core
into datafusion-comet/core
Move core/src/execution/datafusion/expressions
into datafusion-comet/expr
(probably some more changes to fix cross-dependencies, maybe introduce datafusion-comet/common
to share stuff if needed)
Open for other suggestions as well!
Additional context
Related: apache/datafusion#11201