From 3664121d55884ca39b44a2ae1135c7250e6e133d Mon Sep 17 00:00:00 2001 From: the0ninjas <100471039+the0ninjas@users.noreply.github.com> Date: Mon, 7 Apr 2025 18:07:47 +1000 Subject: [PATCH 1/2] Updated extending operators documentation --- .../library-user-guide/extending-operators.md | 39 ++++++++++++++++++- 1 file changed, 38 insertions(+), 1 deletion(-) diff --git a/docs/source/library-user-guide/extending-operators.md b/docs/source/library-user-guide/extending-operators.md index 631bdc67975a..7649b27f87d9 100644 --- a/docs/source/library-user-guide/extending-operators.md +++ b/docs/source/library-user-guide/extending-operators.md @@ -19,4 +19,41 @@ # Extending DataFusion's operators: custom LogicalPlan and Execution Plans -Coming soon +DataFusion supports extension of operators by transforming logical plan and execution plan through customized [optimizer rules](https://docs.rs/datafusion/latest/datafusion/optimizer/trait.OptimizerRule.html). This section will use the µWheel project to illustrate such capabilities. + +## About DataFusion µWheel + +[DataFusion µWheel](https://github.com/uwheel/datafusion-uwheel/tree/main) is a native DataFusion optimizer which improves query performance for time-based analytics through fast temporal aggregation and pruning using custom indices. The integration of µWheel into DataFusion is a joint effort with the DataFusion community. + +### Optimizing Logical Plan + +The `rewrite` function transforms logical plans by identifying temporal patterns and aggregation functions that match the stored wheel indices. When match is found, it queries the corresponding index to retrieve pre-computed aggregate values, stores these results in a [MemTable](https://docs.rs/datafusion/latest/datafusion/datasource/memory/struct.MemTable.html), and returns as a new `LogicalPlan::TableScan`. If no match is found, the original plan proceeds unchanged through DataFusion's standard execution path. + +```rust +fn rewrite( + &self, + plan: LogicalPlan, + _config: &dyn OptimizerConfig, +) -> Result> { + // Attemps to rewrite a logical plan to a uwheel-based plan that either provides + // plan-time aggregates or skips execution based on min/max pruning. + if let Some(rewritten) = self.try_rewrite(&plan) { + Ok(Transformed::yes(rewritten)) + } else { + Ok(Transformed::no(plan)) + } +} +``` + +```rust +// Converts a uwheel aggregate result to a TableScan with a MemTable as source +fn agg_to_table_scan(result: f64, schema: SchemaRef) -> Result { + let data = Float64Array::from(vec![result]); + let record_batch = RecordBatch::try_new(schema.clone(), vec![Arc::new(data)])?; + let df_schema = Arc::new(DFSchema::try_from(schema.clone())?); + let mem_table = MemTable::try_new(schema, vec![vec![record_batch]])?; + mem_table_as_table_scan(mem_table, df_schema) +} +``` + +To get a deeper dive into the usage of the µWheel project, visit the [blog post](https://uwheel.rs/post/datafusion_uwheel/) by Max Meldrum. From cbcd17229ca03fa699703d8c3ae87af751190876 Mon Sep 17 00:00:00 2001 From: the0ninjas <100471039+the0ninjas@users.noreply.github.com> Date: Sun, 27 Apr 2025 21:52:26 +1000 Subject: [PATCH 2/2] commented out Rust code to pass doc test --- docs/source/library-user-guide/extending-operators.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/library-user-guide/extending-operators.md b/docs/source/library-user-guide/extending-operators.md index 7649b27f87d9..3d491806a4e6 100644 --- a/docs/source/library-user-guide/extending-operators.md +++ b/docs/source/library-user-guide/extending-operators.md @@ -29,7 +29,7 @@ DataFusion supports extension of operators by transforming logical plan and exec The `rewrite` function transforms logical plans by identifying temporal patterns and aggregation functions that match the stored wheel indices. When match is found, it queries the corresponding index to retrieve pre-computed aggregate values, stores these results in a [MemTable](https://docs.rs/datafusion/latest/datafusion/datasource/memory/struct.MemTable.html), and returns as a new `LogicalPlan::TableScan`. If no match is found, the original plan proceeds unchanged through DataFusion's standard execution path. -```rust +```rust,ignore fn rewrite( &self, plan: LogicalPlan, @@ -45,7 +45,7 @@ fn rewrite( } ``` -```rust +```rust,ignore // Converts a uwheel aggregate result to a TableScan with a MemTable as source fn agg_to_table_scan(result: f64, schema: SchemaRef) -> Result { let data = Float64Array::from(vec![result]);