From 2b1e1838138929097ac51ceb3e38f8b946bc3aa5 Mon Sep 17 00:00:00 2001
From: Andy Grove
Date: Sat, 28 Sep 2024 10:26:24 -0600
Subject: [PATCH 1/2] Update DataFusion introduction to show that DataFusion offers packaged versions for end users

---
 README.md | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index bb8526c24e2c..7fb8f8ca99c4 100644
--- a/README.md
+++ b/README.md
@@ -42,14 +42,25 @@
 DataFusion is an extensible query engine written in [Rust] that
-uses [Apache Arrow] as its in-memory format. DataFusion's target users are
+uses [Apache Arrow] as its in-memory format.
+
+The core DataFusion libraries in this repository are not designed to be an out-of-the-box tool for end users. The
+following subprojects offer packaged versions of DataFusion.
+
+- [DataFusion Python](https://github.com/apache/datafusion-python/) offers a Python interface for SQL and DataFrame
+  queries.
+- [DataFusion Comet](https://github.com/apache/datafusion-comet/) is an accelerator for Apache Spark based on
+  DataFusion.
+- [DataFusion Ray](https://github.com/apache/datafusion-ray/) provides a distributed version of DataFusion that scales
+  out on Ray clusters.
+
+The target audience for the DataFusion crates in this repository are
 developers building fast and feature rich database and analytic systems,
 customized to particular workloads. See [use cases] for examples.
 
-"Out of the box," DataFusion offers [SQL] and [`Dataframe`] APIs,
+DataFusion offers [SQL] and [`Dataframe`] APIs,
 excellent [performance], built-in support for CSV, Parquet, JSON, and Avro,
 extensive customization, and a great community.
-[Python Bindings] are also available.
 
 DataFusion features a full query planner, a columnar, streaming, multi-threaded,
 vectorized execution engine, and partitioned data sources. You can

From f8a668f035e16936bc14a59f5140dbef034a3222 Mon Sep 17 00:00:00 2001
From: Andy Grove
Date: Sat, 28 Sep 2024 10:36:15 -0600
Subject: [PATCH 2/2] change order

---
 README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 7fb8f8ca99c4..6d94c1db382b 100644
--- a/README.md
+++ b/README.md
@@ -44,15 +44,15 @@
 DataFusion is an extensible query engine written in [Rust] that
 uses [Apache Arrow] as its in-memory format.
 
-The core DataFusion libraries in this repository are not designed to be an out-of-the-box tool for end users. The
-following subprojects offer packaged versions of DataFusion.
+The core DataFusion libraries in this repository are not designed to be an out-of-the-box tool for end users. However,
+the following subprojects offer packaged versions of DataFusion.
 
 - [DataFusion Python](https://github.com/apache/datafusion-python/) offers a Python interface for SQL and DataFrame
   queries.
-- [DataFusion Comet](https://github.com/apache/datafusion-comet/) is an accelerator for Apache Spark based on
-  DataFusion.
 - [DataFusion Ray](https://github.com/apache/datafusion-ray/) provides a distributed version of DataFusion that scales
   out on Ray clusters.
+- [DataFusion Comet](https://github.com/apache/datafusion-comet/) is an accelerator for Apache Spark based on
+  DataFusion.
 
 The target audience for the DataFusion crates in this repository are
 developers building fast and feature rich database and analytic systems,