Update DataFusion introduction to show that DataFusion offers packaged versions for end users
andygrove committed Sep 28, 2024
1 parent f87db21 commit 2b1e183
Showing 1 changed file (README.md) with 14 additions and 3 deletions.
@@ -42,14 +42,25 @@
</a>

DataFusion is an extensible query engine written in [Rust] that
uses [Apache Arrow] as its in-memory format.
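
Arrow's columnar format stores each column as one contiguous buffer of values plus an optional validity bitmap that marks which slots are null. The stdlib-only Rust sketch below is a hypothetical, heavily simplified model of that layout for a single 32-bit integer column, meant only to show why a scan over a column walks one contiguous buffer. The `Int32Column` type and its methods are made up for illustration; this is not the `arrow` crate's API.

```rust
/// Hypothetical, simplified Int32 column modeled on Arrow's layout:
/// a contiguous values buffer plus an optional validity bitmap
/// (bit i set => slot i is non-null, least-significant bit first).
struct Int32Column {
    values: Vec<i32>,
    validity: Option<Vec<u8>>, // None means the column has no nulls
}

impl Int32Column {
    fn from_options(items: &[Option<i32>]) -> Self {
        let mut values = Vec::with_capacity(items.len());
        let mut validity = vec![0u8; (items.len() + 7) / 8];
        let mut has_null = false;
        for (i, item) in items.iter().enumerate() {
            match item {
                Some(v) => {
                    values.push(*v);
                    validity[i / 8] |= 1 << (i % 8); // mark slot i as valid
                }
                None => {
                    // Null slots still occupy space in the values buffer.
                    values.push(0);
                    has_null = true;
                }
            }
        }
        // Like Arrow, omit the bitmap entirely when there are no nulls.
        Int32Column { values, validity: if has_null { Some(validity) } else { None } }
    }

    fn is_valid(&self, i: usize) -> bool {
        match &self.validity {
            None => true,
            Some(bits) => bits[i / 8] & (1 << (i % 8)) != 0,
        }
    }

    fn get(&self, i: usize) -> Option<i32> {
        if self.is_valid(i) { Some(self.values[i]) } else { None }
    }

    /// Sum of the non-null values, scanning the contiguous buffer in order.
    fn sum(&self) -> i64 {
        (0..self.values.len())
            .filter(|&i| self.is_valid(i))
            .map(|i| self.values[i] as i64)
            .sum()
    }

    fn null_count(&self) -> usize {
        (0..self.values.len()).filter(|&i| !self.is_valid(i)).count()
    }
}

fn main() {
    let col = Int32Column::from_options(&[Some(1), None, Some(3), Some(4)]);
    println!("sum = {}, nulls = {}, slot 1 = {:?}", col.sum(), col.null_count(), col.get(1));
}
```

For the input `[Some(1), None, Some(3), Some(4)]`, `sum()` returns 8 and `null_count()` returns 1. The real format also defines offset buffers for variable-length types and child arrays for nested types, which this sketch does not model.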

The core DataFusion libraries in this repository are not designed to be an out-of-the-box tool for end users. The
following subprojects offer packaged versions of DataFusion.

- [DataFusion Python](https://github.com/apache/datafusion-python/) offers a Python interface for SQL and DataFrame
queries.
- [DataFusion Comet](https://github.com/apache/datafusion-comet/) is an accelerator for Apache Spark based on
DataFusion.
- [DataFusion Ray](https://github.com/apache/datafusion-ray/) provides a distributed version of DataFusion that scales
out on Ray clusters.

The target audience for the DataFusion crates in this repository is
developers building fast and feature-rich database and analytic systems,
customized to particular workloads. See [use cases] for examples.

DataFusion offers [SQL] and [`Dataframe`] APIs,
excellent [performance], built-in support for CSV, Parquet, JSON, and Avro,
extensive customization, and a great community.
[Python Bindings] are also available.
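
To give a rough feel for how a query such as `SELECT a, MIN(b) FROM t WHERE a <= b GROUP BY a` can be evaluated column-at-a-time, the stdlib-only sketch below first evaluates the `WHERE` predicate over the whole column into a selection mask, then aggregates only the selected rows. It is a hypothetical illustration: the table `t`, its `i32` columns `a` and `b`, and `filter_then_min_by_group` are assumptions, and this is neither DataFusion's SQL/DataFrame API nor its execution code.

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch: evaluate `SELECT a, MIN(b) FROM t WHERE a <= b GROUP BY a`
/// over two equal-length columns. Not DataFusion's implementation.
fn filter_then_min_by_group(a: &[i32], b: &[i32]) -> BTreeMap<i32, i32> {
    assert_eq!(a.len(), b.len(), "columns must have the same length");

    // Step 1: evaluate the predicate for the whole column at once,
    // producing a selection mask with one bool per row.
    let mask: Vec<bool> = a.iter().zip(b).map(|(x, y)| x <= y).collect();

    // Step 2: aggregate only the selected rows, keeping the minimum `b` per `a`.
    let mut groups = BTreeMap::new();
    for ((&key, &val), &keep) in a.iter().zip(b).zip(&mask) {
        if keep {
            groups
                .entry(key)
                .and_modify(|m: &mut i32| *m = (*m).min(val))
                .or_insert(val);
        }
    }
    groups
}

fn main() {
    let a = [1, 2, 1, 3, 2];
    let b = [5, 1, 3, 4, 7];
    for (key, min_b) in &filter_then_min_by_group(&a, &b) {
        println!("a = {key}, min(b) = {min_b}");
    }
}
```

With `a = [1, 2, 1, 3, 2]` and `b = [5, 1, 3, 4, 7]`, the row `(2, 1)` fails the predicate, so the result is `{1: 3, 2: 7, 3: 4}`. A `BTreeMap` keeps the groups ordered by key so the output is deterministic; a hash map would be the more typical choice when there are many groups.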

DataFusion features a full query planner, a columnar, streaming, multi-threaded,
vectorized execution engine, and partitioned data sources. You can