FuseQuery is a Cloud Distributed SQL Query Engine at scale.
Cloud-Native and Distributed ClickHouse from scratch in Rust.
Give thanks to ClickHouse and Arrow.
-
High Performance
- Everything is Parallelism
-
High Scalability
- Everything is Distributed
-
High Reliability
- True Separation of Storage and Compute
Crate | Description | Status |
---|---|---|
distributed | Distributed scheduler and executor for planner | WIP |
optimizers | Optimizer for Distributed&Local plan | WIP |
datablocks | Vectorized data processing unit | WIP |
datastreams | Async streaming iterators | WIP |
datasources | Interface to the datasource(system.numbers for performance/Fuse-Store) | WIP |
executors | Executor(EXPLAIN/SELECT) for the Pipeline | WIP |
functions | Scalar and Aggregation Functions | WIP |
processors | Dataflow Streaming Processor | WIP |
planners | Distributed&Local planners for building processor pipelines | WIP |
servers | Server handler(MySQL/HTTP) | MySQL |
transforms | Data Stream Transform(Source/Filter/Projection/AggregatorPartial/AggregatorFinal/Limit) | WIP |
- Projection
- Filter
- Limit
- Aggregate
- Functions
- Filter Push-Down
- Projection Push-Down (TODO)
- Distributed Query (WIP)
- Sorting (TODO)
- Joins (TODO)
- SubQueries (TODO)
- Memory SIMD-Vector processing performance only
- Dataset: 10,000,000,000 (10 Billion)
- Hardware: 8vCPUx16G Cloud Instance
- Rust: rustc 1.50.0-nightly (f76ecd066 2020-12-15)
- Build with Link-time Optimization and Using CPU Specific Instructions
Query | FuseQuery (v0.1) | ClickHouse (v19.17.6) |
---|---|---|
SELECT avg(number) FROM system.numbers_mt | (1.32 s.) | ×1.29 (1.70 s) 5.90 billion rows/s., 47.16 GB/s |
SELECT sum(number) FROM system.numbers_mt | (1.35 s.) | ×0.99 (1.34 s) 7.48 billion rows/s., 59.80 GB/s |
SELECT min(number) FROM system.numbers_mt | (1.34 s.) | ×1.17 (1.57 s.) 6.36 billion rows/s., 50.89 GB/s |
SELECT max(number) FROM system.numbers_mt | (1.32 s.) | ×1.77 (2.33 s.) 4.34 billion rows/s., 34.74 GB/s |
SELECT max(number+1) FROM system.numbers_mt | (3.77 s.) | ×0.87 (3.29 s.) 3.04 billion rows/s., 24.31 GB/s |
SELECT count(number) FROM system.numbers_mt | (1.31 s.) | ×0.51 (0.67 s.) 15.00 billion rows/s., 119.99 GB/s |
SELECT sum(number+number+number) FROM numbers_mt | (4.05 s.) | ×1.22 (4.95 s.) 2.02 billion rows/s., 16.17 GB/s |
SELECT sum(number) / count(number) FROM system.numbers_mt | (1.32 s.) | ×0.97 (1.28 s.) 7.84 billion rows/s., 62.73 GB/s |
SELECT sum(number) / count(number), max(number), min(number) FROM system.numbers_mt | (1.76 s.) | ×2.29 (4.03 s.) 2.33 billion rows/s., 18.61 GB/s |
Note:
- ClickHouse system.numbers_mt is 8-way parallelism processing
- FuseQuery system.numbers_mt is 8-way parallelism processing
Run from source
$ make run
12:46:15 [ INFO] Options { log_level: "debug", num_cpus: 8, mysql_handler_port: 3307 }
12:46:15 [ INFO] Fuse-Query Cloud Compute Starts...
12:46:15 [ INFO] Usage: mysql -h127.0.0.1 -P3307
or Run with docker(Recommended):
$ docker pull datafusedev/fuse-query
...
$ docker run --init --rm -p 3307:3307 datafusedev/fuse-query
05:12:36 [ INFO] Options { log_level: "debug", num_cpus: 6, mysql_handler_port: 3307 }
05:12:36 [ INFO] Fuse-Query Cloud Compute Starts...
05:12:36 [ INFO] Usage: mysql -h127.0.0.1 -P3307
$ mysql -h127.0.0.1 -P3307
mysql> explain select (number+1) as c1, number/2 as c2 from system.numbers_mt(10000000) where (c1+c2+1) < 100 limit 3;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| explain |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| └─ Limit: 3
└─ Projection: (number + 1) as c1:UInt64, (number / 2) as c2:UInt64
└─ Filter: (((c1 + c2) + 1) < 100)
└─ ReadDataSource: scan parts [8](Read from system.numbers_mt table) |
|
└─ LimitTransform × 1 processor
└─ Merge (LimitTransform × 8 processors) to (MergeProcessor × 1)
└─ LimitTransform × 8 processors
└─ ProjectionTransform × 8 processors
└─ FilterTransform × 8 processors
└─ SourceTransform × 8 processors |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
mysql> select (number+1) as c1, number/2 as c2 from system.numbers_mt(10000000) where (c1+c2+1) < 100 limit 3;
+------+------+
| c1 | c2 |
+------+------+
| 1 | 0 |
| 2 | 0 |
| 3 | 1 |
+------+------+
3 rows in set (0.06 sec)
$ make test
- 0.1 support aggregation select
- 0.2 support distributed query (WIP)
- 0.3 support group by, order by
- 0.4 support join
- 0.5 support sub queries
- 0.6 support TPC-H benchmark