ballista-0.6.0 (2021-11-13)
Breaking changes:
- File partitioning for ListingTable #1141 (rdettai)
- Register tables in BallistaContext using TableProviders instead of Dataframe #1028 (rdettai)
- Make TableProvider.scan() and PhysicalPlanner::create_physical_plan() async #1013 (rdettai)
- Reorganize table providers by table format #1010 (rdettai)
- Move CBOs and Statistics to physical plan #965 (rdettai)
- Update to sqlparser v0.10.0 #934 [sql] (alamb)
- FilePartition and PartitionedFile for scanning flexibility #932 [sql] (yjshen)
- Improve SQLMetric APIs, port existing metrics #908 (alamb)
- Add support for EXPLAIN ANALYZE #858 [sql] (alamb)
- Rename concurrency to target_partitions #706 (andygrove)
Implemented enhancements:
- Update datafusion-cli to support Ballista, or implement new ballista-cli #886
- Prepare Ballista crates for publishing #509
- Add drop table support #1266 [sql] (viirya)
- use arrow 6.1.0 #1255 (Jimexist)
- Add support for create table as via MemTable #1243 [sql] (Dandandan)
- add values list expression #1165 [sql] (Jimexist)
- Multiple files per partitions for CSV Avro Json #1138 (rdettai)
- Implement INTERSECT & INTERSECT DISTINCT #1135 [sql] (xudong963)
- Simplify file struct abstractions #1120 (rdettai)
- Implement is [not] distinct from #1117 [sql] (Dandandan)
- add digest(utf8, method) function and refactor all current hash digest functions #1090 (Jimexist)
- [crypto] add blake3 algorithm to digest function #1086 (Jimexist)
- [crypto] add blake2b and blake2s functions #1081 (Jimexist)
- Update sqlparser-rs to 0.11 #1052 [sql] (alamb)
- remove hard coded partition count in ballista logicalplan deserialization #1044 (xudong963)
- Indexed field access for List #1006 [sql] (Igosuki)
- Update DataFusion to arrow 6.0 #984 (alamb)
- Implement Display for Expr, improve operator display #971 [sql] (matthewmturner)
- ObjectStore API to read from remote storage systems #950 (yjshen)
- fixes #933 replace placeholder fmt_as for ExecutionPlan impls #939 (tiphaineruy)
- Support NotLike in Ballista #916 (Dandandan)
- Avro Table Provider #910 [sql] (Igosuki)
- Add BaselineMetrics, Timestamp metrics, add for CoalescePartitionsExec, rename output_time -> elapsed_compute #909 (alamb)
- [Ballista] Add executor last seen info to the ui #895 (msathis)
- add cross join support to ballista #891 (houqp)
- Add Ballista support to DataFusion CLI #889 (andygrove)
- Add support for PostgreSQL regex match #870 [sql] (b41sh)
Fixed bugs:
- Test execution_plans::shuffle_writer::tests::test Fail #1040
- Integration test fails to build docker images #918
- Ballista: Remove hard-coded concurrency from logical plan serde code #708
- How can I make ballista distributed compute work? #327
- fix subquery alias #1067 [sql] (xudong963)
- Fix compilation for ballista in stand-alone mode #1008 (Igosuki)
Documentation updates:
- Add Ballista roadmap #1166 (andygrove)
- Adds note on compatible rust version #1097 (1nF0rmed)
- implement approx_distinct function using HyperLogLog #1087 (Jimexist)
- Improve User Guide #954 (andygrove)
- Update plan_query_stages doc #951 (rdettai)
- [DataFusion] - Add show and show_limit function for DataFrame #923 (francis-du)
- update docs related to protoc and optional syntax #902 (Jimexist)
- Improve Ballista crate README content #878 (andygrove)
Performance improvements:
Closed issues:
- InList expr with NULL literals do not work #1190
- update the homepage README to include values, approx_distinct, etc. #1171
- [Python]: Inconsistencies with Python package name #1011
- Wanting to contribute to project where to start? #983
- delete redundant code #973
- How to build DataFusion python wheel #853
- Produce a design for a metrics framework #21
Merged pull requests:
For older versions, see apache/arrow/CHANGELOG.md
ballista-0.5.0 (2021-08-10)
Breaking changes:
- [ballista] support date_part and date_trunc ser/de, pass tpch 7 #840 (houqp)
- Box ScalarValue::Lists, reduce size by half #788 (alamb)
- Support DataFrame.collect for Ballista DataFrames #785 (andygrove)
- JOIN conditions are order dependent #778 (seddonm1)
- UnresolvedShuffleExec should represent a single shuffle #727 (andygrove)
- Ballista: Make shuffle partitions configurable in benchmarks #702 (andygrove)
- Rename MergeExec to CoalescePartitionsExec #635 (andygrove)
- Ballista: Rename QueryStageExec to ShuffleWriterExec #633 (andygrove)
- fix 593, reduce cloning by taking ownership in logical planner's from fn #610 (Jimexist)
- fix join column handling logic for On and Using constraints #605 (houqp)
- Move ballista standalone mode to client #589 (edrevo)
- Ballista: Implement map-side shuffle #543 (andygrove)
- ShuffleReaderExec now supports multiple locations per partition #541 (andygrove)
- Make external hostname in executor optional #232 (edrevo)
- Remove namespace from executors #75 (edrevo)
- Support qualified columns in queries #55 (houqp)
- Read CSV format text from stdin or memory #54 (heymind)
- Remove Ballista DataFrame #48 (andygrove)
- Use atomics for SQLMetric implementation, remove unused name field #25 (returnString)
Implemented enhancements:
- Add crate documentation for Ballista crates #830
- Support DataFrame.collect for Ballista DataFrames #787
- Ballista: Prep for supporting shuffle correctly, part one #736
- Ballista: Implement physical plan serde for ShuffleWriterExec #710
- Ballista: Finish implementing shuffle mechanism #707
- Rename QueryStageExec to ShuffleWriterExec #542
- Ballista ShuffleReaderExec should be able to read from multiple locations per partition #540
- [Ballista] Use deployments in k8s user guide #473
- Ballista refactor QueryStageExec in preparation for map-side shuffle #458
- Ballista: Implement map-side of shuffle #456
- Refactor Ballista to separate Flight logic from execution logic #449
- Use published versions of arrow rather than github shas #393
- BallistaContext::collect() logging is too noisy #352
- Update Ballista to use new physical plan formatter utility #343
- Add Ballista Getting Started documentation #329
- Remove references to ballistacompute Docker Hub repo #325
- Implement scalable distributed joins #63
- Remove hard-coded Ballista version from scripts #32
- Implement streaming versions of Dataframe.collect methods #789 (andygrove)
- Ballista shuffle is finally working as intended, providing scalable distributed joins #750 (andygrove)
- Update to use arrow 5.0 #721 (alamb)
- Implement serde for ShuffleWriterExec #712 (andygrove)
- dedup using join column in wildcard expansion #678 (houqp)
- Implement metrics for shuffle read and write #676 (andygrove)
- Remove hard-coded PartitionMode from Ballista serde #637 (andygrove)
- Ballista: Implement scalable distributed joins #634 (andygrove)
- Add Keda autoscaling for ballista in k8s #586 (edrevo)
- Add some resiliency to lost executors #568 (edrevo)
- Add partition by constructs in window functions and modify logical planning #501 (Jimexist)
- Support anti join #482 (Dandandan)
- add order by construct in window function and logical plans #463 (Jimexist)
- Refactor Ballista executor so that FlightService delegates to an Executor struct #450 (andygrove)
- implement lead and lag built-in window function #429 (Jimexist)
- Implement fmt_as for ShuffleReaderExec #400 (andygrove)
- Add window expression part 1 - logical and physical planning, structure, to/from proto, and explain, for empty over clause only #334 (Jimexist)
- [breaking change] fix 265, log should be log10, and add ln #271 (Jimexist)
- Allow table providers to indicate their type for catalog metadata #205 (returnString)
- Add query 19 to TPC-H regression tests #59 (Dandandan)
- Use arrow eq kernels in CaseWhen expression evaluation #52 (Dandandan)
- Add option param for standalone mode #42 (djKooks)
- [DataFusion] Optimize hash join inner workings, null handling fix #24 (Dandandan)
- [Ballista] Docker files for ui #22 (msathis)
Fixed bugs:
- Ballista: TPC-H q3 @ SF=1000 never completes #835
- Ballista does not support MIN/MAX aggregate functions #832
- Ballista docker images fail to build #828
- Ballista: UnresolvedShuffleExec should only have a single stage_id #726
- Ballista integration tests are failing #623
- Integration test build failure due to arrow-rs using unstable feature #596
- cargo build cannot build the project #531
- ShuffleReaderExec does not get formatted correctly in displayable physical plan #399
- Implement serde for MIN and MAX #833 (andygrove)
- Ballista: Prep for fixing shuffle mechanism, part 1 #738 (andygrove)
- Ballista: Shuffle write bug fix #714 (andygrove)
- honor table name for csv/parquet scan in ballista plan serde #629 (houqp)
- MINOR: Fix integration tests by adding datafusion-cli module to docker image #322 (andygrove)
Documentation updates:
- Add minimal crate documentation for Ballista crates #831 (andygrove)
- Add Ballista examples #775 (andygrove)
- Update ballista.proto link in architecture doc #502 (terrycorley)
- Update k8s user guide to use deployments #474 (edrevo)
- use prettier to format md files #367 (Jimexist)
- Make it easier for developers to find Ballista documentation #330 (andygrove)
- Instructions for cross-compiling Ballista to the Raspberry Pi #263 (andygrove)
- Add install guide in README #236 (djKooks)
Performance improvements:
- Ballista: Avoid sleeping between polling for tasks #698 (Dandandan)
- Make BallistaContext::collect streaming #535 (edrevo)
Closed issues:
- Confirm git tagging strategy for releases #770
- arrow::util::pretty::pretty_format_batches missing #769
- move the assert_batches_eq! macros to a non part of datafusion #745
- fix an issue where aliases are not respected in generating downstream schemas in window expr #592
- make the planner to print more succinct and useful information in window function explain clause #526
- move window frame module to be in logical_plan #517
- use a more rust idiomatic way of handling nth_value #448
- Make Ballista not depend on arrow directly #446
- create a test with more than one partition for window functions #435
- Implement hash-partitioned hash aggregate #27
- Consider using GitHub pages for DataFusion/Ballista documentation #18
- Add Ballista to default cargo workspace #17
- Update "repository" in Cargo.toml #16
- Consolidate TPC-H benchmarks #6
- [Ballista] Fix integration test script #4
- Ballista should not have separate DataFrame implementation #2
Merged pull requests:
- Change datatype of tpch keys from Int32 to UInt64 to support sf=1000 #836 (andygrove)
- Add ballista-examples to docker build #829 (andygrove)
- Update dependencies: prost to 0.8 and tonic to 0.5 #818 (alamb)
- Move hash_array into hash_utils.rs #807 (alamb)
- Fix: Update clippy lints for Rust 1.54 #794 (alamb)
- MINOR: Remove unused Ballista query execution code path #732 (andygrove)
- [fix] benchmark run with compose #666 (rdettai)
- bring back dev scripts for ballista #648 (Jimexist)
- Remove unnecessary mutex #639 (edrevo)
- round trip TPCH queries in tests #630 (houqp)
- Fix build #627 (andygrove)
- in ballista also check for UI prettier changes #578 (Jimexist)
- turn on clippy rule for needless borrow #545 (Jimexist)
- reuse datafusion physical planner in ballista building from protobuf #532 (Jimexist)
- update cargo.toml in python crate and fix unit test due to hash joins #483 (Jimexist)
- make VOLUME declaration in tpch datagen docker absolute #466 (crepererum)
- Refactor QueryStageExec in preparation for implementing map-side shuffle #459 (andygrove)
- Simplified usage of use arrow in ballista. #447 (jorgecarleitao)
- Benchmark subcommand to distinguish between DataFusion and Ballista #402 (jgoday)
- #352: BallistaContext::collect() logging is too noisy #394 (jgoday)
- cleanup function return type fn #350 (Jimexist)
- Update Ballista to use new physical plan formatter utility #344 (andygrove)
- Update arrow dependencies again #341 (alamb)
- Remove references to Ballista Docker images published to ballistacompute Docker Hub repo #326 (andygrove)
- Update arrow-rs deps #317 (alamb)
- Update arrow deps #269 (alamb)
- Enable redundant_field_names clippy lint #261 (Dandandan)
- Update arrow-rs deps (to fix build due to flatbuffers update) #224 (alamb)
- update arrow-rs deps to latest master #216 (alamb)
* This Changelog was automatically generated by github_changelog_generator