Commit ca4b6ee

Authored by Dandandan, mustafasrepo, berkaysynnada, ozankabak, devinjdangelo
Upgrade DataFusion to latest, to include fixes for aggregation (#216)
* Cleanup logical optimizer rules. (apache#7919) * Initial commit * Address todos * Update comments * Simplifications * Minor simplifications * Address reviews * Add TableScan constructor * Minor changes * make try_new_with_schema method of Aggregate private * Use projection try_new instead of try_new_schema * Simplifications, add comment * Review changes * Improve comments * Move get_wider_type to type_coercion module * Clean up type coercion file --------- Co-authored-by: berkaysynnada <[email protected]> Co-authored-by: Mehmet Ozan Kabak <[email protected]> * Parallelize Serialization of Columns within Parquet RowGroups (apache#7655) * merge main * fixes and cmt * review comments, tuning parameters, updating docs * cargo fmt * reduce default buffer size to 2 and update docs * feat: Use bloom filter when reading parquet to skip row groups (apache#7821) * feat: implement read bloom filter support * test: add unit test for read bloom filter * Simplify bloom filter application * test: add unit test for bloom filter with sql `in` * fix: imrpove bloom filter match express * fix: add more test for bloom filter * ci: rollback dependences * ci: merge main branch * fix: unit tests for bloom filter * ci: cargo clippy * ci: cargo clippy --------- Co-authored-by: Andrew Lamb <[email protected]> * fix: don't push down volatile predicates in projection (apache#7909) * fix: don't push down volatile predicates in projection * Update datafusion/optimizer/src/push_down_filter.rs Co-authored-by: Andrew Lamb <[email protected]> * Update datafusion/optimizer/src/push_down_filter.rs Co-authored-by: Andrew Lamb <[email protected]> * Update datafusion/optimizer/src/push_down_filter.rs Co-authored-by: Andrew Lamb <[email protected]> * add suggestions * fix * fix doc * Update datafusion/optimizer/src/push_down_filter.rs Co-authored-by: Jonah Gao <[email protected]> * Update datafusion/optimizer/src/push_down_filter.rs Co-authored-by: Jonah Gao <[email protected]> * Update 
datafusion/optimizer/src/push_down_filter.rs Co-authored-by: Jonah Gao <[email protected]> * Update datafusion/optimizer/src/push_down_filter.rs Co-authored-by: Jonah Gao <[email protected]> --------- Co-authored-by: Andrew Lamb <[email protected]> Co-authored-by: Jonah Gao <[email protected]> * Add `parquet` feature flag, enabled by default, and make parquet conditional (apache#7745) * Make parquet an option by adding multiple cfg attributes without significant code changes. * Extract parquet logic into submodule from execution::context * Extract parquet logic into submodule from datafusion_core::dataframe * Extract more logic into submodule from execution::context * Move tests from execution::context * Rename submodules * [MINOR]: Simplify enforce_distribution, minor changes (apache#7924) * Initial commit * Simplifications * Cleanup imports * Review --------- Co-authored-by: Mehmet Ozan Kabak <[email protected]> * Add simple window query to sqllogictest (apache#7928) * ci: upgrade node to version 20 (apache#7918) * Change input for `to_timestamp` function to be seconds rather than nanoseconds, add `to_timestamp_nanos` (apache#7844) * Change input for `to_timestamp` function * docs * fix examples * output `to_timestamp` signature as ns * Minor: Document `parquet` crate feature (apache#7927) * Minor: reduce some #cfg(feature = "parquet") (apache#7929) * Minor: reduce use of cfg(parquet) in tests (apache#7930) * Fix CI failures on `to_timestamp()` calls (apache#7941) * Change input for `to_timestamp` function * docs * fix examples * output `to_timestamp` signature as ns * Fix CI `to_timestamp()` failed * Update datafusion/expr/src/built_in_function.rs Co-authored-by: Andrew Lamb <[email protected]> * fix typo * fix --------- Co-authored-by: Andrew Lamb <[email protected]> * minor: add a datatype casting for the updated value (apache#7922) * minor: cast the updated value to the data type of target column * Update datafusion/sqllogictest/test_files/update.slt 
Co-authored-by: Alex Huang <[email protected]> * Update datafusion/sqllogictest/test_files/update.slt Co-authored-by: Alex Huang <[email protected]> * Update datafusion/sqllogictest/test_files/update.slt Co-authored-by: Alex Huang <[email protected]> * fix tests --------- Co-authored-by: Alex Huang <[email protected]> * fix (apache#7946) * Add simple exclude all columns test to sqllogictest (apache#7945) * Add simple exclude all columns test to sqllogictest * Add more exclude test cases * Support Partitioning Data by Dictionary Encoded String Array Types (apache#7896) * support dictionary encoded string columns for partition cols * remove debug prints * cargo fmt * generic dictionary cast and dict encoded test * updates from review * force retry checks * try checks again * Minor: Remove array() in array_expression (apache#7961) * remove array Signed-off-by: jayzhan211 <[email protected]> * cleanup others Signed-off-by: jayzhan211 <[email protected]> * clippy Signed-off-by: jayzhan211 <[email protected]> * cleanup cast Signed-off-by: jayzhan211 <[email protected]> * fmt Signed-off-by: jayzhan211 <[email protected]> * cleanup cast Signed-off-by: jayzhan211 <[email protected]> --------- Signed-off-by: jayzhan211 <[email protected]> * Minor: simplify update code (apache#7943) * Add some initial content about creating logical plans (apache#7952) * Minor: Change from `&mut SessionContext` to `&SessionContext` in substrait (apache#7965) * Lower &mut SessionContext in substrait * rm mut ctx in tests * Fix crate READMEs (apache#7964) * Minor: Improve `HashJoinExec` documentation (apache#7953) * Minor: Improve `HashJoinExec` documentation * Apply suggestions from code review Co-authored-by: Liang-Chi Hsieh <[email protected]> --------- Co-authored-by: Liang-Chi Hsieh <[email protected]> * chore: clean useless clone baesd on clippy (apache#7973) * Add README.md to `core`, `execution` and `physical-plan` crates (apache#7970) * Add README.md to `core`, `execution` and 
`physical-plan` crates * prettier * Update datafusion/physical-plan/README.md * Update datafusion/wasmtest/README.md --------- Co-authored-by: Daniël Heres <[email protected]> * Move source repartitioning into `ExecutionPlan::repartition` (apache#7936) * Move source repartitioning into ExecutionPlan::repartition * cleanup * update test * update test * refine docs * fix merge * minor: fix broken links in README.md (apache#7986) * minor: fix broken links in README.md * fix proto link * Minor: Upate the `sqllogictest` crate README (apache#7971) * Minor: Upate the sqllogictest crate README * prettier * Apply suggestions from code review Co-authored-by: Jonah Gao <[email protected]> Co-authored-by: jakevin <[email protected]> --------- Co-authored-by: Jonah Gao <[email protected]> Co-authored-by: jakevin <[email protected]> * Improve MemoryCatalogProvider default impl block placement (apache#7975) * Fix `ScalarValue` handling of NULL values for ListArray (apache#7969) * Fix try_from_array data type for NULL value in ListArray * Fix * Explicitly assert the datatype * For review * Refactor of Ordering and Prunability Traversals and States (apache#7985) * simplify ExprOrdering * Comment improvements * Move map/transform comment up --------- Co-authored-by: Mehmet Ozan Kabak <[email protected]> * Keep output as scalar for scalar function if all inputs are scalar (apache#7967) * Keep output as scalar for scalar function if all inputs are scalar * Add end-to-end tests * Fix crate READMEs for core, execution, physical-plan (apache#7990) * Update sqlparser requirement from 0.38.0 to 0.39.0 (apache#7983) * chore: Update sqlparser requirement from 0.38.0 to 0.39.0 * support FILTER Aggregates * Fix panic in multiple distinct aggregates by fixing `ScalarValue::new_list` (apache#7989) * Fix panic in multiple distinct aggregates by fixing ScalarValue::new_list * Update datafusion/common/src/scalar.rs Co-authored-by: Daniël Heres <[email protected]> --------- Co-authored-by: Daniël 
Heres <[email protected]> * MemoryReservation exposes MemoryConsumer (apache#8000) ... as a getter method. * fix: generate logical plan for `UPDATE SET FROM` statement (apache#7984) * Create temporary files for reading or writing (apache#8005) * Create temporary files for reading or writing * nit * addr comment --------- Co-authored-by: zhongjingxiong <[email protected]> * doc: minor fix to SortExec::with_fetch comment (apache#8011) * Fix: dataframe_subquery example Optimizer rule `common_sub_expression_eliminate` failed (apache#8016) * Fix: Optimizer rule 'common_sub_expression_eliminate' failed * nit * nit * nit --------- Co-authored-by: zhongjingxiong <[email protected]> * Percent Decode URL Paths (apache#8009) (apache#8012) * Treat ListingTableUrl as URL-encoded (apache#8009) * Update lockfile * Review feedback * Minor: Extract common deps into workspace (apache#7982) * Improve datafusion-* * More common crates * Extract async-trait * Extract more * Fix cli --------- Co-authored-by: Andrew Lamb <[email protected]> * minor: change some plan_err to exec_err (apache#7996) * minor: change some plan_err to exec_err Signed-off-by: Ruihang Xia <[email protected]> * change unreachable code to internal error Signed-off-by: Ruihang Xia <[email protected]> --------- Signed-off-by: Ruihang Xia <[email protected]> * Minor: error on unsupported RESPECT NULLs syntax (apache#7998) * Minor: error on unsupported RESPECT NULLs syntax * fix clippy * Update datafusion/sql/tests/sql_integration.rs Co-authored-by: Liang-Chi Hsieh <[email protected]> --------- Co-authored-by: Liang-Chi Hsieh <[email protected]> * GroupedHashAggregateStream breaks spill batch (apache#8004) ... into smaller chunks to decrease memory required for merging. 
* Minor: Add implementation examples to ExecutionPlan::execute (apache#8013) * Add implementation examples to ExecutionPlan::execute * Review feedback * address comment (apache#7993) Signed-off-by: jayzhan211 <[email protected]> * GroupedHashAggregateStream should register spillable consumer (apache#8002) * fix: single_distinct_aggretation_to_group_by fail (apache#7997) * fix: single_distinct_aggretation_to_group_by faile * fix * move test to groupby.slt * Read only enough bytes to infer Arrow IPC file schema via stream (apache#7962) * Read only enough bytes to infer Arrow IPC file schema via stream * Error checking for collect bytes func * Update datafusion/core/src/datasource/file_format/arrow.rs Co-authored-by: Andrew Lamb <[email protected]> --------- Co-authored-by: Andrew Lamb <[email protected]> * Minor: remove a strange char (apache#8030) * Minor: Improve documentation for Filter Pushdown (apache#8023) * Minor: Improve documentation for Fulter Pushdown * Update datafusion/optimizer/src/push_down_filter.rs Co-authored-by: jakevin <[email protected]> * Apply suggestions from code review * Update datafusion/optimizer/src/push_down_filter.rs Co-authored-by: Alex Huang <[email protected]> --------- Co-authored-by: jakevin <[email protected]> Co-authored-by: Alex Huang <[email protected]> * Minor: Improve `ExecutionPlan` documentation (apache#8019) * Minor: Improve `ExecutionPlan` documentation * Add link to Partitioning * fix: clippy warnings from nightly rust 1.75 (apache#8025) Signed-off-by: Ruihang Xia <[email protected]> * Minor: Avoid recomputing compute_array_ndims in align_array_dimensions (apache#7963) * Refactor align_array_dimensions Signed-off-by: jayzhan211 <[email protected]> * address comment Signed-off-by: jayzhan211 <[email protected]> * remove unwrap Signed-off-by: jayzhan211 <[email protected]> * address comment Signed-off-by: jayzhan211 <[email protected]> * fix rebase Signed-off-by: jayzhan211 <[email protected]> --------- Signed-off-by: 
jayzhan211 <[email protected]> * Minor: fix doc check (apache#8037) * Minor: remove uncessary #cfg test (apache#8036) * Minor: remove uncessary #cfg test * fmt * Update datafusion/core/src/datasource/file_format/arrow.rs Co-authored-by: Ruihang Xia <[email protected]> --------- Co-authored-by: Daniël Heres <[email protected]> Co-authored-by: Ruihang Xia <[email protected]> * Minor: Improve documentation for `PartitionStream` and `StreamingTableExec` (apache#8035) * Minor: Improve documentation for `PartitionStream` and `StreamingTableExec` * fmt * fmt * Combine Equivalence and Ordering equivalence to simplify state (apache#8006) * combine equivalence and ordering equivalence * Remove EquivalenceProperties struct * Minor changes * all tests pass * Refactor oeq * Simplifications * Resolve linter errors * Minor changes * Minor changes * Add new tests * Simplifications window mode selection * Simplifications * Use set_satisfy api * Use utils for aggregate * Minor changes * Minor changes * Minor changes * All tests pass * Simplifications * Simplifications * Minor changes * Simplifications * All tests pass, fix bug * Remove unnecessary code * Simplifications * Minor changes * Simplifications * Move oeq join to methods * Simplifications * Remove redundant code * Minor changes * Minor changes * Simplifications * Simplifications * Simplifications * Move window to util from method, simplifications * Simplifications * Propagate meet in the union * Simplifications * Minor changes, rename * Address berkay reviews * Simplifications * Add new buggy test * Add data test for sort requirement * Add experimental check * Add random test * Minor changes * Random test gives error * Fix missing test case * Minor changes * Minor changes * Simplifications * Minor changes * Add new test case * Minor changes * Address reviews * Minor changes * Increase coverage of random tests * Remove redundant code * Simplifications * Simplifications * Refactor on tests * Solving clippy errors * prune_lex 
improvements * Fix failing tests * Update get_finer and get_meet * Fix window lex ordering implementation * Buggy state * Do not use output ordering in the aggregate * Add union test * Update comment * Fix bug, when batch_size is small * Review Part 1 * Review Part 2 * Change union meet implementation * Update comments * Remove redundant check * Simplify project out_expr function * Remove Option<Vec<_>> API. * Do not use project_out_expr * Simplifications * Review Part 3 * Review Part 4 * Review Part 5 * Review Part 6 * Review Part 7 * Review Part 8 * Update comments * Add new unit tests, simplifications * Resolve linter errors * Simplify test codes * Review Part 9 * Add unit tests for remove_redundant entries * Simplifications * Review Part 10 * Fix test * Add new test case, fix implementation * Review Part 11 * Review Part 12 * Update comments * Review Part 13 * Review Part 14 * Review Part 15 * Review Part 16 * Review Part 17 * Review Part 18 * Review Part 19 * Review Part 20 * Review Part 21 * Review Part 22 * Review Part 23 * Review Part 24 * Do not construct idx and sort_expr unnecessarily, Update comments, Union meet single entry * Review Part 25 * Review Part 26 * Name Changes, comment updates * Review Part 27 * Add issue links * Address reviews * Fix failing test * Update comments * SortPreservingMerge, SortPreservingRepartition only preserves given expression ordering among input ordering equivalences --------- Co-authored-by: metesynnada <[email protected]> Co-authored-by: Mehmet Ozan Kabak <[email protected]> * Encapsulate `ProjectionMapping` as a struct (apache#8033) * Minor: Fix bugs in docs for `to_timestamp`, `to_timestamp_seconds`, ... 
(apache#8040) * Minor: Fix bugs in docs for `to_timestamp`, `to_timestamp_seconds`, etc * prettier * Update docs/source/user-guide/sql/scalar_functions.md Co-authored-by: comphead <[email protected]> * Update docs/source/user-guide/sql/scalar_functions.md Co-authored-by: comphead <[email protected]> --------- Co-authored-by: comphead <[email protected]> * Improve comments for `PartitionSearchMode` struct (apache#8047) * Improve comments * Make comments partition/group agnostic * General approach for Array replace (apache#8050) * checkpoint Signed-off-by: jayzhan211 <[email protected]> * optimize non-list Signed-off-by: jayzhan211 <[email protected]> * replace list ver Signed-off-by: jayzhan211 <[email protected]> * cleanup Signed-off-by: jayzhan211 <[email protected]> * rename Signed-off-by: jayzhan211 <[email protected]> * cleanup Signed-off-by: jayzhan211 <[email protected]> --------- Signed-off-by: jayzhan211 <[email protected]> * Minor: Remove the irrelevant note from the Expression API doc (apache#8053) * Minor: Add more documentation about Partitioning (apache#8022) * Minor: Add more documentation about Partitioning * fix typo * Apply suggestions from code review Co-authored-by: comphead <[email protected]> * Add more diagrams, improve text * undo unintended changes * undo unintended changes * fix links * Try and clarify --------- Co-authored-by: comphead <[email protected]> * Minor: improve documentation for IsNotNull, DISTINCT, etc (apache#8052) * Minor: improve documentation for IsNotNull, DISTINCT, etc * fix * Prepare 33.0.0 Release (apache#8057) * changelog * update version * update changelog * Minor: improve error message by adding types to message (apache#8065) * Minor: improve error message * add test * Minor: Remove redundant BuiltinScalarFunction::supports_zero_argument() (apache#8059) * deprecate BuiltinScalarFunction::supports_zero_argument() * unify old supports_zero_argument() impl * Add example to ci (apache#8060) * feat: add example to ci * 
nit * addr comments --------- Co-authored-by: zhongjingxiong <[email protected]> * Update substrait requirement from 0.18.0 to 0.19.0 (apache#8076) Updates the requirements on [substrait](https://github.com/substrait-io/substrait-rs) to permit the latest version. - [Release notes](https://github.com/substrait-io/substrait-rs/releases) - [Changelog](https://github.com/substrait-io/substrait-rs/blob/main/CHANGELOG.md) - [Commits](substrait-io/substrait-rs@v0.18.0...v0.19.0) --- updated-dependencies: - dependency-name: substrait dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Fix incorrect results in COUNT(*) queries with LIMIT (apache#8049) Co-authored-by: Mark Sirek <[email protected]> * feat: Support determining extensions from names like `foo.parquet.snappy` as well as `foo.parquet` (apache#7972) * feat: read files based on the file extention * fix: some the file extension might be started with . 
and some not * fix: rename extention to extension * chore: use exec_err * chore: rename extention to extension * chore: rename extention to extension * chore: simplify the code * fix: check table is empty * ci: fix test * fix: add err info * refactor: extract the logic to infer_types * fix: add tests for different extensions * fix: ci clippy * fix: add more tests * fix: simplify the logic * fix: ci * Use FairSpillPool for TaskContext with spillable config (apache#8072) * Minor: Improve HashJoinStream docstrings (apache#8070) * Minor: Improve HashJoinStream docstrings * fix comments * Update datafusion/physical-plan/src/joins/hash_join.rs Co-authored-by: comphead <[email protected]> * Update datafusion/physical-plan/src/joins/hash_join.rs Co-authored-by: comphead <[email protected]> --------- Co-authored-by: Daniël Heres <[email protected]> Co-authored-by: comphead <[email protected]> * Fixing broken link (apache#8085) * Fixing broken link * Update docs/source/contributor-guide/index.md Thanks for spotting this as well Co-authored-by: Liang-Chi Hsieh <[email protected]> --------- Co-authored-by: Liang-Chi Hsieh <[email protected]> * fix: DataFusion suggests invalid functions (apache#8083) * fix: DataFusion suggests invalid functions * update test * Add test for BuiltInWindowFunction * Replace macro with function for `array_repeat` (apache#8071) * General array repeat Signed-off-by: jayzhan211 <[email protected]> * cleanup Signed-off-by: jayzhan211 <[email protected]> * cleanup Signed-off-by: jayzhan211 <[email protected]> * cleanup Signed-off-by: jayzhan211 <[email protected]> * add test Signed-off-by: jayzhan211 <[email protected]> * add test Signed-off-by: jayzhan211 <[email protected]> * done Signed-off-by: jayzhan211 <[email protected]> * remove test Signed-off-by: jayzhan211 <[email protected]> * add comment Signed-off-by: jayzhan211 <[email protected]> * fm Signed-off-by: jayzhan211 <[email protected]> --------- Signed-off-by: jayzhan211 <[email protected]> * 
Minor: remove unnecessary projection in `single_distinct_to_group_by` rule (apache#8061) * Minor: remove unnecessary projection * fix ci * minor: Remove duplicate version numbers for arrow, object_store, and parquet dependencies (apache#8095) * remove duplicate version numbers for arrow, object_store, and parquet dependencies * cargo update * use default features in parquet crate * disable default parquet features in wasmtest * fix: add match encode/decode scalar function type (apache#8089) * feat: Protobuf serde for Json file sink (apache#8062) * Protobuf serde for Json file sink * Fix tests * Fix test * Minor: use `Expr::alias` in a few places to make the code more concise (apache#8097) * Minor: Cleanup BuiltinScalarFunction::return_type() (apache#8088) * Expose metrics from FileSinkExec impl of ExecutionPlan --------- Signed-off-by: jayzhan211 <[email protected]> Signed-off-by: Ruihang Xia <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Mustafa Akur <[email protected]> Co-authored-by: berkaysynnada <[email protected]> Co-authored-by: Mehmet Ozan Kabak <[email protected]> Co-authored-by: Devin D'Angelo <[email protected]> Co-authored-by: Hengfei Yang <[email protected]> Co-authored-by: Andrew Lamb <[email protected]> Co-authored-by: Huaijin <[email protected]> Co-authored-by: Jonah Gao <[email protected]> Co-authored-by: Chih Wang <[email protected]> Co-authored-by: Jeffrey <[email protected]> Co-authored-by: Marco Neumann <[email protected]> Co-authored-by: comphead <[email protected]> Co-authored-by: Alex Huang <[email protected]> Co-authored-by: Jay Zhan <[email protected]> Co-authored-by: Andy Grove <[email protected]> Co-authored-by: yi wang <[email protected]> Co-authored-by: Liang-Chi Hsieh <[email protected]> Co-authored-by: jakevin <[email protected]> Co-authored-by: 张林伟 <[email protected]> Co-authored-by: Berkay Şahin <[email protected]> Co-authored-by: Marko Milenković <[email protected]> Co-authored-by: 
jokercurry <[email protected]> Co-authored-by: zhongjingxiong <[email protected]> Co-authored-by: Weston Pace <[email protected]> Co-authored-by: Raphael Taylor-Davies <[email protected]> Co-authored-by: Ruihang Xia <[email protected]> Co-authored-by: metesynnada <[email protected]> Co-authored-by: Yongting You <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Mark Sirek <[email protected]> Co-authored-by: Mark Sirek <[email protected]> Co-authored-by: Edmondo Porcu <[email protected]> Co-authored-by: Syleechan <[email protected]> Co-authored-by: Dan Harris <[email protected]>
1 parent f430805 commit ca4b6ee

217 files changed: +12721 -7719 lines changed


.github/pull_request_template.md

Lines changed: 1 addition & 1 deletion
@@ -37,4 +37,4 @@ If there are user-facing changes then we may require documentation to be updated
 
 <!--
 If there are any breaking changes to public APIs, please add the `api change` label.
--->
+-->

.github/workflows/dev.yml

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ jobs:
       - uses: actions/checkout@v4
       - uses: actions/setup-node@v4
         with:
-          node-version: "14"
+          node-version: "20"
       - name: Prettier check
         run: |
           # if you encounter error, rerun the command below and commit the changes

.github/workflows/rust.yml

Lines changed: 2 additions & 14 deletions
@@ -139,19 +139,7 @@ jobs:
           # test datafusion-sql examples
           cargo run --example sql
           # test datafusion-examples
-          cargo run --example avro_sql --features=datafusion/avro
-          cargo run --example csv_sql
-          cargo run --example custom_datasource
-          cargo run --example dataframe
-          cargo run --example dataframe_in_memory
-          cargo run --example deserialize_to_struct
-          cargo run --example expr_api
-          cargo run --example parquet_sql
-          cargo run --example parquet_sql_multiple_files
-          cargo run --example memtable
-          cargo run --example rewrite_expr
-          cargo run --example simple_udf
-          cargo run --example simple_udaf
+          ci/scripts/rust_example.sh
       - name: Verify Working Directory Clean
         run: git diff --exit-code
 
@@ -527,7 +515,7 @@ jobs:
           rust-version: stable
       - uses: actions/setup-node@v4
         with:
-          node-version: "14"
+          node-version: "20"
       - name: Check if configs.md has been modified
         run: |
           # If you encounter an error, run './dev/update_config_docs.sh' and commit

Cargo.toml

Lines changed: 38 additions & 3 deletions
@@ -32,6 +32,7 @@ members = [
     "datafusion/substrait",
     "datafusion/wasmtest",
     "datafusion-examples",
+    "docs",
     "test-utils",
     "benchmarks",
 ]
@@ -45,17 +46,50 @@ license = "Apache-2.0"
 readme = "README.md"
 repository = "https://github.com/apache/arrow-datafusion"
 rust-version = "1.70"
-version = "32.0.0"
+version = "33.0.0"
 
 [workspace.dependencies]
 arrow = { version = "48.0.0", features = ["prettyprint"] }
 arrow-array = { version = "48.0.0", default-features = false, features = ["chrono-tz"] }
 arrow-buffer = { version = "48.0.0", default-features = false }
 arrow-flight = { version = "48.0.0", features = ["flight-sql-experimental"] }
+arrow-ord = { version = "48.0.0", default-features = false }
 arrow-schema = { version = "48.0.0", default-features = false }
-parquet = { version = "48.0.0", features = ["arrow", "async", "object_store"] }
-sqlparser = { version = "0.38.0", features = ["visitor"] }
+async-trait = "0.1.73"
+bigdecimal = "0.4.1"
+bytes = "1.4"
+ctor = "0.2.0"
+datafusion = { path = "datafusion/core" }
+datafusion-common = { path = "datafusion/common" }
+datafusion-expr = { path = "datafusion/expr" }
+datafusion-sql = { path = "datafusion/sql" }
+datafusion-optimizer = { path = "datafusion/optimizer" }
+datafusion-physical-expr = { path = "datafusion/physical-expr" }
+datafusion-physical-plan = { path = "datafusion/physical-plan" }
+datafusion-execution = { path = "datafusion/execution" }
+datafusion-proto = { path = "datafusion/proto" }
+datafusion-sqllogictest = { path = "datafusion/sqllogictest" }
+datafusion-substrait = { path = "datafusion/substrait" }
+dashmap = "5.4.0"
+doc-comment = "0.3"
+env_logger = "0.10"
+futures = "0.3"
+half = "2.2.1"
+indexmap = "2.0.0"
+itertools = "0.11"
+log = "^0.4"
+num_cpus = "1.13.0"
+object_store = { version = "0.7.0", default-features = false }
+parking_lot = "0.12"
+parquet = { version = "48.0.0", default-features = false, features = ["arrow", "async", "object_store"] }
+rand = "0.8"
+rstest = "0.18.0"
+serde_json = "1"
+sqlparser = { version = "0.39.0", features = ["visitor"] }
+tempfile = "3"
+thiserror = "1.0.44"
 chrono = { version = "0.4.31", default-features = false }
+url = "2.2"
 
 [profile.release]
 codegen-units = 1
@@ -74,3 +108,4 @@ opt-level = 3
 overflow-checks = false
 panic = 'unwind'
 rpath = false
+

README.md

Lines changed: 2 additions & 0 deletions
@@ -47,6 +47,7 @@ Default features:
 - `compression`: reading files compressed with `xz2`, `bzip2`, `flate2`, and `zstd`
 - `crypto_expressions`: cryptographic functions such as `md5` and `sha256`
 - `encoding_expressions`: `encode` and `decode` functions
+- `parquet`: support for reading the [Apache Parquet] format
 - `regex_expressions`: regular expression functions, such as `regexp_match`
 - `unicode_expressions`: Include unicode aware functions such as `character_length`
 
@@ -59,6 +60,7 @@ Optional features:
 - `simd`: enable arrow-rs's manual `SIMD` kernels (requires Rust `nightly`)
 
 [apache avro]: https://avro.apache.org/
+[apache parquet]: https://parquet.apache.org/
 
 ## Rust Version Compatibility
 

benchmarks/Cargo.toml

Lines changed: 10 additions & 10 deletions
@@ -18,7 +18,7 @@
 [package]
 name = "datafusion-benchmarks"
 description = "DataFusion Benchmarks"
-version = "32.0.0"
+version = "33.0.0"
 edition = { workspace = true }
 authors = ["Apache Arrow <[email protected]>"]
 homepage = "https://github.com/apache/arrow-datafusion"
@@ -34,20 +34,20 @@ snmalloc = ["snmalloc-rs"]
 
 [dependencies]
 arrow = { workspace = true }
-datafusion = { path = "../datafusion/core", version = "32.0.0" }
-datafusion-common = { path = "../datafusion/common", version = "32.0.0" }
-env_logger = "0.10"
-futures = "0.3"
-log = "^0.4"
+datafusion = { path = "../datafusion/core", version = "33.0.0" }
+datafusion-common = { path = "../datafusion/common", version = "33.0.0" }
+env_logger = { workspace = true }
+futures = { workspace = true }
+log = { workspace = true }
 mimalloc = { version = "0.1", optional = true, default-features = false }
-num_cpus = "1.13.0"
-parquet = { workspace = true }
+num_cpus = { workspace = true }
+parquet = { workspace = true, default-features = true }
 serde = { version = "1.0.136", features = ["derive"] }
-serde_json = "1.0.78"
+serde_json = { workspace = true }
 snmalloc-rs = { version = "0.3", optional = true }
 structopt = { version = "0.3", default-features = false }
 test-utils = { path = "../test-utils/", version = "0.1.0" }
 tokio = { version = "^1.0", features = ["macros", "rt", "rt-multi-thread", "parking_lot"] }
 
 [dev-dependencies]
-datafusion-proto = { path = "../datafusion/proto", version = "32.0.0" }
+datafusion-proto = { path = "../datafusion/proto", version = "33.0.0" }

ci/scripts/rust_example.sh

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+set -ex
+cd datafusion-examples/examples/
+cargo fmt --all -- --check
+
+files=$(ls .)
+for filename in $files
+do
+  example_name=`basename $filename ".rs"`
+  # Skip tests that rely on external storage and flight
+  # todo: Currently, catalog.rs is placed in the external-dependence directory because there is a problem parsing
+  # the parquet file of the external parquet-test that it currently relies on.
+  # We will wait for this issue[https://github.com/apache/arrow-datafusion/issues/8041] to be resolved.
+  if [ ! -d $filename ]; then
+    cargo run --example $example_name
+  fi
+done
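
The loop in `ci/scripts/rust_example.sh` derives an example name from each entry under `datafusion-examples/examples/` by stripping the `.rs` suffix, and skips entries that are directories. The following is a minimal dry-run sketch of that selection logic only. It assumes a hypothetical helper named `list_examples`, which is not part of the commit, and it prints the names instead of calling `cargo run`. It also uses a quoted glob in place of `ls`, which is a substitution made here for illustration rather than what the script does.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): print the example names
# that the loop in rust_example.sh would pass to `cargo run --example`.
list_examples() {
  local dir="$1"
  local path
  for path in "$dir"/*; do
    # An unmatched glob stays literal; skip it so an empty directory prints nothing.
    [ -e "$path" ] || continue
    # Directories are skipped, mirroring the `[ ! -d $filename ]` test in the script.
    if [ ! -d "$path" ]; then
      basename "$path" ".rs"
    fi
  done
}
```

Running `list_examples datafusion-examples/examples` from the repository root would list one name per line, which can be a quick way to check which examples the CI step is expected to cover before it actually builds them.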
