17.0.0 (2023-01-27)
Breaking changes:
- Implemented a ReadOptions trait for cleaner code. #5025 (saikrishna1-bidgely)
Implemented enhancements:
- Add null-equals-null JOIN support in Substrait producer/consumer #5084
- Cleaner code for Read Options in reader methods. #5024
- Substrait donation follow-on work #4897
- Add len method to DataFrame #1926
Fixed bugs:
- Clippy failures in master branch and in PRs (due to new nightly Rust) #5080
Merged pull requests:
- Add null-equals-null join support #5085 (nseekhao)
- Optimize returned plan in roundtrip_fill_na function #5083 (nseekhao)
- fix clippy failures #5081 [sql] (andygrove)
- Add NULL literal support for decimal and integers #5077 (nseekhao)
- DataFrame count method #5071 (Jefffrey)
- [sqllogictests] Port orderby.rs to sqllogictests #5062 (alamb)
17.0.0-rc1 (2023-01-26)
Breaking changes:
- Change ExecutionPlan::maintains_input_order to return vector (to support multi children executors better) #5035 (mustafasrepo)
- Allow overriding error type in DataFusion Result #5000 (tustvold)
- Add dictionary_expressions feature (#4386) #4999 (tustvold)
Implemented enhancements:
- Retain the ordering of fields in the table schema when creating the projection for an update plan #5052
- [sqllogictest] Remove integration-tests directory #5011
- [sqllogictest] Consolidate normalization code for the postgres and non-postgres paths #5010
- [sqllogictest] Don't orchestrate the postgres containers with rust / docker #5009
- Check that an external table exists before creating a table #4997
- Implement std::error::Error for DataFusionError #4991
- Return Vec<bool> instead of bool in ExecutionPlan::maintains_input_order #4980
- Add support for linear range search #4979
- Add support for bounded execution when window query involves UNBOUNDED PRECEDING #4978
- Infer prepared statement parameter types for insert queries with values clauses #4976
- The filter of the outer table is applied multiple times after optimizing in-subquery to join #4914
- Support Describe FILE in datafusion-cli #4913
- Release DataFusion 16 #4776
- Support writing lists in the arrow csv writer #4502
- Replace python based integration test with sqllogictest #4462
- Support CREATE TABLE table_name(...schema_fields) #4396
- Make Binary Dictionary Operations Optional #4386
- Improve / Cleanup DataFusion CI #3045
- More frequent DataFusion releases to crates.io (discussion) #2327
Fixed bugs:
- UPDATE statement for nonexistent column doesn't error out #5068
- Limit doesn't drop on first batch when limit size == fetch size. #5064
- Performance regressions since DataFusion 15.x #5060
- Quoted schema and table names result in double-quoted names in logical plan. #5058
- Homebrew release script passes an incorrect number of arguments #5043
- CI Failing with Out of Disk #5040
- Doc links to LogicalPlan in the core package need updating. #5036
- explain analyze cannot see csvexec execution time metrics #5014
- AVG(nulls) returns 0 rather than NULL #5007
- Invalid Placeholders return internal error (rather than Plan error) #5005
- select * from csv error #4996
- Incorrect nested error wrapped to ArrowError:External variant for joins #4981
Documentation updates:
- MINOR: Add Substrait to feature list in README #4955 (andygrove)
- Minor: comma engineering in Readme #4954 (alamb)
- Update main DataFusion README #4903 (alamb)
- Docs: Add known user - Kamu #4899 (sergiimk)
Closed issues:
- Support sub directories in sqllogictest runner #4709
- Bug displaying fractional seconds in IntervalMonthDayNano #4220
Merged pull requests:
- Add release-crates.sh script #5070 (iajoiner)
- Validate assignment target column existence for UPDATE statements #5069 [sql] (gruuya)
- Fix limit when size of batch to poll == skip/fetch value #5066 (Dandandan)
- Fix CREATE SCHEMA schema name double quoting issue. #5059 [sql] (neumark)
- Minor: Move some aggregate error tests to sqllogictests #5055 (alamb)
- Add decimal support to substrait serde #5054 (andygrove)
- Retain schema order in projection #5053 [sql] (avantgardnerio)
- Improve join type support in substrait #5051 (andygrove)
- [Substrait] ReadRel. Get column names from TableScan source #5050 (andygrove)
- Ensure insert projections are of correct type #5049 [sql] (avantgardnerio)
- Remove unnecessary pyo3 dependency from datafusion crate #5048 (tustvold)
- Cleanup CI (#5040) #5047 (tustvold)
- Fix homebrew publish script #5044 (iajoiner)
- Update docs links to logical plans module. #5037 (vincev)
- [sqllogictest] Read subdirectories in test_files #5033 (melgenek)
- minor: Fix docs for create_default_catalog_and_schema #5032 (alamb)
- Remove python based postgres comparison integration-test #5031 (alamb)
- [sqllogictest] Create empty tables #5026 [sql] (melgenek)
- Simplify the PushDownLimit. #5021 (HaoYang670)
- [BugFix] fix explain csv/json/avro exec can not see metrics bug #5018 (xiaoyong-z)
- Check placeholder __timeTo and return Datafusion::Plan error #5017 [sql] (matthias-Q)
- [sqllogictest] Remove postgres container orchestration #5015 (alamb)
- Sqllogictest: use the same normalization for all tests #5013 (melgenek)
- Minor: Remove invalid comments #5012 [sql] (xudong963)
- AVG(null) is NULL (not zero) #5008 (alamb)
- Minor: improve internal error message #5006 (alamb)
- Support for bounded execution when window frame involves UNBOUNDED PRECEDING #5003 (mustafasrepo)
- Bump sqllogictest to v0.11.1 #5002 (xudong963)
- Minor: Document how to create ListingTables #5001 (alamb)
- [Enhancement] early check table exist before create #4998 (xiaoyong-z)
- [Feature] support describe file #4995 [sql] (xiaoyong-z)
- Implement std::error::Error::source() for DataFusionError, make DataFusionError::find_root more generic #4992 (alamb)
- Add support for linear range calculation in WINDOW functions #4989 (mustafasrepo)
- re-export substrait crate #4988 (jdye64)
- minor: Update data type support documentation #4984 (alamb)
- fix(4981): incorrect error wrapping in OnceFut #4983 (DDtKey)
- Infer values for inserts #4977 [sql] (avantgardnerio)
- Simplify GroupByHash implementation (to prepare for more work) #4972 (alamb)
- Add DataFusionError::Substrait variant to DataFusionError enum #4971 (jdye64)
- refactor: display input partitions for RepartitionExec #4969 (crepererum)
- Upgrade to Substrait 0.4.0 #4966 (mbrobbel)
- Expose sql_to_statement and statement_to_plan on SessionState #4958 (avantgardnerio)
- Minor: Make messages consistent for LogicalPlan::Dml #4953 [sql] (alamb)
- Do not resort inputs to UnionExec if they are already sorted #4946 (alamb)
- Minor: Reduce even more redundancy creating window_agg in sort_enforcement tests #4945 (alamb)
- Only add outer filter once when transforming exists/in subquery to join #4944 (ygf11)
- fix: FieldNotFound error message without valid fields #4942 [sql] (DDtKey)
- Propagate planning error back to user #4940 (fsdvh)
- Make it able to specify a session id for SessionState #4933 (yahoNanJing)
- SUPPORT SEMI/ANTI JOIN SQL syntax in DataFusion #4932 [sql] (mingmwang)
- Support gs:// as GCS schema #4930 (jychen7)
- Upgrade object_store from 0.5.0 to 0.5.3 #4929 (jychen7)
- Reduce redundancy in sort_enforcement tests #4928 (alamb)
- Update to arrow 31 #4927 [sql] (tustvold)
- Unify Row hash and hash implementation #4924 (mustafasrepo)
- Support join-filter pushdown for semi/anti join #4923 (ygf11)
- Minor add ticket link to broken test #4919 (alamb)
- Improve documentation for ExprVisitor, port simple uses to new walking function #4916 (alamb)
- Add substrait label to PRs #4915 (alamb)
- Executing ProjectionExec with no column should not return an Err #4912 (viirya)
- Refactor: Add LogicalPlan::observe_expressions to walk expressions #4906 (alamb)
- Minor: Port information schema tests to sqllogictest #4905 (alamb)
- Add insert/update/delete to LogicalPlan and add SQL planner support #4902 [sql] (avantgardnerio)
- fix: Visit subqueries in Expr::Alias #4900 (askoa)
- [Substrait] Change API to return LogicalPlan instead of DataFrame #4896 (andygrove)
- Upgrade to substrait 0.3 #4895 (andygrove)
- Add datafusion-substrait crate to workspace #4893 (andygrove)
- refactor and add simple function to deserialize and serialize proto b… #4892 (jdye64)
- Update optimize_children to return Result<Option<LogicalPlan>> #4888 (HaoYang670)
- Do not repartition inputs whose sort order is required #4885 (alamb)
- Minor: Add docstrings to UnionExec #4884 (alamb)
- Update datafusion-substrait crate to build against repo version of DataFusion #4879 (andygrove)
- Fix column indices in EnforceDistribution optimizer in Partial AggregateMode #4878 (jonmmease)
- refactor: improve repartition buffering #4867 (crepererum)
- Rewrite coerce_plan_expr_for_schema to fix union type coercion #4862 (ygf11)
- (#4462) Postgres compatibility tests using sqllogictest #4834 (melgenek)
- Support non-tuple expression for in-subquery to join #4826 (ygf11)
- Update to arrow 30.0.1 #4818 [sql] (tustvold)
- Refine the statistics estimation for the limit and aggregate operator #4716 (yahoNanJing)
- Infer prepared statement parameter types #4701 [sql] (avantgardnerio)
- Add datafusion-substrait crate #4543 (andygrove)
- Refactor loser tree code in SortPreservingMerge per PR comments #4407 (alamb)