6.0.0 (2021-11-13)

Full Changelog

Breaking changes:

Removed deprecated with_concurrency #1200 (rdettai)
File partitioning for ListingTable #1141 (rdettai)
Add function volatility to Signature #1071 [sql] (pjmore)
fix: allow duplicate field names in table join, fix output with duplicated names #1023 (houqp)
Make TableProvider.scan() and PhysicalPlanner::create_physical_plan() async #1013 (rdettai)
Reorganize table providers by table format #1010 (rdettai)
Make Metrics::labels() public #999 (alamb)
Rename NthValue::{first_value,last_value,nth_value} to satisfy clippy in Rust 1.55 #986 (alamb)
Move CBOs and Statistics to physical plan #965 (rdettai)
Update to sqlparser v 0.10.0 #934 [sql] (alamb)
FilePartition and PartitionedFile for scanning flexibility #932 [sql] (yjshen)
Improve SQLMetric APIs, port existing metrics #908 (alamb)
Add support for EXPLAIN ANALYZE #858 [sql] (alamb)
Rename concurrency to target_partitions #706 (andygrove)

Implemented enhancements:

Add booleans support to the CASE statement #1156
Implement General Purpose Constant Folding with the Expression Evaluator #1070
Mark volatility categories of functions #1069
Add "show" support to DataFrame API #937
Add support for TRIM BOTH/LEADING/TRAILING #935
Add "baseline" metrics to all built in operators #866
Add SQL support for referencing fields in structs #119
add filename completer for create table statement #1278 (Jimexist)
Add drop table support #1266 [sql] (viirya)
Dataframe supports except and update readme #1261 (xudong963)
Implement EXCEPT & EXCEPT DISTINCT #1259 [sql] (xudong963)
Add DataFrame support for INTERSECT and update readme #1258 (xudong963)
use arrow 6.1.0 #1255 (Jimexist)
fix 1250, add editor support for datafusion cli with validation #1251 (Jimexist)
Add support for create table as via MemTable #1243 [sql] (Dandandan)
Add cli show columns command to describe tables #1231 (Jimexist)
datafusion-cli to add list table command #1229 (Jimexist)
datafusion cli to handle EoF and interrupt signal #1225 (Jimexist)
add \q as quit command and add ? for help #1224 (Jimexist)
Add algebraic simplifications to constant_folding #1208 (matthewmturner)
Improve GetIndexedFieldExpr adding utf8 key based access for struct v… #1204 [sql] (Igosuki)
Fix between in select query #1202 [sql] (capkurmagati)
Move code to fold Stable functions like now() from Simplifier to ConstEvaluator #1176 (alamb)
DataFrame supports window function #1167 [sql] (xudong963)
add values list expression #1165 [sql] (Jimexist)
Add booleans support to the CASE statement #1161 (xudong963)
Improve error messages when operations are not supported #1158 (alamb)
Generic constant expression evaluation #1153 (alamb)
python lit function to support bool and byte vec #1152 (Jimexist)
[nit] simplify datafusion optimizer module codes #1146 (panarch)
Add ScalarValue support for arbitrary list elements #1142 (jonmmease)
Multiple files per partitions for CSV Avro Json #1138 (rdettai)
Implement INTERSECT & INTERSECT DISTINCT #1135 [sql] (xudong963)
Simplify file struct abstractions #1120 (rdettai)
Implement is [not] distinct from #1117 [sql] (Dandandan)
Clean up spawned task on drop for RepartitionExec, SortPreservingMergeExec, WindowAggExec #1112 (crepererum)
add hyperloglog implementation (add and count) #1095 (Jimexist)
Add ScalarValue::Struct variant #1091 (jonmmease)
add digest(utf8, method) function and refactor all current hash digest functions #1090 (Jimexist)
[crypto] add blake3 algorithm to digest function #1086 (Jimexist)
[crypto] add blake2b and blake2s functions #1081 (Jimexist)
[nit] make schema qualifier error message in field lookup more readable #1079 (Jimexist)
[window function] add percent_rank window function #1077 (Jimexist)
[window function] add cume_dist implementation #1076 (Jimexist)
Add a LogicalPlanBuilder::schema() function #1075 (alamb)
Add support for UNION [DISTINCT] sql #1068 [sql] (xudong963)
fix: fix joins on Float32/Float64 columns bug #1054 (francis-du)
Update sqlparser-rs to 0.11 #1052 [sql] (alamb)
Support querying CSV files without providing the schema #1050 [sql] (xudong963)
remove hard coded partition count in ballista logicalplan deserialization #1044 (xudong963)
feat: add lit_timestamp_nanosecond #1030 (NGA-TRAN)
Ignore metadata on schema merge #1024 (Smurphy000)
add ExecutionConfig.with_optimizer_rules #1022 (seddonm1)
Add baseline execution stats to WindowAggExec and UnionExec, and fixup CoalescePartitionsExec #1018 (alamb)
Derive PartialOrd for Expr #1015 (alamb)
Indexed field access for List #1006 [sql] (Igosuki)
Add metrics for Limit and Projection, and CoalesceBatches #1004 (alamb)
Update DataFusion to arrow 6.0 #984 (alamb)
Implement Display for Expr, improve operator display #971 [sql] (matthewmturner)
Add metrics for FilterExec #960 (alamb)
Change compound column field name rules #952 (waynexia)
ObjectStore API to read from remote storage systems #950 (yjshen)
Add baseline metrics to SortPreservingMergeExec #948 (alamb)
Add support for TRIM LEADING/TRAILING/BOTH syntax #947 [sql] (adsharma)
fixes #933 replace placeholder fmt_as for ExecutionPlan impls #939 (tiphaineruy)
Add metrics for SortExec + HashAggregateExec #938 (alamb)
Add some additional asserts in utils::from_plan #930 (alamb)
Avro Table Provider #910 [sql] (Igosuki)
Add BaselineMetrics, Timestamp metrics, add for CoalescePartitionsExec, rename output_time -> elapsed_compute #909 (alamb)
add cross join support to ballista #891 (houqp)
Add Ballista support to DataFusion CLI #889 (andygrove)
support like on DictionaryArray #876 (b41sh)
Register table based on known schema without file IO #872 (Dandandan)
Add support for PostgreSQL regex match #870 [sql] (b41sh)
Include planning time in datafusion-cli printing #860 (Dandandan)
Implement basic common subexpression elimination optimization #792 (waynexia)
Impl ops::Not for expr #763 (Jimexist)
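
For readers unfamiliar with the null-safe comparison added by "Implement is [not] distinct from" (#1117), the standard SQL semantics can be sketched as follows. This is an illustrative, std-only sketch and not DataFusion's implementation: the function names `is_distinct_from` and `is_not_distinct_from` are hypothetical, and `Option::None` is assumed to stand in for SQL NULL.

```rust
/// Null-safe "not equal", mirroring SQL `a IS DISTINCT FROM b`.
/// Assumption of this sketch: `None` models SQL NULL.
fn is_distinct_from<T: PartialEq>(a: Option<T>, b: Option<T>) -> bool {
    match (a, b) {
        // Two NULLs are considered the same value, so they are not distinct.
        (None, None) => false,
        // Two non-NULL values are distinct when they compare unequal.
        (Some(x), Some(y)) => x != y,
        // Exactly one side is NULL: always distinct.
        _ => true,
    }
}

/// Null-safe "equal", mirroring SQL `a IS NOT DISTINCT FROM b`.
fn is_not_distinct_from<T: PartialEq>(a: Option<T>, b: Option<T>) -> bool {
    !is_distinct_from(a, b)
}
```

Unlike plain `=` and `<>`, which yield NULL when either operand is NULL, both comparisons always produce a definite true or false.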

Fixed bugs:

Can not use between in the select list: #1196
ORDER BY does not work with literals: Sort operation is not applicable to scalar value 'foo' #1195
window functions with NULL literals in partition by and order by do not work: Internal("Sort operation is not applicable to scalar value NULL") #1194
Operation name not included in internal errors -- Internal("Data type Boolean not supported for binary operation on dyn arrays") #1157
Physical plan explain UNION query says "ExecutionPlan(PlaceHolder)" #933
Can not use LIKE on DictionaryArray encoded strings #815
physical_plan::repartition::tests::repartition_with_dropping_output_stream failing locally #614
Fix some BuiltinScalarFunction panics with zero arguments #1249 (capkurmagati)
fix: do not do boolean folding on NULL and/or expr #1245 (NGA-TRAN)
ignore case of with header row in sql when creating external table #1237 [sql] (lichuan6)
fix: Min/Max aggregation data type should not be dictionary #1235 (NGA-TRAN)
Fix build with --no-default-features #1219 (alamb)
Prevent "future cannot be sent between threads safely" compilation error #1155 (jonmmease)
Clean up spawned task on drop for AnalyzeExec, CoalescePartitionsExec, HashAggregateExec #1121 (crepererum)
Clean up spawned task on SortStream drop #1105 (crepererum)
fix UNION ALL bug: thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', ./src/datatypes/schema.rs:165:10 #1088 (xudong963)
python: fix generated table name in dataframe creation #1078 (houqp)
fix subquery alias #1067 [sql] (xudong963)
fix pattern handling in regexp_match function #1065 (houqp)
fix: joins on Timestamp columns #1055 (francis-du)
Fix metric name typo #943 (alamb)
EXPLAIN ANALYZE should run all Optimizer passes #929 (alamb)

Documentation updates:

update docs to fix DataFusion User Guide link #1238 (jiangzhx)
[docs] datafusion cli run via homebrew #1198 (Jimexist)
add support for unary and binary values in values list, update docs #1172 [sql] (Jimexist)
Add additional docstring comments to from_plan #1168 (alamb)
[nit] fix document issue for approx_distinct #1110 (Jimexist)
implement approx_distinct function using HyperLogLog #1087 (Jimexist)
Remove unused use statements from examples #1032 (alamb)
consolidate datafusion docs with sphinx #993 (houqp)
Updated user-guide library docs with optimized config #976 (matthewmturner)
Improve User Guide #954 (andygrove)
[MINOR] Fix typos in doc comments #945 (alamb)
[DataFusion] - Add show and show_limit function for DataFrame #923 (francis-du)
Typo fix in DataFusion crate documentation #914 (antoinewdg)

Performance improvements:

Improve avro reader performance by avoiding some cloning on avro_rs::Value #1206 (Igosuki)
optimize build profile for datafusion python binding, cli and ballista #1137 (houqp)
Avoid stack overflow by reducing stack usage of BinaryExpr::evaluate in debug builds #1047 (alamb)
Add ScalarValue::eq_array optimized comparison function #844 (alamb)
Rework GroupByHash for faster performance and support grouping by nulls #808 (alamb)

Closed issues:

InList expr with NULL literals do not work #1190
update the homepage README to include values, approx_distinct, etc. #1171
[Python]: Inconsistencies with Python package name #1011
Wanting to contribute to project where to start? #983
delete redundant code #973
How to build DataFusion python wheel #853
Add support for partition pruning #204
[Datafusion] Support joins on TimestampMillisecond columns #187
TPC-H Query 21 #173
TPC-H Query 13 #164
TPC-H Query 8 #162
implement split_part(string, delimiter, position) #157
Join Statement: Schema contains duplicate unqualified field name #155
ParquetTable should avoid scanning all files twice #136
Add support for reading partitioned Parquet files #133
Add support for Parquet schema merging #132
Catalog abstraction #126
Optimizer rules should work with qualified column names #125
Add optional qualifier to Expr::Column #121
Implement modulus expression #99
[Rust] Add constant folding to expressions during logical planning #98
[Rust] Implement pretty print for physical query plan #93
Can not group by boolean columns (add boolean to valid keys of groupBy) #91
improve performance of building literal arrays #90
[rust][datafusion] optimize count(*) queries on parquet sources #89
Produce a design for a metrics framework #21

Merged pull requests:

Add timezone string to stabilize test #1265 (viirya)
numerical_coercion pattern match optimize #1256 (Jimexist)
fix and update window function sql tests #1059 (Jimexist)
reduce ScalarValue from trait boilerplate with macro #989 (houqp)
