6.0.0 (2021-11-13)
Breaking changes:
- Removed deprecated with_concurrency #1200 (rdettai)
- File partitioning for ListingTable #1141 (rdettai)
- Add function volatility to Signature #1071 [sql] (pjmore)
- fix: allow duplicate field names in table join, fix output with duplicated names #1023 (houqp)
- Make TableProvider.scan() and PhysicalPlanner::create_physical_plan() async #1013 (rdettai)
- Reorganize table providers by table format #1010 (rdettai)
- Make Metrics::labels() public #999 (alamb)
- Rename NthValue::{first_value,last_value,nth_value} to satisfy clippy in Rust 1.55 #986 (alamb)
- Move CBOs and Statistics to physical plan #965 (rdettai)
- Update to sqlparser v 0.10.0 #934 [sql] (alamb)
- FilePartition and PartitionedFile for scanning flexibility #932 [sql] (yjshen)
- Improve SQLMetric APIs, port existing metrics #908 (alamb)
- Add support for EXPLAIN ANALYZE #858 [sql] (alamb)
- Rename concurrency to target_partitions #706 (andygrove)
Implemented enhancements:
- Add booleans support to the CASE statement #1156
- Implement General Purpose Constant Folding with the Expression Evaluator #1070
- Mark volatility categories of functions #1069
- Add "show" support to DataFrame API #937
- Add support for TRIM BOTH/LEADING/TRAILING #935
- Add "baseline" metrics to all built in operators #866
- Add SQL support for referencing fields in structs #119
- add filename completer for create table statement #1278 (Jimexist)
- Add drop table support #1266 [sql] (viirya)
- Dataframe supports except and update readme #1261 (xudong963)
- Implement EXCEPT & EXCEPT DISTINCT #1259 [sql] (xudong963)
- Add DataFrame support for INTERSECT and update readme #1258 (xudong963)
- use arrow 6.1.0 #1255 (Jimexist)
- fix 1250, add editor support for datafusion cli with validation #1251 (Jimexist)
- Add support for create table as via MemTable #1243 [sql] (Dandandan)
- Add cli show columns command to describe tables #1231 (Jimexist)
- datafusion-cli to add list table command #1229 (Jimexist)
- datafusion cli to handle EoF and interrupt signal #1225 (Jimexist)
- add \q as quit command and add ? for help #1224 (Jimexist)
- Add algebraic simplifications to constant_folding #1208 (matthewmturner)
- Improve GetIndexedFieldExpr adding utf8 key based access for struct v… #1204 [sql] (Igosuki)
- Fix between in select query #1202 [sql] (capkurmagati)
- Move code to fold Stable functions like now() from Simplifier to ConstEvaluator #1176 (alamb)
- DataFrame supports window function #1167 [sql] (xudong963)
- add values list expression #1165 [sql] (Jimexist)
- Add booleans support to the CASE statement #1161 (xudong963)
- Improve error messages when operations are not supported #1158 (alamb)
- Generic constant expression evaluation #1153 (alamb)
- python lit function to support bool and byte vec #1152 (Jimexist)
- [nit] simplify datafusion optimizer module codes #1146 (panarch)
- Add ScalarValue support for arbitrary list elements #1142 (jonmmease)
- Multiple files per partitions for CSV Avro Json #1138 (rdettai)
- Implement INTERSECT & INTERSECT DISTINCT #1135 [sql] (xudong963)
- Simplify file struct abstractions #1120 (rdettai)
- Implement is [not] distinct from #1117 [sql] (Dandandan)
- Clean up spawned task on drop for RepartitionExec, SortPreservingMergeExec, WindowAggExec #1112 (crepererum)
- add hyperloglog implementation (add and count) #1095 (Jimexist)
- Add ScalarValue::Struct variant #1091 (jonmmease)
- add digest(utf8, method) function and refactor all current hash digest functions #1090 (Jimexist)
- [crypto] add blake3 algorithm to digest function #1086 (Jimexist)
- [crypto] add blake2b and blake2s functions #1081 (Jimexist)
- [nit] make schema qualifier error message in field lookup more readable #1079 (Jimexist)
- [window function] add percent_rank window function #1077 (Jimexist)
- [window function] add cume_dist implementation #1076 (Jimexist)
- Add a LogicalPlanBuilder::schema() function #1075 (alamb)
- Add support for UNION [DISTINCT] sql #1068 [sql] (xudong963)
- fix: fix joins on Float32/Float64 columns bug #1054 (francis-du)
- Update sqlparser-rs to 0.11 #1052 [sql] (alamb)
- Support querying CSV files without providing the schema #1050 [sql] (xudong963)
- remove hard coded partition count in ballista logicalplan deserialization #1044 (xudong963)
- feat: add lit_timestamp_nanosecond #1030 (NGA-TRAN)
- Ignore metadata on schema merge #1024 (Smurphy000)
- add ExecutionConfig.with_optimizer_rules #1022 (seddonm1)
- Add baseline execution stats to WindowAggExec and UnionExec, and fixup CoalescePartitionsExec #1018 (alamb)
- Derive PartialOrd for Expr #1015 (alamb)
- Indexed field access for List #1006 [sql] (Igosuki)
- Add metrics for Limit and Projection, and CoalesceBatches #1004 (alamb)
- Update DataFusion to arrow 6.0 #984 (alamb)
- Implement Display for Expr, improve operator display #971 [sql] (matthewmturner)
- Add metrics for FilterExec #960 (alamb)
- Change compound column field name rules #952 (waynexia)
- ObjectStore API to read from remote storage systems #950 (yjshen)
- Add baseline metrics to SortPreservingMergeExec #948 (alamb)
- Add support for TRIM LEADING/TRAILING/BOTH syntax #947 [sql] (adsharma)
- fixes #933: replace placeholder fmt_as for ExecutionPlan impls #939 (tiphaineruy)
- Add metrics for SortExec + HashAggregateExec #938 (alamb)
- Add some additional asserts in utils::from_plan #930 (alamb)
- Avro Table Provider #910 [sql] (Igosuki)
- Add BaselineMetrics, Timestamp metrics, add for CoalescePartitionsExec, rename output_time -> elapsed_compute #909 (alamb)
- add cross join support to ballista #891 (houqp)
- Add Ballista support to DataFusion CLI #889 (andygrove)
- support like on DictionaryArray #876 (b41sh)
- Register table based on known schema without file IO #872 (Dandandan)
- Add support for PostgreSQL regex match #870 [sql] (b41sh)
- Include planning time in datafusion-cli printing #860 (Dandandan)
- Implement basic common subexpression eliminate optimization #792 (waynexia)
- Impl ops::Not for expr #763 (Jimexist)
Fixed bugs:
- Can not use between in the select list #1196
- ORDER BY does not work with literals: Sort operation is not applicable to scalar value 'foo' #1195
- window functions with NULL literals in partition by and order by do not work: Internal("Sort operation is not applicable to scalar value NULL") #1194
- Operation name not included in internal errors -- Internal("Data type Boolean not supported for binary operation on dyn arrays") #1157
- Physical plan explain UNION query says "ExecutionPlan(PlaceHolder)" #933
- Can not use LIKE on DictionaryArray encoded strings #815
- physical_plan::repartition::tests::repartition_with_dropping_output_stream failing locally #614
- Fix some BuiltinScalarFunction panics with zero arguments #1249 (capkurmagati)
- fix: not do boolean folding on NULL and/or expr #1245 (NGA-TRAN)
- ignore case of with header row in sql when creating external table #1237 [sql] (lichuan6)
- fix: Min/Max aggregation data type should not be dictionary #1235 (NGA-TRAN)
- Fix build with --no-default-features #1219 (alamb)
- Prevent "future cannot be sent between threads safely" compilation error #1155 (jonmmease)
- Clean up spawned task on drop for AnalyzeExec, CoalescePartitionsExec, HashAggregateExec #1121 (crepererum)
- Clean up spawned task on SortStream drop #1105 (crepererum)
- fix UNION ALL bug: thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', ./src/datatypes/schema.rs:165:10 #1088 (xudong963)
- python: fix generated table name in dataframe creation #1078 (houqp)
- fix subquery alias #1067 [sql] (xudong963)
- fix pattern handling in regexp_match function #1065 (houqp)
- fix: joins on Timestamp columns #1055 (francis-du)
- Fix metric name typo #943 (alamb)
- EXPLAIN ANALYZE should run all Optimizer passes #929 (alamb)
Documentation updates:
- update docs to fix DataFusion User Guide link #1238 (jiangzhx)
- [docs] datafusion cli run via homebrew #1198 (Jimexist)
- add support for unary and binary values in values list, update docs #1172 [sql] (Jimexist)
- Add additional docstring comments to from_plan #1168 (alamb)
- [nit] fix document issue for approx_distinct #1110 (Jimexist)
- implement approx_distinct function using HyperLogLog #1087 (Jimexist)
- Remove unused use statements from examples #1032 (alamb)
- consolidate datafusion docs with sphinx #993 (houqp)
- Updated user-guide library docs with optimized config #976 (matthewmturner)
- Improve User Guide #954 (andygrove)
- [MINOR] Fix typos in doc comments #945 (alamb)
- [DataFusion] - Add show and show_limit function for DataFrame #923 (francis-du)
- Typo fix in DataFusion crate documentation #914 (antoinewdg)
Performance improvements:
- Improve avro reader performance by avoiding some cloning on avro_rs::Value #1206 (Igosuki)
- optimize build profile for datafusion python binding, cli and ballista #1137 (houqp)
- Avoid stack overflow by reducing stack usage of BinaryExpr::evaluate in debug builds #1047 (alamb)
- Add ScalarValue::eq_array optimized comparison function #844 (alamb)
- Rework GroupByHash for faster performance and support grouping by nulls #808 (alamb)
Closed issues:
- InList expr with NULL literals do not work #1190
- update the homepage README to include values, approx_distinct, etc. #1171
- [Python]: Inconsistencies with Python package name #1011
- Wanting to contribute to project where to start? #983
- delete redundant code #973
- How to build DataFusion python wheel #853
- Add support for partition pruning #204
- [Datafusion] Support joins on TimestampMillisecond columns #187
- TPC-H Query 21 #173
- TPC-H Query 13 #164
- TPC-H Query 8 #162
- implement split_part(string, delimiter, position) #157
- Join Statement: Schema contains duplicate unqualified field name #155
- ParquetTable should avoid scanning all files twice #136
- Add support for reading partitioned Parquet files #133
- Add support for Parquet schema merging #132
- Catalog abstraction #126
- Optimizer rules should work with qualified column names #125
- Add optional qualifier to Expr::Column #121
- Implement modulus expression #99
- [Rust] Add constant folding to expressions during logically planning #98
- [Rust] Implement pretty print for physical query plan #93
- Can not group by boolean columns (add boolean to valid keys of groupBy) #91
- improve performance of building literal arrays #90
- [rust][datafusion] optimize count(*) queries on parquet sources #89
- Produce a design for a metrics framework #21
Merged pull requests: