6.0.0 (2021-11-13)

Full Changelog

Breaking changes:

  • Removed deprecated with_concurrency #1200 (rdettai)
  • File partitioning for ListingTable #1141 (rdettai)
  • Add function volatility to Signature #1071 [sql] (pjmore)
  • fix: allow duplicate field names in table join, fix output with duplicated names #1023 (houqp)
  • Make TableProvider.scan() and PhysicalPlanner::create_physical_plan() async #1013 (rdettai)
  • Reorganize table providers by table format #1010 (rdettai)
  • Make Metrics::labels() public #999 (alamb)
  • Rename NthValue::{first_value,last_value,nth_value} to satisfy clippy in Rust 1.55 #986 (alamb)
  • Move CBOs and Statistics to physical plan #965 (rdettai)
  • Update to sqlparser v 0.10.0 #934 [sql] (alamb)
  • FilePartition and PartitionedFile for scanning flexibility #932 [sql] (yjshen)
  • Improve SQLMetric APIs, port existing metrics #908 (alamb)
  • Add support for EXPLAIN ANALYZE #858 [sql] (alamb)
  • Rename concurrency to target_partitions #706 (andygrove)

Implemented enhancements:

  • Add booleans support to the CASE statement #1156
  • Implement General Purpose Constant Folding with the Expression Evaluator #1070
  • Mark volatility categories of functions #1069
  • Add "show" support to DataFrame API #937
  • Add support for TRIM BOTH/LEADING/TRAILING #935
  • Add "baseline" metrics to all built in operators #866
  • Add SQL support for referencing fields in structs #119
  • add filename completer for create table statement #1278 (Jimexist)
  • Add drop table support #1266 [sql] (viirya)
  • Dataframe supports except and update readme #1261 (xudong963)
  • Implement EXCEPT & EXCEPT DISTINCT #1259 [sql] (xudong963)
  • Add DataFrame support for INTERSECT and update readme #1258 (xudong963)
  • use arrow 6.1.0 #1255 (Jimexist)
  • fix 1250, add editor support for datafusion cli with validation #1251 (Jimexist)
  • Add support for create table as via MemTable #1243 [sql] (Dandandan)
  • Add cli show columns command to describe tables #1231 (Jimexist)
  • datafusion-cli to add list table command #1229 (Jimexist)
  • datafusion cli to handle EoF and interrupt signal #1225 (Jimexist)
  • add \q as quit command and add ? for help #1224 (Jimexist)
  • Add algebraic simplifications to constant_folding #1208 (matthewmturner)
  • Improve GetIndexedFieldExpr adding utf8 key based access for struct v… #1204 [sql] (Igosuki)
  • Fix between in select query #1202 [sql] (capkurmagati)
  • Move code to fold Stable functions like now() from Simplifier to ConstEvaluator #1176 (alamb)
  • DataFrame supports window function #1167 [sql] (xudong963)
  • add values list expression #1165 [sql] (Jimexist)
  • Add booleans support to the CASE statement #1161 (xudong963)
  • Improve error messages when operations are not supported #1158 (alamb)
  • Generic constant expression evaluation #1153 (alamb)
  • python lit function to support bool and byte vec #1152 (Jimexist)
  • [nit] simplify datafusion optimizer module codes #1146 (panarch)
  • Add ScalarValue support for arbitrary list elements #1142 (jonmmease)
  • Multiple files per partitions for CSV Avro Json #1138 (rdettai)
  • Implement INTERSECT & INTERSECT DISTINCT #1135 [sql] (xudong963)
  • Simplify file struct abstractions #1120 (rdettai)
  • Implement is [not] distinct from #1117 [sql] (Dandandan)
  • Clean up spawned task on drop for RepartitionExec, SortPreservingMergeExec, WindowAggExec #1112 (crepererum)
  • add hyperloglog implementation (add and count) #1095 (Jimexist)
  • Add ScalarValue::Struct variant #1091 (jonmmease)
  • add digest(utf8, method) function and refactor all current hash digest functions #1090 (Jimexist)
  • [crypto] add blake3 algorithm to digest function #1086 (Jimexist)
  • [crypto] add blake2b and blake2s functions #1081 (Jimexist)
  • [nit] make schema qualifier error message in field lookup more readable #1079 (Jimexist)
  • [window function] add percent_rank window function #1077 (Jimexist)
  • [window function] add cume_dist implementation #1076 (Jimexist)
  • Add a LogicalPlanBuilder::schema() function #1075 (alamb)
  • Add support for UNION [DISTINCT] sql #1068 [sql] (xudong963)
  • fix: fix joins on Float32/Float64 columns bug #1054 (francis-du)
  • Update sqlparser-rs to 0.11 #1052 [sql] (alamb)
  • Support querying CSV files without providing the schema #1050 [sql] (xudong963)
  • remove hard coded partition count in ballista logicalplan deserialization #1044 (xudong963)
  • feat: add lit_timestamp_nanosecond #1030 (NGA-TRAN)
  • Ignore metadata on schema merge #1024 (Smurphy000)
  • add ExecutionConfig.with_optimizer_rules #1022 (seddonm1)
  • Add baseline execution stats to WindowAggExec and UnionExec, and fixup CoalescePartitionsExec #1018 (alamb)
  • Derive PartialOrd for Expr #1015 (alamb)
  • Indexed field access for List #1006 [sql] (Igosuki)
  • Add metrics for Limit and Projection, and CoalesceBatches #1004 (alamb)
  • Update DataFusion to arrow 6.0 #984 (alamb)
  • Implement Display for Expr, improve operator display #971 [sql] (matthewmturner)
  • Add metrics for FilterExec #960 (alamb)
  • Change compound column field name rules #952 (waynexia)
  • ObjectStore API to read from remote storage systems #950 (yjshen)
  • Add baseline metrics to SortPreservingMergeExec #948 (alamb)
  • Add support for TRIM LEADING/TRAILING/BOTH syntax #947 [sql] (adsharma)
  • fixes #933 replace placeholder fmt_as for ExecutionPlan impls #939 (tiphaineruy)
  • Add metrics for SortExec + HashAggregateExec #938 (alamb)
  • Add some additional asserts in utils::from_plan #930 (alamb)
  • Avro Table Provider #910 [sql] (Igosuki)
  • Add BaselineMetrics, Timestamp metrics, add for CoalescePartitionsExec, rename output_time -> elapsed_compute #909 (alamb)
  • add cross join support to ballista #891 (houqp)
  • Add Ballista support to DataFusion CLI #889 (andygrove)
  • support like on DictionaryArray #876 (b41sh)
  • Register table based on known schema without file IO #872 (Dandandan)
  • Add support for PostgreSQL regex match #870 [sql] (b41sh)
  • Include planning time in datafusion-cli printing #860 (Dandandan)
  • Implement basic common subexpression elimination optimization #792 (waynexia)
  • Impl ops::Not for expr #763 (Jimexist)
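
Several of the SQL additions above concern null handling, for example `IS [NOT] DISTINCT FROM` (#1117). As a rough illustration of the standard SQL semantics only (this is not DataFusion's implementation), the sketch below uses `Option<T>` to stand in for a nullable value. The function names are hypothetical and chosen for this example.

```rust
/// Standard SQL `a <> b`: the result is NULL (`None`) if either side is NULL.
fn sql_not_equal<T: PartialEq>(a: Option<T>, b: Option<T>) -> Option<bool> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x != y),
        _ => None,
    }
}

/// `a IS DISTINCT FROM b`: never NULL; two NULLs count as "not distinct".
fn is_distinct_from<T: PartialEq>(a: Option<T>, b: Option<T>) -> bool {
    match (a, b) {
        (None, None) => false,
        (Some(x), Some(y)) => x != y,
        _ => true,
    }
}

/// `a IS NOT DISTINCT FROM b` is the negation of the above.
fn is_not_distinct_from<T: PartialEq>(a: Option<T>, b: Option<T>) -> bool {
    !is_distinct_from(a, b)
}
```

With these definitions, comparing `Some(1)` against `None` yields `None` for `sql_not_equal` but `true` for `is_distinct_from`, which is the practical difference between the two operators.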

Fixed bugs:

  • Can not use between in the select list: #1196
  • ORDER BY does not work with literals: Sort operation is not applicable to scalar value 'foo' #1195
  • window functions with NULL literals in partition by and order by do not work: Internal("Sort operation is not applicable to scalar value NULL") #1194
  • Operation name not included in internal errors -- Internal("Data type Boolean not supported for binary operation on dyn arrays") #1157
  • Physical plan explain UNION query says "ExecutionPlan(PlaceHolder)" #933
  • Can not use LIKE on DictionaryArray encoded strings #815
  • physical_plan::repartition::tests::repartition_with_dropping_output_stream failing locally #614
  • Fix some BuiltinScalarFunction panics with zero arguments #1249 (capkurmagati)
  • fix: not do boolean folding on NULL and/or expr #1245 (NGA-TRAN)
  • ignore case of with header row in sql when creating external table #1237 [sql] (lichuan6)
  • fix: Min/Max aggregation data type should not be dictionary #1235 (NGA-TRAN)
  • Fix build with --no-default-features #1219 (alamb)
  • Prevent "future cannot be sent between threads safely" compilation error #1155 (jonmmease)
  • Clean up spawned task on drop for AnalyzeExec, CoalescePartitionsExec, HashAggregateExec #1121 (crepererum)
  • Clean up spawned task on SortStream drop #1105 (crepererum)
  • fix UNION ALL bug: thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', ./src/datatypes/schema.rs:165:10 #1088 (xudong963)
  • python: fix generated table name in dataframe creation #1078 (houqp)
  • fix subquery alias #1067 [sql] (xudong963)
  • fix pattern handling in regexp_match function #1065 (houqp)
  • fix: joins on Timestamp columns #1055 (francis-du)
  • Fix metric name typo #943 (alamb)
  • EXPLAIN ANALYZE should run all Optimizer passes #929 (alamb)
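
One of the fixes above (#1245) involves boolean folding of `AND`/`OR` expressions that contain NULL. The exact rewrite that was corrected is not described in this changelog; the sketch below only illustrates why NULL operands need care under SQL three-valued logic. `Option<bool>` stands in for a nullable boolean, and `sql_and` / `sql_or` are hypothetical helper names.

```rust
/// SQL three-valued AND; `None` stands in for NULL.
/// A single `false` decides the result even if the other side is NULL.
fn sql_and(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// SQL three-valued OR; `None` stands in for NULL.
/// A single `true` decides the result even if the other side is NULL.
fn sql_or(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(true), _) | (_, Some(true)) => Some(true),
        (Some(false), Some(false)) => Some(false),
        _ => None,
    }
}
```

Because `NULL AND true` is NULL while `NULL AND false` is `false`, a simplifier cannot treat a NULL operand as either "always propagate" or "always drop" without inspecting the other side.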

Documentation updates:

Performance improvements:

  • Improve avro reader performance by avoiding some cloning on avro_rs::Value #1206 (Igosuki)
  • optimize build profile for datafusion python binding, cli and ballista #1137 (houqp)
  • Avoid stack overflow by reducing stack usage of BinaryExpr::evaluate in debug builds #1047 (alamb)
  • Add ScalarValue::eq_array optimized comparison function #844 (alamb)
  • Rework GroupByHash for faster performance and support grouping by nulls #808 (alamb)

Closed issues:

  • InList expr with NULL literals do not work #1190
  • update the homepage README to include values, approx_distinct, etc. #1171
  • [Python]: Inconsistencies with Python package name #1011
  • Wanting to contribute to project where to start? #983
  • delete redundant code #973
  • How to build DataFusion python wheel #853
  • Add support for partition pruning #204
  • [Datafusion] Support joins on TimestampMillisecond columns #187
  • TPC-H Query 21 #173
  • TPC-H Query 13 #164
  • TPC-H Query 8 #162
  • implement split_part(string, delimiter, position) #157
  • Join Statement: Schema contains duplicate unqualified field name #155
  • ParquetTable should avoid scanning all files twice #136
  • Add support for reading partitioned Parquet files #133
  • Add support for Parquet schema merging #132
  • Catalog abstraction #126
  • Optimizer rules should work with qualified column names #125
  • Add optional qualifier to Expr::Column #121
  • Implement modulus expression #99
  • [Rust] Add constant folding to expressions during logical planning #98
  • [Rust] Implement pretty print for physical query plan #93
  • Can not group by boolean columns (add boolean to valid keys of groupBy) #91
  • improve performance of building literal arrays #90
  • [rust][datafusion] optimize count(*) queries on parquet sources #89
  • Produce a design for a metrics framework #21

Merged pull requests:

  • Add timezone string to stabilize test #1265 (viirya)
  • numerical_coercion pattern match optimize #1256 (Jimexist)
  • fix and update window function sql tests #1059 (Jimexist)
  • reduce ScalarValue from trait boilerplate with macro #989 (houqp)