<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# DataFusion Comet 0.5.0 Changelog

This release consists of 69 commits from 15 contributors. See credits at the end of this changelog for more information.

**Fixed bugs:**

- fix: Unsigned type related bugs [#1095](https://github.com/apache/datafusion-comet/pull/1095) (kazuyukitanimura)
- fix: Use RDD partition index [#1112](https://github.com/apache/datafusion-comet/pull/1112) (viirya)
- fix: Various metrics bug fixes and improvements [#1111](https://github.com/apache/datafusion-comet/pull/1111) (andygrove)
- fix: Don't create CometScanExec for subclasses of ParquetFileFormat [#1129](https://github.com/apache/datafusion-comet/pull/1129) (Kimahriman)
- fix: Fix metrics regressions [#1132](https://github.com/apache/datafusion-comet/pull/1132) (andygrove)
- fix: Enable scenarios accidentally commented out in CometExecBenchmark [#1151](https://github.com/apache/datafusion-comet/pull/1151) (mbutrovich)
- fix: Spark 4.0-preview1 SPARK-47120 [#1156](https://github.com/apache/datafusion-comet/pull/1156) (kazuyukitanimura)
- fix: Document enabling comet explain plan usage in Spark (4.0) [#1176](https://github.com/apache/datafusion-comet/pull/1176) (parthchandra)
- fix: stddev_pop should not directly return 0.0 when count is 1.0 [#1184](https://github.com/apache/datafusion-comet/pull/1184) (viirya)
- fix: fix missing explanation for then branch in case when [#1200](https://github.com/apache/datafusion-comet/pull/1200) (rluvaton)
- fix: Fall back to Spark for unsupported partition or sort expressions in window aggregates [#1253](https://github.com/apache/datafusion-comet/pull/1253) (andygrove)
- fix: Fall back to Spark for distinct aggregates [#1262](https://github.com/apache/datafusion-comet/pull/1262) (andygrove)
- fix: disable initCap by default [#1276](https://github.com/apache/datafusion-comet/pull/1276) (kazuyukitanimura)

**Performance related:**

- perf: Stop passing Java config map into native createPlan [#1101](https://github.com/apache/datafusion-comet/pull/1101) (andygrove)
- feat: Make native shuffle compression configurable and respect `spark.shuffle.compress` [#1185](https://github.com/apache/datafusion-comet/pull/1185) (andygrove)
- perf: Improve query planning to more reliably fall back to columnar shuffle when native shuffle is not supported [#1209](https://github.com/apache/datafusion-comet/pull/1209) (andygrove)
- feat: Move shuffle block decompression and decoding to native code and add LZ4 & Snappy support [#1192](https://github.com/apache/datafusion-comet/pull/1192) (andygrove)
- feat: Implement custom RecordBatch serde for shuffle for improved performance [#1190](https://github.com/apache/datafusion-comet/pull/1190) (andygrove)

**Implemented enhancements:**

- feat: support array_insert [#1073](https://github.com/apache/datafusion-comet/pull/1073) (SemyonSinchenko)
- feat: enable decimal to decimal cast of different precision and scale [#1086](https://github.com/apache/datafusion-comet/pull/1086) (himadripal)
- feat: Improve ScanExec native metrics [#1133](https://github.com/apache/datafusion-comet/pull/1133) (andygrove)
- feat: Add Spark-compatible implementation of SchemaAdapterFactory [#1169](https://github.com/apache/datafusion-comet/pull/1169) (andygrove)
- feat: Improve shuffle metrics (second attempt) [#1175](https://github.com/apache/datafusion-comet/pull/1175) (andygrove)
- feat: Add a `spark.comet.exec.memoryPool` configuration for experimenting with various datafusion memory pool setups. [#1021](https://github.com/apache/datafusion-comet/pull/1021) (Kontinuation)
- feat: Reenable tests for filtered SMJ anti join [#1211](https://github.com/apache/datafusion-comet/pull/1211) (comphead)
- feat: add support for array_remove expression [#1179](https://github.com/apache/datafusion-comet/pull/1179) (jatin510)

**Documentation updates:**

- docs: Update documentation for 0.4.0 release [#1096](https://github.com/apache/datafusion-comet/pull/1096) (andygrove)
- docs: Fix readme typo FGPA -> FPGA [#1117](https://github.com/apache/datafusion-comet/pull/1117) (gstvg)
- docs: Add more technical detail and new diagram to Comet plugin overview [#1119](https://github.com/apache/datafusion-comet/pull/1119) (andygrove)
- docs: Add some documentation explaining how shuffle works [#1148](https://github.com/apache/datafusion-comet/pull/1148) (andygrove)
- docs: Update TPC-H benchmark results [#1257](https://github.com/apache/datafusion-comet/pull/1257) (andygrove)

**Other:**

- chore: Add changelog for 0.4.0 [#1089](https://github.com/apache/datafusion-comet/pull/1089) (andygrove)
- chore: Prepare for 0.5.0 development [#1090](https://github.com/apache/datafusion-comet/pull/1090) (andygrove)
- build: Skip installation of spark-integration and fuzz testing modules [#1091](https://github.com/apache/datafusion-comet/pull/1091) (parthchandra)
- minor: Add hint for finding the GPG key to use when publishing to maven [#1093](https://github.com/apache/datafusion-comet/pull/1093) (andygrove)
- chore: Include first ScanExec batch in metrics [#1105](https://github.com/apache/datafusion-comet/pull/1105) (andygrove)
- chore: Improve CometScan metrics [#1100](https://github.com/apache/datafusion-comet/pull/1100) (andygrove)
- chore: Add custom metric for native shuffle fetching batches from JVM [#1108](https://github.com/apache/datafusion-comet/pull/1108) (andygrove)
- chore: Remove unused StringView struct [#1143](https://github.com/apache/datafusion-comet/pull/1143) (andygrove)
- test: enable more Spark 4.0 tests [#1145](https://github.com/apache/datafusion-comet/pull/1145) (kazuyukitanimura)
- chore: Refactor cast to use SparkCastOptions param [#1146](https://github.com/apache/datafusion-comet/pull/1146) (andygrove)
- chore: Move more expressions from core crate to spark-expr crate [#1152](https://github.com/apache/datafusion-comet/pull/1152) (andygrove)
- chore: Remove dead code [#1155](https://github.com/apache/datafusion-comet/pull/1155) (andygrove)
- chore: Move string kernels and expressions to spark-expr crate [#1164](https://github.com/apache/datafusion-comet/pull/1164) (andygrove)
- chore: Move remaining expressions to spark-expr crate + some minor refactoring [#1165](https://github.com/apache/datafusion-comet/pull/1165) (andygrove)
- chore: Add ignored tests for reading complex types from Parquet [#1167](https://github.com/apache/datafusion-comet/pull/1167) (andygrove)
- test: enabling Spark tests with offHeap requirement [#1177](https://github.com/apache/datafusion-comet/pull/1177) (kazuyukitanimura)
- minor: move shuffle classes from common to spark [#1193](https://github.com/apache/datafusion-comet/pull/1193) (andygrove)
- minor: refactor to move decodeBatches to broadcast exchange code as private function [#1195](https://github.com/apache/datafusion-comet/pull/1195) (andygrove)
- minor: refactor prepare_output so that it does not require an ExecutionContext [#1194](https://github.com/apache/datafusion-comet/pull/1194) (andygrove)
- minor: remove unused source files [#1202](https://github.com/apache/datafusion-comet/pull/1202) (andygrove)
- chore: Upgrade to DataFusion 44.0.0-rc2 [#1154](https://github.com/apache/datafusion-comet/pull/1154) (andygrove)
- chore: Add safety check to CometBuffer [#1050](https://github.com/apache/datafusion-comet/pull/1050) (viirya)
- chore: Remove unreachable code [#1213](https://github.com/apache/datafusion-comet/pull/1213) (andygrove)
- test: Enable Comet by default except some tests in SparkSessionExtensionSuite [#1201](https://github.com/apache/datafusion-comet/pull/1201) (kazuyukitanimura)
- chore: extract `struct` expressions to folders based on spark grouping [#1216](https://github.com/apache/datafusion-comet/pull/1216) (rluvaton)
- chore: extract static invoke expressions to folders based on spark grouping [#1217](https://github.com/apache/datafusion-comet/pull/1217) (rluvaton)
- chore: Follow-on PR to fully enable onheap memory usage [#1210](https://github.com/apache/datafusion-comet/pull/1210) (andygrove)
- chore: extract agg_funcs expressions to folders based on spark grouping [#1224](https://github.com/apache/datafusion-comet/pull/1224) (rluvaton)
- chore: extract datetime_funcs expressions to folders based on spark grouping [#1222](https://github.com/apache/datafusion-comet/pull/1222) (rluvaton)
- chore: Upgrade to DataFusion 44.0.0 from 44.0.0 RC2 [#1232](https://github.com/apache/datafusion-comet/pull/1232) (rluvaton)
- chore: extract strings file to `strings_func` like in spark grouping [#1215](https://github.com/apache/datafusion-comet/pull/1215) (rluvaton)
- chore: extract predicate_functions expressions to folders based on spark grouping [#1218](https://github.com/apache/datafusion-comet/pull/1218) (rluvaton)
- build(deps): bump protobuf version to 3.21.12 [#1234](https://github.com/apache/datafusion-comet/pull/1234) (wForget)
- chore: extract json_funcs expressions to folders based on spark grouping [#1220](https://github.com/apache/datafusion-comet/pull/1220) (rluvaton)
- test: Enable shuffle by default in Spark tests [#1240](https://github.com/apache/datafusion-comet/pull/1240) (kazuyukitanimura)
- chore: extract hash_funcs expressions to folders based on spark grouping [#1221](https://github.com/apache/datafusion-comet/pull/1221) (rluvaton)
- build: Fix test failure caused by merging conflicting PRs [#1259](https://github.com/apache/datafusion-comet/pull/1259) (andygrove)

## Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

```
    37  Andy Grove
    10  Raz Luvaton
     7  KAZUYUKI TANIMURA
     3  Liang-Chi Hsieh
     2  Parth Chandra
     1  Adam Binford
     1  Dharan Aditya
     1  Himadri Pal
     1  Jagdish Parihar
     1  Kristin Cowalcijk
     1  Matt Butrovich
     1  Oleks V
     1  Sem
     1  Zhen Wang
     1  gstvg
```

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.