Commit c311cf5

Rachelint and alamb authored
Implement GroupColumn support for StringView / ByteView (faster grouping performance) (#12809)
* define `ByteGroupValueViewBuilder`.
* impl append.
* impl equal to.
* fix compile.
* fix comments.
* impl take_n.
* impl build.
* impl rest functions in `GroupColumn`.
* fix output when panic.
* add e2e sql tests.
* add unit tests.
* switch to a really elegant style codes from alamb.
* fix take_n.
* improve comments.
* fix compile.
* fix clippy.
* define more testcases in `test_byte_view_take_n`.
* connect up.
* fix doc.
* Do not re-validate output is utf8
* switch to unchecked when building array.
* improve naming.
* use let else to make the codes clearer.
* fix typo.
* improve unit test coverage for `ByteViewGroupValueBuilder`.

---------

Co-authored-by: Andrew Lamb <[email protected]>
1 parent e5cdc17 commit c311cf5
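
The commit message lists the operations the new builder has to provide: append, equal to, take_n, and build. The following is a minimal, std-only sketch of that shape, not the code added by this commit. `SketchByteViewBuilder`, `View`, and every method signature here are hypothetical names chosen for illustration. The sketch assumes the Arrow view convention that values of 12 bytes or fewer are stored inline and longer values keep a 4-byte prefix plus a location in a data buffer; it uses a plain enum and a single buffer instead of Arrow's packed 128-bit views.

```rust
/// Values up to this many bytes are stored inline in the view.
const MAX_INLINE: usize = 12;

/// Simplified stand-in for a packed 128-bit view (hypothetical type).
#[derive(Debug, Clone, Copy)]
enum View {
    Null,
    Inline { len: usize, data: [u8; MAX_INLINE] },
    Buffered { offset: usize, len: usize, prefix: [u8; 4] },
}

/// Hypothetical builder that accumulates distinct group values.
#[derive(Debug, Default)]
struct SketchByteViewBuilder {
    views: Vec<View>,
    buffer: Vec<u8>,
}

impl SketchByteViewBuilder {
    fn len(&self) -> usize {
        self.views.len()
    }

    /// Append one group value; long values are copied into `buffer`.
    fn append(&mut self, value: Option<&[u8]>) {
        let view = match value {
            None => View::Null,
            Some(v) if v.len() <= MAX_INLINE => {
                let mut data = [0u8; MAX_INLINE];
                data[..v.len()].copy_from_slice(v);
                View::Inline { len: v.len(), data }
            }
            Some(v) => {
                let offset = self.buffer.len();
                self.buffer.extend_from_slice(v);
                let mut prefix = [0u8; 4];
                prefix.copy_from_slice(&v[..4]);
                View::Buffered { offset, len: v.len(), prefix }
            }
        };
        self.views.push(view);
    }

    /// Compare the stored group at `row` with an incoming value.
    fn equal_to(&self, row: usize, other: Option<&[u8]>) -> bool {
        match (&self.views[row], other) {
            (View::Null, None) => true,
            (View::Null, Some(_)) | (_, None) => false,
            (View::Inline { len, data }, Some(o)) => &data[..*len] == o,
            (View::Buffered { offset, len, prefix }, Some(o)) => {
                // Length and prefix checks reject most mismatches
                // before the full buffer comparison.
                o.len() == *len
                    && o[..4] == prefix[..]
                    && &self.buffer[*offset..*offset + *len] == o
            }
        }
    }

    fn materialize(&self, view: &View) -> Option<Vec<u8>> {
        match view {
            View::Null => None,
            View::Inline { len, data } => Some(data[..*len].to_vec()),
            View::Buffered { offset, len, .. } => {
                Some(self.buffer[*offset..*offset + *len].to_vec())
            }
        }
    }

    /// Emit the first `n` groups and keep the rest, shifting buffer offsets.
    fn take_n(&mut self, n: usize) -> Vec<Option<Vec<u8>>> {
        assert!(n <= self.views.len());
        let taken: Vec<View> = self.views.drain(..n).collect();
        let out: Vec<Option<Vec<u8>>> =
            taken.iter().map(|v| self.materialize(v)).collect();
        // Values are appended in order, so the emitted rows own a prefix
        // of the buffer that ends where the first remaining long value starts.
        let cut = self
            .views
            .iter()
            .find_map(|v| match v {
                View::Buffered { offset, .. } => Some(*offset),
                _ => None,
            })
            .unwrap_or(self.buffer.len());
        self.buffer.drain(..cut);
        for v in &mut self.views {
            if let View::Buffered { offset, .. } = v {
                *offset -= cut;
            }
        }
        out
    }

    /// Emit every remaining group.
    fn build(mut self) -> Vec<Option<Vec<u8>>> {
        let n = self.len();
        self.take_n(n)
    }
}
```

The offset shifting in `take_n` is the part that is easiest to get wrong, which may be why the commit message mentions a separate `fix take_n` step and extra test cases for it.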

3 files changed: +755 -6 lines changed

datafusion/physical-plan/src/aggregates/group_values/column.rs

+15 -3
@@ -16,14 +16,16 @@
 // under the License.
 
 use crate::aggregates::group_values::group_column::{
-    ByteGroupValueBuilder, GroupColumn, PrimitiveGroupValueBuilder,
+    ByteGroupValueBuilder, ByteViewGroupValueBuilder, GroupColumn,
+    PrimitiveGroupValueBuilder,
 };
 use crate::aggregates::group_values::GroupValues;
 use ahash::RandomState;
 use arrow::compute::cast;
 use arrow::datatypes::{
-    Date32Type, Date64Type, Float32Type, Float64Type, Int16Type, Int32Type, Int64Type,
-    Int8Type, UInt16Type, UInt32Type, UInt64Type, UInt8Type,
+    BinaryViewType, Date32Type, Date64Type, Float32Type, Float64Type, Int16Type,
+    Int32Type, Int64Type, Int8Type, StringViewType, UInt16Type, UInt32Type, UInt64Type,
+    UInt8Type,
 };
 use arrow::record_batch::RecordBatch;
 use arrow_array::{Array, ArrayRef};
@@ -119,6 +121,8 @@ impl GroupValuesColumn {
                 | DataType::LargeBinary
                 | DataType::Date32
                 | DataType::Date64
+                | DataType::Utf8View
+                | DataType::BinaryView
         )
     }
 }
@@ -184,6 +188,14 @@ impl GroupValues for GroupValuesColumn {
                         let b = ByteGroupValueBuilder::<i64>::new(OutputType::Binary);
                         v.push(Box::new(b) as _)
                     }
+                    &DataType::Utf8View => {
+                        let b = ByteViewGroupValueBuilder::<StringViewType>::new();
+                        v.push(Box::new(b) as _)
+                    }
+                    &DataType::BinaryView => {
+                        let b = ByteViewGroupValueBuilder::<BinaryViewType>::new();
+                        v.push(Box::new(b) as _)
+                    }
                     dt => {
                         return not_impl_err!("{dt} not supported in GroupValuesColumn")
                     }
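
The last hunk picks a builder per column data type, parameterizes it with a marker type, and erases it behind a boxed trait object. A std-only sketch of that dispatch pattern follows. Every name in it (`SketchDataType`, `SketchGroupColumn`, `SketchViewBuilder`, `make_builders`) is hypothetical and merely stands in for the DataFusion and Arrow types in the diff, and the returned `String` error is an assumption standing in for `not_impl_err!`.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for the data types matched in the diff.
#[derive(Debug)]
enum SketchDataType {
    Utf8View,
    BinaryView,
    Other,
}

/// Hypothetical stand-in for the trait the boxed builders share.
trait SketchGroupColumn {
    fn output_type(&self) -> &'static str;
}

/// Marker trait playing the role of `StringViewType` / `BinaryViewType`.
trait SketchViewType {
    const NAME: &'static str;
}

struct SketchStringView;
struct SketchBinaryView;

impl SketchViewType for SketchStringView {
    const NAME: &'static str = "Utf8View";
}
impl SketchViewType for SketchBinaryView {
    const NAME: &'static str = "BinaryView";
}

/// One generic builder serves both view types; the marker selects the output.
struct SketchViewBuilder<T: SketchViewType> {
    _marker: PhantomData<T>,
}

impl<T: SketchViewType> SketchViewBuilder<T> {
    fn new() -> Self {
        Self { _marker: PhantomData }
    }
}

impl<T: SketchViewType> SketchGroupColumn for SketchViewBuilder<T> {
    fn output_type(&self) -> &'static str {
        T::NAME
    }
}

/// Build one boxed builder per column, rejecting unsupported types.
fn make_builders(
    types: &[SketchDataType],
) -> Result<Vec<Box<dyn SketchGroupColumn>>, String> {
    let mut v: Vec<Box<dyn SketchGroupColumn>> = Vec::with_capacity(types.len());
    for dt in types {
        match dt {
            SketchDataType::Utf8View => {
                let b = SketchViewBuilder::<SketchStringView>::new();
                v.push(Box::new(b) as _)
            }
            SketchDataType::BinaryView => {
                let b = SketchViewBuilder::<SketchBinaryView>::new();
                v.push(Box::new(b) as _)
            }
            dt => return Err(format!("{dt:?} not supported in this sketch")),
        }
    }
    Ok(v)
}
```

Because the marker type carries no data, adding the binary variant costs one extra match arm and no second builder implementation, which matches how the diff reuses `ByteViewGroupValueBuilder` for both `Utf8View` and `BinaryView`.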
