Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support array_concat for Utf8View #14378

Merged
merged 4 commits into from
Feb 3, 2025
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
52 changes: 1 addition & 51 deletions datafusion/expr-common/src/type_coercion/binary.rs
Original file line number Diff line number Diff line change
Expand Up @@ -29,9 +29,7 @@ use arrow::datatypes::{
DECIMAL128_MAX_SCALE, DECIMAL256_MAX_PRECISION, DECIMAL256_MAX_SCALE,
};
use datafusion_common::types::NativeType;
use datafusion_common::{
exec_datafusion_err, exec_err, internal_err, plan_datafusion_err, plan_err, Result,
};
use datafusion_common::{exec_err, internal_err, plan_datafusion_err, plan_err, Result};
use itertools::Itertools;

/// The type signature of an instantiation of binary operator expression such as
Expand Down Expand Up @@ -869,54 +867,6 @@ fn get_wider_decimal_type(
}
}

/// Returns the wider type among arguments `lhs` and `rhs`.
/// The wider type is the type that can safely represent values from both types
/// without information loss. Returns an Error if types are incompatible.
pub fn get_wider_type(lhs: &DataType, rhs: &DataType) -> Result<DataType> {
Copy link
Contributor Author

@alamb alamb Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was used in array_concat only to compute the return type, and array_concat errors at runtime anyways if you pass it arrays with different types.

Thus this code is unnecessary so I propose removing it

> select array_concat([arrow_cast('1', 'LargeUtf8')], ['2']);
Arrow error: Invalid argument error: It is not possible to concatenate arrays of different data types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The best optimization .. no code

use arrow::datatypes::DataType::*;
Ok(match (lhs, rhs) {
(lhs, rhs) if lhs == rhs => lhs.clone(),
// Right UInt is larger than left UInt.
(UInt8, UInt16 | UInt32 | UInt64) | (UInt16, UInt32 | UInt64) | (UInt32, UInt64) |
// Right Int is larger than left Int.
(Int8, Int16 | Int32 | Int64) | (Int16, Int32 | Int64) | (Int32, Int64) |
// Right Float is larger than left Float.
(Float16, Float32 | Float64) | (Float32, Float64) |
// Right String is larger than left String.
(Utf8, LargeUtf8) |
// Any right type is wider than a left hand side Null.
(Null, _) => rhs.clone(),
// Left UInt is larger than right UInt.
(UInt16 | UInt32 | UInt64, UInt8) | (UInt32 | UInt64, UInt16) | (UInt64, UInt32) |
// Left Int is larger than right Int.
(Int16 | Int32 | Int64, Int8) | (Int32 | Int64, Int16) | (Int64, Int32) |
// Left Float is larger than right Float.
(Float32 | Float64, Float16) | (Float64, Float32) |
// Left String is larger than right String.
(LargeUtf8, Utf8) |
// Any left type is wider than a right hand side Null.
(_, Null) => lhs.clone(),
(List(lhs_field), List(rhs_field)) => {
let field_type =
get_wider_type(lhs_field.data_type(), rhs_field.data_type())?;
if lhs_field.name() != rhs_field.name() {
return Err(exec_datafusion_err!(
"There is no wider type that can represent both {lhs} and {rhs}."
));
}
assert_eq!(lhs_field.name(), rhs_field.name());
let field_name = lhs_field.name();
let nullable = lhs_field.is_nullable() | rhs_field.is_nullable();
List(Arc::new(Field::new(field_name, field_type, nullable)))
}
(_, _) => {
return Err(exec_datafusion_err!(
"There is no wider type that can represent both {lhs} and {rhs}."
));
}
})
}

/// Convert the numeric data type to the decimal data type.
/// We support signed and unsigned integer types and floating-point type.
fn coerce_numeric_type_to_decimal(numeric_type: &DataType) -> Option<DataType> {
Expand Down
45 changes: 26 additions & 19 deletions datafusion/functions-nested/src/concat.rs
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ use datafusion_common::{
cast::as_generic_list_array, exec_err, not_impl_err, plan_err, utils::list_ndims,
};
use datafusion_expr::{
type_coercion::binary::get_wider_type, ColumnarValue, Documentation, ScalarUDFImpl,
ColumnarValue, Documentation, ScalarUDFImpl,
Signature, Volatility,
};
use datafusion_macros::user_doc;
Expand Down Expand Up @@ -276,25 +276,32 @@ impl ScalarUDFImpl for ArrayConcat {
let mut expr_type = DataType::Null;
let mut max_dims = 0;
for arg_type in arg_types {
match arg_type {
DataType::List(field) => {
if !field.data_type().equals_datatype(&DataType::Null) {
let dims = list_ndims(arg_type);
expr_type = match max_dims.cmp(&dims) {
Ordering::Greater => expr_type,
Ordering::Equal => get_wider_type(&expr_type, arg_type)?,
Ordering::Less => {
max_dims = dims;
arg_type.clone()
}
};
let DataType::List(field) = arg_type else {
return plan_err!(
"The array_concat function can only accept list as the args."
);
};
if !field.data_type().equals_datatype(&DataType::Null) {
let dims = list_ndims(arg_type);
expr_type = match max_dims.cmp(&dims) {
Ordering::Greater => expr_type,
Ordering::Equal => {
if expr_type == DataType::Null {
arg_type.clone()
} else if !expr_type.equals_datatype(arg_type) {
return plan_err!(
"It is not possible to concatenate arrays of different types. Expected: {}, got: {}", expr_type, arg_type
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the implementation of array_concat requires the field types of ListArray to be the same and errors at runtime if they aren't

Previously the return type was calculated only to get a runtime error. I simply moved the check to planning time

);
} else {
expr_type
}
}
}
_ => {
return plan_err!(
"The array_concat function can only accept list as the args."
)
}

Ordering::Less => {
max_dims = dims;
arg_type.clone()
}
};
}
}

Expand Down
41 changes: 41 additions & 0 deletions datafusion/sqllogictest/test_files/array.slt
Original file line number Diff line number Diff line change
Expand Up @@ -2848,6 +2848,47 @@ select array_concat([]);
----
[]

# Concatenating strings arrays
query ?
select array_concat(
['1', '2'],
['3']
);
----
[1, 2, 3]

# Concatenating string arrays
query ?
select array_concat(
[arrow_cast('1', 'LargeUtf8'), arrow_cast('2', 'LargeUtf8')],
[arrow_cast('3', 'LargeUtf8')]
);
----
[1, 2, 3]

# Concatenating stringview
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these three queries work fine on main

query ?
select array_concat(
[arrow_cast('1', 'Utf8View'), arrow_cast('2', 'Utf8View')],
[arrow_cast('3', 'Utf8View')]
);
----
[1, 2, 3]

# Concatenating Mixed types (doesn't work)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of these queries fail on main though the errors are different for the string view one (see the first commit of this PR)

It seems to me like this display is 🤮 due to the Display impl of DataType being crap.

I'll file an upstream ticket to make this easier to understand

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

query error DataFusion error: Error during planning: It is not possible to concatenate arrays of different types\. Expected: List\(Field \{ name: "item", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\), got: List\(Field \{ name: "item", data_type: LargeUtf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\)
select array_concat(
[arrow_cast('1', 'Utf8'), arrow_cast('2', 'Utf8')],
[arrow_cast('3', 'LargeUtf8')]
);

# Concatenating Mixed types (doesn't work)
query error DataFusion error: Error during planning: It is not possible to concatenate arrays of different types\. Expected: List\(Field \{ name: "item", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\), got: List\(Field \{ name: "item", data_type: Utf8View, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\)
select array_concat(
[arrow_cast('1', 'Utf8'), arrow_cast('2', 'Utf8')],
[arrow_cast('3', 'Utf8View')]
);

# array_concat error
query error DataFusion error: Error during planning: The array_concat function can only accept list as the args\.
select array_concat(1, 2);
Expand Down
Loading