-
Notifications
You must be signed in to change notification settings - Fork 205
fix: get_struct field is incorrect when struct in array #1687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
25bf8d7
dd214e0
1726636
2f14be0
b940692
bb92d38
14162e8
0b75dfd
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,6 +16,8 @@ | |
// under the License. | ||
|
||
use crate::execution::operators::ExecutionError; | ||
use arrow::array::ListArray; | ||
use arrow::compute::can_cast_types; | ||
use arrow::{ | ||
array::{ | ||
cast::AsArray, new_null_array, types::Int32Type, types::TimestampMicrosecondType, Array, | ||
|
@@ -156,13 +158,30 @@ fn cast_array( | |
}; | ||
let from_type = array.data_type(); | ||
|
||
// Try Comet specific handlers first, then arrow-rs cast if supported, | ||
// return uncasted data otherwise | ||
match (from_type, to_type) { | ||
(Struct(_), Struct(_)) => Ok(cast_struct_to_struct( | ||
array.as_struct(), | ||
from_type, | ||
to_type, | ||
parquet_options, | ||
)?), | ||
(List(_), List(to_inner_type)) => { | ||
let list_arr: &ListArray = array.as_list(); | ||
let cast_field = cast_array( | ||
Arc::clone(list_arr.values()), | ||
to_inner_type.data_type(), | ||
parquet_options, | ||
)?; | ||
|
||
Ok(Arc::new(ListArray::new( | ||
Arc::clone(to_inner_type), | ||
list_arr.offsets().clone(), | ||
cast_field, | ||
list_arr.nulls().cloned(), | ||
))) | ||
} | ||
(Timestamp(TimeUnit::Microsecond, None), Timestamp(TimeUnit::Microsecond, Some(tz))) => { | ||
Ok(Arc::new( | ||
array | ||
|
@@ -171,7 +190,11 @@ fn cast_array( | |
.with_timezone(Arc::clone(tz)), | ||
)) | ||
} | ||
_ => Ok(cast_with_options(&array, to_type, &PARQUET_OPTIONS)?), | ||
// If Arrow cast supports the cast, delegate the cast to Arrow | ||
_ if can_cast_types(from_type, to_type) => { | ||
Ok(cast_with_options(&array, to_type, &PARQUET_OPTIONS)?) | ||
} | ||
_ => Ok(array), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should this be an error rather than ignoring the cast request? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well it depends, for now we just ignore the casting as we cannot find a proper handler leaving the data as it comes from Spark and propagating the responsibility back to the caller. Returning Error also makes sense to identify such unsupported pairs of type sooner than later by getting the error. I dont have a strong opinion here, WDYT which one would serve better? |
||
} | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍