-
Notifications
You must be signed in to change notification settings - Fork 205
fix: get_struct field is incorrect when struct in array #1687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1687 +/- ##
============================================
+ Coverage 56.12% 58.64% +2.51%
- Complexity 976 1140 +164
============================================
Files 119 130 +11
Lines 11743 12676 +933
Branches 2251 2368 +117
============================================
+ Hits 6591 7434 +843
- Misses 4012 4055 +43
- Partials 1140 1187 +47 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
@andygrove @parthchandra @mbutrovich if you have time to look into it |
) -> DataFusionResult<ArrayRef> { | ||
match (from_type, to_type) { | ||
(DataType::List(_), DataType::List(to_inner_type)) => { | ||
let cast_field = cast_array( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will recurse back into cast_array
where we have the special handling for casting struct-to-struct 👍
parquet_options: &SparkParquetOptions, | ||
) -> DataFusionResult<ArrayRef> { | ||
match (from_type, to_type) { | ||
(DataType::List(_), DataType::List(to_inner_type)) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could just pass to_inner_type
directly into this method to avoid matching on List
again here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also from_type
doesn't need to be passed in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simplified
@@ -171,7 +178,36 @@ fn cast_array( | |||
.with_timezone(Arc::clone(tz)), | |||
)) | |||
} | |||
_ => Ok(cast_with_options(&array, to_type, &PARQUET_OPTIONS)?), | |||
// If Arrow cast supports the cast, delegate the cast to Arrow | |||
_ if can_cast_types(from_type, to_type) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
_ if can_cast_types(from_type, to_type) => { | ||
Ok(cast_with_options(&array, to_type, &PARQUET_OPTIONS)?) | ||
} | ||
_ => Ok(array), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be an error rather than ignoring the cast request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well it depends, for now we just ignore the casting as we cannot find a proper handler leaving the data as it comes from Spark and propagating the responsibility back to the caller.
Returning Error also makes sense to identify such unsupported pairs of type sooner than later by getting the error.
I dont have a strong opinion here, WDYT which one would serve better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with some nits / questions. This is a huge step forward 🚀.
Which issue does this PR close?
Closes #1681 .
Rationale for this change
What changes are included in this PR?
How are these changes tested?