Part 3 of expression based transform: Use computed transform #613
base: main
Codecov Report
Attention: Patch coverage is

@@            Coverage Diff             @@
##             main     #613      +/-   ##
==========================================
- Coverage   84.05%   84.02%   -0.03%
==========================================
  Files          75       75
  Lines       17251    17248       -3
  Branches    17251    17248       -3
==========================================
- Hits        14500    14493       -7
- Misses       2052     2059       +7
+ Partials      699      696       -3
@@ -429,5 +427,12 @@ pub unsafe extern "C" fn visit_scan_data(
callback,
};
// TODO: return ExternResult to caller instead of panicking?
visit_scan_files(data, selection_vec, &transforms.transforms, context_wrapper, rust_callback).unwrap();
visit_scan_files(
Is this formatting that should have been applied in an earlier PR?
It was applied earlier; it's included in part 2. I'm not quite sure why it's showing up here, but it'll go away once the earlier PRs merge.
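The TODO in the hunk above asks whether the FFI entry point should return an error to the caller instead of calling `.unwrap()`. A minimal sketch of that pattern follows, under loud assumptions: `ScanError`, `VisitStatus`, `visit_scan_files_inner`, and `visit_scan_data_checked` are hypothetical names, not the delta-kernel-rs FFI API (in particular, this is not `ExternResult`). It only shows the general technique of mapping a Rust `Result` to a C-ABI status code.

```rust
// Hypothetical sketch: these names are illustrative only and are
// not the delta-kernel-rs FFI API.

#[derive(Debug)]
pub struct ScanError(pub String);

/// Stand-in for the fallible Rust-side call (here it just reports a file count).
fn visit_scan_files_inner(fail: bool) -> Result<usize, ScanError> {
    if fail {
        Err(ScanError("invalid selection vector".to_string()))
    } else {
        Ok(3)
    }
}

/// Status code returned across the C ABI instead of panicking.
#[repr(C)]
#[derive(Debug, PartialEq, Eq)]
pub enum VisitStatus {
    Success = 0,
    Failure = 1,
}

/// # Safety
/// `out_count` must be null or point to writable memory for one `usize`.
pub unsafe extern "C" fn visit_scan_data_checked(fail: bool, out_count: *mut usize) -> VisitStatus {
    match visit_scan_files_inner(fail) {
        Ok(count) => {
            if !out_count.is_null() {
                // SAFETY: guaranteed by the caller contract documented above.
                unsafe { *out_count = count };
            }
            VisitStatus::Success
        }
        // Report the failure instead of calling `.unwrap()`: a panic cannot
        // safely unwind into C code, so the caller gets a status code instead.
        Err(_err) => VisitStatus::Failure,
    }
}
```

On failure the output pointer is left untouched, so a C caller can check the status before reading the count.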
kernel/src/scan/mod.rs (outdated)
&all_fields,
have_partition_cols,
);
let logical = if let Some(ref transform) = scan_file.transform {
Aside: Is there some way to factor out the logic duplicated between this default engine and the sync engine example above? (I'm guessing this new code just adds to existing duplication, so it's best addressed in a separate PR.)
have_partition_cols: bool,
) -> DeltaResult<Box<dyn EngineData>> {
let physical_schema = global_state.physical_schema.clone();
if !have_partition_cols && global_state.column_mapping_mode == ColumnMappingMode::None {
IIRC, this code was annoying when I added column mapping support for expression eval. I think this was the only code left that specifically tracked or cared about the column mapping mode, because of the new way logical -> physical transforms are performed.
I recommend auditing the caller chain to see what other code can be simplified, now that we no longer need column mapping logic here.
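The comment above says column mapping no longer needs special handling here because of how transforms are now performed. A minimal model of why that can hold, assuming that transform expressions reference physical column names and that the evaluator labels its outputs with the logical schema's names. `Expr`, `Batch`, and `evaluate` are hypothetical stand-ins, not kernel types; the point is only that renaming (column mapping) and filling in partition values both happen in one evaluation, with no mode check.

```rust
// Hypothetical model: `Expr`, `Batch`, and `evaluate` are illustrative
// stand-ins, not delta-kernel-rs types.

/// An expression in a per-file transform.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// Read a column by its physical (on-disk) name, e.g. "col-5f3a".
    Column(String),
    /// A constant, e.g. a partition value taken from the `Add` action.
    Literal(i64),
}

/// A tiny columnar batch: column names plus one Vec per column.
#[derive(Debug)]
pub struct Batch {
    pub names: Vec<String>,
    pub columns: Vec<Vec<i64>>,
}

/// Evaluate `exprs` against `physical`, naming output column i after
/// `logical_names[i]`. Returns `None` on a length mismatch or unknown column.
pub fn evaluate(physical: &Batch, exprs: &[Expr], logical_names: &[&str]) -> Option<Batch> {
    if exprs.len() != logical_names.len() {
        return None;
    }
    let num_rows = physical.columns.first().map_or(0, Vec::len);
    let mut columns = Vec::with_capacity(exprs.len());
    for expr in exprs {
        let column = match expr {
            Expr::Column(name) => {
                let idx = physical.names.iter().position(|n| n == name)?;
                physical.columns[idx].clone()
            }
            Expr::Literal(value) => vec![*value; num_rows],
        };
        columns.push(column);
    }
    // Output names come from the logical schema, so no ColumnMappingMode check is needed.
    Some(Batch {
        names: logical_names.iter().map(|s| s.to_string()).collect(),
        columns,
    })
}
```

In this model, a table without column mapping is just the case where physical and logical names happen to match.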
kernel/tests/read.rs (outdated)
)
.unwrap();
// to transform the physical data into the correct logical form
let logical = if let Some(ref transform) = scan_file.transform {
I think there are (at least) three places that do almost exactly the same thing. Is there a way to factor out a helper method that everyone can use?
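One possible shape for the shared helper requested above, sketched with hypothetical stand-in types (`Data`, `Expr`, `Evaluator`, `apply_transform`, `InMemoryEvaluator` are illustrative, not kernel APIs). The assumption is that the call sites (the sync engine example, the default engine, and the read test) differ only in which engine evaluator they use, so the `if let Some(ref transform) = scan_file.transform` branching can live in one function.

```rust
// Hypothetical sketch: `Data`, `Expr`, and `Evaluator` are illustrative
// stand-ins, not the kernel's EngineData / Expression / evaluator types.

#[derive(Debug, Clone, PartialEq)]
pub struct Data {
    pub columns: Vec<Vec<i64>>,
}

#[derive(Debug, Clone)]
pub enum Expr {
    Column(usize),
    Literal(i64),
}

/// Whatever each engine uses to evaluate a transform.
pub trait Evaluator {
    fn evaluate(&self, data: &Data, exprs: &[Expr]) -> Result<Data, String>;
}

/// The shared helper: every call site passes its own engine's evaluator.
pub fn apply_transform(
    evaluator: &dyn Evaluator,
    physical: Data,
    transform: Option<&[Expr]>,
) -> Result<Data, String> {
    match transform {
        // No transform means the physical data already has the logical shape.
        None => Ok(physical),
        Some(exprs) => evaluator.evaluate(&physical, exprs),
    }
}

/// A toy in-memory evaluator standing in for an engine's real one.
pub struct InMemoryEvaluator;

impl Evaluator for InMemoryEvaluator {
    fn evaluate(&self, data: &Data, exprs: &[Expr]) -> Result<Data, String> {
        let rows = data.columns.first().map_or(0, Vec::len);
        let columns = exprs
            .iter()
            .map(|e| match e {
                Expr::Column(i) => data
                    .columns
                    .get(*i)
                    .cloned()
                    .ok_or_else(|| format!("no column {i}")),
                Expr::Literal(v) => Ok(vec![*v; rows]),
            })
            .collect::<Result<Vec<_>, _>>()?;
        Ok(Data { columns })
    }
}
```

Taking the evaluator as `&dyn Evaluator` keeps the helper engine-agnostic; a generic parameter would work equally well.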
Branch force-pushed from 6d72b75 to ec0d671.
…and return it. (#607)

## What changes are proposed in this pull request?

This is the initial part of moving to using expressions to express transformations when reading data.
What this PR does is:
- Compute a "static" transform, which is just a set of column expressions that need to be passed directly through without change, or enough metadata for lower levels to fill in a "fixup" expression
- The static transform is passed into the iterator that parses each `Add` file
- When parsing the `Add` file, if any fix-ups are needed (just partition columns today), the correct expression is created and inserted into a row-indexed map
- This map is returned so the caller can find out, for a given row, what expression (if any) needs to be applied when reading the specified row

Follow-up PRs:
* #612: Propagate this information through when using `visit_scan_files`
* #613: Actually use the data to do transformation and remove `transform_to_logical` entirely
* #614: Make this work over FFI and use it
* (TODO): Clean up any existing code in the scan building that's now overcomplicated

Each of those is more invasive and touches significant code, so I'm staging this as much as possible to make reviews easier.

## How was this change tested?

Unit tests, and inspection of the resulting expressions when run on tables.
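The flow described in #607 (a static transform, per-file fix-ups while parsing each `Add`, and a row-indexed map of expressions) can be sketched as a small model. This is a minimal sketch under stated assumptions: `StaticPart`, `Expr`, `fill_in`, and `transforms_by_row` are hypothetical names, partition values stay as strings instead of being parsed to their column types, and a missing partition value is treated as an error rather than a null.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one entry of the "static" transform.
#[derive(Debug, Clone, PartialEq)]
pub enum StaticPart {
    /// Pass this physical column through unchanged.
    Passthrough(String),
    /// Placeholder: the partition column with this name is filled in per file.
    Partition(String),
}

/// Hypothetical per-file expression.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(String),
}

/// Build one file's expression list, or `None` when no fix-ups are needed.
pub fn fill_in(
    static_transform: &[StaticPart],
    partition_values: &HashMap<String, String>,
) -> Result<Option<Vec<Expr>>, String> {
    if !static_transform.iter().any(|p| matches!(p, StaticPart::Partition(_))) {
        return Ok(None);
    }
    static_transform
        .iter()
        .map(|part| match part {
            StaticPart::Passthrough(name) => Ok(Expr::Column(name.clone())),
            StaticPart::Partition(name) => partition_values
                .get(name)
                .cloned()
                .map(Expr::Literal)
                .ok_or_else(|| format!("missing partition value for {name}")),
        })
        .collect::<Result<Vec<_>, _>>()
        .map(Some)
}

/// For each `Add` (modeled only by its partition values), record any
/// needed expression under that row's index.
pub fn transforms_by_row(
    static_transform: &[StaticPart],
    adds: &[HashMap<String, String>],
) -> Result<HashMap<usize, Vec<Expr>>, String> {
    let mut map = HashMap::new();
    for (row, partition_values) in adds.iter().enumerate() {
        if let Some(exprs) = fill_in(static_transform, partition_values)? {
            map.insert(row, exprs);
        }
    }
    Ok(map)
}
```

Rows without an entry in the map need no transform, which mirrors the "what, if any, expression" lookup described above.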
Branch force-pushed from ec0d671 to 24a45c9.
Stacked PR; only review these commits.
What changes are proposed in this pull request?
Use computed transforms and remove `transform_to_logical`. This is just a draft and might need more tests, although all existing tests pass, and they exercise this extensively.

How was this change tested?