Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: make ExpressionHandler::get_evaluator fallible #577

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

cg-cognition
Copy link

@cg-cognition cg-cognition commented Dec 7, 2024

Fixes #566 . Makes get_evaluator() return DeltaResult<Arc> to properly handle potential errors. This change is intended to support get_evaluator() doing eager validation checking, but doesn't do the validation itself.

What changes are proposed in this pull request?

Here's a comprehensive summary of all changes made:

  1. Core Interface Change (lib.rs):
  • Modified ExpressionHandler trait to make get_evaluator return DeltaResult<Arc>
  • This change makes error handling explicit at the trait level
  1. Implementation Changes (arrow_expression.rs):
  • Updated ArrowExpressionHandler to return Ok(Arc::new(...))
  • Simple change since this implementation can't fail
  1. Error Handling Improvements (data_skipping.rs):
  • Simplified error handling using ok()? operator
  • Made the code more idiomatic while maintaining error propagation
  • Applied to select_stats_evaluator, skipping_evaluator, and filter_evaluator
  1. Iterator Changes (log_replay.rs):
  • Changed to boxed iterator to handle both success and error cases
  • Added explicit error handling path using iter::once(Err(e))
  • Maintains original behavior while properly propagating errors
  1. Call Site Updates:
  • Updated all get_evaluator calls to handle Result type
  • Modified in transaction.rs, scan/mod.rs, and table_changes/log_replay.rs
  • Added proper error propagation using ? operator

How was this change tested?

Tested with cargo test --all-features --all-targets -- --skip read_table_version_hdfs. Had an issue with getting Java setup but that test doesn't seem relevant to the PR. No new tests need to be written since it's a no-op refactor.

This PR affects the following public APIs

Allows get_evaluator to return a Result type, which opens up the possibility to do eager schema validation.

Additional Context

This PR was entirely written by Devin with a little bit of review from me. Happy to address any feedback and get this over the finish line.

Original Run: https://preview.devin.ai/sessions/e839a4e9bcb444b69b99205babdcf1af

Make get_evaluator() return DeltaResult<Arc<dyn ExpressionEvaluator>> to properly handle potential errors. Update all call sites to handle the Result type and simplify error handling using the ok()? operator where appropriate.
@github-actions github-actions bot added the breaking-change Change that will require a version bump label Dec 9, 2024
kernel/src/scan/data_skipping.rs Show resolved Hide resolved
kernel/src/scan/log_replay.rs Outdated Show resolved Hide resolved
@scovich scovich requested a review from nicklan December 10, 2024 20:38
devin-ai-integration bot and others added 8 commits December 11, 2024 01:14
…tions

- Modified scan_action_iter to return DeltaResult<impl Iterator>
- Fixed code formatting in log_replay.rs files
- Reordered imports in mod.rs for better readability
- Improved error propagation in transaction.rs

Co-Authored-By: Calvin Giroud <[email protected]>
Added warning logs when evaluator creation fails in DataSkippingFilter:
- Log stats selector evaluator failures
- Log skipping evaluator failures
- Log filter evaluator failures

This improves debuggability when data skipping does not occur as expected.

Co-Authored-By: Calvin Giroud <[email protected]>
@cg-cognition cg-cognition requested a review from scovich December 11, 2024 02:43
Copy link
Collaborator

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the PR description to correctly follow the breaking change template -- our release management scrapes PR descriptions to build the changelog. It looks like the commented-out template was deleted from the PR description, so you'll need to dig it up from some other PR.

Comment on lines 110 to 113
.map_err(|e| {
warn!("Failed to create stats selector evaluator: {}", e);
e
})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.map_err(|e| {
warn!("Failed to create stats selector evaluator: {}", e);
e
})
.inspect_err(|e| warn!("Failed to create stats selector evaluator: {e}"))

(more below)

action_iter
)?;

Ok(action_iter
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's very subjective, but in general I find this sort of multi-line, multi-operation Ok (or Some, etc) to be unnecessarily hard to read because of all the nesting. Factoring it out as a value is easier to read (and change later):

let actions = action_iter
      ...
    .filter(...);
Ok(actions)

Wrapping a single function call or struct creation isn't so bad, because conceptually only one thing is happening, e.g. this code above is not particularly difficult to read:

Ok(Arc::new(DefaultExpressionEvaluator {
    ...
}))

Clippy seems to take a similar stance on monadic chains -- it will almost always force newlines between chained function calls, if more than one of the functions takes args or even if the line gets very long (long before the 100-char line limit).

@cg-cognition cg-cognition requested a review from scovich December 11, 2024 17:47
@cg-cognition
Copy link
Author

Updated the PR description, not quite sure if I did it right! Let me know after you review.

Copy link
Collaborator

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except one new type annotation that seems unnecessary?

kernel/src/scan/log_replay.rs Outdated Show resolved Hide resolved
…devin-open-source/delta-kernel-rs into devin/1733609186-fallible-get-evaluator
Copy link

codecov bot commented Dec 11, 2024

Codecov Report

Attention: Patch coverage is 66.03774% with 18 lines in your changes missing coverage. Please review.

Project coverage is 83.36%. Comparing base (be1453f) to head (fa7c9ff).

Files with missing lines Patch % Lines
kernel/src/scan/data_skipping.rs 70.83% 0 Missing and 7 partials ⚠️
kernel/src/scan/mod.rs 37.50% 1 Missing and 4 partials ⚠️
kernel/src/transaction.rs 0.00% 0 Missing and 2 partials ⚠️
kernel/src/engine/default/mod.rs 0.00% 0 Missing and 1 partial ⚠️
kernel/src/scan/log_replay.rs 90.90% 0 Missing and 1 partial ⚠️
kernel/src/table_changes/log_replay.rs 0.00% 0 Missing and 1 partial ⚠️
kernel/src/table_changes/scan.rs 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #577      +/-   ##
==========================================
- Coverage   83.46%   83.36%   -0.11%     
==========================================
  Files          74       74              
  Lines       16858    16854       -4     
  Branches    16858    16854       -4     
==========================================
- Hits        14071    14050      -21     
- Misses       2129     2130       +1     
- Partials      658      674      +16     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change Change that will require a version bump
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Engine::get_evaluator should be fallible
2 participants