Skip to content

feat: add macros for DataFusionError variants #15946

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

Chen-Yuan-Lai
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

As #15491 says, add two macros for

  • External
  • ExecutionJoin

Moreover, the macro for ResourcesExhausted already existed, so we don't need to add it in the PR

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the common Related to common crate label May 5, 2025
@@ -808,12 +808,18 @@ make_error!(plan_err, plan_datafusion_err, Plan);
// Exposes a macro to create `DataFusionError::Internal` with optional backtrace
make_error!(internal_err, internal_datafusion_err, Internal);

// Exposes a macro to create `DataFusionError::IoError` with optional backtrace
make_error!(external_err, external_datafusion_err, External);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wasn't there any current usage of this error types it to replace with macros?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @comphead, I noticed that using make_error! isn't suitable for these error variants because External and ExecutionJoin are designed to wrap original error objects, not just error messages.

Instead, I've implemented custom macros for External and ExecutionJoin errors, similar to #15796, and replaced a few use cases as examples. Should we replace all the cases?

@Chen-Yuan-Lai Chen-Yuan-Lai marked this pull request as draft May 6, 2025 17:56
@github-actions github-actions bot added core Core DataFusion crate datasource Changes to the datasource crate labels May 7, 2025
@Chen-Yuan-Lai Chen-Yuan-Lai marked this pull request as ready for review May 7, 2025 09:41
Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key point of this macros is preserving backtrace which makes debugging easier if you got an error from datafusion among multiple layers of processing.

The variant below is for non string errors preserving the backtrace

#[macro_export]
macro_rules! schema_err {
    ($ERR:expr $(; diagnostic = $DIAG:expr)?) => {{
        let err = $crate::error::DataFusionError::SchemaError(
            $ERR,
            Box::new(Some($crate::error::DataFusionError::get_back_trace())),
        );
        $(
            let err = err.with_diagnostic($DIAG);
        )?
        Err(err)
    }
    };
}

@Chen-Yuan-Lai
Copy link
Contributor Author

@comphead Thank you for your review. You're right - my current implementation doesn't preserve backtraces.

To preserve backtraces, I think there are two options:

  1. Add Context wrapping: We could add backtrace by using Context

    let err = DataFusionError::External(Box::new($ERR))
        .context(format!("backtrace: {}", DataFusionError::get_back_trace()));

    However, this would create double wrapping and might affect error handling logic.

  2. Modify DataFusionError enum definitions: Change error variants to include backtrace fields

    External(GenericError, Option<String>),
    ExecutionJoin(JoinError, Option<String>),

    But this would need to change the usage of this error type in the codebase

Do you think which approach is better, or would you recommend a different solution?

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Chen-Yuan-Lai I think second option is good way to go, more over we already have similar invariants

    /// Error returned by arrow.
    ///
    /// 2nd argument is for optional backtrace
    ArrowError(ArrowError, Option<String>),

@Chen-Yuan-Lai Chen-Yuan-Lai marked this pull request as draft May 9, 2025 04:52
@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) catalog Related to the catalog crate execution Related to the execution crate proto Related to proto crate functions Changes to functions implementation ffi Changes to the ffi crate labels May 9, 2025
@Chen-Yuan-Lai
Copy link
Contributor Author

Chen-Yuan-Lai commented May 12, 2025

Summary of this change

  1. Enhancement of DataFusionError Enum Variants

    • Updated the ExecutionJoin and External variants to maintain backtrace
  2. Addition of Utility Method for Error Processing

    • Implemented a new method DataFusionError::box_error_if_needed() that handles:
      • Automatic boxing for regular error types
      • Prevention of double-boxing for already boxed errors
      • Support for string-based error messages
  3. Improvement of Error-related Macros

    • Capture and include backtraces
    • Utilize the new box_error_if_needed() method
    • Replace all the usage with new macros

@Chen-Yuan-Lai Chen-Yuan-Lai marked this pull request as ready for review May 12, 2025 06:54
where
E: Into<GenericError>,
{
err.into()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh I'm not sure about if this method is needed, can we just inline it into the macros where it is being used?

Copy link
Contributor Author

@Chen-Yuan-Lai Chen-Yuan-Lai May 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, but to prevent the compiler from resolving the target type for e.into(), I use GenericError::from(e) to explicitly provide the target type

let err = $crate::error::DataFusionError::External(
// covert input to GenericError
$crate::error::GenericError::from($CONVERTIBLE_TO_ERR),
Some($crate::error::DataFusionError::get_back_trace())
);

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @Chen-Yuan-Lai I think we really close to it

return Err(DataFusionError::External(
"PrintFormat::Table is not implemented".to_string().into(),
));
return external_err!("PrintFormat::Table is not implemented");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have noticed the lack of this macro recently too -- thank you for adding it 🙏

@alamb alamb added the api change Changes the API exposed to users of the crate label May 16, 2025
@alamb
Copy link
Contributor

alamb commented May 16, 2025

Thank you for this PR @Chen-Yuan-Lai and @comphead -- very nice

I am a little worried about changing the DataFusionError variant as it which will be a disruptive downstream change (users have to update all match statements that currently match on DataFusionError and they won't really see any new functionality

Rather than change the DataFusionError variant, maybe we could model adding a backtrace using DataFusionError::Context

So keep DataFusionError::External and potentially add a backtrace like

DataFusionError::External(..)
  .context(add_backtrace())

Or something like that

@Chen-Yuan-Lai Chen-Yuan-Lai marked this pull request as draft May 21, 2025 14:32
@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label May 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate catalog Related to the catalog crate common Related to common crate core Core DataFusion crate datasource Changes to the datasource crate execution Related to the execution crate ffi Changes to the ffi crate functions Changes to functions implementation logical-expr Logical plan and expressions optimizer Optimizer rules physical-plan Changes to the physical-plan crate proto Related to proto crate sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add missing error macro
3 participants