Skip to content

[EPIC] Attach Diagnostic to more errors #14429

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
3 of 9 tasks
eliaperantoni opened this issue Feb 3, 2025 · 3 comments
Open
3 of 9 tasks

[EPIC] Attach Diagnostic to more errors #14429

eliaperantoni opened this issue Feb 3, 2025 · 3 comments
Labels
enhancement New feature or request

Comments

@eliaperantoni
Copy link
Contributor

eliaperantoni commented Feb 3, 2025

Is your feature request related to a problem or challenge?

In #13664 we introduced the Diagnostic type and DataFusionError::Diagnostic. They allow enriching errors with messages meant for consumption by end users of an application built on top of DataFusion, by providing rich information and context that directly references locations in the SQL query. They enable features like:

See datafusion/sql/tests/cases/diagnostic for examples on how to extract and use diagnostics:

#[test]
fn test_table_not_found() -> Result<()> {
let query = "SELECT * FROM /*a*/personx/*a*/";
let spans = get_spans(query);
let diag = do_query(query);
assert_eq!(diag.message, "table 'personx' not found");
assert_eq!(diag.span, Some(spans["a"]));
Ok(())
}

In that PR, we only implemented diagnostics for:

  • Unresolved table references
  • Unresolved column references (qualified and non)
  • Non-aggregate expressions missing from GROUP BY clause
  • Ambiguous column references
  • Wrong number of columns in set expression (e.g. UNION)
  • Incompatible types in binary expressions

This issue is about using Diagnostic in more places, and adding related tests to datafusion/sql/tests/cases/diagnostic. We think we should at least implement the following, but suggestions are welcome and encouraged:

Describe the solution you'd like

The implementation should follow the steps of #13664, by calling DataFusionError.with_diagnostic to attach a Diagnostic to an error that is currently being returned. Tests should be added to datafusion/sql/tests/cases/diagnostic for each newly supported scenario.

For some of these items, it might be necessary to enrich the logical types with the Span information coming from the parser. This should be done using the datafusion::common::Spans type (note the "s"), introduced in #13664 to add span information to datafusion::common::Column.

It is desirable that the implementation is as little invasive as possible, in that it shouldn't require changing tons of function calls and types, unless absolutely necessary. The public facing API shouldn't change. The Diagnostic should be attached as soon as possible to the creation of the wrapped DataFusionError (i.e. deep in the call stack) and every error should ideally have just one Diagnostic.

Describe alternatives you've considered

No response

Additional context

No response

@jsai28
Copy link
Contributor

jsai28 commented Mar 19, 2025

Hi, is this still a potential GSoC project? It looks like many of the tickets in this epic have an open pull request and are close to completion. If you know of any other areas that could use some DX improvements, I'd be interested to work on them!

@eliaperantoni
Copy link
Contributor Author

Hey @jsai28! You're right, most of them are being worked on. But just recently I've filed #15276. I was planning on doing it myself but you could take it you'd like :) In that case, let's coordinate on the implementation details 😊

@jsai28
Copy link
Contributor

jsai28 commented Mar 19, 2025

@eliaperantoni Sure, I'll take a look at that! Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants