Skip to content

Potential changes to nom error handling story for 6.0 #1136

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
yaahc opened this issue Apr 6, 2020 · 20 comments
Open

Potential changes to nom error handling story for 6.0 #1136

yaahc opened this issue Apr 6, 2020 · 20 comments
Milestone

Comments

@yaahc
Copy link

yaahc commented Apr 6, 2020

I haven't tested this stuff out yet so I'm not sure how feasible any of it is but here are the ideas I think are worth investigating for 6.0:

Move off of nom::error::ParseError and use std::error::Error instead

  • see if instead of append and add_context combinators that need to merge errors can instead use std::ops::Add (remove ParseError::add_context #1131).
    • I'm unsure of what the difference between append and or is in practice, so this may instead apply to or.
  • use From instead of from_error_kind for converting from a root cause error to a user provided error type
    • this could possibly involve introducing a new root cause error type that functionally replaces the (I, ErrorKind) tuple.

Inter-operate with foreign error types instead of discarding

  • change map_res to try to use From into the user Error type from the error type of the closure, specifically make sure to support preserving std ParseIntError and similar errors
  • change verify to work with Result instead of bool to allow custom error messages upon verify failures.

Look into interoperating with error reporting mechanisms that require 'static

Probably with the ToOwned trait in std, I'm not sure how easy it would be to write the impl for this as a user in practice.

Alternatively: we can look into removing the lifetime bound on errors that keep the input with them, my understanding is that the main reason that these carry around references is to do offsets against the original reference, we could instead carry a ptr and require that the original reference is provided when printing the final error report to get the nicely rendered contexts.

Stop treating context as Errors

The specific location in a program / context of what it was doing is not an error, or indicative of a failure, and I don't think it should be stored in the error chain/tree the same way the various parse failures are. Instead I'd like to see it gathered in a similar way to how I'm proposing we support Error Return Traces

Additional Comments

I'm not positive is now is the best time to apply these changes. I'm currently working on this RFC rust-lang/rfcs#2895 to add a way to get custom context out of dyn Errors, and a few of the changes I'm imagining would probably want to use this functionality as part of the error reporting mechanism, though it gets a little hairy because nom errors like to have references in them, and the context extraction mechanisms would work for any members with lifetimes in them...

The vague idea im throwing around in my head is something like this.

/// Assuming the error has been made owned
///
/// if the input references in leaf errors are stored entirely for the offset, could we instead
/// store ptrs?
///
/// using println here to be lazy, would actually use `std::fmt` machinery
fn example_nom_error_report(input: &I, error: &dyn std::error::Error + 'static) {
    let mut cur = Some(error);
    let mut ind = 0;

    eprintln!("Error:");
    while let Some(error) = cur {
        eprintln!("    {}: {}", ind, error);

        if let Some(causes) = error.context::<[nom::error::Error]>() {
            eprintln!("    Various parse failures:");
            for (ind, cause) in causes.iter().enumerate() {
                eprintln!("        {}: {}", ind, cause.print_with(input));
            }
        }

        if let Some(contexts) = error.context::<[nom::error::Context]>() {
            eprintln!("    Various contexts:");
            for (ind, context) in contexts.iter().enumerate() {
                eprintln!("        {}: {}", ind, context.print_with(input));
            }
        }

        cur = error.source();
        ind += 1;
    }
}

This would hopefully support printing all these errors

Error:
    0: Unable to parse Type from Input
       Various parse failures:
           0: at line 2:
             "c": { 1"hello" : "world"
                    ^
           expected '}', found 1
       Various contexts:
           1: at line 2, in map:
             "c": { 1"hello" : "world"
                  ^

           2: at line 0, in map:
             { "a" : 42,

Error:
    0: Unable to parse Type from Input
    1: invalid digit found in string

Error:
    0: Unable to parse Type from Input
       Various contexts:
           1: at line 2, in map:
             "c": { 1"hello" : "world"
                  ^

           2: at line 0, in map:
             { "a" : 42,
    1: failed verify for some specific reason

The specific syntax and error report shape are not really the point here, this would probably need a lot of tweeking, but hopefully this gets the idea across.

@Geal Geal added this to the 6.0 milestone Apr 8, 2020
@Geal
Copy link
Collaborator

Geal commented Apr 8, 2020

see if instead of append and add_context combinators that need to merge errors can instead use std::ops::Add (#1131).
I'm unsure of what the difference between append and or is in practice, so this may instead apply to or.

append builds a stack trace of parsers (what you'd access with backtrace()), while or combines errors coming from various branches of alt. It might make sense to move or out of ParseError like I did with add_context in #1131, since it's only used in alt.
I like the idea of using operator traits, BitOr would work for or(), but for append(), I'm not sure about Add, is it commonly used to accumulate errors?

use From instead of from_error_kind

good idea

this could possibly involve introducing a new root cause error type that functionally replaces the (I, ErrorKind) tuple

what would that type do?

change map_res to try to use From into the user Error type

this might be an issue: if people use error types provided by nom (most of them will), they won't be able to provide From implementations for errors from other libraries. I can provide some of those inside nom, but it will quickly become a maintenance issue (people asking nom to support converting from other libraries, etc)

change verify to work with Result instead of bool

will do

Look into interoperating with error reporting mechanisms that require 'static

carrying a reference to the input is useful for parsers, and passing the original input when processing the error will not always work: some libraries us as input type a wrapper that counts lines and provides user friendly spans. Maybe the error could be converted to a 'static friendly one right after the parser returned its result?

Stop treating context as Errors

For parser generators, a parse error will generally have all of the information because it knows exactly how it got there, but for parser combinators like nom, a parser will not know why it has been called (like, a function name parser could be used when parsing its declaration, or in an expression). That's why I added that "stacktrace" for parsers, to pattern match on it. In most cases it is not needed, but for programming languages it's hard to have proper errors without the context.
I'd be interested in more flexible approaches though.

Side note: nom's VerboseError accumulates the error trace in a Vec because doing it as a linked list in the beginning was very bad for performance. Then that Vec pushed towards a design with error post processing

I'm not positive is now is the best time to apply these changes

especially considering that parts of std::error::Error are still nighty only :D
The custom context idea look interesting, so if I manage to implement std::error::Error we'll be able to test it. About lifetimes, is it still an issue when all the errors from a parser will have the same lifetime (since they will refer to the same input)?

@yaahc
Copy link
Author

yaahc commented Apr 8, 2020

Gonna reply to these comments when I'm at my computer but I just wanted to drop this link in and another possible idea http://sled.rs/errors

Use Result<Result<T, Nonterminal error>, TerminalError> instead of nom::Error to represent multiple error channels

The API grossness might be worse than the current state but I've been trying to figure out how to support a tristate Try type for weeks now and this blog post was the first one that really seems like a good fit to me

@yaahc yaahc closed this as completed Apr 8, 2020
@yaahc yaahc reopened this Apr 8, 2020
@yaahc
Copy link
Author

yaahc commented Apr 8, 2020

I'm not sure about Add, is it commonly used to accumulate errors?

Not within rust afaik, but I think haskell might use if more commonly.

what would that type do?

I'm mostly worried about coherence, but also it would be a nice to have to impl Error i think, and it might also be helpful for doing the 'a -> 'static conversion for working with error reporters.

this might be an issue: if people use error types provided by nom (most of them will), they won't be able to provide From implementations for errors from other libraries. I can provide some of those inside nom, but it will quickly become a maintenance issue (people asking nom to support converting from other libraries, etc)

The error types provided by nom might be able to implement the current discard behavior as part of their api, where a From<E: Error> just discards the error and becomes Kind::ForeignError, tho theres a good chance this will not be coherent...

We might be able to do some cheeky stuff with Into<NomError> to allow ppl to impl conversions into nom errors and then provide manual impls for various std types that are commonly encountered like ParseIntError.

.. but for parser combinators like nom, a parser will not know why it has been called ..

Oh, yea I didn't mean get rid of the context, I just want to change how nom models context, I am convinced we can have all the same behavior and 🤞 the same performance characteristics while also keeping accumulated errors and accumulated context separate.

About lifetimes, is it still an issue when all the errors from a parser will have the same lifetime (since they will refer to the same input)?

I think it might work, if you want we can probably test it out, nika and I have a proof of concept for the error member access that doesn't go thru Any which I think lets us keep lifetime information across the trait boundary. We should be able to slap it onto the error trait in nom and test it out in the short term.

@Geal
Copy link
Collaborator

Geal commented Apr 10, 2020

this could possibly involve introducing a new root cause error type that functionally replaces the (I, ErrorKind) tuple

I just checked, and... yes 😭
Lots of tests to rewrite

@yaahc
Copy link
Author

yaahc commented Apr 10, 2020

I'm so sorry, feel free to assign error handling issues to me btw, I'd like to help out but I'm bad at staying organized or prioritizing things well.

@thirtythreeforty
Copy link

If you wouldn't mind my adding my own two cents here, I would welcome an overhaul of the error handling. (&[u8], ErrorKind) is proving extremely difficult to integrate into my programs - all is fine as long as you call other Nom functions, but there is a big impedance mismatch between Nom errors and Rust errors.

This is partially because the ParseError trait isn't very useful - there's no way to convert it to a std::Error, and because it's a trait, it's difficult to put into my own error types. Instead I'm hardcoding my errors to do a clumsy mapping between (&[u8], ErrorKind), but there don't seem to even be any helper facilities for this error type.

If there's a better way to interoperate with std::Result, please let me know as it would eliminate a lot of contortions in my code.

@danjenson
Copy link

I'm a bit new to rust, but I'd love to help on this issue. I've been using nom to write a parser for a compiler I've been building. @yaahc has mentioned that she will mentor. If that's ok, I'll probably need a few days to start understanding all the issues and then maybe try a few small PRs. I'll likely have a lot of questions.

@Geal
Copy link
Collaborator

Geal commented Jul 4, 2020

@yaahc did anything interesting change in the error handling space in the past months? I'm planning to finish this release soon, so if some new ideas came, I could implement them right away

@yaahc
Copy link
Author

yaahc commented Jul 4, 2020

Not really anything new to report. I am going to be starting up a error handling project group soon so we will hopefully start getting more work done but it will mostly be work on stabilizing what we already have and consolidating the ecosystem a bit better. The ideas in this issue are still applicable IMO, so if you wanna try implementing any of them for this release that would be good but I think its also okay if you punt on it.

@Geal
Copy link
Collaborator

Geal commented Jul 6, 2020

so, on the plus side: changing the (I, ErrorKind) basic error type to a struct is easy, I do not need to update all the tests since type inference works properly.

But changing the result type to pub type IResult<I, O, E = error::Error<I>> = Result<Result<(I, O), Needed>, Err<E>>; gives me error: aborting due to 1588 previous errors 😬
The change is doable, but I'll have to check if it makes the API easier right now

@yaahc
Copy link
Author

yaahc commented Jul 6, 2020

😬 indeed

@torokati44
Copy link

If I may add my two cents to this issue...
In my mind, logically, opt(), and many* are branches just the same as alt(), so the user might be interested in the "complete" error tree (why exactly the opt() parser failed, or why many_* stopped where it did) but instead, only alt() provides this info.
It might be because with alt(), all the branches are right there, in one place, so it's easier to do - hence or() in the error type.
For the other combinators, the branching is spread out a bit (opt() doesn't know about what might be accepted after, if the optional element wasn't), so there might be a need to add this kind of tracing to the error type?
I have looked at nom-trace and nom-tracable for similar effect, but they are a bit too macro-heavy.
Is there a possibility to make this error accumulation work out of the box with all branching parsers, the same as with alt()?

@Geal
Copy link
Collaborator

Geal commented Aug 24, 2020

so, here are th recent changes around error handling:

  • 9a555fe: the finish() method that transforms Result<(I,O), nom::Err<E>> to Result<(I, O), E>, by merging the Error and Failure cases, and panicking if it encounters Incomplete (because Incomplete should never get out of the parsing/IO loop)
  • 981d036: moving the basic error type from (I, ErrorKind) to a struct, so I can implement std::error::Error on it
  • bf83d5d: implementing std::error::Error on nom::error::Error and nom::error::VerboseError. Those implementations are not particularly great, but we can improve them

@thirtythreeforty what ind of integration do you have in mind, and why are you trying to add the ParseError trait instead of the parser error type you chose?
I'm probably failing to see the method, because generally, in my own code, the parse error stop at the IO layer, and do not bubble up to the user

@torokati44 you might need a custom error type with a wrapper combinator you insert here and there for that, because recording errors for many* or opt rarely make sense: you're expecting an error here, either as a terminating signal, or because the data you're looking for is not there but the next piece of data is still valid for the next parser.
For what you want nom-trace can still work, there's a combinator compatible with nom 5 functions: https://docs.rs/nom-trace/0.2.1/nom_trace/#nom-5-functions-support

@thirtythreeforty
Copy link

My understanding of ParseError is that it's the "generic" supertype of all the actual error types. I thought the point of it was to use ParseError wherever possible (even in my own combinators) so it was possible to switch between Error and VerboseError with a cargo flag or similar.

It turned out that I didn't end up using it because it proved difficult. The changes you describe will really help though.

@Geal
Copy link
Collaborator

Geal commented Aug 30, 2020

with af7d8fe custom error types will now be able to store the error from map_res

@Geal
Copy link
Collaborator

Geal commented Aug 30, 2020

@thirtythreeforty being able to switch between error types would only be useful if you make a library of combinators, otherwise you should directly use a single error type. If you want to switch with a flag, it would be easier to have a custom error type, and have its content and ParseError implementation depend on the flag

@yaahc
Copy link
Author

yaahc commented Aug 31, 2020

@Geal out of curiosity, why did you add a FromExternalError trait instead of using From?

@Geal
Copy link
Collaborator

Geal commented Aug 31, 2020

The error type will also get the input position and an ErrorKind with the external error (it might be used with others than map_res). I guess I could require From<(Input, ErrorKind, OtherError)> instead

@yaahc
Copy link
Author

yaahc commented Aug 31, 2020

The error type will also get the input position and an ErrorKind with the external error (it might be used with others than map_res). I guess I could require From<(Input, ErrorKind, OtherError)> instead

That or if instead of using a tuple you provided a type for bundling these you could do something like

struct NomBaseError(Input, ErrorKind, OtherError); followed by From<NomBaseError>

@Lucretiel
Copy link
Contributor

while or combines errors coming from various branches of alt

At least, ostensibly; none of the implementations that nom ships (as of nom 5 and nom 6-alpha1) do anything with or besides discard the old branch error in favor of the incoming one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants