# Feedback using winnow to implement hcl-edit #230

Replies: 9 comments · 16 replies
---

Thank you, this is very useful feedback! I'll break down my comments by section so it's easier to follow the threads.
---

It is good to hear that this worked out for you! And you are using the latest version, which solved the type inference problems with this. That was my first concern when I saw the next section.
---

Yeah, I was torn when writing these on which use case to optimize for. As for providing an ergonomic variant, the concern I have is making the API overwhelming. This is a problem we have in clap, where there are so many knobs that you can't discover that what you want exists. I've had a similar problem in using …
---

I've run into this myself. My first thought was to see if we could have a function on … One possible route is #98. I've been thinking of creating a …
---

I think I've found a framing that should keep things ergonomic. The nice thing is most of … The new framing is: on error, … This would also allow users to drop …

I hadn't thought of that, but that is a good point! I've linked to this discussion from the performance issue to make sure we include it. I also likely should go back and review some …
---

I think these are mostly even. The one case of unexpected performance with different trait implementations, I plan to just remove (#226).
---

Another compiler performance footgun I can remember is returning a long combinator chain from a function, e.g. something like this:

```rust
fn foo() -> impl Parser<I, O, E> {
    (...some chained combinators...)
}
```

vs. this:

```rust
fn foo() -> impl Parser<I, O, E> {
    move |input: I| {
        (...some chained combinators...).parse_next(input)
    }
}
```

I had a case where the former variant increased the compile time in debug mode by around 30 seconds for an, if I remember correctly, relatively innocent-looking chain of combinators. I guess this generates very long, deeply nested generic types that …
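The two shapes can be reproduced standalone with plain closures standing in for the `Parser` trait (toy types, not winnow's API). Both variants parse identically; only the opaque type behind the `impl` return differs, which is where the compile-time difference described above comes from:

```rust
// Toy stand-in for a combinator chain: a parser here is any
// Fn(&str) -> Option<(output, rest)>.
fn digit(input: &str) -> Option<(char, &str)> {
    let c = input.chars().next()?;
    c.is_ascii_digit().then(|| (c, &input[c.len_utf8()..]))
}

// A small combinator that runs two parsers in sequence.
fn pair<A, B>(
    p: impl Fn(&str) -> Option<(A, &str)>,
    q: impl Fn(&str) -> Option<(B, &str)>,
) -> impl Fn(&str) -> Option<((A, B), &str)> {
    move |input: &str| {
        let (a, rest) = p(input)?;
        let (b, rest) = q(rest)?;
        Some(((a, b), rest))
    }
}

// Style 1: return the chain directly; the opaque return type is the
// full nested combinator type.
fn two_digits_chained() -> impl Fn(&str) -> Option<((char, char), &str)> {
    pair(digit, digit)
}

// Style 2: wrap the chain in a single closure; same behavior, but the
// opaque return type is just one closure type.
fn two_digits_closure() -> impl Fn(&str) -> Option<((char, char), &str)> {
    move |input: &str| pair(digit, digit)(input)
}
```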
---

Yeah, right now hand-written parsers are the main way to deal with this, which isn't necessarily bad in my opinion, but if it's used enough we should likely help with it, with all the caveats we discussed in the other post about API bloat. I think I'd generalize this to …
---

@martinohmann in case you are interested in error recovery, I thought I'd let you know that I've started mapping out a design, see #96 (comment)
---
Hi there @epage!

In #223 I promised you some feedback about using `winnow` to create the parser for `hcl-edit`. I didn't get to it in the past few weeks due to other obligations, but now I've got a little bit of time to note down some of the things I learnt.

## Ergonomics
First of all, I'd like to point out that the ergonomics of using `winnow` are much better than what I observed when initially starting out with `nom`. I like it a lot that `&str`, `char`, `byte` etc. also implement the `Parser` trait, which avoids a lot of wrapping in other combinators like `char()` or `tag()`. Also, converting some of the combinators into methods on the `Parser` trait really helps ergonomics. … is certainly easier to follow than …
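As a toy illustration of the two styles being compared (neither winnow's nor nom's actual API), here is the same `map` combinator as a free function and as a method, sketched with assumed names:

```rust
// A parser here is any Fn(&str) -> Option<(output, bytes_consumed)>.
fn digit(input: &str) -> Option<(char, usize)> {
    input.chars().next().filter(char::is_ascii_digit).map(|c| (c, c.len_utf8()))
}

// nom-style: a free function wraps the parser from the outside, so
// chains read inside-out: map(map(p, f), g).
fn map<A, B>(
    p: impl Fn(&str) -> Option<(A, usize)>,
    f: impl Fn(A) -> B,
) -> impl Fn(&str) -> Option<(B, usize)> {
    move |i: &str| p(i).map(|(a, n)| (f(a), n))
}

// winnow-style: the same combinator hangs off the parser as a method,
// so chains read left to right: p.map(f).map(g).
struct P<O>(Box<dyn Fn(&str) -> Option<(O, usize)>>);

impl<O: 'static> P<O> {
    fn map<B: 'static>(self, f: impl Fn(O) -> B + 'static) -> P<B> {
        P(Box::new(move |i| (self.0)(i).map(|(o, n)| (f(o), n))))
    }
}
```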
## Type inference
Type inference works really well in most cases, but the `many*` and `separated*` combinators in particular require helping the compiler via turbofish here and there. This is sometimes a bit cumbersome and adds noise, since these functions have a ton of type parameters, but only the parameter with the `Accumulate` trait bound needs a type hint. For example, I'm parsing whitespace interleaved with comments as follows, which wouldn't typecheck without the turbofish: …

I hit this issue in quite a few other places where I just want to accumulate sequences into `()` and then pick them up with `.span()` or `.recognize()`
later, so I worked around this with a helper function that gets optimized away: …
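The `()` workaround can be sketched with a toy accumulator trait (winnow's real `Accumulate` trait and `many*` combinators differ in detail): a generic accumulator parameter that inference cannot pick on its own when the result is discarded, and a helper that pins it to `()`:

```rust
// Toy version of an accumulator trait; winnow's `Accumulate` differs.
trait Accumulate<T> {
    fn initial() -> Self;
    fn accumulate(&mut self, item: T);
}

impl<T> Accumulate<T> for Vec<T> {
    fn initial() -> Self { Vec::new() }
    fn accumulate(&mut self, item: T) { self.push(item); }
}

// Accumulating into `()` discards the items entirely.
impl<T> Accumulate<T> for () {
    fn initial() -> Self {}
    fn accumulate(&mut self, _item: T) {}
}

// A `many0`-like function: collect leading ASCII digits into any
// accumulator, returning the accumulator and the remaining input.
fn digits<C: Accumulate<char>>(input: &str) -> (C, &str) {
    let mut acc = C::initial();
    let mut rest = input;
    while let Some(c) = rest.chars().next().filter(|c| c.is_ascii_digit()) {
        acc.accumulate(c);
        rest = &rest[c.len_utf8()..];
    }
    (acc, rest)
}

// When the accumulated value is ignored, `C` is ambiguous without a
// turbofish; a helper fixing `C = ()` removes the noise at call sites.
fn skip_digits(input: &str) -> &str {
    digits::<()>(input).1
}
```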
## Combinators

I found that some combinators might be missing or behave differently from what I observed when using `nom`.

### `recognize`
One example is `.recognize()` and `.with_recognized()`: they return `Stream::Slice` instead of `Stream` (which would be equivalent to the `nom` version returning the original input type).

Initially, I was doing something like this in `nom`, where the input type was `nom_locate::Located<&str>`: …

Here the `second` parser will receive an input of type `nom_locate::Located<&str>`.

In winnow, the equivalent of this would be to use `winnow::stream::Located<&str>` as input and do something like this: …

Now the problem is the following: `second` will receive a `&str` as input instead of `winnow::stream::Located<&str>`, which means I'm losing span information in the `second` parser. Even if I used `.map_res()` instead of `.and_then()` and wrapped the input in there via `Located::new(output_of_first)` before feeding it into `second`, the span information would be incorrect, since the offset is reset to 0.

But maybe I'm stupid and there is a way to work around this that I missed. 😉

If there were a variant of `.recognize()` that returned a `Stream` instead of `Stream::Slice`, it would have prevented me from committing this gnarly crime: https://github.com/martinohmann/hcl-rs/blob/503e827a6a7c1f9073aced44ea4977ea90265f13/crates/hcl-edit/src/parser/template.rs#L35-L77. This was much shorter and more elegant using `nom` and `map_parser`.
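A toy model of the offset problem (not winnow's actual `Located`, which tracks spans differently) shows why re-wrapping a recognized slice cannot preserve absolute spans: a located stream measures offsets against the buffer it was constructed from, so a fresh wrapper restarts at 0 even though the subslice still points into the original buffer:

```rust
// Toy located stream: remembers the buffer it started from so it can
// report absolute offsets for the remaining input.
struct Located<'a> {
    initial: &'a str,
    input: &'a str,
}

impl<'a> Located<'a> {
    fn new(input: &'a str) -> Self {
        Located { initial: input, input }
    }

    // Absolute offset of the remaining input within the original buffer,
    // recovered from pointer arithmetic on the two slices.
    fn offset(&self) -> usize {
        self.input.as_ptr() as usize - self.initial.as_ptr() as usize
    }
}
```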
### `map`-like combinator to work with `Stateful`
When using `winnow::stream::Stateful<I, S>`, I didn't find a way to access and update the state within a `map` closure. I was thinking that it should be possible to have a `Parser` method like `.map_with_state(f)` that accepts a closure which receives the parser's output and a mutable reference to the state. I'm not sure though if the borrow checker would like that, but I was wondering if you experimented with something like that.

Why I'm bringing this up is because I copied your approach from `toml_edit` using a `ParserState` (e.g. here: https://github.com/martinohmann/hcl-rs/blob/2f5c9ad0e3f62edd59ac434ffd7942f4f252edb8/crates/hcl-edit/src/parser/expr.rs#L30-L89) to work around performance issues when large return types are involved (see below). If it were possible to manipulate the state of a `Stateful` input inside a `map`-like combinator, we could avoid littering parser function signatures with the additional reference to the `ParserState`.
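A sketch with toy types (the names `Stateful` and `map_with_state` mirror the idea above but are not winnow's actual API) suggests the borrow checker is fine with this shape, because the parse borrow of the stream ends before the `&mut` borrow of the state begins:

```rust
// Toy stand-in for a stateful input stream.
pub struct Stateful<I, S> {
    pub input: I,
    pub state: S,
}

// Hypothetical map-with-state combinator: runs `parse`, then hands the
// output together with `&mut` state to the mapping closure.
pub fn map_with_state<I, S, O, U>(
    parse: impl Fn(&mut Stateful<I, S>) -> Option<O>,
    f: impl Fn(O, &mut S) -> U,
) -> impl Fn(&mut Stateful<I, S>) -> Option<U> {
    move |stream: &mut Stateful<I, S>| {
        let output = parse(stream)?; // borrow of `stream` ends here
        Some(f(output, &mut stream.state)) // fresh borrow of the state
    }
}
```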
## Performance footguns
I guess #191 would be really useful for people aiming to implement a performant parser. I found a couple of things that might be worth documenting. Maybe some of them can even be fixed.
### Large return types
Since `IResult<I, O, E>` is essentially a `Result<(I, O), E>`, the return type size is influenced by the sizes of both the input and the output types. You already track this via #72, which shows that `Located<&[u8]>` increased parse time by 30% compared to `&[u8]`, albeit being only twice as large.

Something like `fn(&mut I) -> Result<O, E>` vs. `fn(I) -> Result<(I, O), E>` might help to close this performance gap, but I'm wondering if it would make writing parsers much less ergonomic.

One thing that further decreases performance is large output types. Surprisingly, I've seen quite big performance improvements from `Box`-ing large types that are passed through multiple layers of parser code. Of course, boxing everything by default is a performance footgun of its own and usually makes the parser substantially slower.

In addition to that, tuple parsers should be used carefully, as they can run into the same performance issues when multiple large output types are involved or if they simply have too many items in them. E.g. I replaced something like … with … in quite a few places to resolve performance issues.
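The size effect is easy to check with `std::mem::size_of` (a std-only sketch; the exact numbers are layout-dependent): an `IResult`-style `Result<(I, O), E>` grows with both the input and output types, an output-only return drops the input's share, and boxing a large output shrinks the value every layer has to move to pointer size:

```rust
use std::mem::size_of;

// A deliberately large AST-node stand-in.
struct Large([u64; 16]);
type Err = u32;

// nom-style IResult<I, O, E> ≈ Result<(I, O), E>: moves input + output.
fn tupled_size() -> usize {
    size_of::<Result<(&'static str, Large), Err>>()
}

// Output-only return, as in the `fn(&mut I) -> Result<O, E>` idea.
fn output_only_size() -> usize {
    size_of::<Result<Large, Err>>()
}

// Boxing the large output makes the moved value pointer-sized.
fn boxed_size() -> usize {
    size_of::<Result<Box<Large>, Err>>()
}
```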
"a"
vs.'a'
vs.b'a'
vs.b"a"
I had some confusion around parsers that accept all of these: `b'a'`, `'a'`, `b"a"`, `"a"`. Which variant yields the best performance? If there's a clear answer for the common case, maybe it's worth documenting that in the performance topic as well.

I've settled on using the byte variants in most cases, but I can remember that using the "wrong" variant in certain places had a noticeable impact on parser performance. Sadly, I cannot find an example anymore where it really mattered.
## Final words
I hope I didn't exclusively mention things that you're already aware of, and that this feedback is useful for you to further improve `winnow`. If I happen to remember something else not already mentioned, I'll try to add it here. 😄