Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Opportunistic zero copy parsing for escaped_transform using Cow #543

Open
2 tasks done
VorpalBlade opened this issue Jun 15, 2024 · 3 comments
Open
2 tasks done

Opportunistic zero copy parsing for escaped_transform using Cow #543

VorpalBlade opened this issue Jun 15, 2024 · 3 comments
Labels
A-combinator Area: combinators C-enhancement Category: Raise on the bar on expectations M-breaking-change Meta: Implementing or merging this will introduce a breaking change.

Comments

@VorpalBlade
Copy link
Contributor

Please complete the following tasks

winnow version

0.6.13

Describe your use case

Something like this works:

/// Unquoted string value
fn unquoted_string_with_escapes(i: &mut &str) -> PResult<String> {
    escaped_transform(take_till(1.., [' ', '\t', '\n', '\r', '\\']), '\\', escapes).parse_next(i)
}

fn escapes<'input>(i: &mut &'input str) -> PResult<&'input str> {
    alt((
        "n".value("\n"),
        "r".value("\r"),
        "t".value("\t"),
        " ".value(" "),
        "\"".value("\""),
        "\\".value("\\"),
    ))
    .parse_next(i)
}

But this doesn't compile:

fn unquoted_string_with_escapes<'input>(i: &mut &'input str) -> PResult<Cow<'input, str>> {
    escaped_transform(take_till(1.., [' ', '\t', '\n', '\r', '\\']), '\\', escapes).parse_next(i)
}

fn escapes<'input>(i: &mut &'input str) -> PResult<&'input str> {
    alt((
        "n".value("\n"),
        "r".value("\r"),
        "t".value("\t"),
        " ".value(" "),
        "\"".value("\""),
        "\\".value("\\"),
    ))
    .parse_next(i)
}

Describe the solution you'd like

It would be nice if escaped_transform could have zero copy parsing when the string happens to contain no escapes (which is the common case for my use case).

Alternatives, if applicable

I could try to write my own parser from scratch instead of using escaped_transform and make it support this. I would rather not and also it would be nice to have something that is reusable and available for others too.

Additional Context

No response

@VorpalBlade VorpalBlade added the C-enhancement Category: Raise on the bar on expectations label Jun 15, 2024
@VorpalBlade VorpalBlade changed the title Opportunistic zero copy parsing for escaped_transform Opportunistic zero copy parsing for escaped_transform using Cow Jun 15, 2024
@epage epage added A-combinator Area: combinators M-breaking-change Meta: Implementing or merging this will introduce a breaking change. labels Jun 20, 2024
@epage
Copy link
Collaborator

epage commented Jun 20, 2024

This would require having more knowledge of the underlying types than is currently supported. At minimum its a breaking change. I suspect this would require a new trait which I'm hesitant to add for this one specific parser.

My own biases also lean towards not addressing this. When I'm prototyping, I use whatever gets the job done. If I'm writing a parser for a grammar, I implement everything myself so there is a clear connection between the grammar and my code (example). This makes me see escaped_transform as more of a tool for prototyping than one for performant production code.

@VorpalBlade
Copy link
Contributor Author

So I had a bit of a think about your response. And I don't think I agree. To me it seems that functionality should be as fast as it can possible be. You shouldn't have to reinvent the wheel to be fast in Rust.

Unfortunately that would (if it needs to know the underlying type more precisely) currently require specialisation as I understand it. That would indeed be a blocker for making escaped_transform do this. Otherwise it could perhaps be a separate function escaped_transform_str?

@epage
Copy link
Collaborator

epage commented Jun 27, 2024

Yes, in general winnow aims for speed. However, it isn't our highest priority. For example, we likely could be made even faster with GATs, like chumsky and nom v8 use, but they run counter to more important principles, like being a toolbox instead of a framework and having the built-in parsers be written in a way similar to how users would.

We have a general implementation of escaped_transform. It isn't the fastest for all cases but it supports a large variety of cases. That leaves us with

  • Narrow supported cases
  • Have a second implementation
  • Do nothing
  • Remove it completely

For me, it feels too specialized to have a second implementation (and, in general, I've been trying to avoid that).

I also feel that the current version has the right ergonomics for most users, so I would prefer not to narrow supported cases.

I'm assuming people won't want it removed.

That leaves "do nothing".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-combinator Area: combinators C-enhancement Category: Raise on the bar on expectations M-breaking-change Meta: Implementing or merging this will introduce a breaking change.
Projects
None yet
Development

No branches or pull requests

2 participants