Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A better API for handling input that has already been parsed #22

Open
jimblandy opened this issue May 1, 2022 · 7 comments · May be fixed by #23
Open

A better API for handling input that has already been parsed #22

jimblandy opened this issue May 1, 2022 · 7 comments · May be fixed by #23

Comments

@jimblandy
Copy link

The Naga crate uses hexf_parse to handle hex float literals in WGSL. Our front end has already verified that the text conforms to WGSL's grammar for hex float literals, but since that grammar has subtle differences from this library's grammar, we have to use format! to reassemble the pieces into what parse_hexf64 expects.

It'd be nicer if this library could accept a struct like this:

struct HexFloatParts<'a> {
    negative: bool,
    integer: Option<&'a str>,
    fraction: Option<&'a str>,
    exponent: Option<&a str>,
}
@youknowone
Copy link
Collaborator

Do you think splitting parse to 4 sub functions can be a proper approach?
Any suggestion or contribution will be welcome

@jimblandy
Copy link
Author

Looking at the code for hexf_parse::parse, it returns a broken-out triple (sign, mantissa, exponent) which is really close to ideal for us. The parse code makes the (reasonable) decision to simultaneously parse the number and convert digit characters to numbers, whereas ideally, Naga's lexer would only need to identify the sections of text, and could leave all numeric handling to hexf_parse.

Simply exposing the convert_hexf32 and convert_hexf64 functions would be very close to what we want. I just worry about subtleties in overflow detection, adjusting the exponent, and so on.

Maybe the happy medium would be to do the digit -> number conversion in Naga, but let convert_hexfNN take over exponent handling. I'll try putting together a PR for that.

@jimblandy
Copy link
Author

@youknowone What would you think of having parse produce a type like this?

/// The parsed form of a floating-point number.
#[derive(Debug, Eq, PartialEq)]
pub struct Parsed {
    negative: bool,
    integral: u64,
    fractional: u64,
    num_fractional_digits: isize,
    exponent: isize,
}

Adapting the tests is quite a lot of not-so-interesting work, so I thought I'd check here before proceeding.

@jimblandy
Copy link
Author

See #23 for what this might look like.

@teoxoy
Copy link

teoxoy commented May 10, 2022

There is this PR aldanor/fast-float-rust#25 for fast-float-rust (which got its internals into rust's std lib) that might be a source of inspiration API wise. They seem to be going for:

pub fn parse_from_parts<T: FastFloat, S: AsRef<[u8]>>(integral: S, fractional: S, exponent: i64, negative: bool) -> T;

I'm not sure what's best here (for instance why is the exponent an i64 instead of another S) but so far the Parsed struct approach feels like it's leaking some internals (for instance users of the API would have to handle errors that hexf was previously handling).

What do you guys think about the following signature (it's similar to the initial proposal without the Options since we can just pass in empty &strs)?

pub fn parse_from_parts_hexf64(negative: bool, integral: &str, fractional: &str, exponent: &str) -> f64;

@jimblandy
Copy link
Author

jimblandy commented May 10, 2022

As to the suggested signature, here's why I moved away from &str to u64:

It seems desirable that this new API should be suitable for use by the hexf_parse crate itself, to connect its parsing and float-assembly steps. If the API isn't good enough for hexf_parse itself to use, then it won't be good enough for some its clients.

The thing about passing &str is that it means you need to first parse out the strings of digits - possibly making a copy in a temporary buffer if you want to skip separators - and then make a pass over them again to find their value. It's easy to accumulate the bits as you parse the digits.

users of the API would have to handle errors that hexf was previously handling

This is a fair point. Not optimal.

But as long as the function that consumes Parsed structures fully validates the values received there, then the only error checking the caller actually needs to do is making sure it doesn't drop bits. It looks to me like the error checks in hexf_parse::parse fall into two categories:

  • syntax errors, which parsing front ends want to handle themselves - that's the whole point, we have our own lexer with its own rules

  • bit loss errors, which are not that hard to catch.

@teoxoy
Copy link

teoxoy commented May 28, 2022

What I had in mind is extracting parts of the original parse function that could run on substrings of the original input; then the original parse function and also the new parse_from_parts function could use those (this would remove the need of having to go through each character more than once). I briefly looked at the code and this seems quite doable.

Regarding the separator, if we'd like to add that as part of the parse_from_parts, it could be a new Option<char> arg.

@jimblandy @youknowone what do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants