A better API for handling input that has already been parsed #22

jimblandy · 2022-05-01T21:57:00Z

The Naga crate uses hexf_parse to handle hex float literals in WGSL. Our front end has already verified that the text conforms to WGSL's grammar for hex float literals, but since that grammar has subtle differences from this library's grammar, we have to use format! to reassemble the pieces into what parse_hexf64 expects.

It'd be nicer if this library could accept a struct like this:

struct HexFloatParts<'a> {
    negative: bool,
    integer: Option<&'a str>,
    fraction: Option<&'a str>,
    exponent: Option<&'a str>,
}
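
For illustration, the format! workaround described above could look roughly like the sketch below. The `reassemble` helper is hypothetical, and the choice to fill absent pieces with "0" is an assumption about what the target grammar wants; the resulting string is what would then be handed to parse_hexf64.

```rust
/// Pieces of a hex float literal that a front end has already lexed.
/// Mirrors the struct proposed in this issue; `None` means the piece was absent.
struct HexFloatParts<'a> {
    negative: bool,
    integer: Option<&'a str>,
    fraction: Option<&'a str>,
    exponent: Option<&'a str>,
}

/// Hypothetical helper: rebuild a literal of the shape `0x<int>.<frac>p<exp>`.
/// Absent pieces are filled with "0" (an assumption, not hexf_parse behavior).
fn reassemble(parts: &HexFloatParts<'_>) -> String {
    format!(
        "{}0x{}.{}p{}",
        if parts.negative { "-" } else { "" },
        parts.integer.unwrap_or("0"),
        parts.fraction.unwrap_or("0"),
        parts.exponent.unwrap_or("0"),
    )
}
```

The cost is visible here: digits the lexer has already examined are copied into a fresh String, only to be scanned again by the second parser.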


youknowone · 2022-05-06T20:21:26Z

Do you think splitting parse into 4 sub-functions would be a reasonable approach?
Any suggestions or contributions are welcome.

jimblandy · 2022-05-10T17:04:54Z

Looking at the code for hexf_parse::parse, it returns a broken-out triple (sign, mantissa, exponent) which is really close to ideal for us. The parse code makes the (reasonable) decision to simultaneously parse the number and convert digit characters to numbers, whereas ideally, Naga's lexer would only need to identify the sections of text, and could leave all numeric handling to hexf_parse.

Simply exposing the convert_hexf32 and convert_hexf64 functions would be very close to what we want. I just worry about subtleties in overflow detection, adjusting the exponent, and so on.

Maybe the happy medium would be to do the digit -> number conversion in Naga, but let convert_hexfNN take over exponent handling. I'll try putting together a PR for that.

jimblandy · 2022-05-10T20:19:14Z

@youknowone What would you think of having parse produce a type like this?

/// The parsed form of a floating-point number.
#[derive(Debug, Eq, PartialEq)]
pub struct Parsed {
    negative: bool,
    integral: u64,
    fractional: u64,
    num_fractional_digits: isize,
    exponent: isize,
}
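
To make the intended meaning of these fields concrete, here is a naive sketch of how such a value might map to an f64. The field semantics are assumptions (`fractional` holds the hex digits after the point as an integer, `num_fractional_digits` counts them, `exponent` is a power of two), and `to_f64_naive` is a hypothetical name. It is not correctly rounded in general and is not how convert_hexf64 works.

```rust
/// The parsed form of a floating-point number (copied from the proposal).
#[derive(Debug, Eq, PartialEq)]
pub struct Parsed {
    negative: bool,
    integral: u64,
    fractional: u64,
    num_fractional_digits: isize,
    exponent: isize,
}

/// Clamp an isize into i32 range so it can be passed to `powi`.
fn clamp_i32(n: isize) -> i32 {
    n.clamp(i32::MIN as isize, i32::MAX as isize) as i32
}

/// Naive evaluation: (integral + fractional / 16^digits) * 2^exponent.
/// Illustrative only; large inputs lose precision in the u64 -> f64 casts.
fn to_f64_naive(p: &Parsed) -> f64 {
    let frac = p.fractional as f64 / 16f64.powi(clamp_i32(p.num_fractional_digits));
    let magnitude = (p.integral as f64 + frac) * 2f64.powi(clamp_i32(p.exponent));
    if p.negative { -magnitude } else { magnitude }
}
```

Under these assumptions, 0x1.8p1 would be represented as integral 1, fractional 8, one fractional digit, and exponent 1.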

Adapting the tests is quite a lot of not-so-interesting work, so I thought I'd check here before proceeding.

Fixes lifthrasiir#22.

jimblandy · 2022-05-10T20:38:06Z

See #23 for what this might look like.

teoxoy · 2022-05-10T21:34:30Z

There is this PR, aldanor/fast-float-rust#25, for fast-float-rust (which got its internals into Rust's std lib), that might be a source of inspiration API-wise. They seem to be going for:

pub fn parse_from_parts<T: FastFloat, S: AsRef<[u8]>>(integral: S, fractional: S, exponent: i64, negative: bool) -> T;

I'm not sure what's best here (for instance, why is the exponent an i64 instead of another S?), but so far the Parsed struct approach feels like it's leaking some internals (for instance, users of the API would have to handle errors that hexf was previously handling).

What do you guys think about the following signature (it's similar to the initial proposal without the Options since we can just pass in empty &strs)?

pub fn parse_from_parts_hexf64(negative: bool, integral: &str, fractional: &str, exponent: &str) -> f64;
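
As a rough sketch of what calling such a string-based function could involve, the hypothetical `parse_from_parts_sketch` below treats empty strings as absent pieces. It returns Option<f64> rather than a bare f64 so that failures have somewhere to go; that is a choice made for this example, not part of the proposal. The arithmetic is naive and not correctly rounded.

```rust
/// Hypothetical stand-in for the proposed signature. Empty pieces mean
/// "absent"; `None` is returned for non-hex digits, a malformed exponent,
/// or digit strings that do not fit in a u64.
fn parse_from_parts_sketch(
    negative: bool,
    integral: &str,
    fractional: &str,
    exponent: &str,
) -> Option<f64> {
    let int = if integral.is_empty() {
        0
    } else {
        u64::from_str_radix(integral, 16).ok()?
    };
    let frac = if fractional.is_empty() {
        0
    } else {
        u64::from_str_radix(fractional, 16).ok()?
    };
    let exp: i32 = if exponent.is_empty() {
        0
    } else {
        exponent.parse().ok()?
    };
    // Each fractional hex digit scales the value down by a factor of 16.
    let frac_digits = fractional.len() as i32;
    let magnitude = (int as f64 + frac as f64 / 16f64.powi(frac_digits)) * 2f64.powi(exp);
    Some(if negative { -magnitude } else { magnitude })
}
```

Note that every digit string is walked a second time here, after the caller's lexer has already scanned it.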

jimblandy · 2022-05-10T22:50:02Z

As to the suggested signature, here's why I moved away from &str to u64:

It seems desirable that this new API should be suitable for use by the hexf_parse crate itself, to connect its parsing and float-assembly steps. If the API isn't good enough for hexf_parse itself to use, then it won't be good enough for some of its clients.

The thing about passing &str is that it means you need to first parse out the strings of digits - possibly making a copy in a temporary buffer if you want to skip separators - and then make a pass over them again to find their value. It's easy to accumulate the bits as you parse the digits.
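
A minimal sketch of that single-pass accumulation, assuming `_` is the separator and using a hypothetical helper name, `accumulate_hex`. The value is built as the digits are recognized, separators are skipped without a temporary buffer, and the function bails out if a set bit would be shifted off the top of the u64.

```rust
/// Hypothetical helper: fold hex digits into a u64 in one pass, skipping
/// `_` separators. Returns the value and the number of digits consumed,
/// or `None` on a non-hex character or if a set bit would be lost.
fn accumulate_hex(text: &str) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    let mut digits = 0;
    for c in text.chars() {
        if c == '_' {
            continue; // separator: nothing to copy, nothing to accumulate
        }
        let d = c.to_digit(16)?;
        // Shifting left by 4 would drop a set bit if any of the top 4 are set.
        if value.leading_zeros() < 4 {
            return None;
        }
        value = (value << 4) | u64::from(d);
        digits += 1;
    }
    Some((value, digits))
}
```

The digit count matters for the fractional part, where it determines how far the value has to be scaled down.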

"users of the API would have to handle errors that hexf was previously handling"

This is a fair point. Not optimal.

But as long as the function that consumes Parsed structures fully validates the values received there, then the only error checking the caller actually needs to do is making sure it doesn't drop bits. It looks to me like the error checks in hexf_parse::parse fall into two categories:

syntax errors, which parsing front ends want to handle themselves - that's the whole point, we have our own lexer with its own rules
bit loss errors, which are not that hard to catch.

teoxoy · 2022-05-28T10:24:01Z

What I had in mind is extracting parts of the original parse function that could run on substrings of the original input; then the original parse function and also the new parse_from_parts function could use those (this would remove the need of having to go through each character more than once). I briefly looked at the code and this seems quite doable.

Regarding the separator, if we'd like to add that as part of the parse_from_parts, it could be a new Option<char> arg.

@jimblandy @youknowone what do you think?

jimblandy mentioned this issue May 1, 2022

[wgsl-in] Overhaul number lexing / parsing gfx-rs/naga#1863

Merged

jimblandy added a commit to jimblandy/hexf that referenced this issue May 10, 2022

Make float assembly functions take a parsed representation.

a71d968

Fixes lifthrasiir#22.

jimblandy linked a pull request May 10, 2022 that will close this issue

Make float assembly functions take a parsed representation. #23

Draft
