Skip to content

Replace (usize, Option<usize>) with custom type SizeHint and eliminate try_size_hint(). #218

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

kpreid
Copy link
Contributor

@kpreid kpreid commented Apr 10, 2025

The SizeHint type provides these advantages:

  • Easier to construct by composition, using + and | or methods.
  • More self-explanatory.
  • Distinguishes between “unknown size” and “known to be unbounded” to aid debugging.

Since this is a breaking change to the trait, this PR also makes the closely related change of combining try_size_hint() and size_hint() into a single fallible function. This will make it easier to correctly implement the trait.

Also, while I was touching the macro code, I made all the crate paths absolute (::arbitrary), as they should be for the maximum compatibility.

Fixes #201.
Part of #217.

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for whipping up this PR!

I think we still have some design questions to answer before we commit to a particular approach. See inline comments below.

src/size_hint.rs Outdated
Comment on lines 90 to 105
/// When the depth is not too deep, calls `f` with `depth + 1` to calculate the
/// size hint.
///
/// Otherwise, returns an error.
///
/// This should be used when implementing [`Arbitrary::size_hint()`]
/// for generic types.
pub fn recursion_guard(
depth: usize,
f: impl FnOnce(usize) -> Result<Self, crate::MaxRecursionReached>,
) -> Result<Self, crate::MaxRecursionReached> {
if depth > MAX_DEPTH {
Err(crate::MaxRecursionReached {})
} else {
f(depth + 1)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's switch from depth to fuel, from incrementing to decrementing, and from depth > MAX_DEPTH to fuel == 0.

I think we will may also want to preserve the fuel count across size_hint and recursion_guard invocations such that if we are given fuel = 100 that doesn't mean "process type trees of up to depth 100" but instead "call up to 100 size_hint/recursion_guard methods". The latter is a better overall limit on compute spent on size hints. The former isn't very useful because a depth of 10 already gets to binary trees of size 1024 but also isn't very deep at all for non-pathological/non-exponential cases.

This probably involves defining and threading a new context type through size_hint calls and removing public constructors for SizeHint and replacing them with methods on the context, something like

pub struct SizeHintContext {
    fuel: usize,
}

impl SizeHintContext {
    pub fn new(fuel: usize) -> Self {
        Self { fuel }
    }

    fn consume_fuel(&mut self) -> Result<()> {
        if *self.fuel == 0 {
            return Err(());
        }
        *self.fuel -= 1;
        Ok(()) 
    }
    
    pub fn exactly(&mut self, size: usize) -> Result<SizeHint> {
        self.consume_fuel()?;
        Ok(SizeHint {
            lower: size,
            upper: Upper::Bounded(size),
        })
    }

    pub fn at_least(&mut self, size: usize) -> Result<SizeHint> {
        self.consume_fuel()?;
        Ok(SizeHint {
            lower: size,
            upper: Upper::Unbounded,
        })
    }
    
    // Other `SizeHint` constructors and combinators...
}

// ...

pub trait Arbitrary {
    fn size_hint(context: &mut SizeHintContext) -> Result<SizeHint> {
        SizeHint {
            lower: 0,
            upper: Upper::Unbounded,
        }
    }
}

It might even make sense to actually put the SizeHint inside the SizeHintContext and just have methods that mutate the size hint being built, rather than returning new size hints that we then need to combine. This would further disincentivize constructing a new SizeHintBuilder in the middle of a size_hint method implementation, which would defeat the purpose of having fuel in the first place. Doing it this way would also allow extracting the half-built SizeHint and getting at least an idea of the lower bound, even when we run out of fuel. (Maybe the consume_fuel method should set upper to unbounded if it returns an error?)

This approach effectively turns the context into a builder:

pub struct SizeHintBuilder {
    fuel: usize,
    lower: usize,
    upper: Upper,
}

impl SizeHintBuilder {
    pub fn new(fuel: usize) -> Self {
        Self {
            fuel,
            lower: 0,
            upper: Upper::Bounded(0),
        }
    }
    
    fn consume_fuel(&mut self) -> Result<()> {
        if *self.fuel == 0 {
            return Err(crate::Error::SizeHintOutOfFuel);
        }
        *self.fuel -= 1;
        Ok(()) 
    }
    
    pub fn finish(self) -> SizeHint {
        SizeHint {
            lower: self.lower,
            upper: self.upper,
        }
    }
    
    pub fn exactly(&mut self, size: usize) -> Result<()> {
        self.consume_fuel()?;
        self.lower = self.lower.checked_add(size).ok_or(())?
        self.upper += Upper::Bounded(size);
        Ok(())
    }

    pub fn at_least(&mut self, size: usize) -> Result<()> {
        self.consume_fuel()?;
        self.lower = self.lower.checked_add(size).ok_or(())?
        self.upper += Upper::Unbounded;
        Ok(())
    }
    
    pub fn unknown(&mut self) -> Result<()> {
        self.consume_fuel()?;
        self.upper += Upper::Unknown;
        Ok(())
    }
    
    // Other `SizeHint` constructors and combinators...
}

pub trait Arbitrary {
    fn size_hint(builder: &mut SizeHintBuilder) -> Result<()> {
        builder.unknown()
    }
}

What do other arbitrary maintainers think of the above approaches? I'm kind of partial to the builder-style API myself.

cc @Manishearth @nagisa

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the and_all and or_all combinators get a little trickier with the builder style API rather than the context-style API. We would either need to create sub-builders, which makes it easy to defeat the purpose of a single shared fuel value, or we would have to do some nasty generics magic which makes the API impenetrable. Or some other funky API to thread the fuel through.

This is making me lean back towards the context-style approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The builder seems fine, I'm worried about too much additional computation in what is essentially pre-fuzz infrastructure

Copy link
Contributor Author

@kpreid kpreid Apr 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the and_all and or_all combinators get a little trickier with the builder style API

Even or() is difficult to do that way because you need to take the minimum of multiple independent sums.

That said, I think it would make sense to track fuel in, not a “builder” (makes one output value) but a “factory” (makes many); methods of the factory are constructors of SizeHint values (or error). I will see about trying that out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Manishearth

I'm worried about too much additional computation in what is essentially pre-fuzz infrastructure

Nothing here should be any more computationally complex than what we have today, it is all just hiding implementation details in non-public struct fields for the most part.

Copy link
Member

@fitzgen fitzgen Apr 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kpreid

That said, I think it would make sense to track fuel in, not a “builder” (makes one output value) but a “factory” (makes many); methods of the factory are constructors of SizeHint values (or error).

This sounds like the context-style API above, no? If not, then I am curious what you are imagining that is different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, you are correct, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting back to working on this (sorry for the long delay). I think I have discovered an elegant further improvement: If the Context interposes and counts fuel in every recursive call (that is, each size_hint() implementation calls context.get::<FieldType>() to recurse, not FieldType::size_hint(context)), then there is no need to use Result and the ? operator to handle short-circuiting, and if we do it this way

  • size_hint()s which hit the computation limit can have a better chance of returning a lower bound with partial information, rather than only an error
  • If we don't also count fuel in constructors, then “leaf” size_hint() methods don't use the context at all and will always return the correct answer even if out of fuel, aiding the previous point
  • no top-level callers ever have to decide how to handle Err and perhaps do it incorrectly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Please take a look and tell me what you think.

src/size_hint.rs Outdated
Comment on lines 26 to 35
/// The size is known to be unbounded (or overflow [`usize`]).
Unbounded,
/// The size is unknown because a more specific size hint was not given.
/// This is functionally identical to `Unbounded` but prints differently.
Unknown,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that there is a meaningful difference between these variants. Just having a different Debug print doesn't really seem worth it to me.

Copy link
Contributor Author

@kpreid kpreid Apr 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I included this feature precisely because, if it had existed, it would have helped me troubleshoot a missing size hint method.

…size>)>`.

Introducing the `SizeHint` type provides these advantages:

* Easier to construct by composition, using `+` and `|` or methods.
* More self-explanatory.
* Distinguishes between “unknown size” and “known to be unbounded”
  to aid debugging.

The `Context` instead of `depth` provides these advantages:

* Fully bounded computation with no dependence on the recursion
  breadth.
* Elimination of `Result`, making `size_hint()` implementors simpler
  and allowing reporting of partially-known lower bounds rather than
  discarding all information on error.

Since there are no longer any `Result`s, `try_size_hint()` is of course
gone, making it easier to correctly implement the trait. But that could
also have been done by adding the `Result` return to `size_hint()` if we
were keeping it.

Also, while I was touching the macro code, I made all the crate paths
absolute (`::arbitrary`), as they should be for the maximum compatibility.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider a custom type for easier size hints
3 participants