Add guaranteed-reproducible PRNGs to rand? #1588

Open
dhardy opened this issue Feb 14, 2025 · 13 comments
Labels
E-question Participation: opinions wanted

Comments

@dhardy
Member

dhardy commented Feb 14, 2025

This question came up recently regarding a possible adoption to libstd (read from here), but I'm not sure we ever really asked the question of rand.

StdRng and SmallRng are deterministic but not reproducible (and in the latter case also not portable). Should we add a PRNG with guaranteed reproducibility as a new item under rand::rngs?

We already have five PRNGs available in rand if you count the ChaCha variants:

  • ChaCha8Rng, ChaCha12Rng, ChaCha20Rng
  • Xoshiro128PlusPlus, Xoshiro256PlusPlus

I'm not sure if we should ever add a guaranteed-reproducible ChaCha PRNG in rand since if we ever wanted to change the generator behind ThreadRng it would add dependencies. Given how long we've been using ChaCha in this role this may be less of an issue now.

The Xoshiro variants are more acceptable (if only because they require a lot less code; both are directly implemented in rand), though selecting one of these is likely sufficient, e.g. rand::rngs::Xoshiro256PlusPlus.
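
To illustrate how little code a fixed Xoshiro generator needs, here is a minimal standalone sketch of the xoshiro256++ update step. It is not rand's implementation: the type name mirrors the one discussed above, but `from_state` and the SplitMix64-based `seed_from_u64` are assumptions made for this example, and no claim is made that the seeded output matches any published crate.

```rust
/// Minimal xoshiro256++ sketch (illustrative only, not rand's code).
pub struct Xoshiro256PlusPlus {
    s: [u64; 4],
}

impl Xoshiro256PlusPlus {
    /// Build a generator from an explicit 256-bit state (must not be all zero).
    pub fn from_state(s: [u64; 4]) -> Self {
        assert!(s != [0; 4], "xoshiro state must not be all zero");
        Self { s }
    }

    /// Expand a 64-bit seed into the state with SplitMix64.
    /// The seeding scheme is an assumption of this sketch.
    pub fn seed_from_u64(mut seed: u64) -> Self {
        let mut s = [0u64; 4];
        for word in s.iter_mut() {
            seed = seed.wrapping_add(0x9E37_79B9_7F4A_7C15);
            let mut z = seed;
            z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
            z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
            *word = z ^ (z >> 31);
        }
        Self { s }
    }

    /// One xoshiro256++ step: scramble two state words, then update the state.
    pub fn next_u64(&mut self) -> u64 {
        let s = &mut self.s;
        let result = s[0].wrapping_add(s[3]).rotate_left(23).wrapping_add(s[0]);
        let t = s[1] << 17;
        s[2] ^= s[0];
        s[3] ^= s[1];
        s[1] ^= s[2];
        s[0] ^= s[3];
        s[2] ^= t;
        s[3] = s[3].rotate_left(45);
        result
    }
}

fn main() {
    // Because the algorithm is fixed, the same state always yields the same stream.
    let mut rng = Xoshiro256PlusPlus::from_state([1, 2, 3, 4]);
    assert_eq!(rng.next_u64(), 41943041);
    assert_eq!(rng.next_u64(), 58720359);

    // Two generators built from the same seed stay in lock-step.
    let mut a = Xoshiro256PlusPlus::seed_from_u64(42);
    let mut b = Xoshiro256PlusPlus::seed_from_u64(42);
    for _ in 0..4 {
        assert_eq!(a.next_u64(), b.next_u64());
    }
    println!("streams match");
}
```

A reproducibility guarantee for such a type would amount to promising that neither `next_u64` nor the seeding routine ever changes.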

CC @hanna-kruppe @joshtriplett in case of interest

@dhardy dhardy added the E-question Participation: opinions wanted label Feb 14, 2025
@benjamin-lieser
Member

If you want guaranteed reproducibility can't you just use the named PRNG? Maybe I am misunderstanding the question.

@dhardy
Member Author

dhardy commented Feb 14, 2025

Yes, except that none of those named PRNGs are currently publicly exported from rand.

Motivation is partially convenience and partially to make it more obvious how users may set up a reproducible PRNG (currently another crate must be added as a dependency).

@benjamin-lieser
Member

Ah true, I remember having to do this.

I would say exporting Xoshiro256PlusPlus would be a good idea, also under this name.

@newpavlov
Member

As argued in the linked issue, I don't think we need it and we should recommend use of a concrete PRNG crate (we could reference them in StdRng/SmallRng docs).

@hanna-kruppe

When I'm sufficiently worried about long-term reproducibility that I'd opt for a generator with such a guarantee, I generally wouldn't be satisfied if the guarantee only covered RngCore methods or something like that. What I care about is that my program overall remains reproducible, which means e.g. any sampling Rng methods my program uses (and the trait impls backing them) can't have value-breaking changes either. Even if rand was willing to guarantee that for a larger subset of its APIs, it's very difficult for me as a user to ensure that I'm only using the guaranteed-stable subset. Depending directly on a specific rand_foo PRNG crate only solves this if you can make do with only rand_core::RngCore and avoid depending on rand entirely, but that's rare in my experience.

So I think rand, as a general-purpose crate that has good reasons to make value-breaking changes from time to time, is not in a good position to try and address the need for reproducibility. Offering it only for the simple cases, but not for the other APIs that come along for the ride, will result in just as many people being mistaken about whether their rand-using program will be reproducible with future releases of rand. That's not helping anyone.
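
As a concrete illustration of why a stable generator alone is not enough, the sketch below maps the same raw 64-bit outputs to the range 0..n in two common ways. Both function names are hypothetical and neither is claimed to be what rand does in any release; the point is only that swapping one mapping for the other is a value-breaking change even when the underlying PRNG stream is untouched.

```rust
/// Map a raw output to 0..n by modulo reduction (n must be non-zero).
fn range_by_modulo(raw: u64, n: u64) -> u64 {
    raw % n
}

/// Map a raw output to 0..n by a widening multiply, keeping the high 64 bits.
fn range_by_multiply(raw: u64, n: u64) -> u64 {
    ((raw as u128 * n as u128) >> 64) as u64
}

fn main() {
    // Pretend these came from a PRNG whose output stream is guaranteed stable.
    let raw_outputs: [u64; 3] = [41943041, 58720359, 0x8000_0000_0000_0000];
    for &raw in raw_outputs.iter() {
        println!(
            "raw = {:>20}  modulo -> {}  multiply -> {}",
            raw,
            range_by_modulo(raw, 6),
            range_by_multiply(raw, 6)
        );
    }
}
```

For example, the raw value 41943041 maps to 5 under modulo reduction but to 0 under the widening multiply, so a program simulating a die roll would change its results if the library switched between them.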

@dhardy
Member Author

dhardy commented Feb 14, 2025

> So I think rand, as a general-purpose crate that has good reasons to make value-breaking changes from time to time, is not in a good position to try and address the need for reproducibility.

The same is true of any library offering a wide variety of random algorithms? The solution here is simple enough: use a fixed version of rand. We should not make value-breaking changes in patch releases (outside of security concerns, though this has never yet been an issue).

@hanna-kruppe

Using a fixed version is not great because it means I'll effectively be on my own with maintaining that code once upstream (quite reasonably) stops doing so. Whether value-breaking changes are made in patch releases or only in minor releases is immaterial -- eventually I'll have to choose between eating a value-breaking change or sticking with an unmaintained version of the library. This won't be an issue if my code stops being actively developed before upstream moves on, but in many cases I don't want to make assumptions about that. And if I end up having to vendor the library, I'd always prefer one that is as small and simple as possible for my specific use case over a library that does basically everything.

The only way around this is if a library is aligned with my priorities w.r.t. reproducibility: making a credible promise to avoid value-breaking changes, by only adding new APIs without changing the old ones (possibly deprecating them but ideally without the implication that they'll be removed eventually). Of course, that's undesirable for everyone who doesn't need long-term reproducibility and wants to get improvements automatically. But it's not inherently impossible for a maintainer to do that, if that's their priority.

@dhardy
Member Author

dhardy commented Feb 14, 2025

> eventually I'll have to choose between eating a value-breaking change or sticking with an unmaintained version of the library. [...] And if I end up having to vendor the library, I'd always prefer one that is as small and simple as possible for my specific use case over a library that does basically everything.

If you're talking about rand (not rand_distr), then unless you care about nightly features, there isn't much to maintain — about the only thing in rand v0.8 which "broke" is that gen will soon be a reserved keyword. As for bug fixes, v0.9 includes a couple of portability fixes and one single bug fix to IteratorRandom::choose_multiple_weighted for extremely small seeds (a value-breaking change, thus this could not be back-ported).

(I'm assuming you're not talking about maintenance of security — but even here nothing of note happened in the last four years, and if it did I expect that we would release a patch.)

So I don't buy your argument that rand is not a good choice if you care about long-term reproducibility.

@hanna-kruppe

I don't know how easy or hard it would be for me to take over bugfix-only maintenance of a specific rand version. Since I'm not familiar with the code base or its history, determining that for myself would take non-trivial effort. I appreciate you sharing information about this now, but imagine if we weren't having this conversation and I'd just be looking at docs.rs/rand to make my decision. That part is just less daunting with a library that's less than, say, 1K lines of code.

In any case, if I'm happy to use a fixed version of a library then it doesn't matter if the library offers reproducibility guarantees across its releases (of course, consistent results across platforms still matter). If I'll be using rand 0.8.5 forever, then I'm not affected by value-breaking changes in later releases. Conversely, if I want to avoid pinning a specific version and instead keep updating rand, then I need reproducibility guarantees for all APIs that I'm using or might use by accident in the future, not just for the RngCore impls. That's what my first comment was about: to enable meaningful long-term reproducibility without version pinning, rand would have to make a much stronger commitment than just keeping some specific PRNG impls intact. I don't think rand can reasonably do that without unduly compromising on competing priorities.

@cmcqueen

cmcqueen commented Apr 1, 2025

I have made a simplerandom crate with the goal of reproducible RNGs that are cross-platform and even cross-language (so far I have made C, Python and Rust versions).

@hanna-kruppe

hanna-kruppe commented Apr 1, 2025

I’m a bit puzzled by some choices made in your library. Most importantly, while the raw next_{u32,u64} outputs may be the same across languages because the same basic algorithm is implemented in each language, any other data types and distributions (even very commonly needed ones like “integer in range” or “floating point number”) appear to be missing or are implemented differently in different languages. In particular, the Rust version just implemented RngCore (and used rand_core helpers for that), so users of your crate (1) can’t meaningfully reproduce results obtained in Rust with the Python or C versions, and (2) even if they stick to Rust they’ll be affected by value-breaking changes in rand unless the rand version stays pinned. So I don’t see how your library helps with what I consider the difficult part of RNG reproducibility, as laid out in previous comments.
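
To make the floating-point case concrete, here is a small sketch of two plausible ways to turn the same raw 64-bit output into an f64. The function names are hypothetical and neither is claimed to be the method used by simplerandom or by any particular rand release. Identical `next_u64` streams give different floats depending on which conversion a given language binding happens to use.

```rust
/// Use the top 53 bits scaled by 2^-53; the result is always in [0, 1).
fn f64_from_high_bits(raw: u64) -> f64 {
    (raw >> 11) as f64 * (1.0 / (1u64 << 53) as f64)
}

/// Divide the full 64-bit value by 2^64. The u64 -> f64 cast rounds,
/// so very large inputs can produce exactly 1.0.
fn f64_by_division(raw: u64) -> f64 {
    raw as f64 / 18446744073709551616.0 // 2^64
}

fn main() {
    let raw_outputs: [u64; 3] = [2047, 0x8000_0000_0000_0000, u64::MAX];
    for &raw in raw_outputs.iter() {
        println!(
            "raw = {:>20}  high bits -> {:e}  division -> {:e}",
            raw,
            f64_from_high_bits(raw),
            f64_by_division(raw)
        );
    }
}
```

The two conversions agree on some inputs (both map 2^63 to 0.5) and disagree on others, including at the upper end of the range, where only the first stays strictly below 1.0.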

(The second thing that makes it unlikely for me to use your crate is the large selection of algorithms, none of which seem particularly good by today’s standards. I understand that you started this project many years ago, but this is a perfect example of the downsides of “long term reproducibility” - eventually it becomes “only used for legacy reasons”)

@dhardy
Member Author

dhardy commented Apr 1, 2025

rust-random/book#82 is possibly of interest to people here.

It technically reduces the guarantees, allowing SmallRng and StdRng algorithms to be changed in any patch release post-1.0. In theory, this reduces the number of reasons we might have to make a new minor release, thereby hopefully improving the longevity of 1.0 (and any potential successor).

It also includes a note about supporting old releases with new patches, including the possibility of back-porting new (compatible) features.

This is about the best compromise I can see between stability guarantees and avoiding stagnation.

@cmcqueen

cmcqueen commented Apr 1, 2025

@hanna-kruppe yes you're right, I have focused on reproducibility for the core integer generator, but not the floating-point API or other derivative APIs.
