-
-
Notifications
You must be signed in to change notification settings - Fork 435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move PRNG algorithms out of Rand #431
Comments
Why not have one crate per RNG?
…On Sun, May 6, 2018, 15:40 Paul Dicker ***@***.***> wrote:
As decided in dhardy#58 <dhardy#58>.
The idea is to only keep the StdRng and SmallRng wrappers in Rand, and to
not expose PRNG algorithms directly.
The field of PRNG algorithms is not static, it seems like a good idea to
cut Rand partly loose from the developements to improve our
backward-compatability and usability story. Advantages (from the original
issue)
- rand provide easy to reason generator(s) to satisfy the bulk of the
users
- rand provides low-friction upgrades and keeps the optimal-ish
algorithms readily available
- rand will not accumulate and/or deprecate specific algorithms
throughout time
- Specific rand Rng names won't bleed out to the world, like in forum
posts, stackoverflow, etc..
Instead we want to provide an official companion crate, rand_rngs, with a
couple of blessed implementations.
Whether Rand should depend on rand_rngs or copy the two algorithms used
by StdRng and SmallRng is not fully decided. It might reduce code size if
a crate depends on both StdRng and its algorithm (currently Hc128Rng)
directly, but that seems rare and not worth much. It has the disadvantage
that both implementations should remain in sync. Maybe the CI can check
this?
If Rand depends on rand_rngs, it would restrict rand_rngs to only depend
on rand_core. But that is what rand_core is for. I am for having Rand
depend on rand_rngs.
rand_rngs
rand_rngs should provide a good selection of PRNGs.
Including multiple PRNGs in one crate has the advantage of having a single
place to provide guidance, as it helps comparing different algorithms with
each other. Also it gives one place to offer consistent benchmarks, to run
PRNG test suites, to keep up a common quality level, and possibly to
develop functionality that is useful for more than one PRNG (e.g. jumping).
Which PRNGs to include has been the topic of endless discussions. A few
are basically decided. I fully expect the number of PRNGs to grow over
time. But we also should be careful not to expose too many, as there are
hundreds. Every PRNG should have one thing that gives it a clear advantage
over others, otherwise it is not worth the inclusion.
Normal PRNGs
dhardy#60 <dhardy#60> explored normal
PRNGs. The following 5 I feel comfortable about for an initial version of
rand_rngs:
- PCG-XSH-RR 64/32 (LCG)
- PCG-XSL-RR 128/64 (MCG)
- SFC (32-bit variant)
- SFC (64-bit variant)
- Xoroshiro128+
The two PCG variants offer good quality and reasonable performance. SFC
provides high performance and a chaotic PRNG (not a fixed period).
Xoroshiro128+ has acceptable quality but high performance.
An PRNG put together by me, XoroshiroMT, is also good quality and sits
between PCG and Xoroshiro128+ qua performance. But now that there is just a
new Xoshiro PRNG, it may be worth evaluating that one first, as it is said
to also be good quality.
CSPRNGs
For CSPRNGs we currently have two good implementations in Rand
- ChaCha20
- HC-128
ChaCha20 offers reasonably good performance and uses little memory, while
HC-128 brings high performance at the cost of using much memory.
I would also like to see some implementation of AES-CTR DRBG eventually,
as it is commonly used, and has hardware support on modern desktop
processors.
Other / deprecated PRNGs
We currently have the ISAAC, ISAAC-64 and Xorshift128/32 PRNGs in rand.
They have no real advantages over the PRNGs metioned before. It is better
to use HC-128 instead of ISAAC, and Xoroshiro128+ or PCG instead of
Xorshift. We can include them in something like a deprecated module, but
I propose to move them to stand-alone crates outside the Rand repository.
Steps to take
- move the prngs module to a seperate rand_rngs crate in the Rand
repository, similar to rand_core.
- move generators benchmarks over.
- lift PRNG implementations from
https://github.com/pitdicker/small-rngs (I have updated it to master a
while ago, but still have to push the changes)
- lift Xoroshiro128+ from https://github.com/vks/xoroshiro/
Does this sound about right?
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#431>, or mute the thread
<https://github.com/notifications/unsubscribe-auth/AACCtJsl20y6PoCkWCeYCSRhM0dRq4Pvks5tvv0ygaJpZM4T0Cak>
.
|
See the first post and the linked issue. |
I actually see no reason that we shouldn't have one crate per RNG (or perhaps family of RNGs), but I think it's worth having these in the Rand repo and the crates owned by the Rand team (currently me and the libs team). Or we could try organising more and have a |
Both the new xoshiro and xoroshiro** variants I think are worth consideration. Vigna claims xoroshiro256** has a higher degree polynomial than PCG 128 variants https://www.reddit.com/r/programming/comments/8gx2d3/new_line_of_fast_prngs_released_by_the_author_of/dygkkng/, so xoshiro variants are probably even stronger. Edit: poor wording |
Is the separate repository is merely a transitional detail? It improves code size if any code that gets used actually gets exposed somewhere, even if only in another crate. It would improve tests if all this lives in the rand repository, right? |
@TheIronBorn this issue isn't about what algorithms we include, it's just about organisation and presentation. |
@burdges I'm not sure what you mean about transitional; no. The easiest option is I think just to have sub-projects in this repo (like I'm not even sure if it does help with testing having everything in the same repo; in one way it makes it worse since CI must run all unit-tests on any change (although with API-breaking changes this may be necessary). Once we have stable core APIs it may make more sense to use separate repos since most changes (docs, new features, tweaks to other features) will not affect other crates. I.e. once |
In this issue and the previous one I have given arguments that it would really benefit users to have one crate, as it helps comparing and contrasting them. And real but smaller benefits for us. Are those reasons wrong? Maybe it is good to also collect the reasons multiple crates can be a good idea. Two related thoughts:
The unit tests we have now for PRNGs run in a fraction of a second, easily a 100 times less than the set-up time for Travis. |
There is precedence to have one crate per algorithm in [RustCrypto](
https://github.com/RustCrypto/hashes?files=1). You can still have a crate
reexporting them all, but I'm not sure how useful that is. It might make
sense to have some comparison in Rand's documentation, where we advertise
the recommended RNGs and their crates.
…On Mon, May 7, 2018, 08:32 Paul Dicker ***@***.***> wrote:
In this issue and the previous one I have given arguments that it would
really benefit users to have one crate, as it helps comparing and
contrasting them. And real but smaller benefits for us. Are those reasons
wrong?
Maybe it is good to also collect the reasons multiple crates can be a good
idea.
Two related thoughts:
- most non-cryptographic PRNGs only need 5-10 lines for the algorithm.
About 30 additional lines to implement the necessary traits. Then a handful
of tests, documentation, license header, and benchmarks. Seems a bit small
for a crate to me, but that is not unique.
- how do we show separate crates are similar enough to compare
directly (benchmarks, quality, support)? Anyone is free to create a
rand_* crate.
CI must run all unit-tests on any change.
The unit tests we have now for PRNGs run in a fraction of a second, easily
a 100 times less than the set-up time for Travis.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#431 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AACCtKxNDruFXra7bvopRZ0gZTadbl2_ks5tv-qSgaJpZM4T0Cak>
.
|
So sounds like
How many crates do we want? Only 1-3 (e.g. crypto, non-crypto and deprecated)? One crate per family of RNGs? One crate per RNG?
@newpavlov do you have anything to say about the many-small-crates design of RustCrypto's hashes (and other projects)? |
In my opinion small crates work pretty well (but not too small, i.e. crates for family of algorithms, not for each variation), yes there is some headache with coordinated upgrades to the new trait version, but if
RustCrpyto partially solves this problem by having feature gated I would like suggest to move
Con is of course is the split of RNG crates between two organizations. |
Actually we already stopped using |
It'd be nice to save code size by using stream cipher code for CSPRNGs of course, but not that pressing. At first blush, I kinda think moving CSPRNG crates into another project, especially one less blessed by Mozilla, sounds premature and distracting. At minimum, all the BlockRng business will make reading the code annoying. |
With our current PRNGs and currently-proposed additions, a crate-per-family approach looks something like:
Lets not worry about the generators for now (there will probably be more); this is several crates already, and some of them will be very small (pcg unless we add more algorithms, sfc, xorshift unless combined with xoroshiro; also hc and chacha are only 500 lines). There is nothing inherently wrong with this, but it is a significant amount extra boilerplate just to save pulling unused PRNGs into a project (the entire Are there any alternatives worth discussing? |
I would probably use shorter names and drop the |
Just to be sure, the only real argument for having multiple crates is that at some point is becomes 'cleaner' for us to deprecate an RNG? And it adds the problem of boilerplate, making the PRNGs harder to compare, more difficult to guide users, and anyone is free to make a crate with a seemingly official name. |
I am afraid this would only cause more work, not less. While algorithms may be (mostly) the same, RustCrypto and Rand can have different goals. What happens when one decides some algorithm is fit for inclusion, but the other does not. Or that the trade-off for performance vs security should be different? We would have to have a very clear sense of direction and and idea of what both projects see in such a crate. Now I am not saying we can't work together, and work together well. But at this point I think it is better for us if we keep this only in mind as an option, and rethink it next year or something like that. Rand has changed a lot in the past couple of months, and it seems like a good idea to me to remain flexible and get our story straight first. Especially the amount of security features to provide is something we have quite a few open issues for, and seems a bit fuzzy. |
if it's officially published algorithm (i.e. not home-brew crypto) designed as CSRNG I will be happy to see it in the repo, so RustCrypto/CSRNGs is intended to be more inclusive than rand.
Ehm, for |
😄 Maybe I should write a bit clearer about which trade-offs I had in mind. Nothing terrible I hope. As an example, the Another one I have in mind is fork protection. I hope we can add that to the |
Zeroing ChaChaRng output does nothing. If you can hammer the offset then you can hammer the block number, well they even live in the same struct. |
What's the alternative — maintain a crate that may grow to a hundred or more PRNGs, or be extremely selective about which we include? Perhaps we won't include enough PRNGs for size to be an issue, but in that case what's the motivation for using an external crate in the first place? The only other reason I can see for separate crates is to allow an app to use |
PractRand contains 200+ RNGs, and that is already a selection. In my opinion having that many is not useful. Even the opposite, too many choices distract and complicate, so having many choice would be bad. What I had in mind (with the first post, and the relevant issues in your repro), was to have a small, well chosen selection. Not all that different from how we work now. Users that want more than My idea was not to have some super-crate or organization to implement every (possibly even obsolete) PRNG design under the sun. But to have a good resource for users. And other crates can still have a role in providing more variants or features, like the current PCG crate. |
@pitdicker What you are suggesting would be possible by reexporting the smaller crates in a "curation" crate. I would prefer this over duplicating code. Note that a small selection might become not so small as algorithms become obsolete and get replaced but have to be kept for backwards compability. |
In this case we have three options:
If we keep the number of choices low (e.g. 20 algorithms) then the first option is a good choice, though as @vks says this may grow over time (removing algorithms is not a good idea except perhaps if a CSPRNG is found to be insecure, because it causes problems for any libraries needing reproducibility with old algorithms and requires extra maintenance). Also, no one has answered my other question yet: if we don't have per-family crates, is there any point at all moving the algorithms out of the main Rand crate? |
Well, nothing has happened in the last month, yet I feel this issue is quite important. I suggest we move forward as follows:
We should move forward with some small(er) crates implementing families of PRNGs. On a case-by-case basis we decide whether to host these externally or within the Rand repository (which may or may not be considered to imply a higher level of review). (Alternatively we could start a "Rust Rand" organisation on GitHub; not sure that it's worth it though.) The following PRNG crates already exist:
Related, external RNGs:
Quality and maintenance:
Therefore I think we should slightly prefer internal implementations of PRNGs in our documentation and be open to pull-requests to add PRNGs here, but not require it. Promoting RNGs: I'm not convinced at this time it's worth having a "curation crate", though we could do so. A better option might be to add a document to Rand reviewing various RNG crates (there are a few other things which could go in a Rand "book"). Organisation: Edit: we should probably just use the crate names without subdirectory.
"Crypto" RNGsWe currently have only two implementations, Since the ISAAC generators are no longer used by NamingThis would be a good time to rename PRNGs, if appropriate. Currently all internal PRNGs have a Crate names: I suggest that all crates created as part of the Rand project use the prefix Does this sound like a good plan? |
I agree with your plan, it is almost exactly what I would suggest as well.
In addition, I would recommend to use specific RNGs and their crates for reproducibility. At the same time, we could explicitly weaken the reproducibility guarantees for About the |
The reproducibility limitations already apply to I also thought the use of generics in that crate is a bit over the top (the same criticisms apply to @imneme's libraries — most users want one relatively simple PRNG; not metacode for building hundreds of variants of generators). So I'd rather start from @pitdicker's code. |
I added a tracker at the top of this issue. @vks do you want to keep your xoshiro crate external or move it into this repo? I really don't care either way; externally you get more freedom but also the expectation of maintenance 😉 Also, what should we do with On another point: do we want to continue with only one repo or have multiple? Already we have some confusion of branch & tag names with Separate repos would give us more git and CI overhead but smaller history and less CI work to do on each change. |
@dhardy In principle, I'm fine with either way. It depends on what we decide about the fate of PRNGs in Rand: If Rand somehow depends on a PRNG, I think the crate should be in the Rand repository. If Rand just recommends the crate, either way is probably fine. Or should we be worried that a crate we recommend gets compromised? After all, Rand is the third-most popular crate. However, I anticipate the recommended crates will be used by few users, making them a smaller target. If we decide to give the recommended crates a About splitting repos: I'm not sure. It worked well for |
For non-crypto PRNGs we only need one in Rand ( I think we should use the True, at the moment most issues are strongly linked to the main Rand project even if about some other part such as a |
Would a pull request for a Xorshift1024 crate be accepted at this stage? I'm wanting to have xorshift1024* for a monte carlo simulation I'm writing, and it looks like moving forward having it hosted in this repository would be ideal (and match with the plans outlined here). I'd open a separate issue, but it seems slightly subsumed by this one. |
You mean this
It's also possible that other generators with similarly large periods might be added. |
Note: |
Well, I'll try again: @astocko are you interested in maintaining your Xorshift crate? But given the lack of commits in the last year it seems unlikely. The code is CC0 so it may be sensible at this point to pull out what we want and make a new @vks what do you think should be the divide between Xorshift, Xoroshiro and Xoshiro generators, since you already have the latter two in your crate? |
I see you mentioned me here as an author of randomorg crate. I am the one and I just wanted to remind you that this crate unfortunately is not going to have |
@dhardy My crates are based on the recommendations by Vigna. The xoroshiro
crate includes xorshift1024*phi, because that was recommended together with
the xoroshiro generators. The xoshiro crate implements the new generation
of Vigna's generators, which includes some xoroshiro variants incompatible
to the xoroshiro crate (!). So yes, the crate names are a bit misleading,
but I think the current separation makes the most sense. A more accurate
name would be rngs_vigna2018, but I'm not sure it would be a better name.
I should probably clarify that in the docs.
…On Sat, Jul 14, 2018, 09:52 Diggory Hardy ***@***.***> wrote:
Well, I'll try again: @astocko <https://github.com/astocko> are you
interested in maintaining your Xorshift crate?
But given the lack of commits in the last year it seems unlikely. The code
is CC0 so it may be sensible at this point to pull out what we want and
make a new rand_xorshift crate.
@vks <https://github.com/vks> what do you think should be the divide
between Xorshift, Xoroshiro and Xoshiro generators, since you already have
the latter two in your crate?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#431 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AACCtBdWGQeAOE_GE66fWNGjJW8V1cJjks5uGaNFgaJpZM4T0Cak>
.
|
We could just throw it all under |
Well, stability is why I decided to create a new crate, so the old one can still be used for reproducibility. One of the advantages of moving the algorithms out of Rand it that we can just add and drop crates from the recommendations without breaking anyone's code. |
Hey, author of the Pcg32 crate here. The other PcgRand crate seems much more full-fledged than mine, but I'd be happy to continue maintaining Pcg32. I personally would advocate for some sort of vetting/benchmarking with the rand crates. Even something like dieharder scores would be nice to see. |
We currently have quality comparisons here: https://docs.rs/rand/0.5.4/rand/prng/index.html#basic-pseudo-random-number-generators-prngs I don't know if we've thought about listing actual test data though (probably TestU01 or PractRand if anything) |
@pitdicker I think it's time to turn your As you say, it is more useful having a small crate with a few high-quality PRNGs than to have a huge collection, so how about we create
Potentially we can add a separate |
I've just gone ahead and implements a xoroshiro256+ in my own code, which ended up seeming simplest, after I decided that compatibility with my old C code didn't matter. Maybe I'll use a "standard" xoroshiro implementation at some later point, but for now it seemed easiest to just translate the C code myself. That way I didn't have to mess with serde feature flags, and I could easily get proper seeding with a u64 as recommended by Vigna. |
If it helps you decide which PRNGs to include where, I have written an article on random number generators and categorized "good" RNGs into two categories. Essentially:
Any RNG other than a cryptographic or statistical RNG, as defined in my article, should be considered low-quality, I think. |
I updated my pcg crate to implement the |
Edit by dhardy: adding a tracker for the planned tasks:
SmallRng
algorithm (see Requirements for a small fast RNG dhardy/rand#60 and Shootout: small, fast PRNGs dhardy/rand#52)XorshiftRng
(required: that the RNG is available in another crate)rand_rngs
,rand_pcg
or something)Also: we should consider whether we want to rename PRNGs when moving or adding them.
As decided in dhardy#58.
The idea is to only keep the
StdRng
andSmallRng
wrappers in Rand, and to not expose PRNG algorithms directly.The field of PRNG algorithms is not static, it seems like a good idea to cut Rand partly loose from the developements to improve our backward-compatability and usability story. Advantages (from the original issue)
Instead we want to provide an official companion crate,
rand_rngs
, with a couple of blessed implementations.Whether Rand should depend on
rand_rngs
or copy the two algorithms used byStdRng
andSmallRng
is not fully decided. It might reduce code size if a crate depends on bothStdRng
and its algorithm (currentlyHc128Rng
) directly, but that seems rare and not worth much. It has the disadvantage that both implementations should remain in sync. Maybe the CI can check this?If Rand depends on
rand_rngs
, it would restrictrand_rngs
to only depend onrand_core
. But that is whatrand_core
is for. I am for having Rand depend onrand_rngs
.rand_rngs
rand_rngs
should provide a good selection of PRNGs.Including multiple PRNGs in one crate has the advantage of having a single place to provide guidance, as it helps comparing different algorithms with each other. Also it gives one place to offer consistent benchmarks, to run PRNG test suites, to keep up a common quality level, and possibly to develop functionality that is useful for more than one PRNG (e.g. jumping).
Which PRNGs to include has been the topic of endless discussions. A few are basically decided. I fully expect the number of PRNGs to grow over time. But we also should be careful not to expose too many, as there are hundreds. Every PRNG should have one thing that gives it a clear advantage over others, otherwise it is not worth the inclusion.
Normal PRNGs
dhardy#60 explored normal PRNGs. The following 5 I feel comfortable about for an initial version of
rand_rngs
:The two PCG variants offer good quality and reasonable performance. SFC provides high performance and a chaotic PRNG (not a fixed period). Xoroshiro128+ has acceptable quality but high performance.
An PRNG put together by me, XoroshiroMT, is also good quality and sits between PCG and Xoroshiro128+ qua performance. But now that there is just a new Xoshiro PRNG, it may be worth evaluating that one first, as it is said to also be good quality.
CSPRNGs
For CSPRNGs we currently have two good implementations in Rand
ChaCha20 offers reasonably good performance and uses little memory, while HC-128 brings high performance at the cost of using much memory.
I would also like to see some implementation of AES-CTR DRBG eventually, as it is commonly used, and has hardware support on modern desktop processors.
Other / deprecated PRNGs
We currently have the ISAAC, ISAAC-64 and Xorshift128/32 PRNGs in rand. They have no real advantages over the PRNGs metioned before. It is better to use HC-128 instead of ISAAC, and Xoroshiro128+ or PCG instead of Xorshift. We can include them in something like a
deprecated
module, but I propose to move them to stand-alone crates outside the Rand repository.Steps to take
prngs
module to a seperaterand_rngs
crate in the Rand repository, similar torand_core
.generators
benchmarks over.Does this sound about right?
The text was updated successfully, but these errors were encountered: