Seeding PRNGs (traits and bit size) #18
I'd say drop the I donno if seeding by the whole internal state always makes much sense either. Yes, you could set the internal state for some permutation like ChaCha or Keccak to produce random numbers, but the normal usage of ChaCha allocates 128 bits to fixed constants, at least a I dislike using slice methods since they make user pepper their code with asserts to emulate type checking. It's also bad to encourage
There are uses for object safety in some sort of
|
No, the use-case of seeding from I don't see much use in using I'm not sure I see any point in setting the entire internal state either. But using a fixed bit-size like You missed my point about being able to seed from whatever key size the user has directly (without e.g. copying into I don't see where asserts would need to be used. The API should allow usage with any number of bytes up to the maximum and just ignore extra bytes; the only things might be considered "wrong" are providing too many bytes (not really important) or not enough bytes (perhaps this makes it easier for users to provide insecure implementations, I don't know). |
Part of the issue with having an associated fixed-size Although slices aren't ideal, is there a better solution (besides allowing a whole family of seeding functions/traits using different sizes)? |
The seed size should be determined by the PRNG, normally being 256 bits, but perhaps the type expresses alignment issues. In many usages, like data structures, there should be a fatal error if the seed size does not match the seed provided, so this can be done either with type checking or with asserts. |
I disagree; users should be able to use a 128-bit seed with a generator like ChaCha if they wish, or a 256-bit seed, or even a 512-bit seed (if the generator can use it). Not to rule out other sizes (e.g. 192 or 384). |
I think a user who wants to use a 128-bit seed with ChaCha should explicitly pad the seed with zeros, or call some special method whose documentation indicates this behavior. It's harder to audit code when you need to think about the lengths of all the slices. I take it ISAAC+ has no standard that defines a seeding procedure? I suppose either there is a security argument that some smaller seed plus zeros is fine even from the beginning, or else you seed with the smaller value and run it for one round before using the output. You could provide a I'm not advocating for |
Point noted about not wanting to check slice lengths, but this is normal practice with large buffers. Anything else makes it hard to avoid extra copy operations, though maybe we shouldn't worry too much about that here. Uh, probably @burdges you didn't see that we abandoned ISAAC+ (see also here). No, I'm not aware of a standard seeding routine for ISAAC. The author recommends seeding with the output of a secure cipher (see bottom of page). We could make a seeding routine for some selected key size (maybe 256 bits), I suppose, and either pad with zeros (the current code does this) or use another PRNG (e.g. ChaCha) to expand the given 256 bits to the full state size. @pitdicker has just been looking at this. If we go that route, we either have to prevent direct provision of the full state size or add another function. And then we need to choose which seeding functions we have, e.g. from 64 and 256 bits, maybe also 128 and 512? |
If you have a long enough slice, then the I'd forgotten about ISAAC+. It sounds like ISAAC users need access to the full state, since the author does not express confidence in his seeding routine, but not necessarily through I think you want
where |
Interesting point. Of course, it would be possible to set the state of a PRNG like ISAAC using Implementing Of course, we could throw out I suppose we could fix that |
Can't we use something like this?

pub trait SeedableRng: Rng {
    const SeedSize: usize;
    fn from_seed<S: ToSeed<Self::SeedSize>>(seed: S) -> Self;
    fn from_rng(rng: &mut Rng) -> Self {
        let mut buf = [0u8; Self::SeedSize];
        rng.fill(&mut buf);
        Self::from_seed(buf)
    }
}

pub trait ToSeed<const SeedSize: usize> {
    fn to_seed(&self) -> [u8; SeedSize];
}

And implement ToSeed for the relevant seed types. |
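The ToSeed idea above can be sketched with const generics as they exist in stable Rust today. This is a toy illustration, not the rand API: ToyRng and the zero-padding impl for short slices are hypothetical, and a real design would have to decide whether silently padding short seeds is acceptable at all.

```rust
// Hypothetical sketch of the ToSeed proposal using const generics.
trait ToSeed<const N: usize> {
    fn to_seed(&self) -> [u8; N];
}

// One possible impl: copy a byte slice into a fixed-size seed,
// zero-padding if the slice is shorter than the seed.
impl<const N: usize> ToSeed<N> for [u8] {
    fn to_seed(&self) -> [u8; N] {
        let mut out = [0u8; N];
        let n = self.len().min(N);
        out[..n].copy_from_slice(&self[..n]);
        out
    }
}

// Stand-in generator with an 8-byte seed; not a real RNG.
struct ToyRng {
    state: [u8; 8],
}

impl ToyRng {
    fn from_seed<S: ToSeed<8> + ?Sized>(seed: &S) -> Self {
        ToyRng { state: seed.to_seed() }
    }
}

fn main() {
    // A 3-byte slice is zero-padded up to the 8-byte seed size.
    let rng = ToyRng::from_seed(&[1u8, 2, 3][..]);
    assert_eq!(rng.state, [1, 2, 3, 0, 0, 0, 0, 0]);
    println!("ok");
}
```

The fixed-size array return keeps the seed length checkable at compile time, which is the property being argued for against bare slices.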
I don't see much point in that @newpavlov. I didn't really get @burdges requirements with asserting length, but the only difference between this and passing a slice directly is that it is possible to make some assertions about length in The problem with this is that it assumes each RNG has an ideal length of seed which is the maximum possible length, and always pads smaller seeds with zero. This may not be the case; e.g. small seeds may be repeated and/or modified with some magic number, and there may not be an ideal seed size, as in ISAAC could use a whole 8192 seeding bits but 256 would often be more reasonable. |
The point is that:

pub trait SeedableRng: Rng {
    const OptimalSeedSize: usize;
    fn from_seed<S: ToSlice>(seed: S) -> Self;
    fn from_rng(rng: &mut Rng) -> Self {
        let mut buf = [0u8; Self::OptimalSeedSize];
        rng.fill(&mut buf);
        Self::from_seed(&buf)
    }
}

pub trait ToSlice {
    fn to_seed(&self) -> &[u8];
}

UPD: But a drawback of this approach will be the use of native endianness... |
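The native-endian drawback noted above can be avoided by converting seed words to bytes with an explicit byte order, so the same numeric seed reproduces the same stream on every platform. A minimal sketch (the function name is illustrative, not a proposed API):

```rust
// Convert u32 seed words to a byte seed with a fixed (little-endian)
// byte order, so results are portable across architectures.
fn seed_words_to_le_bytes(words: &[u32]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(words.len() * 4);
    for w in words {
        // to_le_bytes fixes the byte order regardless of the host's
        // native endianness.
        bytes.extend_from_slice(&w.to_le_bytes());
    }
    bytes
}

fn main() {
    assert_eq!(seed_words_to_le_bytes(&[0x0102_0304]), vec![0x04, 0x03, 0x02, 0x01]);
    println!("ok");
}
```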
Is there no iterator adapter in the itertools crate or whatever that agglomerates or splits numeric types according to a specific endianness? It'd be easy enough to write, but does need endianness. I think A priori, I'd think lengths should be avoided @newpavlov favoring actual types Afaik, we expect tricks like |
I personally like the Should we specify RNGs should implement
Even so, this approach is still not ideal. We could alternatively create some wrapper which can easily convert a slice or iterator into an |
After reading up on initializing RNGs and tinkering with ISAAC and Xorshift*/Xoroshiro over the last couple of weeks, this is how I understand the problem of initializing:

If there is a high-quality source of randomness available (

If the source of randomness / seed is not of high quality, the problems begin. There is no way to 'increase the randomness'. The entropy of the seed should be distributed evenly over the RNG's state. But you have to make sure that biases in the seed do not affect the quality of the RNG algorithm.

For simple RNGs the common advice in such a case is to run the weak seed through an algorithm that is completely different from the algorithm used in the RNG. If an RNG relies on multiplies (like LCGs such as PCG), it may be best to run the seed through something using xorshifts. And for xorshift-based RNGs something like SplitMix, a Weyl sequence with a strong output hash, is a good option. But this assumes only RNGs that are statistically biased are available to initialize from a weak seed. I think any RNG that turns the seed into something statistically unbiased is fine, like PCG and Xorshift*.

For cryptographic RNGs it is (as everything) slightly more complicated. A weak seed could reduce the effort it takes to brute-force the internal state of the RNG. The result of the initialization should not only divide the available entropy evenly and without bias over the internal state; it should also not be possible to reconstruct the seed from the initialized state. I think. The last point is why Bob Jenkins recommends encrypting the initial seed of ISAAC with a strong block cipher, and does not fully trust his implementation. For us I think it is useful to keep it as-is. For the default use, initializing with

For ChaCha we definitely need to look at how it gets initialized. I don't have faith in the current

Sorry, this does not really fit into the current discussion. But it seems like things to keep in mind. |
I just noticed that if You could always support As for ChaCha, I always liked that Peter Reid's ChaCha makes all types explicit, so I'd likely take
In practice, one should avoid boilerplate by using internal traits or macros to get |
@pitdicker I don't understand the logic of this. My understanding of entropy in the case of cryptography is as a measure of how much information a potential attacker doesn't know; e.g. 30 bits of entropy would imply that an attacker could guess that our system is in one of 2^30 states. Assuming the attacker also knows the algorithms used (published in this case), it makes no difference what the initialisation algorithm does, so long as it doesn't lose entropy, the attacker still has 2^30 states to check. The only other thing to consider, I think, is how much of that entropy gets used in each number output. If there is insufficient mixing, the initial output may have less entropy than given in the seed, i.e. there may be fewer output possibilities than the available entropy should imply; I guess this is the generalisation of what's commonly called bias (the more specific version being a predicate like 'output < max / 2' not having the expected probability). I'm not sure that "not being able to reconstruct the seed from the internal state" says anything except about the complexity of the algorithm; I don't see why using a reversible hash function here would be a disadvantage? |
Yes, I wrote it badly. Let me try again: Suppose a cryptographic RNG is initialized with a 128-bit seed. The full state is 512 bits. What happens if the state is naively initialized with some non-cryptographic function (for example, repeating the seed 4 times)? If you can derive a quarter of the state, you know the entire state. Now you should not be able to derive that quarter anyway. But it seems to me bad initialization would make the cryptographic promises of such an RNG weaker. I was not thinking about brute-forcing, but about someone exploiting weaknesses in the cryptographic RNG. After a good initialization of a CSPRNG it should not be possible to know anything about one part of the state after figuring out another part. Then that would mean using some irreversible function to construct the state from the seed. |
Just learned something. 'Every CSPRNG should withstand "state compromise extensions". In the event that part or all of its state has been revealed (or guessed correctly), it should be impossible to reconstruct the stream of random numbers prior to the revelation.' (Wikipedia). So it is part of the CSPRNG's job to prevent this. We don't need to search for "some irreversible function to construct the state from the seed" if the seed may be weak or too short. Just copy the available bytes from the seed into the RNGs state and run one round of the cryptographic RNG. It still means the current implementation of ChaCha's Note that ISAAC does not seem to offer this guarantee, and therefore has the special initialization function. |
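The scheme described above — copy whatever seed bytes are available into the state, then run one round of the generator before producing output — can be sketched as follows. This is a toy: mix_round here is a SplitMix-style stir, not a real cipher round, and the type and function names are invented for illustration.

```rust
// Sketch: seed by copying bytes into the state, then run one mixing
// round so the first output is not based on the weakly-mixed state.
struct SketchRng {
    state: [u64; 4],
}

impl SketchRng {
    fn from_seed_bytes(seed: &[u8]) -> Self {
        let mut state = [0u64; 4];
        // Copy whatever seed bytes are available; the rest stays zero.
        for (i, chunk) in seed.chunks(8).take(4).enumerate() {
            let mut buf = [0u8; 8];
            buf[..chunk.len()].copy_from_slice(chunk);
            state[i] = u64::from_le_bytes(buf);
        }
        let mut rng = SketchRng { state };
        rng.mix_round(); // discard the unmixed initial state
        rng
    }

    // Toy stir loosely modeled on the SplitMix64 finalizer; a real
    // CSPRNG would run its own round function here instead.
    fn mix_round(&mut self) {
        for i in 0..4 {
            let mut x = self.state[i].wrapping_add(0x9E37_79B9_7F4A_7C15);
            x ^= x >> 30;
            x = x.wrapping_mul(0xBF58_476D_1CE4_E5B9);
            x ^= x >> 27;
            self.state[i] = x;
        }
    }
}

fn main() {
    let a = SketchRng::from_seed_bytes(&[1, 2, 3, 4]);
    let b = SketchRng::from_seed_bytes(&[1, 2, 3, 4, 0]);
    // Zero-padding a short seed gives the same state as supplying
    // the trailing zeros explicitly.
    assert_eq!(a.state, b.state);
    println!("ok");
}
```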
Okay, I had some doubts about initializing RNGs with a seed that is shorter than the state. But they are all resolved now. Initializing with a slice of some arbitrary size should be workable. One other question: |
Is that not exactly what |
Why precisely? |
After the first round the first block of output is based on the weak state. Only after the first round is the state mixed well. So I would run |
I'd discourage giving any ChaCha less than a 256 bit key because ChaCha never mixes the internal state. If I understand, you guys are no longer discussing ChaCha though, but another PRNG built with the ChaCha permutation function? You need another name for that like |
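For reference, the layout @burdges describes is fixed by the ChaCha20 specification (RFC 7539): 128 bits of fixed constants ("expand 32-byte k"), a 256-bit key, a 32-bit block counter, and a 96-bit nonce. A sketch of assembling that initial state (the function name is illustrative):

```rust
// Build the 16-word ChaCha20 initial state per RFC 7539.
fn chacha20_initial_state(key: &[u8; 32], counter: u32, nonce: &[u8; 12]) -> [u32; 16] {
    let mut s = [0u32; 16];
    // "expa", "nd 3", "2-by", "te k" as little-endian words.
    s[0] = 0x6170_7865;
    s[1] = 0x3320_646e;
    s[2] = 0x7962_2d32;
    s[3] = 0x6b20_6574;
    // 256-bit key as eight little-endian words.
    for i in 0..8 {
        s[4 + i] = u32::from_le_bytes([key[4 * i], key[4 * i + 1], key[4 * i + 2], key[4 * i + 3]]);
    }
    // 32-bit block counter.
    s[12] = counter;
    // 96-bit nonce as three little-endian words.
    for i in 0..3 {
        s[13 + i] =
            u32::from_le_bytes([nonce[4 * i], nonce[4 * i + 1], nonce[4 * i + 2], nonce[4 * i + 3]]);
    }
    s
}

fn main() {
    let state = chacha20_initial_state(&[0u8; 32], 1, &[0u8; 12]);
    assert_eq!(state[0], 0x6170_7865);
    assert_eq!(state[12], 1);
    println!("ok");
}
```

This makes the point above concrete: only 256 of the 512 state bits are key material, so "seeding the whole state" of standard ChaCha is not meaningful.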
@burdges I keep messing up anything I say about ChaCha, glad you know more about it. Still the point stands, one round of ChaCha should be a good enough initialization function to turn a weak seed into an acceptable state for Of course initializing with a full seed of good-quality random numbers is the only way to initialize ChaCha that can claim to be as secure as the algorithm gets. I am not sure yet what to think about allowing smaller seeds, but after the discussion today I think it only has consequences for brute-forcing. The algorithm does not get easier to break (with proper initialization). |
Hmm, I thought this was suspicious when I read it. It's not true. ChaCha allows seeking which makes reversal trivial, thus "state compromise" allows past output to be reconstructed easily.
Looking at Peter Reid's code (mentioned by @burdges earlier) I see three main differences:
I think it would be interesting to build some encryption/decryption code around Regarding More generally, encouraging use of What about ISAAC, should it just fill its huge state directly from the So now we want a general way to expand a short seed. Perhaps we should have something similar to So have we argued ourselves to the point that we should drop |
It's not even about security but about standards: If you want to call it literally I'd think a I'd wager the ChaCha permutation operating like Isaac gives plenty of security, but I donno if anybody bothered writing that down, or maybe it follows form DJB's paper. It's plausible other permutations like Keccak give such an analysis. |
I think the idea was to allow parameterisation with constants, i.e. But you posted at the same time as me so I'll assume you didn't reply to my previous post. |
When did you update the first post? I didn't notice it...
I don't think we should get too creative with the initialization function for RNGs, and we should not force other implementers of RNGs to become so. The standard functions in

For the proposed

I have looked at several randomness libraries in C, and some allow some flexibility in the seed size. This is mostly because of the lack of easy sources of randomness: they support seeding from things like a single

ISAAC is a bit special because it supports arbitrary seed sizes. On the other hand, its initialization function is viewed as the weakest part of it. In short, I would not like to give users the idea that the seed size is something flexible and that using something other than the maximum is ok for serious uses. And I would not like to force RNG implementations to have to handle it. |
It should definitely be the other way around if there is a default. It's not about deterministic behaviour, it's about stating that the code won't change in the future (in a way that affects output).
Several times. I only copied |
I think Why the Why the
If you want to support In any case, I think if a developer just starts writing code using I think there are a couple relevant seeding modes, (1) seed however the PRNG is defined, or some approximation there of, (2) seed by filling the PRNG's whole internal state, and (3) hash something to do either (1) or (2), ala
|
The reasoning was (a) to avoid requirements on fixing Endianness and (b) to allow flexibility for optimisation later.
I wasn't so keen on this either but didn't see a better solution. @pitdicker wrote it.
So that users know something or other about It seems crypto PRNGs will probably never directly fill the internal state, or at least we probably won't do that with ISAAC; this makes me question whether we want |
I see no reason for In fact, your |
We should figure out if specialization can be used like this:

pub trait SeedableRng: Sized {
    type Seed;
    fn from_seed(seed: Self::Seed) -> Self;
    fn from_rng<R: Rng>(mut rng: R) -> Result<Self, Error>;
}

impl<T> SeedableRng for T where
    T: SeedableRng,
    T::Seed: Sized + Default + Deref<Target=[u8]> + DerefMut
{
    default fn from_rng<R: Rng>(mut rng: R) -> Result<Self, Error> {
        let mut seed = Self::Seed::default();
        assert!(seed.deref_mut().len() == ::std::mem::size_of::<Self::Seed>());
        rng.try_fill_bytes(seed.deref_mut())?;
        Ok(Self::from_seed(seed))
    }
}

impl<T> SeedableRng for T where
    T: SeedableRng,
    T::Seed: Sized + Default + Deref<Target=[u32]> + DerefMut
{
    default fn from_rng<R: Rng>(mut rng: R) -> Result<Self, Error> {
        let mut seed = Self::Seed::default();
        let size = 4 * seed.deref_mut().len();
        assert!(size == ::std::mem::size_of::<Self::Seed>());
        {
            let ptr = seed.deref_mut().as_mut_ptr() as *mut u8;
            let s = unsafe { slice::from_raw_parts_mut(ptr, size) };
            rng.try_fill_bytes(s)?;
        }
        Ok(Self::from_seed(seed))
    }
}

or maybe like this:

pub trait SeedableRng: Sized {
    type Seed;
    fn from_seed(seed: Self::Seed) -> Self;
    fn from_rng<R: Rng>(mut rng: R) -> Result<Self, Error>;
}

trait SeedArrayBase: Sized {}
impl SeedArrayBase for u8 {}
impl SeedArrayBase for u16 {}
impl SeedArrayBase for u32 {}
impl SeedArrayBase for u64 {}

impl<P, T> SeedableRng for T where
    P: SeedArrayBase,
    T: SeedableRng,
    T::Seed: Sized + Default + Deref<Target=[P]> + DerefMut
{
    default fn from_rng<R: Rng>(mut rng: R) -> Result<Self, Error> {
        use std::mem::size_of;
        let mut seed = Self::Seed::default();
        let size = size_of::<P>() * seed.deref_mut().len();
        assert!(size == size_of::<Self::Seed>());
        {
            let ptr = seed.deref_mut().as_mut_ptr();
            let s = unsafe { slice::from_raw_parts_mut(ptr as *mut u8, size) };
            rng.try_fill_bytes(s)?;
        }
        Ok(Self::from_seed(seed))
    }
} |
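For comparison, a single generic from_rng default is possible without specialization or unsafe code if the seed type is simply required to expose itself as a byte slice (a Default + AsMut<[u8]> style bound). A minimal sketch; SimpleRng, SeedableSketch, and the toy types are illustrative stand-ins, not the rand API:

```rust
// Minimal stand-in for a byte-producing RNG trait.
trait SimpleRng {
    fn fill_bytes(&mut self, dest: &mut [u8]);
}

// Seedable trait with one generic from_rng default: the Seed type
// only needs to be constructible and viewable as a byte slice.
trait SeedableSketch: Sized {
    type Seed: Sized + Default + AsMut<[u8]>;
    fn from_seed(seed: Self::Seed) -> Self;
    fn from_rng<R: SimpleRng>(rng: &mut R) -> Self {
        let mut seed = Self::Seed::default();
        rng.fill_bytes(seed.as_mut());
        Self::from_seed(seed)
    }
}

// Deterministic toy generator used only to exercise the trait.
struct Counter(u8);
impl SimpleRng for Counter {
    fn fill_bytes(&mut self, dest: &mut [u8]) {
        for b in dest {
            self.0 = self.0.wrapping_add(1);
            *b = self.0;
        }
    }
}

struct Toy {
    seed: [u8; 4],
}
impl SeedableSketch for Toy {
    type Seed = [u8; 4];
    fn from_seed(seed: [u8; 4]) -> Self {
        Toy { seed }
    }
}

fn main() {
    let toy = Toy::from_rng(&mut Counter(0));
    assert_eq!(toy.seed, [1, 2, 3, 4]);
    println!("ok");
}
```

The trade-off is that the seed is defined in bytes rather than u32/u64 words, which sidesteps both the alignment question and the specialization machinery above.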
Actually we should simply go with one specialization which probably works.
We should not provide any default conversion from In this way, a |
As I pointed out above, there is no obvious need for
In this way, you can provide the |
What's the UB? Sorry, if you're looking at the code at the top of this issue, it's incomplete; full version (sealing the types supported) is here. The "sealed" stuff is a trick to prevent user extension. As I remember, you implied that we should only use So maybe I've been wondering if |
Yeah, it's not undefined behavior per se but it pushes memory safety onto I suppose It appears my specialization trick fails bizarrely because all associated types cannot yet be leveraged for specialization. :( I'd think one could fix it by splitting off
|
No, UPD: Oh, sorry, you mean marking trait unsafe, here I agree with you. |
@burdges Can you explain why you need to set a nonce for an RNG? The implementation of ChaCha in
|
Just because a nonce is part of the chacha standard. I suppose a massive counter has uses too though, so maybe the nonce could be left aside. We might eventually have a general wrapper for treating stream ciphers as CSPRNGs anyways. |
ChaCha is standardised as a stream cipher; we're using it as an RNG. I have nothing against adding additional constructors on ChaCha to allow it to be used for both purposes however. It may be that using
And is there a good reason to keep
So my thoughts are we use @pitdicker's PR except (a) mark |
Sorry, if |
@burdges You are right, there is no unsafe code necessary anymore in One problem with Edit: the code without |
Also I suggested |
Since the experiments with I think again that a For cryptographic RNGs the So I would propose to add The advantage of adding the function to |
I don't think 20 times is enough for all RNGs with large states. I would suggest implementing |
Ah, you are right. I forgot about a third category: simple RNGs using large states, like xorshift1024*φ (which you are probably thinking about?). Although I see only very limited use for these, we should support them. Using a RNG to fill the state is definitely the best solution there. If we add helper functions for all three cases in the Of course the documentation should make very clear that using this in combination with a cryptographic RNG is rarely a good idea. |
For what it's worth, I'm still planning on getting I guess both A potential issue with Ultimately I don't see much use for |
Note that we don't need to implement either as a function at all: |
Implemented, apart from suggested additions like |
Update again:

So far we have proposed the following:

- u64 (convenient non-crypto usage)

If we're seeding from another RNG, there's probably no reason not to fill the entire state space from this source; we are very likely seeding from a CSPRNG which can easily provide as many bytes as we need, for little more cost than just copying a smaller set of bytes multiple times.

On the other hand, if we are providing a seed, we might want to use the maximum key size possible. Then again, we may well only want to provide a subset, such as a 128-bit or 256-bit key (AES uses these key sizes). So should we add functions to seed from 128-bit and 256-bit keys instead of forcing users to copy their key into a larger zero-initialised array?

Actually, the current implementations allow seeding from u32 or u64 slices of undefined length. Keeping this functionality seems sensible. But why not support seeding from byte slices instead? The only difference is that enforced alignment may allow more efficient copying from a slice of u32 or u64. Considering the user may already have a key in a byte slice, this enforcement probably does more harm than good (forcing an extra copy or unsafe pointer coercion).

This leaves me wondering if the current SeedableRng proposal is such a good idea; perhaps instead we should have the following:

(SeedFromRng may stay a separate trait providing the other type of seeding.)
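One concrete way to answer the 128-bit vs. 256-bit key question above is to accept any key up to the maximum seed size and copy it into a zero-initialised maximum-size array. A hypothetical sketch (pad_key_to_256 is not a proposed API name, and whether zero-padding is acceptable for crypto use is exactly what the thread debates):

```rust
// Copy a key of up to 32 bytes into a zero-initialised 256-bit seed.
fn pad_key_to_256(key: &[u8]) -> [u8; 32] {
    assert!(key.len() <= 32, "key longer than the maximum seed size");
    let mut seed = [0u8; 32];
    seed[..key.len()].copy_from_slice(key);
    seed
}

fn main() {
    let key128 = [0xABu8; 16];
    let seed = pad_key_to_256(&key128);
    // The key occupies the low half; the rest stays zero.
    assert_eq!(&seed[..16], &key128);
    assert!(seed[16..].iter().all(|&b| b == 0));
    println!("ok");
}
```

This keeps a single fixed-size from_seed entry point while still letting users supply shorter keys explicitly, at the cost of making the zero-padding their visible responsibility.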