Testing of distributions #72
This brings up another point: should distributions be reproducible? We've already decided we want reproducible generators (meaning that a fixed seed produces a fixed output sequence, even if the library code is updated). Do we want the same for distributions? At a minimum I think changing distribution output should be considered a breaking change post-1.0. Black-box tests are better than bucket tests here since they also test reproducibility of results.
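As a rough illustration of the black-box approach, here is a minimal sketch (assuming a rand 0.8-style API and the rand_pcg crate; neither is specified in this thread): run a seeded generator through a distribution twice and require bit-identical output. A cross-version test would go one step further and freeze the observed samples as hardcoded constants.

```rust
use rand::{Rng, SeedableRng};
use rand_pcg::Pcg64; // a small, explicitly reproducible PRNG

#[test]
fn fixed_seed_fixed_output() {
    let sample = |seed: u64| -> Vec<f64> {
        let mut rng = Pcg64::seed_from_u64(seed);
        (0..100).map(|_| rng.gen_range(0.0..1.0)).collect()
    };
    // Within one library version, a fixed seed must give a fixed sequence.
    // A true black-box test would additionally hardcode the expected values,
    // so that any cross-version change to the algorithm is caught.
    assert_eq!(sample(42), sample(42));
}
```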
Distributions should absolutely be reproducible any time they're run with a reproducible generator, yes. Anything else means they're somehow getting their own entropy into the system instead of only using the entropy provided by the generator, which is clearly the wrong thing to do.
The question about reproducibility is more: are we allowed to change the algorithm? We already have several changes that make distributions return completely different results:
I can see some changes that would be good:
Then there are some open questions:
Strictly speaking, any change should probably be considered a breaking change. We may be able to relax that a little if we know how the distributions are used, though. Are applications that require them to be reproducible across versions common or not? Are they used in games, libraries etc.? Is anyone now trying to do simulations in Rust? If you require reproducibility, is it not reasonable to pin the crates critical to the algorithm to specific versions? Together with the testing this issue is about, and the extra distributions we probably want, it seems there is quite some work left to do. Not sure there is any thought yet about a 1.0. If the goal is API stability, this can all be done in the future. If we want to present 1.0 as something we have reasonably good confidence in, we should at least have answers to these questions.
Ah, I misread the question somewhat. No, I don't think exact results should be relied upon not to change between versions for a particular distribution. New versions must be free to adopt newer methods and techniques without that being considered breaking. The contract should be that valid results are produced (e.g. a distribution over 0.0 to 1.0 must never produce 1.2), but the exact input and output, or the number of generator cycles used, is much better kept as open as possible.
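A test for that weaker contract could look like the following sketch (same assumed rand 0.8-style API as above): assert the range invariant over many samples without pinning any exact values.

```rust
use rand::SeedableRng;
use rand::distributions::{Distribution, Uniform};
use rand_pcg::Pcg64;

#[test]
fn uniform_never_leaves_its_range() {
    let mut rng = Pcg64::seed_from_u64(1);
    let dist = Uniform::new(0.0f64, 1.0); // half-open range: [0.0, 1.0)
    for _ in 0..1_000_000 {
        let x = dist.sample(&mut rng);
        // Only the contract is checked; the exact sample values are
        // free to change between library versions.
        assert!((0.0..1.0).contains(&x));
    }
}
```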
Reproducibility could be useful for some simulations but I don't think it's critical for much. I suggest we only require a bump in the minor version number if an algorithm produces different results (i.e. after 1.0, patch releases must produce identical results from all distributions, given fixed input).
While I agree that changing generators should be considered a breaking change, and while I consider reproducibility in simulations very important, I agree that changing how results are computed in distributions should not be considered a breaking change. When I'm doing simulations and want to be able to reproduce a run (for example, if I get a strange result), I always save (a) the start seed and (b) the git revision of my simulation code, which includes Cargo.lock. That handles both changes in the code on my side and any non-breaking changes in dependencies.
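In that workflow the only in-code requirement is that the seed is drawn once and logged; a minimal sketch (hypothetical program, assuming the same rand 0.8-style API):

```rust
use rand::{RngCore, SeedableRng};
use rand_pcg::Pcg64;

fn main() {
    // Draw one fresh seed per run and log it next to the run's metadata
    // (git revision, Cargo.lock). Re-running with the same seed on the
    // same locked dependencies reproduces the simulation exactly.
    let seed = rand::rngs::OsRng.next_u64();
    eprintln!("simulation seed: {seed}");

    let mut rng = Pcg64::seed_from_u64(seed);
    // ... drive the whole simulation from `rng` ...
    let _first_draw = rng.next_u64();
}
```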
Respectfully, I disagree. Post 1.0, requiring a bump to the minor version number (1.0 → 1.1) would be enough, but having changes in distribution output with any new release would be a pain in some circumstances. Of course, without tests that's difficult to enforce.
I agree with that, I was only trying to say that changing distribution output should not require a major bump; maybe I exaggerated a bit 😄. But yes, you're right.
Moved: rust-random#357
Continuing from #69
We should add some testing for distributions. Unfortunately since distribution samples are random, this is inherently difficult.
@pitdicker suggests using 256 buckets but notes that this could only pick up very large errors. In my experience we'd need somewhere in the range from 10'000 to 1'000'000 samples to make the test reasonably useful and reliable, and it still wouldn't be 100% reliable (meaning we'd probably want to fix the seed, rather than let CI fail randomly from time to time).
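For comparison, a bucket test along those lines could look like this sketch (again assuming a rand 0.8-style API; the 10^6 sample count and the chi-squared threshold are illustrative choices, not values agreed in this thread):

```rust
use rand::{Rng, SeedableRng};
use rand_pcg::Pcg64;

#[test]
fn bucket_test_uniform_bytes() {
    const BUCKETS: usize = 256;
    const SAMPLES: usize = 1_000_000;
    // Fixed seed so CI never fails randomly.
    let mut rng = Pcg64::seed_from_u64(7);
    let mut counts = [0u32; BUCKETS];
    for _ in 0..SAMPLES {
        counts[rng.gen::<u8>() as usize] += 1;
    }
    // Pearson chi-squared statistic against the uniform expectation.
    let expected = SAMPLES as f64 / BUCKETS as f64;
    let chi2: f64 = counts
        .iter()
        .map(|&c| {
            let d = c as f64 - expected;
            d * d / expected
        })
        .sum();
    // 255 degrees of freedom; ~330 is roughly the 99.9th percentile,
    // so a correct implementation essentially always passes with this
    // fixed seed, while gross distribution errors are caught.
    assert!(chi2 < 330.0, "chi2 = {chi2}");
}
```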
In my opinion, that isn't the way we should go about it; instead we should do two things: