
Testing of distributions #72

Closed
dhardy opened this issue Dec 14, 2017 · 9 comments
dhardy commented Dec 14, 2017

Continuing from #69

We should add some testing for distributions. Unfortunately since distribution samples are random, this is inherently difficult.

@pitdicker suggests using 256 buckets but notes that this could only pick up very large errors. In my experience we'd need somewhere in the range from 10'000 to 1'000'000 samples to make the test reasonably useful and reliable, and it still wouldn't be 100% reliable (meaning we'd probably want to fix the seed, rather than let CI fail randomly from time to time).
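A bucket test along these lines can be sketched in plain std Rust. The `XorShift64` generator below is a stand-in for a seeded generator from this crate (none of these names are rand's API), and the tolerance is deliberately loose, so this would only catch gross errors:

```rust
// Minimal 256-bucket uniformity check. XorShift64 is a toy stand-in RNG,
// not rand's API; a real test would use a seeded generator from the crate.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Count samples falling into 256 equal-width buckets (by top byte).
fn bucket_counts(rng: &mut XorShift64, samples: usize) -> [u32; 256] {
    let mut counts = [0u32; 256];
    for _ in 0..samples {
        let bucket = (rng.next_u64() >> 56) as usize;
        counts[bucket] += 1;
    }
    counts
}

fn main() {
    // A fixed seed keeps CI deterministic, as suggested above.
    let mut rng = XorShift64(0x853c49e6748fea9b);
    let samples = 100_000;
    let counts = bucket_counts(&mut rng, samples);
    let expected = (samples / 256) as f64;
    // Loose 50% tolerance: roughly 10 standard deviations at this sample
    // size, so only very large errors fail the test.
    for (i, &c) in counts.iter().enumerate() {
        let dev = (c as f64 - expected).abs() / expected;
        assert!(dev < 0.5, "bucket {} off by {:.0}%", i, dev * 100.0);
    }
    println!("all 256 buckets within tolerance");
}
```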

In my opinion, that isn't the way we should go about it; instead we should do two things:

  1. Have an external test suite for "supervised testing" (this could be as simple as plotting a histogram of results via gnuplot)
  2. Add "black box" unit tests (my terminology) which simply tries a few fixed inputs and checks they get the same results as last time (i.e. the test tells you when a code change affects output, but not whether the output is still follows the distribution)

dhardy commented Dec 14, 2017

This brings up another point: should distributions be reproducible?

We've already decided we want reproducible generators (meaning that a fixed seed produces a fixed output sequence, even if the library code is updated). Do we want the same for distributions?

At a minimum I think changing distribution output should be considered a breaking change post 1.0 release.

Black box tests are better than bucket tests here since they also test reproducibility of results.

@Lokathor

distributions should absolutely be reproducible any time they're run with a reproducible generator, yes.

Anything else means that they're somehow getting their own entropy into the system instead of only using the entropy provided by the generator, which is clearly the wrong thing to do.

@pitdicker

The question about reproducibility is more: are we allowed to change the algorithm?

We already have several changes that make distributions return completely different results:

  • Integer ranges now use widening multiply instead of modulus;
  • Floating point numbers are converted differently with higher precision;
  • Floating point ranges use a different method to preserve that precision;
  • char distributions are changed;
  • With distributions that use a ziggurat table, the floating point conversion is slightly modified, and we also ignore the (unnecessary) three least significant bits.
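The widening-multiply technique from the first bullet can be sketched as follows. This shows only the core idea; rand's actual implementation additionally rejects some inputs to remove the small residual bias this bare version has:

```rust
/// Map a uniform u64 `x` onto [0, range) with a widening multiply:
/// take the high 64 bits of the 128-bit product x * range.
fn widening_range(x: u64, range: u64) -> u64 {
    (((x as u128) * (range as u128)) >> 64) as u64
}

fn main() {
    // The whole u64 input space maps onto [0, 10) with no modulus.
    assert_eq!(widening_range(0, 10), 0);
    assert_eq!(widening_range(u64::MAX / 2, 10), 4);
    assert_eq!(widening_range(u64::MAX, 10), 9);
    println!("widening multiply keeps outputs in range");
}
```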

I can see some changes that would be good:

  • Integer ranges with i128/u128 don't use the widening multiply yet;
  • Floating point ranges don't yet take rounding into account, and can generate values slightly outside the range (I am experimenting with this at the moment);
  • There is an improved (faster) ziggurat table method;

Then there are some open questions:

  • Are we sure the distributions as currently implemented are correct? Also in edge cases?
  • For many distributions there are several known algorithms to choose from. I am not sure what trade-offs the ones currently implemented make. Maybe we want to replace one or more?
  • Is no_std support possible without too many changes to the distributions?
  • Are things like infinity, NaNs, subnormals, etc. handled reasonably?

Strictly speaking any change should probably be considered a breaking change.

We may be able to relax that a little if we know how the distributions are used, though. Are applications that require them to be reproducible across versions common or not? Are they used in games, libraries, etc.? Is anyone now trying to do simulations in Rust? If you require reproducibility, is it not reasonable to pin crates critical to the algorithm to specific versions?

Together with the testing this issue is about, and the extra distributions we probably want, it seems there is quite some work left to do. I'm not sure there has been any thought yet about a 1.0. If the goal is API stability, this can all be done in the future. If we want to present 1.0 as something we have reasonably good confidence in, we should at least have answers to these questions.


Lokathor commented Dec 26, 2017

Ah, I misread the question somewhat.

No, I don't think that exact results should be relied upon to not change between versions for a particular distribution. New versions must be free to adopt newer methods and techniques without that being considered breaking. The contract should be that valid results are produced (e.g. a distribution over 0.0 to 1.0 must never produce 1.2), but the exact output for a given input, and the number of generator values consumed, are much better left open.
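This "valid results" contract can be phrased as a property-style check. The conversion below (53 random bits divided by 2^53) is one common way to map integers into [0, 1), shown only as an illustration of the contract, not rand's exact conversion, and `XorShift64` is a stand-in generator:

```rust
// Property-style range check: whatever the algorithm, output must stay
// inside the promised interval. XorShift64 is a toy stand-in RNG.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Convert the top 53 random bits into an f64 in [0, 1).
fn unit_f64(bits: u64) -> f64 {
    (bits >> 11) as f64 / (1u64 << 53) as f64
}

fn main() {
    let mut rng = XorShift64(7);
    for _ in 0..100_000 {
        let v = unit_f64(rng.next_u64());
        // The contract: never a value outside the promised range.
        assert!((0.0..1.0).contains(&v), "out of range: {}", v);
    }
    println!("100000 samples all in [0.0, 1.0)");
}
```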


dhardy commented Dec 26, 2017

Reproducibility could be useful for some simulations but I don't think it's critical for much. I suggest we only require a bump in the minor version number if an algorithm produces different results (i.e. after 1.0, patch releases must produce identical results from all distributions, given fixed input).

@tspiteri

While I agree that changing generators should be considered a breaking change, and while I consider reproducibility in simulations very important, I agree that changing how results are computed in distributions should not be considered a breaking change. When I'm doing simulations and want to be able to reproduce a run (in case, for example, I get a strange result), I always save (a) the start seed and (b) the git revision of my simulation code, which includes Cargo.lock. That handles both changes in my own code and any non-breaking changes in dependencies.


dhardy commented Jan 19, 2018

Respectfully, I disagree. Post 1.0, requiring a bump to the minor version number (1.0 → 1.1) would be enough, but having changes in distribution output with any new release would be a pain in some circumstances.

Of course without tests, that's difficult to enforce.

@tspiteri

I agree with that, I was only trying to say that changing distribution output should not be a major bump, maybe I exaggerated a bit 😄. But yes, you're right.


dhardy commented Mar 31, 2018

Moved: rust-random#357

dhardy closed this as completed Mar 31, 2018