Tweak Open01 and Closed01 #237

tspiteri · 2018-01-15T18:31:00Z

The PR has two commits:

Fix SCALE to be 2^52 for f64 and 2^23 for f24. SCALE is expected to be 1/ɛ in the remaining code inside float_impls, so $mantissa_bits should be 52, 23 and not 53, 24. After this commit Closed01 can produce 0.0 and 1.0, whereas before this commit the maximum was erroneously (1 − ɛ) / (1 − ɛ/2).
Remove the bias in Open01 by adding ɛ/2 rather than the intended ɛ/4 (erroneously ɛ/8 before the first commit above). The largest number is now 1 − ɛ/2 which is still exactly representable, since 1 − ɛ has one spare bit of precision, unlike 1 + ɛ which uses all bits of precision.

pitdicker · 2018-01-15T18:39:10Z

@tspiteri Thank you for your PR!

We are planning to merge an other solution soon, that currently lives in this fork https://github.com/dhardy/rand/blob/master/src/utils.rs. That will do the same conversions as here, but generate floats with higher precision when the values are closer to zero.

pitdicker · 2018-01-15T18:41:12Z

As a simple solution I used rejection sampling there: https://github.com/dhardy/rand/blob/master/src/distributions/uniform.rs#L195

tspiteri · 2018-01-17T23:13:36Z

I'm not a fan of the methods in that fork which generate floats with somehow higher precision when the values are closer to zero. (For what it's worth, I'm not a fan of the [0,1] and (0,1) intervals either; I only submitted the PR because that functionality is already there, so I thought it should at least be more "correct" if that does not take too much effort.)

In my opinion, there are two ways to generate random floats with a uniform distribution:

Generate a random float where the possible values are equally spaced in the interval [0,1), where the space between possible values is ɛ. There are 1/ɛ possible values: 0, ɛ, 2ɛ, ..., 1-2ɛ, 1-ɛ. This is provided by the crate already when generating a random number in [0,1). This method makes sense because for a float with m mantissa bits, exactly m bits of randomness are generated.
Generate a random float using a continuous uniform distribution in the interval [0,1) and then return the value using correct rounding. This would be functionally equivalent to first generating 0.* where * is an infinitely-long stream of random bits, and then rounding the result. A consequence of this is that if * starts off with a very long stream of consecutive ones, rounding will cause the returned value to be 1.0, as 0.1111... will be rounded to 1.0 (unless rounding towards zero or negative infinity, but floats are rounded to the nearest in Rust). This method makes sense because it is similar to other float methods with correct rounding.

To me, the method in the fork is a conflation of these two methods. It seems that it is basically method 1, but then it veers a little towards method 2 just to make use of the excess bits generated.

pitdicker

I did a little test, which primitively does the same as your modifications:

        const SCALE: f32 = (1u64 << 23) as f32;
        let result1 = 0.0f32 + 0.5 / SCALE;
        let result2 = 0.99999994f32 + 0.5 / SCALE;
        println!("{}, {}", result1, result2);

with the result:

	0.000000059604645, 1

Compared to how the code used to work:

        const SCALE: f32 = (1u64 << 24) as f32;
        let result1 = 0.0f32 + 0.25 / SCALE;
        let result2 = 0.99999994f32 + 0.25 / SCALE;
        println!("{}, {}", result1, result2);

Result:

	0.000000014901161, 0.99999994

So the current code is correct, it produces the desired open range.

pitdicker · 2018-01-18T19:12:51Z

src/rand_impls.rs

@@ -148,8 +148,8 @@ macro_rules! float_impls {
        }
    }
 }
-float_impls! { f64_rand_impls, f64, 53, next_f64 }
-float_impls! { f32_rand_impls, f32, 24, next_f32 }


Floats only have the number of mantissa bits as you corrected, and one implicit bit. But does the logic here not rely on that? Are you sure the SCALE calculated in the macro is correct now?

The comment for Closed01 is "rescale so that 1.0 - epsilon becomes 1.0 precisely". The rescaling is done by multiplying by SCALE / (SCALE - 1.0). Now for SCALE / (SCALE - 1.0) to rescale 1.0 - EPSILON into 1.0, we need SCALE / (SCALE - 1.0) == 1.0 / (1.0 - EPSILON), which is true when SCALE == 1.0 / EPSILON. For f32 ɛ is 2⁻²³, so 1/ɛ (SCALE) should be equal to 1<<23, not 1<<24. For f64 ɛ is 2⁻⁵², so 1/ɛ (SCALE) should be equal to 1<<52, not 1<<53.

pitdicker · 2018-01-18T19:21:04Z

src/rand_impls.rs

-                    // the precision of f64/f32 at 1.0), so that small
-                    // numbers are larger than 0, but large numbers
-                    // aren't pushed to/above 1.
-                    Open01(rng.$method_name() + 0.25 / SCALE)


This method was correct.
0.0 + 0.25 / SCALE will change the range to have 0.25 / SCALE as lower bound.
(1.0 - 1.0 / SCALE) + 0.25 / SCALE is still (1.0 - 1.0 / SCALE), thanks to the more limited precision near 1.0. At least, it works when the floating point rounding mode is set to nearest, and not up.

This is a little more subjective than for the other change (that's why I put it in a second commit). The change in the first commit (changing 24, 53 into 23, 52) is to make the code work as was intended. The change from 0.25 / SCALE (ɛ/4) into 0.5 / SCALE (ɛ/2) is to change the intended behavior. Adding ɛ/4 will change from the half-open range [0,1) into an open range (0,1), but it is not symmetric about 0.5, so it is biased. Also, the spacing between all the possible outputs is not equal any more, as I'll try to show below.

The value obtained from rng can have any of the possible value:

0, ɛ, 2ɛ, …, 0.5−ɛ, 0.5, 0.5+ɛ, …, 1−2ɛ, 1−ɛ.

Adding ɛ/4 (the intended current behavior) would give the following possible outputs:

0.25ɛ, 1.25ɛ, 2.25ɛ, …, 0.5−0.75ɛ, 0.5+0.25ɛ, 0.5+1.25ɛ, …, 1−1.75ɛ, 1−0.75ɛ,

but rounded. Rounding has the effect that 0.25ɛ is added to all values < 0.5, but nothing is added to values ≥ 0.5, as then the added bit does not fit in the precision and is rounded off, so the actual outputs are:

0.25ɛ, 1.25ɛ, 2.25ɛ, …, 0.5−0.75ɛ, 0.5, 0.5+ɛ, …, 1−2ɛ, 1−ɛ.

Adding ɛ/2 to the value from rng would give the following possible outputs instead:

0.5ɛ, 1.5ɛ, 2.5ɛ, …, 0.5−0.5ɛ, 0.5+0.5ɛ, 0.5+1.5ɛ, …, 1−1.5ɛ, 1−0.5ɛ.

All these values can be represented exactly, so no rounding takes place. I see two advantages to this change:

The expected value (average) is now exactly 0.5, which is what I would expect from a random number in the open range (0,1).

The gap between possible values is always equal to ɛ, unlike the current intended behavior where the gap is almost always equal to ɛ, but is 0.75ɛ in the middle (between 0.5−0.75ɛ and 0.5).

tspiteri · 2018-01-18T21:51:22Z

About the little test, rng does not return a value from 0.0 to 0.99999994 inclusive; 0.99999994 is actually 1−ɛ/2. What it returns is a value from 0.0 to 0.9999999 inclusive; 0.9999999 is 1−ɛ. This code:

use std::f32;
fn main() {
    let (min, max) = (0.0f32, 1.0f32 - f32::EPSILON);
    const S23: f32 = (1u64 << 23) as f32;
    const S24: f32 = (1u64 << 24) as f32;
    println!("rng: {} to {}", min, max);
    println!("0.5, 1<<23: {} to {}", min + 0.5 / S23, max + 0.5 / S23);
    println!("0.25, 1<<24: {} to {}", min + 0.25 / S24, max + 0.25 / S24);
}

gives this output:

rng: 0 to 0.9999999
0.5, 1<<23: 0.000000059604645 to 0.99999994
0.25, 1<<24: 0.000000014901161 to 0.9999999

pitdicker · 2018-01-19T06:24:53Z

Ok, you are right. I confused things because I was mostly thinking in terms of number of possible values per exponent until now, and not with epsilon, the precision just outside the range we generate). This seems good to merge to me.

At some point we should have some test for the boundaries, but that is not an easy thing to do with the code as it is now.

@dhardy, what do you think?

dhardy · 2018-01-19T10:31:52Z

Okay, I'll take a look, but might take a little while. I'm hesitant to simply merge without working out the whole picture. @tspiteri did you also look at generation in the [0, 1) half-open interval — i.e. next_f64 etc. in the current master and Uniform01 in my branch?

Also it would be really good to get some unit tests (possibly for end-points and a few fixed positions); related: dhardy#72

pitdicker · 2018-01-19T14:15:50Z

I have set up testing for things like this in some branch, I will look it up. The idea is to use a custom RNG that iterates over a vector with interesting values.

dhardy · 2018-01-23T18:34:08Z

On first look this PR sounds good; I didn't go into comparisons with @pitdicker's code yet. Before merging I'd really like a few test cases however; I may have a go myself (setting up a special RNG to output end-points).

tspiteri · 2018-02-02T13:51:45Z

I've added a commit that tests both edge cases (RNG returns all zeros or all ones) for the half-open [0,1), closed [0,1] and open (0,1) intervals. For closed boundaries, equality is expected, while for open boundaries, the tests make sure that the number is not equal to the boundary but within EPSILON of it.

dhardy · 2018-02-02T16:52:09Z

src/rand_impls.rs

-        assert!(ConstantRng(0xffff_ffff).gen::<f32>() != 1.0);
-        assert!(ConstantRng(0xffff_ffff_ffff_ffff).gen::<f64>() != 1.0);
+        const EPSILON32: f32 = 1.0 / (1u32 << 23) as f32;
+        const EPSILON64: f64 = 1.0 / (1u64 << 52) as f64;


Why not use core::f32::EPSILON?

ε is defined as the difference between 1.0 and the next largest representable number; since all numbers output (excepting Closed01) are strictly less than 1.0 the exponent will be 1 less than for 1.0, hence the largest representable number less than 1 should be 1 - ε / 2.

I first used std::f32::EPSILON but that failed the continuous integration tests and I forgot about core, so I'll change this.

dhardy · 2018-02-02T16:54:45Z

src/rand_impls.rs

+        assert_eq!(zero32, 0.0);
+        assert_eq!(zero64, 0.0);
+        assert!(1.0 - EPSILON32 <= one32 && one32 < 1.0);
+        assert!(1.0 - EPSILON64 <= one64 && one64 < 1.0);


As noted above, I believe this should be exactly 1 - ε / 2. Exact equality tests here would be preferable (excepting the Open01 case near 0, where there are many near representable values).

Actually, one32 should be 1.0 - f32::EPSILON, not 1.0 - f32::EPSILON / 2, since the generator generates a multiple of ɛ in gen::<f32>. If we are to use the exact equality tests, they should be (zero32, one32) == (0.0, 1.0 - f32::EPSILON), (closed_zero32, closed_one32) == (0.0, 1.0) and (open_zero32, open_one32) == (f32::EPSILON / 2, 1.0 - f32::EPSILON / 2). Should I amend the commit to do that?

dhardy · 2018-02-03T12:53:43Z

I reworked these commits a bit; didn't touch the code changes but pushed the tests to the front and added tests for second-smallest-values. I you look at the individual commits its now clear how the tests change.

Still have to think about this a bit but I'm mostly supportive of these changes.

pitdicker · 2018-02-03T13:10:54Z

Is there something you have doubts about?

dhardy · 2018-02-03T14:57:01Z

I still didn't read through dhardy#85 or compare with your code; thought I should do that first.

Edit: also, since 1 - ε/2 is representable and we have spare bits when casting from an integer anyway, it seems odd that this aims for values every ε instead of every ε/2. Is that why the current code uses a larger SCALE factor?

dhardy · 2018-02-03T15:36:51Z

Okay, method used in next_f32 has precision of ε (23 bits for a mantissa represented using 23 bits, even though the mantissa is technically 24 bits and precision of ε / 2 is representable in the whole range). That explains why the largest value in default range is 1 - ε. This is rectifiable by adding ε/2 with prob. half, but probably not efficiently, and I'm not sure whether the scale factor would apply correctly in this case (possibly; after all the current error in the Closed01 case is ε/2).

Edit: sorry, this is what @pitdicker already did.

dhardy · 2018-02-06T15:21:15Z

Merged (via my reworking of the tests linked above)

tspiteri added 2 commits January 15, 2018 19:04

Fix off-by-one error in mantissa_bits passed to float_impls

cd830e7

remove bias in Open01

cd18f48

pitdicker reviewed Jan 18, 2018

View reviewed changes

add tests for floating-point edge cases

a5da781

dhardy reviewed Feb 2, 2018

View reviewed changes

dhardy mentioned this pull request Feb 4, 2018

discussion on uniform float distributions dhardy/rand#85

Open

dhardy closed this Feb 6, 2018

tspiteri deleted the open01-closed01 branch February 6, 2018 15:56

dhardy mentioned this pull request Mar 19, 2018

last bit of random f32 and f64 is not random #315

Closed

Uh oh!

Tweak Open01 and Closed01 #237

Tweak Open01 and Closed01 #237

Uh oh!

Conversation

tspiteri commented Jan 15, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pitdicker commented Jan 15, 2018

Uh oh!

pitdicker commented Jan 15, 2018

Uh oh!

tspiteri commented Jan 17, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pitdicker left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tspiteri commented Jan 18, 2018

Uh oh!

pitdicker commented Jan 19, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dhardy commented Jan 19, 2018

Uh oh!

pitdicker commented Jan 19, 2018

Uh oh!

dhardy commented Jan 23, 2018

Uh oh!

tspiteri commented Feb 2, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dhardy commented Feb 3, 2018

Uh oh!

pitdicker commented Feb 3, 2018

Uh oh!

dhardy commented Feb 3, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dhardy commented Feb 3, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dhardy commented Feb 6, 2018

Uh oh!

Uh oh!

tspiteri commented Jan 15, 2018 •

edited

Loading

tspiteri commented Jan 17, 2018 •

edited

Loading

pitdicker left a comment •

edited

Loading

pitdicker commented Jan 19, 2018 •

edited

Loading

dhardy commented Feb 3, 2018 •

edited

Loading

dhardy commented Feb 3, 2018 •

edited

Loading