Binomial distribution #66

eightysteele · 2024-02-27T20:33:27Z

Purpose

This PR adds support for the binomial distribution.

Log-likelihood

The log PMF for the binomial distribution gives the log probability of observing $k$ successes out of $n$ trials:

$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$

For computing that binomial coefficient, I used the existing log-gamma-fn. Otherwise for large values of $n$ and $k$ it would have overflown on factorials.

Testing

In distribution_test.clj I tested boundary conditions on $n$, $p$, and $k$ and verified some different properties of the distribution that should definitely hold (sum of probabilities, symmetry when $p=0$, mean and variance consistency). Both bb test:clj and bb test:cljs are passing for these.

I also spot checked a range of values against SciPy and Gen.jl (within a precision threshold of 1e-9 of expected values). One of theses tests is currently failing because the difference between expected and actual is right at the threshold boundary. If this is problematic, let me know and I can dig in!

      ;; TODO: failing test (off by 1.9e-9)
      ;; expected: (ish? -3.222306954272568 (dist/logpdf (->binomial 1000000 0.0001) 100))
      ;; actual: (not (ish? -3.222306954272568 -3.2223069561241857))
      (is (ish? -3.222306954272568 (dist/logpdf (->binomial 1000000 0.0001) 100)))

Next steps

A spot check and some feedback would be great. It's entirely possible that I'm missing something here. Happy to make changes.

sritchie

Nice work, just a few comments!

sritchie · 2024-02-27T20:35:01Z

src/gen/distribution/java_util.clj

+(defrecord Binomial [^SplittableRandom rnd n p]
+  d/Sample
+  (sample [_]
+    (.nextBinomial rnd n p))


no way, awesome! Is this a thing? I don't see it in the javadocs... https://docs.oracle.com/javase/8/docs/api/java/util/SplittableRandom.html

sike! i just yanked out the binomial stuff from java-util. i'll follow up with a separate PR to get samples going. then we get benchmark it against kixi and commons.

(i also added binomial-gf-tests that generatively samples from a distribution, which would have caught .nextBinomial)

sritchie · 2024-02-27T20:40:22Z

src/gen/distribution/math/log_likelihood.cljc

+         (>= v 0)
+         (>= n v)
+         (<= 0 p 1)]}
+  (cond


you might use case here to make this more efficient.

oh nice, done.

sritchie · 2024-02-27T20:41:11Z

src/gen/distribution/math/log_likelihood.cljc

+  {:pre [(>= x 0)]}
+  (log-gamma-fn (inc x)))
+
+(defn log-bico


let's make this private, or add ^:no-doc to keep it out of the public API

I would actually make this a local fn with letfn instead of making it its own thing. All of these preconditions are going to run every time, if you put it inside binomial you can skip running those since you're calling it in a place where you know you pass the preconditions.

totally. i made log-fact and log-bico local with letfn.

sritchie · 2024-02-27T20:42:57Z

test/gen/distribution_test.cljc

+    (is 0 (dist/logpdf (->binomial 10 1) 10)))
+
+  (testing "when p = 1 and v < n, probability is 0, log(0) = -Inf"
+    (is ##-Inf(dist/logpdf (->binomial 10 0) 1)))


whoops, missing a space

fixed. should bb lint normally catch that?

looks like it's still there, I'm not sure, maybe it should?

sritchie · 2024-02-27T20:43:42Z

test/gen/distribution_test.cljc

+  ;; properties...
+  (testing "sum of probabilities equals 1"
+    (with-comparator (within 1e-9)
+      (let [n 100


how about making this a generative test? See the checking style from other tests.

great idea. i changed a bunch of tests over to generative. done.

sritchie · 2024-02-27T20:44:30Z

test/gen/distribution_test.cljc

+      (is (dist/logpdf (->binomial n p) v)
+          (dist/logpdf (->binomial n p) (- n v)))))
+
+  (testing "mean and variance consistency where mu = n * p and variance = mu(1 - p)"


can you say more, even if just in comments below, about what you are testing here?

yeah, i added comments with more context. the basic idea is to spot check a couple of well known properties of the distribution. if that seems weird though i can leave them out for sure.

sritchie · 2024-02-27T20:44:45Z

test/gen/distribution_test.cljc

+    (with-comparator (within 1e-9)
+      (is (ish? -7.133546882067904 (dist/logpdf (->binomial 1000000 0.5) 500000)))
+
+      ;; TODO: failing test (off by 1.9e-9)


delete these if unnecessary

done. i dialed down the precision threshold from 1e-9 to 1e-6 and all of these tests are now passing. for spot checking against other libraries, that seems like a peaceful threshold but feel free to push back!

codecov-commenter · 2024-02-27T20:47:11Z

Codecov Report

Attention: Patch coverage is 96.66667% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 70.78%. Comparing base (106b026) to head (fe62fbc).

Files	Patch %	Lines
src/gen/distribution/math/log_likelihood.cljc	94.11%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #66      +/-   ##
==========================================
+ Coverage   70.38%   70.78%   +0.39%     
==========================================
  Files          16       16              
  Lines        1084     1102      +18     
  Branches       23       24       +1     
==========================================
+ Hits          763      780      +17     
  Misses        298      298              
- Partials       23       24       +1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

eightysteele · 2024-02-29T02:44:34Z

@sritchie thanks for the fast feedback! just pushed updates and responded to comments.

i took a look at that score function in webppl to see how it compared to the lanczos approximation approach in log-gamma-fn:

https://github.com/probmods/webppl/blob/dev/src/dists/binomial.ad.js#L66-L107

i took a quick stab at porting to clojure so that i could run a comparison (this can almost certainly be optimized):

(defn binomial
  [n p v]
  {:pre [(integer? n)
         (integer? v)
         (>= v 0)
         (>= n v)
         (<= 0 p 1)]}
  (letfn [(log-fact
            [x]
            (if (< x 2)
              0
              (reduce + (map #(Math/log %) (range 2 (inc x))))))
          (log-bico
            [n m o]
            (let [x (reduce + (map #(Math/log %) (range (inc o) (inc n))))]
              (- x (log-fact m))))]
    (let [[m o] (if (< v (- n v)) [v (- n v)] [(- n v) v])]
      (case p
        0 (if (= v 0)
            0.0     ;; log(1)
            ##-Inf) ;; log(0))
        1 (if (= v n)
            0.0     ;; log(1)
            ##-Inf) ;; log(0)
        (+ (log-bico n m o)
           (* v (Math/log p))
           (* (- n v) (Math/log (- 1 p))))))))

this particular implementation turned out to be pretty slow with a bit of precision loss as $n$ and $v$ got large (7 minutes to run bb test:clj compared to 16 seconds with log-gamma-fn).

here's my hypothesis. the runtime complexity of my original binomial is basically $O(1)$, driven by the lanczos approximation in log-gamma-fn. that's because it involves a fixed number of coefficients and iterations that do not vary with the input. this is assuming constant time for arithmetic operations.

for the binomial test above, the complexity is $O(n)$ from the linear scaling of log summations within the range $n - o$ and the factorial computation $O(m)$. with the optimization to minimize computation range, the worst-case scenario scales linearly with $n$, which bumps the computational cost for larger inputs.

all to say, it seems reasonable to keep the original implementation for now.

eightysteele · 2024-02-29T19:58:47Z

purely for my own learning, i worked out a slightly more efficient implementation that memoizes factorials and swaps map/reduce for loop/recur (to factor out intermediate collections). it's roughly 78% faster than that first version, but still a minute slower than the log-gamma-fn version.

(def memoized-log-fact
  (memoize
   (fn [x]
     (loop [i x
            acc 0.0]
       (if (< i 2)
         acc
         (recur (dec i) (+ acc (Math/log i))))))))

(defn log-bico
  [n m o]
  (let [numerator (loop [i (inc o)
                         acc 0.0]
                    (if (> i n)
                      acc
                      (recur (inc i) (+ acc (Math/log i)))))]
    (- numerator (memoized-log-fact m))))

(defn binomial
  [n p v]
  {:pre [(integer? n)
         (integer? v)
         (>= v 0)
         (>= n v)
         (<= 0 p 1)]}
  (let [[m o] (if (< v (- n v)) [v (- n v)] [(- n v) v])]
    (case p
      0 (if (= v 0) 0.0 ##-Inf)
      1 (if (= v n) 0.0 ##-Inf)
      (+ (log-bico n m o)
         (* v (Math/log p))
         (* (- n v) (Math/log (- 1 p)))))))

eightysteele · 2024-03-08T18:18:28Z

@sritchie with log-gamma-fn in terms of kixi, all tests are now passing (including binomial comparisons to scipy and gen.jl with a precision threshold dialed up to 1e-12). it is my belief that the requested changes here have been addressed, but lmk!

sritchie · 2024-03-12T21:27:06Z

src/gen/distribution/math/log_likelihood.cljc

+  parameterized by `n` (number of trials) and `p` (probability of success in
+  each trial) at the value `v` (number of successes)."
+  [n p v]
+  {:pre [(integer? n)


I think in ClojureScript, number? checks if you have a goog.math/Integer... is that right? I want to check on how we did this in Emmy, I feel like there is some other predicate we should use. I could be wrong, flying blind here before a meeting..

yeah not sure, but it doesn't look like cljs checks for goog.math/Integer:

https://github.com/clojure/clojurescript/blob/acbefb9b1e79b659b639919fbf96cb3726719e25/src/main/cljs/cljs/core.cljs#L2328-L2334

in emmy i don't see an integer? defined, but the value namespace does have predicates like real? and number? that do use goog.math/...:

https://github.com/mentat-collective/emmy/blob/main/src/emmy/value.cljc

BUT if i understand the compatibility layer between clj and cljs, if we use integer? in clojure code and then compile it to cljs, the cljs version of integer? should be used when the code runs in a js environment. if that's the case, then it's plausible that this is the right predicate to use?

sritchie · 2024-03-12T22:31:17Z

Thanks @eightysteele , this looks great!!

eightysteele · 2024-03-13T00:08:51Z

thanks @sritchie! \o/

initial commit

346b5d0

sritchie requested changes Feb 27, 2024

View reviewed changes

updates with requested changes

0484bfb

eightysteele requested a review from sritchie February 29, 2024 03:06

log-gamma-fn dispatch to kixi

fe62fbc

sritchie reviewed Mar 12, 2024

View reviewed changes

sritchie approved these changes Mar 12, 2024

View reviewed changes

sritchie merged commit 3c03777 into ChiSym:main Mar 12, 2024
4 checks passed

eightysteele deleted the eighty-binomial branch April 11, 2024 02:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Binomial distribution #66

Binomial distribution #66

eightysteele commented Feb 27, 2024

sritchie left a comment

sritchie Feb 27, 2024

eightysteele Feb 29, 2024

eightysteele Feb 29, 2024

sritchie Feb 27, 2024

eightysteele Feb 29, 2024

sritchie Feb 27, 2024

sritchie Feb 27, 2024

eightysteele Feb 29, 2024

sritchie Feb 27, 2024

eightysteele Feb 29, 2024

sritchie Mar 12, 2024

sritchie Feb 27, 2024

eightysteele Feb 29, 2024

sritchie Feb 27, 2024

eightysteele Feb 29, 2024

sritchie Feb 27, 2024

eightysteele Feb 29, 2024

codecov-commenter commented Feb 27, 2024 •

edited

Loading

eightysteele commented Feb 29, 2024

eightysteele commented Feb 29, 2024 •

edited

Loading

eightysteele commented Mar 8, 2024

sritchie Mar 12, 2024

eightysteele Mar 13, 2024

sritchie commented Mar 12, 2024

eightysteele commented Mar 13, 2024

Binomial distribution #66

Binomial distribution #66

Conversation

eightysteele commented Feb 27, 2024

Purpose

Log-likelihood

Testing

Next steps

sritchie left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov-commenter commented Feb 27, 2024 • edited Loading

Codecov Report

eightysteele commented Feb 29, 2024

eightysteele commented Feb 29, 2024 • edited Loading

eightysteele commented Mar 8, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sritchie commented Mar 12, 2024

eightysteele commented Mar 13, 2024

codecov-commenter commented Feb 27, 2024 •

edited

Loading

eightysteele commented Feb 29, 2024 •

edited

Loading