Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binomial distribution #66

Merged
merged 3 commits into from
Mar 12, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -19,4 +19,5 @@ pom.xml
!template/pom.xml
pom.xml.asc
node_modules
**.shadow-cljs
**.shadow-cljs
.#*
2 changes: 1 addition & 1 deletion package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

7 changes: 7 additions & 0 deletions src/gen/distribution/commons_math.clj
Original file line number Diff line number Diff line change
Expand Up @@ -68,6 +68,10 @@
0 false
1 true)))))

(defn binomial-distribution
([n ^double p]
(BinomialDistribution. (rng) n p)))

(defn beta-distribution
([] (beta-distribution 1.0 1.0))
([^double alpha ^double beta]
Expand Down Expand Up @@ -129,6 +133,9 @@
(def bernoulli
(d/->GenerativeFn bernoulli-distribution 1))

(def binomial
(d/->GenerativeFn binomial-distribution 2))

(def beta
(d/->GenerativeFn beta-distribution 2))

Expand Down
16 changes: 16 additions & 0 deletions src/gen/distribution/java_util.clj
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,15 @@
(d/logpdf [_ v]
(ll/bernoulli p v)))

(defrecord Binomial [^SplittableRandom rnd n p]
d/Sample
(sample [_]
(.nextBinomial rnd n p))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no way, awesome! Is this a thing? I don't see it in the javadocs... https://docs.oracle.com/javase/8/docs/api/java/util/SplittableRandom.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sike! i just yanked out the binomial stuff from java-util. i'll follow up with a separate PR to get samples going. then we get benchmark it against kixi and commons.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(i also added binomial-gf-tests that generatively samples from a distribution, which would have caught .nextBinomial)


d/LogPDF
(d/logpdf [_ v]
(ll/binomial n p v)))

(defrecord Gaussian [^SplittableRandom rnd mu sigma]
d/Sample
(sample [_]
Expand All @@ -43,6 +52,10 @@
([] (bernoulli-distribution 0.5))
([p] (->Bernoulli (rng) p)))

(defn binomial-distribution
[n p]
(->Binomial (rng) n p))

(defn uniform-distribution
([] (uniform-distribution 0.0 1.0))
([lo hi] (->Uniform (rng) lo hi)))
Expand All @@ -57,6 +70,9 @@
(def bernoulli
(d/->GenerativeFn bernoulli-distribution 1))

(def binomial
(d/->GenerativeFn binomial-distribution 2))

(def uniform
(d/->GenerativeFn uniform-distribution 2))

Expand Down
16 changes: 16 additions & 0 deletions src/gen/distribution/kixi.cljc
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
[kixi.stats.distribution :as k])
#?(:clj
(:import (kixi.stats.distribution Bernoulli Cauchy
Binomial
Exponential Beta
Gamma Normal Uniform T))))

Expand All @@ -22,6 +23,14 @@
(logpdf [this v]
(ll/bernoulli (.-p this) v)))

(extend-type #?(:clj Binomial :cljs k/Binomial)
d/Sample
(sample [this] (k/draw this))

d/LogPDF
(logpdf [this v]
(ll/binomial (.-n this) (.-p this) v)))

(extend-type #?(:clj Beta :cljs k/Beta)
d/Sample
(sample [this] (k/draw this))
Expand Down Expand Up @@ -106,6 +115,10 @@
([] (bernoulli-distribution 0.5))
([p] (k/bernoulli {:p p})))

(defn binomial-distribution
([n p]
(k/binomial {:n n :p p})))

(defn beta-distribution
([] (beta-distribution 1.0 1.0))
([alpha beta]
Expand Down Expand Up @@ -143,6 +156,9 @@
(def bernoulli
(d/->GenerativeFn bernoulli-distribution 1))

(def binomial
(d/->GenerativeFn binomial-distribution 2))

(def beta
(d/->GenerativeFn beta-distribution 2))

Expand Down
40 changes: 40 additions & 0 deletions src/gen/distribution/math/log_likelihood.cljc
Original file line number Diff line number Diff line change
Expand Up @@ -97,6 +97,46 @@
{:pre [(<= 0 p 1)]}
(Math/log (if v p (- 1.0 p))))

(defn log-fact
"Returns the natural logarithm of `x` factorial."
[x]
{:pre [(>= x 0)]}
(log-gamma-fn (inc x)))

(defn log-bico
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's make this private, or add ^:no-doc to keep it out of the public API

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually make this a local fn with letfn instead of making it its own thing. All of these preconditions are going to run every time, if you put it inside binomial you can skip running those since you're calling it in a place where you know you pass the preconditions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totally. i made log-fact and log-bico local with letfn.

"Returns the natural logorithm of the binomial coefficient, `n` choose `k`."
[n k]
{:pre [(integer? n)
(integer? k)
(>= k 0)
(>= n k)]}
(if (or (zero? k) (= k n))
0 ;; log 1
(- (log-fact n) (log-fact k) (log-fact (- n k)))))

(defn binomial
"Returns the log-likelihood of a [Binomial
distribution](https://en.wikipedia.org/wiki/Binomial_distribution)
parameterized by `n` (number of trials) and `p` (probability of success in
each trial) at the value `v` (number of successes)."
[n p v]
{:pre [(integer? n)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in ClojureScript, number? checks if you have a goog.math/Integer... is that right? I want to check on how we did this in Emmy, I feel like there is some other predicate we should use. I could be wrong, flying blind here before a meeting..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah not sure, but it doesn't look like cljs checks for goog.math/Integer:

https://github.com/clojure/clojurescript/blob/acbefb9b1e79b659b639919fbf96cb3726719e25/src/main/cljs/cljs/core.cljs#L2328-L2334

in emmy i don't see an integer? defined, but the value namespace does have predicates like real? and number? that do use goog.math/...:

https://github.com/mentat-collective/emmy/blob/main/src/emmy/value.cljc

BUT if i understand the compatibility layer between clj and cljs, if we use integer? in clojure code and then compile it to cljs, the cljs version of integer? should be used when the code runs in a js environment. if that's the case, then it's plausible that this is the right predicate to use?

(integer? v)
(>= v 0)
(>= n v)
(<= 0 p 1)]}
(cond
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you might use case here to make this more efficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh nice, done.

(= p 0) (if (= v 0)
0.0 ;; log(1)
##-Inf) ;; log(0)
(= p 1) (if (= v n)
0.0 ;; log(1)
##-Inf) ;; log(0)
:else
(+ (log-bico n v)
(* v (Math/log p))
(* (- n v) (Math/log (- 1 p))))))

(defn cauchy
"Returns the log-likelihood of a [Cauchy
distribution](https://en.wikipedia.org/wiki/Cauchy_distribution) parameterized
Expand Down
3 changes: 3 additions & 0 deletions test/gen/distribution/commons_math_test.clj
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,9 @@
(dt/bernoulli-tests commons/bernoulli-distribution)
(dt/bernoulli-gfi-tests commons/bernoulli))

(deftest binomial-tests
(dt/binomial-tests commons/binomial-distribution))

(deftest beta-tests
(dt/beta-tests commons/beta-distribution))

Expand Down
3 changes: 3 additions & 0 deletions test/gen/distribution/java_util_test.clj
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,9 @@
(dt/bernoulli-tests java-util/bernoulli-distribution)
(dt/bernoulli-gfi-tests java-util/bernoulli))

(deftest binomial-tests
(dt/binomial-tests java-util/binomial-distribution))

(deftest uniform-tests
(dt/uniform-tests java-util/uniform-distribution))

Expand Down
3 changes: 3 additions & 0 deletions test/gen/distribution/kixi_test.cljc
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,9 @@
(dt/bernoulli-tests kixi/bernoulli-distribution)
(dt/bernoulli-gfi-tests kixi/bernoulli))

(deftest binomial-tests
(dt/binomial-tests kixi/binomial-distribution))

(deftest beta-tests
(dt/beta-tests kixi/beta-distribution))

Expand Down
3 changes: 3 additions & 0 deletions test/gen/distribution/math/log_likelihood_test.cljc
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,9 @@
(deftest bernoulli-tests
(dt/bernoulli-tests (->logpdf ll/bernoulli)))

(deftest binomial-tests
(dt/binomial-tests (->logpdf ll/binomial)))

(deftest cauchy-tests
(dt/cauchy-tests (->logpdf ll/cauchy)))

Expand Down
81 changes: 81 additions & 0 deletions test/gen/distribution_test.cljc
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,87 @@
(Math/exp (dist/logpdf (->bernoulli p) (not v)))))
"All options sum to 1")))

(defn binomial-tests [->binomial]
;; boundaries...
(testing "when p = 0 and v = 0, probability is 1, log(1) = 0"
(is 0 (dist/logpdf (->binomial 10 0) 0)))

(testing "when p = 0 and v > 0, probability is 0, log(0) = -Inf"
(is ##-Inf (dist/logpdf (->binomial 10 0) 1)))

(testing "when p = 1 and v = n, probability is 1, log(1) = 0"
(is 0 (dist/logpdf (->binomial 10 1) 10)))

(testing "when p = 1 and v < n, probability is 0, log(0) = -Inf"
(is ##-Inf(dist/logpdf (->binomial 10 0) 1)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops, missing a space

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. should bb lint normally catch that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like it's still there, I'm not sure, maybe it should?


;; properties...
(testing "sum of probabilities equals 1"
(with-comparator (within 1e-9)
(let [n 100
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about making this a generative test? See the checking style from other tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great idea. i changed a bunch of tests over to generative. done.

p 0.5
log-probs (map (fn [k] (dist/logpdf (->binomial n p) k)) (range 0 (inc n)))
probs (map (fn [x] (Math/exp x)) log-probs)
sum-probs (reduce + probs)]
(is (ish? 1.0 sum-probs)))))

(testing "symmetric when p = 0.5 such that binomial(k) = binomial(n -k)"
(let [n 100
p 0.5
v 10]
(is (dist/logpdf (->binomial n p) v)
(dist/logpdf (->binomial n p) (- n v)))))

(testing "mean and variance consistency where mu = n * p and variance = mu(1 - p)"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you say more, even if just in comments below, about what you are testing here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i added comments with more context. the basic idea is to spot check a couple of well known properties of the distribution. if that seems weird though i can leave them out for sure.

(with-comparator (within 1e-9)
(let [n 100
p 0.3
ks (range 0 (inc n))
log-probs (map (fn [k] (dist/logpdf (->binomial n p) k)) ks)
probs (map (fn [x] (Math/exp x)) log-probs)
mu (reduce + (map * probs ks))
variance (reduce + (map (fn [k p] (* p (Math/pow (- k mu) 2))) ks probs))
theoretical-mu (* n p)
theoretical-variance (* n p (- 1 p))]
(is (ish? theoretical-mu mu))
(is (ish? theoretical-variance variance)))))

(testing "spot check against scipy.stats.binom.logpmf (v1.12.0)"
(with-comparator (within 1e-9)
(is (ish? -7.13354688230902 (dist/logpdf (->binomial 1000000 0.5) 500000)))

;; TODO: failing test (off by 1.9e-9)
;; expected: (ish? -3.222306954272568 (dist/logpdf (->binomial 1000000 0.0001) 100))
;; actual: (not (ish? -3.222306954272568 -3.2223069561241857))
(is (ish? -3.222306954272568 (dist/logpdf (->binomial 1000000 0.0001) 100)))

(is (ish? -8.047189562170502 (dist/logpdf (->binomial 5 0.2) 5)))
(is (ish? -1.1856136373815076 (dist/logpdf (->binomial 50 0.99) 49)))
(is (ish? -1.185613637381508 (dist/logpdf (->binomial 50 0.01) 1)))
(is (ish? -693133.3650493873 (dist/logpdf (->binomial 1000000 0.5) 999999)))
(is (ish? 0 (dist/logpdf (->binomial 10 0) 0)))
(is (ish? 0 (dist/logpdf (->binomial 10 1) 10)))
(is (ish? -2.02597397686619 (dist/logpdf (->binomial 100 0.9) 90)))
(is (ish? -52.680257828913156 (dist/logpdf (->binomial 500 0.1) 0)))))

(testing "spot check against gen logpdf (v0.4.6)"
(with-comparator (within 1e-9)
(is (ish? -7.133546882067904 (dist/logpdf (->binomial 1000000 0.5) 500000)))

;; TODO: failing test (off by 1.9e-9)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete these if unnecessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. i dialed down the precision threshold from 1e-9 to 1e-6 and all of these tests are now passing. for spot checking against other libraries, that seems like a peaceful threshold but feel free to push back!

;; expected: (ish? -3.222306954262436 (dist/logpdf (->binomial 1000000 0.0001) 100))
;; actual: (not (ish? -3.222306954262436 -3.2223069561241857))
(is (ish? -3.222306954262436 (dist/logpdf (->binomial 1000000 0.0001) 100)))

(is (ish? -8.047189562170502 (dist/logpdf (->binomial 5 0.2) 5)))
(is (ish? -1.185613637381516 (dist/logpdf (->binomial 50 0.99) 49)))
(is (ish? -1.1856136373815152 (dist/logpdf (->binomial 50 0.01) 1)))
(is (ish? -693133.3650493873 (dist/logpdf (->binomial 1000000 0.5) 999999)))
(is (ish? 0 (dist/logpdf (->binomial 10 0) 0)))
(is (ish? 0 (dist/logpdf (->binomial 10 1) 10)))
(is (ish? -2.025973976866184 (dist/logpdf (->binomial 100 0.9) 90)))
(is (ish? -52.680257828913156 (dist/logpdf (->binomial 500 0.1) 0))))))

(defn categorical-tests [->cat]
(checking "map => categorical properties"
[p (gen-double 0 1)]
Expand Down
Loading