imprecise-evolution

Imprecise probability and population genetics in OCaml using Owl and other libraries. Experimental work in progress.

(This page needs revision.)

Some of this will be reorganized in the future. Some of it was written when I was learning OCaml. (I am still learning OCaml.)

A few resources for learning about imprecise probability or population genetics are listed below.

The general idea

The modules I consider most useful here make use of what I call "distlists", which represent sequences of sets of probability distributions. These are probabilities of frequencies of organism traits in a population of organisms. For example, these might be frequencies of (type) alleles at a genetic locus, with two (token) alleles for each organism, so that the total number of (token) alleles is 2N, where N is the number of organisms in the population. (For "type" and "token" see Type-token distinction; these terms come from philosophy but should be a bit clearer for novices than ways of saying the same thing that are common in population genetics.)

Sets of probability distributions are often known as "credal sets" in the imprecise probability literature, although sometimes that term is used more narrowly. (However, "credal" comes from "credence", i.e. probability as degree of belief as in the Bayesian tradition. I think of the probabilities here as objective probabilities, which are usually called "chances" in contemporary philosophy of probability.)

Credal sets here are implemented as regular OCaml lists of Owl row vector matrices (i.e. matrices with dimension 1xn). Distlists are implemented as lazy lists of credal sets, so a distlist is a lazy list of regular lists of matrices. Currently, I'm using LazyList and List from the Batteries Included library.
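A minimal sketch of this layering, using only the standard library. The names `dist`, `credal_set`, `distlist`, `iterate`, and `take` are hypothetical and are not taken from the repo. Float arrays stand in for Owl 1xn matrices, and `Seq` (OCaml 4.07 or later) stands in for Batteries' `LazyList`. Note that `Seq` does not memoize, so this is only an approximation of a lazy list.

```ocaml
(* Hypothetical stand-ins for the repo's types. *)
type dist = float array            (* one probability distribution *)
type credal_set = dist list        (* a finite set of distributions *)
type distlist = credal_set Seq.t   (* a lazily generated sequence of credal sets *)

(* Build a distlist by repeatedly applying [step] to an initial credal set.
   Nothing is computed until an element is demanded. *)
let rec iterate (step : credal_set -> credal_set) (init : credal_set) : distlist =
  fun () -> Seq.Cons (init, iterate step (step init))

(* Force only the first [n] credal sets of a distlist. *)
let rec take n (s : distlist) : credal_set list =
  if n <= 0 then []
  else
    match s () with
    | Seq.Nil -> []
    | Seq.Cons (x, rest) -> x :: take (n - 1) rest

(* Usage: an infinite distlist in which every credal set is the same. *)
let first_three = take 3 (iterate (fun cs -> cs) [ [| 0.5; 0.5 |] ])
```

Laziness matters here because a distlist is conceptually unbounded: only the generations that are actually inspected or plotted need to be computed.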

What's here

Note that in OCaml, a file named xyz.ml typically defines a module named Xyz.

Some of the following might be a little bit out of date at any particular time.

src/lib/models

tranmats.ml: Basic distlist functions

wrightfisher.ml: Simultaneous Wright-Fisher models

For modeling finite sets of transition probabilities generated by Wright-Fisher models with natural selection.

Uses distlists from Tranmats.

The credal sets here are supposed to represent collections of finite, discrete probability distributions. Typically, the number of distributions in each successive credal set of a distlist grows exponentially.
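The growth can be illustrated with a rough sketch under stated assumptions; it is not the code in wrightfisher.ml. It uses the textbook diploid Wright-Fisher transition probabilities with viability selection, and it assumes that each generation applies every transition matrix to every distribution in the current credal set. All names (`choose`, `select`, `wf_matrix`, `vec_mat`, `step`) and the fitness values are hypothetical, and float arrays stand in for Owl matrices.

```ocaml
(* Binomial coefficient as a float, computed multiplicatively. *)
let choose n k =
  let k = min k (n - k) in
  let rec go acc i =
    if i > k then acc
    else go (acc *. float_of_int (n - k + i) /. float_of_int i) (i + 1)
  in
  go 1.0 1

(* Frequency of allele A after selection, given genotype fitnesses
   w_aa (AA), w_ab (AB), w_bb (BB) and pre-selection frequency p.
   Assumes positive fitnesses. *)
let select ~w_aa ~w_ab ~w_bb p =
  let q = 1.0 -. p in
  let mean_w = (w_aa *. p *. p) +. (2.0 *. w_ab *. p *. q) +. (w_bb *. q *. q) in
  ((w_aa *. p *. p) +. (w_ab *. p *. q)) /. mean_w

(* Entry (i, j): probability of moving from i to j copies of A among the
   2N token alleles, i.e. a binomial draw at the post-selection frequency.
   [two_n] is 2N. *)
let wf_matrix ~two_n ~w_aa ~w_ab ~w_bb =
  Array.init (two_n + 1) (fun i ->
      let p = select ~w_aa ~w_ab ~w_bb (float_of_int i /. float_of_int two_n) in
      Array.init (two_n + 1) (fun j ->
          choose two_n j
          *. (p ** float_of_int j)
          *. ((1.0 -. p) ** float_of_int (two_n - j))))

(* Row vector times matrix, with the vector on the left. *)
let vec_mat v m =
  Array.init (Array.length m.(0)) (fun j ->
      let s = ref 0.0 in
      Array.iteri (fun i vi -> s := !s +. (vi *. m.(i).(j))) v;
      !s)

(* One generation: apply every matrix to every distribution. With k
   matrices, the credal set is k times larger after each step
   (counting duplicates). *)
let step mats credal_set =
  List.concat (List.map (fun d -> List.map (vec_mat d) mats) credal_set)

(* Usage: two selection regimes, N = 2, starting with 2 copies of A. *)
let mats =
  [ wf_matrix ~two_n:4 ~w_aa:1.0 ~w_ab:1.0 ~w_bb:1.0;
    wf_matrix ~two_n:4 ~w_aa:1.2 ~w_ab:1.1 ~w_bb:1.0 ]
let init = [ [| 0.0; 0.0; 1.0; 0.0; 0.0 |] ]
let gen1 = step mats init (* 2 distributions *)
let gen2 = step mats gen1 (* 4 distributions *)
```

With k matrices, generation t holds k^t distributions under this scheme, which is why forcing a distlist very far quickly becomes expensive.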

setchains.ml: Implementations of algorithms in ch. 2 of Markov Set-Chains, by Darald J. Hartfiel, Springer 1998

These model continuous sets of stochastic transition matrices that fall within a matrix "interval", i.e. all stochastic matrices in which each element x satisfies l <= x <= h, where l and h are the corresponding elements of a low matrix L and a high matrix H. You can also start from an initial probability "interval", i.e. all stochastic vectors in which each element x satisfies l <= x <= h, where l and h are the corresponding elements of low and high vectors. Note that the low and high vectors are not typically stochastic vectors, nor are the low and high matrices typically transition matrices. (Stochastic vectors here are row vectors, and it's each row of a matrix that sums to 1, so multiplication of a vector and a matrix typically happens with the vector on the left.)
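The interval condition and the row-vector convention can be sketched as follows. This is an illustration only: the function names and the example matrices are hypothetical, and float arrays are used in place of Owl matrices.

```ocaml
(* A row is stochastic if its entries are nonnegative and sum to 1
   (up to a small tolerance). *)
let is_stochastic_row ?(eps = 1e-9) row =
  Array.for_all (fun x -> x >= 0.0) row
  && abs_float (Array.fold_left ( +. ) 0.0 row -. 1.0) <= eps

(* Elementwise check that lo.(i) <= x.(i) <= hi.(i). *)
let within lo hi x =
  let ok = ref true in
  Array.iteri (fun i xi -> if xi < lo.(i) || xi > hi.(i) then ok := false) x;
  !ok

(* A matrix belongs to the interval [lo, hi] if every row is stochastic
   and every element lies between the corresponding bounds. *)
let in_interval ~lo ~hi m =
  let ok = ref true in
  Array.iteri
    (fun i row ->
      if not (is_stochastic_row row && within lo.(i) hi.(i) row) then ok := false)
    m;
  !ok

(* Row vector times matrix, with the vector on the left. *)
let vec_mat v m =
  Array.init (Array.length m.(0)) (fun j ->
      let s = ref 0.0 in
      Array.iteri (fun i vi -> s := !s +. (vi *. m.(i).(j))) v;
      !s)

(* Usage: the bounds are not stochastic themselves (rows of [low] sum
   to 0.8, rows of [high] to 1.2), but [m] lies between them and is. *)
let low = [| [| 0.2; 0.6 |]; [| 0.3; 0.5 |] |]
let high = [| [| 0.4; 0.8 |]; [| 0.5; 0.7 |] |]
let m = [| [| 0.3; 0.7 |]; [| 0.4; 0.6 |] |]
let m_ok = in_interval ~lo:low ~hi:high m
let next = vec_mat [| 0.5; 0.5 |] m (* approximately [| 0.35; 0.65 |] *)
```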

Uses distlists from Tranmats by way of Wrightfisher.

The credal sets in the distlists are all of length 2, each pair representing a tight estimate on the upper and lower bounds of a continuous "interval" of probability distributions.
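One building block for bounds of this kind is finding the largest or smallest value of a dot product p·x as p ranges over all stochastic vectors inside a vector interval. The sketch below uses a standard greedy solution to that small linear program: start from the low vector, then hand out the remaining probability mass to the components with the largest x first. It is an illustration under that assumption, not a transcription of Hartfiel's algorithms or of setchains.ml, and the names `max_dot` and `min_dot` are hypothetical.

```ocaml
(* Maximize the sum of p.(i) *. x.(i) over stochastic vectors p with
   lo.(i) <= p.(i) <= hi.(i). Assumes sum lo <= 1 <= sum hi, so that
   the interval contains at least one stochastic vector. *)
let max_dot ~lo ~hi x =
  let n = Array.length x in
  let p = Array.copy lo in
  (* Probability mass still to be distributed above the lower bounds. *)
  let slack = ref (1.0 -. Array.fold_left ( +. ) 0.0 lo) in
  (* Indices ordered by decreasing x. *)
  let order = Array.init n (fun i -> i) in
  Array.sort (fun i j -> compare x.(j) x.(i)) order;
  Array.iter
    (fun i ->
      let add = min !slack (hi.(i) -. lo.(i)) in
      p.(i) <- p.(i) +. add;
      slack := !slack -. add)
    order;
  let total = ref 0.0 in
  Array.iteri (fun i pi -> total := !total +. (pi *. x.(i))) p;
  !total

(* The minimum is the negated maximum of the negated objective. *)
let min_dot ~lo ~hi x = -.max_dot ~lo ~hi (Array.map (fun v -> -.v) x)

(* Usage with a three-state interval. *)
let lo = [| 0.2; 0.3; 0.1 |]
let hi = [| 0.5; 0.6; 0.4 |]
let x = [| 1.0; 0.0; 2.0 |]
let upper = max_dot ~lo ~hi x (* about 1.1, attained at p = (0.3, 0.3, 0.4) *)
let lower = min_dot ~lo ~hi x (* about 0.5, attained at p = (0.3, 0.6, 0.1) *)
```

Applying such a routine column by column gives componentwise lower and upper bounds for one step, which matches the shape of the length-2 credal sets described above.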

credalsetIO.ml: Generate data files and PDF plots

Creates data files and PDF files from distlists.

src/lib/utils

genl.ml: general-purpose utilities

prob.ml: probability-related utilities

src/misc

miscellaneous code, potentially useful now or in the future, including:

setchain_egs.ml: definitions for specific examples in Hartfiel (see above)

src/bin

Once you have installed OCaml 4.06.1 and the necessary libraries (a big job), you should be able to build executable files for the following by entering make in the root directory of the repo. The exe files should show up in _build/default/src/bin.

wrightfisherPDFs.ml: program to generate Wright-Fisher model PDF files.

(Note that you're not restricted to imprecise-probability models. You can use this program to create plots for standard, precise-probability Wright-Fisher diploid models.)

setchainPDFs.ml: program to generate set-chain PDF files.

setchaintest.ml: miscellaneous tests for setchains.ml

(I last compiled the executable using OCaml 4.06.1 and jbuilder 1.0+beta20. jbuilder has subsequently been renamed to dune, and there have been some significant changes in OCaml, dune, and the libraries on which my code depends, so you may have to install old versions of packages to get this to work without modification.)

doc

Miscellaneous notes. Don't expect to find any systematic documentation here.

Resources for learning about imprecise probability or population genetics

Imprecise probability

SIPTA: The Society for Imprecise Probability: Theories and Applications.

Imprecise Probabilities by Seamus Bradley in the Stanford Encyclopedia of Philosophy.

Terrence L. Fine's papers. (There are many other important writers on imprecise probability (IP), but most of the work is focused on IP as an extension of Bayesian probability. Fine and his collaborators have done the most work on objective imprecise probability, which is what particularly interests me.)

A few good books:

  • A handbook-style introductory survey, Introduction to Imprecise Probabilities edited by Augustin et al. (Remember that when it comes to math, one person's introduction is another person's advanced textbook.)
  • The perhaps easier Lower Previsions by Troffaes and de Cooman. (The same comment applies.)
  • A philosophical classic, The Enterprise of Knowledge by Isaac Levi.
  • The philosophical and mathematical classic of the field, Statistical Reasoning with Imprecise Probabilities by Peter Walley.
  • Markov Set-Chains by Darald J. Hartfiel. A beautiful little book on methods for extending Markov chains for one kind of imprecise probability (or uncertainty, which is how Hartfiel frames it).

There's much more.

Population genetics

Where to start? This is a huge area in evolutionary biology.

Samir Okasha's article in the Stanford Encyclopedia of Philosophy provides an overview of basic ideas as well as history and philosophical issues.

My favorite textbooks include:

  • Population Genetics: A Concise Guide by John H. Gillespie. A great way to get into the mindset of population genetics. The first and second editions are similar, but there is valuable material in each that's not in the other. Includes some topics that are rarely covered in introductory texts.

  • Elements of Evolutionary Genetics by Charlesworth and Charlesworth. Yes, it's a thick book, but that's because the Charlesworths take care to explain ideas clearly and to explore many details and subtleties. Beautiful.

  • Evolutionary Theory: Mathematical and Conceptual Foundations by Sean H. Rice. A philosophically sensitive book by a mathematical biologist. Rice provides a unique but illuminating conceptual organization for population genetics and related areas.