diff --git a/CHANGELOG.md b/CHANGELOG.md index 855f1443..ea9128e6 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -12,6 +12,13 @@ mutations via `singletons_phased=False` option. The API is preliminary and may change. +**Documentation** + +- Fixed description of priors for the variational gamma method, which were referred + to as 'flat' or improper but are actually empirical Bayes priors on root node ages, + fit by expectation maximization. + + ## [0.2.0] - 2024-06-10 diff --git a/docs/introduction.md b/docs/introduction.md index ce3994fc..f24031ab 100644 --- a/docs/introduction.md +++ b/docs/introduction.md @@ -50,7 +50,9 @@ Optionally, a posterior distribution of node times can be generated Since the method is Bayesian, technically it requires each node to have a prior distribution of times. The default `variational_gamma` method currently -uses an improper (flat) prior which does not need any user input. However, +only imposes a prior on internal nodes via their topological connection to the +root nodes (which are given an exponential prior). This Empirical Bayes procedure +does not need any user input. However, the alternative discrete-time methods currently require the prior to be explicitly provided, either via providing an estimated effective population size (which is then used in the diff --git a/docs/methods.md b/docs/methods.md index bd6b9054..27493ebf 100644 --- a/docs/methods.md +++ b/docs/methods.md @@ -145,11 +145,11 @@ TODO: describe the rescaling step in more detail. Could also link to [the popula ## Discrete-time The available discrete-time algorithms are the `inside_outside` and `maximization` methods.
-For historical reasons, these approaches do not use a flat prior, -but use the [conditional coalescent prior](sec_priors_conditional_coalescent), +For historical reasons, these approaches use an informative (node-specific) prior, +the [conditional coalescent prior](sec_priors_conditional_coalescent), which means that you either need to provide them with an estimated effective population size, or a [priors](sec_priors) object. Future improvements may -allow flat priors to be set in discrete time methods, and coalescent priors +allow Empirical Bayes priors to be set in discrete time methods, and coalescent priors to be set in continuous time methods. The _tsdate_ discrete time methods have the following advantages and disadvantages: diff --git a/docs/popsize.md b/docs/popsize.md index cfdc15e3..5067a437 100644 --- a/docs/popsize.md +++ b/docs/popsize.md @@ -123,8 +123,11 @@ ax.set_ylabel("Population size", rotation=90); ## Misspecified priors -The flat prior for the default `variational_gamma` [method](sec_methods) is robust to -deviations from neutrality and panmixia. However, alternative approaches such as the +The rescaling method for the default `variational_gamma` [method](sec_methods) +(inspired by the rescaling algorithm described by [Deng et al +2024](https://doi.org/10.1101/2024.03.16.585351)), coupled with the absence of a strong +informative prior on internal nodes, means that we expect this method to be +robust to deviations from neutrality and panmixia. However, alternative approaches such as the `inside_outside` method default to a coalescent prior that assumes a fixed population size. Hence these approaches currently perform very poorly on such data: diff --git a/docs/priors.md b/docs/priors.md index 6e56d767..48a3d29d 100644 --- a/docs/priors.md +++ b/docs/priors.md @@ -20,8 +20,10 @@ kernelspec: Note that currently, you only need to set specific priors if you are using the alternative `inside_outside` or `maximization` [methods](sec_methods). 
This page is primarily left in the -documentation for historical reasons: for most purposes we recommend the default -`variational_gamma` method, which uses an unparameterized flat (improper) prior. +documentation for historical reasons. For most purposes we recommend the default +`variational_gamma` method: this only sets a prior on root ages and not on internal nodes +(whose priors are essentially flat, constrained only by their topological connection +to the roots, and which are updated via the expectation propagation mechanism). ## Basic usage @@ -108,7 +110,7 @@ flexibility.) ## The conditional coalescent -Currently, non-flat priors are based on the +Currently, informative node-specific priors are based on the [conditional coalescent](http://dx.doi.org/10.1006/tpbi.1998.1411). Specifically, in a tree sequence of `s` samples, the distribution of times for a node that always has `n` descendant samples is taken from the theoretical distribution of times diff --git a/docs/usage.md b/docs/usage.md index 0fc0d767..d89f2385 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -218,7 +218,7 @@ mu_per_bp_per_year = 3.4e-10 # Human generation time ~ 29 years ts_years = tsdate.date(ts, mutation_rate=mu_per_bp_per_year, time_units="years") ``` -However, if you are specifying a non-flat prior, e.g. because you are using a +However, if you are specifying a node-specific prior, e.g. because you are using a discrete-time method, you will also need to change the scale of the prior. In particular, if you are setting the prior using the `population_size` argument, you will also need to modify that by multiplying it by the generation time. For example: @@ -349,4 +349,4 @@ If unary regions are *correctly* estimated, they can help improve dating slightl There is therefore a specific route to date a tree sequence containing locally unary nodes. For example, for discrete time methods, you can use the `allow_unary` option when {ref}`building a prior`.
-::: \ No newline at end of file +:::
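The empirical Bayes root prior described in this change (an exponential prior on root ages whose rate is fit by expectation maximization) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not tsdate's implementation: the function `fit_root_rate_em`, the Poisson mutation model, and all parameter values are hypothetical. It assumes each root age `t_i` has an `Exp(rate)` prior and that the number of mutations above root `i` is `Poisson(mu * t_i)`. Because the exponential is a Gamma(1, rate) distribution, the posterior of each age is Gamma(k_i + 1, mu + rate), which makes both EM steps closed-form.

```python
def fit_root_rate_em(mutation_counts, mu, rate=1.0, tol=1e-12, max_iter=10_000):
    """Toy EM fit of the rate of an exponential prior on root ages.

    Hypothetical model (not tsdate's): root age t_i ~ Exp(rate) and
    mutation count k_i ~ Poisson(mu * t_i), so the posterior of t_i is
    Gamma(shape=k_i + 1, rate=mu + rate).
    """
    if not mutation_counts or sum(mutation_counts) == 0:
        raise ValueError("need at least one root with a positive mutation count")
    n = len(mutation_counts)
    for _ in range(max_iter):
        # E-step: posterior mean age of each root under the current prior rate
        expected_ages = [(k + 1) / (mu + rate) for k in mutation_counts]
        # M-step: the exponential rate that maximises the expected log-prior,
        # sum(log(rate) - rate * E[t_i]), is n / sum(E[t_i])
        new_rate = n / sum(expected_ages)
        if abs(new_rate - rate) <= tol * new_rate:
            return new_rate
        rate = new_rate
    return rate


counts = [3, 5, 2, 4]  # hypothetical mutation counts above four roots
fitted = fit_root_rate_em(counts, mu=0.5)
# In this toy model the EM fixed point equals the closed-form estimate n * mu / sum(counts)
print(fitted, len(counts) * 0.5 / sum(counts))
```

Since EM never decreases the marginal likelihood, the fitted rate in this toy model converges to the closed-form maximum-likelihood value `n * mu / sum(counts)`. tsdate's actual procedure operates on the full tree sequence and is not reproduced here.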