From 6d6b024ea4759f8d9e0fcb7e6d7f0baa38cd0093 Mon Sep 17 00:00:00 2001 From: Nate Pope Date: Fri, 26 Jul 2024 21:47:02 -0700 Subject: [PATCH 1/4] Fix references to "flat" priors in docs Fix references to "flat" priors in docs --- CHANGELOG.md | 7 +++++++ docs/introduction.md | 2 +- docs/methods.md | 8 ++++---- docs/popsize.md | 6 ++++-- docs/priors.md | 5 +++-- docs/usage.md | 4 ++-- 6 files changed, 21 insertions(+), 11 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 855f1443..ea9128e6 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -12,6 +12,13 @@ mutations via `singletons_phased=False` option. The API is preliminary and may change. +**Documentation** + +- Fixed description of priors for variational gamma method, which were referred + to as 'flat' or improper but are actually empirical Bayes priors on root node ages, + fit by expectation maximization. + + ## [0.2.0] - 2024-06-10 diff --git a/docs/introduction.md b/docs/introduction.md index ce3994fc..1beebfdc 100644 --- a/docs/introduction.md +++ b/docs/introduction.md @@ -50,7 +50,7 @@ Optionally, a posterior distribution of node times can be generated Since the method is Bayesian, technically it requires each node to have a prior distribution of times. The default `variational_gamma` method currently -uses an improper (flat) prior which does not need any user input. However, +uses an Empirical Bayes prior on root nodes which does not need any user input. However, the alternative discrete-time methods currently require the prior to be explicitly provided, either via providing an estimated effective population size (which is then used in the diff --git a/docs/methods.md b/docs/methods.md index b83edf84..236fd0b5 100644 --- a/docs/methods.md +++ b/docs/methods.md @@ -137,11 +137,11 @@ TODO: describe the rescaling step in more detail. Could also link to [the popula ## Discrete-time The available discrete-time algorithms are the `inside_outside` and `maximization` methods. 
-For historical reasons, these approaches do not use a flat prior, -but use the [conditional coalescent prior](sec_priors_conditional_coalescent), +For historical reasons, these approaches use an informative (node-specific) prior, +the [conditional coalescent prior](sec_priors_conditional_coalescent), which means that you either need to provide them with an estimated effective population size, or a [priors](sec_priors) object. Future improvements may -allow flat priors to be set in discrete time methods, and coalescent priors +allow Empirical Bayes priors to be set in discrete time methods, and coalescent priors to be set in continuous time methods. The _tsdate_ discrete time methods have the following advantages and disadvantages: @@ -174,4 +174,4 @@ have no mapped mutations (e.g. in the centromere), which can be removed by The `maximization` approach is slightly less accurate empirically, and will not return true posteriors, but is theoretically robust and -additionally is always numerically stable. \ No newline at end of file +additionally is always numerically stable. diff --git a/docs/popsize.md b/docs/popsize.md index cfdc15e3..25a8ea50 100644 --- a/docs/popsize.md +++ b/docs/popsize.md @@ -123,8 +123,10 @@ ax.set_ylabel("Population size", rotation=90); ## Misspecified priors -The flat prior for the default `variational_gamma` [method](sec_methods) is robust to -deviations from neutrality and panmixia. However, alternative approaches such as the +The rescaling method for the default `variational_gamma` [method](sec_methods) +(based on the rescaling algorithm described by [Deng et al +2024](https://doi.org/10.1101/2024.03.16.585351)) adapts to deviations from +neutrality and panmixia. However, alternative approaches such as the `inside_outside` method default to a coalescent prior that assumes a fixed population size. 
Hence these approaches currently perform very poorly on such data: diff --git a/docs/priors.md b/docs/priors.md index 6e56d767..29f0fc7f 100644 --- a/docs/priors.md +++ b/docs/priors.md @@ -21,7 +21,8 @@ kernelspec: Note that currently, you only need to set specific priors if you are using the alternative `inside_outside` or `maximization` [methods](sec_methods). This page is primarily left in the documentation for historical reasons: for most purposes we recommend the default -`variational_gamma` method, which uses an unparameterized flat (improper) prior. +`variational_gamma` method, which uses an Empirical Bayes prior on root ages, fit by +expectation maximization. ## Basic usage @@ -108,7 +109,7 @@ flexibility.) ## The conditional coalescent -Currently, non-flat priors are based on the +Currently, informative node-specific priors are based on the [conditional coalescent](http://dx.doi.org/10.1006/tpbi.1998.1411). Specifically, in a tree sequence of `s` samples, the distribution of times for a node that always has `n` descendant samples is taken from the theoretical distribution of times diff --git a/docs/usage.md b/docs/usage.md index 0fc0d767..d89f2385 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -218,7 +218,7 @@ mu_per_bp_per_year = 3.4e-10 # Human generation time ~ 29 years ts_years = tsdate.date(ts, mutation_rate=mu_per_bp_per_year, time_units="years") ``` -However, if you are specifying a non-flat prior, e.g. because you are using a +However, if you are specifying a node-specific prior, e.g. because you are using a discrete-time method, you will also need to change the scale of the prior. In particular, if you are setting the prior using the `population_size` argument, you will also need to modify that by multiplying it by the generation time. For example: @@ -349,4 +349,4 @@ If unary regions are *correctly* estimated, they can help improve dating slightl There is therefore a specific route to date a tree sequence containing locally unary nodes. 
For example, for discrete time methods, you can use the `allow_unary` option when {ref}`building a prior`. -::: \ No newline at end of file +::: From 0161785e183088bc81f040846be5150ace3feaa4 Mon Sep 17 00:00:00 2001 From: Yan Wong Date: Sat, 27 Jul 2024 06:10:13 +0100 Subject: [PATCH 2/4] Update introduction.md --- docs/introduction.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/introduction.md b/docs/introduction.md index 1beebfdc..f24031ab 100644 --- a/docs/introduction.md +++ b/docs/introduction.md @@ -50,7 +50,9 @@ Optionally, a posterior distribution of node times can be generated Since the method is Bayesian, technically it requires each node to have a prior distribution of times. The default `variational_gamma` method currently -uses an Empirical Bayes prior on root nodes which does not need any user input. However, +only imposes a prior on internal nodes via their topological connection to the +root nodes (which are given an exponential prior). This Empirical Bayes procedure +does not need any user input. However, the alternative discrete-time methods currently require the prior to be explicitly provided, either via providing an estimated effective population size (which is then used in the From 47bc8e301be0c0afd350d21a9fd1cf112da85fc9 Mon Sep 17 00:00:00 2001 From: Yan Wong Date: Sat, 27 Jul 2024 07:17:19 +0100 Subject: [PATCH 3/4] Update priors.md --- docs/priors.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/priors.md b/docs/priors.md index 29f0fc7f..48a3d29d 100644 --- a/docs/priors.md +++ b/docs/priors.md @@ -20,9 +20,10 @@ kernelspec: Note that currently, you only need to set specific priors if you are using the alternative `inside_outside` or `maximization` [methods](sec_methods). 
This page is primarily left in the -documentation for historical reasons: for most purposes we recommend the default -`variational_gamma` method, which uses an Empirical Bayes prior on root ages, fit by -expectation maximization. +documentation for historical reasons. For most purposes we recommend the default +`variational_gamma` method: this only sets a prior on root ages and not on internal nodes +(whose priors essentially take a flat value, constrained only by their topological connection +to the roots, and which are updated via the expectation propagation mechanism). ## Basic usage From 8e246fce5e0bd1f1537442adadd090399ad90c69 Mon Sep 17 00:00:00 2001 From: Yan Wong Date: Sat, 27 Jul 2024 07:21:36 +0100 Subject: [PATCH 4/4] Update popsize.md --- docs/popsize.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/popsize.md b/docs/popsize.md index 25a8ea50..5067a437 100644 --- a/docs/popsize.md +++ b/docs/popsize.md @@ -124,9 +124,10 @@ ax.set_ylabel("Population size", rotation=90); ## Misspecified priors The rescaling method for the default `variational_gamma` [method](sec_methods) -(based on the rescaling algorithm described by [Deng et al -2024](https://doi.org/10.1101/2024.03.16.585351)) adapts to deviations from -neutrality and panmixia. However, alternative approaches such as the +(inspired by the rescaling algorithm described by [Deng et al +2024](https://doi.org/10.1101/2024.03.16.585351)), coupled with the absence of a strong +informative prior on internal nodes, means that we expect this method to be +robust to deviations from neutrality and panmixia. However, alternative approaches such as the `inside_outside` method default to a coalescent prior that assumes a fixed population size. Hence these approaches currently perform very poorly on such data: