Account for overlapping reads in Theoretical Sensitivity model #1802

fleharty · 2022-04-26T16:55:04Z

Description

This modifies the Theoretical Sensitivity model to account for overlapping reads. Overlapping reads matter most at low allele fractions: the two reads of a pair should report identical bases wherever they overlap, so the overlap reduces sequencer error and generally gives those bases a much higher effective base quality.

Checklist (never delete this)

Never delete this, it is our record that procedure was followed. If you find that for whatever reason one of the checklist points doesn't apply to your PR, you can leave it unchecked but please add an explanation below.

Content

Added or modified tests to cover changes and any new functionality
Edited the README / documentation (if applicable)
All tests passing on Travis

Review

Final thumbs-up from reviewer
Rebase, squash and reword as applicable

For more detailed guidelines, see https://github.com/broadinstitute/picard/wiki/Guidelines-for-pull-requests

fleharty · 2022-04-26T16:55:55Z

@kachulis Would you have a moment to take a look at this?

@davidbenjamin This might be of interest to you since you implemented the original TheoreticalHetSensitivity model.

src/main/java/picard/analysis/TheoreticalSensitivity.java

kachulis · 2022-04-27T18:08:04Z

src/main/java/picard/analysis/CollectWgsMetrics.java

+            log.info("Calculating theoretical sensitivity at " + ALLELE_FRACTION.size() + " allele fractions.");
+
+            List<TheoreticalSensitivityMetrics> tsm = TheoreticalSensitivity.calculateSensitivities(SAMPLE_SIZE,
+                    collector.getUnfilteredDepthHistogram(), collector.getUnfilteredBaseQHistogram(), ALLELE_FRACTION, collector.basesExcludedByOverlap, PCR_ERROR_RATE);


collector.basesExcludedByOverlap needs to be divided by the total number of bases to get the overlapping fraction.

Thanks for catching this. I think this is fixed by using collector.getMetrics(dupeFilter, adapterFilter, mapqFilter, pairFilter).PCT_EXC_OVERLAP instead.
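The point of the comment above is that a raw count of excluded bases is not a probability. A minimal sketch of the conversion follows; the class and method names are hypothetical, and which bases belong in the denominator is an assumption here, not something taken from the Picard source.

```java
public final class OverlapFractionSketch {

    // Hypothetical helper: turns a count of bases excluded by overlap into a
    // fraction of all bases considered. The choice of denominator is an assumption.
    static double fractionExcludedByOverlap(final long basesExcludedByOverlap, final long totalBases) {
        if (totalBases <= 0) {
            return 0.0; // no bases seen, so nothing can overlap
        }
        return (double) basesExcludedByOverlap / totalBases;
    }

    public static void main(final String[] args) {
        // 250 of 1000 bases excluded by overlap gives a fraction of 0.25
        System.out.println(fractionExcludedByOverlap(250L, 1000L));
    }
}
```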

src/main/java/picard/analysis/TheoreticalSensitivity.java

kachulis · 2022-04-28T18:27:03Z

src/main/java/picard/analysis/TheoreticalSensitivity.java

+                        // of qualities cannot exceed the PCR error rate.
+                        sumOfQualities += Math.min(qualityRW.draw() + qualityRW.draw(), pcrErrorRate);
+                    }
+                }
            } else {


Don't you need to adjust the large number approx code as well? Particularly since you are introducing a PCR base quality cap.

Yes, I don't like the large number approx code. It adds complexity to the code, and really doesn't speed it up enough to justify it.

So I'm removing the approximation code.
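For readers following the thread, here is a rough sketch of the capped draw in the hunk above. It assumes each alt read overlaps its mate with some probability, that an overlapping read contributes the sum of two quality draws capped at the PCR error rate, and that the cap is a Phred-scaled integer. The class name, the `IntSupplier` standing in for `qualityRW.draw()`, and the per-read overlap draw are illustrative assumptions rather than the PR's actual code.

```java
import java.util.Random;
import java.util.function.IntSupplier;

public final class OverlapQualitySketch {

    // Sums Phred-scaled qualities over altDepth alt observations.
    // drawQuality is a stand-in for sampling from the base quality distribution.
    static int sumOfAltQualities(final int altDepth, final double overlapProbability,
                                 final int pcrErrorRate, final IntSupplier drawQuality,
                                 final Random rng) {
        int sum = 0;
        for (int i = 0; i < altDepth; i++) {
            if (rng.nextDouble() < overlapProbability) {
                // Overlapping pair: two observations of the same base, but the
                // combined quality cannot exceed the PCR error rate.
                sum += Math.min(drawQuality.getAsInt() + drawQuality.getAsInt(), pcrErrorRate);
            } else {
                // Non-overlapping read: a single quality draw.
                sum += drawQuality.getAsInt();
            }
        }
        return sum;
    }

    public static void main(final String[] args) {
        final Random rng = new Random(42);
        final IntSupplier q30 = () -> 30; // every base is Q30
        System.out.println(sumOfAltQualities(4, 0.0, 45, q30, rng)); // no overlap: 4 * 30 = 120
        System.out.println(sumOfAltQualities(4, 1.0, 45, q30, rng)); // all overlap, capped: 4 * 45 = 180
    }
}
```

With the cap in place, an overlapping Q30 pair counts as Q45 rather than Q60, which is why a Gaussian approximation fitted to uncapped single draws would no longer describe the summed qualities.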

src/main/java/picard/analysis/CollectWgsMetrics.java

src/test/java/picard/IntelInflaterDeflaterLoadTest.java

src/test/java/picard/analysis/TheoreticalSensitivityTest.java

kachulis · 2022-05-03T16:31:52Z

src/main/java/picard/analysis/TheoreticalSensitivity.java

@@ -319,8 +337,17 @@ private static double sensitivityAtConstantDepth(final int depth, final Histogra
     * @param alleleFraction the allele fraction to evaluate sensitivity at
     * @return Theoretical sensitivity for the given arguments over a particular depth distribution.
     */
-    public static double theoreticalSensitivity(final Histogram<Integer> depthHistogram, final Histogram<Integer> qualityHistogram,
-                                                final int sampleSize, final double logOddsThreshold, final double alleleFraction) {
+    public static double theoreticalSensitivity(final Histogram<Integer> depthHistogram,


Do we need to keep this method overload, given that it is only used in tests? I get that removing it would be an API change, and I'm not sure how we feel about that. You're changing the API anyway with respect to sensitivityAtConstantDepth, so it's probably best to be consistent.

kachulis · 2022-05-03T16:34:50Z

src/test/java/picard/analysis/TheoreticalSensitivityTest.java

        };
    }

    @Test(dataProvider = "TheoreticalSensitivityDataProvider")
-    public void testSensitivity(final double expected, final File metricsFile, final double alleleFraction, final int sampleSize) throws Exception {
+    public void testSensitivity(final double expected, final File metricsFile, final double alleleFraction, final int sampleSize, final boolean useOverlapProbability) throws Exception {


Add some tests that explore a range of different overlap probabilities and PCR error rates.

davidbenjamin · 2022-06-13T19:11:23Z

src/main/java/picard/analysis/TheoreticalSensitivity.java

-                        // of qualities cannot exceed the PCR error rate.
-                        sumOfQualities += Math.min(qualityRW.draw() + qualityRW.draw(), pcrErrorRate);
-                    }
+            // If the number of alt reads is "small" we draw from the actual base quality distribution.


Since it looks like you're getting rid of the Gaussian approximation at larger depth it seems that this comment is now moot.

davidbenjamin · 2022-06-13T19:14:09Z

src/main/java/picard/analysis/TheoreticalSensitivity.java

                }
-            } else {


Have you profiled this change, in particular when we care about extremely deep data?

The tests are a bit long now; I'm trying to resolve the issue.

fleharty · 2022-06-23T13:35:36Z

@davidbenjamin @kachulis

One thing that bothers me about this model is that it has non-monotonic behavior.
There are cases where increasing the depth causes a loss of sensitivity. I think this is really counter-intuitive and probably actually wrong, so it makes me wonder if there is a problem with the model.

As an example:
logOddsThreshold = 10
alleleFraction = 0.3
baseQualities = 30 (not a histogram, just all base qualities are q30)

For uniform depth of 20, the sensitivity is 0.76
For uniform depth of 21, the sensitivity is 0.64

This is a really big difference, and I really don't expect the theoretical sensitivity to drop by about 0.12 (from 0.76 to 0.64) simply by increasing the depth from 20 to 21.

I don't think this is a bug in the code: I double-checked it in R and got the same result. The reason seems to be that the threshold for calling a variant increases when the depth increases from 20 to 21.

Any thoughts on this?

$\sum_{k=3}^{n} \binom{n}{k} p^k (1 - p)^{n - k}$
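The depth 20 versus 21 example can be worked through in a few lines. The sketch below assumes a call rule of the form "sum of alt base qualities > 10 * (m * -log10(f) + (n - m) * -log10(1 - f) + logOddsThreshold)" for m alt reads at depth n and allele fraction f. That rule is an assumption, chosen because it reproduces the two sensitivities quoted above; it is not copied from the Picard source, and the class and method names are hypothetical.

```java
public final class DepthThresholdSketch {

    // Assumed call rule (see the lead-in): all alt bases share one quality.
    static boolean isCalled(final int depth, final int altDepth, final double alleleFraction,
                            final double logOddsThreshold, final int baseQuality) {
        final double threshold = 10.0 * (altDepth * -Math.log10(alleleFraction)
                + (depth - altDepth) * -Math.log10(1.0 - alleleFraction)
                + logOddsThreshold);
        return altDepth * baseQuality > threshold;
    }

    // Smallest alt read count that is called at this depth, or depth + 1 if none is.
    static int minAltReads(final int depth, final double alleleFraction,
                           final double logOddsThreshold, final int baseQuality) {
        for (int m = 0; m <= depth; m++) {
            if (isCalled(depth, m, alleleFraction, logOddsThreshold, baseQuality)) {
                return m;
            }
        }
        return depth + 1;
    }

    // Probability that a Binomial(depth, alleleFraction) alt count is called.
    static double sensitivity(final int depth, final double alleleFraction,
                              final double logOddsThreshold, final int baseQuality) {
        double total = 0.0;
        double term = Math.pow(1.0 - alleleFraction, depth); // P(alt count = 0)
        for (int k = 0; k <= depth; k++) {
            if (isCalled(depth, k, alleleFraction, logOddsThreshold, baseQuality)) {
                total += term;
            }
            // Step from P(k) to P(k + 1) with the binomial ratio.
            term *= (depth - k) / (k + 1.0) * alleleFraction / (1.0 - alleleFraction);
        }
        return total;
    }

    public static void main(final String[] args) {
        for (final int depth : new int[] {20, 21}) {
            System.out.printf("depth %d: min alt reads %d, sensitivity %.2f%n", depth,
                    minAltReads(depth, 0.3, 10.0, 30), sensitivity(depth, 0.3, 10.0, 30));
        }
    }
}
```

Under these assumptions the minimum number of alt reads needed for a call steps from 5 at depth 20 to 6 at depth 21, while the expected alt count only rises from 6.0 to 6.3. The sensitivity therefore falls from about 0.76 to about 0.64, which matches the observation that the drop comes from the discrete jump in the calling threshold.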

fleharty added 5 commits April 15, 2022 13:06

Overlapping reads supported in TheoreticalSensitity function, need to…

b1178ae

… update tool

Adds overlap functionality to CollectWgsMetrics

1090a30

Enable overlap probability for CollectHsMetrics

3d38bf0

Fixing a few minor bugs, and adding tests

025823a

Adding javadoc, and removing unnecessary default function

1d72fa7

kachulis self-requested a review April 27, 2022 14:37

kachulis requested changes May 3, 2022

Incomplete commit for Friday night, but almost finished responding to…

fa0c1e1

… Chris

davidbenjamin reviewed Jun 13, 2022

Added tests for monotonicity. These tests fail, and suggest a possibl…

91588b0

…e problem with the model.
