Update trio merging case study.
PiperOrigin-RevId: 327460672
pichuan authored and copybara-github committed Aug 19, 2020
1 parent 0fdd471 commit 2e68bdd
Showing 1 changed file with 75 additions and 66 deletions.
141 changes: 75 additions & 66 deletions docs/trio-merge-case-study.md
@@ -6,7 +6,7 @@ This document outlines all the steps and considerations for calling and merging
a trio using DeepVariant and [GLnexus](https://github.com/dnanexus-rnd/GLnexus).
These best practices were developed and evaluated as described in the bioRxiv
preprint
-[Accurate, scalable cohort variant calls using DeepVariant and GLnexus](https://www.biorxiv.org/content/10.1101/2020.02.10.942086v1).
+[Accurate, scalable cohort variant calls using DeepVariant and GLnexus](https://doi.org/10.1101/2020.02.10.942086).

The process involves 3 major stages: running DeepVariant to create individual
genome call sets, running GLnexus to merge call sets, and analyzing the merged
@@ -78,6 +78,12 @@ aria2c -c -x10 -s10 -d "${DIR}" https://storage.googleapis.com/deepvariant/exome

### Command for downloading the truth files

+There are newer versions of the truth files, including
+[v4.1, GRCh37 for HG002](ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/analysis/NIST_v4.1_SmallVariantDraftBenchmark_12182019/GRCh37),
+and [v4.2, GRCh38 for HG002-4](ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/analysis/NIST_v4.2_SmallVariantDraftBenchmark_07092020/).
+We plan to update this documentation to use the newer versions in the future.
+
+
HG002:

```
@@ -135,7 +141,7 @@ serially is not the most effective approach.
```
N_SHARDS=$(nproc) # Or change to the number of cores you want to use
CAPTURE_BED=agilent_sureselect_human_all_exon_v5_b37_targets.bed
-VERSION=0.10.0
+VERSION=rc1.0.0
declare -a trio=(HG002 HG003 HG004)
for SAMPLE in "${trio[@]}"
@@ -166,11 +172,11 @@ done
And then run GLnexus with this config:

```
-sudo docker pull quay.io/mlin/glnexus:v1.2.6
+sudo docker pull quay.io/mlin/glnexus:v1.2.7
time sudo docker run \
-v "${DIR}":"/data" \
-quay.io/mlin/glnexus:v1.2.6 \
+quay.io/mlin/glnexus:v1.2.7 \
/usr/local/bin/glnexus_cli \
--config DeepVariantWES \
--bed "/data/${CAPTURE_BED}" \
@@ -181,11 +187,14 @@ time sudo docker run \
When we ran on this WES trio, it took only about 13 seconds. For more details on
performance, see
[GLnexus performance guide](https://github.com/dnanexus-rnd/GLnexus/wiki/Performance).
-And, if you are merging a WGS cohort,
-please use the `--config
-DeepVariantWGS`. The corresponding params can be found in
-[WGS params](../deepvariant/cohort_best_practice/DeepVariantWGS_v1.yml) and
-[WES params](../deepvariant/cohort_best_practice/DeepVariantWES_v1.yml).
+
+For a WGS cohort, we recommend using `--config
+DeepVariantWGS` instead of `DeepVariantWES`. Another preset
+`DeepVariant_unfiltered` is available in `glnexus:v1.2.7` or later versions for
+merging DeepVariant gVCFs with no QC filters or genotype revision (see [GitHub
+issue #326](https://github.com/google/deepvariant/issues/326) for a potential
+use case). The details of these presets can be found
+[here](../deepvariant/cohort_best_practice).

## Annotate the merged VCF with Mendelian discordance information using RTG Tools

@@ -232,15 +241,16 @@ The output is:
```
Checking: /data/deepvariant.cohort.vcf.gz
Family: [Sample_Diag-excap51-HG003-EEogPU + Sample_Diag-excap51-HG004-EEogPU] -> [Sample_Diag-excap51-HG002-EEogPU]
-Concordance Sample_Diag-excap51-HG002-EEogPU: F:59713/60296 (99.03%) M:60080/60268 (99.69%) F+M:59369/60146 (98.71%)
+1 non-pass records were skipped
+Concordance Sample_Diag-excap51-HG002-EEogPU: F:59383/59949 (99.06%) M:59760/59944 (99.69%) F+M:59053/59806 (98.74%)
Sample Sample_Diag-excap51-HG002-EEogPU has less than 99.0 concordance with both parents. Check for incorrect pedigree or sample mislabelling.
-860/60618 (1.42%) records did not conform to expected call ploidy
-60441/60618 (99.71%) records were variant in at least 1 family member and checked for Mendelian constraints
-244/60441 (0.40%) records had indeterminate consistency status due to incomplete calls
-799/60441 (1.32%) records contained a violation of Mendelian constraints
+852/60261 (1.41%) records did not conform to expected call ploidy
+60107/60261 (99.74%) records were variant in at least 1 family member and checked for Mendelian constraints
+242/60107 (0.40%) records had indeterminate consistency status due to incomplete calls
+781/60107 (1.30%) records contained a violation of Mendelian constraints
```

-From this report, we know that there is a 1.32% Mendelian violation rate, and
+From this report, we know that there is a 1.30% Mendelian violation rate, and
0.40% of the records had incomplete calls (with `.`) so RTG couldn't determine
whether there is a violation or not.
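
The 1.30% and 0.40% figures can be re-derived from the raw counts in the report. A minimal Python sketch, assuming only the counts quoted above; the helper name `pct` is ours and is not part of RTG Tools:

```python
def pct(numerator: int, denominator: int) -> str:
    """Format a count ratio as a percentage with two decimals, as the report does."""
    return f"{100 * numerator / denominator:.2f}%"


# Counts copied from the report lines that give 1.30% and 0.40%.
checked = 60107        # records checked for Mendelian constraints
violations = 781       # records violating Mendelian constraints
indeterminate = 242    # records with incomplete calls

print(pct(violations, checked))     # 1.30%
print(pct(indeterminate, checked))  # 0.40%
```

Both rates use the number of checked records (60107) as the denominator, not the total record count (60261).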

@@ -267,9 +277,9 @@ done

| Sample | [3]ts | [4]tv | [5]ts/tv | [6]ts (1st ALT) | [7]tv (1st ALT) | [8]ts/tv (1st ALT) |
| ------ | ----- | ----- | -------- | --------------- | --------------- | ------------------ |
-| HG002 | 30174 | 11811 | 2.55 | 30161 | 11786 | 2.56 |
-| HG003 | 30090 | 11908 | 2.53 | 30077 | 11882 | 2.53 |
-| HG004 | 30308 | 11989 | 2.53 | 30294 | 11968 | 2.53 |
+| HG002 | 30016 | 11709 | 2.56 | 30002 | 11693 | 2.57 |
+| HG003 | 29880 | 11747 | 2.54 | 29871 | 11731 | 2.55 |
+| HG004 | 30133 | 11860 | 2.54 | 30120 | 11848 | 2.54 |
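
The `ts/tv` columns are the transition count divided by the transversion count. A quick Python check, assuming the `[3]ts` and `[4]tv` counts from the rows that report 2.56, 2.54 and 2.54; the function name is illustrative only:

```python
def ts_tv_ratio(ts: int, tv: int) -> float:
    """Transition/transversion ratio, rounded to two decimals as in the table."""
    return round(ts / tv, 2)


# (ts, tv) pairs taken from the [3]ts and [4]tv columns.
rows = {
    "HG002": (30016, 11709),
    "HG003": (29880, 11747),
    "HG004": (30133, 11860),
}

for sample, (ts, tv) in rows.items():
    print(sample, ts_tv_ratio(ts, tv))
```

The printed ratios should match the `[5]ts/tv` column for each sample.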

If you want to restrict to the truth BED files, use this command:

@@ -288,9 +298,9 @@ Which resulted in this table:

| Sample | [3]ts | [4]tv | [5]ts/tv | [6]ts (1st ALT) | [7]tv (1st ALT) | [8]ts/tv (1st ALT) |
| ------ | ----- | ----- | -------- | --------------- | --------------- | ------------------ |
-| HG002 | 24472 | 9254 | 2.64 | 24467 | 9244 | 2.65 |
-| HG003 | 24169 | 9185 | 2.63 | 24166 | 9177 | 2.63 |
-| HG004 | 24314 | 9334 | 2.60 | 24307 | 9327 | 2.61 |
+| HG002 | 24474 | 9255 | 2.64 | 24469 | 9245 | 2.65 |
+| HG003 | 24175 | 9182 | 2.63 | 24172 | 9174 | 2.63 |
+| HG004 | 24313 | 9334 | 2.60 | 24306 | 9327 | 2.61 |

### Rtg vcfstats

@@ -312,69 +322,69 @@ HG002:

```
Location : /data/HG002.vcf.gz
-Failed Filters : 14061
-Passed Filters : 45670
-SNPs : 41946
+Failed Filters : 14405
+Passed Filters : 45447
+SNPs : 41696
MNPs : 0
-Insertions : 1919
-Deletions : 1785
-Indels : 17
-Same as reference : 3
-SNP Transitions/Transversions: 2.56 (42030/16447)
-Total Het/Hom ratio : 1.53 (27590/18077)
-SNP Het/Hom ratio : 1.54 (25439/16507)
+Insertions : 1909
+Deletions : 1821
+Indels : 19
+Same as reference : 2
+SNP Transitions/Transversions: 2.56 (41882/16353)
+Total Het/Hom ratio : 1.51 (27315/18130)
+SNP Het/Hom ratio : 1.52 (25178/16518)
MNP Het/Hom ratio : - (0/0)
-Insertion Het/Hom ratio : 1.17 (1035/884)
-Deletion Het/Hom ratio : 1.60 (1099/686)
-Indel Het/Hom ratio : - (17/0)
-Insertion/Deletion ratio : 1.08 (1919/1785)
-Indel/SNP+MNP ratio : 0.09 (3721/41946)
+Insertion Het/Hom ratio : 1.10 (1001/908)
+Deletion Het/Hom ratio : 1.59 (1117/704)
+Indel Het/Hom ratio : - (19/0)
+Insertion/Deletion ratio : 1.05 (1909/1821)
+Indel/SNP+MNP ratio : 0.09 (3749/41696)
```

HG003:

```
Location : /data/HG003.vcf.gz
-Failed Filters : 14783
-Passed Filters : 45640
-SNPs : 41955
+Failed Filters : 15215
+Passed Filters : 45306
+SNPs : 41600
MNPs : 0
-Insertions : 1909
-Deletions : 1756
+Insertions : 1902
+Deletions : 1784
Indels : 18
Same as reference : 2
-SNP Transitions/Transversions: 2.51 (41893/16673)
-Total Het/Hom ratio : 1.52 (27513/18125)
-SNP Het/Hom ratio : 1.53 (25372/16583)
+SNP Transitions/Transversions: 2.52 (41676/16511)
+Total Het/Hom ratio : 1.50 (27156/18148)
+SNP Het/Hom ratio : 1.51 (25031/16569)
MNP Het/Hom ratio : - (0/0)
-Insertion Het/Hom ratio : 1.21 (1044/865)
-Deletion Het/Hom ratio : 1.59 (1079/677)
+Insertion Het/Hom ratio : 1.15 (1019/883)
+Deletion Het/Hom ratio : 1.56 (1088/696)
Indel Het/Hom ratio : - (18/0)
-Insertion/Deletion ratio : 1.09 (1909/1756)
-Indel/SNP+MNP ratio : 0.09 (3683/41955)
+Insertion/Deletion ratio : 1.07 (1902/1784)
+Indel/SNP+MNP ratio : 0.09 (3704/41600)
```

HG004:

```
Location : /data/HG004.vcf.gz
-Failed Filters : 14442
-Passed Filters : 45964
-SNPs : 42256
+Failed Filters : 14832
+Passed Filters : 45681
+SNPs : 41965
MNPs : 0
-Insertions : 1911
-Deletions : 1773
-Indels : 21
+Insertions : 1899
+Deletions : 1796
+Indels : 18
Same as reference : 3
-SNP Transitions/Transversions: 2.54 (41890/16478)
-Total Het/Hom ratio : 1.61 (28339/17622)
-SNP Het/Hom ratio : 1.63 (26168/16088)
+SNP Transitions/Transversions: 2.55 (41746/16352)
+Total Het/Hom ratio : 1.58 (28000/17678)
+SNP Het/Hom ratio : 1.60 (25850/16115)
MNP Het/Hom ratio : - (0/0)
-Insertion Het/Hom ratio : 1.19 (1039/872)
-Deletion Het/Hom ratio : 1.68 (1111/662)
-Indel Het/Hom ratio : - (21/0)
-Insertion/Deletion ratio : 1.08 (1911/1773)
-Indel/SNP+MNP ratio : 0.09 (3705/42256)
+Insertion Het/Hom ratio : 1.17 (1023/876)
+Deletion Het/Hom ratio : 1.61 (1109/687)
+Indel Het/Hom ratio : - (18/0)
+Insertion/Deletion ratio : 1.06 (1899/1796)
+Indel/SNP+MNP ratio : 0.09 (3713/41965)
```
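
The summary statistics are internally consistent, which gives a cheap sanity check when regenerating them. A short Python sketch, assuming the HG004 values from the block that reports 45681 passed filters; the dictionary keys are our own labels, not RTG field names:

```python
# Counts from the HG004 block with "Passed Filters : 45681".
hg004 = {
    "snps": 41965,
    "mnps": 0,
    "insertions": 1899,
    "deletions": 1796,
    "indels": 18,
    "same_as_reference": 3,
    "het": 28000,
    "hom": 17678,
}

# In this output the per-type counts add up to the "Passed Filters" value.
type_keys = ("snps", "mnps", "insertions", "deletions", "indels", "same_as_reference")
passed = sum(hg004[key] for key in type_keys)

# Numerator of the "Indel/SNP+MNP ratio" line: insertions + deletions + indels.
non_snp = hg004["insertions"] + hg004["deletions"] + hg004["indels"]

print(passed)                                              # 45681
print(f"{hg004['het'] / hg004['hom']:.2f}")                # 1.58
print(f"{hg004['insertions'] / hg004['deletions']:.2f}")   # 1.06
print(f"{non_snp / (hg004['snps'] + hg004['mnps']):.2f}")  # 0.09
```

The same identity holds for the HG002 and HG003 blocks, so a mismatch after a rerun would suggest a copy error in the documentation.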

### Run hap.py to calculate the accuracy of DeepVariant generated call sets
@@ -402,7 +412,6 @@ Accuracy F1 scores:

Sample | Indel | SNP
------ | -------- | --------
-HG002 | 0.974056 | 0.999362
-HG003 | 0.970449 | 0.998980
-HG004 | 0.974259 | 0.999197
-
+HG002 | 0.969993 | 0.999036
+HG003 | 0.969311 | 0.998921
+HG004 | 0.967794 | 0.999212
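
For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. A minimal Python sketch of that standard definition; the function name and the input values are hypothetical and are not taken from this case study:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical inputs for illustration only.
print(f1_score(0.5, 0.5))  # 0.5
```

Because it is a harmonic mean, F1 is pulled toward the lower of the two inputs, so a call set cannot score well on precision or recall alone.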
