Commit

Update README with pointers to useful information.
* Add a section about what DeepVariant supports in terms of data samples at the top of the documentation.
* Add links to a few blog posts on the front page where they are relevant.
* Add a section on "How DeepVariant works" with a diagram that includes pileup images.

PiperOrigin-RevId: 327496328
MariaNattestad authored and copybara-github committed Aug 19, 2020
1 parent 2e68bdd commit 9985825
Showing 7 changed files with 69 additions and 32 deletions.
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,4 +1,4 @@
Copyright 2017 Google LLC.
Copyright 2020 Google LLC.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
69 changes: 52 additions & 17 deletions README.md
@@ -4,12 +4,29 @@
[![announcements](https://img.shields.io/badge/announcements-blue)](https://groups.google.com/d/forum/deepvariant-announcements)
[![blog](https://img.shields.io/badge/blog-orange)](https://goo.gl/deepvariant)

DeepVariant is an analysis pipeline that uses a deep neural network to call
genetic variants from next-generation DNA sequencing data. DeepVariant relies on
[Nucleus](https://github.com/google/nucleus), a library of Python and C++ code
for reading and writing data in common genomics file formats (like SAM and VCF)
designed for painless integration with the
[TensorFlow](https://www.tensorflow.org/) machine learning framework.
DeepVariant is a deep learning-based variant caller that takes aligned reads (in
BAM or CRAM format), produces pileup image tensors from them, classifies each
tensor using a convolutional neural network, and finally reports the results in
a standard VCF or gVCF file.

DeepVariant supports:

* Germline variant-calling in diploid organisms.
    * For somatic data or any other samples where the genotypes go beyond two
      copies of DNA, DeepVariant will not work out of the box because the only
      genotypes supported are hom-alt, het, and hom-ref.
    * The models included with DeepVariant are only trained on human data. For
      other organisms, see the
      [blog post on non-human variant-calling](https://google.github.io/deepvariant/posts/2018-12-05-improved-non-human-variant-calling-using-species-specific-deepvariant-models/)
      for some possible pitfalls and how to handle them.
* Calling from NGS and long-read sequencing data.
    * NGS (Illumina) data for either a
      [whole genome](docs/deepvariant-case-study.md) or
      [whole exome](docs/deepvariant-exome-case-study.md).
    * PacBio HiFi data, see the
      [PacBio case study](docs/deepvariant-pacbio-model-case-study.md).
    * ONT long-read data by using
      [PEPPER-DeepVariant](https://github.com/kishwarshafin/pepper/blob/master/docs/PEPPER_variant_calling.md).
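
As a rough illustration of the three diploid genotype classes named in the list
above (hom-ref, het, hom-alt), the following Python sketch maps a VCF `GT`
string to one of them. The function name `genotype_class` is hypothetical and
not part of DeepVariant; it only shows why samples with more than two allele
copies fall outside the supported set.

```python
import re


def genotype_class(gt: str) -> str:
    """Map a diploid VCF GT string (e.g. "0/1") to hom-ref, het, or hom-alt.

    Hypothetical helper for illustration only; not part of DeepVariant.
    """
    # GT alleles are separated by "/" (unphased) or "|" (phased).
    alleles = re.split(r"[/|]", gt)
    if len(alleles) != 2:
        # More (or fewer) than two copies: outside the diploid model.
        raise ValueError(f"not a diploid genotype: {gt!r}")
    if "." in alleles:
        return "no-call"
    first, second = (int(allele) for allele in alleles)
    if first == 0 and second == 0:
        return "hom-ref"
    if first == second:
        return "hom-alt"
    return "het"


for gt in ("0/0", "0/1", "1|1"):
    print(gt, genotype_class(gt))
```

A genotype such as `0/0/1` raises `ValueError` here, mirroring the statement
that genotypes beyond two copies are not handled out of the box.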

## How to run

@@ -30,21 +47,21 @@ docker run \
--num_shards=$(nproc) **This will use all your cores to run make_examples. Feel free to change.**
```

To see all flags you can use, run:
```
docker run google/deepvariant:"${BIN_VERSION}" --help
```

To see all flags you can use, run: `docker run
google/deepvariant:"${BIN_VERSION}" --help`

If you're using GPUs, or want to use Singularity instead, see
[Quick Start](docs/deepvariant-quick-start.md) for more details.
[Quick Start](docs/deepvariant-quick-start.md) for more details or see all the
[setup options](#deepvariant_setup) available including solutions on external
platforms.

For more information, also see:

* [Full documentation list](docs/README.md)
* [Best practices for multi-sample variant calling with DeepVariant](docs/trio-merge-case-study.md)
* [(Advanced) Training tutorial](docs/deepvariant-training-case-study.md)

* [Full documentation list](docs/README.md)
* [Detailed usage guide](docs/deepvariant-details.md) with more information on
the input and output file formats and how to work with them.
* [Best practices for multi-sample variant calling with DeepVariant](docs/trio-merge-case-study.md)
* [(Advanced) Training tutorial](docs/deepvariant-training-case-study.md)

## How to cite

@@ -69,7 +86,9 @@ doi: https://doi.org/10.1101/2020.02.10.942086
* **High accuracy** - In 2016 DeepVariant won
[PrecisionFDA Truth Challenge](https://precision.fda.gov/challenges/truth/results)
for best SNP Performance. DeepVariant maintains high accuracy across data
from different sequencing technologies, prep methods, and species.
from different sequencing technologies, prep methods, and species. Using
DeepVariant makes an especially large difference at
[lower coverage](https://google.github.io/deepvariant/posts/2019-09-10-twenty-is-the-new-thirty-comparing-current-and-historical-wgs-accuracy-across-coverage/).
* **Flexibility** - Out-of-the-box use for
[PCR-positive](https://ai.googleblog.com/2018/04/deepvariant-accuracy-improvements-for.html)
samples and
@@ -94,6 +113,22 @@ doi: https://doi.org/10.1101/2020.02.10.942086

<a name="myfootnote1">(1)</a>: Time estimates do not include mapping.

## How DeepVariant works

![diagram of stages in DeepVariant](docs/images/inference_flow_diagram.svg)

For more information on the pileup images and how to read them, please see the
["Looking through DeepVariant's Eyes" blog post](https://google.github.io/deepvariant/posts/2020-02-20-looking-through-deepvariants-eyes/).

DeepVariant relies on [Nucleus](https://github.com/google/nucleus), a library of
Python and C++ code for reading and writing data in common genomics file formats
(like SAM and VCF) designed for painless integration with the
[TensorFlow](https://www.tensorflow.org/) machine learning framework. Nucleus
was built with DeepVariant in mind and open-sourced separately so it can be used
by anyone in the genomics research community for other projects. See this blog
post on
[Using Nucleus and TensorFlow for DNA Sequencing Error Correction](https://google.github.io/deepvariant/posts/2019-01-31-using-nucleus-and-tensorflow-for-dna-sequencing-error-correction/).

## DeepVariant Setup

### Prerequisites
3 changes: 1 addition & 2 deletions docs/deepvariant-gvcf-support.md
@@ -145,8 +145,7 @@ number of records generated relative to the baseline of a 50x whole genome with
`--gvcf_gq_binsize 1`) at different coverage levels, for GQ bins of size 1, 3,
5, and 10. The value of each bar is written in blue font above it for clarity.

![gVCF
size](DeepVariant-gvcf-sizes-figure.png?raw=true "DeepVariant gVCF sizes")
![gVCF size](images/DeepVariant-gvcf-sizes-figure.png?raw=true "DeepVariant gVCF sizes")
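
To illustrate why larger GQ bins yield fewer records, here is a minimal sketch
that merges adjacent reference sites whose GQ falls into the same bin. The
floor-based binning rule, the sample GQ values, and the function name
`count_gvcf_records` are assumptions made for illustration; DeepVariant's exact
quantization may differ.

```python
from itertools import groupby


def count_gvcf_records(gq_values, binsize):
    """Count reference blocks after merging adjacent sites in the same GQ bin.

    Assumption: a site's bin is gq // binsize. This is an illustrative rule,
    not necessarily the one DeepVariant uses.
    """
    if binsize < 1:
        raise ValueError("binsize must be >= 1")
    bins = (gq // binsize for gq in gq_values)
    # groupby collapses runs of consecutive equal bins into one group each.
    return sum(1 for _ in groupby(bins))


# Made-up per-site GQ values along a stretch of reference calls.
gqs = [30, 31, 33, 34, 50, 51, 29, 30]
for size in (1, 3, 5, 10):
    print(size, count_gvcf_records(gqs, size))
```

With a bin size of 1 every change in GQ starts a new record, while wider bins
let neighboring sites with similar GQ share one record.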

### Runtime

File renamed without changes
File renamed without changes
1 change: 1 addition & 0 deletions docs/images/inference_flow_diagram.svg
26 changes: 14 additions & 12 deletions docs/trio-merge-case-study.md
@@ -80,10 +80,10 @@ aria2c -c -x10 -s10 -d "${DIR}" https://storage.googleapis.com/deepvariant/exome

There are newer versions of the truth files, including
[v4.1, GRCh37 for HG002](ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/analysis/NIST_v4.1_SmallVariantDraftBenchmark_12182019/GRCh37),
and [v4.2, GRCh38 for HG002-4](ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/analysis/NIST_v4.2_SmallVariantDraftBenchmark_07092020/).
and
[v4.2, GRCh38 for HG002-4](ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/analysis/NIST_v4.2_SmallVariantDraftBenchmark_07092020/).
We plan to update this documentation with newer versions in the future.


HG002:

```
@@ -188,12 +188,12 @@ When we ran on this WES trio, it took only about 13 seconds. For more details on
performance, see
[GLnexus performance guide](https://github.com/dnanexus-rnd/GLnexus/wiki/Performance).

For a WGS cohort, we recommend using `--config
DeepVariantWGS` instead of `DeepVariantWES`. Another preset
`DeepVariant_unfiltered` is available in `glnexus:v1.2.7` or later versions for
merging DeepVariant gVCFs with no QC filters or genotype revision (see [GitHub
issue #326](https://github.com/google/deepvariant/issues/326) for a potential
use case). The details of these presets can be found
For a WGS cohort, we recommend using `--config DeepVariantWGS` instead of
`DeepVariantWES`. Another preset `DeepVariant_unfiltered` is available in
`glnexus:v1.2.7` or later versions for merging DeepVariant gVCFs with no QC
filters or genotype revision (see
[GitHub issue #326](https://github.com/google/deepvariant/issues/326) for a
potential use case). The details of these presets can be found
[here](../deepvariant/cohort_best_practice).
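
The preset choice described above can be captured in a small lookup. This is a
sketch only: the function `glnexus_preset` and its `data_type` and `filtered`
parameters are hypothetical, while the preset names are the ones given in the
text.

```python
def glnexus_preset(data_type: str, filtered: bool = True) -> str:
    """Return the GLnexus --config preset name for a cohort.

    Hypothetical helper; only the preset names come from the documentation.
    """
    if not filtered:
        # Per the text: available in glnexus:v1.2.7 or later, merges
        # DeepVariant gVCFs with no QC filters or genotype revision.
        return "DeepVariant_unfiltered"
    presets = {"WGS": "DeepVariantWGS", "WES": "DeepVariantWES"}
    try:
        return presets[data_type.upper()]
    except KeyError:
        raise ValueError(f"unknown data type: {data_type!r}") from None


print(glnexus_preset("wgs"))
print(glnexus_preset("WES"))
print(glnexus_preset("WGS", filtered=False))
```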

## Annotate the merged VCF with Mendelian discordance information using RTG Tools
@@ -275,8 +275,9 @@ do
done
```

| Sample | [3]ts | [4]tv | [5]ts/tv | [6]ts (1st ALT) | [7]tv (1st ALT) | [8]ts/tv (1st ALT) |
| ------ | ----- | ----- | -------- | --------------- | --------------- | ------------------ |
| Sample | [3]ts | [4]tv | [5]ts/tv | [6]ts (1st | [7]tv (1st | [8]ts/tv (1st |
: : : : : ALT) : ALT) : ALT) :
| ------ | ----- | ----- | -------- | ---------- | ---------- | ------------- |
| HG002 | 30016 | 11709 | 2.56 | 30002 | 11693 | 2.57 |
| HG003 | 29880 | 11747 | 2.54 | 29871 | 11731 | 2.55 |
| HG004 | 30133 | 11860 | 2.54 | 30120 | 11848 | 2.54 |
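
The `ts/tv` column is the ratio of transitions to transversions. As a quick
sanity check, the sketch below recomputes it from the `[3]ts` and `[4]tv`
counts in the rows above; the helper name `ts_tv_ratio` is made up for this
example.

```python
def ts_tv_ratio(ts: int, tv: int) -> float:
    """Transition/transversion ratio, rounded to two decimals as in the table."""
    if tv == 0:
        raise ValueError("tv must be non-zero")
    return round(ts / tv, 2)


# (ts, tv) counts copied from the table rows above.
counts = {
    "HG002": (30016, 11709),
    "HG003": (29880, 11747),
    "HG004": (30133, 11860),
}
for sample, (ts, tv) in counts.items():
    print(sample, ts_tv_ratio(ts, tv))
```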
@@ -296,8 +297,9 @@ done

Which resulted in this table:

| Sample | [3]ts | [4]tv | [5]ts/tv | [6]ts (1st ALT) | [7]tv (1st ALT) | [8]ts/tv (1st ALT) |
| ------ | ----- | ----- | -------- | --------------- | --------------- | ------------------ |
| Sample | [3]ts | [4]tv | [5]ts/tv | [6]ts (1st | [7]tv (1st | [8]ts/tv (1st |
: : : : : ALT) : ALT) : ALT) :
| ------ | ----- | ----- | -------- | ---------- | ---------- | ------------- |
| HG002 | 24474 | 9255 | 2.64 | 24469 | 9245 | 2.65 |
| HG003 | 24175 | 9182 | 2.63 | 24172 | 9174 | 2.63 |
| HG004 | 24313 | 9334 | 2.60 | 24306 | 9327 | 2.61 |