diff --git a/LICENSE b/LICENSE
index 486de6e8..7f6763f0 100644
--- a/LICENSE
+++ b/LICENSE
@@ -1,4 +1,4 @@
-Copyright 2017 Google LLC.
+Copyright 2020 Google LLC.
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
diff --git a/README.md b/README.md
index c96e812f..3d942c8d 100644
--- a/README.md
+++ b/README.md
@@ -4,12 +4,29 @@
[](https://groups.google.com/d/forum/deepvariant-announcements)
[](https://goo.gl/deepvariant)
-DeepVariant is an analysis pipeline that uses a deep neural network to call
-genetic variants from next-generation DNA sequencing data. DeepVariant relies on
-[Nucleus](https://github.com/google/nucleus), a library of Python and C++ code
-for reading and writing data in common genomics file formats (like SAM and VCF)
-designed for painless integration with the
-[TensorFlow](https://www.tensorflow.org/) machine learning framework.
+DeepVariant is a deep learning-based variant caller that takes aligned reads (in
+BAM or CRAM format), produces pileup image tensors from them, classifies each
+tensor using a convolutional neural network, and finally reports the results in
+a standard VCF or gVCF file.
+
+DeepVariant supports:
+
+* Germline variant-calling in diploid organisms.
+ * For somatic data or any other samples where the genotypes go beyond two
+ copies of DNA, DeepVariant will not work out of the box because the only
+ genotypes supported are hom-alt, het, and hom-ref.
+ * The models included with DeepVariant are only trained on human data. For
+ other organisms, see the
+ [blog post on non-human variant-calling](https://google.github.io/deepvariant/posts/2018-12-05-improved-non-human-variant-calling-using-species-specific-deepvariant-models/)
+ for some possible pitfalls and how to handle them.
+* Calling from NGS and long-read sequencing data.
+ * NGS (Illumina) data for either a
+ [whole genome](docs/deepvariant-case-study.md) or
+ [whole exome](docs/deepvariant-exome-case-study.md).
+ * PacBio HiFi data, see the
+ [PacBio case study](docs/deepvariant-pacbio-model-case-study.md).
+ * ONT long-read data by using
+ [PEPPER-DeepVariant](https://github.com/kishwarshafin/pepper/blob/master/docs/PEPPER_variant_calling.md).
## How to run
@@ -30,21 +47,21 @@ docker run \
--num_shards=$(nproc) **This will use all your cores to run make_examples. Feel free to change.**
```
-To see all flags you can use, run:
-```
-docker run google/deepvariant:"${BIN_VERSION}" --help
-```
-
+To see all flags you can use, run: `docker run
+google/deepvariant:"${BIN_VERSION}" --help`
If you're using GPUs, or want to use Singularity instead, see
-[Quick Start](docs/deepvariant-quick-start.md) for more details.
+[Quick Start](docs/deepvariant-quick-start.md) for more details or see all the
+[setup options](#deepvariant_setup) available including solutions on external
+platforms.
For more information, also see:
- * [Full documentation list](docs/README.md)
- * [Best practices for multi-sample variant calling with DeepVariant](docs/trio-merge-case-study.md)
- * [(Advanced) Training tutorial](docs/deepvariant-training-case-study.md)
-
+* [Full documentation list](docs/README.md)
+* [Detailed usage guide](docs/deepvariant-details.md) with more information on
+ the input and output file formats and how to work with them.
+* [Best practices for multi-sample variant calling with DeepVariant](docs/trio-merge-case-study.md)
+* [(Advanced) Training tutorial](docs/deepvariant-training-case-study.md)
## How to cite
@@ -69,7 +86,9 @@ doi: https://doi.org/10.1101/2020.02.10.942086
* **High accuracy** - In 2016 DeepVariant won
[PrecisionFDA Truth Challenge](https://precision.fda.gov/challenges/truth/results)
for best SNP Performance. DeepVariant maintains high accuracy across data
- from different sequencing technologies, prep methods, and species.
+ from different sequencing technologies, prep methods, and species. At
+ [lower coverage](https://google.github.io/deepvariant/posts/2019-09-10-twenty-is-the-new-thirty-comparing-current-and-historical-wgs-accuracy-across-coverage/),
+ using DeepVariant makes an especially large difference.
* **Flexibility** - Out-of-the-box use for
[PCR-positive](https://ai.googleblog.com/2018/04/deepvariant-accuracy-improvements-for.html)
samples and
@@ -94,6 +113,22 @@ doi: https://doi.org/10.1101/2020.02.10.942086
(1): Time estimates do not include mapping.
+## How DeepVariant works
+
+
+
+For more information on the pileup images and how to read them, please see the
+["Looking through DeepVariant's Eyes" blog post](https://google.github.io/deepvariant/posts/2020-02-20-looking-through-deepvariants-eyes/).
+
+DeepVariant relies on [Nucleus](https://github.com/google/nucleus), a library of
+Python and C++ code for reading and writing data in common genomics file formats
+(like SAM and VCF) designed for painless integration with the
+[TensorFlow](https://www.tensorflow.org/) machine learning framework. Nucleus
+was built with DeepVariant in mind and open-sourced separately so it can be used
+by anyone in the genomics research community for other projects. See this blog
+post on
+[Using Nucleus and TensorFlow for DNA Sequencing Error Correction](https://google.github.io/deepvariant/posts/2019-01-31-using-nucleus-and-tensorflow-for-dna-sequencing-error-correction/).
+
## DeepVariant Setup
### Prerequisites
diff --git a/docs/deepvariant-gvcf-support.md b/docs/deepvariant-gvcf-support.md
index 9b27ca92..d28a5f62 100644
--- a/docs/deepvariant-gvcf-support.md
+++ b/docs/deepvariant-gvcf-support.md
@@ -145,8 +145,7 @@ number of records generated relative to the baseline of a 50x whole genome with
`--gvcf_gq_binsize 1`) at different coverage levels, for GQ bins of size 1, 3,
5, and 10. The value of each bar is written in blue font above it for clarity.
-
+
### Runtime
diff --git a/docs/DeepVariant-gvcf-sizes-figure.png b/docs/images/DeepVariant-gvcf-sizes-figure.png
similarity index 100%
rename from docs/DeepVariant-gvcf-sizes-figure.png
rename to docs/images/DeepVariant-gvcf-sizes-figure.png
diff --git a/docs/DeepVariant-workflow-figure.png b/docs/images/DeepVariant-workflow-figure.png
similarity index 100%
rename from docs/DeepVariant-workflow-figure.png
rename to docs/images/DeepVariant-workflow-figure.png
diff --git a/docs/images/inference_flow_diagram.svg b/docs/images/inference_flow_diagram.svg
new file mode 100644
index 00000000..67633780
--- /dev/null
+++ b/docs/images/inference_flow_diagram.svg
@@ -0,0 +1 @@
+
\ No newline at end of file
diff --git a/docs/trio-merge-case-study.md b/docs/trio-merge-case-study.md
index 8d2a8986..df62e638 100644
--- a/docs/trio-merge-case-study.md
+++ b/docs/trio-merge-case-study.md
@@ -80,10 +80,10 @@ aria2c -c -x10 -s10 -d "${DIR}" https://storage.googleapis.com/deepvariant/exome
There are newer versions of the truth files, including
[v4.1, GRCh37 for HG002](ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/analysis/NIST_v4.1_SmallVariantDraftBenchmark_12182019/GRCh37),
-and [v4.2, GRCh38 for HG002-4](ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/analysis/NIST_v4.2_SmallVariantDraftBenchmark_07092020/).
+and
+[v4.2, GRCh38 for HG002-4](ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/analysis/NIST_v4.2_SmallVariantDraftBenchmark_07092020/).
We plan to update this documentation with newer versions in the future.
-
HG002:
```
@@ -188,12 +188,12 @@ When we ran on this WES trio, it took only about 13 seconds. For more details on
performance, see
[GLnexus performance guide](https://github.com/dnanexus-rnd/GLnexus/wiki/Performance).
-For a WGS cohort, we recommend using `--config
-DeepVariantWGS` instead of `DeepVariantWES`. Another preset
-`DeepVariant_unfiltered` is available in `glnexus:v1.2.7` or later versions for
-merging DeepVariant gVCFs with no QC filters or genotype revision (see [GitHub
-issue #326](https://github.com/google/deepvariant/issues/326) for a potential
-use case). The details of these presets can be found
+For a WGS cohort, we recommend using `--config DeepVariantWGS` instead of
+`DeepVariantWES`. Another preset `DeepVariant_unfiltered` is available in
+`glnexus:v1.2.7` or later versions for merging DeepVariant gVCFs with no QC
+filters or genotype revision (see
+[GitHub issue #326](https://github.com/google/deepvariant/issues/326) for a
+potential use case). The details of these presets can be found
[here](../deepvariant/cohort_best_practice).
## Annotate the merged VCF with Mendelian discordance information using RTG Tools
@@ -275,8 +275,9 @@ do
done
```
-| Sample | [3]ts | [4]tv | [5]ts/tv | [6]ts (1st ALT) | [7]tv (1st ALT) | [8]ts/tv (1st ALT) |
-| ------ | ----- | ----- | -------- | --------------- | --------------- | ------------------ |
+| Sample | [3]ts | [4]tv | [5]ts/tv | [6]ts (1st | [7]tv (1st | [8]ts/tv (1st |
+: : : : : ALT) : ALT) : ALT) :
+| ------ | ----- | ----- | -------- | ---------- | ---------- | ------------- |
| HG002 | 30016 | 11709 | 2.56 | 30002 | 11693 | 2.57 |
| HG003 | 29880 | 11747 | 2.54 | 29871 | 11731 | 2.55 |
| HG004 | 30133 | 11860 | 2.54 | 30120 | 11848 | 2.54 |
@@ -296,8 +297,9 @@ done
Which resulted in this table:
-| Sample | [3]ts | [4]tv | [5]ts/tv | [6]ts (1st ALT) | [7]tv (1st ALT) | [8]ts/tv (1st ALT) |
-| ------ | ----- | ----- | -------- | --------------- | --------------- | ------------------ |
+| Sample | [3]ts | [4]tv | [5]ts/tv | [6]ts (1st | [7]tv (1st | [8]ts/tv (1st |
+: : : : : ALT) : ALT) : ALT) :
+| ------ | ----- | ----- | -------- | ---------- | ---------- | ------------- |
| HG002 | 24474 | 9255 | 2.64 | 24469 | 9245 | 2.65 |
| HG003 | 24175 | 9182 | 2.63 | 24172 | 9174 | 2.63 |
| HG004 | 24313 | 9334 | 2.60 | 24306 | 9327 | 2.61 |