-
Notifications
You must be signed in to change notification settings - Fork 38
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
improve handling of DeepVariant non-called records
Compared to other callers, DeepVariant generates more non-called (./.) gVCF records to express uncertainty; including both reference bands and variant records. In 497277a we made sure to preserve the non-called status for reference bands, albeit in a heavy-handed way that censored all the QC fields (RNC=PartialData). Here we introduce a new RNC InputNonCalled, used for both reference bands and variant records when applicable, which preserves the associated QC fields in the pVCF.
- Loading branch information
Showing
8 changed files
with
182 additions
and
36 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,95 @@ | ||
readme: | | ||
1000G chr21_5583275_G_T | ||
A site exercising corner behaviors with a low-quality allele, non-called reference bands, | ||
and RefCall variant records. | ||
input: | ||
header : |- | ||
##fileformat=VCFv4.2 | ||
##FILTER=<ID=PASS,Description="All filters passed"> | ||
##FILTER=<ID=RefCall,Description="Genotyping model thinks this site is reference."> | ||
##FILTER=<ID=LowQual,Description="Confidence in this variant being real is below calling threshold." | ||
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> | ||
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality"> | ||
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth of all passing filters reads."> | ||
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block."> | ||
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Read depth for each allele"> | ||
##FORMAT=<ID=VAF,Number=A,Type=Float,Description="Variant allele fractions."> | ||
##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype likelihoods, log10 encoded"> | ||
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Genotype likelihoods, Phred encoded"> | ||
##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval"> | ||
##contig=<ID=chr21,length=48129895> | ||
##contig=<ID=chr22,length=51304566> | ||
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT | ||
body: | ||
- HG00096.gvcf: | | ||
HG00096 | ||
chr21 5583273 . G <*> 0 . END=5583274 GT:GQ:MIN_DP:PL 0/0:15:5:0,15,149 | ||
chr21 5583275 . G T,<*> 1 RefCall . GT:GQ:DP:AD:VAF:PL ./.:7:5:2,3,0:0.6,0:0,12,7,990,990,990 | ||
chr21 5583276 . T <*> 0 . END=5583283 GT:GQ:MIN_DP:PL 0/0:15:5:0,15,149 | ||
- HG00097.gvcf: | | ||
HG00097 | ||
chr21 5583273 . G <*> 0 . END=5583274 GT:GQ:MIN_DP:PL 0/0:9:3:0,9,89 | ||
chr21 5583275 . G <*> 0 . END=5583275 GT:GQ:MIN_DP:PL ./.:0:3:20,0,50 | ||
chr21 5583276 . T <*> 0 . END=5583280 GT:GQ:MIN_DP:PL 0/0:9:3:0,9,89 | ||
- HG00099.gvcf: | | ||
HG00099 | ||
chr21 5583256 . G <*> 0 . END=5583274 GT:GQ:MIN_DP:PL 0/0:18:6:0,18,179 | ||
chr21 5583275 . G T,<*> 0.5 RefCall . GT:GQ:DP:AD:VAF:PL ./.:10:6:3,3,0:0.5,0:0,9,21,990,990,990 | ||
chr21 5583276 . T <*> 0 . END=5583290 GT:GQ:MIN_DP:PL 0/0:18:6:0,18,179 | ||
- HG01377.gvcf: | | ||
HG01377 | ||
chr21 5583188 . C <*> 0 . END=5583274 GT:GQ:MIN_DP:PL 0/0:6:2:0,9,89 | ||
chr21 5583275 . G T,<*> 10.7 PASS . GT:GQ:DP:AD:VAF:PL 1/1:9:3:1,2,0:0.666667,0:10,11,0,990,990,990 | ||
chr21 5583276 . T <*> 0 . END=5583277 GT:GQ:MIN_DP:PL 0/0:9:3:0,9,89 | ||
config_preset: DeepVariant | ||
|
||
unifier_config: | ||
min_AQ1: 10 | ||
min_AQ2: 10 | ||
min_GQ: 10 | ||
monoallelic_sites_for_lost_alleles: true | ||
|
||
truth_discovered_alleles: | ||
- range: {ref: chr21, beg: 5583275, end: 5583275} | ||
dna: 'G' | ||
is_ref: true | ||
all_filtered: false | ||
top_AQ: [0] | ||
zygosity_by_GQ: [[0,0],[0,0],[0,0],[0,0],[0,0],[0,0],[0,0],[0,0],[0,0],[0,0]] | ||
- range: {ref: chr21, beg: 5583275, end: 5583275} | ||
dna: 'T' | ||
is_ref: false | ||
all_filtered: false | ||
top_AQ: [10] | ||
zygosity_by_GQ: [[0,1],[0,0],[0,0],[0,0],[0,0],[0,0],[0,0],[0,0],[0,0],[0,0]] | ||
|
||
truth_unified_sites: | ||
- range: {ref: chr21, beg: 5583275, end: 5583275} | ||
alleles: | ||
- dna: G | ||
- dna: T | ||
quality: 10 | ||
frequency: 0.125 # unifier puts 1/2N floor under frequency as long as AQ is sufficient | ||
quality: 10 | ||
|
||
truth_output_vcf: | ||
- truth.vcf: | | ||
##fileformat=VCFv4.2 | ||
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency estimate for each alternate allele"> | ||
##INFO=<ID=AQ,Number=A,Type=Integer,Description="Allele Quality score reflecting evidence from all samples (Phred scale)"> | ||
##FILTER=<ID=PASS,Description="All filters passed"> | ||
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> | ||
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality"> | ||
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Genotype likelihoods, Phred encoded"> | ||
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)"> | ||
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed"> | ||
##FORMAT=<ID=RNC,Number=G,Type=Character,Description="Reason for No Call in GT: . = n/a, M = Missing data, P = Partial data, D = insufficient Depth of coverage, - = unrepresentable overlapping deletion, L = Lost/unrepresentable allele (other than deletion), U = multiple Unphased variants present, O = multiple Overlapping variants present"> | ||
##contig=<ID=21,length=48129895> | ||
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00096 HG00097 HG00099 HG01377 | ||
chr21 5583275 chr21_5583275_G_T G T 10 . AF=0.125;AQ=10 GT:DP:AD:GQ:PL:RNC ./.:5:2,3:7:0,12,7:II ./.:3:3,0:0:.:II ./.:6:3,3:10:0,9,21:II 1/1:3:1,2:1:10,11,0:.. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters