Deepvariant results quality filtration #881
Comments
Could you please paste the line from the VCF that you are referring to?
I am not sure which line you mean exactly, so here is an example variant as reported by three different variant callers:

DeepVariant RNA model:
chr11 115166697 . G T 10 PASS . GT:GQ:DP:AD:VAF:PL 1/1:3:5:0,5:1:6,0,0

HaplotypeCaller:
chr11 115166697 . G T 156.96 . AC=2;AF=1.00;AN=2;DP=7;ExcessHet=0.0000;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=31.39;SOR=3.611 GT:AD:DP:GQ:PL 1/1:0,5:5:15:171,15,0

Freebayes:
chr11 115166697 . G T 90.0627 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=5;CIGAR=1X;DP=5;DPB=5;DPRA=0;EPP=13.8677;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=10.1503;PAIRED=0;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=147;QR=0;RO=0;RPL=5;RPP=13.8677;RPPR=0;RPR=0;RUN=1;SAF=5;SAP=13.8677;SAR=0;SRF=0;SRP=0;SRR=0;TYPE=snp;technology.Illumina=1 GT:DP:AD:RO:QR:AO:QA:GL 1/1:5:0,5:0:0:5:147:-13.5217,-1.50515,0

All three callers were run on the same BAM. I also forgot to mention that I am calling variants on RNA data, not DNA, so I followed the DeepVariant RNA case study.
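For readers who want to line the three records up field by field, here is a minimal Python sketch that pulls QUAL, FILTER and the per-sample values out of each one. The helper name `parse_record` is hypothetical, and the code assumes single-sample records whose columns are separated by whitespace, as they appear when pasted here (a real VCF file uses tabs).

```python
# The three records quoted in this thread, keyed by caller.
RECORDS = {
    "DeepVariant": "chr11 115166697 . G T 10 PASS . GT:GQ:DP:AD:VAF:PL 1/1:3:5:0,5:1:6,0,0",
    "HaplotypeCaller": (
        "chr11 115166697 . G T 156.96 . "
        "AC=2;AF=1.00;AN=2;DP=7;ExcessHet=0.0000;FS=0.000;MLEAC=2;MLEAF=1.00;"
        "MQ=60.00;QD=31.39;SOR=3.611 GT:AD:DP:GQ:PL 1/1:0,5:5:15:171,15,0"
    ),
    "Freebayes": (
        "chr11 115166697 . G T 90.0627 . "
        "AB=0;ABP=0;AC=2;AF=1;AN=2;AO=5;CIGAR=1X;DP=5;DPB=5;DPRA=0;EPP=13.8677;"
        "EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=10.1503;"
        "PAIRED=0;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=147;QR=0;RO=0;RPL=5;"
        "RPP=13.8677;RPPR=0;RPR=0;RUN=1;SAF=5;SAP=13.8677;SAR=0;SRF=0;SRP=0;"
        "SRR=0;TYPE=snp;technology.Illumina=1 "
        "GT:DP:AD:RO:QR:AO:QA:GL 1/1:5:0,5:0:0:5:147:-13.5217,-1.50515,0"
    ),
}


def parse_record(line):
    """Split one single-sample VCF data line into QUAL, FILTER and sample fields."""
    # Assumption: columns are whitespace-delimited and no field contains a space.
    fields = line.split()
    chrom, pos, _id, ref, alt, qual, flt, _info, fmt, sample = fields[:10]
    return {
        "site": f"{chrom}:{pos} {ref}>{alt}",
        "qual": float(qual),
        "filter": flt,
        # Pair each FORMAT key (GT, GQ, DP, ...) with its value for the sample.
        "sample": dict(zip(fmt.split(":"), sample.split(":"))),
    }


parsed = {caller: parse_record(line) for caller, line in RECORDS.items()}

for caller, rec in parsed.items():
    s = rec["sample"]
    print(caller, rec["site"], "QUAL =", rec["qual"],
          "GT =", s.get("GT"), "GQ =", s.get("GQ"), "AD =", s.get("AD"))
```

All three callers agree on the genotype (1/1) and on the allele depths (0 reference reads, 5 alternate reads); what differs is the confidence each one assigns. Note that the Freebayes record has no GQ key in its FORMAT column, so `s.get("GQ")` returns `None` for it.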
One thing we find is that DeepVariant is much more conservative in assigning quality values than other callers. These values come from the probability that the neural network assigns to the call. Generally, we observe that DeepVariant's probabilities are reasonably well-calibrated with empirical error rates. For an example of this, please see Figure 2 from the DeepVariant paper (https://www.nature.com/articles/nbt.4235 or the open preprint of the same work https://www.biorxiv.org/content/10.1101/092890v6.full.pdf). As a consequence, you should probably not filter in the same way as with other callers. You should likely use a lower threshold for DeepVariant calls.
Thank you so much for the detailed explanation! I have checked the paper and understand your point now. One last question: are there generally recommended filtering criteria for variants called by DeepVariant, especially those produced by the RNA model?
The quality threshold for filtering will depend on your tolerance for false positives versus false negatives. We find the quality is well-calibrated with error, so a QUAL of 20 is ~1% false discovery probability, a QUAL of 10 is ~10% false discovery probability, and so on. So if you value precision, something between QUAL 10 and QUAL 20 as a threshold is probably a good place to start.
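The numbers above follow the standard Phred scale, where the error probability is 10^(-QUAL/10). As a rough illustration, the Python sketch below converts a QUAL value to that probability and applies a QUAL cutoff to a VCF line. The function names `phred_to_error_prob` and `passes_qual` are hypothetical, and the filter assumes whitespace-delimited lines with QUAL in the sixth column.

```python
def phred_to_error_prob(qual):
    """Convert a Phred-scaled quality to the probability that the call is wrong."""
    return 10 ** (-qual / 10)


def passes_qual(vcf_line, min_qual):
    """Return True if a VCF data line has QUAL >= min_qual (header lines are kept)."""
    if vcf_line.startswith("#"):
        return True
    # Assumption: columns are whitespace-delimited; QUAL is the sixth column.
    qual_field = vcf_line.split()[5]
    if qual_field == ".":
        return False  # missing QUAL: treated as failing in this sketch
    return float(qual_field) >= min_qual


# The DeepVariant record quoted earlier in this thread (QUAL = 10).
dv_line = "chr11 115166697 . G T 10 PASS . GT:GQ:DP:AD:VAF:PL 1/1:3:5:0,5:1:6,0,0"

for q in (10, 15, 20):
    print(f"QUAL {q}: error probability ~{phred_to_error_prob(q):.3f}, "
          f"example record kept: {passes_qual(dv_line, q)}")
```

With a cutoff of 10 the example record is kept; with a cutoff of 15 or 20 it is dropped, which is the precision versus sensitivity trade-off described above.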
This is not an issue so much as a question. I was running multiple variant callers on the same BAM and noticed that some variants are assigned a low quality by DeepVariant (<20 in my case) even though they pass that quality threshold in the other variant callers.
Is there a known reason why this happens? I am a little hesitant about choosing my filtering criteria for the DeepVariant results: should I keep only the variants with quality >20, as I do for the other callers, or just keep the PASS variants?
Sorry for the inconvenience and thank you.