Deepvariant results quality filtration #881

Closed
esraaelmligy opened this issue Sep 10, 2024 · 5 comments

@esraaelmligy

esraaelmligy commented Sep 10, 2024

This is not an issue but more of a question. I was running multiple variant callers on the same BAM and noticed that some variants are assigned a low quality by DeepVariant (<20 in my case) even though they pass that quality threshold in other variant callers.

Is there a known reason for this? I am a little hesitant about choosing my filtering criteria for DeepVariant results: should I keep only the variants with quality >20, as I do now, or only the PASS variants?

Sorry for the inconvenience and thank you.

@akolesnikov
Collaborator

Could you please paste the line from the VCF that you are referring to?

@esraaelmligy
Author

I am not sure which line you mean exactly, but here is an example variant as reported by three different variant callers:

DeepVariant RNA model:

chr11 115166697 . G T 10 PASS . GT:GQ:DP:AD:VAF:PL 1/1:3:5:0,5:1:6,0,0

HaplotypeCaller:

chr11 115166697 . G T 156.96 . AC=2;AF=1.00;AN=2;DP=7;ExcessHet=0.0000;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=31.39;SOR=3.611 GT:AD:DP:GQ:PL 1/1:0,5:5:15:171,15,0

Freebayes:

chr11 115166697 . G T 90.0627 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=5;CIGAR=1X;DP=5;DPB=5;DPRA=0;EPP=13.8677;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=10.1503;PAIRED=0;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=147;QR=0;RO=0;RPL=5;RPP=13.8677;RPPR=0;RPR=0;RUN=1;SAF=5;SAP=13.8677;SAR=0;SRF=0;SRP=0;SRR=0;TYPE=snp;technology.Illumina=1 GT:DP:AD:RO:QR:AO:QA:GL 1/1:5:0,5:0:0:5:147:-13.5217,-1.50515,0

All three variant callers were run on the same BAM. I also forgot to mention that I am calling variants on RNA data, not DNA, so I followed the DeepVariant RNA case study.

@AndrewCarroll
Collaborator

Hi @esraaelmligy

One thing we find is that DeepVariant is much more conservative in assigning quality values compared to other callers. These values come from the probability of the call from the neural network.

Generally, we observe that DeepVariant's probabilities are reasonably well-calibrated with empirical error rates. For an example of this, please see Figure 2 from the DeepVariant paper (https://www.nature.com/articles/nbt.4235 or the open preprint of the same work https://www.biorxiv.org/content/10.1101/092890v6.full.pdf)

As a consequence, you should probably not filter in the same way as with other callers. You should likely have a lower threshold for DeepVariant calls.

@esraaelmligy
Author

Thank you so much for the detailed explanation! I have checked the paper and understand your point now. One last question: is there a generally recommended set of filtering criteria for variants generated by DeepVariant, especially those produced by the RNA model?

@AndrewCarroll
Collaborator

Hi @esraaelmligy

The quality threshold for filtering will depend on your tolerance for false positives versus false negatives. We find the quality is well-calibrated with error, so Qual of 20 is ~1% false discovery probability, Qual of 10 is ~10% false discovery probability and so on.
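
The numbers above follow from the Phred scale used for the VCF QUAL column, where the error probability is 10^(-QUAL/10). A minimal sketch of that conversion is below; the function name `phred_to_error_prob` is made up for illustration and is not part of DeepVariant.

```python
def phred_to_error_prob(qual: float) -> float:
    """Convert a Phred-scaled quality into the probability that the call is wrong."""
    return 10 ** (-qual / 10)


# Print the error probability for a few common thresholds.
for q in (10, 20, 30):
    print(f"QUAL {q}: ~{phred_to_error_prob(q):.1%} chance the call is wrong")
```

Read this way, the DeepVariant record quoted earlier in the thread (QUAL 10) corresponds to roughly a 10% chance of being a false call, provided the calibration holds for your data.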

So if you value precision, something between Qual 10 and Qual 20 as a threshold is probably a good place to start.
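
As a rough illustration of applying such a threshold, the sketch below filters tab-separated VCF records on the QUAL and FILTER columns using plain Python. The function name and the `min_qual` and `require_pass` parameters are assumptions for this example, not DeepVariant options, and the default of 10 is only the lower end of the range suggested above.

```python
def filter_vcf_lines(lines, min_qual=10.0, require_pass=True):
    """Keep header lines and records whose QUAL meets the threshold.

    Assumes standard tab-separated VCF records: column 6 is QUAL, column 7 is FILTER.
    """
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # always keep header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qual, filt = fields[5], fields[6]
        if qual == "." or float(qual) < min_qual:
            continue  # missing or below-threshold quality
        if require_pass and filt not in ("PASS", "."):
            continue  # record failed a caller-applied filter
        kept.append(line)
    return kept


# The DeepVariant record from this thread, rebuilt with tab separators.
record = "\t".join([
    "chr11", "115166697", ".", "G", "T", "10", "PASS", ".",
    "GT:GQ:DP:AD:VAF:PL", "1/1:3:5:0,5:1:6,0,0",
])

print(len(filter_vcf_lines([record], min_qual=10)))  # record is kept
print(len(filter_vcf_lines([record], min_qual=20)))  # record is dropped
```

With a threshold of 10 the example variant is retained, while the threshold of 20 used for the other callers would discard it, which is the discrepancy the original question describes.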
