Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Send adjacent SNP and Indel candidates both to full-alignment model #260

Open
haraldgrove opened this issue Jan 10, 2024 · 9 comments
Open
Labels
enhancement New feature or request

Comments

@haraldgrove
Copy link

I've been running Clair3 (v1.01) on some pig samples with ONT reads. I noticed that Clair3 has been assigning the "1/1" genotype to SNVs that have an allele frequency where I would expect an "0/1" genotype.

Example:

Ssc08   3369224 .       A       C       25.76   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:25:42:22,20:0.4762:63,45,0
Ssc08   3369498 .       C       G       22.15   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:22:43:20,20:0.4651:61,44,0
Ssc08   3374209 .       A       C       8.15    PASS    F       GT:GQ:DP:AD:AF:PL       1/1:8:49:24,23:0.4694:0,29,2
Ssc08   3374212 .       G       A       5.95    PASS    F       GT:GQ:DP:AD:AF:PL       1/1:5:49:25,23:0.4694:1,22,0

Do you have any idea why this might be happening?

-Best regards
Harald

@aquaskyline
Copy link
Member

Clair3 concludes genotype based on AF and many other factors. Reads supporting the reference allele could be mistakes due to sequencing or alignment errors. If Clair3's model thinks many of the reads supporting the reference allele are more likely to be mistaken, it might conclude the genotype to be 1/1 instead of 0/1. Clair3 might or might not be correct about your examples, but please pay more attention to them because there exists evidence that discredits the reads supporting the reference allele.

@haraldgrove
Copy link
Author

haraldgrove commented Jan 11, 2024

Thank you.
I discovered that there was an uncalled deletion covering the same position. So the reference allele was not present at all. It's still a confusing variant, but maybe not much to do about it.

@haraldgrove
Copy link
Author

A follow up to the issue I had with understanding the AF output from Clair3.

I found another location where Clair3 is calling a homozygous insertion, but where an inspection in IGV seems to strongly suggest it to be a heterozygous insertion instead. It also seems as if Clair3 is not discarding any reads as unreliable since the total number of reads is the same in both instances.

The variant:
Ssc13 147177599 . T TTAA 21.09 PASS P GT:GQ:DP:AD:AF:PL 1/1:21:43:2,40:0.9302:63,40,0

I'm wondering if there are any way for me to know what Clair3 is basing its decision on?
ssc13_frosk_147177599

@zhengzhenxian
Copy link
Collaborator

zhengzhenxian commented Jan 18, 2024

Hi, @haraldgrove,

Could you please provide us with the pileup result of the flanking 10bp windows? This will help us pinpoint the genotype issue more accurately. To obtain the mpileup result using samtools, you can use the following command:

samtools mpileup  --min-MQ 5 --min-BQ 0 --ff 2316 -r Ssc13:147177594-147177604 ${BAM}

@aquaskyline
Copy link
Member

Thanks @haraldgrove, your case shows an insertion following a deletion immediately, and it looks like a bug in Clair3 to me. Besides the mpileup results, could you please also send us a minibam covering the case. Thanks!

@haraldgrove
Copy link
Author

Thank you @aquaskyline and @zhengzhenxian for following up on this.

The mpileup output looks like this:

Ssc13   147177594       G       44      .,.,..,.,.,,....,,,,,,..,...,,,,.,,.,..,...,    C&:6:9<7A5;:<4?<263;A!?;%A15=6<8@!<!/;;?38<=
Ssc13   147177595       T       44      .,.,..,.,.,,....,,,,,,..c...,,,,.,,.,..,...,    @&9+<)=7A5=??.>=)5/:F!A>%A14?8>9A!<!0>:A4<=?
Ssc13   147177596       T       44      .,.,..,.,.,,....,,,,,,..,...,,,,.,,.,..,...,    A):(>&A7A2@>?)?=&6.9{!B?'@14@9>;@#<!1A:A;<=>
Ssc13   147177597       A       44      .,.,..,.,.+2GT,,....,,,,,,..,..-2AT.,,,,.,,.,..,...,    A*8)A%B8@+A@@(=>%6.;E!@@(@,5@6><A#=!1@;B5<6?
Ssc13   147177598       A       44      .-1T,-1t.,..,.-1T,.,-1t,-1t.-1T.-1T.-1T.,,-1t,,-1t,-1t,.-1T.,.-1T*.,,,,.-1T,,-1t.+2AT,-1t.-1TG,-1t..-1T.,-1t    @*8(B#A:<!==B);>#5)9D!A>(@*6@5<?@!8!0A2I3=+<
Ssc13   147177599       T       44      **.,.+3TAA.+3TAA,+3taa*,+3taa.*****.+3TAA,+3taa*,**,+3taa*.,**.,+3taa,+3taa,+3taa,+3taa*,*.+2AA**.+1G*.+3TAA*.+3TAA*    !!7'@"?!?!!!!!!?"!(!!!!='!*6@4=?!!!!!!2!3!+!
Ssc13   147177600       T       44      .,.,..,.,.,,....,,,,,,..,...,,,,.,+2aa,.,.G,...,        !!8&$!!!$!!!!!!$!!*!!!!?+!*7$$$$!!!!!!!!$!$!
Ssc13   147177601       T       44      .,.,..,.,.,,....,,,+1c,,,..,...,,,,.,,.,..,..., !$6(C!5$;!$!$$$?!$)$!!$>)$*6=064$!!!$$!!4$+$
Ssc13   147177602       T       44      .,C,..+1C,.,.-1C,,....,+1c,,,,,.C,...,,,,.,,.,.A,...,   !'7)@!;';!'''''B!')''!'>+'+18114'!'!''#'5'('
Ssc13   147177603       C       44      .+2AG,.g..,.,*,,....,,g,,,..,...,,,,.,+1t,.,..,...,     !,**B+:@>)<8{=D@-5)9@!{).C(47219>!4!1D%>3@+<
Ssc13   147177604       A       44      .,.t..,.,.,,....,,c,,,..,...,,,,.t,.,..,...,    /5)+6(;4>)?8C4<?.9)9?!:(.?&/73:;*!8!5,&?-6,<

The BAM file for the region is attached.
ssc13_region.bam.gz

@zhengzhenxian
Copy link
Collaborator

zhengzhenxian commented Jan 22, 2024

Hi, @haraldgrove,

Thanks for providing the BAM, it seems the Clair3 pileup model made an incorrect zygosity prediction due to the insertion
variant was followed by a deletion immediately.

Could you try to add --var_pct_full=1.0 option to feed all pileup results to a more reliable full-alignment model? We consider the full-alignment network would perform better in such cases.

@aquaskyline
Copy link
Member

@haraldgrove please kindly let us know if --var_pct_full=1.0 gives the correct answer of your variants. If yes, we will force these type of variants to go into the full-alignment model to solve the problem in a long-run.

@haraldgrove
Copy link
Author

Hi.

Thank you for the suggestion. The full-alignment mode fixed the problem.

@aquaskyline aquaskyline added the enhancement New feature or request label Jan 26, 2024
@aquaskyline aquaskyline changed the title Variants with AF < 0.5 still called as GT=1/1 Send adjacent SNP and Indel candidates both to full-alignment model Jul 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants