This repository has been archived by the owner on Jan 31, 2020. It is now read-only.

Possible enhancement for VAF estimation #86

Open
philip-holmgren opened this issue Mar 21, 2018 · 0 comments

philip-holmgren commented Mar 21, 2018

Hi,

I came across this tool because we are developing an in-house NGS pipeline for FLT3-ITD detection in AML patients. We already use NGS for this, but it is outsourced and quite costly, so we are looking into an alternative we can set up ourselves.

Based on an initial validation cohort (17 FLT3-ITD+ AML, 23 FLT3-ITD- AML patients), Pindel does very well: it detects all ITDs, with matching ITD lengths, in all positive samples.

However, we see a marked difference in Variant Allele Fraction (VAF) compared to the other two techniques (fragment analysis and the outsourced NGS analysis).
Although fragment analysis might not be very accurate for determining VAF, we were surprised that another NGS analysis shows a much higher VAF in several FLT3-ITD cases; in some cases the VAF reported by Pindel is 50% lower.
Studies comparing different FLT3-ITD detection tools have reported the same observation (e.g. Rustagi et al., https://www.ncbi.nlm.nih.gov/pubmed/27121965).

To understand what causes the difference, we used a software package (SeqPilot, JSI) to detect the FLT3-ITD in one of our positive patients. Although SeqPilot is not very good at detecting FLT3-ITD, in this case the 21 bp indel was probably small enough to be mapped and called correctly.
Similar to the outsourced analysis, SeqPilot showed a much higher VAF than Pindel for this variant.

To understand the difference we extracted the variant reads in SeqPilot and compared them with the reads in the Pindel output.

Pindel called the 21 bp variant with a VAF of 30% (ADREF=5122; ADALT=2227), whereas SeqPilot called the same variant with a VAF of 40% (ADREF=6411; ADALT=4274).
We are not entirely sure where the difference in total coverage comes from (perhaps SeqPilot's alignment is less strict than the one we perform with BWA-MEM), but we focused on the coverage of the ITD allele to explain the difference of ~2000 reads in ADALT.
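
For reference, the arithmetic behind these percentages can be sketched as below. This assumes VAF is computed as ADALT / (ADREF + ADALT), which is consistent with the numbers above but is not taken from either tool's documentation; the function name `vaf` is only illustrative.

```python
def vaf(ad_ref: int, ad_alt: int) -> float:
    """Variant allele fraction, assuming VAF = ADALT / (ADREF + ADALT)."""
    total = ad_ref + ad_alt
    if total == 0:
        raise ValueError("no coverage at this position")
    return ad_alt / total


# Allele depths reported for the same 21 bp variant
pindel_vaf = vaf(ad_ref=5122, ad_alt=2227)
seqpilot_vaf = vaf(ad_ref=6411, ad_alt=4274)

# Difference in reads supporting the ITD allele
alt_gap = 4274 - 2227

print(f"Pindel:   {pindel_vaf:.1%}")    # 30.3%
print(f"SeqPilot: {seqpilot_vaf:.1%}")  # 40.0%
print(f"ADALT difference: {alt_gap}")   # 2047
```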

By comparing read header information we got the following results:

| Read | Unique to Pindel | Both | Unique to SeqPilot |
| --- | --- | --- | --- |
| R1 | 22 | 897 | 1308 |
| R2 | 76 | 1232 | 837 |
| Same header | 76 | 897 | 9 |
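
A comparison of this kind can be sketched with plain set operations. This is only an illustration of the idea, not the exact script we used: it assumes each supporting read has already been reduced to a (fragment id, mate number) pair parsed from the FASTQ header, and the names `three_way`, `compare` and the toy reads are made up for the example.

```python
from typing import Dict, Iterable, Set, Tuple

Read = Tuple[str, int]  # (fragment id from the FASTQ header, mate number 1 or 2)


def three_way(a: Set[str], b: Set[str]) -> Tuple[int, int, int]:
    """Counts of items only in a, in both, and only in b."""
    return len(a - b), len(a & b), len(b - a)


def compare(pindel: Iterable[Read], seqpilot: Iterable[Read]) -> Dict[str, Tuple[int, int, int]]:
    p, s = set(pindel), set(seqpilot)
    rows = {}
    # Per-mate comparison: the same fragment id must support the variant in the same mate
    for mate in (1, 2):
        rows[f"R{mate}"] = three_way(
            {frag for frag, m in p if m == mate},
            {frag for frag, m in s if m == mate},
        )
    # Fragment-level comparison: ignore which mate carried the variant
    rows["fragment"] = three_way({frag for frag, _ in p}, {frag for frag, _ in s})
    return rows


# Toy input, NOT our real data
pindel_reads = [("frag1", 1), ("frag2", 2), ("frag3", 1)]
seqpilot_reads = [("frag1", 1), ("frag1", 2), ("frag2", 1), ("frag2", 2), ("frag4", 1)]

table = compare(pindel_reads, seqpilot_reads)
for row, (only_pindel, both, only_seqpilot) in table.items():
    print(row, only_pindel, both, only_seqpilot)
```

In the toy input, frag1 and frag2 are found by both tools at the fragment level, but SeqPilot supports them with both mates while Pindel supports each with a single mate.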

If we compare the R1 intersect or the R2 intersect individually, we notice that SeqPilot has many more reads calling the variant, in R1 as well as in R2.
However, if we simply look at the originating sequence fragment (same FASTQ header, disregarding R1/R2), we notice that almost all fragments are used to detect the variant by both tools.
The major difference is that SeqPilot often detects the variant in both reads of a pair, whereas Pindel only calls it in one of the two, but not both.

Given how Pindel works, this would make sense: it uses one read of the pair as the mapped read to get the anchor point, and then tries to map the unmapped read to find the proper breakpoints.
However, in a lot of cases the mapped read (from Pindel's perspective) is only partially mapped and soft-clipped toward the end (at least for the reads I investigated).

Based on these findings we propose that VAF could be estimated more accurately if Pindel checked for the presence of the variant in the mapped read.
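
As a rough illustration of how such partially mapped anchor reads could be flagged, the sketch below reads the soft-clip lengths from a SAM CIGAR string. This is not Pindel code and not a proposed patch; the helper names `soft_clips` and `is_clipped_anchor` and the `min_clip` threshold are hypothetical.

```python
import re
from typing import Tuple

# One CIGAR operation: a length followed by an operation letter from the SAM spec
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar: str) -> Tuple[int, int]:
    """Return (leading, trailing) soft-clip lengths of a SAM CIGAR string."""
    if cigar == "*":  # CIGAR unavailable, e.g. unmapped read
        return 0, 0
    ops = _CIGAR_OP.findall(cigar)
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    # Hard clips (H) sit outside soft clips, so drop them before looking at the ends
    core = [(int(n), op) for n, op in ops if op != "H"]
    lead = core[0][0] if core and core[0][1] == "S" else 0
    trail = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return lead, trail


def is_clipped_anchor(cigar: str, min_clip: int = 10) -> bool:
    """Hypothetical filter: anchor read with a soft clip of at least min_clip bases."""
    return max(soft_clips(cigar)) >= min_clip


print(soft_clips("120M31S"))       # (0, 31)
print(is_clipped_anchor("151M"))   # False
```

Anchor reads flagged this way would be the ones worth re-examining for the variant before counting ADALT.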

I could provide the sample data if that would help.

Kind regards
