Skipping of Indels in CountSNPASE #15

petercombs · 2016-06-20T22:47:55Z

So I get why we want to skip over indels—they're much more likely to have trouble mapping correctly. But the way it's working now, it actually seems more likely to have a bias, since it's looking in the cigar string for indels, but that would only show up in the cigar string for one of the alleles, leaving the other alone. The SNP itself doesn't get used because presumably we were smart enough to not include indels in the bed file, but if there's an indel next to a SNP, that might introduce bias.

In this toy example, the red is reference, and the blue has a 1bp deletion. In the "truth" the expression is unbiased, but ASEr would call it 2:1, since the blue read that falls across both the SNP and the indel gets thrown out for having a deletion, but the red one doesn't.

My intuition would be that the way to fix this is to come up with a list of indels, and then check to see if each read falls across one, and throw it out if so. However, some quick thought while procrastinating on writing a fellowship proposal did not suggest to me any obvious ways to do this efficiently.

rmagoglia · 2016-06-20T23:30:10Z

The easiest way to avoid this issue is to filter your list of SNPs to remove those that are within one read length of an indel. If you have a bed file of your SNPs and a separate bed file of your indels (snps.bed and indels.bed), a simple way is this:

For each SNP find the closest indel:
bedtools closest -t first -a snps.bed -b indels.bed -d > snps.closest

If the closest INDEL is closer than read length (say 125bp), throw it out:
awk '{if ($11 > 125) print $1 "\t" $2 "\t" $3 "\t" $4 "|" $5}' snps.closest > snps.indelsPurged.bed

This also formats the resulting snp file as "Chrom 0-Position 1-Position Ref|Alt"

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Skipping of Indels in CountSNPASE #15

Skipping of Indels in CountSNPASE #15

petercombs commented Jun 20, 2016

rmagoglia commented Jun 20, 2016 •

edited

Loading

Skipping of Indels in CountSNPASE #15

Skipping of Indels in CountSNPASE #15

Comments

petercombs commented Jun 20, 2016

rmagoglia commented Jun 20, 2016 • edited Loading

rmagoglia commented Jun 20, 2016 •

edited

Loading