You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
So I get why we want to skip over indels—they're much more likely to have trouble mapping correctly. But the way it's working now, it actually seems more likely to have a bias, since it's looking in the cigar string for indels, but that would only show up in the cigar string for one of the alleles, leaving the other alone. The SNP itself doesn't get used because presumably we were smart enough to not include indels in the bed file, but if there's an indel next to a SNP, that might introduce bias.
In this toy example, the red is reference, and the blue has a 1bp deletion. In the "truth" the expression is unbiased, but ASEr would call it 2:1, since the blue read that falls across both the SNP and the indel gets thrown out for having a deletion, but the red one doesn't.
My intuition would be that the way to fix this is to come up with a list of indels, and then check to see if each read falls across one, and throw it out if so. However, some quick thought while procrastinating on writing a fellowship proposal did not suggest to me any obvious ways to do this efficiently.
The text was updated successfully, but these errors were encountered:
The easiest way to avoid this issue is to filter your list of SNPs to remove those that are within one read length of an indel. If you have a bed file of your SNPs and a separate bed file of your indels (snps.bed and indels.bed), a simple way is this:
For each SNP find the closest indel:
bedtools closest -t first -a snps.bed -b indels.bed -d > snps.closest
If the closest INDEL is closer than read length (say 125bp), throw it out:
awk '{if ($11 > 125) print $1 "\t" $2 "\t" $3 "\t" $4 "|" $5}' snps.closest > snps.indelsPurged.bed
This also formats the resulting snp file as "Chrom 0-Position 1-Position Ref|Alt"
So I get why we want to skip over indels—they're much more likely to have trouble mapping correctly. But the way it's working now, it actually seems more likely to have a bias, since it's looking in the cigar string for indels, but that would only show up in the cigar string for one of the alleles, leaving the other alone. The SNP itself doesn't get used because presumably we were smart enough to not include indels in the bed file, but if there's an indel next to a SNP, that might introduce bias.
In this toy example, the red is reference, and the blue has a 1bp deletion. In the "truth" the expression is unbiased, but ASEr would call it 2:1, since the blue read that falls across both the SNP and the indel gets thrown out for having a deletion, but the red one doesn't.
My intuition would be that the way to fix this is to come up with a list of indels, and then check to see if each read falls across one, and throw it out if so. However, some quick thought while procrastinating on writing a fellowship proposal did not suggest to me any obvious ways to do this efficiently.
The text was updated successfully, but these errors were encountered: