Tangram_scan segmentation fault using BWA aligned reads #5
Comments
Same here: I also get the segmentation fault error
from the bam file header
I have not tried re-aligning this data using MOSAIK.
I have also observed seg faults running on bwa data and am not sure what
On Wed, Sep 16, 2015 at 1:36 PM, Inti Pedroso [email protected]
Hi,
I would like to use Tangram to identify the location of transposable elements in Drosophila, but when I run tangram_scan I get a segmentation fault. I suspect that tangram_bam is not working, as it looks like the ZA tags are empty (I think?). However, I know that my strains should be heterozygous for a number of different transposable elements, and in fact there are already estimates of their locations. I'd rather not run Mosaik, so if there is a way to get tangram_bam to work, that would be nice.
I've put below info about what I'm doing. Thanks!
Zoe
As a positive control, I know, for example, that there should be at least 78 copies of INE_1 heterozygous in my strain, which I know from previous work. E.g., I have this data:
te presence ch Upstream_estimate Downstream_estimate
INE-1:TIR:DNA yes 2R 2496555 2497124
INE-1:TIR:DNA yes 4 286454 287918
INE-1:TIR:DNA yes 3L 17862241 17862700
I can get a copy of INE_1 sequence from flybase (transposon_sequence_set.embl.txt), so I make my moblist file, which contains only:
I generate my BAM file with bwa, using the -a option to keep reads for which only one mate of the pair maps to the genome (since this appears to be necessary for Tangram?). These are the command line options I use:
bwa mem -M -a -R
Then I remove duplicates, sort, and index using Picard Tools. I also merge several BAMs together, because I have a single sample which was used to generate several libraries. Then I run tangram_bam on that merged BAM:
mySoftwarePath/Tangram/bin/tangram_bam -i myDataPath/MA_6.merged.dedup.bam -r myDataPath/moblist_ine_only.fasta -o myDataPath/MA_6.merged.dedup.tangram.bam
Then I sort the resulting BAM:
mySoftwarePath/java -Xmx2g -jar mySoftwarePath/picard-tools-1.105/SortSam.jar INPUT=myDataPath/MA_6.merged.dedup.tangram.bam OUTPUT=myDataPath/MA_6.merged.dedup.sorted.tangram.bam SORT_ORDER=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=TRUE
Next I generate my file list, tangramBamList.txt, which contains only:
myDataPath/MA_6.merged.dedup.sorted.tangram.bam
Then I run tangram_scan:
mySoftwarePath/Tangram/bin/tangram_scan -in myDataPath/tangramBamList.txt -dir myDataPath/tangramOut
And I get the error:
Segmentation fault (core dumped)
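Before looking at the BAM content itself, one cheap thing to rule out is a bad entry in the list file. The following is a minimal sketch, not part of Tangram: `check_bam_list` is a made-up helper that reads a list file such as tangramBamList.txt and reports entries that are missing or zero bytes long. It does not check whether the BAM is valid or indexed, and whether tangram_scan crashes on a bad path at all is an assumption, not something confirmed in this thread.

```python
import os


def check_bam_list(list_path):
    """Return (path, problem) pairs for entries in a BAM list file.

    Hypothetical helper: it only checks that each listed file exists
    and is non-empty, nothing about the BAM format itself.
    """
    problems = []
    with open(list_path) as handle:
        for raw in handle:
            path = raw.strip()
            if not path:
                # Skip blank lines in the list file.
                continue
            if not os.path.isfile(path):
                problems.append((path, "missing"))
            elif os.path.getsize(path) == 0:
                problems.append((path, "empty"))
    return problems


if __name__ == "__main__":
    # Path taken from the command above; adjust to the real location.
    list_file = "myDataPath/tangramBamList.txt"
    if os.path.isfile(list_file):
        for path, problem in check_bam_list(list_file):
            print(f"{problem}: {path}")
```

An empty result only means the paths resolve; it says nothing about what tangram_scan does with the records inside.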
This is a sample of what my BAM file looks like:
D4LHBFN1:293:C3L3LACXX:2:2213:20193:18303 107 YHet 1 60 16S48M2S = 1 38 CTACGGTTGTCTCAGCAGGGTCACGTAATGCTGATCCAGTCTTGTTTTTATTTTCATTCATGTTGT BHGHIIIIG@HGGGDGIIGI:BDFHDFEGGG<FGHGIIIBHHFHCDHIIGHIFEHFHFFEDE?CCE PG:Z:MarkDuplicates RG:Z:140307_PINKERTON_0293_BC3L3LACXX_L2 NM:i:0 AS:i:48 XS:i:0 ZA:Z:<@;60;;;1;;><&;60;;;1;;>
D4LHBFN1:293:C3L3LACXX:2:2213:20193:18303 151 YHet 1 60 28S38M = 1 -38 ATATGGTGTTTCCTACGGTTGTCTCCGCAGGGTCACGTAATGCTGATCCAGTCTTGTTTTTATTTT CDCDDDCADDDBDDDDFFHEHHB;-'GHGGDHDB2HBIGGHCGCEGIJJJJJJIJIJJJJJIHEBA PG:Z:MarkDuplicates RG:Z:140307_PINKERTON_0293_BC3L3LACXX_L2 NM:i:0 AS:i:38 XS:i:0 ZA:Z:<&;60;;;1;;><@;60;;;1;;>
D4LHBFN1:293:C3L3LACXX:2:2313:15215:43919 147 YHet 10 60 83M = 21 -72 TAATGCTGATCCAGTCTTGTTTTTATTTTCATTCATGTTGTTGCTCTTGCTTTGATTCCGACTTCTAACGTTTAACCTGTGAT DDDDDDDDDDDCCDDEDDDFFFFFFGHHHHHJJJJJJJJJJJIJJJJIJHJIIIIJJJJHHJIJIIJJJJJJJHHJJIJJJII PG:Z:MarkDuplicates.3 RG:Z:140307_PINKERTON_0293_BC3L3LACXX_L2.3 NM:i:3 AS:i:68 XS:i:20 ZA:Z:<&;60;;;1;;><@;60;;;1;;>
D4LHBFN1:293:C3L3LACXX:2:2314:3464:7166 99 YHet 17 60 82M = 56 122 GATCCAGTCTTGTTTTTATTTTCATTCATGTTGTTGCTCTTGCTTTGATTCCGACTTCTAACGTTTAACCTGTGATCAGACG AEDHGGIEFHHHHHIICAGGIIFE>DFDHHHGEHHIIIG@FGGGGIIIIIG@HIHIIIGHFHGEFFFF@@EECEEA;>CCCC PG:Z:MarkDuplicates.1 RG:Z:140307_PINKERTON_0293_BC3L3LACXX_L2.1 NM:i:3 AS:i:67 XS:i:20 ZA:Z:<@;60;;;1;;><&;60;;;1;;>
D4LHBFN1:293:C3L3LACXX:2:2313:15215:43919 99 YHet 21 60 81M = 10 72 CAGTCTTGTTTTTATTTTCATTCATGTTGTTGCTCTTGCTTTGATTCCGACTTCTAACGTTTAACCTGTGATCAGACGTTT JIJHH
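To inspect the flags and ZA tags in records like these more systematically than by eye, something along the following lines could be used. It is a minimal sketch: `decode_flag` uses the flag bits defined in the SAM specification, while `parse_za` only splits the tag into its `<...>` blocks and `;`-separated fields. It makes no assumption about what each ZA field means, since that is not stated in this thread, and all function names are made up for illustration.

```python
import re

# Flag bits as defined in the SAM specification.
SAM_FLAGS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all SAM flag bits set in an integer flag."""
    return [name for bit, name in SAM_FLAGS if flag & bit]


def parse_za(tag):
    """Split a ZA tag into blocks, each a list of ';'-separated fields.

    Field meanings are deliberately left uninterpreted.
    """
    value = tag[len("ZA:Z:"):] if tag.startswith("ZA:Z:") else tag
    return [block.split(";") for block in re.findall(r"<([^<>]*)>", value)]


def empty_field_counts(tag):
    """Count the empty fields in each block of a ZA tag."""
    return [sum(1 for field in fields if field == "") for fields in parse_za(tag)]


if __name__ == "__main__":
    # Flags and ZA value copied from the sample records above.
    for flag in (107, 151, 147, 99):
        print(flag, decode_flag(flag))
    za = "ZA:Z:<@;60;;;1;;><&;60;;;1;;>"
    print(parse_za(za))
    print(empty_field_counts(za))
```

For the sample above, flag 107 has the mate-unmapped bit set, and flag 151 has the unmapped bit set even though that record carries a CIGAR of 28S38M. Each ZA block has four of its seven fields empty. Whether either observation is related to the segmentation fault is not established here.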