Tangram_scan segmentation fault using BWA aligned reads #5

Open
zjassaf opened this issue Feb 4, 2015 · 2 comments

zjassaf commented Feb 4, 2015

Hi,

I would like to use Tangram to identify the locations of transposable elements in Drosophila, but when I run tangram_scan I get a segmentation fault. I suspect that tangram_bam is not working, as it looks like the ZA tags are empty (I think?). I know that my strains should be heterozygous for a number of different transposable elements, and there are already estimates of their locations. I'd rather not run Mosaik, so if there is a way to get tangram_bam to work with BWA alignments, that would be nice.

I've put below info about what I'm doing. Thanks!
Zoe

As a positive control, I know, for example, that there should be at least 78 copies of INE_1 heterozygous in my strain, which I know from previous work. E.g., I have this data:
te             presence  ch  Upstream_estimate  Downstream_estimate
INE-1:TIR:DNA  yes       2R  2496555            2497124
INE-1:TIR:DNA  yes       4   286454             287918
INE-1:TIR:DNA  yes       3L  17862241           17862700

I can get a copy of INE_1 sequence from flybase (transposon_sequence_set.embl.txt), so I make my moblist file, which contains only:

moblist_INE-1 GA(transposon_sequence_set.embl.txt) SN(Drosophila melanogaster)
tatacccgttactagattcgttgaaatgaatgtaacaggcagaaggaagcgtcttagaccatatatagtatatacatacatgtatattcttgatcaggatcaatagccgagtcgatcttgccatatccgtctgtccgtatgaacgtcgagatctcaggaactataaaagctagaaggtttagattcagcatacagagacaaagacgcaagtagccatgcccactctaacgtccacaaacagcgcaaaactatcacgcccacacttttgaaaaatgtgttgttcttttcacattctgattagtcttttacatttctatcgatttccaaaaaaaaactttttgccaacgccctaaaaccgcccaaaactccgacacccacatttgtaaaaaattgttgggaatttttttcataaatttattagtttattatttattataaatttaagtttatatcgatttgccgacaacatattttaattttttttctcattttatcttttatctatcgatatcccagaaaaattgtgcaatttcgcattcacactagctgagtaacgggtatctgatagtcgggaaactcgactatagcattctctctttttgaaattgcgg
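A quick sanity check on the moblist file may be worthwhile, since FASTA header lines must begin with `>` and the header shown above has none (the character may simply have been lost when the text was pasted into the issue). The following is a minimal sketch; the function name `check_fasta_headers` is hypothetical and not part of Tangram.

```shell
# check_fasta_headers FILE   (hypothetical helper, not a Tangram tool)
# Prints how many header lines (starting with '>') and sequence lines the
# file contains, and returns non-zero if the first non-empty line is not
# a FASTA header.
check_fasta_headers() {
    awk '
        NF == 0 { next }                               # skip blank lines
        !seen   { seen = 1; first_ok = ($0 ~ /^>/) }   # is the first record a header?
        /^>/    { headers++; next }
                { seqs++ }
        END {
            printf "headers=%d sequence_lines=%d\n", headers, seqs
            exit (first_ok ? 0 : 1)
        }
    ' "$1"
}
```

For example, `check_fasta_headers myDataPath/moblist_ine_only.fasta` would be expected to report one header and one sequence line for a single-element moblist.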

I generate my BAM file with bwa, using the -a option to keep reads for which only one mate of the pair maps to the genome (since this appears to be necessary for Tangram?). These are the command line options I use:
bwa mem -M -a -R

Then I remove duplicates, sort, and index using Picard Tools. I also merge several BAMs together, because I have a single sample that was used to generate several libraries. Then I run tangram_bam on the merged BAM:
mySoftwarePath/Tangram/bin/tangram_bam -i myDataPath/MA_6.merged.dedup.bam -r myDataPath/moblist_ine_only.fasta -o myDataPath/MA_6.merged.dedup.tangram.bam

Then I sort the resulting BAM:
mySoftwarePath/java -Xmx2g -jar mySoftwarePath/picard-tools-1.105/SortSam.jar INPUT=myDataPath/MA_6.merged.dedup.tangram.bam OUTPUT=myDataPath/MA_6.merged.dedup.sorted.tangram.bam SORT_ORDER=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=TRUE

Now generate my file list tangramBamList.txt, which contains only:
myDataPath/MA_6.merged.dedup.sorted.tangram.bam
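Before running tangram_scan, it may help to confirm that every path in the list file points to a non-empty file, since a bad path is an easy thing to rule out. This is a minimal sketch assuming the list holds one BAM path per line, as above; `check_bam_list` is a hypothetical helper, not a Tangram tool, and it says nothing about whether the BAM contents are valid.

```shell
# check_bam_list LIST   (hypothetical helper, not a Tangram tool)
# Reads one path per line from LIST, prints each path that is missing or
# empty, and returns non-zero if any such path was found.
check_bam_list() {
    status=0
    while IFS= read -r path || [ -n "$path" ]; do
        [ -z "$path" ] && continue          # ignore blank lines
        if [ ! -s "$path" ]; then           # -s: file exists and is not empty
            echo "missing or empty: $path"
            status=1
        fi
    done < "$1"
    return "$status"
}
```

Usage would look like `check_bam_list myDataPath/tangramBamList.txt && echo "list OK"`.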

Now do tangram_scan:
mySoftwarePath/Tangram/bin/tangram_scan -in myDataPath/tangramBamList.txt -dir myDataPath/tangramOut

And I get the error:
Segmentation fault (core dumped)

This is a sample of what my BAM file looks like:
D4LHBFN1:293:C3L3LACXX:2:2213:20193:18303 107 YHet 1 60 16S48M2S = 1 38 CTACGGTTGTCTCAGCAGGGTCACGTAATGCTGATCCAGTCTTGTTTTTATTTTCATTCATGTTGT BHGHIIIIG@HGG
GDGIIGI:BDFHDFEGGG<FGHGIIIBHHFHCDHIIGHIFEHFHFFEDE?CCE PG:Z:MarkDuplicates RG:Z:140307_PINKERTON_0293_BC3L3LACXX_L2 NM:i:0 AS:i:48 XS:i:0 ZA:Z:<@;60;;;1;;><&;60;;;1;;>
D4LHBFN1:293:C3L3LACXX:2:2213:20193:18303 151 YHet 1 60 28S38M = 1 -38 ATATGGTGTTTCCTACGGTTGTCTCCGCAGGGTCACGTAATGCTGATCCAGTCTTGTTTTTATTTT CDCDDDCADDDBDDDDFFHEH
HB;-'GHGGDHDB2HBIGGHCGCEGIJJJJJJIJIJJJJJIHEBA PG:Z:MarkDuplicates RG:Z:140307_PINKERTON_0293_BC3L3LACXX_L2 NM:i:0 AS:i:38 XS:i:0 ZA:Z:<&;60;;;1;;><@;60;;;1;;>
D4LHBFN1:293:C3L3LACXX:2:2313:15215:43919 147 YHet 10 60 83M = 21 -72 TAATGCTGATCCAGTCTTGTTTTTATTTTCATTCATGTTGTTGCTCTTGCTTTGATTCCGACTTCTAACGTTTAACCTGTGAT DDDDD
DDDDDDCCDDEDDDFFFFFFGHHHHHJJJJJJJJJJJIJJJJIJHJIIIIJJJJHHJIJIIJJJJJJJHHJJIJJJII PG:Z:MarkDuplicates.3 RG:Z:140307_PINKERTON_0293_BC3L3LACXX_L2.3 NM:i:3 AS:i:68 XS:i:20 ZA:Z:<&;60;;;1;;><@;60;;;1;;>
D4LHBFN1:293:C3L3LACXX:2:2314:3464:7166 99 YHet 17 60 82M = 56 122 GATCCAGTCTTGTTTTTATTTTCATTCATGTTGTTGCTCTTGCTTTGATTCCGACTTCTAACGTTTAACCTGTGATCAGACG AEDHGGIEFHHHH
HIICAGGIIFE>DFDHHHGEHHIIIG@FGGGGIIIIIG@HIHIIIGHFHGEFFFF@@EECEEA;>CCCC PG:Z:MarkDuplicates.1 RG:Z:140307_PINKERTON_0293_BC3L3LACXX_L2.1 NM:i:3 AS:i:67 XS:i:20 ZA:Z:<@;60;;;1;;><&;60;;;1;;>
D4LHBFN1:293:C3L3LACXX:2:2313:15215:43919 99 YHet 21 60 81M = 10 72 CAGTCTTGTTTTTATTTTCATTCATGTTGTTGCTCTTGCTTTGATTCCGACTTCTAACGTTTAACCTGTGATCAGACGTTT JIJHH
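To get a rough picture of the ZA tags across the whole file rather than a few records, the tags can be tallied from the SAM text. The sketch below only counts records with and without a non-empty `ZA:Z:` tag and the number of distinct tag values; it does not interpret the fields inside the tag, whose meaning is not assumed here. `za_summary` is a hypothetical helper, and it expects tab-separated SAM records on standard input.

```shell
# za_summary   (hypothetical helper, not a Tangram tool)
# Reads SAM records on stdin and prints how many carry a non-empty ZA:Z: tag,
# how many do not, and how many distinct ZA values were seen.
za_summary() {
    awk -F'\t' '
        /^@/ { next }                        # skip SAM header lines
        {
            za = ""
            for (i = 12; i <= NF; i++)       # optional tags start at column 12
                if ($i ~ /^ZA:Z:/) za = substr($i, 6)
            if (za == "") {
                missing++                    # no ZA tag, or an empty value
            } else {
                present++
                if (!(za in seen)) distinct++
                seen[za] = 1
            }
        }
        END {
            printf "with_ZA=%d without_ZA=%d distinct_ZA=%d\n", present, missing, distinct
        }
    '
}
```

Assuming samtools is installed, this could be run as `samtools view myDataPath/MA_6.merged.dedup.sorted.tangram.bam | za_summary`. A very small number of distinct values across many records would be consistent with the suspicion that tangram_bam is not filling the tag in.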

zjassaf changed the title from "Tamgram_scan segmentation fault using BWA aligned reads" to "Tangram_scan segmentation fault using BWA aligned reads" on Feb 4, 2015

inti commented Sep 16, 2015

I see a similar problem here. After running

gkno tangram-bam --in bams/93-968.bam --mobile-element-fasta repeats/test_me.fa --out 93-968.tangram.bam --region Chr19 

I get the segmentation fault error from tangram_scan:

sh-4.2$ /home/shared/app/gkno_launcher/tools/Tangram/bin/tangram_scan -in /home/ipedroso/ANALYSES/MEI/Populus/file_list.text -dir tangram_out 
Violación de segmento (the Spanish-locale message for "Segmentation fault")

From the BAM file header:

@PG     ID:bwa  PN:bwa  VN:0.5.9-r16
@PG     ID:tangram_bam  CL:/home/shared/app/gkno_launcher/tools/Tangram/bin/tangram_bam --ref repeats/test_me.fa --input bams/93-968.bam --target-ref-name Chr19 --output /home/ipedroso/ANALYSES/MEI/Populus/93-968_ZA.bam

I have not tried re-aligning this data using MOSAIK.

@AlistairNWard

I have also observed segmentation faults when running on BWA data, and I am not sure what the cause of the problem is. If you don't have massive amounts of data, I would recommend aligning with Mosaik, since this is what Tangram was designed to work with. If you need any assistance, please let me know ([email protected]) and I can help with getting the Mosaik alignments and Tangram running. In particular, we have a pipeline system (gkno) that helps with running larger pipelines and also makes it possible to build your own pipelines for repeated or similar analyses.

