merge LR bam files by TieBrush #6

Open
YIGUIz opened this issue Oct 12, 2024 · 2 comments

Comments

@YIGUIz

YIGUIz commented Oct 12, 2024

Hi,

I recently used TieBrush to merge BAM files and noticed that it worked effectively on short-read BAM files. However, it didn't seem to reduce the read count for long-read BAM files.

LR merge:
image

SR merge:
image

Could you kindly advise if the software is primarily intended for short-read data, or if there are specific parameters I can adjust to merge long-read BAM files?

Thank you for your assistance. I look forward to your guidance.

Best regards,
Qi

@gpertea
Owner

gpertea commented Oct 12, 2024

Indeed, TieBrush was designed for and tested on short-read data only. What you observed is expected: long-read alignments have large, complex CIGAR strings because of the higher error rate, so reads aligned to the same isoform are much more likely to differ from each other by small alignment variations or errors.

One thing you could try with long reads is the -E option of tiebrush, which ignores some of these small alignment variations/errors and only considers the rough exon/intron coordinates. Note, though, that we have noticed intron coordinates (splice junction locations) can also be a bit "fuzzy" in long-read alignments.
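
To make the point above concrete, here is a minimal Python sketch of why an exact comparison rarely collapses long reads while an exon-level comparison can. It is an illustration only, not TieBrush's actual implementation: the helper names (`exon_blocks`, `exact_key`, `exon_key`) and the three example alignments are hypothetical, and the only assumption about the data is standard SAM CIGAR semantics.

```python
import re

# Standard SAM CIGAR operations: a length followed by one op character.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def exon_blocks(pos, cigar):
    """Split an alignment into reference-space exon intervals.

    Returns a list of (start, end) pairs, end exclusive, broken at every
    N (intron) operation. Small indels inside an exon do not create blocks.
    """
    blocks = []
    start = cur = pos
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "MD=X":      # consumes reference inside an exon
            cur += n
        elif op == "N":       # intron: close this exon, jump past the intron
            blocks.append((start, cur))
            cur += n
            start = cur
        # I, S, H and P do not consume reference, so they are skipped
    blocks.append((start, cur))
    return blocks


def exact_key(pos, cigar):
    """Hypothetical strict key: reads collapse only if POS and CIGAR match."""
    return (pos, cigar)


def exon_key(pos, cigar):
    """Hypothetical relaxed key: reads collapse if exon boundaries match."""
    return tuple(exon_blocks(pos, cigar))


# Three made-up long reads from the same two-exon isoform.
reads = [
    (100, "20M1I29M200N50M"),       # one insertion in exon 1
    (100, "30M1D18M200N25M1I25M"),  # different indels, same exon boundaries
    (100, "51M198N50M"),            # splice junction shifted by 2 bases
]

print(len({exact_key(p, c) for p, c in reads}))  # 3: nothing collapses
print(len({exon_key(p, c) for p, c in reads}))   # 2: first two collapse
```

The third read shows the "fuzzy" junction case: its donor site is shifted by two bases, so even the exon-level key keeps it separate from the other two.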

@YIGUIz
Author

YIGUIz commented Oct 12, 2024

Thanks for your suggestions!

I’m planning to assemble a tissue transcriptome using StringTie2 (--mix). I have around 400 long-read RNA-seq samples, along with paired short-read RNA-seq data. Do you have any recommendations for merging all the BAM files so that the run goes smoothly?

Also, I have another question: what’s the difference between assembling by sample (assembling each sample individually and then merging the GTF files) and assembling by tissue (merging all BAM files into one before assembly)? Why does CHESS3 choose overlapping transcripts?
