How to add TSS and TES on metagene plots for RIP-seq data? #35

Open
MolyWang opened this issue Dec 31, 2023 · 3 comments


@MolyWang

Hello Eric, thanks in advance for any help!

The data I have:

  1. A RIP-seq dataset for an RNA-binding protein (the full transcripts that co-immunoprecipitate with the protein are sequenced by RNA-seq).
  2. Properly processed .bam files and their .bam.bai indexes (they display correctly in Integrative Genomics Viewer). I need to plot 6 BAM files in one plot; each is about 2 to 3 GB.

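    # List each BAM file together with its .bai index ("\\." matches a literal
    # dot and "$" anchors the match to the end of the file name)
    rip_bam_files = basename(Sys.glob(sub("\\.bam$", ".ba*", rip_bam_filesnames)))
    rip_bam_files[1:4]
    #> [1] "RBP_2h_sorted.bam" "RBP_2h_sorted.bam.bai"
    #> [3] "CON_2h_sorted.bam" "CON_2h_sorted.bam.bai"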

  3. I am using the UCSC dm6 annotation:

    library(TxDb.Dmelanogaster.UCSC.dm6.ensGene)
    dm6_tx = GenomicFeatures::transcripts(TxDb.Dmelanogaster.UCSC.dm6.ensGene)

The problem I want to solve:
With the above data, I want to produce a metagene plot that summarizes where the sequencing reads come from: the 5'UTR, the CDS, or the 3'UTR. To do this, I would like to collapse all transcripts in the genome into one generic transcript schematic that forms the x-axis of the plot. Below is an example similar to the plot I expect.
[Image: example of the expected metagene plot]

One final question:
Isoforms matter in my case, because different isoforms of the same gene may have different TSSs and TESs. If necessary, I can use my data to select the most abundant isoform of each gene, but is there any way to produce the plot without this selection?
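If the selection does turn out to be necessary, here is a minimal base R sketch of it. It assumes the abundance estimates sit in a data frame with the hypothetical columns `gene_id`, `tx_name` and `tpm`; rename them to match your own table.

```r
# Sketch: keep the single most abundant isoform of each gene.
# ASSUMPTION: `abundance` is a data.frame with hypothetical columns
# gene_id, tx_name and tpm (one row per transcript).
select_top_isoform <- function(abundance) {
  # Sort by gene, then by decreasing abundance; order() is stable,
  # so ties keep their original row order.
  ord <- abundance[order(abundance$gene_id, -abundance$tpm), ]
  # The first row of each gene is now its most abundant isoform.
  ord[!duplicated(ord$gene_id), ]
}
```

The selected `tx_name` values can then be used to subset any region set whose elements are named by transcript, so each gene contributes once instead of once per isoform.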

Thank you again,
Zhuyi

@ericfournier2

Hello Zhuyi,

Metagene2 cannot directly produce the plot you need. However, you can run it in RNA-seq mode to summarize reads for the 5'UTR, CDS and 3'UTR regions separately, then use the generated tables to produce a combined plot like the one you showed.
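A minimal sketch of building the three region sets from the dm6 TxDb, assuming the Bioconductor packages GenomicFeatures and TxDb.Dmelanogaster.UCSC.dm6.ensGene are installed. It uses GenomicFeatures' `fiveUTRsByTranscript()`, `cdsBy()` and `threeUTRsByTranscript()` and keeps only transcripts annotated with all three parts. The metagene2 call at the end is commented out: its arguments are an assumption to check against `?metagene2`, and `bam_files` is a hypothetical vector of .bam paths (without the .bai indexes).

```r
# Sketch: separate 5'UTR, CDS and 3'UTR region sets from the dm6 TxDb.
# ASSUMPTION: GenomicFeatures and TxDb.Dmelanogaster.UCSC.dm6.ensGene
# are installed from Bioconductor.
library(GenomicFeatures)
library(TxDb.Dmelanogaster.UCSC.dm6.ensGene)

txdb <- TxDb.Dmelanogaster.UCSC.dm6.ensGene

# Each result is a GRangesList with one element per transcript that has
# that feature; each element holds the exonic pieces of the region.
utr5 <- fiveUTRsByTranscript(txdb, use.names = TRUE)
cds  <- cdsBy(txdb, by = "tx", use.names = TRUE)
utr3 <- threeUTRsByTranscript(txdb, use.names = TRUE)

# Keep only transcripts that have all three parts, in the same order,
# so the three region sets describe the same transcripts.
complete_tx <- Reduce(intersect, list(names(utr5), names(cds), names(utr3)))
utr5 <- utr5[complete_tx]
cds  <- cds[complete_tx]
utr3 <- utr3[complete_tx]

# Each set could then be summarized separately in RNA-seq mode, e.g.
# (ASSUMED arguments, check ?metagene2; bam_files is hypothetical):
# mg_utr5 <- metagene2$new(regions = utr5, bam_files = bam_files,
#                          assay = "rnaseq")
```

Restricting to complete transcripts drops those without annotated UTRs, which keeps the three summaries directly comparable.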

Hope this helps,
-Eric

@MolyWang

In that case, which intermediate results should I save or export to make the pipeline as straightforward as possible?
Thanks.

@ericfournier2

Sorry for the delay!

Use the output of the add_metadata or calculate_ci functions to get the data frames that let you combine the three region types into a single plot.
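A minimal base R sketch of that combination step. It assumes each region's table is a data frame with the hypothetical columns `design`, `bin` (numbered 1 to n within the region) and `value`; these names are placeholders, so rename them to whatever add_metadata or calculate_ci actually returns. Each region is rescaled to one unit of a generic transcript axis (5'UTR 0 to 1, CDS 1 to 2, 3'UTR 2 to 3), which puts the TSS, start codon, stop codon and TES at 0, 1, 2 and 3.

```r
# Sketch: place per-region metagene tables on one generic transcript axis.
# ASSUMPTION: each table has hypothetical columns design, bin (1..n) and value.

rescale_region <- function(df, offset) {
  n_bins <- max(df$bin)
  # Map bin centres of this region into [offset, offset + 1].
  df$position <- offset + (df$bin - 0.5) / n_bins
  df
}

combine_regions <- function(utr5_df, cds_df, utr3_df) {
  parts <- Map(function(df, offset, label) {
    df <- rescale_region(df, offset)
    df$region <- label
    df
  }, list(utr5_df, cds_df, utr3_df), 0:2, c("5'UTR", "CDS", "3'UTR"))
  combined <- do.call(rbind, parts)
  # Order by design, then along the generic transcript.
  combined[order(combined$design, combined$position), ]
}

plot_generic_transcript <- function(combined) {
  designs <- unique(combined$design)
  cols <- seq_along(designs)
  plot(NA, xlim = c(0, 3), ylim = range(combined$value),
       xaxt = "n", xlab = "", ylab = "Coverage")
  # Landmarks of the generic transcript.
  axis(1, at = 0:3, labels = c("TSS", "Start", "Stop", "TES"))
  abline(v = 1:2, lty = 2, col = "grey")
  for (i in seq_along(designs)) {
    d <- combined[combined$design == designs[i], ]
    lines(d$position, d$value, col = cols[i])
  }
  legend("topright", legend = designs, col = cols, lty = 1)
}
```

Usage would be `plot_generic_transcript(combine_regions(utr5_df, cds_df, utr3_df))`. Giving every region the same width is a simplification; scaling each segment by, for example, the median 5'UTR, CDS and 3'UTR lengths is a common alternative.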

Cheers,
-Eric
