New Analysis Example: RNA-seq Pathway analysis -- ORA #344
Comments
I think we can get started on this before we have a decision on #340. (@cansavvy might share that opinion, but it's hard to tell based on this issue.) Specifically, we can start to list all the ways this could potentially differ from the microarray example. One idea that just occurred to me, and that is further afield, would be not to use differential expression analysis results at all. You could instead imagine a situation where you have some grouping of genes (a co-expression module?) and want to know if there's an overlap with pathways/gene sets. If one did that, the upstream steps would differ by technology. That's not originally what we talked about, but it might be a good illustration of a situation where you would probably be better served using ORA; in a lot of other cases, if you're looking at genome-wide differential expression results, GSEA is probably preferred.
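To make the "overlap between a gene module and a gene set" idea concrete, here is a minimal sketch of the kind of test ORA usually comes down to: a one-sided hypergeometric test. It is written in plain Python so that it has no package dependencies. The function name `ora_p_value`, the gene IDs, and the choice of background are all illustrative assumptions, not anything taken from the microarray example or from clusterProfiler's API.

```python
from math import comb


def ora_p_value(module_genes, pathway_genes, background_genes):
    """Return (overlap size, one-sided hypergeometric p-value).

    The p-value is P(X >= overlap) when drawing len(module) genes
    at random from the background without replacement.
    """
    background = set(background_genes)
    # Only genes present in the background can count toward the test.
    module = set(module_genes) & background
    pathway = set(pathway_genes) & background

    n_background = len(background)
    n_pathway = len(pathway)
    n_module = len(module)
    overlap = len(module & pathway)

    total = comb(n_background, n_module)
    tail = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_module - i)
        for i in range(overlap, min(n_pathway, n_module) + 1)
    )
    return overlap, tail / total


# Hypothetical toy data: 20 background genes, a 5-gene pathway,
# and a 4-gene module that shares 3 genes with the pathway.
background = [f"g{i}" for i in range(20)]
pathway = ["g0", "g1", "g2", "g3", "g4"]
module = ["g0", "g1", "g2", "g10"]

overlap, p_value = ora_p_value(module, pathway, background)
print(overlap, round(p_value, 4))
```

Note that the only inputs are a gene list and a background, which is why a co-expression module works here without any differential expression statistics.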
I wasn't sure what made sense, which is why this issue is quite wishy-washy. I'm going to try to think up and explore some options and post them here. Tomorrow morning @cbethell and I are going to do some planning for this and #343.
If we want to pursue this kind of strategy, should we consider showing users how to run WGCNA or something similar?
Yep, that was my thought. I'm not sure whether WGCNA is still considered the best way to find co-expression modules. I have some vague recollection of recent literature that would suggest not, but I think looking into what to use should be part of this issue.
Related to #346, I've been trying out some things with WGCNA and CoGAPS, and I think for the purposes of ORA, going with a quick WGCNA and running ORA on the biggest gene module seems like a straightforward and not too crazy way to go. Perhaps at a different time we can look into using CoGAPS for its own example, but it would probably require too much conceptual background, as well as computing power, for the purposes of running ORA.
That being said, I'll propose an outline for what an ORA of a gene module might look like, and if it seems reasonable, I'll prepare a draft PR where we can discuss it.
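As a rough illustration of the "run ORA on the biggest gene module" step, the sketch below picks the largest module from a gene-to-module mapping. It is plain Python, not WGCNA code. The mapping, the function name `largest_module`, and the gene names are hypothetical. Skipping a "grey" label assumes the convention where unassigned genes are collected under that label, so adjust `skip_labels` if the module labels are different.

```python
from collections import Counter


def largest_module(gene_to_module, skip_labels=("grey",)):
    """Return (label, sorted genes) for the module with the most genes.

    Labels in skip_labels (assumed here to mark unassigned genes)
    are ignored when counting module sizes.
    """
    counts = Counter(
        label for label in gene_to_module.values() if label not in skip_labels
    )
    if not counts:
        raise ValueError("no modules left after skipping labels")
    label, _size = counts.most_common(1)[0]
    genes = sorted(g for g, m in gene_to_module.items() if m == label)
    return label, genes


# Hypothetical module assignments for six genes.
assignments = {
    "geneA": "blue",
    "geneB": "blue",
    "geneC": "turquoise",
    "geneD": "grey",
    "geneE": "grey",
    "geneF": "grey",
}

label, genes = largest_module(assignments)
print(label, genes)
```

The returned gene list is what would be handed to the ORA step as the genes of interest.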
I think CoGAPS is a great contender for an "advanced usage" example where we use a larger dataset composed of multiple experiments, if you want to file a new issue to track what you've found.
Rough outline:
From here we use the genes in the "most of interest" module and carry on with the ORA steps we've used in the microarray module.
For WGCNA, we should probably normalize with DESeq2 instead of using refine.bio normalized data?
With the DESeq2 normalization steps and WGCNA, this seems like it will be too much for one notebook. Even after trying to trim it down, it's ~640 lines. I think WGCNA (if we are to use it) needs to be its own example.
Update: WGCNA (whose results we are going to use as ORA input) is going to become an "advanced-topics" module. We can still use the output from there, but if we later add a simpler method of arriving at a group of genes (say, a k-means clustering of genes example), then ORA for RNA-seq can be updated at that time to use that output. On a basic level, we want a group of genes that it makes sense to run ORA on, but not differential expression results, since it would make more sense to use GSEA for those.
A reminder that the intro paragraph from #349 will need to be added here, and the table will need to be made to reflect the RNA-seq versions of the analyses.
All wrapped up!
I'd consider reopening this and calling it wrapped up when merged to master? This seems like a good place to track that final step.
What are the goals of this new example analysis?
We show how to use microarray (limma) results to do ORA: #206
It might be good to show how to do ORA with RNA-seq (DESeq2 output), but we should keep in mind that these will likely not differ too much. This will start out as an exploration of how different these examples might be, which will help inform the organizational discussion on #340.
If we find that these RNA-seq vs microarray examples are going to be SO identical, we may want to reconsider having separate technology examples (related to #223 and #175) and this is something we can comment about on #340
Alternatively, if we do want to maintain separate technology examples, we can take the strategy of illustrating different aspects/strategies in this ORA example as compared to what is shown in the microarray example. For instance, maybe we could decide on a gene list using a different method and make different plots?
What kind of dataset will this need?
This will need RNA-seq differential expression results. We currently have one option already prepared for RNA-seq: 03-rnaseq/differential-expression_rnaseq_01.html (until #242 is completed, that is).
This will involve using this file specifically: SRP078441_differential_expression_results.tsv which is based on AML patients.
We haven't used this for pathway analysis yet. If it turns out to be an insignificant dud, we may have to look into completing #242 and trying the results from that instead.
What steps should be included in this analysis?
The steps should for the most part follow what is being used for microarray: https://github.com/AlexsLemonade/refinebio-examples/blob/staging/02-microarray/pathway-analysis_microarray_02_ora.Rmd
We'll have to change the steps to use human instead of zebrafish, though, and also consider which alternative decisions we may want to show users as possibilities.
What packages/methods do you recommend using or looking into for this analysis?
Same as the microarray example, we should still be able to use clusterProfiler.