-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: WGCNA module draft #350
Conversation
network-analysis_rnaseq_01_wgcna.html.zip Here's the rendered html if you wanna see what the draft plots look like without checking out the branch. |
I'll start with the elephant, which is that I share your concerns about this being a bit of a bear (mixed animal metaphors for the win). I'm not 100% sure how I feel about that on the whole. It does seem like something that might end up in "advanced" rather than the main set of analyses. I am also not sure whether you are intending to keep this described as ORA or if that is just leftovers from the template. I see it as quite distinct from a standard ORA, and I would not refer to it as such. Maybe you are planning to use these modules as the sets for ORA, but that would seem rather circular to me. Skipping to the end, I feel like we would want some kind of rough conclusion for users to get out of this analysis. Is there a way we can reduce the number of modules that are returned to get the ones we might care most about? What do we do with these modules once we find them? I think the relationship of the eigengenes to the metadata labels gets closest to answering this question, so it seems like the best place to focus effort. To make this easier to interpret, we might try to come up with ways to reduce the number of Eigengenes that we get out. That might involve adjusting and examining some parameters and thresholds, which I imagine is work that might not make it into an example notebook, but should be done. The thing we need to think about is how to provide a balance between a single example that looks pretty good and the fact that parameters and results should be examined carefully. Answering your question about A couple of specific notes:
with:
To me that is more clear what is going on, and does it all in one neat step. |
@jashapiro, This was my attempt to "simplify" the WGCNA tutorials, but I think you are correct that it is still too much for a non-advanced topics user to go through. I tried to give just enough parameter guidance as to not be totally cavalier with the analysis but I think if someone were to really get into this the right way, they should dive into the parameters more than the user of this section may be willing or able to do. So... I think moving this to advanced-topics makes sense. To clear up the confusion about is this ORA or not: So, this means as far as timeline, we need to discuss the further work on this since the priority for this sprint is getting the pathway analyses and this was a tangent that was supposed to feed into an ORA, pathway analysis example. That being said it doesn't make sense to have an advanced topics example feed into a "main" example, so we will have to reevaluate where we are getting a gene set from for the ORA example #344. In other notes:
|
Is this a thing that people do? Because as I said, it seems kind of circular. I can see taking modules from one data set and using them for ORA in another data set, but if you are using modules defined in the same data set, I would assume that all the genes will be significant or none of them will be, as the modules are defined by the correlations between genes... |
The question I was after is "Does this coexpression module significantly overlap with known pathways?" So given a module that is associated with metadata, for example, are p53 pathway genes overrepresented?
I'm not outright opposed to having an advanced example feed into a "main" example. We're not sure people will ever look at more than one example and our point for getting on this tangent was basically that any list of genes of interest will work. |
I suppose that's fair. We usually link to the notebook where the data come from though, in case the user wants to produce the data themselves, so I guess we'd have to ask if it's okay to link a user to a more advanced topic or will that be overwhelming. It's obviously unclear to us at this stage how users will be navigating these kinds of examples with predecessors so perhaps it's not a problem. |
Ah, that makes more sense... I am also not opposed to using an advanced example to feed back in, but it does seems a bit of a slippery slope. It could add confusion as to the "why" for a particular analysis. |
I think the plan is to move forward as is, but when we add a more simple method of getting gene cluster example, we can replace it in ORA. See here: #344 (comment) |
So it seems we probably want to end up with a gene list from a module that is most related to a metadata variable of interest. This seems most pertinent to what users would want to do as well as our downstream ORA application. This is somewhat related to my rough draft of a heatmap of eigengenes with metadata labels, but should probably be a little more methodical than that. One of the WGCNA tutorials suggests doing this kind of analysis through more correlations and a plot like this: If we want to do this type of analysis, this means we'd actually have to have a meaningful numeric variable or more which means this dataset won't work and I'll have to do some digging to find a dataset that would work. (So meanwhile I'm not going to worry about playing with power if we switch datasets completely). However, I'm not crazy about more correlations, so if people have other suggestions of how to identify modules of most interest, I'm open ears. I'll do some digging and see if there's other legit ideas out there. |
You can test the module eigengenes for differential expression if you have a categorical variable that's of interest. |
I was thinking about that but didn't know if that'd be considered a "valid" method. |
It absolutely would be! Think about the eigengenes as a smaller set of genes that you can use to reduce multiple testing burdens. SO you can treat them as "gene" with measured expression, and then ask if the expression is different between the groups you care about. In the example you showed above, they are comparing lots of different continuous traits to eigengene "expression", and plotting all the correlation coefficients, but the much simpler case is when you have only one trait and you want to see which of the eigengenes is correlated with it. In the very simplest case, you can do something like a t-test comparing expression between two groups of samples. If you go from 20,000 genes to 100 eigengenes, that can give you a huge boost in power (statistical power here for realsies) because you don't have so much multiple testing to correct for. |
I wish GitHub had ☝️ as a reaction. @jashapiro is exactly right. WGCNA is both somewhat biologically motivated if we think that genes that are coordinated in expression might participate in a coherent process and technically motivated if we want to reduce the number of variables we need to test. |
So for differential expression do I make a new DESeq2 object from the eigengenes and use DESeq2 for differential expression per usual? Or do I do an ANOVA with a base R function? |
I would probably keep it simple per Josh's suggestion. |
I'm probably going to open up a second Draft PR since much of this discussion has lead to quite a bit of changes and it'd probably be best to start fresh. |
Analysis Purpose
It's a spin off issue from #306 and the discussion on #346.
Ultimately I was trying to see if we could squeak in a quick WGCNA run for ORA in one module, but as you can see here, it's not really realistic for the. WGCNA run to be that short particularly since we also need to do some DESeq2 transformations before we can get to WGCNA.
Pull Request Stage
This is a Draft PR - needs review of big concepts and outline
Strategy
I've been trying to sort through what seemed like the important points of the WGCNA tutorials and:
Also made sure we are following the RNA-seq guidance that's written in the FAQ of WGCNA.
Concerns/Questions for reviewers:
In the "Explore WGCNA" results section you will see I'm doing a number of things, making multiple plots. I in no way expect we keep all those things, but instead I want to know which things we find most useful and we can delete the others and I'll expand a tad more on the useful things. Here's a summary of the things I'm proposing (some of which are inspired by plots I see in WGCNA tutorials):
You'll see I briefly mention/explain
TOMtype
parameter. Do we think this is unnecessary/out of scope for our users?I've been looking at the tutorials and papers and forums and I can't for the life of me find an intuitive explanation for what "power" for soft-threshold network construction actually is (not info on how to use it, but like what does it mean?? What does WGCNA do with it). So if anyone has insight on that, I couldn't figure it out.
I have set the seed in the main WGCNA function
blockwiseModules()
but it sounds like a global set the seed might work instead? I suppose global is preferred or no?Analysis Pull Request Check List (roughly in order):
Content checks
{{BLANKS}}
have been replaced with the correct content.Formatting Checks
Add datasets to S3
Docker/Snakemake rendering components
.html
link to the navigation bar.