eqtl analysis problem - decrease running time and memory burden - running mashr by chromosome #127
Comments
@jke20 Thanks for your feedback. Could you tell us a little bit more about the inputs you are providing to mash? If I understand correctly, your Bhat is roughly 10,000 x 100 (one row for each gene, one column for each brain tissue)? |
@pcarbo Thank you for the reply. The matrix is 200,000,000 x 100, with rows for gene-variant pairs and columns for tissues. |
@jke20 Potentially you could take a random subset of the gene-variant pairs, then rerun mash a second time with |
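A minimal sketch of the random-subset step suggested above, in base R. The matrices, sizes, and variable names here are toy stand-ins and assumptions, not the real data; with 200,000,000 rows the full matrix would likely not fit in memory, so in practice the row indices would be sampled while reading from disk.

```r
# Toy stand-ins for the real summary statistics (hypothetical sizes and names).
set.seed(1)
n_pairs   <- 10000  # the real analysis has about 200,000,000 gene-variant pairs
n_tissues <- 5      # the real analysis has about 100 tissues
Bhat <- matrix(rnorm(n_pairs * n_tissues), n_pairs, n_tissues)  # effect estimates
Shat <- matrix(1, n_pairs, n_tissues)                           # standard errors

# Draw a random subset of rows without replacement.
# The subset size is an arbitrary choice for illustration.
n_random    <- 2000
random_rows <- sample(n_pairs, n_random)

# Keep the same rows in both matrices so estimates and standard errors stay aligned.
Bhat_random <- Bhat[random_rows, , drop = FALSE]
Shat_random <- Shat[random_rows, , drop = FALSE]
```

The two subset matrices could then be passed to `mash_set_data(Bhat_random, Shat_random)` to build the `data.random` object used elsewhere in this thread.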
Hi, thank you very much for the help. I think mashr is running nicely right now.
I wonder what's the difference between the results from above and the results if I run mash with only 1 type of covariance (like m = mash(data.random, Ulist = c(U.ed), outputlevel=1)). Thanks! |
Hi Jianfeng,
Thanks for your questions. If you run with only the list of deconvolved matrices (U.ed), as you write, then after it has been initialized with U.pca and data.strong, U.ed should be a list of several covariance matrices (depending on how many PCs you initialized with, here 5). That will allow mash to put weight on flexible patterns. Whether that is enough, or whether the canonical matrices are also required, depends on your data. You can try both and see which improves the likelihood in a training/testing workflow.
Thanks for your question.
-Sarah
… On Aug 27, 2024, at 10:24 PM, Jianfeng Ke ***@***.***> wrote:
Hi, thank you very much for the help. I think mashr is running nicely right now.
Here is a follow-up question:
Below I run mashr with two types of covariances:
# data driven covariances
U.pca = cov_pca(data.strong, 5)
U.ed = cov_ed(data.strong, U.pca)
# canonical covariances
U.c = cov_canonical(data.random)
# fit the mash model (mixture weights) on the random subset
m = mash(data.random, Ulist = c(U.ed,U.c), outputlevel=1)
# compute posteriors on the strong subset using the fitted model
m2 = mash(data.strong, g=get_fitted_g(m), fixg=TRUE)
I wonder what's the difference between the results from above and the results if I run mash with only 1 type of covariance (like m = mash(data.random, Ulist = c(U.ed), outputlevel=1)). Thanks!
|
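The training/testing comparison mentioned above could be sketched as follows. This is an illustration under stated assumptions, not the authors' workflow: the data are simulated with `simple_sims`, the "strong" rows are chosen with `mash_1by1` as in the mashr vignettes, and the object names (`data.train`, `data.test`, `m.ed`, `m.both`) are hypothetical.

```r
library(mashr)
set.seed(1)

# Simulated data stand in for the real eQTL statistics (assumption).
sim.train  <- simple_sims(nsamp = 200, ncond = 5, err_sd = 1)
sim.test   <- simple_sims(nsamp = 200, ncond = 5, err_sd = 1)
data.train <- mash_set_data(sim.train$Bhat, sim.train$Shat)
data.test  <- mash_set_data(sim.test$Bhat, sim.test$Shat)

# Pick "strong" rows from a condition-by-condition analysis.
m.1by1 <- mash_1by1(data.train)
strong <- get_significant_results(m.1by1, 0.05)

# Data-driven and canonical covariance matrices.
U.pca <- cov_pca(data.train, 3, subset = strong)
U.ed  <- cov_ed(data.train, U.pca, subset = strong)
U.c   <- cov_canonical(data.train)

# Fit two candidate models on the training data.
m.ed   <- mash(data.train, Ulist = U.ed, outputlevel = 1)
m.both <- mash(data.train, Ulist = c(U.ed, U.c), outputlevel = 1)

# Evaluate each fitted model on the test data with its weights held fixed.
ll.ed   <- get_loglik(mash(data.test, g = get_fitted_g(m.ed),
                           fixg = TRUE, outputlevel = 1))
ll.both <- get_loglik(mash(data.test, g = get_fitted_g(m.both),
                           fixg = TRUE, outputlevel = 1))
print(c(ed_only = ll.ed, ed_plus_canonical = ll.both))
```

A higher test log-likelihood suggests that set of covariance matrices describes held-out data better; how large a difference matters is a judgment call that depends on the data.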
Thanks, Sarah. Just to add to what Sarah said, in general mash will be faster with fewer matrices, but more matrices give you more flexibility to model different sharing patterns. So there will be a tradeoff. In practice, as Sarah said, the data-driven matrices (U.ed) in your code are more adaptable, so |
Hi dear authors, thank you so much for developing mashr.
Recently, I have been trying to apply mashr to the outputs of my eQTL pipeline to discover tissue-specific and tissue-shared effects (the conditions here are different tissues in the human brain).
As you know, there are many gene-variant pairs, and in our study there are over 100 brain tissues. To decrease the computational burden, I wonder if I can run mashr by chromosome using the same covariance matrices (estimated from a strong matrix that takes the most significant eQTL from each gene across all chromosomes)? I don't know how the final results would be affected if I did that.
Thank you for your help in advance!
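One way the by-chromosome idea could look, as a sketch rather than a recommendation from the maintainers: fit the model once on a random subset drawn across all chromosomes, then apply that fixed fit to each chromosome separately using the `g=get_fitted_g(m), fixg=TRUE` pattern that appears in this thread. The simulated data, the `chrom` labels, and the object names are assumptions made for illustration, and only canonical matrices are used to keep the example short.

```r
library(mashr)
set.seed(1)

# Simulated statistics and made-up chromosome labels (assumptions).
sim   <- simple_sims(nsamp = 200, ncond = 5, err_sd = 1)
chrom <- sample(paste0("chr", 1:3), nrow(sim$Bhat), replace = TRUE)

# Fit the model once on a random subset drawn across all chromosomes.
random_rows <- sample(nrow(sim$Bhat), 300)
data.random <- mash_set_data(sim$Bhat[random_rows, , drop = FALSE],
                             sim$Shat[random_rows, , drop = FALSE])
U.c      <- cov_canonical(data.random)
m        <- mash(data.random, Ulist = U.c, outputlevel = 1)
g.fitted <- get_fitted_g(m)

# Apply the same fitted model to each chromosome in turn.
rows_by_chr <- split(seq_len(nrow(sim$Bhat)), chrom)
lfsr_by_chr <- lapply(rows_by_chr, function(rows) {
  data.chr <- mash_set_data(sim$Bhat[rows, , drop = FALSE],
                            sim$Shat[rows, , drop = FALSE])
  m.chr <- mash(data.chr, g = g.fitted, fixg = TRUE)
  get_lfsr(m.chr)  # local false sign rates for this chromosome
})
```

Because the fitted model is held fixed, the posterior for each gene-variant pair is computed from that pair's own row and the shared fit, so splitting by chromosome should mainly change memory use and running time rather than the results.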