
eqtl analysis problem - decrease running time and memory burden - running mashr by chromosome #127

Open
jke20 opened this issue Aug 5, 2024 · 6 comments

jke20 commented Aug 5, 2024

Hi dear authors, thank you so much for developing mashr.
Recently, I have been trying to apply mashr to my eQTL pipeline outputs to discover tissue-specific and tissue-shared effects (the conditions here are different tissues in the human brain).
As you know, there are many gene-variant pairs, and our study includes over 100 brain tissues. To reduce the computational burden, can I run mashr chromosome by chromosome while reusing the same covariance matrices (estimated from a strong matrix that takes the most significant eQTL for each gene across all chromosomes)? I don't know how the final results would be affected if I did that.
Thank you for your help in advance!

pcarbo (Member) commented Aug 6, 2024

@jke20 Thanks for your feedback. Could you tell us a little bit more about the inputs you are providing to mash? If I understand correctly, your Bhat is roughly 10,000 x 100 (one row for each gene, one column for each brain tissue)?

jke20 (Author) commented Aug 6, 2024

@pcarbo thank you for the reply. The matrix is 200,000,000 × 100: rows are gene-variant pairs and columns are tissues.

pcarbo (Member) commented Aug 6, 2024

@jke20 Potentially you could fit the mash model on a random subset of the gene-variant pairs, then rerun mash a second time with fixg = TRUE on each chromosome, for example; see help(mash) for details.
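
To make this concrete, here is a minimal sketch of that two-step approach (Bhat, Shat, and chrom are placeholder names for your effect estimates, standard errors, and per-row chromosome labels; the subset size is arbitrary):

library(mashr)

# Step 1: fit the mash model on a random subset of the gene-variant pairs.
subset = sample(nrow(Bhat), 1e6)
data.random = mash_set_data(Bhat[subset, ], Shat[subset, ])
U.c = cov_canonical(data.random)
m = mash(data.random, Ulist = U.c, outputlevel = 1)

# Step 2: apply the fitted model to each chromosome with fixg = TRUE.
g = get_fitted_g(m)
out.by.chr = lapply(split(seq_len(nrow(Bhat)), chrom), function(i) {
  data.chr = mash_set_data(Bhat[i, ], Shat[i, ])
  mash(data.chr, g = g, fixg = TRUE)
})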

jke20 (Author) commented Aug 28, 2024

Hi, thank you very much for the help; I think mashr is running nicely now.
Here is a follow-up question.
Below I run mashr with two types of covariances:

# Data-driven covariances: initialize with PCA, then refine with
# extreme deconvolution.
U.pca = cov_pca(data.strong, 5)
U.ed = cov_ed(data.strong, U.pca)
# Canonical covariances.
U.c = cov_canonical(data.random)
# Fit the mash model (estimate the mixture weights) on the random subset.
m = mash(data.random, Ulist = c(U.ed, U.c), outputlevel = 1)
# Rerun mash on the strong subset with the fitted model.
m2 = mash(data.strong, g = get_fitted_g(m), fixg = TRUE)

I wonder what the difference is between the results above and the results if I run mash with only one type of covariance (e.g., m = mash(data.random, Ulist = U.ed, outputlevel = 1)). Thanks!

surbut (Collaborator) commented Aug 28, 2024 via email

pcarbo (Member) commented Aug 28, 2024

Thanks, Sarah.

Just to add to what Sarah said: in general, mash will be faster with fewer matrices, but more matrices give you more flexibility to model different sharing patterns, so there is a tradeoff. In practice, as Sarah said, the data-driven matrices (U.ed) in your code are more adaptable, so Ulist = U.ed could be a convenient (i.e., slightly faster) option.
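
One way to see what the extra matrices buy you (a sketch using mashr's get_loglik on the objects defined in your code above) is to compare the fit of the two models on the same random subset:

m.ed = mash(data.random, Ulist = U.ed, outputlevel = 1)
m.both = mash(data.random, Ulist = c(U.ed, U.c), outputlevel = 1)
# A higher log-likelihood for m.both suggests the canonical matrices
# capture sharing patterns that U.ed alone misses.
print(get_loglik(m.both) - get_loglik(m.ed))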
