A problem that arises in plant breeding is understanding how individual genetic regions contribute to trait variation across geographic space. The data are high-dimensional and structured: a host of growing-season phenotypes of plants, for instance switchgrass, are screened through a series of high-throughput physiological measurements at 10 different latitudes over multiple years [1].
One is interested in determining the genomic regions underlying upland/lowland ecotype divergence and adaptation in switchgrass by assessing gene-by-environment interactions (GxE) across space and climate variation. A high-dimensional multivariate linear mixed model, which assumes that phenotypes are correlated due to genetic similarity (measured by genome-wide markers) and environmental similarity (measured by climatic or edaphic information), is built as follows:
\begin{equation}
Y=XBZ'+R+E,
\end{equation}
or, equivalently, in vectorized form,
\begin{equation}
Y^v\sim MVN((Z\otimes X) B^v,\tau^2 K_C\otimes K_G+\Sigma\otimes I_n),
\end{equation}
where the superscript $v$ denotes vectorization of a matrix (stacking its columns).
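The step from the matrix form to the vectorized mean rests on the identity vec(XBZ') = (Z ⊗ X) vec(B), where vec stacks the columns of a matrix. The following is a minimal Python sketch that checks this identity numerically on toy matrices. The dimensions and their interpretation (X as n individuals by p genotype covariates, Z as m environments by q covariates) are assumptions made for illustration, and none of the helper functions come from FlxQTL.jl.

```python
# Toy check of vec(X B Z') == (Z kron X) vec(B); all values are made up.

def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Transpose of a nested-list matrix."""
    return [list(col) for col in zip(*A)]

def vec(A):
    """Stack the columns of A into one flat list."""
    return [A[i][j] for j in range(len(A[0])) for i in range(len(A))]

def kron(A, B):
    """Kronecker product of A and B as a nested list."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

# Assumed shapes: X is n x p, Z is m x q, B is p x q (hypothetical toy values).
X = [[1, 0], [1, 1], [1, 2]]   # n = 3, p = 2
Z = [[1, 0], [1, 1]]           # m = 2, q = 2
B = [[2, -1], [1, 3]]          # p x q fixed effects

# Left-hand side: vectorize the n x m mean matrix X B Z'.
lhs = vec(matmul(matmul(X, B), transpose(Z)))

# Right-hand side: (Z kron X) applied to vec(B).
ZX = kron(Z, X)
vec_B = vec(B)
rhs = [sum(k * b for k, b in zip(row, vec_B)) for row in ZX]

print(lhs)
print(rhs)
```

The two printed vectors agree, which is why the fixed-effect term XBZ' in the matrix model appears as (Z ⊗ X)B^v in the multivariate normal form. The same stacking convention is what gives the covariance its Kronecker structure.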
Incorporating genetic background information, kinship matrix (
The algorithm is developed in Julia using an Expectation Conditional Maximization (ECM) algorithm [4] with a Speed Restarting Nesterov’s Accelerated Gradient method [5]. Extensive simulations show that the results are relatively insensitive to different
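To illustrate the acceleration idea, here is a minimal Python sketch of Nesterov's accelerated gradient with speed restarting in the spirit of [5]: the momentum counter is reset whenever the step length shrinks from one iteration to the next. It is applied to a toy quadratic rather than to the ECM updates of the mixed model. The function name `nag_speed_restart`, the objective, the step size, and the `k_min` guard value are all assumptions chosen for illustration, and the actual Julia implementation in FlxQTL.jl may differ.

```python
import math

def nag_speed_restart(grad, x0, step, n_iter=300, k_min=3):
    """Nesterov's accelerated gradient with speed restarting (toy sketch)."""
    x_prev = list(x0)
    y = list(x0)
    j = 1              # momentum counter, reset to 1 on a restart
    prev_speed = 0.0   # length of the previous step
    for _ in range(n_iter):
        g = grad(y)
        # Gradient step taken from the extrapolated point y.
        x = [yi - step * gi for yi, gi in zip(y, g)]
        diff = [xi - pi for xi, pi in zip(x, x_prev)]
        speed = math.sqrt(sum(d * d for d in diff))
        # Extrapolation with momentum weight (j - 1) / (j + 2).
        momentum = (j - 1) / (j + 2)
        y = [xi + momentum * d for xi, d in zip(x, diff)]
        # Restart when the step length decreases, but not before k_min steps.
        if speed < prev_speed and j >= k_min:
            j = 1
        else:
            j += 1
        x_prev, prev_speed = x, speed
    return x_prev

# Hypothetical toy objective: f(x) = 0.5 * sum(c_i * x_i^2), minimized at 0.
curv = [1.0, 4.0, 10.0]

def f(x):
    return 0.5 * sum(c * xi * xi for c, xi in zip(curv, x))

def grad_f(x):
    return [c * xi for c, xi in zip(curv, x)]

x_start = [1.0, 1.0, 1.0]
x_final = nag_speed_restart(grad_f, x_start, step=0.1)  # step = 1 / largest curvature
print(f(x_start), f(x_final))
```

The `k_min` guard matters because the first momentum-free step after a restart is often longer than the one that follows it, so without the guard the scheme would restart at almost every iteration and lose the benefit of momentum.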
Furthermore, the analyses of fitness in Arabidopsis thaliana with climate
information in 2 sites by 3 years (
- Preprint is available on bioRxiv.
- Presentation slides can be found here.
- GitHub repository: FlxQTL.jl
[1] D. B. Lowry, J. T. Lovell, L. Zhang, J. Bonnette, P. A. Fay, R. B. Mitchell, J. Lloyd-Reilley, A. R. Boe, Y. Wu, F. M. Rouquette Jr, R. L. Wynia, X. Weng, K. D. Behrman, A. Healey, K. Barry, A. Lipzen, D. Bauer, A. Sharma, J. Jenkins, J. Schmutz, F. B. Fritschi, and T. E. Juenger. QTL x environment interactions underlie adaptive divergence in switchgrass across a large latitudinal gradient. PNAS, 116(26):12933-12941, 2019.
[2] X. Zhou and M. Stephens. Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nature Methods, 11(4):407-409, 2014.
[3] C. Lippert, J. Listgarten, Y. Liu, C. M. Kadie, R. I. Davidson, and D. Heckerman. FaST linear mixed models for genome-wide association studies. Nature Methods, 8(10):833-835, 2011.
[4] X. L. Meng and D. B. Rubin. Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika, 80(2):267-278, 1993.
[5] W. Su, S. Boyd and E. Candès. A differential equation for modeling Nesterov’s accelerated gradient method: Theory and Insights. In Advances in Neural Information Processing Systems, pages 2510-2518, 2014.
[6] J. Ågren, C. G. Oakley, J. K. McKay, J. T. Lovell, and D. W. Schemske. Genetic mapping of adaptation reveals fitness tradeoffs in Arabidopsis thaliana. PNAS, 110(52):21077-21082, 2013.
[7] M. M. Gray, M. D. Parmenter, C. A. Hogan, I. Ford, R. J. Cuthbert, P. G. Ryan, K. Broman, and B. A. Payseur. Genetics of rapid and extreme size evolution in island mice. Genetics, 201(1):213-228, 2015.