This repo contains slides and exercise materials for my workshop on statistical modeling and mixed models with R. Previous instances of this workshop:
- The first instance of this workshop was held as part of Data on the Mind 2017. Title: Statistical Models for Dependent Data: An Introduction to Mixed Models in R
- One-day workshop at the University of Freiburg in June 2018. Title: Mixed Models in R – An Applied Introduction
- One-day tutorial at CogSci 2018 in Madison (Wisconsin). Title: Mixed Models in R – An Applied Introduction
The mixed model part of the workshop is loosely based on my chapter: An introduction to linear mixed modeling in experimental psychology. Read the chapter for a more comprehensive overview.
The repo currently contains three HTML presentations.
In addition, the repo contains a PDF handout providing a concise overview.
- A recent version of `R` (currently `R 3.5.2`): https://cran.rstudio.com/
- `R` packages necessary for the analysis (install with `install.packages("package")` at the `R` prompt): `afex` (which automatically installs the additional requirements `emmeans`, `lme4`, and `car`) as well as `psych` and `MEMSS` (for example data).
- `R` package `tidyverse` as well as `broom` for the exercises (we mainly need `dplyr`, `broom`, `tidyr`, `purrr`, and `ggplot2`).
- `R` package `xaringan` to compile the slides.
- `R` package `sjstats` for the Intraclass Correlation Coefficient (ICC).
- `R` package `GGally` for some plots.
- Possibly `R` packages `sjPlot` and `MuMIn` for some examples.
- An HTML5-compatible browser to view the slides.
- `RStudio`: https://www.rstudio.com/products/rstudio/download3/#download
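The packages listed above can be installed in one step. The following is a minimal sketch rather than part of the original materials: the names `pkgs` and `missing_pkgs` are chosen for illustration, and the `interactive()` guard is an assumption made so that nothing is installed when the code runs from a non-interactive script.

```r
# Packages named in the list above; afex pulls in emmeans, lme4, and car.
pkgs <- c("afex", "psych", "MEMSS", "tidyverse", "broom",
          "xaringan", "sjstats", "GGally", "sjPlot", "MuMIn")

# Keep only those not yet installed in the current library paths.
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

# Install only when run from an interactive R prompt.
if (interactive() && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```

Running it a second time should leave `missing_pkgs` empty if all installations succeeded.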
This workshop will cover several topics in an integrative fashion. The first half day will be devoted to an overview of modern tools for data science in R: Rmarkdown and the tidyverse. Based on this knowledge, the second half day will be devoted to an introduction to statistical modeling in R. The second day will introduce mixed models. Mixed models are a generalization of ordinary regression models that explicitly captures dependencies among related data points via random-effects parameters. Such dependencies are ubiquitous in psychology because more than one data point is usually collected from the same participant and/or for the same item. Compared to traditional analysis approaches that ignore these dependencies, mixed models provide more accurate (and generalizable) estimates, improved statistical power, and non-inflated Type I errors. The workshop will introduce the functionality of lme4, the gold standard for estimating mixed models in R. In addition, it will introduce the functionality of afex, which simplifies many aspects of using lme4, such as the calculation of p-values for mixed models. Attendees are expected to have basic knowledge of R.
In order to increase statistical power and precision, many data sets in cognitive and behavioral sciences contain more than one data point from each unit of observation (e.g., participant), often across different experimental conditions. Such repeated measures pose a problem for most standard statistical procedures such as ordinary least-squares regression, (between-subjects) ANOVA, or generalized linear models (e.g., logistic regression), as these procedures assume that the data points are independent and identically distributed. With repeated measures, the independence assumption is expected to be violated. For example, observations coming from the same participant are usually correlated: they are more likely to be similar to each other than two observations coming from two different participants.
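A small simulation can make this non-independence concrete. The sketch below uses only base R and invented values (1000 hypothetical participants, unit variances); it is an illustration, not data or code from the workshop. Each participant gets a stable offset `u` that is shared by both of their observations.

```r
set.seed(1)
n_id <- 1000                       # hypothetical number of participants

# Participant-specific offset shared by all observations of one participant.
u <- rnorm(n_id, mean = 0, sd = 1)

# Two observations per participant: shared offset plus independent noise.
y1 <- u + rnorm(n_id, mean = 0, sd = 1)
y2 <- u + rnorm(n_id, mean = 0, sd = 1)

# Same participant: correlated (theoretical value 1 / (1 + 1) = 0.5).
cor_same <- cor(y1, y2)

# Different participants (pairs shuffled): theoretical correlation is 0.
cor_diff <- cor(y1, sample(y2))

round(c(same = cor_same, different = cor_diff), 2)
```

Procedures that assume independent observations treat both kinds of pairs alike, which is why the shared offset has to be modeled explicitly.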
The goal of this workshop is to introduce a class of statistical models that can account for most of the cases of non-independence typically encountered in cognitive science: linear mixed-effects models (Baayen, Davidson, & Bates, 2008), or mixed models for short. Mixed models are a generalization of ordinary regression that explicitly captures the dependency among data points via random-effects parameters. Compared to traditional analysis approaches that ignore these dependencies, mixed models provide more accurate (and generalizable) estimates of the effects, improved statistical power, and non-inflated Type I errors (e.g., Barr, Levy, Scheepers, & Tily, 2013).
In recent years, mixed models have become increasingly popular. One of the main reasons for this is that a number of software packages have appeared that make it relatively convenient to estimate large classes of mixed models. The workshop will focus on `lme4` (Bates, Mächler, Bolker, & Walker, 2015), the gold standard for estimating mixed models in `R` (R Core Team, 2018). In addition, it will introduce the functionality of `afex` (Singmann, Bolker, Westfall, & Aust, 2017), which simplifies many aspects of using `lme4`, such as the calculation of p-values for mixed models. `afex` was specifically developed with a focus on factorial designs that are common in cognitive and behavioral sciences.
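To give a flavor of the two packages, here is a minimal sketch that fits the same model once with `lme4::lmer()` and once with `afex::mixed()`. It uses the `sleepstudy` data shipped with `lme4` rather than any workshop data set, and it assumes an installed `afex` version that supports `method = "S"` (Satterthwaite approximation); the object names `m_lmer` and `m_afex` are illustrative.

```r
library(lme4)
library(afex)

# sleepstudy (from lme4): reaction times of 18 subjects measured over 10 days.
# Random intercept and random slope for Days, both varying by Subject.
m_lmer <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
fixef(m_lmer)    # fixed-effect estimates (intercept and slope for Days)
VarCorr(m_lmer)  # random-effect variances and their correlation

# Same model via afex, which adds a p-value for the fixed effect of Days.
# afex may print a note that the numeric predictor Days is not centered.
m_afex <- mixed(Reaction ~ Days + (Days | Subject),
                data = sleepstudy, method = "S")
m_afex
```

The formula syntax is identical in both calls; the difference lies in the additional test output that `mixed()` reports.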
Despite a number of high-impact publications that introduce mixed models to a wide variety of audiences (e.g., Baayen et al., 2008; Judd, Westfall, & Kenny, 2012), the application of mixed models in practice is far from trivial. Applying mixed models requires a number of steps and decisions that are not necessarily part of the methodological arsenal of every researcher. The goal of the workshop is to change this and to introduce mixed models in such a way that they can be used effectively and their results communicated clearly.
The workshop is split into three parts. The first half day will be devoted to an overview of modern tools for data science in `R`: `Rmarkdown` and the `tidyverse`. Based on this knowledge, the second half day will be devoted to an introduction to statistical modeling in `R`. The second day will introduce mixed models using the `lme4` package, the gold standard for estimating mixed models in R. In addition, it will introduce the functionality of `afex` (the package of the convenor), which simplifies many aspects of using lme4, such as the calculation of p-values for mixed models.
Participants of the workshop need some basic knowledge of R. For example, they should be able to read in data, select subsets of the data, and estimate a linear regression model. Participants without any R knowledge will likely not profit from the workshop.
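As a rough self-check of these prerequisites, the following base R sketch reads in data, selects a subset, and estimates a linear regression. It uses the built-in `mtcars` data written to a temporary CSV file purely for illustration; none of this is workshop material, and the object names are hypothetical.

```r
# Write a built-in data set to a temporary CSV so there is a file to read.
path <- tempfile(fileext = ".csv")
write.csv(mtcars, path, row.names = FALSE)

# 1. Read in data.
dat <- read.csv(path)

# 2. Select a subset: only cars with four cylinders.
four_cyl <- subset(dat, cyl == 4)

# 3. Estimate a linear regression of fuel economy on weight.
fit <- lm(mpg ~ wt, data = four_cyl)
summary(fit)
```

Participants who can follow each of these three steps should be well prepared.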
- Baayen, H., Davidson, D. J., & Bates, D. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412. https://doi.org/10.1016/j.jml.2007.12.005
- Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1). https://doi.org/10.18637/jss.v067.i01
- Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278. https://doi.org/10.1016/j.jml.2012.11.001
- Judd, C. M., Westfall, J., & Kenny, D. A. (2012). Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem. Journal of Personality and Social Psychology, 103(1), 54–69. https://doi.org/10.1037/a0028347
- Singmann, H., Bolker, B., Westfall, J., & Aust, F. (2017). afex: Analysis of Factorial Experiments. R package version 0.18-0. http://cran.r-project.org/package=afex
- R Core Team. (2017). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org/
- Wickham, H., & Grolemund, G. (2017). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. Sebastopol CA: O’Reilly.
Last edited: May 2019
All code in this repository is released under the GPL v2 or later license. All non-code materials are released under the CC-BY-SA license.