-
Notifications
You must be signed in to change notification settings - Fork 40
/
caterpillar_plots.Rmd
209 lines (168 loc) · 9.33 KB
/
caterpillar_plots.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
---
title: "Random Effects Plotting for Mixed Effects Models Fitted with glmmTMB"
author: "Isabella Ghement"
date: "24/07/2021"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
**note**: at present this requires the `ignore_stuff` branch of `glmmTMB` (`remotes::install_github("glmmTMB/glmmTMB/glmmTMB@ignore_stuff")`)
```{r pkgs, message=FALSE}
## general manipulation
library(dplyr)
## graphics
library(ggplot2); theme_set(theme_bw())
library(qqplotr) ## for bands on Normal probability plots
## fitting packages
library(glmmTMB)
library(nlme)
library(lme4)
library(MCMCglmm)
## extraction
library(broom.mixed)
```
## Fit Mixed Effects Model
To illustrate plotting the predicted values of the random effects (and corresponding 95% uncertainty intervals) obtained from a mixed effects model fitted with `glmmTMB`, we consider the following model:
```{r model, warning = FALSE}
data(Orthodont,package="nlme")
## glmmTMB hiccups unless age is centered (scaling is harmless)
oo <- transform(Orthodont, c_age = drop(scale(age)))
form <- distance ~ c_age*Sex + (c_age | Subject)
fit_list <- list(
lmer = lmer(form, data = oo),
glmmTMB = glmmTMB(form, data = oo),
lme = lme(distance ~ c_age*Sex, random = ~c_age|Subject, data = oo),
MCMCglmm = MCMCglmm(distance ~ c_age*Sex, random = ~us(1 + c_age):Subject,
verbose = FALSE, data = oo, pr = TRUE,
prior = list(G=list(Subject=list(V=diag(2), nu =0.1))))
)
tt <- purrr::map_dfr(fit_list, tidy, effect = "ran_vals", .id = "pkg")
```
```{r}
t_glmmTMB <- dplyr::filter(tt, pkg == "glmmTMB")
gg0 <- ggplot(t_glmmTMB, aes(x=estimate, y=level)) +
geom_pointrange(aes(xmin= estimate-2*std.error, xmax = estimate + 2*std.error)) +
facet_wrap(~term, scale="free_x")
print(gg0)
```
```{r}
caterpillar_levels <- function(x, order_term = "(Intercept)") {
lev_order <- (x
%>% filter(term==order_term)
%>% group_by(level)
%>% summarise(across(estimate, mean, na.rm = TRUE))
%>% arrange(estimate)
%>% pull(level)
)
x <- dplyr::mutate(x, across(level, factor, levels = lev_order))
return(x)
}
gg0 %+% caterpillar_levels(t_glmmTMB)
```
We can *almost*
```{r grp1, eval = FALSE}
gg0 %+% caterpillar_levels(tt) + aes(colour=pkg, shape=pkg)
```
```{r grp2}
pd <- position_dodge(width =0.5)
gg1 <- ggplot(caterpillar_levels(tt), aes(x=estimate, y=level,
colour = pkg, shape = pkg)) +
geom_point(position = pd) +
geom_linerange(aes(xmin= estimate-2*std.error, xmax = estimate + 2*std.error),
position = pd) +
facet_wrap(~term, scale="free_x")
print(gg1)
```
## Compute Random Effects
Once we fit the mixed effects model using the `glmmTMB` package, we can extract the
predicted random effects using the `ranef()` function from this package. Because the
fitted model included random intercepts and random slopes for age, `ranef()` will return
predicted values for these quantities in two columns contained in the Subject component
of the object re_fm2. The two columns are titled (Intercept) and age, respectively.
```{r effects1}
fm2 <- fit_list$glmmTMB
re_fm2 <- glmmTMB::ranef(fm2, condVar = TRUE)
str(re_fm2)
```
Even though we applied the `ranef()` function to the fitted model fm2 using the option
`condVar = TRUE`, the conditional standard deviations associated with the predicted
random effects are hidden in the belly of the `re_fm2` object. These conditional
standard deviations - also known as the "posterior standard deviations" - can be accessed by transforming the `re_fm2` object to a data frame with the as.data.frame() command.
The *NaNs produced* warning produced by R after executing the `ranef()` command stems from its inability to compute conditional standard deviations associated with the random slopes of
age due to the fact that there isn't much variation among these random slopes.
As noted by @bmwiernik on Twitter:
>"glmmTMB estimates its models with (restricted) maximum likelihood, which by definition is looking for the mode of the distribution. A core assumption of (RE)ML (like w/ Laplace approx) is normality of parameters."
```{r effects2}
dfre_fm2 <- as.data.frame(x = re_fm2)
dfre_fm2
```
The data frame version of the object `re_fm2`, titled `re_fm2`, contains several columns, as follows.
The **component** column specifies the model component to which the predicted random effects
and their associated conditional standard deviations refer to. For the current model, this
**component** column contains the value *cond* throughout - shortcut for the conditional mean
component of the model, which is modelled on the scale given by the specified link function.
(The current model fm2 uses a gaussian family and the default identity link function.)
Mixed effects models with both a conditional mean component and a zero inflation component will contain the values *cond* and *zi* in the **component** column if both of these model components include random effects in their specification.
The **grp** column specifies the random grouping variable(s) included in the model. The current model, fm2, includes a single grouping variable - Subject - which is used to populate the entire
**grp** column.
The **term** column specifies the random intercepts and the random slope(s) allowed for in the model component(s) corresponding to the fitted model. The random intercepts are labeled as (Intercept) and the random slope(s) are labelled by the corresponding predictor variable - in this case, age.
The **grp** column lists the actual levels of the random grouping variable(s) present in the data.
Since the current model, fm2, only includes Subject as a random grouping variable, the **grp** lists the subject IDs. Each subject gets their own predicted value for the random intercept and random slope of age.
The **condval** column lists the predicted values of the random intercepts and random slopes of age
corresponding to the subjects included in the model. The **condsd** colum lists the associated conditional standard deviations. These two pieces of information can be used to construct 95% uncertainty intervals for the random intercept and random slope corresponding to a subject via the formula: **condval ± 2 condsd**.
## Caterpillar and Dot Plots of Predicted Random Effects
The predicted random effects and their associated 95% uncertainty intervals can be plotted using the R code below.
For the random intercepts, where conditional standard deviations are available based on
the fitted model fm2, 95% uncertainty intervals can be added using the geom_errorbarh() function in the ggplot2 package. The resulting plot is a *caterpillar plot* which shows the levels of the
random grouping factor Subject on its vertical axis and the predicted values of the
random intercepts (along with corresponding 95% uncertainty intervals) on its horizontal
axis. In this plot, the levels of Subject are ordered according to increasing values of the
predicted random intercepts. The plot can be used, for example, to detect subjects with extreme
values of the predicted random effects.
For the random slopes, where conditional standard deviations are not available based on
the fitted model fm2, 95% uncertainty intervals cannot be added.
The resulting plot is a *dot plot*.
```{r plot, echo=TRUE, warning = FALSE, message = FALSE}
plot_re <- dfre_fm2 %>%
## filter(component %in% "cond", grpvar %in% "Subject") %>%
ggplot(aes(y=grp, x = condval)) +
geom_point(aes(y = grp, colour = grp), size=3) +
geom_errorbarh(aes(xmin = condval - 2*condsd,
xmax = condval + 2*condsd,
colour = grp),
height = 0, size = 0.7) +
facet_wrap(~ term, ncol = 2) +
ylab("Subject\n") +
xlab("\nConditional Mode ± 2 * Conditional Standard Deviation") +
ggtitle("Predicted Random Intercept and Slope of age Effects Associated with Subject") +
geom_vline(xintercept = 0, linetype = 2, colour="grey", size = 0.7) +
theme(legend.position = "none")
plot_re
```
## Normal Probability Plots of Predicted Random Effects
To assess the normality of the random intercepts and random slopes, we can construct
normal probability plots using R code such as the one below. This code relies on the
R package `qqplotr` to add a normal confidence band to the normal probability plot.
For details on `qqplotr`, see:
<https://cran.r-project.org/web/packages/qqplotr/vignettes/introduction.html>.
```{r qqplot, echo=TRUE, warning = FALSE, message = FALSE}
library(qqplotr)
dfre_fm2 %>%
filter(component %in% "cond",
grpvar %in% "Subject") %>%
ggplot(aes(sample = condval)) +
stat_qq_band(colour = "lightblue", fill = "lightblue", alpha = 0.5) +
stat_qq_line() +
stat_qq_point() +
facet_wrap(~ term, scale = "free") +
labs(x = "Theoretical Quantiles", y = "Sample Quantiles of Predicted Random Effects")
```
Or with the tidied data:
```{r qqplot2}
ggplot(t_glmmTMB, aes(sample = estimate)) +
stat_qq_band(colour = "lightblue", fill = "lightblue", alpha = 0.5) +
stat_qq_line() +
stat_qq_point() +
facet_wrap(~ term, scale = "free")
```