Make `augment.merMod()` more consistent with `predict.merMod()` when using `newdata` #141
Thanks! Given the number of open issues and their heterogeneity I really think I need to go through and tag them with 'feature-request'/'enhancement' etc. so I can prioritize them and fix the ones that really need to be fixed (I would put this one in that category ... maybe I'll add an 'infelicity' tag [that's Bill Venables's neutral term for "it's not technically a bug but it's definitely bad behaviour"] that's just below 'bug' in priority ...)
Welcome! Haha, I like the infelicity tag. It's probably also worth checking if this issue applies to any of the other functions that rely on …

Here's a very rough rewrite:

```r
library(tibble)
library(lme4)
#> Loading required package: Matrix
library(broom)
library(broom.mixed)
lmm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
augment.merMod <- function(x, data = stats::model.frame(x), newdata, ...) {
  # Augment the original data used to fit the model
  if (missing(newdata)) {
    # with newdata = NULL, augment_columns() augments the original `data`
    newdata <- NULL
    ret <- suppressMessages(augment_columns(x, data, newdata, se.fit = NULL, ...))
    # add predictions with no random effects (population means)
    predictions <- stats::predict(x, re.form = NA)
    # some cases, such as values returned from nlmer, return more than one
    # prediction per observation. Not clear how those cases would be tidied
    if (length(predictions) == nrow(ret)) {
      ret$.fixed <- predictions
    }
    # columns to extract from resp reference object
    # these include relevant ones that could be present in lmResp, glmResp,
    # or nlsResp objects
    respCols <- c(
      "mu", "offset", "sqrtXwt", "sqrtrwt", "weights",
      "wtres", "gam", "eta"
    )
    cols <- lapply(respCols, function(cc) x@resp[[cc]])
    names(cols) <- paste0(".", respCols)
    ## remove too-long fields and empty fields
    n_vals <- vapply(cols, length, 1L)
    min_n <- min(n_vals[n_vals > 0])
    cols <- dplyr::bind_cols(cols[n_vals == min_n])
    cols <- broom.mixed:::insert_NAs(cols, ret)
    if (length(cols) > 0) {
      ret <- dplyr::bind_cols(ret, cols)
    }
    return(broom.mixed:::unrowname(ret))
  } else {
    # Make predictions on new data
    ret <- suppressMessages(augment_columns(x, data, newdata, se.fit = NULL, ...))
    # Throw an error when re.form isn't specified, and there's no grouping
    # variable in newdata. This is fragile but just intended for demonstration.
    # Note: Can't use missing() since re.form comes from the ... args.
    if (!hasArg(re.form) && ncol(stats::model.frame(x)) != ncol(ret)) {
      stop("No data provided for grouping variable.")
    }
    # add predictions on newdata with no random effects (population means)
    predictions <- stats::predict(x, newdata, re.form = NA)
    # some cases, such as values returned from nlmer, return more than one
    # prediction per observation. Not clear how those cases would be tidied
    if (length(predictions) == nrow(ret)) {
      ret$.fixed <- predictions
    }
    # placeholder NA columns standing in for the resp columns added in the
    # original-data branch (`etc.` is shorthand for the rest of them)
    tibble::tibble(ret, .mu = NA, .offset = NA, etc. = NA)
  }
}
augment(lmm1)
#> # A tibble: 180 × 14
#> Reaction Days Subject .fitted .resid .hat .cooksd .fixed .mu .offset
#> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 250. 0 308 254. -4.10 0.229 0.00496 251. 254. 0
#> 2 259. 1 308 273. -14.6 0.170 0.0402 262. 273. 0
#> 3 251. 2 308 293. -42.2 0.127 0.226 272. 293. 0
#> 4 321. 3 308 313. 8.78 0.101 0.00731 283. 313. 0
#> 5 357. 4 308 332. 24.5 0.0910 0.0506 293. 332. 0
#> 6 415. 5 308 352. 62.7 0.0981 0.362 304. 352. 0
#> 7 382. 6 308 372. 10.5 0.122 0.0134 314. 372. 0
#> 8 290. 7 308 391. -101. 0.162 1.81 325. 391. 0
#> 9 431. 8 308 411. 19.6 0.219 0.106 335. 411. 0
#> 10 466. 9 308 431. 35.7 0.293 0.571 346. 431. 0
#> # … with 170 more rows, and 4 more variables: .sqrtXwt <dbl>, .sqrtrwt <dbl>,
#> # .weights <dbl>, .wtres <dbl>
augment(lmm1, newdata = expand.grid(Days = 0:3, Subject = c(308, 310)))
#> # A tibble: 8 × 7
#> Days Subject .fitted .fixed .mu .offset etc.
#> <int> <dbl> <dbl> <dbl> <lgl> <lgl> <lgl>
#> 1 0 308 254. 251. NA NA NA
#> 2 1 308 273. 262. NA NA NA
#> 3 2 308 293. 272. NA NA NA
#> 4 3 308 313. 283. NA NA NA
#> 5 0 310 212. 251. NA NA NA
#> 6 1 310 217. 262. NA NA NA
#> 7 2 310 222. 272. NA NA NA
#> 8 3 310 227. 283. NA NA NA
augment(lmm1, newdata = tibble(Days = 0:3))
#> Error in augment.merMod(lmm1, newdata = tibble(Days = 0:3)): No data provided for grouping variable.
augment(lmm1, newdata = data.frame(Days = 0:3), re.form = NA)
#> # A tibble: 4 × 6
#> Days .fitted .fixed .mu .offset etc.
#> <int> <dbl> <dbl> <lgl> <lgl> <lgl>
#> 1 0 251. 251. NA NA NA
#> 2 1 262. 262. NA NA NA
#> 3 2 272. 272. NA NA NA
#> 4 3 283. 283. NA NA NA
augment(lmm1, newdata = data.frame(Days = 6:9), re.form = NA)
#> # A tibble: 4 × 6
#> Days .fitted .fixed .mu .offset etc.
#> <int> <dbl> <dbl> <lgl> <lgl> <lgl>
#> 1 6 314. 314. NA NA NA
#> 2 7 325. 325. NA NA NA
#> 3 8 335. 335. NA NA NA
#> 4 9 346. 346. NA NA NA
```

Created on 2023-05-30 with reprex v2.0.2
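The rough rewrite detects a missing grouping variable by comparing column counts, which its own comment calls fragile. One possibly more robust alternative, sketched here under the assumption that only `re.form = NULL` versus `re.form = NA` matters, is to look up the model's grouping factors with `lme4::getME(x, "flist")` and check that `newdata` has a column for each. `check_grouping_vars()` is a hypothetical helper for illustration, not part of broom.mixed or lme4.

```r
library(lme4)

# Hypothetical helper (not part of broom.mixed or lme4): stop with a clear
# message when newdata lacks a grouping variable that predict() would need.
check_grouping_vars <- function(x, newdata, re.form = NULL) {
  # re.form = NA asks for population-level predictions, which need no
  # grouping columns at all
  if (!is.null(re.form) && length(re.form) == 1 && is.na(re.form)) {
    return(invisible(TRUE))
  }
  # Names of the model's grouping factors, e.g. "Subject"; interaction or
  # nested terms such as "b:a" are split into their component variables.
  # Any other re.form value is treated like NULL (all grouping factors).
  grp <- names(lme4::getME(x, "flist"))
  grp_vars <- unique(unlist(strsplit(grp, ":", fixed = TRUE)))
  missing_vars <- setdiff(grp_vars, names(newdata))
  if (length(missing_vars) > 0) {
    stop("newdata is missing grouping variable(s): ",
         paste(missing_vars, collapse = ", "),
         "; supply them or use re.form = NA.", call. = FALSE)
  }
  invisible(TRUE)
}

fit <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

# Both of these pass silently
check_grouping_vars(fit, data.frame(Days = 0:3, Subject = "308"))
check_grouping_vars(fit, data.frame(Days = 0:3), re.form = NA)

# This one fails; capture the message instead of stopping
msg <- tryCatch(
  check_grouping_vars(fit, data.frame(Days = 0:3)),
  error = conditionMessage
)
```

A partial random-effects formula passed as `re.form` is handled conservatively here (every grouping factor is still required); a real fix would need to inspect that formula.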
This is related to #125, but I felt like it deserved its own issue.
The behaviour (and documentation) of `augment.merMod()` when making predictions on new data could use some love. The current behaviour is inconsistent with `predict.merMod()` and leads to unexpected results that can be misleading or unclear.

Here's a reprex covering some of the issues I found. I think the function needs a rewrite so that it handles augmenting the original data used to fit the model differently from making predictions on new data. Perhaps it could drop its dependence on `broom::augment_columns()`, given some of that function's behaviour (or at least add some error checking for the cases where it should fail).

Regarding documentation, it isn't documented anywhere that you can use the `re.form` argument with `augment.merMod()`/`broom::augment_columns()`; I tried it on a whim while trying to make predictions and it just happened to (partially) work.

Created on 2023-05-30 with reprex v2.0.2
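For comparison, here is a minimal sketch of how `predict.merMod()` itself treats `re.form` with new data, using the same `sleepstudy` model as the reprex. It relies only on documented lme4 behaviour: the default `re.form = NULL` conditions on all random effects (so `newdata` must contain the grouping variable), while `re.form = NA` drops them. The object names `pop`, `cond` and `err` are just for illustration.

```r
library(lme4)

fit <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

# Population-level predictions: re.form = NA ignores all random effects,
# so newdata does not need a Subject column
pop <- predict(fit, newdata = data.frame(Days = 0:3), re.form = NA)

# Conditional predictions: the default re.form = NULL uses every random
# effect, so newdata must supply the grouping variable
cond <- predict(fit, newdata = data.frame(Days = 0:3, Subject = "308"))

# Without Subject and without re.form = NA, predict() stops with an error
# rather than silently falling back to population-level predictions
err <- tryCatch(
  predict(fit, newdata = data.frame(Days = 0:3)),
  error = function(e) e
)
```

`pop` should equal the fixed-effects line `fixef(fit)[1] + fixef(fit)[2] * Days`, which is what the `.fixed` column in the reprex reports.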