Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature request: Tidy random effects from nested mixed effect models #152

Open
phargarten2 opened this issue Aug 30, 2024 · 4 comments
Open

Comments

@phargarten2
Copy link

phargarten2 commented Aug 30, 2024

Thank you for making the broom.mixed package and be able to tidy model results from mixed effect modeling packages.

Recently, I fitted a nested random intercept model and realized that the random parameters are not tidied. All i really want is to have the variance estimates in a nice tibble.

Example:

    library(tibble)
    library(magrittr)
    library(lme4)
#> Loading required package: Matrix
    library(nlme)
#> 
#> Attaching package: 'nlme'
#> The following object is masked from 'package:lme4':
#> 
#>     lmList
    library(broom.mixed)
    
    data("sleepstudy", package="lme4")
    
## Random Intercept Model
lmm1 <- nlme::lme(Reaction ~ Days, random=~ Days|Subject, sleepstudy)

tidy(lmm1, "ran_pars")
#> # A tibble: 4 × 4
#>   effect   group    term                 estimate
#>  <chr>    <chr>    <chr>                   <dbl>
#> 1 ran_pars Subject  sd_(Intercept)        24.7   
#> 2 ran_pars Subject  cor_Days.(Intercept)   0.0656
#> 3 ran_pars Subject  sd_Days                5.92  
#> 4 ran_pars Residual sd_Observation        25.6  

## Crossed effect model
    sleepstudy2 <- sleepstudy %>%
        tibble::add_column(Group = rep(LETTERS[1:10], each = 18))
    
    lmm2 <- nlme::lme(Reaction ~ Days,
                      random = list(Subject = ~ Days, Group = ~ 1),
                      data = sleepstudy2
    )
    
    lmm2
#> Linear mixed-effects model fit by REML
#>   Data: sleepstudy2 
#>   Log-restricted-likelihood: -871.0085
#>   Fixed: Reaction ~ Days 
#> (Intercept)        Days 
#>   252.21968    10.42173 
#> 
#> Random effects:
#>  Formula: ~Days | Subject
#>  Structure: General positive-definite, Log-Cholesky parametrization
#>             StdDev   Corr  
#> (Intercept) 18.30302 (Intr)
#> Days         6.12880 0.048 
#> 
#>  Formula: ~1 | Group %in% Subject
#>         (Intercept) Residual
#> StdDev:    15.84422 25.09344
#> 
#> Number of Observations: 180
#> Number of Groups: 
#>            Subject Group %in% Subject 
#>                 18                 26
    
    #Fixed effects tidy great!
    tidy(lmm2, "fixed")
#> # A tibble: 2 × 7
#>   effect term        estimate std.error    df statistic  p.value
#>   <chr>  <chr>          <dbl>     <dbl> <dbl>     <dbl>    <dbl>
#> 1 fixed  (Intercept)    252.       6.64   153     38.0  7.74e-80
#> 2 fixed  Days            10.4      1.62   153      6.44 1.43e- 9
    
    #Random effects can't be evaluated?
    tidy(lmm2, "ran_pars")
#> Warning in tidy.lme(lmm2, "ran_pars"): ran_pars not yet implemented for
#> multiple levels of nesting
#> # A tibble: 0 × 1
#> # ℹ 1 variable: effect <chr>

Created on 2024-08-30 with reprex v2.1.1

Using VarCorr() it should be possible to extract the relevant components

vc <- VarCorr(lmm2)
vc
#>            Variance        StdDev   Corr  
#> Subject =   pdLogChol(Days)                
(#> Intercept) 335.00060       18.30302 (Intr)
#> Days         37.56219        6.12880 0.048 
#> Group =     pdLogChol(1)                   
#> (Intercept) 251.03921       15.84422       
#> Residual    629.68083       25.09344    

I get an error trying to create this into a tibble.

@phargarten2
Copy link
Author

I also noticed that there was no method to tidy a Variance Components object of a fit from lme4 or nlme. While it may seem not necessary due to the extensive work of tidying an object directly, these functions would be helpful tools working toward tidying a nested random effects model.

Here is a proposed and preliminary solution of a regular random effects model with lme4 and nlme:

### Tidying VarCorr of Mixed Effect Models

#nlme VarCorr object: VarCorr.lme
#lmer VarCorr object: VarCorr.merMod

# vc <- lme4::VarCorr(x)

tidy.VarCorr.lme <- function(
    vc,    
    # effects = c("ran_pars", "fixed"),  #effects are always "ran_pars"
    scales = NULL, 
    conf.int = FALSE, 
    conf.level = 0.95, 
    ...
){
  
  if(any(class(vc) == "VarCorr.lme")){
    #Need to convert nlme object to a proper tibble
    vc <- convert_VarCorr.MerMod_to_VarCorr.lme(vc)
  } 
  
  
  if(conf.int){
    cli::cli_alert_info("No confidence intervals is possible for random effects in a crossed model.")
  }
  
  if(scales == "vcov" & !is.null(scales)){
    ests_random <- as.tibble(vc) %>%
      rename(group = grp, term = var1, estimate_var = vcov) %>%
      mutate(
        term = case_when(
          !is.na(term) & !is.na(var2) ~ paste0("cov_", term, "_", var2),
          group == "Residual" & is.na(term) ~ paste0("var_", "Observation"),
          TRUE ~ paste0("var_", term)
        )
      ) %>%
      mutate(effect = "ran_pars", .before = "group")
    
    total_var <- ests_random %>% filter(is.na(var2)) %>% summarize(s = sum(estimate_var)) %>% pull(s)
    
    ests_random <- ests_random %>%
      mutate(prop_var = case_when(
        !stringr::str_detect(term, "cov_") ~ estimate_var/total_var,
        TRUE ~ NA_real_
      )
      ) %>%
      select(-var2, -sdcor)
    
    #Check
    print(as.tibble(vc))
    
  } else if(scales == "sdcor" | is.null(scales)){
    ests_random <- as.tibble(vc) %>%
      rename(group = grp, term = var1, estimate = sdcor) %>%
      mutate(
        term = case_when(
          !is.na(term) & !is.na(var2) ~ paste0("cor_", term, "_", var2),
          group == "Residual" & is.na(term) ~ paste0("sd_", "Observation"),
          TRUE ~ paste0("sd_", term)
        )
      ) %>%
      mutate(effect = "ran_pars", .before = "group")
    
    #Can estimate covariance if requested with a bit more work...
    
    total_var <- ests_random %>% filter(is.na(var2)) %>% summarize(s = sum(estimate^2)) %>% pull(s)
    
    ests_random <- ests_random %>%
      mutate(prop_var = case_when(
        !str_detect(term, "cor_") ~ estimate^2/total_var,
        TRUE ~ NA_real_
      )
      ) %>%
      select(-var2, -vcov)
    
    #Check
    print(vc)
  }
  
  return(ests_random)
}

#  lmm.lme4 <- lme4::lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
#as.tibble(lme4::VarCorr(lmm.lme4))
# grp      var1        var2    vcov   sdcor
# <chr>    <chr>       <chr>  <dbl>   <dbl>
# 1 Subject  (Intercept) NA    612.   24.7   
# 2 Subject  Days        NA     35.1   5.92  
# 3 Subject  (Intercept) Days    9.60  0.0656
# 4 Residual NA          NA    655.   25.6  


#This function converts VarCorr on a nlme object to a tibble form of an lme4 object
#i.e. VarCorr.merMod -> tibble of VarCorr.lme
#
#'@param vc  A variance-correlation component of nlme::lme object. 
#'@return The tibble version of VarCorr.lme object from lme4 package.
convert_VarCorr.MerMod_to_VarCorr.lme <- function(vc){
  A <- as.matrix(vc)
  A
  row.residual <- str_which("Residual", rownames(A))
  
  ##Unsure how to generalize this
  corr.A <- A[-row.residual, -(1:2), drop = TRUE]
  corr.A
  corr.A[-1]
  t.corr <- tibble(var = names(corr.A), corr = as.vector(corr.A))
  corr <- tibble(
    var1 = t.corr[1, "var", drop = TRUE],
    var2 = t.corr[2, "var", drop = TRUE],
    sdcor = t.corr[2, "corr", drop = TRUE]
  )
  
  tib <-  tibble(grp = NA_character_, var1 = rownames(A), var2 = NA, vcov= A[ ,"Variance"], sdcor = A[ , "StdDev"]) %>%
    bind_rows(corr) %>%
    mutate(
      grp = case_when(
        var1 != "Residual" ~ "Subject",
        var1 == "Residual" ~ "Residual",
        TRUE ~ NA_character_
      ),
      vcov = as.numeric(vcov),
      sdcor = as.numeric(sdcor)
    )
  
  return(tib)
}

data("sleepstudy", package="lme4")
lmm.nlme <- lme(Reaction ~ Days, random=~ Days|Subject, sleepstudy)
# > VarCorr(lmm.nlme)
# Subject = pdLogChol(Days) 
# Variance StdDev    Corr  
# (Intercept) 612.0795 24.740241 (Intr)
# Days         35.0713  5.922103 0.066 
# Residual    654.9424 25.591843       

A <- convert_VarCorr.MerMod_to_VarCorr.lme(VarCorr(lmm.nlme))
tidy.VarCorr.lme(A, scales = "vcov")
tidy.VarCorr.lme(A, scales = "sdcor")
tidy(VarCorr(lmm.nlme))
#Error: No tidy method for objects of class VarCorr.lme
tidy(lmm.nlme)

@phargarten2
Copy link
Author

phargarten2 commented Aug 30, 2024

There is no issue in tidying a crossed random effects model using lme4::lmer(). The tidy.merMod() function tidies the random effects fine. (I added this in case other readers are curious)

library(magrittr); 
library(broom.mixed); 
data("sleepstudy", package="lme4")
sleepstudy2 <- sleepstudy %>%
    tibble::add_column(Group = rep(LETTERS[1:10], each = 18))

#Crossed Random Model 
lmm.lme4 <- lmerTest::lmer(Reaction ~ Days + (Days|Subject) + (1|Group), data = sleepstudy2)
lmm.lme4
#> Linear mixed model fit by REML ['lmerModLmerTest']
#> Formula: Reaction ~ Days + (Days | Subject) + (1 | Group)
#>    Data: sleepstudy2
#> REML criterion at convergence: 1743.496
#> Random effects:
#>  Groups   Name        Std.Dev. Corr
#>  Subject  (Intercept) 25.287       
#>           Days         5.973   0.04
#>  Group    (Intercept)  6.071       
#>  Residual             25.397       
#> Number of obs: 180, groups:  Subject, 18; Group, 10
#> Fixed Effects:
#> (Intercept)         Days  
#>      251.47        10.45

tidy(lmm.lme4)
#> # A tibble: 7 × 8
#>   effect   group    term            estimate std.error statistic    df   p.value
#>   <chr>    <chr>    <chr>              <dbl>     <dbl>     <dbl> <dbl>     <dbl>
#> 1 fixed    <NA>     (Intercept)     251.          7.19     35.0   9.95  9.55e-12
#> 2 fixed    <NA>     Days             10.5         1.56      6.72 16.7   3.96e- 6
#> 3 ran_pars Subject  sd__(Intercept)  25.3        NA        NA    NA    NA       
#> 4 ran_pars Subject  cor__(Intercep…   0.0420     NA        NA    NA    NA       
#> 5 ran_pars Subject  sd__Days          5.97       NA        NA    NA    NA       
#> 6 ran_pars Group    sd__(Intercept)   6.07       NA        NA    NA    NA       
#> 7 ran_pars Residual sd__Observation  25.4        NA        NA    NA    NA
Created on 2024-08-30 with [reprex v2.1.1](https://reprex.tidyverse.org/)


@phargarten2
Copy link
Author

phargarten2 commented Aug 30, 2024

TIDYING NLME CROSSED INTERCEPT MODELS

If we can create a tibble from VarCorr.lme (nlme::lme). that would solve the problem...

Here is some preliminary work in attempt to do just that.

rownames.v <- rownames(vc)
 
subject_row <-  stringr::str_which("Subject = ", rownames.v)
group_row <- stringr::str_which("Group = ", rownames.v)
residual_row <- stringr::str_which("Residual", rownames.v)
 
vc_subject <- vc[(subject_row+1):(group_row-1), ]
vc_group <- vc[(group_row+1):(resiudal_row-1), ]
vc_residual <- vc[residual_row, ]


 tibble(
   effect = "ran_pars",
   subject = vc[rownames.v[1], 1],
   var1 = rownames.v,
   var2 = NA,
   estimate = if(scales == "vcov") vc[, "Variance"] else vc[, "StdDev"]
 ) %>%
   mutate(
     group = case_when(
       var1 == "Residual" ~ "Residual",
       TRUE ~ group
     )

Could we apply the function tidy.VarCorr.lme() for each section of the vc's output?

@bbolker
Copy link
Owner

bbolker commented Oct 28, 2024

Revisiting this. Have started to implement bits of it but still a little bit confused/uncertain about some parts.

  • I don't like the prop_var column; this won't be applicable to all models (e.g. variance components have to be separated by level/term/etc.), would rather leave this to users to do by group_by() + mutate() (providing examples if necessary) rather than risk providing incorrect output
  • these are nested, not crossed models (nlme doesn't do crossed models without some hacking by the user)
  • I will work on/think about how to pull apart multi-part VarCorr() objects (it might actually better to ask to have getVarCov extended upstream, in nlme ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants