
DHARMa implementation for new check_residuals() function #643

Merged
strengejacke merged 78 commits into main from strengejacke/issue595 on Mar 18, 2024

Conversation

@strengejacke
Member Author

@mccarthy-m-g I invited you to get write access, which should allow you to push to this PR, so we can collaboratively work on this.


@strengejacke strengejacke self-assigned this Oct 26, 2023
@mccarthy-m-g
Collaborator

Thanks! Accepted.

It will probably be a month or two before I can dedicate time to this, but I'll look at the issues you linked above today.

@strengejacke
Member Author

I don't think you need to look into all the issues I linked to; that was more a reminder for myself ;-) We should focus on #595, and then I'll check to what extent the other issues might be resolved.

@mccarthy-m-g
Collaborator

Sounds good, although I already started looking... 😅 (DHARMa doesn't support objects of class "lrm", so you can drop #471 from consideration)

I think the performance::check_model.performance_simres method that will be added could probably address some of the issues indirectly (e.g., #501), but that could be dealt with after this PR is merged.

@strengejacke
Member Author

but that could be dealt with after this PR is merged

Agreed.

@strengejacke
Member Author

So the basic workflow would be:

  1. out <- simulate_residuals(model)
  2. check_model(out) or check_<whatever>(out)?

Then we wouldn't need particular plot() or print() methods, except for some additional functions (like the suggested check_residuals()).
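
For illustration, a minimal sketch of that workflow, reusing the glmmTMB example that appears later in this thread (check_residuals() here stands in for whichever check_*() function ends up being dispatched):

library(performance)
library(glmmTMB)

model <- glmmTMB(
  count ~ mined + spp + (1 | site),
  family = poisson,
  data = Salamanders
)

# step 1: simulate scaled residuals from the fitted model
out <- simulate_residuals(model)
# step 2: pass the simulated residuals to a check function
check_residuals(out)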

@mccarthy-m-g
Collaborator

So the basic workflow would be:

  1. out <- simulate_residuals(model)
  2. check_model(out) or check_<whatever>(out)?

Then we wouldn't need particular plot() or print() methods, except for some additional functions (like the suggested check_residuals()).

Yes and no. That's the basic workflow, but we'll want a print method for performance_simres and the check_*() functions (at least under my vision for how this should be implemented), and we don't want to return the residuals directly in simulate_residuals().

I'll add some rough commits with comments for context.
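
For context, a hypothetical sketch of what is meant here (not the merged implementation): simulate_residuals() returns the full DHARMa object wrapped in a performance_simres class rather than the extracted residuals, and the wrapper gets its own print() method.

simulate_residuals <- function(model, ...) {
  # keep the complete DHARMa object so later checks can reuse it
  out <- list(simulated_residuals = DHARMa::simulateResiduals(model, ...))
  class(out) <- c("performance_simres", class(out))
  out
}

print.performance_simres <- function(x, ...) {
  cat("Simulated residuals (DHARMa) stored in a `performance_simres` object.\n")
  cat("Use `check_residuals()` to test them, or `plot()` to inspect them visually.\n")
  invisible(x)
}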

@mccarthy-m-g
Collaborator

If you look at the changes from those commits, particularly the vignette I added, does my rationale for why we don't want to prematurely return residuals in simulate_residuals() make sense?

@strengejacke
Member Author

If you look at the changes from those commits, particularly the vignette I added, does my rationale for why we don't want to prematurely return residuals in simulate_residuals() make sense?

I see, yes, makes sense!

I'm not sure how to align the plot() method. For testing purposes, I opened a PR (easystats/see#312), but the plots look rather different.

library(performance)
library(glmmTMB)

model <- glmmTMB(
  count ~ mined + spp + (1 | site),
  family = poisson,
  data = Salamanders
)

res <- simulate_residuals(model)
x <- check_normality(res)
plot(x)

plot(DHARMa::simulateResiduals(model))
#> DHARMa:testOutliers with type = binomial may have inflated Type I error rates for integer-valued distributions. To get a more exact result, it is recommended to re-run testOutliers with type = 'bootstrap'. See ?testOutliers for details

Created on 2023-10-27 with reprex v2.0.2

@mccarthy-m-g
Collaborator

I'm not sure how to align the plot() method. For testing purposes, I opened a PR, but the plots look rather different.

DHARMa uses a uniform distribution, not a normal distribution, for its left plot. CRAN is down right now and I don't have qqplotr installed so I can't share a reprex, but this code should be what we want as the plot method for check_residuals():

library(glmmTMB)
library(performance)

model <- glmmTMB(
  count ~ mined + spp + (1 | site),
  family = poisson,
  data = Salamanders
)

simulated_residuals <- simulate_residuals(model)

plot_simulated_residuals <- function(x) {
  # parameters of the standard uniform distribution, passed to the quantile
  # function used by the Q-Q layers below
  dp <- list(min = 0, max = 1, lower.tail = TRUE, log.p = FALSE)
  ggplot2::ggplot(
    tibble::tibble(scaled_residuals = residuals(x)),
    ggplot2::aes(sample = scaled_residuals)
  ) +
    # DHARMa's scaled residuals are uniform under a correctly specified model,
    # so the Q-Q plot compares against a uniform (not normal) distribution
    qqplotr::stat_qq_band(distribution = "unif", dparams = dp, alpha = .2) +
    qqplotr::stat_qq_line(distribution = "unif", dparams = dp, size = .8, colour = "#3aaf85") +
    qqplotr::stat_qq_point(distribution = "unif", dparams = dp, size = .5, alpha = .8, colour = "#1b6ca8") +
    ggplot2::labs(
      title = "Uniformity of Residuals",
      subtitle = "Dots should fall along the line",
      x = "Standard Uniform Distribution Quantiles",
      y = "Sample Quantiles"
    ) +
    see::theme_lucid()
}

plot_simulated_residuals(simulated_residuals)

I'm a bit skeptical about having a check_normality() method here too, unless we restrict it to the models DHARMa supports where that would make sense to do (e.g., lm). Otherwise it might be better to just return a message and NULL value pointing the user to the check_residuals() function.
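
For illustration, a hypothetical sketch of that fallback (the method name follows the usual S3 convention; this is not the merged code):

check_normality.performance_simres <- function(x, ...) {
  # DHARMa's scaled residuals are uniform under a correct model, so a
  # normality check is not meaningful here; point users to check_residuals()
  message(
    "Simulated residuals are scaled to a uniform distribution, so checking ",
    "them for normality is not meaningful. Use `check_residuals()` instead."
  )
  invisible(NULL)
}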

@bwiernik
Contributor

I would suggest we do check_residuals(model) as the workflow, so that we can have correct titles for the plot instead of "Normality of residuals".
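
For illustration, one hypothetical way to support that workflow (assuming a check_residuals() generic and a method for performance_simres objects exist; not the merged code):

check_residuals <- function(x, ...) {
  UseMethod("check_residuals")
}

check_residuals.default <- function(x, ...) {
  # for a model object, simulate the residuals first, then dispatch to the
  # performance_simres method, so users can call check_residuals(model) directly
  check_residuals(simulate_residuals(x, ...), ...)
}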

@strengejacke
Member Author

Here's the implementation for check_outliers():

library(performance)

# For statistical models ---------------------------------------------
# select mpg, disp, and hp (continuous variables)
mt1 <- mtcars[, c(1, 3, 4)]
# create some fake outliers and attach outliers to main df
mt2 <- rbind(mt1, data.frame(
  mpg = c(37, 40), disp = c(300, 400),
  hp = c(110, 120)
))
# fit model with outliers
model <- lm(disp ~ mpg + hp, data = mt2)

res <- simulate_residuals(model)
check_outliers(res)
#> # Outliers detection
#> 
#>   Proportion of observed outliers: 2.94%
#>   Proportion of expected outliers: 0.80%, 95% CI [0.07, 15.33]
#> No outliers were detected (p = 0.238).
DHARMa::testOutliers(res, plot = FALSE)
#> 
#>  DHARMa outlier test based on exact binomial test with approximate
#>  expectations
#> 
#> data:  res
#> outliers at both margin(s) = 1, observations = 34, p-value = 0.2381
#> alternative hypothesis: true probability of success is not equal to 0.007968127
#> 95 percent confidence interval:
#>  0.0007443642 0.1532676696
#> sample estimates:
#> frequency of outliers (expected: 0.00796812749003984 ) 
#>                                             0.02941176

DHARMa::testOutliers(res, type = "bootstrap", plot = FALSE)
#> 
#>  DHARMa bootstrapped outlier test
#> 
#> data:  res
#> outliers at both margin(s) = 1, observations = 34, p-value = 0.78
#> alternative hypothesis: two.sided
#>  percent confidence interval:
#>  0.00000000 0.05882353
#> sample estimates:
#> outlier frequency (expected: 0.015 ) 
#>                           0.02941176
check_outliers(res, type = "bootstrap", iterations = 200)
#> # Outliers detection
#> 
#>   Proportion of observed outliers: 2.94%
#>   Proportion of expected outliers: 1.32%, 95% CI [0.00, 5.88]
#> No outliers were detected (p = 0.690).

Created on 2024-03-18 with reprex v2.1.0

@strengejacke
Member Author

The more I dive into the DHARMa topic, the more I realize that this approach is quite similar to what we aimed at with check_predictions().

@strengejacke strengejacke merged commit 0f1125c into main Mar 18, 2024
24 of 26 checks passed
@strengejacke strengejacke deleted the strengejacke/issue595 branch March 18, 2024 10:49