
DHARMa implementation for new check_residuals() function #643

Merged
strengejacke merged 78 commits into main from strengejacke/issue595 on Mar 18, 2024

Conversation

@strengejacke
Member Author

@mccarthy-m-g I invited you to get write access, which should allow you to push to this PR, so we can collaboratively work on this.


@strengejacke strengejacke self-assigned this Oct 26, 2023
@mccarthy-m-g
Collaborator

Thanks! Accepted.

It will probably be a month or two before I can dedicate time to this, but I'll look at the issues you linked above today.

@strengejacke
Member Author

I don't think you need to look into all the issues I linked to; that was more a reminder for myself ;-) We should focus on #595, and then I'll check to what extent the other issues might be resolved.

@mccarthy-m-g
Collaborator

Sounds good, although I already started looking... 😅 (DHARMa doesn't support objects of class "lrm", so you can drop #471 from consideration)

I think the performance::check_model.performance_simres method that will be added could probably address some of the issues indirectly (e.g., #501), but that could be dealt with after this PR is merged.

@strengejacke
Member Author

but that could be dealt with after this PR is merged

Agreed.

@strengejacke
Member Author

So the basic workflow would be:

  1. out <- simulate_residuals(model)
  2. check_model(out) or check_<whatever>(out)?

Then we wouldn't need particular plot() or print() methods, except for some additional functions (like the suggested check_residuals()).
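
For illustration, a minimal sketch of that workflow, reusing the glmmTMB example that appears later in this thread (check_residuals() here stands in for whichever check_*() function ends up being dispatched):

library(performance)
library(glmmTMB)

model <- glmmTMB(
  count ~ mined + spp + (1 | site),
  family = poisson,
  data = Salamanders
)

# step 1: simulate scaled residuals from the fitted model
out <- simulate_residuals(model)
# step 2: pass the simulated residuals to a check function
check_residuals(out)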

@mccarthy-m-g
Collaborator

So the basic workflow would be:

  1. out <- simulate_residuals(model)
  2. check_model(out) or check_<whatever>(out)?

Then we wouldn't need particular plot() or print() methods, except for some additional functions (like the suggested check_residuals()).

Yes and no. That's the basic workflow, but we'll want a print method for performance_simres and the check_*() functions (at least under my vision for how this should be implemented), and we don't want to return the residuals directly in simulate_residuals().

I'll add some rough commits with comments for context.
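
For context, a hypothetical sketch of what is meant here (not the merged implementation): simulate_residuals() returns the full DHARMa object wrapped in a performance_simres class rather than the extracted residuals, and the wrapper gets its own print() method.

simulate_residuals <- function(model, ...) {
  # keep the complete DHARMa object so later checks can reuse it
  out <- list(simulated_residuals = DHARMa::simulateResiduals(model, ...))
  class(out) <- c("performance_simres", class(out))
  out
}

print.performance_simres <- function(x, ...) {
  cat("Simulated residuals (DHARMa) stored in a `performance_simres` object.\n")
  cat("Use `check_residuals()` to test them, or `plot()` to inspect them visually.\n")
  invisible(x)
}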

@mccarthy-m-g
Collaborator

If you look at the changes from those commits, particularly the vignette I added, does my rationale for why we don't want to prematurely return residuals in simulate_residuals() make sense?

@strengejacke
Member Author

If you look at the changes from those commits, particularly the vignette I added, does my rationale for why we don't want to prematurely return residuals in simulate_residuals() make sense?

I see, yes, makes sense!

I'm not sure how to align the plot() method. For testing purposes, I opened a PR (easystats/see#312), but the plots look rather different.

library(performance)
library(glmmTMB)

model <- glmmTMB(
  count ~ mined + spp + (1 | site),
  family = poisson,
  data = Salamanders
)

res <- simulate_residuals(model)
x <- check_normality(res)
plot(x)

plot(DHARMa::simulateResiduals(model))
#> DHARMa:testOutliers with type = binomial may have inflated Type I error rates for integer-valued distributions. To get a more exact result, it is recommended to re-run testOutliers with type = 'bootstrap'. See ?testOutliers for details

Created on 2023-10-27 with reprex v2.0.2

@mccarthy-m-g
Collaborator

I'm not sure how to align the plot() method. For testing purposes, I opened a PR, but the plots look rather different.

DHARMa uses a uniform distribution, not a normal distribution, for its left plot. CRAN is down right now and I don't have qqplotr installed so I can't share a reprex, but this code should be what we want as the plot method for check_residuals():

library(glmmTMB)
library(performance)

model <- glmmTMB(
  count ~ mined + spp + (1 | site),
  family = poisson,
  data = Salamanders
)

simulated_residuals <- simulate_residuals(model)

plot_simulated_residuals <- function(x) {
  # parameters of the standard uniform distribution, passed to the quantile
  # function used by the Q-Q layers below
  dp <- list(min = 0, max = 1, lower.tail = TRUE, log.p = FALSE)
  ggplot2::ggplot(
    tibble::tibble(scaled_residuals = residuals(x)),
    ggplot2::aes(sample = scaled_residuals)
  ) +
    # DHARMa's scaled residuals are uniform under a correctly specified model,
    # so the Q-Q plot compares against a uniform (not normal) distribution
    qqplotr::stat_qq_band(distribution = "unif", dparams = dp, alpha = .2) +
    qqplotr::stat_qq_line(distribution = "unif", dparams = dp, size = .8, colour = "#3aaf85") +
    qqplotr::stat_qq_point(distribution = "unif", dparams = dp, size = .5, alpha = .8, colour = "#1b6ca8") +
    ggplot2::labs(
      title = "Uniformity of Residuals",
      subtitle = "Dots should fall along the line",
      x = "Standard Uniform Distribution Quantiles",
      y = "Sample Quantiles"
    ) +
    see::theme_lucid()
}

plot_simulated_residuals(simulated_residuals)

I'm a bit skeptical about having a check_normality() method here too, unless we restrict it to the models DHARMa supports where that would make sense to do (e.g., lm). Otherwise it might be better to just return a message and NULL value pointing the user to the check_residuals() function.
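
For illustration, a hypothetical sketch of that fallback (the method name follows the usual S3 convention; this is not the merged code):

check_normality.performance_simres <- function(x, ...) {
  # DHARMa's scaled residuals are uniform under a correct model, so a
  # normality check is not meaningful here; point users to check_residuals()
  message(
    "Simulated residuals are scaled to a uniform distribution, so checking ",
    "them for normality is not meaningful. Use `check_residuals()` instead."
  )
  invisible(NULL)
}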

@bwiernik
Contributor

I would suggest we do check_residuals(model) as the workflow, so that we can have correct titles for the plot instead of "Normality of residuals".
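
For illustration, one hypothetical way to support that workflow (assuming a check_residuals() generic and a method for performance_simres objects exist; not the merged code):

check_residuals <- function(x, ...) {
  UseMethod("check_residuals")
}

check_residuals.default <- function(x, ...) {
  # for a model object, simulate the residuals first, then dispatch to the
  # performance_simres method, so users can call check_residuals(model) directly
  check_residuals(simulate_residuals(x, ...), ...)
}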

@strengejacke
Member Author

Here's the implementation for check_outliers():

library(performance)

# For statistical models ---------------------------------------------
# select mpg, disp, and hp (continuous variables)
mt1 <- mtcars[, c(1, 3, 4)]
# create some fake outliers and attach outliers to main df
mt2 <- rbind(mt1, data.frame(
  mpg = c(37, 40), disp = c(300, 400),
  hp = c(110, 120)
))
# fit model with outliers
model <- lm(disp ~ mpg + hp, data = mt2)

res <- simulate_residuals(model)
check_outliers(res)
#> # Outliers detection
#> 
#>   Proportion of observed outliers: 2.94%
#>   Proportion of expected outliers: 0.80%, 95% CI [0.07, 15.33]
#> No outliers were detected (p = 0.238).
DHARMa::testOutliers(res, plot = FALSE)
#> 
#>  DHARMa outlier test based on exact binomial test with approximate
#>  expectations
#> 
#> data:  res
#> outliers at both margin(s) = 1, observations = 34, p-value = 0.2381
#> alternative hypothesis: true probability of success is not equal to 0.007968127
#> 95 percent confidence interval:
#>  0.0007443642 0.1532676696
#> sample estimates:
#> frequency of outliers (expected: 0.00796812749003984 ) 
#>                                             0.02941176

DHARMa::testOutliers(res, type = "bootstrap", plot = FALSE)
#> 
#>  DHARMa bootstrapped outlier test
#> 
#> data:  res
#> outliers at both margin(s) = 1, observations = 34, p-value = 0.78
#> alternative hypothesis: two.sided
#>  percent confidence interval:
#>  0.00000000 0.05882353
#> sample estimates:
#> outlier frequency (expected: 0.015 ) 
#>                           0.02941176
check_outliers(res, type = "bootstrap", iterations = 200)
#> # Outliers detection
#> 
#>   Proportion of observed outliers: 2.94%
#>   Proportion of expected outliers: 1.32%, 95% CI [0.00, 5.88]
#> No outliers were detected (p = 0.690).

Created on 2024-03-18 with reprex v2.1.0

@strengejacke
Member Author

The more I dive into the DHARMa topic, the more I realize that this approach is quite similar to what we aimed at with check_predictions().

@strengejacke strengejacke merged commit 0f1125c into main Mar 18, 2024
24 of 26 checks passed
@strengejacke strengejacke deleted the strengejacke/issue595 branch March 18, 2024 10:49