Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request for display of error messages and ability to check observed and expected values about performance_hosmer() #578

Open
indenkun opened this issue May 4, 2023 · 0 comments
Labels
Docs 📚 Something to be adressed in docs and/or vignettes Enhancement 💥 Implemented features can be improved or revised

Comments

@indenkun
Copy link

indenkun commented May 4, 2023

For example, when trying to run the Hosmer-Lemeshow goodness-of-fit test
for the following multiple logistic regression analysis, the following
error message is displayed.

data("Titanic")
df <- data.frame(Titanic)
df <- data.frame(Class = rep(df$Class, df$Freq),
                 Sex = rep(df$Sex, df$Freq),
                 Age = rep(df$Age, df$Freq),
                 Survived = rep(df$Survived, df$Freq))
model <- glm(Survived ~ . ,data = df, family = binomial())
library(performance)
performance_hosmer(model)
## Error in cut.default(yhat, breaks = stats::quantile(yhat, probs = seq(0, : 'breaks' are not unique

As the error message indicates, this is because the range of quantiles
cannot be uniquely determined when taking deciles.

However, I think this error message alone is somewhat informational and
it is difficult to understand why the calculation was not completed
correctly.

For example, I think it would be more informative if the following error
message were displayed to indicate that the calculation could not be
performed with the specified n_bins when the quantile range could not be
determined.

cutyhat <- try(
  cut(
    yhat,
    breaks = stats::quantile(yhat, probs = seq(0, 1, 1 / n_bins)),
    include.lowest = TRUE
  ),
  silent = TRUE
)
if("try-error" %in% is(cutyhat))
  insight::format_error("Subgroups cannot be defined with the specified `n_bins`.")

I think this would provide a little information in case of errors.

performance_hosmer(model)
## Error: Subgroups cannot be defined with the specified `n_bins`.

Also, I would like to refer to the observed and expected value
information for each subgroup after the calculation to check for biased
subgroups. For example, so that I can call it up in chisq.test() using
$.

x <- matrix(c(12, 5, 7, 7), ncol = 2)
chisq.test(x)$observed
##      [,1] [,2]
## [1,]   12    7
## [2,]    5    7
chisq.test(x)$expected
##           [,1]     [,2]
## [1,] 10.419355 8.580645
## [2,]  6.580645 5.419355

For example, the following would achieve this.

hoslem <- list(
    chisq = chisq,
    df = n_bins - 2,
    p.value = p.value,
    observed = obs,
    expected = expect
  )
model <- glm(vs ~ wt + mpg, data = mtcars, family = "binomial")
performance_hosmer(model)$observed
##                 
## cutyhat          V1 y
##   [0.0174,0.051]  4 0
##   (0.051,0.0828]  3 0
##   (0.0828,0.158]  3 0
##   (0.158,0.243]   2 1
##   (0.243,0.369]   2 1
##   (0.369,0.52]    2 1
##   (0.52,0.615]    1 2
##   (0.615,0.868]   0 3
##   (0.868,0.982]   1 2
##   (0.982,0.998]   0 4
performance_hosmer(model)$expected
##                 
## cutyhat                  V1       yhat
##   [0.0174,0.051] 3.87984951 0.12015049
##   (0.051,0.0828] 2.77779066 0.22220934
##   (0.0828,0.158] 2.67732615 0.32267385
##   (0.158,0.243]  2.38429554 0.61570446
##   (0.243,0.369]  2.06107259 0.93892741
##   (0.369,0.52]   1.67780793 1.32219207
##   (0.52,0.615]   1.35146089 1.64853911
##   (0.615,0.868]  0.90896607 2.09103393
##   (0.868,0.982]  0.24958571 2.75041429
##   (0.982,0.998]  0.03184495 3.96815505

This observed and expected value information does not have to be
displayed when print.performance_hosmer() is done, but it would be
nice if it could be called out with $.

Also, I think the value in the document should be “An object of class
performance_hosmer with the following values”, not “An object of class
hoslem_test with the following values”.

Of course, I know there are various arguments for the Hosmer-Lemeshow
test, I would appreciate it if you would consider whether or not to
implement each of these, and I don’t care how they are implemented.

If these are as suggested, I’ll do a pull request, if not, that’s fine.

@strengejacke strengejacke added Enhancement 💥 Implemented features can be improved or revised Docs 📚 Something to be adressed in docs and/or vignettes labels Jun 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs 📚 Something to be adressed in docs and/or vignettes Enhancement 💥 Implemented features can be improved or revised
Projects
None yet
Development

No branches or pull requests

2 participants