Add likelihood ratio, score tests and wald tests for GLMs(logistic regression) #15703

Open
karthikkannappan opened this issue Aug 17, 2023 · 7 comments

@karthikkannappan
Member

I would like to see model statistics such as the likelihood ratio test, score test and Wald test reported for logistic regression as well. We already report these model statistics for the CoxPH estimator (https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/coxph.html#model-statistics), and there are customers for whom it would be useful to have them for logistic regression too.

[Alternatives]
The likelihood ratio test is easy to compute after the fact: build a null model, then compare its negative log-likelihood with that of the full model with the fitted coefficients. It would still be nicer to have these model stats as part of the native GLM implementation.
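A minimal sketch of that after-the-fact computation, using only the Python standard library. The function names `chi2_sf` and `likelihood_ratio_test` are made up for this illustration and are not part of H2O; the two negative log-likelihood values are assumed to come from the null model and the full model, and `df` is assumed to be the number of extra coefficients in the full model.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-square distribution with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    # Series expansion of the regularized lower incomplete gamma function P(a, half_x).
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= half_x / (a + n)
        total += term
        n += 1
    log_prefactor = a * math.log(half_x) - half_x - math.lgamma(a)
    lower = total * math.exp(log_prefactor)
    # 1 - lower loses precision for very small p-values; fine for a sketch.
    return max(0.0, 1.0 - lower)


def likelihood_ratio_test(nll_null, nll_full, df):
    """LR statistic and p-value from the negative log-likelihoods of two nested models."""
    lr = 2.0 * (nll_null - nll_full)
    return lr, chi2_sf(lr, df)


# Hypothetical values, for illustration only.
lr, p_value = likelihood_ratio_test(nll_null=120.0, nll_full=112.5, df=3)
print(f"LR = {lr:.3f}, p-value = {p_value:.4g}")
```

If SciPy is available, `scipy.stats.chi2.sf` could replace the hand-written tail function.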

@wendycwong
Contributor

Please break this into multiple issues.

@wendycwong
Contributor

wendycwong commented Feb 20, 2024

Regarding the Wald test: for a single coefficient tested against zero, it is basically the z-score. This is the value computed before we look up the p-value. I believe we already calculate this when a user sets compute_p_values = True. Hence, there is nothing that needs to be done in this case. However, please add documentation on this. You can derive the documentation change from this paragraph:

image

image
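For the documentation, the link between the z-score and the single-coefficient Wald test could be illustrated roughly as follows. This is a sketch, not the GLM backend code: `wald_z_test` is a hypothetical helper, and the coefficient estimate and standard error are assumed to be read from the coefficient table of a model trained with compute_p_values = True.

```python
import math


def wald_z_test(beta_hat, std_error, beta_h0=0.0):
    """z-score and two-sided normal p-value for a single coefficient."""
    z = (beta_hat - beta_h0) / std_error
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical coefficient and standard error, for illustration only.
z, p_value = wald_z_test(beta_hat=0.85, std_error=0.31)
print(f"z = {z:.3f}, Wald chi-square = {z * z:.3f}, p-value = {p_value:.4g}")
```

Squaring z gives the Wald chi-square statistic with one degree of freedom, which leads to the same p-value.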

@wendycwong
Contributor

These YouTube videos are very useful:

https://www.youtube.com/watch?v=TFKbyXAfr1M (Wald Test)
https://www.youtube.com/watch?v=Ck7EChMRQ9o (Score Test)
https://www.youtube.com/watch?v=Tn5y2i_MqQ8 (likelihood ratio test)

@wendycwong
Contributor

wendycwong commented Feb 20, 2024

Likelihood Ratio Test

H0: the coefficient vector of the GLM model is beta_H0.

The task is now to figure out whether the beta_H0 in hypothesis H0 is acceptable. One way to do this is to use the likelihood ratio test.

The likelihood ratio test statistic is LR = 2*(loglikelihood(beta_ML) - loglikelihood(beta_H0)), which under H0 approximately follows a chi-square distribution with q degrees of freedom, where q is the number of predictors (more precisely, the number of coefficients that H0 fixes).

Here beta_H0 is a coefficient vector that someone is interested in knowing about. Basically, this test is used to determine whether beta_H0 is close enough to the maximum likelihood estimate beta_ML.

You can find loglikelihood(beta_ML) by running our GLM algorithm with calc_like = true.
For the loglikelihood of beta_H0, you don't have to run the whole GLM model again. You just need to evaluate the log-likelihood at beta = beta_H0. Note that beta_H0 need not be the null model. It can be anything the user wants it to be.
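To make the "no refit needed" point concrete, here is a small sketch for the binomial (logistic) case in plain Python. `logistic_loglik` and `lr_statistic` are illustrative names, not existing H2O functions; the design matrix rows are assumed to carry a leading 1.0 for the intercept, and the coefficient values in the example are placeholders rather than a real fit.

```python
import math


def logistic_loglik(X, y, beta):
    """Binomial log-likelihood of a logistic model evaluated at coefficient vector beta."""
    total = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        # Numerically stable log(1 + exp(eta)).
        log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        total += yi * eta - log1pexp
    return total


def lr_statistic(X, y, beta_ml, beta_h0):
    """LR = 2 * (loglik(beta_ML) - loglik(beta_H0)); no model refit required."""
    return 2.0 * (logistic_loglik(X, y, beta_ml) - logistic_loglik(X, y, beta_h0))


# Toy data: intercept column plus one predictor.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 1, 0, 1]
beta_fitted = [-0.6, 0.4]  # placeholder standing in for beta_ML from a fitted GLM
beta_h0 = [0.0, 0.0]       # hypothesis under test; can be any vector
print(lr_statistic(X, y, beta_fitted, beta_h0))
```

With a real model, `beta_fitted` would be replaced by the fitted coefficients, or the first log-likelihood taken directly from the model output when calc_like = true.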

@wendycwong
Contributor

wendycwong commented Feb 20, 2024

Wald Test

Again, using the same setup as the likelihood ratio test, we want to know if we can accept the hypothesis H0.

In this case, we want to use the Wald test statistic, w = transpose(beta_ML - beta_H0) * inverse(variance at beta_ML) * (beta_ML - beta_H0), where the variance is the variance-covariance matrix of the coefficient estimates.

Note that if you set compute_p_values=True, the variance at beta_ML will already have been calculated at the end of running the GLM model. I think the standard errors are derived from it (they should be the square roots of its diagonal, but I am not 100% sure how it is stored, please double check). So this one should be easy to calculate once you have the inverse variance.
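A rough sketch of the quadratic form in plain Python. It assumes the full variance-covariance matrix of the coefficients at beta_ML is available as a list of lists (how to obtain it from the backend still needs to be checked); `solve` and `wald_statistic` are names invented for this example.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("variance matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def wald_statistic(beta_ml, beta_h0, cov_at_ml):
    """w = d^T * inverse(cov) * d with d = beta_ML - beta_H0."""
    d = [bm - b0 for bm, b0 in zip(beta_ml, beta_h0)]
    v = solve(cov_at_ml, d)  # inverse(cov) * d without forming the inverse explicitly
    return sum(di * vi for di, vi in zip(d, v))


# Hypothetical estimates and covariance, for illustration only.
w = wald_statistic([0.4, 1.0], [0.0, 0.0], [[0.04, 0.0], [0.0, 0.25]])
print(w)
```

The result would then be compared against a chi-square distribution whose degrees of freedom equal the length of beta_H0. Solving the linear system instead of inverting the matrix is a design choice for numerical stability, not something the issue prescribes.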

@wendycwong
Contributor

wendycwong commented Feb 20, 2024

Score Test

Using the same hypothesis as in the likelihood ratio test, we want to evaluate H0, this time with the score test.

The score test statistic = transpose(gradient of log-likelihood at beta_H0) * inverse(Fisher information at beta_H0) * (gradient of log-likelihood at beta_H0). Note that inverse(Fisher information at beta_H0) is the coefficient variance matrix evaluated at beta_H0, so here the variance matrix itself goes in the middle, not its inverse as in the Wald test.

Even though we don't care about beta_ML, you may still need to run the GLM model to get the dispersion parameter estimate.

Next, you will need to estimate the standard errors with beta set to beta_H0 and not beta_ML. You may have to extract the part that estimates the squared standard errors (which are the variances) from the compute_p_values=True code path in the Java backend.
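As a sanity check on the formula, here is a sketch for the logistic case restricted to exactly two coefficients (intercept plus one predictor), so that the 2x2 system can be solved in closed form. The restriction and the name `score_statistic_2d` are assumptions made for brevity. For the binomial family the dispersion is fixed at 1, so no fit is needed in this particular case.

```python
import math


def score_statistic_2d(X, y, beta_h0):
    """Score statistic U^T * inverse(I) * U for a two-coefficient logistic model."""
    u = [0.0, 0.0]                    # gradient of the log-likelihood at beta_H0
    info = [[0.0, 0.0], [0.0, 0.0]]   # Fisher information X^T W X at beta_H0
    for row, yi in zip(X, y):
        eta = beta_h0[0] * row[0] + beta_h0[1] * row[1]
        p = 1.0 / (1.0 + math.exp(-eta))
        w = p * (1.0 - p)
        for j in range(2):
            u[j] += row[j] * (yi - p)
            for k in range(2):
                info[j][k] += w * row[j] * row[k]
    det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
    if abs(det) < 1e-12:
        raise ValueError("Fisher information is singular")
    # inverse(info) * u, written out for the 2x2 case.
    v0 = (info[1][1] * u[0] - info[0][1] * u[1]) / det
    v1 = (-info[1][0] * u[0] + info[0][0] * u[1]) / det
    return u[0] * v0 + u[1] * v1


# Toy data: intercept column plus one predictor.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 1, 0, 1]
print(score_statistic_2d(X, y, [0.0, 0.0]))
```

A general implementation would replace the closed-form 2x2 solve with a proper linear solver and handle very large negative values of eta, where `math.exp(-eta)` can overflow.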

@wendycwong
Contributor

Since we are looking into maximum likelihood estimation, we will not allow any regularization in this case.

@wendycwong wendycwong assigned wendycwong and unassigned syzonyuliia Jun 3, 2024