Add likelihood ratio, score tests and wald tests for GLMs(logistic regression) #15703

karthikkannappan · 2023-08-17T20:28:42Z

I would like to see model statistics such as the likelihood ratio test, score test, and Wald test reported for logistic regression as well. We already report these statistics for the CoxPH estimator (https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/coxph.html#model-statistics), and there are customers for whom it would be useful to have them for logistic regression too.

[Alternatives]
The likelihood ratio test is easy to compute after the fact: build a null model, then compare its negative log-likelihood with that of the full model with the fitted coefficients. It would still be nicer to have these model statistics as part of the native GLM implementation.
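A minimal sketch of that after-the-fact computation, assuming you already have the negative log-likelihoods of the null model and the full model. The function names and the numbers below are hypothetical and are not part of the H2O API. The chi-square tail probability is computed with closed-form sums for integer degrees of freedom so that the example needs only the standard library, and df is taken to be the number of extra coefficients in the full model.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^k / k!.
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: two-sided normal tail plus a finite correction series.
    tail = math.erfc(math.sqrt(half))
    term, total = 1.0, 0.0
    for k in range((df - 1) // 2):
        if k > 0:
            term *= x / (2 * k + 1)
        total += term
    return tail + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total

def likelihood_ratio_test(nll_null, nll_full, df):
    """LR statistic and p-value from two negative log-likelihoods."""
    lr = 2.0 * (nll_null - nll_full)
    return lr, chi2_sf(lr, df)

# Hypothetical values: a null model and a full model with 3 predictors.
lr, p_value = likelihood_ratio_test(nll_null=120.0, nll_full=112.5, df=3)
print(f"LR = {lr:.3f}, p-value = {p_value:.4g}")
```

If SciPy is available, `scipy.stats.chi2.sf` can replace the hand-written `chi2_sf` helper.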

wendycwong · 2023-10-11T20:19:08Z

Please break this into multiple issues.

wendycwong · 2024-02-20T17:40:08Z

Regarding the Wald test, it is basically the z-score, that is, the value before we look up the p-value. I believe we already calculate this when a user sets compute_p_values = True. Hence, nothing needs to be done in this case. However, please add documentation on this. You can derive the documentation change from this paragraph:

wendycwong · 2024-02-20T22:58:57Z

These YouTube videos are very useful:

https://www.youtube.com/watch?v=TFKbyXAfr1M (Wald Test)
https://www.youtube.com/watch?v=Ck7EChMRQ9o (Score Test)
https://www.youtube.com/watch?v=Tn5y2i_MqQ8 (likelihood ratio test)

wendycwong · 2024-02-20T23:08:28Z

Likelihood Ratio Test

H0: the coefficient vector of the GLM model is beta_H0.

The task is now to figure out if the beta_H0 in Hypothesis H0 is acceptable. One way to do this is to use the likelihood ratio test.

The likelihood ratio test statistic is LR = 2*(loglikelihood(beta_ML) - loglikelihood(beta_H0)) ~ Chi-square with q degrees of freedom, where q is the number of predictors.

Here, beta_H0 is a coefficient vector that someone is interested in testing. Basically, this test determines whether the beta_H0 in hypothesis H0 is close enough to the maximum likelihood estimate beta_ML.

You can find loglikelihood(beta_ML) by running our GLM algorithm with calc_like = True.
For the log-likelihood at beta_H0, you don't have to run the whole GLM model again. You just need to evaluate it at beta = beta_H0. Note that beta_H0 does not have to be the null model. It can be anything the user wants it to be.
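To illustrate evaluating the log-likelihood at beta_H0 without refitting, here is a small sketch for the binomial family with a logit link. Everything in it is an assumption made for illustration: the helper name, the toy data, and the use of an intercept-only model, chosen because its maximum likelihood estimate has a closed form and can stand in for a real GLM fit.

```python
import math

def logistic_loglik(X, y, beta):
    """Bernoulli log-likelihood; each row of X starts with 1.0 for the intercept."""
    total = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        # Numerically stable log(1 + exp(eta)).
        log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        total += yi * eta - log1pexp
    return total

# Toy data: 6 successes out of 10, intercept-only design matrix.
y = [1] * 6 + [0] * 4
X = [[1.0] for _ in y]

# For an intercept-only logistic model the MLE is the logit of the mean response.
beta_ml = [math.log(0.6 / 0.4)]
# Hypothesized coefficient vector: any value the user wants to test.
beta_h0 = [0.0]

# No refit is needed for beta_H0; the log-likelihood is simply evaluated there.
lr = 2.0 * (logistic_loglik(X, y, beta_ml) - logistic_loglik(X, y, beta_h0))
print(f"LR = {lr:.4f}")
```

In practice loglikelihood(beta_ML) would come from the fitted GLM rather than from a closed-form estimate.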

wendycwong · 2024-02-20T23:22:28Z

Wald Test

Again, using the same setup as the likelihood ratio test, we want to know if we can accept the hypothesis H0.

In this case, we want to use the Wald test, w = transpose(beta_ML - beta_H0) * inverse(variance at beta_ML) * (beta_ML - beta_H0).

Note that if you set compute_p_values=True, you will have calculated the inverse(variance at beta_ML) at the end of running the GLM model. I think it is derived from the standard errors, whose squares are the variances (but I am not 100% sure, please double check). So, this one should be easy to calculate once you have the inverse variance.
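A sketch of the Wald quadratic form, assuming the covariance matrix of the estimated coefficients at beta_ML is available as a plain nested list. The helper names and numbers are hypothetical, and a small Gaussian elimination routine is used instead of a linear algebra library so the example is self-contained. The last lines check that with a single coefficient and beta_H0 = 0 the statistic reduces to the squared z-score, (coefficient / standard error)^2.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def wald_statistic(beta_ml, beta_h0, cov):
    """w = (beta_ML - beta_H0)^T * inverse(cov) * (beta_ML - beta_H0)."""
    diff = [a - b for a, b in zip(beta_ml, beta_h0)]
    # Solving cov * s = diff avoids forming the inverse explicitly.
    return sum(d * s for d, s in zip(diff, solve(cov, diff)))

# Hypothetical two-coefficient example.
cov = [[0.04, 0.01], [0.01, 0.09]]
w = wald_statistic([0.5, 1.2], [0.0, 0.0], cov)

# Single coefficient: estimate 0.8 with standard error 0.25 (variance 0.0625).
w_single = wald_statistic([0.8], [0.0], [[0.25 ** 2]])
z = 0.8 / 0.25
print(w, w_single, z ** 2)
```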

wendycwong · 2024-02-20T23:55:52Z

Score Test

Using the same hypothesis as in the likelihood ratio test, we now want to evaluate H0 with the score test.

The score test statistic = transpose(gradient of log-likelihood at beta_H0) * inverse(variance at beta_H0) * (gradient of log-likelihood at beta_H0)

Even though we don't care about beta_ML, you may still need to run the GLM model to get the dispersion parameter estimate.

Next, you will need to estimate the standard error with beta set to beta_H0 rather than beta_ML. You may have to extract the part that estimates the squared standard error (which is the variance) from the compute_p_values=True code path in the Java backend.
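A rough sketch of the score statistic for logistic regression, under two stated assumptions: the family is binomial with a logit link, so the dispersion is fixed at 1 and no fit is needed, and "variance at beta_H0" is read as the variance of the gradient, i.e. the Fisher information X^T W X with W = diag(p * (1 - p)). The function names and toy data are hypothetical, and this is not the Java backend code.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def score_statistic(X, y, beta_h0):
    """Score test statistic U^T * inverse(I) * U evaluated at beta_H0."""
    k = len(beta_h0)
    grad = [0.0] * k                      # U = X^T (y - p)
    info = [[0.0] * k for _ in range(k)]  # I = X^T diag(p (1 - p)) X
    for row, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta_h0, row))
        p = 1.0 / (1.0 + math.exp(-eta))
        for i in range(k):
            grad[i] += row[i] * (yi - p)
            for j in range(k):
                info[i][j] += row[i] * row[j] * p * (1.0 - p)
    return sum(g * s for g, s in zip(grad, solve(info, grad)))

# Toy data: intercept column plus one predictor.
X = [[1.0, float(x)] for x in range(6)]
y = [0, 0, 1, 0, 1, 1]
stat = score_statistic(X, y, beta_h0=[0.0, 0.0])
print(f"score statistic = {stat:.4f}")
```

Only beta_H0 enters the computation, which matches the point that beta_ML itself is not needed for this test.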

wendycwong · 2024-02-21T00:50:32Z

Since we are looking into maximum likelihood estimation, we will not allow any regularization in this case.

karthikkannappan added the feature label Aug 17, 2023

wendycwong assigned syzonyuliia Oct 11, 2023

wendycwong assigned wendycwong and unassigned syzonyuliia Jun 3, 2024
