Add likelihood ratio, score and Wald tests for GLMs (logistic regression) #15703
Please break this into multiple issues.
Regarding the Wald test: it is basically the z-score, i.e. the value computed before we look up the p-value. I believe we already calculate this when a user sets compute_p_values = True, so there is nothing that needs to be done in this case. However, please add documentation on this. You can derive the documentation change from this paragraph:
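To make the documentation point concrete, here is a minimal numpy/scipy sketch of the per-coefficient Wald z-score the comment refers to (coefficient divided by its standard error). The toy dataset and the Newton-Raphson fit are illustrative assumptions, not H2O's actual backend:

```python
import numpy as np
from scipy.stats import norm

# Toy data: intercept plus two predictors (illustrative only).
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ np.array([0.5, 1.0, -1.0])))))

# Fit logistic regression by Newton-Raphson (IRLS).
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = 1 / (1 + np.exp(-(X @ beta)))
    H = X.T @ (X * (p * (1 - p))[:, None])   # Fisher information
    beta += np.linalg.solve(H, X.T @ (y - p))

# Per-coefficient Wald z-score: coefficient / standard error.
se = np.sqrt(np.diag(np.linalg.inv(H)))
z = beta / se
p_values = 2 * norm.sf(np.abs(z))            # two-sided p-values
```

This is the quantity that, squared, gives the one-parameter special case of the full Wald statistic described below.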
This YouTube video is very useful: https://www.youtube.com/watch?v=TFKbyXAfr1M (Wald test)
Likelihood ratio test: the hypothesis is H0: the coefficient vector of the GLM model is beta_H0, where beta_H0 is a coefficient vector someone is interested in. The task is to figure out whether beta_H0 is acceptable, i.e. close enough to the maximum likelihood estimate beta_ML. One way to do this is the likelihood ratio test:

LR = 2*(loglikelihood(beta_ML) - loglikelihood(beta_H0)) ~ Chi-square with q degrees of freedom,

where q is the number of predictors. You can find loglikelihood(beta_ML) by running our GLM algorithm with calc_like = true.
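As a rough sketch of the computation (not H2O's implementation; the dataset and the all-zeros beta_H0 are placeholders for illustration):

```python
import numpy as np
from scipy.stats import chi2

# Placeholder data: intercept plus two predictors.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(400), rng.normal(size=(400, 2))])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ np.array([0.3, 0.8, -0.5])))))

def loglik(beta):
    # Bernoulli log-likelihood for logistic regression.
    eta = X @ beta
    return float(np.sum(y * eta - np.log1p(np.exp(eta))))

def fit(X, y, iters=25):
    # Newton-Raphson fit (stand-in for running the GLM with calc_like = true).
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-(X @ beta)))
        H = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(H, X.T @ (y - p))
    return beta

beta_ml = fit(X, y)
beta_h0 = np.zeros(3)                     # hypothesized coefficients (placeholder)
lr = 2 * (loglik(beta_ml) - loglik(beta_h0))
p_value = chi2.sf(lr, df=X.shape[1])      # q degrees of freedom
```

A small p-value means beta_H0 is far from the maximum likelihood estimate and H0 should be rejected.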
Wald test: using the same setup as the likelihood ratio test, we want to know whether we can accept the hypothesis H0. In this case we use the Wald test:

w = transpose(beta_ML - beta_H0) * inverse(variance at beta_ML) * (beta_ML - beta_H0).

Note that if you set compute_p_values = True, the inverse(variance at beta_ML) will already have been calculated at the end of running the GLM model; I think it is related to the standard error (but I am not 100% sure, please double check). So this one should be easy to compute once you have the inverse variance.
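A minimal sketch of the full-vector Wald statistic, again on placeholder data. It relies on the fact that for logistic regression the Fisher information matrix at beta_ML is the inverse of the coefficient covariance matrix (the same matrix whose diagonal gives the squared standard errors used for the p-values):

```python
import numpy as np
from scipy.stats import chi2

# Placeholder data: intercept plus two predictors.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(400), rng.normal(size=(400, 2))])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ np.array([0.3, 0.8, -0.5])))))

# Newton-Raphson fit; H at convergence is the Fisher information,
# i.e. inverse(variance at beta_ML).
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = 1 / (1 + np.exp(-(X @ beta)))
    H = X.T @ (X * (p * (1 - p))[:, None])
    beta += np.linalg.solve(H, X.T @ (y - p))

beta_h0 = np.zeros(3)        # hypothesized coefficients (placeholder)
diff = beta - beta_h0
w = diff @ H @ diff          # transpose(diff) * inverse(variance) * diff
p_value = chi2.sf(w, df=len(diff))
```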
Score test: using the same hypothesis as the likelihood ratio test, we evaluate H0 this time with the score test:

score = transpose(gradient of loglikelihood at beta_H0) * inverse(variance at beta_H0) * (gradient of loglikelihood at beta_H0).

Even though we don't care about beta_ML here, you may still need to run the GLM model to get the dispersion parameter estimate. Next, you will need to estimate the standard error with beta set to beta_H0, not beta_ML. You may have to extract the part of the compute_p_values = True path in the Java backend code that estimates the squared standard error (which is the variance).
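A sketch of the score test for the binomial family, where the dispersion is fixed at 1 so no model fit is needed; both the gradient and the variance are evaluated at beta_H0 (all names below are placeholders, not H2O code):

```python
import numpy as np
from scipy.stats import chi2

# Placeholder data: intercept plus two predictors.
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(400), rng.normal(size=(400, 2))])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ np.array([0.3, 0.8, -0.5])))))

beta_h0 = np.zeros(3)                         # hypothesized coefficients
p0 = 1 / (1 + np.exp(-(X @ beta_h0)))
u = X.T @ (y - p0)                            # gradient of loglik at beta_H0
info0 = X.T @ (X * (p0 * (1 - p0))[:, None])  # Fisher information at beta_H0
s = u @ np.linalg.solve(info0, u)             # score statistic
p_value = chi2.sf(s, df=len(beta_h0))
```

Note that everything here is computed at beta_H0; this is why the standard-error estimation in the backend would need to accept an arbitrary beta rather than only the fitted one.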
Since we are looking into maximum likelihood estimation, we will not allow any regularization in this case.
We would like to see model statistics like the likelihood ratio test, score test and Wald test reported for logistic regression as well. We already report these model statistics for the CoxPH estimator (https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/coxph.html#model-statistics), and there are customers for whom it would be useful to have these statistics for logistic regression too.
[Alternatives]
The likelihood ratio test is easy to compute after the fact if you build a null model and then use the negative log-likelihood of that model and of the full model with the fitted coefficients. It would still be nicer to have these model statistics as part of the native GLM implementation.
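The after-the-fact workaround can be sketched as follows: fit an intercept-only null model and the full model, take twice the difference in log-likelihoods, and compare against a chi-square with degrees of freedom equal to the number of added predictors (a standalone numpy illustration, not the H2O API):

```python
import numpy as np
from scipy.stats import chi2

# Placeholder data: intercept plus two predictors.
rng = np.random.default_rng(3)
X_full = np.column_stack([np.ones(400), rng.normal(size=(400, 2))])
y = rng.binomial(1, 1 / (1 + np.exp(-(X_full @ np.array([0.3, 0.8, -0.5])))))
X_null = X_full[:, :1]                  # intercept-only null model

def fit_and_loglik(X, y, iters=25):
    # Newton-Raphson fit, then return the maximized log-likelihood.
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-(X @ beta)))
        H = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(H, X.T @ (y - p))
    eta = X @ beta
    return float(np.sum(y * eta - np.log1p(np.exp(eta))))

lr = 2 * (fit_and_loglik(X_full, y) - fit_and_loglik(X_null, y))
p_value = chi2.sf(lr, df=X_full.shape[1] - X_null.shape[1])
```

This is exactly the manual two-model workflow described above; having it native would avoid fitting the null model by hand.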