
GH-16312 constrainted glm issues [nocheck] #16317

Merged
14 commits merged into master on Aug 9, 2024

Conversation

wendycwong
Contributor

This PR fixes problems in #16312

I fixed three issues here:

  1. Constrained GLM finished before the constraint conditions were satisfied. There was a bug in the gradient check, which I fixed. The constraints may still not be fully satisfied at the end of the iterations, due to either a line-search failure or a lack of progress. However, if you compare the results with and without constraints, the coefficients from the constrained model come closer to satisfying the constraints than those from the unconstrained model.
  2. When beta constraints were specified with only lower bounds, an NPE occurred. I fixed this by checking that the beta constraint is not null before accessing its elements.
  3. A set of three linear constraints was flagged as duplicated or conflicting when it was neither. The error came from dropping the last column/row, which is supposed to hold the constant term, even though the constant was never included when the constraints matrix was generated. I fixed this as well.
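As a hypothetical illustration of the third issue (a sketch, not H2O's actual code): when duplicate/conflict detection compares constraint rows, the rows must consistently exclude the constant term, otherwise off-by-one slicing can make distinct constraints look identical. A pairwise comparison over coefficient rows without the constant column correctly reports the three constraints below as neither duplicated nor conflicting.

```python
def find_constraint_issues(rows, tol=1e-12):
    """Compare linear-constraint coefficient rows pairwise.

    Each row holds only the variable coefficients of one constraint;
    the constant term is deliberately NOT included, mirroring how the
    constraints matrix was generated.  Returns (i, j, kind) tuples for
    rows that are identical ("duplicated") or exact negations
    ("conflicting").  Illustrative only -- not H2O's implementation.
    """
    def close(a, b):
        return all(abs(x - y) <= tol for x, y in zip(a, b))

    issues = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if close(rows[i], rows[j]):
                issues.append((i, j, "duplicated"))
            elif close(rows[i], [-v for v in rows[j]]):
                issues.append((i, j, "conflicting"))
    return issues

# Three constraints on (b1, b2, b3): b1+b2<=1, b1-b2<=1, b2+b3<=1.
constraints = [[1.0, 1.0, 0.0],
               [1.0, -1.0, 0.0],
               [0.0, 1.0, 1.0]]
print(find_constraint_issues(constraints))  # -> [] (no false positives)
```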

Contributor

@maurever maurever left a comment


Thanks @wendycwong. I suggest one improvement; otherwise, LGTM.

model = glm(**params)
model.train(x = predictors, y = response, training_frame = train_data)
print(model.coef())
print(glm.getConstraintsInfo(model))
Contributor


You wrote that there is no need to check anything, but what about adding an assertion at the end that the model is not None? I suggest adding a try-catch that throws a clear message instead of letting the test fail (in case of some problem).

Contributor


If the model were None, it would fail in model.train(...). What benefit would an assert have, especially at the end? Or did you mean something like:

    model = glm(**params)
+   assert model is not None, "The model is None"
    model.train(x = predictors, y = response, training_frame = train_data)
    print(model.coef())
    print(glm.getConstraintsInfo(model))

?

If so, I don't understand the reason for adding the try/except clause.

(Sorry if this is obvious; I haven't properly started the review yet, so I might still work it out.)

Contributor


The advantage of failing with an assert message is that it explains why the test failed. If the NPE occurs and we know the potential problem, we can explain it in the message and save time in the future. We can also catch only the NPE and let other errors surface as usual.

I suggest improving the message from "The model is None" to something like "The model training failed with NPE. It could be caused by beta constraints setting."
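A minimal sketch of this suggestion (hypothetical helper name; a Python TypeError stands in for the Java-side NPE surfaced to the client): catch only the expected error, re-raise it with the diagnostic message, and let everything else propagate as usual.

```python
def train_with_diagnostics(train_fn):
    """Run a training callable; convert only the expected failure into
    an AssertionError with an explanatory message.  Other exceptions
    propagate unchanged.  Hypothetical sketch, not an H2O API."""
    try:
        return train_fn()
    except TypeError as err:  # stand-in for the NPE seen from Java
        raise AssertionError(
            "The model training failed with NPE. It could be caused by "
            "the beta constraints setting.") from err

# usage sketch (assumed names):
#   model = train_with_diagnostics(
#       lambda: model.train(x=predictors, y=response,
#                           training_frame=train_data))
```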

Contributor Author


Without my fix, the test fails; after my fix, the NPE is no longer there. I also added a test: since the beta constraints should always be satisfied, I check that they are.

Contributor Author


@maurever is not comfortable with tests that lack some kind of assert statement, while @tomasfryda feels that an overly general assert statement is not very meaningful.

Taking both suggestions, I added assert statements that check that the beta constraints are satisfied at the end of the model-building process.
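Such an assertion could look like the following sketch (coefficient and bound names are illustrative, not the actual test helpers): every coefficient must lie within its [lower, upper] bound, where either bound may be absent.

```python
def check_beta_bounds(coefs, bounds, tol=1e-6):
    """Assert each coefficient respects its beta constraint.

    coefs:  dict mapping coefficient name -> fitted value.
    bounds: dict mapping coefficient name -> (lower, upper); either
            bound may be None for 'unbounded'.  Names and shapes are
            illustrative, not the actual H2O test helpers.
    """
    for name, value in coefs.items():
        lower, upper = bounds.get(name, (None, None))
        if lower is not None:
            assert value >= lower - tol, (
                f"{name}={value} violates lower bound {lower}")
        if upper is not None:
            assert value <= upper + tol, (
                f"{name}={value} violates upper bound {upper}")

# passes: both coefficients sit inside their bounds
check_beta_bounds({"x1": 0.5, "x2": -0.2},
                  {"x1": (0.0, 1.0), "x2": (-1.0, None)})
```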

@wendycwong wendycwong requested a review from maurever July 18, 2024 21:43
maurever
maurever previously approved these changes Jul 31, 2024
Contributor

@maurever maurever left a comment


Hi @wendycwong. Thanks for improving the test. I went through the PR again and found nothing else to fix. I just saw that the Jenkins test has not passed, but that may be due to infrastructure issues. Please check; otherwise, LGTM.

h2o-algos/src/test/java/hex/glm/GLMConstrainedTest.java (outdated, resolved)
@wendycwong wendycwong changed the title from "GH-16312 constrainted glm issues" to "GH-16312 constrainted glm issues [nocheck]" Jul 31, 2024
maurever
maurever previously approved these changes Aug 1, 2024
@wendycwong wendycwong merged commit 17467da into master Aug 9, 2024
67 of 69 checks passed
@wendycwong wendycwong deleted the wendy_gh_16312_constrainted_GLM_issues branch August 9, 2024 17:23
wendycwong added a commit that referenced this pull request Sep 3, 2024
* Continue to double check algo.

* fix bug in gradient update.

* implemented various version of IRLSM

* Found GLM original with gradient magnitude change best

* GH-16312: fix wrong error raised by duplicated/conflicted constraints.

* force beta constraint to be satisfied at the end if it is not.

* GH-16312: add assert check to test per Veronika suggestion.

* GH-16312: fix tests after fixing constrained GLM bugs.

* GH-16312: fixed NPE error in checkCoeffsBounds

* GH-16312: fix test failure.

* remove conflicting constraint tests as we currently do not have the capability to do so right now.
* change dataset path from AWS to local
wendycwong added a commit that referenced this pull request Sep 11, 2024