WIP: [python-package] support sub-classing scikit-learn estimators #6783
+277
−125
I recently saw a Stack Overflow post ("Why can't I wrap LGBM?") expressing the same concerns from #4426 ... it's difficult to sub-class `lightgbm`'s `scikit-learn` estimators.

It doesn't have to be! Look how minimal the code is for `XGBRFRegressor`: https://github.com/dmlc/xgboost/blob/45009413ce9f0d2bdfcd0c9ea8af1e71e3c0a191/python-package/xgboost/sklearn.py#L1869
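To make the sub-classing concern concrete, here is a minimal sketch of the kind of sub-class users typically want to write: one that changes a default, adds an argument, and forwards everything else with `**kwargs`. It is not taken from this PR's diff or from `xgboost`; `BaseModel`, `RandomForestLikeModel`, and the MRO-walking `get_params()` are hypothetical stand-ins, assuming parameters should still be discoverable when a sub-class's `__init__` only declares some of them.

```python
import inspect


class BaseModel:
    """Hypothetical stand-in for a scikit-learn-style estimator (not lightgbm's real class)."""

    def __init__(self, *, n_estimators=100, learning_rate=0.1, **kwargs):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self._other_params = dict(kwargs)

    def get_params(self):
        # Walk the MRO so constructor arguments declared on parent classes are
        # reported even when a sub-class's __init__ only accepts **kwargs.
        params = {}
        for klass in reversed(type(self).__mro__):
            init = klass.__dict__.get("__init__")
            if klass is object or init is None:
                continue
            for name, param in inspect.signature(init).parameters.items():
                if name == "self" or param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
                    continue
                params[name] = getattr(self, name)
        params.update(self._other_params)
        return params


class RandomForestLikeModel(BaseModel):
    """Hypothetical sub-class: overrides one default and adds one argument."""

    def __init__(self, *, subsample=0.8, **kwargs):
        self.subsample = subsample
        # Override the parent's default unless the caller set it explicitly.
        kwargs.setdefault("learning_rate", 1.0)
        super().__init__(**kwargs)


model = RandomForestLikeModel(n_estimators=10)
print(model.get_params())
```

With a plain signature-based `get_params()` that inspects only the sub-class's own `__init__`, `n_estimators` and `learning_rate` would be missing here, which is roughly the failure mode the Stack Overflow post describes.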
This PR proposes borrowing some patterns I learned while working on `xgboost`'s `scikit-learn` estimators to make it easier to sub-class `lightgbm` estimators. This also has the nice side effect of simplifying the `lightgbm.dask` code 😁

Notes for Reviewers
Why is this labeled "breaking"?
As part of this PR, I'm proposing immediately switching the constructors for `scikit-learn` estimators here (including those in `lightgbm.dask`) to only supporting keyword arguments.

Why I'm proposing this instead of a deprecation cycle:
- `scikit-learn` itself does this (HistGradientBoostingClassifier example)

I posted a related answer to that Stack Overflow question: https://stackoverflow.com/a/79344862/3986677
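For reviewers weighing the keyword-only change, the following sketch shows what it means for callers. `KeywordOnlyEstimator` and its parameters are hypothetical and only illustrate the Python mechanism (a bare `*` in the signature); they are not the actual constructor proposed here.

```python
class KeywordOnlyEstimator:
    """Hypothetical estimator whose constructor accepts keyword arguments only."""

    def __init__(self, *, num_leaves=31, learning_rate=0.1):
        # The bare `*` makes every following parameter keyword-only.
        self.num_leaves = num_leaves
        self.learning_rate = learning_rate


def accepts_positional(cls, *args):
    """Return True if `cls` can be constructed from positional arguments."""
    try:
        cls(*args)
    except TypeError:
        return False
    return True


# Keyword usage keeps working.
est = KeywordOnlyEstimator(num_leaves=63)

# Positional usage is rejected with a TypeError, so callers cannot
# silently depend on argument order.
print(accepts_positional(KeywordOnlyEstimator, 63))
```

Because argument order is no longer part of the public interface, parameters can be added or reordered in a parent class without breaking sub-classes that forward `**kwargs`.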