WIP: [python-package] support sub-classing scikit-learn estimators #6783

jameslamb · 2025-01-10T06:39:24Z

I recently saw a Stack Overflow post ("Why can't I wrap LGBM?") expressing the same concern raised in #4426: it's difficult to sub-class lightgbm's scikit-learn estimators.

It doesn't have to be! Look how minimal the code is for XGBRFRegressor:

https://github.com/dmlc/xgboost/blob/45009413ce9f0d2bdfcd0c9ea8af1e71e3c0a191/python-package/xgboost/sklearn.py#L1869

This PR proposes borrowing some patterns I learned while working on xgboost's scikit-learn estimators to make it easier to sub-class lightgbm estimators. This also has the nice side effect of simplifying the lightgbm.dask code 😁
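To make the idea concrete, here is a minimal sketch of one way such a sub-classing pattern could look. It is not the code from this PR or from xgboost: `BaseEstimatorSketch`, `RandomForestSketch`, and their parameters are hypothetical stand-ins, and the MRO walk in `get_params()` is just one plausible way to let a subclass declare only `**kwargs` while still reporting the parent's parameters.

```python
import inspect


class BaseEstimatorSketch:
    """Hypothetical stand-in for a library estimator (not lightgbm's real class)."""

    def __init__(self, *, boosting_type="gbdt", n_estimators=100, **kwargs):
        self.boosting_type = boosting_type
        self.n_estimators = n_estimators
        # Extra keyword arguments are kept so they still show up in get_params().
        self._other_params = dict(kwargs)

    def get_params(self):
        # Walk the MRO so parameters named in a parent's __init__ are found
        # even when the subclass signature only declares **kwargs.
        params = {}
        for klass in type(self).__mro__:
            if klass is object or "__init__" not in klass.__dict__:
                continue
            signature = inspect.signature(klass.__dict__["__init__"])
            for name, param in signature.parameters.items():
                if name == "self":
                    continue
                if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
                    continue
                params.setdefault(name, getattr(self, name))
        params.update(self._other_params)
        return params


class RandomForestSketch(BaseEstimatorSketch):
    """Hypothetical subclass: one new parameter, one overridden default."""

    def __init__(self, *, subsample=0.8, **kwargs):
        # Change a parent default, then forward everything else untouched.
        kwargs.setdefault("boosting_type", "rf")
        super().__init__(**kwargs)
        self.subsample = subsample


model = RandomForestSketch(n_estimators=10, max_depth=3)
params = model.get_params()
```

Because the subclass forwards `**kwargs` instead of repeating the parent's full signature, it stays short and does not need updating when the parent gains a parameter.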

Notes for Reviewers

Why is this labeled "breaking"?

As part of this PR, I'm proposing immediately switching the constructors for the scikit-learn estimators here (including those in lightgbm.dask) to support only keyword arguments.

Why I'm proposing this instead of a deprecation cycle:

- scikit-learn itself does this (HistGradientBoostingClassifier example)
  - so all of its machinery passes parameters around as keyword arguments
  - keyword arguments are recommended throughout https://scikit-learn.org/stable/developers/develop.html
- I strongly suspect that using positional arguments for these constructors is rare
- anyone relying on positional arguments will get a loud error that is easy to diagnose and fix, so the effort to adjust should be minimal

import lightgbm as lgb
lgb.LGBMClassifier("gbdt")
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# TypeError: LGBMClassifier.__init__() takes 1 positional argument but 2 were given
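The same behavior can be reproduced without lightgbm installed. The sketch below uses a hypothetical `KeywordOnlyEstimator` class (not part of any library) whose only relevant feature is the bare `*` in its constructor signature; it also shows the one-line fix for affected callers, which is to name the argument.

```python
class KeywordOnlyEstimator:
    """Hypothetical stand-in; only the bare `*` in the signature matters here."""

    def __init__(self, *, boosting_type="gbdt", num_leaves=31):
        self.boosting_type = boosting_type
        self.num_leaves = num_leaves


def build(*args, **kwargs):
    """Return the estimator, or the TypeError raised by a positional call."""
    try:
        return KeywordOnlyEstimator(*args, **kwargs)
    except TypeError as err:
        return err


# Positional call: rejected at construction time, before any training happens.
positional = build("gbdt")

# Keyword call: the fix for code that previously relied on argument order.
named = build(boosting_type="gbdt")
```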

I posted a related answer to that Stack Overflow question

https://stackoverflow.com/a/79344862/3986677


jameslamb added 3 commits January 4, 2025 01:59

[python-package] make sub-classing scikit-learn estimators easier

3b5f648

tests passing

02c48c3

add docs

7b720cb

jameslamb added the "in progress" and "breaking" labels Jan 10, 2025

jameslamb added 3 commits January 10, 2025 00:40

Update tests/python_package_test/test_sklearn.py

51b5e64

remove docs links

81178fd

Merge branch 'python/sklearn-subclassing' of github.com:microsoft/Lig…

110b0e1

…htGBM into python/sklearn-subclassing
