-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[python-package] Accept numpy generators as random_state
#6174
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for this! I definitely support it, but think some changes are needed in this PR.
Lots of other software such as SciPy have moved towards Generator
In the future, we'd appreciate if claims like this were supported with some evidence. I do see scipy/scipy#11680 from 3 years ago, looks like scipy
has been supporting this for a while.
Although I think it'd be more accurate to say they "added support for" Generator
than "moved to" it, as scipy
still supports RandomState
:
RandomState
is deprecated
That may be true, but I don't support raising lightgbm
's oldest support numpy
version just to support it.
As of this writing that's numpy==1.16.6
:
LightGBM/.ci/test-python-oldest.sh
Line 12 in 1600422
'numpy==1.16.6' \ |
It looks to me that np.random.Generator
didn't make it into numpy
until 1.17.0
:
That's why the CI job where we test lightgbm's
compatibility with the oldest versions it supports is failing on this PR:
Traceback (most recent call last):
File "./examples/python-guide/advanced_example.py", line 11, in <module>
import lightgbm as lgb
File "/usr/local/lib/python3.6/site-packages/lightgbm/__init__.py", line 13, in <module>
from .sklearn import LGBMClassifier, LGBMModel, LGBMRanker, LGBMRegressor
File "/usr/local/lib/python3.6/site-packages/lightgbm/sklearn.py", line 391, in <module>
class LGBMModel(_LGBMModelBase):
File "/usr/local/lib/python3.6/site-packages/lightgbm/sklearn.py", line 414, in LGBMModel
importance_type: str = 'split',
AttributeError: module 'numpy.random' has no attribute 'Generator'
Can you please introduce this in a way that's backwards-compatible with versions of numpy
that didn't yet have np.random.Generator
? That'd mean the following changes:
- add a block in
compat.py
which defines anp_random_Generator
within a try-catch block, like this:
LightGBM/python-package/lightgbm/compat.py
Lines 218 to 234 in 1600422
"""cpu_count()""" | |
try: | |
from joblib import cpu_count | |
def _LGBMCpuCount(only_physical_cores: bool = True) -> int: | |
return cpu_count(only_physical_cores=only_physical_cores) | |
except ImportError: | |
try: | |
from psutil import cpu_count | |
def _LGBMCpuCount(only_physical_cores: bool = True) -> int: | |
return cpu_count(logical=not only_physical_cores) or 1 | |
except ImportError: | |
from multiprocessing import cpu_count | |
def _LGBMCpuCount(only_physical_cores: bool = True) -> int: | |
return cpu_count() |
- use that
compat.np_random_Generator
in code here, similar to howcompat.pd_DataFrame
is used:
LightGBM/python-package/lightgbm/sklearn.py
Lines 769 to 770 in 1600422
if not isinstance(X, (pd_DataFrame, dt_DataTable)): | |
_X, _y = _LGBMCheckXY(X, y, accept_sparse=True, force_all_finite=False, ensure_min_samples=2) |
- put all of the type hints touched in this PR in double quotes, so their evaluation will be deferred to type-checking time and not cause import errors in environments with older versions of
numpy
like this:
LightGBM/python-package/lightgbm/basic.py
Line 317 in 1600422
dtype: "np.typing.DTypeLike", |
as described in https://mypy.readthedocs.io/en/stable/runtime_troubles.html
@@ -534,11 +534,12 @@ def test_non_serializable_objects_in_callbacks(tmp_path): | |||
assert gbm.booster_.attr_set_inside_callback == 40 | |||
|
|||
|
|||
def test_random_state_object(): | |||
@pytest.mark.parametrize("rng_constructor", [np.random.RandomState, np.random.default_rng]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
forgot to add in my review... thanks very much for adding this to this test!
Added a compat entry and quoted type hints for running with older numpy. |
I'm not sure why the linter is complaining about seemingly unrelated files. This is what I found in the logs:
None of those files are being modified here. |
Those Lines 19 to 22 in aeafccf
That job is failing on this PR for reasons unrelated to this PR, caused by the latest release of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes! This looks good to me. I'll take care of updating it to latest master
and merging it once the unrelated CI issues are fixed.
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
This PR adds support for numpy generators as possible inputs for
random_state
. Generators are the recommended way of drawing random numbers from numpy, whileRandomState
is deprecated. Lots of other software such as SciPy have moved towardsGenerator
and/or allow both (example:scipy.sparse.random
).