[python-package] Accept numpy generators as `random_state` #6174

david-cortes · 2023-11-01T10:44:27Z

This PR adds support for numpy generators as possible inputs for random_state. Generators are the recommended way of drawing random numbers from numpy, while RandomState is deprecated. Lots of other software such as SciPy have moved towards Generator and/or allow both (example: scipy.sparse.random).
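For illustration, a minimal sketch of the two interfaces side by side. It assumes numpy >= 1.17 (where `np.random.default_rng` exists), and the variable names are only for this example:

```python
import numpy as np

# Legacy interface: a seeded RandomState instance.
legacy_rng = np.random.RandomState(42)
legacy_draw = legacy_rng.randint(0, 10, size=3)

# Newer interface: default_rng() returns a np.random.Generator.
rng = np.random.default_rng(42)
new_draw = rng.integers(0, 10, size=3)

# The two classes are unrelated, so an isinstance() check written for
# RandomState does not match a Generator.
print(isinstance(rng, np.random.RandomState))  # False
print(isinstance(rng, np.random.Generator))    # True
```

Note that the method for drawing integers also differs (`randint` on RandomState, `integers` on Generator), so code accepting both has to branch on the type.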

jameslamb

Thanks so much for this! I definitely support it, but think some changes are needed in this PR.

> Lots of other software such as SciPy have moved towards Generator

In the future, we'd appreciate it if claims like this were supported with some evidence. I do see scipy/scipy#11680 from 3 years ago, so it looks like scipy has been supporting this for a while.

Although I think it'd be more accurate to say they "added support for" Generator than "moved to" it, as scipy still supports RandomState:

https://github.com/scipy/scipy/blob/6ce4aaf4bc5222e3d1355cd8a392e88f45b4f959/scipy/_lib/_util.py#L63-L64

> RandomState is deprecated

That may be true, but I don't support raising lightgbm's oldest supported numpy version just to support it.

As of this writing that's numpy==1.16.6:

LightGBM/.ci/test-python-oldest.sh

Line 12 in 1600422

'numpy==1.16.6' \

It looks to me like np.random.Generator didn't make it into numpy until 1.17.0:

numpy/numpy#13163

That's why the CI job where we test lightgbm's compatibility with the oldest versions it supports is failing on this PR:

Traceback (most recent call last):
  File "./examples/python-guide/advanced_example.py", line 11, in <module>
    import lightgbm as lgb
  File "/usr/local/lib/python3.6/site-packages/lightgbm/__init__.py", line 13, in <module>
    from .sklearn import LGBMClassifier, LGBMModel, LGBMRanker, LGBMRegressor
  File "/usr/local/lib/python3.6/site-packages/lightgbm/sklearn.py", line 391, in <module>
    class LGBMModel(_LGBMModelBase):
  File "/usr/local/lib/python3.6/site-packages/lightgbm/sklearn.py", line 414, in LGBMModel
    importance_type: str = 'split',
AttributeError: module 'numpy.random' has no attribute 'Generator'

(build link)

Can you please introduce this in a way that's backwards-compatible with versions of numpy that didn't yet have np.random.Generator? That'd mean the following changes:

add a block in compat.py which defines a np_random_Generator within a try-except block, like this:

LightGBM/python-package/lightgbm/compat.py

Lines 218 to 234 in 1600422

    """cpu_count()"""
    try:
        from joblib import cpu_count

        def _LGBMCpuCount(only_physical_cores: bool = True) -> int:
            return cpu_count(only_physical_cores=only_physical_cores)
    except ImportError:
        try:
            from psutil import cpu_count

            def _LGBMCpuCount(only_physical_cores: bool = True) -> int:
                return cpu_count(logical=not only_physical_cores) or 1
        except ImportError:
            from multiprocessing import cpu_count

            def _LGBMCpuCount(only_physical_cores: bool = True) -> int:
                return cpu_count()

use that compat.np_random_Generator in code here, similar to how compat.pd_DataFrame is used:

LightGBM/python-package/lightgbm/sklearn.py

Lines 769 to 770 in 1600422

    if not isinstance(X, (pd_DataFrame, dt_DataTable)):
        _X, _y = _LGBMCheckXY(X, y, accept_sparse=True, force_all_finite=False, ensure_min_samples=2)

put all of the type hints touched in this PR in double quotes, so their evaluation will be deferred to type-checking time and not cause import errors in environments with older versions of numpy

like this:

LightGBM/python-package/lightgbm/basic.py

Line 317 in 1600422

dtype: "np.typing.DTypeLike",

as described in https://mypy.readthedocs.io/en/stable/runtime_troubles.html

jameslamb · 2023-11-07T02:29:26Z

tests/python_package_test/test_sklearn.py

@@ -534,11 +534,12 @@ def test_non_serializable_objects_in_callbacks(tmp_path):
    assert gbm.booster_.attr_set_inside_callback == 40


-def test_random_state_object():
+@pytest.mark.parametrize("rng_constructor", [np.random.RandomState, np.random.default_rng])


forgot to add in my review... thanks very much for adding this to this test!

david-cortes · 2023-11-08T17:21:06Z

Added a compat entry and quoted type hints for running with older numpy.

david-cortes · 2023-11-08T17:44:50Z

I'm not sure why the linter is complaining about seemingly unrelated files. This is what I found in the logs:

python-package/lightgbm/basic.py:805: error: Incompatible return value type (got "tuple[Any, list[str], list[str] | list[int], list[list[Any]]]", expected "tuple[Any, list[str], list[str], list[list[Any]]]")  [return-value]
python-package/lightgbm/basic.py:2939: error: Array constructor argument 1 of type "map[int]" is not convertible to the array element type "Iterable[c_char_p]"  [misc]
python-package/lightgbm/basic.py:2953: error: Array constructor argument 1 of type "map[int]" is not convertible to the array element type "Iterable[c_char_p]"  [misc]
python-package/lightgbm/basic.py:4619: error: Array constructor argument 1 of type "map[int]" is not convertible to the array element type "Iterable[c_char_p]"  [misc]
python-package/lightgbm/basic.py:4633: error: Array constructor argument 1 of type "map[int]" is not convertible to the array element type "Iterable[c_char_p]"  [misc]
python-package/lightgbm/basic.py:4843: error: Array constructor argument 1 of type "map[int]" is not convertible to the array element type "Iterable[c_char_p]"  [misc]
python-package/lightgbm/basic.py:4859: error: Array constructor argument 1 of type "map[int]" is not convertible to the array element type "Iterable[c_char_p]"  [misc]
python-package/lightgbm/engine.py:294: error: Incompatible types in assignment (expression has type "list[tuple[str, str, float, bool]] | list[tuple[str, str, float, bool, float]]", variable has type "list[tuple[str, str, float, bool]]")  [assignment]
Found 8 errors in 2 files (checked 9 source files)

None of those files are being modified here.

jameslamb · 2023-11-08T17:51:37Z

Those mypy errors don't actually cause the lint CI job to fail:

LightGBM/.ci/lint-python.sh

Lines 19 to 22 in aeafccf

    mypy \
        --config-file=./python-package/pyproject.toml \
        ./python-package \
    || true

That job is failing on this PR for reasons unrelated to this PR, caused by the latest release of {lintr}, which will be fixed once we merge #6180.

jameslamb

Thanks for the changes! This looks good to me. I'll take care of updating it to latest master and merging it once the unrelated CI issues are fixed.

github-actions · 2024-11-13T00:25:46Z

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

accept numpy generators

de92fcc

david-cortes requested review from guolinke, jameslamb, shiyu1994 and jmoralez as code owners November 1, 2023 10:44

david-cortes added 2 commits November 1, 2023 11:51

fix tests

67143d1

Merge branch 'master' into take_generator

5bde06a

jameslamb added in progress feature awaiting review and removed in progress labels Nov 1, 2023

use python int

122be33

jameslamb requested changes Nov 7, 2023

jameslamb reviewed Nov 7, 2023

jameslamb added in progress and removed awaiting review labels Nov 7, 2023

david-cortes added 2 commits November 8, 2023 18:20

add compat workaround for older numpy

1af35b4

Merge branch 'master' into take_generator

c6316ca

jameslamb removed the in progress label Nov 8, 2023

jameslamb approved these changes Nov 8, 2023

Merge branch 'master' into take_generator

e86776e

jameslamb merged commit 501e6e6 into microsoft:master Nov 9, 2023
39 checks passed

github-actions bot locked as resolved and limited conversation to collaborators Nov 13, 2024
