Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python-package] Separately check whether pyarrow and cffi are installed #6785

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

mlondschien
Copy link
Contributor

The only tests I can think of would require a runner with pyarrow but not cffi installed.

Note that the LightGBMErrors will only raise when pyarrow is installed, but cffi is not. If pyarrow is not installed, pa_Table is a dummy class and isinstance(data, pa_Table) returns False.

This is a breaking change for users who didn't install lightgbm[arrow], but rather just installed lightgbm and pyarrow separately. Even if not intended, they could previously train a model on a pyarrow.Table, as this was converted via to a scipy.sparse.csr_matrix(data). The fix is simply to install cffi or to transform manually with scipy.sparse.csr_matrix.

Still, it is good to inform people that they are not "natively" training from a pyarrow.Table, incurring an unnecessary copy.

As already suggested in #6782, an alternative would be to raise a warning.

@jameslamb jameslamb changed the title Separately check whether pyarrow and cffi are installed [pyhton-package] Separately check whether pyarrow and cffi are installed Jan 12, 2025
@jameslamb jameslamb changed the title [pyhton-package] Separately check whether pyarrow and cffi are installed [python-package] Separately check whether pyarrow and cffi are installed Jan 12, 2025
@jameslamb jameslamb added the fix label Jan 12, 2025
Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only tests I can think of would require a runner with pyarrow but not cffi installed.

Could you try adding tests that mock cffi not being available by mocking sys.modules? I tried that in an environment with scikit-learn (another optional dependency of lightgbm) installed and it seemed to work ok:

import sys
from unittest import mock

with mock.patch.dict(sys.modules, {'sklearn': None}):
    import lightgbm as lgb
    print(lgb.compat.SKLEARN_INSTALLED)
    # False

import lightgbm as lgb
print(lgb.compat.SKLEARN_INSTALLED)
# True

We don't have any examples of that in lightgbm's test suite, but I think it'd be interesting to try.

python-package/lightgbm/basic.py Outdated Show resolved Hide resolved
@mlondschien
Copy link
Contributor Author

It appears this does not work. I don't really understand why.

@mlondschien mlondschien requested a review from jameslamb January 16, 2025 09:23
@jameslamb jameslamb mentioned this pull request Jan 23, 2025
30 tasks
@mlondschien
Copy link
Contributor Author

@jameslamb How would you like to continue here?

@jameslamb
Copy link
Collaborator

I feel it'd be easy to accidentally undo this work in future refactorings. I will try to find a way to add a test covering this.

if not PYARROW_INSTALLED:
raise LightGBMError("Cannot init dataframe from Arrow without `pyarrow` installed.")
if not (PYARROW_INSTALLED and CFFI_INSTALLED):
raise LightGBMError("Cannot init dataframe from Arrow without `pyarrow` and `cffi` installed.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise LightGBMError("Cannot init dataframe from Arrow without `pyarrow` and `cffi` installed.")
raise LightGBMError("Cannot init Dataset from Arrow without `pyarrow` and `cffi` installed.")

This really should be Dataset, not dataframe... I'll make that change when I push testing changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants