-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[python-package] Separately check whether pyarrow
and cffi
are installed
#6785
base: master
Are you sure you want to change the base?
Conversation
pyarrow
and cffi
are installedpyarrow
and cffi
are installed
pyarrow
and cffi
are installedpyarrow
and cffi
are installed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only tests I can think of would require a runner with pyarrow but not cffi installed.
Could you try adding tests that mock cffi
not being available by mocking sys.modules
? I tried that in an environment with scikit-learn
(another optional dependency of lightgbm
) installed and it seemed to work ok:
import sys
from unittest import mock
with mock.patch.dict(sys.modules, {'sklearn': None}):
import lightgbm as lgb
print(lgb.compat.SKLEARN_INSTALLED)
# False
import lightgbm as lgb
print(lgb.compat.SKLEARN_INSTALLED)
# True
We don't have any examples of that in lightgbm
's test suite, but I think it'd be interesting to try.
It appears this does not work. I don't really understand why. |
@jameslamb How would you like to continue here? |
I feel it'd be easy to accidentally undo this work in future refactorings. I will try to find a way to add a test covering this. |
if not PYARROW_INSTALLED: | ||
raise LightGBMError("Cannot init dataframe from Arrow without `pyarrow` installed.") | ||
if not (PYARROW_INSTALLED and CFFI_INSTALLED): | ||
raise LightGBMError("Cannot init dataframe from Arrow without `pyarrow` and `cffi` installed.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise LightGBMError("Cannot init dataframe from Arrow without `pyarrow` and `cffi` installed.") | |
raise LightGBMError("Cannot init Dataset from Arrow without `pyarrow` and `cffi` installed.") |
This really should be Dataset
, not dataframe
... I'll make that change when I push testing changes.
The only tests I can think of would require a runner with
pyarrow
but notcffi
installed.Note that the
LightGBMErrors
will only raise whenpyarrow
is installed, butcffi
is not. Ifpyarrow
is not installed,pa_Table
is a dummy class andisinstance(data, pa_Table)
returnsFalse
.This is a breaking change for users who didn't install
lightgbm[arrow]
, but rather just installedlightgbm
andpyarrow
separately. Even if not intended, they could previously train a model on apyarrow.Table
, as this was converted via to ascipy.sparse.csr_matrix(data)
. The fix is simply to installcffi
or to transform manually withscipy.sparse.csr_matrix
.Still, it is good to inform people that they are not "natively" training from a
pyarrow.Table
, incurring an unnecessary copy.As already suggested in #6782, an alternative would be to raise a warning.