[python-package] Booster.refit() segfaults when label only contains 1 class #6792

Open
jameslamb opened this issue Jan 20, 2025 · 16 comments

Comments

@jameslamb
Collaborator

Description

Using the Python package, when fitting a binary classification model with only categorical features and using linear trees, refit() segfaults if label has only 1 class.

Reproducible example

import lightgbm as lgb
import numpy as np

X = np.random.randint(low=0, high=10, size=(1_000, 5))
y = np.ones(shape=(X.shape[0],))

dtrain = lgb.Dataset(
  data=X,
  label=y,
  categorical_feature=list(range(X.shape[1]))
)

bst = lgb.train(
  train_set=dtrain,
  params={
    "objective": "binary",
    "num_leaves": 7,
    "linear_tree": True
  },
  num_boost_round=2
).refit(X, label=y, categorical_feature=list(range(X.shape[1])))

Logs from refit():

[LightGBM] [Warning] Contains only one class
[LightGBM] [Info] Number of positive: 1000, number of negative: 0
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000032 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 55
[LightGBM] [Info] Number of data points in the train set: 1000, number of used features: 5
Segmentation fault: 11

When the same model is trained without linear_tree=True, refit() does not segfault or otherwise raise an exception... it just trains a stub model, as expected.

bst = lgb.train(
  train_set=dtrain,
  params={
    "objective": "binary",
    "num_leaves": 7,
  },
  num_boost_round=2
)
bst.refit(X, label=y, categorical_feature=list(range(X.shape[1])))

Logs from refit():

[LightGBM] [Warning] Contains only one class
[LightGBM] [Info] Number of positive: 1000, number of negative: 0
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000035 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 55
[LightGBM] [Info] Number of data points in the train set: 1000, number of used features: 5
<lightgbm.basic.Booster object at 0x152a19690>

Tree content:

bst.dump_model()["tree_info"]
[{"tree_index": 0, "num_leaves": 1, "num_cat": 0, "shrinkage": 1, "tree_structure": {"leaf_value": 34.53957599234088, "leaf_count": 1000}}]
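To check programmatically that refit() produced a stub model like this one, the `tree_info` list returned by `Booster.dump_model()` can be scanned for single-leaf trees. A minimal sketch, assuming only the dictionary layout shown in the dump above; the helper name `single_leaf_tree_indices` is hypothetical and not part of LightGBM:

```python
from typing import Any


def single_leaf_tree_indices(model_dump: dict[str, Any]) -> list[int]:
    """Return the tree_index of every tree that has only one leaf (i.e. no splits)."""
    return [
        tree["tree_index"]
        for tree in model_dump["tree_info"]
        if tree["num_leaves"] == 1
    ]


# Same structure as the tree content dumped above.
example_dump = {
    "tree_info": [
        {
            "tree_index": 0,
            "num_leaves": 1,
            "num_cat": 0,
            "shrinkage": 1,
            "tree_structure": {"leaf_value": 34.53957599234088, "leaf_count": 1000},
        }
    ]
}

print(single_leaf_tree_indices(example_dump))  # [0]
```

With a real model, `bst.dump_model()` can be passed in directly.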

Environment info

LightGBM version or commit hash: 226e7f7

Command(s) you used to install LightGBM

cmake -B build -S . -DUSE_OPENMP=OFF
cmake --build build --target _lightgbm -j4
sh build-python.sh install --precompile

Additional Comments

Fitting a model with a single class is maybe not that useful a case for LightGBM, but still... LightGBM should never do something that results in a process-crashing segmentation fault.

And these sorts of situations can occur unintentionally, for example in AutoML systems where random subsets of training data are selected.
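Until this is fixed, pipelines that fit or refit on random subsets (like the AutoML case above) can guard against single-class labels on their own side. A minimal sketch, assuming NumPy-compatible labels; `has_at_least_two_classes` is a hypothetical helper, not a LightGBM API, and skipping the refit is only one possible policy:

```python
import numpy as np


def has_at_least_two_classes(label) -> bool:
    """True if the label contains at least two distinct values."""
    return np.unique(np.asarray(label)).size >= 2


y_single = np.ones(shape=(1_000,))
y_mixed = np.array([0.0, 1.0, 1.0, 0.0])

print(has_at_least_two_classes(y_single))  # False
print(has_at_least_two_classes(y_mixed))   # True
```

In a pipeline this would wrap the call, e.g. only running `bst = bst.refit(X, label=y, ...)` when `has_at_least_two_classes(y)` is true.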

@jameslamb jameslamb added the bug label Jan 20, 2025
@jameslamb jameslamb changed the title [python-package] refit() segfaults when label only contains 1 class [python-package] Booster.refit() segfaults when label only contains 1 class Jan 20, 2025
@RMRcreator

May I work on this issue?

@jameslamb
Collaborator Author

Sure, thanks!

But please note... a solution to this should find the root cause, and might need to fix it in the C/C++ API. We'd want this to be fixed for other interfaces (like the R package) if it is a problem there too.

And in your contribution, please add unit tests that would prevent this from being reintroduced in the future.
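Because a segmentation fault kills the whole interpreter, an in-process test would take the test runner down with it. One way to write such a test is to run the reproducer in a child process and assert that the child was not killed by a signal. A minimal sketch, assuming a POSIX system (where `subprocess` reports death-by-signal as a negative return code) and that `lightgbm` is importable by the child interpreter; the helper and test names are hypothetical and this does not follow any existing layout of the project's test suite:

```python
import subprocess
import sys
import textwrap

# Reproducer from the issue description, run in a separate interpreter.
REPRO = textwrap.dedent(
    """
    import lightgbm as lgb
    import numpy as np

    X = np.random.randint(low=0, high=10, size=(1_000, 5))
    y = np.ones(shape=(X.shape[0],))
    cat = list(range(X.shape[1]))
    dtrain = lgb.Dataset(data=X, label=y, categorical_feature=cat)
    bst = lgb.train(
        train_set=dtrain,
        params={"objective": "binary", "num_leaves": 7, "linear_tree": True},
        num_boost_round=2,
    )
    bst.refit(X, label=y, categorical_feature=cat)
    """
)


def run_isolated(code: str, timeout: float = 120.0) -> subprocess.CompletedProcess:
    """Run Python code in a fresh interpreter and capture its output."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def killed_by_signal(result: subprocess.CompletedProcess) -> bool:
    """On POSIX, a negative return code means the child died from a signal (-11 is SIGSEGV)."""
    return result.returncode < 0


def test_refit_single_class_linear_tree_does_not_crash():
    result = run_isolated(REPRO)
    assert not killed_by_signal(result), result.stderr
```

This only asserts "no crash"; whether refit() should succeed or raise a Python exception for single-class labels is a separate decision.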

@RMRcreator

Understood, any tips on how to approach this?

@jameslamb
Collaborator Author

Try to run the code from the description and confirm it fails in the same way on your machine.

Then try to figure out where the segfault is happening and why.
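A low-effort first step for finding where the crash happens is Python's standard-library `faulthandler` module, which dumps the Python-level traceback when the process receives a fatal signal such as SIGSEGV. It only shows Python frames (for example, which `lightgbm` call was executing), not the C++ frames inside the compiled library; for those, a native debugger such as gdb or lldb is needed. A minimal sketch:

```python
import faulthandler
import sys

# Dump the Python traceback of all threads to stderr on fatal signals
# (SIGSEGV, SIGFPE, SIGABRT, and on POSIX also SIGBUS and SIGILL).
# Running `python -X faulthandler repro.py` has the same effect without editing the script.
faulthandler.enable(file=sys.stderr, all_threads=True)

# ... the reproducible example from the issue description goes below this line ...
```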

@RMRcreator

RMRcreator commented Feb 8, 2025

I've tried running it, and I've noticed that it's reporting a ValueError. I forked the repository as it is now rather than as it was three weeks ago, so I don't know if that might've affected anything.


@jameslamb
Collaborator Author

Well, that would be a bug too, so you could investigate that.

In the future, please post code and logs as text, not images, so others can find them through search and copy them more easily.

@RMRcreator

I've run the reproducible example a few times and I'm not seeing the segmentation fault on my machine, just the ValueError.

@jameslamb
Collaborator Author

Sharing all the logs from when you built LightGBM would be helpful. Maybe this problem is compiler- or operating-system-specific.

Documenting why and where that ValueError happens would be helpful too.

One way you can help us here is to narrow down when and why these things happen.
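To narrow down which conditions matter (the description points at single-class labels, all-categorical features, and linear_tree=True together), each combination can be run in its own interpreter so that a crash in one case doesn't stop the sweep. A minimal sketch, assuming POSIX return codes and that `lightgbm` is importable by `sys.executable`; the script template is adapted from the reproducible example, and all helper names are hypothetical:

```python
import itertools
import subprocess
import sys
import textwrap

# Adapted from the reproducible example; the three flags are filled in per case.
TEMPLATE = textwrap.dedent(
    """
    import lightgbm as lgb
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.integers(low=0, high=10, size=(1_000, 5))
    if {single_class}:
        y = np.ones(shape=(X.shape[0],))
    else:
        y = rng.integers(low=0, high=2, size=X.shape[0]).astype(float)
    cat = list(range(X.shape[1])) if {categorical} else "auto"
    dtrain = lgb.Dataset(data=X, label=y, categorical_feature=cat)
    params = {{"objective": "binary", "num_leaves": 7, "linear_tree": {linear_tree}}}
    bst = lgb.train(train_set=dtrain, params=params, num_boost_round=2)
    bst.refit(X, label=y, categorical_feature=cat)
    """
)


def build_script(linear_tree: bool, categorical: bool, single_class: bool) -> str:
    """Fill in one combination of the suspected conditions."""
    return TEMPLATE.format(
        linear_tree=linear_tree, categorical=categorical, single_class=single_class
    )


def classify(returncode: int) -> str:
    """Interpret a child's return code (negative = killed by that signal on POSIX)."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        return f"killed by signal {-returncode}"
    return "python exception"


def sweep(timeout: float = 300.0) -> None:
    """Run all 8 combinations, each in a fresh interpreter, and print the outcome."""
    for linear_tree, categorical, single_class in itertools.product([True, False], repeat=3):
        script = build_script(linear_tree, categorical, single_class)
        result = subprocess.run(
            [sys.executable, "-c", script], capture_output=True, text=True, timeout=timeout
        )
        print(
            f"linear_tree={linear_tree!s:5} categorical={categorical!s:5} "
            f"single_class={single_class!s:5} -> {classify(result.returncode)}"
        )
```

Calling `sweep()` prints one line per combination; posting that output alongside the build logs would show whether the crash needs all three conditions at once.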

@RMRcreator

This is the log from when I ran the example. From what I can tell, the ValueError occurs when .refit() is called and leaf_preds.shape is unpacked into nrow and ncol: the shape only contains one value instead of two.
2025-02-08 22:55:51,726 - ERROR - Exception occured: not enough values to unpack (expected 2, got 1)
Traceback (most recent call last):
  File "C:\Users\rm\PycharmProjects\LightGBM\LightGBM\python-package\lightgbm\class_segfault.py", line 24, in <module>
    ).refit(X, label=y, categorical_feature=list(range(X.shape[1])))
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rm\PycharmProjects\LightGBM\.venv\Lib\site-packages\lightgbm\basic.py", line 4866, in refit
    nrow, ncol = leaf_preds.shape
    ^^^^^^^^^^
ValueError: not enough values to unpack (expected 2, got 1)

The line number for leaf_preds.shape on my machine is different from where it is in the basic.py file in the GitHub repository.

nrow, ncol = leaf_preds.shape
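The ValueError itself is plain tuple unpacking: the `.shape` of a 1-D NumPy array is a 1-tuple, so it can't be unpacked into `nrow, ncol`. A small illustration, assuming (not confirmed in this thread) that the leaf predictions came back as one leaf index per row because the model had a single tree, as in the tree dump in the description; `as_2d_leaf_preds` is a hypothetical helper showing one defensive normalization, not LightGBM's actual code:

```python
import numpy as np


def as_2d_leaf_preds(leaf_preds: np.ndarray, nrow: int) -> np.ndarray:
    """Reshape leaf-index predictions to (nrow, n_trees), even if they arrive as 1-D."""
    return np.asarray(leaf_preds).reshape(nrow, -1)


nrow = 1_000
flat = np.zeros(nrow, dtype=np.int32)  # one leaf index per row, i.e. a single tree

try:
    n, m = flat.shape
except ValueError as err:
    print(err)  # not enough values to unpack (expected 2, got 1)

fixed = as_2d_leaf_preds(flat, nrow)
n, m = fixed.shape
print(n, m)  # 1000 1
```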

@jameslamb
Collaborator Author

jameslamb commented Feb 9, 2025

> the log from when I ran the example

I specifically asked for the logs from when you BUILT LightGBM. These would tell us, for example, the operating system and compiler you used.

But...

> The line number for leaf_preds.shape on my machine is different from where it is in the basic.py file in the GitHub repository.

That suggests to me that you did not build the latest version from source and install it. To work on this problem, you need to do that.

@RMRcreator

I installed the latest version of lightgbm through PyPI as documented in the installation guide. Am I missing a step?

@jameslamb
Collaborator Author

Yes, you have to build the project from source if you want to investigate this bug. There are many changes on the master branch that aren't in the latest release on PyPI.

All of the installation steps to reproduce the bug are in the description of this issue, including how to install the development version (see "Command(s) you used to install LightGBM" in the description).

@RMRcreator

I've been attempting to install lightgbm the way you did after figuring out what I was doing wrong, but I've hit a snag: I can't run sh build-python.sh install --precompile because my machine, which is running Windows 10, doesn't recognize sh, and I'm unsure how to fix that.

@jameslamb
Collaborator Author

Are you using Git for Windows to work with git? If so, run that build-python.sh script in a Git for Windows shell.

Alternatively, you could:

@RMRcreator

Ok, I'm trying to use Git for Windows now, but it keeps stopping because it says it can't find the command pip at line 184.

@jameslamb
Collaborator Author

> it can't find the command pip at line 184

You'll need to install pip and make sure its location is in the PATH environment variable. You mentioned earlier that you had installed lightgbm from PyPI, so I'd expect you had pip already, but maybe you used some other installer like uv?
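A quick way to see what the shell will find is to compare the `pip` on `PATH` with the interpreter you intend to install into. A minimal sketch using only the standard library (run it with the same Python that should receive the build); note that `python -m pip` works even when the `pip` executable itself isn't on `PATH`, as long as pip is installed for that interpreter:

```python
import shutil
import subprocess
import sys


def pip_report() -> dict:
    """Summarize where `pip` resolves on PATH and whether this interpreter has pip."""
    module_check = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        capture_output=True,
        text=True,
    )
    return {
        "python": sys.executable,
        "pip_on_path": shutil.which("pip"),  # None if the shell cannot find `pip`
        "pip_module_ok": module_check.returncode == 0,
        "pip_module_version": module_check.stdout.strip() or None,
    }


if __name__ == "__main__":
    for key, value in pip_report().items():
        print(f"{key}: {value}")
```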

If you have not used git or shell scripts or pip before and are not comfortable resolving issues like these yourself, I think you're going to find it very challenging to work on this issue. And I'm sorry, but we cannot devote the time to teach you all those things here.

You might want to look for other ways to contribute to LightGBM that don't involve building the project from source, like clarifying or expanding the documentation or improving the warnings / error messages.
