
Different number of trees in CVBooster object between versions 3.3.5 and 4.1.0 #6211

Closed
dtararuj opened this issue Nov 24, 2023 · 3 comments


dtararuj commented Nov 24, 2023

Description

Hi, I faced a strange issue. I've tried to create a model to predict positive and negative values as output.
I am using the default objective and CV with 3 folds.

When I executed the code using lgb==3.3.5, I got many more trees in my model files than when I used the newer version.

Using the newest version, I have one or two trees in the model .txt file; using the older version, many more.

Is this expected behaviour?
Related to this, I also get worse accuracy.

Reproducible example

```python
import random

import numpy as np
import pandas as pd
import lightgbm as lgb
from sklearn.metrics import mean_absolute_error


def lgb_mae(preds, train_data):
    """Custom eval metric: mean absolute error (lower is better)."""
    y_true = train_data.get_label()
    error = mean_absolute_error(y_true, preds)
    return "mae", error, False


# !pip install lightgbm==3.3.5 --user

print(lgb.__version__)
model_name = f'test_{lgb.__version__}'

# Fixed seeds so that both LightGBM versions see the same synthetic data.
random.seed(42)
np.random.seed(42)

n_records = 120
account_id = ['xxx'] * n_records
volume = random.sample(range(-1000, 1000), n_records)
date = pd.date_range(pd.to_datetime('2021-01-01'), periods=n_records, freq='1W')
var1 = np.random.uniform(low=0, high=100, size=n_records).round(2)
var2 = np.random.uniform(low=0.5, high=20, size=n_records).round(2)

df = pd.DataFrame(list(zip(account_id, volume, date, var1, var2)),
                  columns=['acc_id', 'volume', 'date', 'var1', 'var2'])

df['quarter'] = df.date.dt.quarter
# Series.dt.week was removed in pandas 2.0; isocalendar() replaces it.
df['week'] = df.date.dt.isocalendar().week.astype(int)
df['month'] = df.date.dt.month

train = df.loc[df['date'] < '2023-01-01']
test = df.loc[df['date'] >= '2023-01-01']

features = df.columns[~df.columns.isin(['volume', 'acc_id', 'date'])]

lgb_train = lgb.Dataset(train[features], train.loc[:, "volume"])
lgb_test = lgb.Dataset(test[features], test.loc[:, "volume"])

num_iterations = 100
lgb_fun = lgb_mae
stop_rounds = 50
nfolds = 3

lgb_params = {
    # features = [var1, var2, quarter, week, month]; -1 makes the
    # predictions non-increasing in var2
    'monotone_constraints': [0, -1, 0, 0, 0],
    'monotone_constraints_method': 'advanced',
    "verbose": -1,
    "metrics": "None",  # disable built-in metrics; only lgb_mae is evaluated
    "feature_pre_filter": False,
    'learning_rate': 0.170714,
    'num_leaves': 160,
    'max_depth': 18,
    'min_data_in_leaf': 16,
    'bagging_fraction': 0.979877,
    'feature_fraction': 0.452952,
    'lambda_l1': 0.0105841,
    'lambda_l2': 9.63261e-08,
}

model_cv = lgb.cv(
    params=lgb_params,
    train_set=lgb_train,
    num_boost_round=num_iterations,
    nfold=nfolds,
    return_cvbooster=True,
    stratified=False,
    feval=lgb_fun,
    callbacks=[
        lgb.early_stopping(stopping_rounds=stop_rounds, verbose=False),
        lgb.log_evaluation(0),
    ],
)

# One Booster per CV fold; save each to its own text file.
model_1 = model_cv.get("cvbooster")
for i in range(len(model_1.boosters)):
    model_1.boosters[i].save_model(
        f"model_{model_name}_cv{i}.txt"
    )
models = list(model_1.boosters)
```
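To confirm the observation without opening each file by hand, the number of trees in every saved model can be read back from disk. This is a stdlib-only sketch: `count_trees` is a hypothetical helper, and it assumes LightGBM's text model format, in which every tree section begins with a line of the form `Tree=<index>`.

```python
from pathlib import Path


def count_trees(model_path):
    """Count tree sections in a LightGBM text model file (hypothetical helper).

    Assumes each tree section starts with a line "Tree=<index>"; header lines
    such as "tree_sizes=..." do not match because the prefix is case-sensitive.
    """
    with open(model_path, encoding="utf-8") as fh:
        return sum(1 for line in fh if line.startswith("Tree="))


# Report the tree count of every per-fold file written by the loop above.
for path in sorted(Path(".").glob("model_test_*_cv*.txt")):
    print(path.name, count_trees(path))
```

Running this once per installed version gives a direct side-by-side comparison of the per-fold files.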

dtararuj changed the title "Different tree deepth between lgb==3.3.5 and 4.1.0" to "Different tree deepth between lgb==3.3.5 and 4.1.0, bug ?" Nov 28, 2023
jmoralez (Collaborator) commented Nov 29, 2023

Hey @dtararuj, thanks for using LightGBM. This is due to #5066. Previously, the individual boosters in the CVBooster object kept all training iterations, regardless of what the best iteration was. For example, if early stopping was performed and the best iteration was 5, the boosters previously had 55 rounds (since you set stopping_rounds=50), and all of them were saved. Now only the iterations up to the best one are saved (5 in this example). If you want to save all of them, you can do something like:

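```python
for i, bst in enumerate(model_1.boosters):
    # num_iteration=current_iteration() saves every trained round,
    # not only the rounds up to the best iteration
    bst.save_model(
        f'model_{model_name}_cv{i}.txt',
        num_iteration=bst.current_iteration(),
    )
```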
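The arithmetic in the reply (best iteration 5 plus stopping_rounds=50 gives 55 trained rounds, of which only 5 reach the file by default) can be modeled in plain Python. This is a simplified sketch, not LightGBM's implementation: `early_stop` and `trees_saved` are hypothetical names, early stopping is modeled as patience-based on a lower-is-better metric such as the MAE used in the issue, and `trees_saved` follows the documented `num_iteration` rule of `Booster.save_model` (None means "use the best iteration if one is set", a value <= 0 means "save all iterations"). Modeling the 3.3.5 behaviour as "best iteration unset (0)" is an assumption.

```python
# Simplified, hypothetical model of the behaviour discussed above; not LightGBM code.

def early_stop(scores, stopping_rounds, num_boost_round):
    """Return (best_iteration, rounds_trained) for a lower-is-better metric.

    Training stops once `stopping_rounds` rounds have passed without improving
    on the best score. Iterations are 1-based, like Booster.best_iteration.
    """
    best_score = float("inf")
    best_iteration = 0
    for iteration in range(1, num_boost_round + 1):
        score = scores[iteration - 1]
        if score < best_score:
            best_score = score
            best_iteration = iteration
        elif iteration - best_iteration >= stopping_rounds:
            return best_iteration, iteration
    return best_iteration, num_boost_round


def trees_saved(current_iteration, best_iteration, num_iteration=None,
                trees_per_iteration=1):
    """Number of trees written to the model file under the assumed rule.

    best_iteration == 0 stands for "no best iteration set" (assumption).
    trees_per_iteration is 1 for single-output objectives such as regression.
    """
    if num_iteration is None:
        num_iteration = best_iteration
    if num_iteration <= 0 or num_iteration > current_iteration:
        num_iteration = current_iteration
    return num_iteration * trees_per_iteration


# Hypothetical validation-MAE curve: improves for 5 rounds, then gets worse.
scores = [10.0 - i for i in range(5)] + [6.0 + 0.1 * i for i in range(95)]
best, trained = early_stop(scores, stopping_rounds=50, num_boost_round=100)
print(best, trained)                       # 5 55
print(trees_saved(trained, best))          # default save_model: 5
print(trees_saved(trained, best, trained)) # num_iteration=current_iteration(): 55
```

The gap between the last two numbers is exactly the difference between the 4.x default and the loop in the reply.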

jmoralez changed the title "Different tree deepth between lgb==3.3.5 and 4.1.0, bug ?" to "Different number of trees in CVBooster object between versions 3.3.5 and 4.1.0" Nov 29, 2023
dtararuj closed this as completed Dec 5, 2023
dtararuj (Author) commented Dec 5, 2023

OK, it makes sense, thank you.


This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

github-actions bot locked as resolved and limited conversation to collaborators Dec 25, 2024