Different number of trees in CVBooster object between versions 3.3.5 and 4.1.0 #6211

dtararuj · 2023-11-24T10:09:09Z

Description

Hi, I ran into a strange issue. I am trying to build a model that predicts both positive and negative values.
I am using the default objective and CV with 3 folds.

When I run the code with lightgbm==3.3.5, the saved model file contains many more trees than when I use the newer version.

With the newest version I get only one or two trees in the model .txt file; with the older version I get many more.

Is this expected behaviour?
Along with this, I also get worse accuracy.

Reproducible example

import numpy as np
import pandas as pd
import lightgbm as lgb
import random
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import OrdinalEncoder
def lgb_mae(preds, train_data):
    y_true = train_data.get_label()
    error = mean_absolute_error(y_true, preds)
    return "mae", error, False


# !pip install lightgbm==3.3.5 --user

print(lgb.__version__)
model_name = f'test_{lgb.__version__}'


n_records = 120
account_id = ['xxx'] * n_records
volume= random.sample(range(-1000, 1000), n_records)
date = pd.date_range(pd.to_datetime('2021-01-01'), periods =n_records, freq = '1W')
var1 = (np.random.uniform(low=0, high=100, size=(n_records,))).round(2)
var2 = (np.random.uniform(low=0.5, high=20, size=(n_records,))).round(2)


df = pd.DataFrame(list(zip(account_id, volume, date, var1, var2)),
               columns =['acc_id', 'volume','date', 'var1','var2'])

df['quarter'] = df.date.dt.quarter
df['week'] = df.date.dt.isocalendar().week.astype(int)  # Series.dt.week was removed in pandas 2.0
df['month'] = df.date.dt.month

train = df.loc[df['date']<'2023-01-01']
test = df.loc[df['date']>='2023-01-01']

features = df.columns[~df.columns.isin(['volume','acc_id','date'])]


lgb_train = lgb.Dataset(
        train[features], train.loc[:, "volume"])
lgb_test = lgb.Dataset(test[features], test.loc[:, "volume"])

num_iterations = 100 
lgb_fun = lgb_mae
constraints={"var2": -1}
stop_rounds = 50
nfolds = 3

lgb_params = {
        'monotone_constraints': [0,-1,0,0,0],
        'monotone_constraints_method': 'advanced',
        "verbose": -1,
        "metrics": "None",
        "feature_pre_filter": False,
        'learning_rate': 0.170714,
        'num_leaves': 160,
        'max_depth': 18,
        'min_data_in_leaf' : 16,
        'bagging_fraction': 0.979877,
        'feature_fraction': 0.452952,
        'lambda_l1': 0.0105841,
        'lambda_l2': 9.63261e-08,
    }

model_cv = lgb.cv(
        params=lgb_params,
        train_set=lgb_train,
        num_boost_round=num_iterations,
        nfold=nfolds,
        return_cvbooster=True,
        stratified=False,
        feval=lgb_fun,
        callbacks=[
            lgb.early_stopping(stopping_rounds=stop_rounds, verbose=False),
            lgb.log_evaluation(0),
        ],
)
model_1 = model_cv.get("cvbooster")
for i in range(len(model_1.boosters)):
    model_1.boosters[i].save_model(
        f"model_{model_name}_cv{i}.txt"
    )
models = list(model_1.boosters)
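
To compare the saved files across versions without opening them by hand, the trees in each model .txt file can be counted. The following is a minimal sketch, assuming the text model format marks the start of each tree with a line of the form `Tree=<index>`; the helper names `count_trees` and `count_trees_in_files` are hypothetical and not part of LightGBM.

```python
import re
from pathlib import Path

# Assumption: each tree in a LightGBM text model starts with a "Tree=<index>" line.
TREE_HEADER = re.compile(r"^Tree=\d+\s*$", re.MULTILINE)


def count_trees(model_text: str) -> int:
    """Count tree header lines in the text of a saved model file."""
    return len(TREE_HEADER.findall(model_text))


def count_trees_in_files(pattern: str, folder: str = ".") -> dict:
    """Map each file name matching `pattern` in `folder` to its tree count."""
    return {
        path.name: count_trees(path.read_text())
        for path in sorted(Path(folder).glob(pattern))
    }


if __name__ == "__main__":
    # Files written by the loop above are named model_test_<version>_cv<i>.txt.
    for name, n_trees in count_trees_in_files("model_test_*_cv*.txt").items():
        print(f"{name}: {n_trees} trees")
```

For a booster that is still in memory, `Booster.num_trees()` reports the number of trees it holds, which may differ from what `save_model` writes if only part of the model is saved.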

jmoralez · 2023-11-29T19:18:43Z

Hey @dtararuj, thanks for using LightGBM. This is due to #5066. Previously the individual boosters in the CVBooster object would keep all training iterations, regardless of what the best iteration was. So for example if early stopping was performed and the best iteration was 5, previously the boosters had 55 rounds (since you set stopping_rounds=50), which were all saved. Now it's only saving up until the best iteration (5 in this example). If you want to save all of them you can do something like:

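# enumerate supplies the fold index i used in the file name
for i, bst in enumerate(model_1.boosters):
    bst.save_model(
        f'model_{model_name}_cv{i}.txt',
        # save every trained round, not just those up to the best iteration
        num_iteration=bst.current_iteration(),
    )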
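
The 5-versus-55 arithmetic above can be illustrated with a small simulation. This is only a sketch of the general early-stopping rule for a lower-is-better metric, not LightGBM's actual callback; the function name `simulate_early_stopping` and the score values are made up for illustration.

```python
def simulate_early_stopping(scores, stopping_rounds):
    """Return (best_iteration, rounds_trained), both 1-based.

    Training stops once `stopping_rounds` consecutive rounds have passed
    without improving on the best (lowest) validation score.
    """
    best_score = float("inf")
    best_iteration = 0
    for iteration, score in enumerate(scores, start=1):
        if score < best_score:
            best_score, best_iteration = score, iteration
        elif iteration - best_iteration >= stopping_rounds:
            return best_iteration, iteration
    # Early stopping never triggered: all rounds were trained.
    return best_iteration, len(scores)


# Hypothetical validation scores: improvement for 5 rounds, then none.
scores = [10.0, 9.0, 8.0, 7.0, 6.0] + [6.5] * 95
best_iteration, rounds_trained = simulate_early_stopping(scores, stopping_rounds=50)
print(best_iteration, rounds_trained)  # 5 55
```

Under this rule the booster holds 55 trained rounds but only the first 5 count as the best model, which matches the difference in saved tree counts described above.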

dtararuj · 2023-12-05T11:43:38Z

ok it makes sense, thank you

github-actions · 2024-12-25T00:26:12Z

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

jameslamb added the question label Nov 25, 2023

dtararuj changed the title ~~Different tree deepth between lgb==3.3.5 and 4.1.0~~ Different tree deepth between lgb==3.3.5 and 4.1.0, bug ? Nov 28, 2023

jmoralez added the awaiting response label Nov 29, 2023

jmoralez changed the title ~~Different tree deepth between lgb==3.3.5 and 4.1.0, bug ?~~ Different number of trees in CVBooster object between versions 3.3.5 and 4.1.0 Nov 29, 2023

dtararuj closed this as completed Dec 5, 2023

github-actions bot removed the awaiting response label Dec 5, 2023

github-actions bot locked as resolved and limited conversation to collaborators Dec 25, 2024
