[Bug]: cannot get prediction for boosting models #1329

Open
DRMPN opened this issue Sep 1, 2024 · 8 comments · May be fixed by #1320
Labels
bug Something isn't working core Core logic related to graph optimisation

Comments

DRMPN (Collaborator) commented Sep 1, 2024

Expected Behavior

Prediction can be obtained after pipeline fitting for boosting models.

Current Behavior

Cannot get prediction for regression boosting models after fitting.

For xgboostreg:

2024-09-01 13:12:59,554 - PipelineNode - Obtain prediction in pipeline node by operation: xgboostreg
Traceback (most recent call last):
  File "c:\Users\nnikitin-user\Desktop\automl-september\run_xgboost\run_xgboost.py", line 36, in <module>
    prediction = auto_model.predict(features=test)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\api\main.py", line 297, in predict
    self.prediction = self.data_processor.define_predictions(current_pipeline=self.current_pipeline,
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\api\api_utils\api_data.py", line 102, in define_predictions
    output_prediction = current_pipeline.predict(test_data)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\pipelines\pipeline.py", line 285, in predict
    result = self.root_node.predict(input_data=copied_input_data, output_mode=output_mode)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\pipelines\node.py", line 231, in predict
    operation_predict = self.operation.predict(fitted_operation=self.fitted_operation,
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\operations\operation.py", line 105, in predict
    return self._predict(fitted_operation, data, params, output_mode, is_fit_stage=False)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\operations\operation.py", line 133, in _predict
    prediction = self._eval_strategy.predict(
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\operations\evaluation\boostings.py", line 73, in predict
    prediction = trained_operation.predict(predict_data)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\operations\evaluation\operation_implementations\models\boostings_implementations.py", line 65, in predict
    X, _ = self.convert_to_dataframe(input_data, self.params.get('enable_categorical'))
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\operations\evaluation\operation_implementations\models\boostings_implementations.py", line 95, in convert_to_dataframe
    dataframe['target'] = np.ravel(data.target)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 3980, in __setitem__
    self._set_item(key, value)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 4174, in _set_item
    value = self._sanitize_column(value)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 4915, in _sanitize_column
    com.require_length_match(value, self.index)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\common.py", line 571, in require_length_match
    raise ValueError(
ValueError: Length of values (0) does not match length of index (125690)

For lgbmreg:

2024-09-01 20:54:37,325 - PipelineNode - Obtain prediction in pipeline node by operation: lgbmreg
Traceback (most recent call last):
  File "c:\Users\nnikitin-user\Desktop\automl-september\run_lgbm\run_lgbm.py", line 39, in <module>
    prediction = auto_model.predict(features=test)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\api\main.py", line 297, in predict
    self.prediction = self.data_processor.define_predictions(current_pipeline=self.current_pipeline,
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\api\api_utils\api_data.py", line 102, in define_predictions
    output_prediction = current_pipeline.predict(test_data)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\pipelines\pipeline.py", line 285, in predict
    result = self.root_node.predict(input_data=copied_input_data, output_mode=output_mode)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\pipelines\node.py", line 231, in predict
    operation_predict = self.operation.predict(fitted_operation=self.fitted_operation,
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\operations\operation.py", line 105, in predict
    return self._predict(fitted_operation, data, params, output_mode, is_fit_stage=False)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\operations\operation.py", line 133, in _predict
    prediction = self._eval_strategy.predict(
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\operations\evaluation\boostings.py", line 73, in predict
    prediction = trained_operation.predict(predict_data)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\operations\evaluation\operation_implementations\models\boostings_implementations.py", line 202, in predict
    X, _ = self.convert_to_dataframe(input_data, identify_cats=self.params.get('enable_categorical'))
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\operations\evaluation\operation_implementations\models\boostings_implementations.py", line 240, in convert_to_dataframe
    dataframe['target'] = np.ravel(data.target)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 3980, in __setitem__
    self._set_item(key, value)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 4174, in _set_item
    value = self._sanitize_column(value)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 4915, in _sanitize_column
    com.require_length_match(value, self.index)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\common.py", line 571, in require_length_match
    raise ValueError(
ValueError: Length of values (0) does not match length of index (125690)

Possible Solution

At first glance, I thought the hardcoded "target" was a problem. However, when I tested it, the name of the target column didn't matter at all.
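
The observation that the column name is irrelevant can be checked in isolation. The snippet below is a minimal sketch that uses only pandas and NumPy, not FEDOT. The helper `try_assign` and the toy frame are hypothetical and exist only for illustration. It assumes the target reaching `convert_to_dataframe` at predict time is an empty array, which is what the "Length of values (0)" part of the traceback suggests.

```python
import numpy as np
import pandas as pd


def try_assign(column_name, values, n_rows=5):
    """Try to add `values` as a new column of a frame with `n_rows` rows.

    Returns the ValueError message if the assignment fails, otherwise None.
    """
    frame = pd.DataFrame({"feature": np.arange(n_rows)})
    try:
        # Same pattern as the failing line in the traceback.
        frame[column_name] = np.ravel(values)
    except ValueError as err:
        return str(err)
    return None


empty_target = np.array([])

# Fails regardless of the column name, because the length is 0, not n_rows.
print(try_assign("target", empty_target))
print(try_assign("price", empty_target))

# Succeeds when the number of values matches the number of rows.
print(try_assign("target", np.zeros(5)))
```

Under that assumption, the failure comes from the length mismatch between the empty target and the feature rows, so renaming the column cannot change the outcome.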

Steps to Reproduce

Data can be obtained from: https://www.kaggle.com/competitions/playground-series-s4e9/data
Simple notebook to reproduce the bug:

import pandas as pd
import numpy as np
from fedot.api.main import Fedot
from fedot.core.pipelines.pipeline_builder import PipelineBuilder

train = pd.read_csv("C:/Users/nnikitin-user/Desktop/automl-september/playground-series-s4e9/train.csv")
test = pd.read_csv("C:/Users/nnikitin-user/Desktop/automl-september/playground-series-s4e9/test.csv")
sub = pd.read_csv("C:/Users/nnikitin-user/Desktop/automl-september/playground-series-s4e9/sample_submission.csv")

train.drop(columns=["id"], inplace=True)
test.drop(columns=["id"], inplace=True)

auto_model = Fedot(
    problem="regression",
    metric=["rmse"],
    preset="best_quality",
    with_tuning=True,
    timeout=5,
    cv_folds=10,
    seed=42,
    n_jobs=1,
    logging_level=10,
    initial_assumption=PipelineBuilder().add_node("xgboostreg").build(),
    use_pipelines_cache=False,
    use_auto_preprocessing=False,
)

auto_model.fit(features=train, target="price")

auto_model.current_pipeline.save(
    path="C:/Users/nnikitin-user/Desktop/automl-september/run_xgboost/saved_pipelines",
    create_subdir=True,
    is_datetime_in_path=True,
)

prediction = auto_model.predict(features=test)

sub["price"] = prediction.ravel()
sub.to_csv("submission.csv", index=False)

Context [OPTIONAL]

Participating in a Kaggle competition https://www.kaggle.com/competitions/playground-series-s4e9

DRMPN added the bug and core labels on Sep 1, 2024
aPovidlo (Collaborator) commented Sep 3, 2024

@DRMPN Yes, this is indeed a bug. For classification it does not fail in convert_to_dataframe, apparently because data.target is None during predict. For regression it is filled with something, most likely np.array([]), which is why it fails when trying to write the data into the dataframe for the boosting models. I think the fix here is to add a new condition at this point:
if data.target is not None or data.target.size != 0:
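
As a side note on the condition quoted above, here is a small sketch of how it behaves on the two cases discussed (a `None` target and an empty array). The function names `guard_with_or` and `guard_with_and` are hypothetical and are not part of FEDOT. This only illustrates Python's short-circuit evaluation and is one possible reason such a check might not change the outcome, not a confirmed diagnosis.

```python
import numpy as np


def guard_with_or(target):
    # Condition as written in the comment above.
    return target is not None or target.size != 0


def guard_with_and(target):
    # Variant with `and`: both checks have to pass.
    return target is not None and target.size != 0


empty = np.array([])
filled = np.array([1.0, 2.0])

# With `or`, an empty array already satisfies `is not None`,
# so the size check is never evaluated.
print(guard_with_or(empty))

# With `and`, the empty array is rejected and None is handled safely.
print(guard_with_and(empty))
print(guard_with_and(None))
print(guard_with_and(filled))

# With `or`, None falls through to `.size` and raises AttributeError.
try:
    guard_with_or(None)
except AttributeError as err:
    print("AttributeError:", err)
```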

DRMPN (Collaborator, Author) commented Sep 7, 2024

That didn't help; the problem is somewhere else.

aPovidlo (Collaborator) commented Sep 7, 2024

Is the error still the same?

DRMPN (Collaborator, Author) commented Sep 8, 2024

Yes, the error is still the same.

aPovidlo (Collaborator) commented Sep 8, 2024

It worked for me once I added a condition with data.target.size > 0. I added it not only for XGBoost but also for LightGBM, since the same error could potentially occur there too.
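
For readers following along, a guard of this kind could look roughly like the sketch below. This is not the code from the PR. `to_dataframe` is a hypothetical stand-in for `convert_to_dataframe`, and `SimpleNamespace` stands in for FEDOT's input data object, assuming only that it exposes `features` and `target` attributes.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd


def to_dataframe(data):
    """Hypothetical stand-in for convert_to_dataframe.

    Returns (features_frame, target_series_or_None). The target column is
    only attached when a non-empty target is present.
    """
    dataframe = pd.DataFrame(data.features)
    if data.target is not None and data.target.size > 0:
        dataframe["target"] = np.ravel(data.target)
        return dataframe.drop(columns=["target"]), dataframe["target"]
    # Predict stage: no usable target, so return the features only.
    return dataframe, None


features = np.arange(6, dtype=float).reshape(3, 2)

fit_data = SimpleNamespace(features=features, target=np.array([[1.0], [2.0], [3.0]]))
predict_empty = SimpleNamespace(features=features, target=np.array([]))
predict_none = SimpleNamespace(features=features, target=None)

for name, data in [("fit", fit_data), ("empty", predict_empty), ("none", predict_none)]:
    X, y = to_dataframe(data)
    print(name, X.shape, None if y is None else y.shape)
```

The point of the design is that both "no target" representations (`None` and an empty array) take the same branch, so the predict path never tries to assign a zero-length column.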

aPovidlo (Collaborator) commented Sep 8, 2024

@DRMPN I think I fixed this problem in #1320. I ran the reproduction code against that PR and it completed normally.

DRMPN (Collaborator, Author) commented Sep 8, 2024

Great, then I'll check it once more and close this issue when the PR is merged into master!

Lopa10ko linked a pull request on Sep 26, 2024 that will close this issue
DRMPN self-assigned this on Oct 16, 2024
DRMPN (Collaborator, Author) commented Oct 22, 2024

Tested, the bug is fixed in #1320.
Attached log files for the test run: logs.txt
