
Alexandre-aksenov/predict-car-prices-using-H2O-tpot


This repository presents an example of predicting car prices from each car's age and technical characteristics. A baseline linear regression model is compared with 2 models generated by AutoML tools (H2O, TPOT).

About the dataset.

The dataset is a reduced version of this Kaggle dataset: https://www.kaggle.com/datasets/klkwak/toyotacorollacsv

Each row of the table describes one car sale: the properties of the car and the sale price (the target variable).

The 9 features are:

  • 5 numeric: Age, KM, HP, CC (cubic capacity), Weight;
  • 3 categorical, encoded as numbers: MetColor, Automatic, Doors;
  • 1 categorical, encoded as strings: FuelType (values: Petrol, Diesel, CNG).
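The README does not say how the string-valued FuelType column is turned into model input. One common option is one-hot encoding, sketched below with pandas.get_dummies on a few hypothetical rows (the values are made up, not taken from the dataset). The numerically encoded categoricals (MetColor, Automatic, Doors) can be fed to a model as they are, or one-hot encoded the same way if their numeric order is not meaningful.

```python
import pandas as pd

# Hypothetical rows illustrating the column layout; not taken from the dataset.
df = pd.DataFrame({
    "Age": [23, 30, 56],
    "FuelType": ["Diesel", "Petrol", "CNG"],
    "HP": [90, 110, 72],
})

# One-hot encode the string-valued categorical column; other columns pass through
# unchanged and come first, followed by one 0/1 column per fuel type (sorted).
encoded = pd.get_dummies(df, columns=["FuelType"], dtype=int)
print(encoded.columns.tolist())
# ['Age', 'HP', 'FuelType_CNG', 'FuelType_Diesel', 'FuelType_Petrol']
```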

About the problem.

Model accuracy is evaluated by the RMSE (root mean squared error) of its predictions on the test set.
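For reference, RMSE = sqrt((1/n) * Σ (y_i − ŷ_i)²), where y_i is the true price and ŷ_i the predicted price of the i-th test car; it is expressed in the same units as the price. A minimal NumPy sketch, with toy numbers chosen only for illustration:

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target (the price)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy prices and predictions (hypothetical numbers, for illustration only).
# Errors are -500, 1000, 0, so RMSE = sqrt((250000 + 1000000 + 0) / 3).
print(rmse([10000, 12000, 9000], [10500, 11000, 9000]))  # ≈ 645.497
```

Because the errors are squared before averaging, a few large mispredictions weigh more in RMSE than many small ones.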

Selected models and AutoML instruments.

  • Baseline: linear regression on 3 features, 'Age', 'HP', 'Weight' (lin_reg_crossval.ipynb).
  • Model selected using H2O (h2o_regression.ipynb).
  • Model selected using TPOT, a RandomForestRegressor (TPOT_regression.ipynb, exported file: TPOT_regression.py).
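The notebooks are not reproduced here. As a rough sketch, assuming scikit-learn and a synthetic table in place of the real data, the baseline (linear regression on Age, HP, Weight) can be compared with a RandomForestRegressor, the estimator family TPOT settled on, using 5-fold cross-validation on a training split and RMSE on a held-out test split. The synthetic price formula, sample size, split ratio and hyperparameters are all hypothetical; this does not reproduce the TPOT pipeline or the reported numbers.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)
n = 500
# Synthetic stand-in for the real table (hypothetical values and price formula).
X = pd.DataFrame({
    "Age": rng.integers(1, 80, n),
    "HP": rng.integers(70, 120, n),
    "Weight": rng.integers(1000, 1200, n),
})
y = 20000 - 150 * X["Age"] + 30 * X["HP"] + 5 * X["Weight"] + rng.normal(0, 1000, n)

# Hold out a test set; accuracy is reported as RMSE on this part only.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "baseline LinearRegression": LinearRegression(),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=100, random_state=0),
}
results = {}
for name, model in models.items():
    # 5-fold cross-validated MSE on the training part (scikit-learn negates it).
    cv_mse = -cross_val_score(model, X_train, y_train, cv=5,
                              scoring="neg_mean_squared_error")
    model.fit(X_train, y_train)
    test_rmse = float(np.sqrt(np.mean((y_test - model.predict(X_test)) ** 2)))
    results[name] = test_rmse
    print(f"{name}: CV RMSE {np.sqrt(cv_mse).mean():.0f}, test RMSE {test_rmse:.0f}")
```

On this synthetic data the true relationship is linear, so the linear baseline is expected to do well; on the real data, nonlinear effects (e.g. of Age or KM) are what give tree ensembles room to improve.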

EDA and the baseline model were run locally in the environment specified in environment.yml.

AutoML model selection was performed on Google Colab servers.

Results.

  • RMSE of the baseline model: $1592
  • RMSE of the model selected by H2O in 1 minute: $1122
  • RMSE of the model selected by TPOT in 30 sec (RandomForestRegressor): $1195
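To put these figures in proportion, the relative RMSE reduction of each AutoML model over the baseline is (baseline − model) / baseline. Computed from the numbers above:

```python
baseline = 1592
automl = {"H2O": 1122, "TPOT (RandomForestRegressor)": 1195}

# Fraction by which each model's test RMSE is lower than the baseline's.
reduction = {name: (baseline - r) / baseline for name, r in automl.items()}
for name, frac in reduction.items():
    print(f"{name}: RMSE {frac:.1%} lower than baseline")
# H2O: RMSE 29.5% lower than baseline
# TPOT (RandomForestRegressor): RMSE 24.9% lower than baseline
```

Note that the two AutoML runs had different search budgets (1 minute vs 30 seconds), so the gap between them is not a like-for-like comparison of the tools.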

Feedback and additional questions.

All questions about the source code should be addressed to its author, Alexandre Aksenov:
