Skip to content

Ironhack-data-bcn-oct-2023/lab-machine-learning-pipeline

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 

Repository files navigation

lab-kaggle-competition

portada

Description

  • Find the best machine learning model and params for a given dataset.

Instructions

Find the Kaggle competition with your cohort name, i.e. diamonds-databcn0722, https://www.kaggle.com/competitions/diamonds-databcn0722/overview

train.csv

    1. Processing/cleaning the dataset: this should be later modularized in functions.
    1. Train a model (fit & predict) with the data in train.csv. This file DOES contain a y (price).

      • Do train, test, split on train.csv if necessary.
      • Choose the best model regarding the metrics. In this case, the lowest RMSE (error).

      2.1. Export the model: we don't want to invest time/RAM resources on training the model again in the future.

test.csv

    1. Apply the same cleaning to test.csv. This files does NOT contain a y (no price column).
    1. We'll apply the already trained model from step 2 to the test.csv file. With this we'll generate a new column with the predicted values.

my_submission.csv

    1. Generate a submission.csv file with only two columns: the ID of the diamond & the predicted price (y).

In other words: use train.csv to generate and save a model. Use test.csv to predict new values. Then generate a DF with ID & predicted.

Deliverables

  • Jupyter notebooks where you show the process you followed to get to your submissions.

  • A slide (.ppt, ipynb, etc) with a summary of the metrics you obtained and the rationale behind it.

    • Why do those params work better than others?

Tips

  • Check often for the df.shape & len of the things your working with
  • Do make sure you ONLY have two columns. When saving the file, you might save an "extra column" (index). So make sure you don't include it. There should only be two columns: id & price
  • Take advantage of the daily submissions. Try at least one today!

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published