lab-kaggle-competition

Description

Find the best machine learning model and params for a given dataset.

Instructions

Find the Kaggle competition with your cohort name, i.e. diamonds-databcn0722, https://www.kaggle.com/competitions/diamonds-databcn0722/overview

train.csv

1. Processing/cleaning the dataset: this should be later modularized in functions.
1. Train a model (fit & predict) with the data in train.csv. This file DOES contain a y (price).
  - Do train, test, split on train.csv if necessary.
  - Choose the best model regarding the metrics. In this case, the lowest RMSE (error).
  2.1. Export the model: we don't want to invest time/RAM resources on training the model again in the future.

test.csv

1. Apply the same cleaning to test.csv. This files does NOT contain a y (no price column).
1. We'll apply the already trained model from step 2 to the test.csv file. With this we'll generate a new column with the predicted values.

my_submission.csv

1. Generate a submission.csv file with only two columns: the ID of the diamond & the predicted price (y).

In other words: use train.csv to generate and save a model. Use test.csv to predict new values. Then generate a DF with ID & predicted.

Deliverables

Jupyter notebooks where you show the process you followed to get to your submissions.
A slide (.ppt, ipynb, etc) with a summary of the metrics you obtained and the rationale behind it.
- Why do those params work better than others?

Tips

Check often for the df.shape & len of the things your working with
Do make sure you ONLY have two columns. When saving the file, you might save an "extra column" (index). So make sure you don't include it. There should only be two columns: id & price
Take advantage of the daily submissions. Try at least one today!

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
Guided example		Guided example
images		images
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

lab-kaggle-competition

Description

Instructions

train.csv

test.csv

my_submission.csv

Deliverables

Tips

About

Releases

Packages

Languages

Ironhack-data-bcn-oct-2023/lab-machine-learning-pipeline

Folders and files

Latest commit

History

Repository files navigation

lab-kaggle-competition

Description

Instructions

train.csv

test.csv

my_submission.csv

Deliverables

Tips

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages