A tutorial for Kaggle's Titanic: Machine Learning from Disaster competition.

eKariuki-sleepy/Titanic-Dataset

 
 

Titanic-Dataset: How to score 0.80861 on the public leaderboard (top 10%)

Score in public leaderboard

One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper class.

Thus, the goal of this competition is to predict whether a passenger survived the sinking of the Titanic (a binary classification problem), based on a set of features describing the passenger, such as age, sex, or passenger class on the ship.

Metric

Your score is the percentage of passengers you correctly predict. This is known simply as "accuracy".
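
As a minimal sketch of how this metric is computed, the snippet below counts matching predictions and divides by the number of passengers. The `accuracy` helper and the label lists are hypothetical and exist only for illustration; they are not taken from the notebook.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels for illustration: 1 = survived, 0 = did not survive
y_true = [1, 0, 0, 1, 1, 0, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]

score = accuracy(y_true, y_pred)
print(f"accuracy = {score:.3f}")  # 6 of 8 predictions match -> 0.750
```

A leaderboard score of 0.80861 therefore means that roughly 81% of the scored passengers were predicted correctly.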

Solution

In the form of a Jupyter notebook, my solution goes through the basic steps of a data science pipeline:

  • Exploratory data analysis with visualizations
  • Data cleaning
  • Feature engineering
  • Modeling
  • Model fine-tuning
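
The steps above can be sketched roughly as follows, leaving out the visual exploration. This is an illustrative outline only, not the notebook's actual code: the toy rows are made up, the `prepare` helper and the `IsFemale` and `FamilySize` features are assumptions, and the random forest with a small grid search is just one possible model choice. The column names follow Kaggle's `train.csv`.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Toy rows standing in for Kaggle's train.csv (values are made up).
train = pd.DataFrame({
    "Pclass":   [3, 1, 3, 1, 3, 2, 3, 1, 2, 3, 1, 3],
    "Sex":      ["male", "female", "female", "female", "male", "male",
                 "male", "male", "female", "female", "female", "male"],
    "Age":      [22, 38, 26, 35, None, 54, 2, 40, 14, 4, 58, 20],
    "SibSp":    [1, 1, 0, 1, 0, 0, 3, 0, 1, 1, 0, 0],
    "Parch":    [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0],
    "Survived": [0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0],
})


def prepare(df):
    """Hypothetical helper: clean the data and derive a few features."""
    out = df.copy()
    # Data cleaning: fill missing ages with the median age.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    # Feature engineering: encode sex and derive the family size.
    out["IsFemale"] = (out["Sex"] == "female").astype(int)
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    return out[["Pclass", "IsFemale", "Age", "FamilySize"]]


X = prepare(train)
y = train["Survived"]

# Modeling and fine-tuning: cross-validated search over a small grid.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 50], "max_depth": [2, 4]},
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

On the real data, the median age would normally be computed on the training set and reused for the test set, so that no information leaks from the rows being predicted.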

Note that I have also included a script that uses stacking, for information only, as it achieves a lower score.
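
For readers unfamiliar with stacking, here is a minimal sketch of the general technique using scikit-learn's `StackingClassifier`. It is not the script from this repository: the synthetic data from `make_classification`, the choice of base models, and the logistic regression meta-model are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data standing in for the Titanic features.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Base models produce out-of-fold predictions (cv=5); the final estimator
# is then trained on those predictions to combine them.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)

# Evaluate the stacked model with the competition metric.
scores = cross_val_score(stack, X, y, cv=3, scoring="accuracy")
print(f"mean accuracy = {scores.mean():.3f}")
```

Stacking adds a second layer of fitting, so on a dataset as small as Titanic it can overfit and end up scoring below a single well-tuned model.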

Competition Website: https://www.kaggle.com/c/titanic

