One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper-class.
Thus, the goal of this compaetition is to predict if a passenger survived the sinking of the Titanic or not. (Binary classification problem) based on a set of features describing him such as his age, his sex, or his passenger class on the boat.
Your score is the percentage of passengers you correctly predict. This is known simply as "accuracy”.
In a form of a jupyter notebook, my solution goes through the basic steps of a data science pipeline:
- Exploratory data analysis with visualizations
- Data cleaning
- Feature engineering
- Modeling
- Modelfine-tuning
Note that I have included a script with stacking for information only as it achive lower score.
Competition Website: https://www.kaggle.com/c/titanic