Project developed for the Mathematics in Machine Learning course at PoliTO.
The aim of this project is to develop a machine learning pipeline that detects the presence or absence of heart disease from patients' medical data. The pipeline is composed of the following steps:
• data cleaning
• categorical feature handling
• feature scaling
• outlier detection and removal
• dimensionality reduction
• resampling techniques for imbalanced data
• K-fold cross-validation for hyperparameter tuning
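The preprocessing steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data is synthetic, scikit-learn is assumed as the library, and the specific techniques (one-hot encoding, a 3-standard-deviation outlier rule, PCA, and random oversampling of the minority class) are stand-ins chosen for the example, not necessarily the ones used in the report.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.utils import resample

rng = np.random.default_rng(0)

# Synthetic stand-in for patient data (assumption): two numeric columns
# and one categorical code, with imbalanced binary labels.
n = 200
numeric = rng.normal(size=(n, 2))
numeric[0] = [15.0, -15.0]  # inject one obvious outlier
category = rng.integers(0, 3, size=(n, 1))
X = np.hstack([numeric, category])
y = (rng.random(n) < 0.25).astype(int)  # roughly 25% positives

# Categorical handling and feature scaling in one step.
# sparse_threshold=0 forces a dense output array.
encode_scale = ColumnTransformer(
    [("num", StandardScaler(), [0, 1]), ("cat", OneHotEncoder(), [2])],
    sparse_threshold=0,
)
X_enc = encode_scale.fit_transform(X)

# Outlier removal: after standardization the numeric columns are z-scores,
# so drop any row with a numeric value 3 or more standard deviations away.
keep = (np.abs(X_enc[:, :2]) < 3).all(axis=1)
X_clean, y_clean = X_enc[keep], y[keep]

# Dimensionality reduction: project onto the first two principal components.
X_red = PCA(n_components=2).fit_transform(X_clean)

# Resampling: randomly oversample the minority class up to the majority size.
minority = int(np.argmin(np.bincount(y_clean)))
is_min = y_clean == minority
X_up, y_up = resample(
    X_red[is_min],
    y_clean[is_min],
    replace=True,
    n_samples=int((~is_min).sum()),
    random_state=0,
)
X_bal = np.vstack([X_red[~is_min], X_up])
y_bal = np.concatenate([y_clean[~is_min], y_up])

print("rows before/after outlier removal:", n, len(y_clean))
print("class counts after resampling:", np.bincount(y_bal))
```

In a real evaluation, the scaler, PCA, and resampling would normally be fitted on the training folds only, so that no information from the validation data leaks into the preprocessing.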
The models tested for the classification task were KNN, decision tree, random forest, and support vector machine. The report also offers mathematical insights into the algorithms used. This project was graded with full marks.
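A minimal sketch of how the four model families could be tuned and compared with K-fold cross-validation is shown here. It assumes scikit-learn and a synthetic dataset from `make_classification`. The hyperparameter grids, the F1 scoring metric, and the choice of 5 stratified folds are hypothetical values picked for illustration, not the settings used in the report.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, mildly imbalanced binary dataset (assumption, not the real data).
X, y = make_classification(
    n_samples=300, n_features=10, weights=[0.7, 0.3], random_state=0
)

# One estimator and one small illustrative grid per model family.
candidates = {
    "knn": (KNeighborsClassifier(), {"clf__n_neighbors": [3, 5, 7]}),
    "decision_tree": (
        DecisionTreeClassifier(random_state=0),
        {"clf__max_depth": [3, 5, None]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"clf__n_estimators": [50, 100]},
    ),
    "svm": (SVC(), {"clf__C": [0.1, 1, 10]}),
}

# Stratified K-fold keeps the class ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, (model, grid) in candidates.items():
    # Scaling lives inside the pipeline so it is refit on each training fold.
    pipe = Pipeline([("scale", StandardScaler()), ("clf", model)])
    search = GridSearchCV(pipe, grid, cv=cv, scoring="f1")
    search.fit(X, y)
    results[name] = (search.best_params_, search.best_score_)

for name, (params, score) in results.items():
    print(f"{name}: best params {params}, mean CV F1 = {score:.3f}")
```

Wrapping the scaler and classifier in a single `Pipeline` means the grid search refits the preprocessing on each training fold, which avoids leaking validation data into the scaling statistics.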