According to the World Health Organization (WHO), cardiovascular diseases (CVDs) are the leading cause of death worldwide. Every year, an estimated 17.9 million people die from CVDs, which is about 32% of all deaths. In Germany, CVDs account for approximately 40% of all deaths.
Research question: How well can heart disease be predicted from the given clinical data?
As the number of heart disease patients grows every year, a large amount of medical data is becoming available. By applying data mining techniques to this data, we can detect or predict heart disease at an early stage and support doctors in making informed clinical decisions. This kind of approach is becoming increasingly important as machine learning plays a growing role in healthcare.
Dataset available at Kaggle (https://www.kaggle.com/ronitf/heart-disease-uci)
- 303 patients
- 14 attributes
1. Exploring the dataset
- Overview of the information contained in the dataset
- Correlation matrix
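The exploration step could look roughly like the sketch below. It is not the project's actual notebook: the DataFrame is a small synthetic stand-in so that the snippet runs without the Kaggle file, the column names (`age`, `chol`, `thalach`, `target`) are assumed to match the dataset, and the path `heart.csv` in the comment is hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the Kaggle data (assumption: the real file has
# numeric columns such as "age", "chol", "thalach" and a binary "target").
rng = np.random.default_rng(0)
n = 303
df = pd.DataFrame({
    "age": rng.integers(29, 78, size=n),
    "chol": rng.normal(246, 52, size=n).round(),
    "thalach": rng.normal(150, 23, size=n).round(),
    "target": rng.integers(0, 2, size=n),
})
# With the real data one would load it instead, for example:
# df = pd.read_csv("heart.csv")  # hypothetical local path

# Overview: size, summary statistics, and class balance
print(df.shape)
print(df.describe().T[["mean", "std", "min", "max"]])
print(df["target"].value_counts())

# Pearson correlation matrix; the "target" column shows how strongly
# each attribute is linearly related to the outcome
corr = df.corr()
print(corr["target"].sort_values(ascending=False))
```

On the real data, the correlation matrix is usually easier to read as a heatmap, but the numeric table above carries the same information.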
2. Data preprocessing
- Subset selection methods:
  - Best Subset, Forward Stepwise, Validation Model
- Pre-process:
  - Scaling, Encoding, Splitting
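A minimal sketch of the preprocessing steps, assuming scikit-learn is used. The data is synthetic, the column names and the choice of `cp` as a categorical attribute are assumptions, and the forward selection shown here is scikit-learn's cross-validated `SequentialFeatureSelector`, which may differ from the forward stepwise procedure actually used in the project.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (assumed column names, not the real records)
rng = np.random.default_rng(0)
n = 303
df = pd.DataFrame({
    "age": rng.integers(29, 78, size=n),
    "chol": rng.normal(246, 52, size=n),
    "thalach": rng.normal(150, 23, size=n),
    "cp": rng.integers(0, 4, size=n),  # treated as a categorical code
    "target": rng.integers(0, 2, size=n),
})
X, y = df.drop(columns="target"), df["target"]

# Splitting: done first so scaler and encoder only see training data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scaling for numeric columns, one-hot encoding for categorical ones
preprocess = ColumnTransformer(
    [
        ("scale", StandardScaler(), ["age", "chol", "thalach"]),
        ("encode", OneHotEncoder(handle_unknown="ignore"), ["cp"]),
    ],
    sparse_threshold=0,  # always return a dense array
)
X_train_p = preprocess.fit_transform(X_train)
X_test_p = preprocess.transform(X_test)

# Forward selection: greedily add the feature that most improves the
# cross-validated score until three features are chosen
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=3,
    direction="forward",
    cv=5,
)
selector.fit(X_train_p, y_train)
print(selector.get_support())  # boolean mask over preprocessed columns
```

Fitting the scaler and encoder on the training split only avoids leaking information from the test set into the model.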
3. (Un-)Supervised learning
- k-means clustering, logistic regression, LDA, QDA, k-nearest neighbors, support vector machine, decision trees, random forest
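The listed methods can be compared in a single loop, sketched below with scikit-learn. The data comes from `make_classification` as a synthetic placeholder with the same number of samples and 13 features, and all hyperparameters are illustrative assumptions rather than the project's tuned settings.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder data: 303 samples, 13 features, binary label
X, y = make_classification(
    n_samples=303, n_features=13, n_informative=6, n_redundant=0,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Supervised models (hyperparameters are illustrative only)
models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(reg_param=0.1),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "Support vector machine": SVC(kernel="rbf"),
    "Decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

scores = {}
for name, model in models.items():
    # Scaling inside the pipeline keeps the test data out of the fit
    pipe = make_pipeline(StandardScaler(), model)
    pipe.fit(X_train, y_train)
    scores[name] = pipe.score(X_test, y_test)  # test-set accuracy

for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:24s} {acc:.3f}")

# Unsupervised: k-means with two clusters, ignoring the labels
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
clusters = kmeans.fit_predict(StandardScaler().fit_transform(X))
```

Because k-means does not use the labels, its cluster indices are arbitrary and have to be matched to the two classes before any comparison with the supervised models.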
4. Results/Validation
- Accuracies of all algorithms, confusion matrix, ROC curve, k-fold cross-validation
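The validation measures above could be computed as in the following sketch, shown for a single classifier. It assumes scikit-learn, uses synthetic data from `make_classification`, and picks logistic regression and five folds purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import (
    StratifiedKFold,
    cross_val_score,
    train_test_split,
)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder data, not the real patient records
X, y = make_classification(
    n_samples=303, n_features=13, n_informative=6, n_redundant=0,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_score = clf.predict_proba(X_test)[:, 1]  # probability of class 1

# Confusion matrix: rows are true classes, columns are predictions
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
print(cm)
print(f"accuracy = {acc:.3f}")

# ROC curve points and the area under the curve
fpr, tpr, thresholds = roc_curve(y_test, y_score)
auc = roc_auc_score(y_test, y_score)
print(f"AUC = {auc:.3f}")

# k-fold cross-validation (k = 5), stratified to keep class balance
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
print(f"CV accuracy = {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

With only 303 patients, a single train/test split gives a noisy accuracy estimate, so the cross-validated mean and spread are the more reliable numbers to report.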