HeartDieaseClassification

1 DATA.

Link: https://archive.ics.uci.edu/ml/datasets/Heart+Disease

This is a database of heart disease patients. The data was collected at four institutions: the Cleveland Clinic Foundation; the Hungarian Institute of Cardiology, Budapest; the V.A. Medical Center, Long Beach; and university hospitals in Switzerland (Zurich and Basel). Each contributes a different number of samples: Cleveland 303, Hungarian 294, Switzerland 123, and Long Beach VA 200. All attributes are numeric, and every database uses the same instance format. The databases have 76 features, but all published experiments use a subset of 14 of them (age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpeak, slope, ca, thal, diagnosis).

The output is the presence of heart disease in the patient, an integer from 0 to 4 (5 outputs):

- 4: coronary artery (atherosclerotic) heart disease, which affects the arteries to the heart
- 3: valvular heart disease, which affects how the valves function to regulate blood flow in and out of the heart
- 2: cardiomyopathy, which affects how the heart muscle squeezes
- 1: heart rhythm disturbances (arrhythmias), which affect the electrical conduction
- 0: absence of heart disease
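
As a rough illustration of the 14-attribute format described above, the sketch below parses a few records into dictionaries keyed by attribute name and counts the diagnosis values. It assumes a comma-separated layout with `?` marking missing values; the sample rows are made up for illustration and are not real patient records, and `load_rows` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# The 14 attribute names listed above; the last one is the target.
COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "diagnosis"]

# Made-up rows for illustration only (assumed comma-separated, "?" = missing).
SAMPLE = """\
60,1,4,130,250,0,2,140,1,1.5,2,1,7,2
45,0,2,120,200,0,0,170,0,0.0,1,0,3,0
52,1,3,?,230,0,0,160,0,0.5,1,?,3,1
"""


def load_rows(text):
    """Parse comma-separated records into dicts keyed by attribute name."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(COLUMNS, row)) for row in reader if row]


rows = load_rows(SAMPLE)
# Count how many records fall into each diagnosis value (0 to 4).
label_counts = Counter(int(r["diagnosis"]) for r in rows)
print(len(rows), dict(label_counts))
```

For a real file, the same function could be fed the file's text instead of `SAMPLE`.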

2 DESCRIBING FEATURE


3 PREPROCESSING
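
The three preprocessing steps in this section (replace `?` with NaN, remove NaN values, normalize) can be sketched in plain Python as below. This is only a minimal sketch: the README does not say which normalization was used, so min-max scaling to [0, 1] is an assumption here, as is dropping whole rows that contain a NaN. The function name `preprocess` and the sample values are hypothetical.

```python
import math


def preprocess(rows):
    """Take rows of strings and return complete rows scaled to [0, 1]."""
    # Step 1: replace "?" with NaN while converting to float.
    parsed = [[math.nan if v.strip() == "?" else float(v) for v in row]
              for row in rows]
    # Step 2: remove every row that contains a NaN (assumed row-wise removal).
    complete = [row for row in parsed
                if not any(math.isnan(v) for v in row)]
    if not complete:
        return []
    # Step 3: min-max normalization per column (assumed normalization type).
    columns = list(zip(*complete))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in complete]


# Made-up feature rows; the second one has a missing value and is dropped.
raw = [["60", "130", "250"], ["45", "?", "200"], ["50", "120", "225"]]
scaled = preprocess(raw)
print(scaled)
```

In practice only the feature columns would be normalized, with the diagnosis column kept as the label.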

- Replace `?` with NaN
- Remove NaN values
- Normalization

RESULT

A model whose ROC curve lies closer to the perfect ROC curve performs better than the others.

- Random forest: the ROC curve is close to the perfect ROC curve.
- Random forest with a limited feature set (feature selection): the ROC curve is close to the perfect ROC curve.
- K-nearest neighbors with a limited feature set (feature extraction): the ROC curve is close to the perfect ROC curve.
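
To make "close to the perfect ROC curve" concrete, the sketch below computes ROC points and the area under the curve (AUC) from classifier scores; a perfect curve has an AUC of 1.0. It assumes binary labels (1 = disease present, 0 = absent), which is a simplification of the 0 to 4 output, and the labels and scores are made up rather than taken from this project's models.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists, sweeping the threshold from high to low."""
    pos = sum(y_true)
    neg = len(y_true) - pos  # both classes must be present
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    prev = None
    for i in order:
        # Emit a point each time the score changes, so ties share one point.
        if prev is not None and scores[i] != prev:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        prev = scores[i]
    fpr.append(fp / neg)
    tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))


# Made-up labels and scores for illustration.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
fpr, tpr = roc_points(y_true, scores)
area = auc(fpr, tpr)
print(round(area, 4))
```

Comparing this area across models gives a single number for how close each curve is to the perfect one.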

CONCLUSION

Random forests produced the best performance, both in test error rate and in the number of true positives and true negatives.
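
The two criteria named here, test error rate and true positive/true negative counts, can be computed as in the sketch below. It assumes binary labels (1 = disease present, 0 = absent); the labels and predictions are made up and do not reproduce this project's results.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def error_rate(y_true, y_pred):
    """Fraction of test samples that were misclassified."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


# Made-up test labels and predictions for illustration.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
counts = confusion_counts(y_true, y_pred)
err = error_rate(y_true, y_pred)
print(counts, round(err, 3))
```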
