Exploratory data analysis popularly known as EDA is a process of performing some initial investigations on the dataset to discover the structure and the content of the given dataset. It is an unavoidable step in the entire journey of data analysis right from the business understanding part to the deployment of the models created.
EDA is where we get the basic understanding of the data in hand which then helps us in the further process of Data Cleaning & Data Preparation.
In this repository I performed EDA on Research Problems: Model to predict the Behavioral Challenges in ASD_Kids(1-18 Years) dataset. This dataset contains the factors involving in developing ASD for children. It consists of:
- A10_Autism_Spectrum_Quotient
- Social_Responsiveness_Scale
- Age_Years
- Qchat_10_Score
- Speech Delay/Language Disorder
- Learning disorder
- Genetic_Disorders
- Depression
- Global developmental delay/intellectual disability
- Social/Behavioural Issues
- Childhood Autism Rating Scale
- Anxiety_disorder
- Sex
- Ethnicity
- Jaundice
- Family_mem_with_ASD
- Who_completed_the_test
After performing EDA I have explored different kinds of classification algorithms and compared the results obtained by them. The algorithms which I have used and their score after 10Fold stratified split are:
- Logistic Regression (96.7%)
- Decision trees(97.34%)
- Support Vector Machines(99.57%)
- Adaboost Classifier(95.6%)
- Random Forest classifier(100%)
- Gradient Boosting(98.54%)
- XGboost classifier(98.3%)
We can see that Random Forest Classifier performed the best among all other classification algorithms getting an accuracy of over 100 percentage followed by Kernel Support Vector Machine with accuracy of 99.5%.
Adaboost Classifier and Logistic regression performed the worst among all other algorithms with accuracies of 95.6% and 97.3% respectively.
Other algorithms performed decently with score between(97.3 -98.3)percentage
As Random Forest Classifier starts by taking feature importance and splits into branches, I have observed that through criterion:'entropy' for splitting Qchat_10_Score feature has got the highest importance and contributes more in deciding the child develops Autism spectrum disorder in future. Children with more Qchat_10_Score got more probability in developing ASD_traits.