Introduction:

Exploratory data analysis, popularly known as EDA, is the process of performing initial investigations on a dataset to discover its structure and content. It is an essential step in the data analysis journey, which runs from business understanding to the deployment of the models created.

EDA is where we gain a basic understanding of the data at hand, which then guides the subsequent data cleaning and data preparation.

Overview:

In this repository I performed EDA on the dataset "Research Problems: Model to predict the Behavioral Challenges in ASD_Kids (1-18 Years)". This dataset contains factors involved in the development of ASD in children. It consists of:

A10_Autism_Spectrum_Quotient
Social_Responsiveness_Scale
Age_Years
Qchat_10_Score
Speech Delay/Language Disorder
Learning disorder
Genetic_Disorders
Depression
Global developmental delay/intellectual disability
Social/Behavioural Issues
Childhood Autism Rating Scale
Anxiety_disorder
Sex
Ethnicity
Jaundice
Family_mem_with_ASD
Who_completed_the_test

Results and observations:

After performing EDA, I explored several classification algorithms and compared their results. The algorithms I used, with their scores after a 10-fold stratified split, are:

Logistic Regression (96.7%)
Decision Trees (97.34%)
Support Vector Machines (99.57%)
AdaBoost Classifier (95.6%)
Random Forest Classifier (100%)
Gradient Boosting (98.54%)
XGBoost Classifier (98.3%)

The Random Forest Classifier performed best of all the classification algorithms, with an accuracy of 100%, followed by the kernel Support Vector Machine with an accuracy of 99.57%.

The AdaBoost Classifier and Logistic Regression performed worst, with accuracies of 95.6% and 96.7% respectively.

The remaining algorithms performed decently, with scores between 97.34% and 98.54%.
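To illustrate what a 10-fold stratified split does, here is a minimal sketch using only the Python standard library. It is not the code from the notebook: the helper `stratified_folds` and the label counts are hypothetical, chosen only to show that every fold keeps the same class proportions as the full dataset. In practice a library routine such as scikit-learn's `StratifiedKFold` would typically be used instead.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, n_splits=10, seed=0):
    """Assign each sample index to one of n_splits folds, class by class.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across folds,
        # so each fold receives a near-equal share of every class.
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds


# Made-up labels (NOT from data_csv.csv): 70 positive and 30 negative samples.
labels = [1] * 70 + [0] * 30
folds = stratified_folds(labels, n_splits=10)

for fold_number, test_indices in enumerate(folds, start=1):
    counts = Counter(labels[i] for i in test_indices)
    print(f"fold {fold_number}: {dict(counts)}")
```

Each fold serves once as the test set while the other nine are used for training, and the reported score is the average over the ten runs.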

A Random Forest ranks features by how much they reduce impurity at the splits where they are used. With criterion='entropy' for splitting, I observed that the Qchat_10_Score feature has the highest importance and contributes the most to predicting whether a child will develop Autism Spectrum Disorder in the future. Children with a higher Qchat_10_Score had a higher probability of developing ASD traits.
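The entropy criterion mentioned above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the notebook's code: the functions `entropy` and `information_gain`, the threshold, and the toy scores and labels are all assumptions made up for the example. It shows how a split on a single feature is scored by the reduction in entropy it achieves.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting samples on values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # the split puts everything on one side, so nothing is gained
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


# Made-up toy data (NOT from data_csv.csv): a score and a binary ASD-traits label.
qchat_scores = [1, 2, 3, 4, 6, 7, 8, 9]
asd_traits = [0, 0, 0, 0, 1, 1, 1, 1]

gain = information_gain(qchat_scores, asd_traits, threshold=5)
print(f"information gain: {gain:.2f} bits")
```

In this toy case the threshold separates the two classes perfectly, so the gain equals the full parent entropy of 1 bit. Impurity-based feature importances accumulate such reductions over all the splits that use a feature, which is why a feature that separates the classes well ends up with a high importance.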

y656/Data-Analytics-model-on-Behavioural-Challenges-of-ASD-kids

Folders and files

Latest commit

History

Repository files navigation

Introduction:

Overview:

Results and observations:

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages