- Data Source
- Data Attributes
- Objective
- Task Name
- Steps Followed
child_mort: Deaths of children under 5 years of age per 1,000 live births.
exports: Exports of goods and services per capita. Given as a percentage of the GDP per capita.
health: Total health spending per capita. Given as a percentage of GDP per capita.
imports: Imports of goods and services per capita. Given as a percentage of the GDP per capita.
income: Net income per person.
inflation: The measurement of the annual growth rate of the Total GDP.
life_expec: The average number of years a newborn child would live if the current mortality patterns remain the same.
total_fer: The number of children that would be born to each woman if the current age-fertility rates remain the same.
gdpp: The GDP per capita. Calculated as the Total GDP divided by the total population.
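
Because exports, health, and imports are given as percentages of GDP per capita, they can be turned into absolute per-capita amounts by multiplying by gdpp. The sketch below illustrates that arithmetic; the helper name `pct_to_per_capita` is hypothetical and the row holds made-up numbers, not values from the dataset.

```python
def pct_to_per_capita(pct: float, gdpp: float) -> float:
    """Convert a percentage of GDP per capita into an absolute per-capita amount."""
    return pct * gdpp / 100.0


# Made-up illustrative row (assumption): percentages plus a GDP per capita value.
row = {"exports": 10.0, "health": 5.0, "imports": 40.0, "gdpp": 2000.0}

absolute = {
    name: pct_to_per_capita(row[name], row["gdpp"])
    for name in ("exports", "health", "imports")
}
print(absolute)  # {'exports': 200.0, 'health': 100.0, 'imports': 800.0}
```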
To group countries by socio-economic and health factors in order to determine each country's development status. The following steps were followed:
- Loading the Country-data.csv dataset.
- Dropping any non-numeric columns from the dataset.
- Plotting nine scatter plots of different variables against gdpp and child_mort, for example gdpp vs health.
- Noting which of these plots looks the most promising for separating the countries into clusters.
- Normalising the dataset using MinMaxScaler from sklearn.
- Finding the optimal number of clusters using the elbow method and the silhouette score.
- Fitting a model with the optimal number of clusters to the scaled dataset and reporting its silhouette score.
- Visualising the clusters for the following two pairs of variables:
  - Child mortality vs GDPP
  - Inflation vs GDPP
- Labelling the groups of countries in these plots, based on child mortality, GDPP, and inflation, using terms such as least developed, developing, and developed.
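
The non-plotting steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code. It assumes KMeans as the clustering model (the steps name the elbow and silhouette methods but not the algorithm). It assumes exactly three clusters when labelling. It ranks clusters by mean child_mort only, as one possible labelling rule. The function names are hypothetical, and the demo runs on synthetic data so that it does not depend on Country-data.csv.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler

LABELS = ["least developed", "developing", "developed"]


def prepare(df):
    """Drop non-numeric columns and scale the rest to [0, 1]."""
    numeric = df.select_dtypes(include="number")
    scaled = MinMaxScaler().fit_transform(numeric)
    return numeric, scaled


def scan_k(scaled, k_values, seed=42):
    """Fit KMeans for each k; collect inertia (elbow) and silhouette score."""
    inertias, silhouettes = {}, {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(scaled)
        inertias[k] = model.inertia_
        silhouettes[k] = silhouette_score(scaled, model.labels_)
    return inertias, silhouettes


def label_clusters(numeric, cluster_ids):
    """Name clusters by mean child_mort, highest first (assumed rule)."""
    means = numeric["child_mort"].groupby(cluster_ids).mean()
    if len(means) != len(LABELS):
        raise ValueError("this labelling sketch assumes exactly three clusters")
    order = means.sort_values(ascending=False).index
    mapping = dict(zip(order, LABELS))
    return pd.Series(cluster_ids, index=numeric.index).map(mapping)


# Synthetic stand-in data (assumption): three well-separated groups of 20 rows.
rng = np.random.default_rng(0)
centers = {
    "child_mort": (90.0, 30.0, 5.0),
    "gdpp": (1_000.0, 8_000.0, 40_000.0),
    "inflation": (12.0, 7.0, 2.0),
}
demo = pd.DataFrame({
    name: np.concatenate([rng.normal(c, 0.1 * c, 20) for c in cs])
    for name, cs in centers.items()
})
demo.insert(0, "country", [f"country_{i}" for i in range(len(demo))])

numeric, scaled = prepare(demo)
inertias, silhouettes = scan_k(scaled, range(2, 7))
best_k = max(silhouettes, key=silhouettes.get)

final = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit(scaled)
demo_labels = label_clusters(numeric, final.labels_)
print(best_k, round(silhouette_score(scaled, final.labels_), 3))
print(demo_labels.value_counts())
```

For the real dataset, `pd.read_csv("Country-data.csv")` would be passed to `prepare` in place of `demo`. Plotting `inertias` against k shows the elbow, and a scatter plot of child_mort vs gdpp coloured by `demo_labels` gives the cluster visualisation.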