This project was created in order to obtain more knowledge in the field for a job interview. It aims to analyze and visualize diverse high school student data using Power BI. The dataset includes various demographic factors, academic metrics, and lifestyle variables, providing valuable insights into the factors influencing student performance.
The data set was downloaded from the following website: https://www.kaggle.com/datasets/dillonmyrick/high-school-student-performance-and-demographics?resource=download . The file will be added as "space_missions.csv".
-
Data Exploration: Dive deep into the dataset to understand the demographics, academic metrics, and lifestyle factors of high school students.
-
Interactive Dashboards: Create interactive dashboards and reports to visualize key insights and trends in student performance.
-
Statistical Analysis: Conduct statistical analysis to identify correlations and relationships between various factors and academic outcomes.
We'll create a variety of charts and visualizations to present the data effectively:
- Area Chart of Average Grades by Study Time: Visualizes the average grades achieved by students based on their weekly study time.
- Stacked Column Chart of Count of Family Relationship Level: Displays the distribution of family relationship levels among students using a stacked column chart.
- Scatter Plot of Grades by Family Support and Absences: Shows the relationship between student grades and absences, with their level of family support as an additional factor.
- Pie Chart with Student Health Percentages: Illustrates the distribution of student health statuses using a pie chart.
- Bar Charts of Average Grades by Family Size, Father's Job, and Parents' Education: Provides separate bar charts to compare average grades based on family size, father's job, and parents' education levels.
- Pie Chart of Different Guardian Percentages: Represents the percentage distribution of different guardians among students using a pie chart.
- Bar Chart of Count of Each Address Type (Rural or Urban): Displays the count of students from rural and urban areas using a bar chart.
- Column Chart of Average Grades by Internet Access: Shows the average grades achieved by students based on their access to the internet at home.
- Area Chart of Average Grades by Travel Time: Visualizes the relationship between student grades and travel time to school using an area chart.
- Pie Chart of Free Time: Illustrates the distribution of free time after school among students using a pie chart.
- 4 Tables representative of Student grades categorized by the respective metrics of each page.
- Microsoft SQL Server Management Studio for data manipulation and analysis.
- PowerBI for data visualization and statistical analysis.
- Obsidian for documentation purposes.
-
school - student's school (binary: "GP" - Gabriel Pereira or "MS" - Mousinho da Silveira)
-
sex - student's sex (binary: "F" - female or "M" - male)
-
age - student's age (numeric: from 15 to 22)
-
address_type - student's home address type (binary: "Urban" or "Rural")
-
family_size - family size (binary: "Less or equal to 3" or "Greater than 3")
-
parent_status - parent's cohabitation status (binary: "Living together" or "Apart")
-
mother_education - mother's education (ordinal: "none", "primary education (4th grade)", "5th to 9th grade", "secondary education" or "higher education")
-
father_education - father's education (ordinal: "none", "primary education (4th grade)", "5th to 9th grade", "secondary education" or "higher education")
-
mother_job - mother's job (nominal: "teacher", "health" care related, civil "services" (e.g. administrative or police), "at_home" or "other")
-
father_job - father's job (nominal: "teacher", "health" care related, civil "services" (e.g. administrative or police), "at_home" or "other")
-
reason - reason to choose this school (nominal: close to "home", school "reputation", "course" preference or "other")
-
guardian - student's guardian (nominal: "mother", "father" or "other")
-
travel_time - home to school travel time (ordinal: "<15 min.", "15 to 30 min.", "30 min. to 1 hour", or 4 - ">1 hour")
-
study_time - weekly study time (ordinal: 1 - "<2 hours", "2 to 5 hours", "5 to 10 hours", or ">10 hours")
-
class_failures - number of past class failures (numeric: n if 1<=n<3, else 4)
-
school_support - extra educational support (binary: yes or no)
-
family_support - family educational support (binary: yes or no)
-
extra_paid_classes - extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)
-
activities - extra-curricular activities (binary: yes or no)
-
nursery - attended nursery school (binary: yes or no)
-
higher_ed - wants to take higher education (binary: yes or no)
-
internet - Internet access at home (binary: yes or no)
-
romantic_relationship - with a romantic relationship (binary: yes or no)
-
family_relationship - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
-
free_time - free time after school (numeric: from 1 - very low to 5 - very high)
-
social - going out with friends (numeric: from 1 - very low to 5 - very high)
-
weekday_alcohol - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
-
weekend_alcohol - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
-
health - current health status (numeric: from 1 - very bad to 5 - very good)
-
absences - number of school absences (numeric: from 0 to 93)
Select parent_status, ROUND(AVG(CAST(final_grade as float)),3) as Average_grade
From Math
Group by parent_status
WITH parent_status_counts AS (
SELECT parent_status, COUNT(*) AS Total_count,(COUNT(*) / CAST((SELECT COUNT(*) FROM Math) AS float)) * 100 AS Percentage
FROM
Math
GROUP BY
parent_status
)
SELECT parent_status, Total_count,ROUND(Percentage, 2) AS Percentage
FROM
parent_status_counts
ORDER BY
parent_status
Select address_type, ROUND(AVG(CAST(final_grade as float)),3) as Average_grade
From Math
Group by address_type
SELECT address_type, COUNT(*) AS total_amount, SUM(CASE WHEN internet_access = 1 THEN 1 ELSE 0 END) AS with_internet_access
FROM
Math
GROUP BY
address_type
Select guardian, ROUND(AVG(CAST(final_grade as float)),3) as Average_grade
From Math
Group by guardian
Travel time based on the Address type (since the data isn't clear I created a middle ground for each entry at the cost of result accuracy)
Select address_type,
ROUND(SUM(CASE
WHEN travel_time = '<15 min.' THEN 7
WHEN travel_time = '15 to 30 min.' THEN 22.5
WHEN travel_time = '30 min. to 1 hour' THEN 45
WHEN travel_time = '>1 hour' THEN 75
ELSE 0
END)/ COUNT(*), 2) AS average_travel_time
From Math
Group by address_type
Another method of comparison:
select address_type, travel_time, count(travel_time)
from Math
Group by address_type, travel_time
Order by address_type
Select family_size, ROUND(AVG(CAST(final_grade as float)),3) as Average_grade
From Math
Group by family_size
WITH subquery AS (
SELECT mother_job AS job_type, 'mother' AS parent_type FROM Math
UNION ALL
SELECT father_job AS job_type, 'father' AS parent_type FROM Math
)
SELECT
job_type,
COUNT(CASE WHEN parent_type = 'mother' THEN 1 END) AS mother_job_count,
COUNT(CASE WHEN parent_type = 'father' THEN 1 END) AS father_job_count,
ROUND((COUNT(CASE WHEN parent_type = 'mother' THEN 1 END) / CAST(COUNT(*) AS float) * 100), 2) AS mother_job_percentage,
ROUND((COUNT(CASE WHEN parent_type = 'father' THEN 1 END) / CAST(COUNT(*) AS float) * 100), 2) AS father_job_percentage
FROM subquery
GROUP BY job_type;
Select mother_education, ROUND(AVG(CAST(final_grade as float)),3) as Average_grade
From Math
Group by mother_education
Having mother_education != 'none'
Select father_education, ROUND(AVG(CAST(final_grade as float)),3) as Average_grade
From Math
Group by father_education
Having father_education != 'none'
Select mother_education, ROUND(CAST(SUM(CASE WHEN higher_ed = 1 THEN 1 ELSE 0 END) as FLOAT)/ count(*),2) as higher_ed_percentage
From Math
Group by mother_education
Having mother_education != 'none'
Select father_education, ROUND(CAST(SUM(CASE WHEN higher_ed = 1 THEN 1 ELSE 0 END) as FLOAT)/ count(*),2) as higher_ed_percentage
From Math
Group by father_education
Having father_education != 'none'
Select guardian, ROUND(AVG(CAST(final_grade as float)),3) as Average_grade
From Math
Group by guardian
Select CASE WHEN family_support = 0 THEN 'Has support' ELSE 'No support' END AS Family_Support, ROUND(AVG(CAST(final_grade as float)),3) as Average_grade
From Math
Group by Family_support
SELECT mother_education,
SUM(CASE WHEN school_choice_reason = 'home' THEN 1 ELSE 0 END) AS close_to_home,
SUM(CASE WHEN school_choice_reason = 'reputation' THEN 1 ELSE 0 END) AS school_reputation,
SUM(CASE WHEN school_choice_reason = 'course' THEN 1 ELSE 0 END) AS course_preference,
SUM(CASE WHEN school_choice_reason = 'other' THEN 1 ELSE 0 END) AS other
FROM Math
GROUP BY mother_education
HAVING mother_education != 'none'
ORDER BY mother_education
Select study_time, ROUND(AVG(CAST(final_grade as float)),2) as Average_grade
From Math
Group by study_time
Select CASE
WHEN weekday_alcohol = 1 THEN 'very low'
WHEN weekday_alcohol = 2 THEN 'low'
WHEN weekday_alcohol = 3 THEN 'medium'
WHEN weekday_alcohol = 4 THEN 'high'
WHEN weekday_alcohol = 5 THEN 'very high'
ELSE 'unknown' END AS Alcohol_consumption
,ROUND(AVG(CAST(final_grade as float)),2)
From Math
Group by weekday_alcohol
Select study_time, ROUND(CAST(SUM(weekday_alcohol) as float)/count(*),2) as Alcohol_frequency
From Math
Group by study_time
Select CASE WHEN activities = 0 THEN 'yes' ELSE 'no' END AS Activities, ROUND(CAST(SUM(health) as float)/count(*),2) as Average_health
From Math
Group by activities
Select study_time, ROUND(CAST(SUM(free_time) as float)/count(*),2) as Free_time_average
From Math
Group by study_time
Select CASE
WHEN weekday_alcohol = 1 THEN 'very low'
WHEN weekday_alcohol = 2 THEN 'low'
WHEN weekday_alcohol = 3 THEN 'medium'
WHEN weekday_alcohol = 4 THEN 'high'
WHEN weekday_alcohol = 5 THEN 'very high'
ELSE 'unknown' END AS Alcohol_consumption,
ROUND(AVG(CAST(class_failures as float)),2) as Average_class_failures
From Math
GROUP by weekday_alcohol
Select CASE
WHEN absences < 5 THEN 'very low'
WHEN absences < 10 THEN 'low'
WHEN absences < 15 THEN 'medium'
WHEN absences < 25 THEN 'high'
WHEN absences >= 25 THEN 'very high'
Else 'uknown' END as Absence,
ROUND(AVG(CAST(final_grade as float)),2) as average_grade
From Math
Group by CASE
WHEN absences < 5 THEN 'very low'
WHEN absences < 10 THEN 'low'
WHEN absences < 15 THEN 'medium'
WHEN absences < 25 THEN 'high'
WHEN absences >= 25 THEN 'very high'
Else 'uknown' END
Select CASE WHEN sex = 'F' THEN 'Female' ELSE 'Male' END as Gender, ROUND(AVG(CAST(final_grade as float)),2) as average_grade
From Math
Group by sex
Select CASE WHEN school_support = 0 THEN 'No support' ELSE 'Support' END as School_Support, ROUND(AVG(CAST(final_grade as float)),2) as average_grade
From Math
Group by school_support
Select class_failures, ROUND(AVG(CAST(final_grade as float)),2) as average_grade
From Math
Group by class_failures
Select ROUND(AVG(CAST(grade_1 as float)),2) as First_Quarter, ROUND(AVG(CAST(grade_2 as float)),2) as Second_Quarter, ROUND(AVG(CAST(final_grade as float)),2) as Final_Grade
From Math
Select CASE
WHEN absences < 5 THEN 'very low'
WHEN absences < 10 THEN 'low'
WHEN absences < 15 THEN 'medium'
WHEN absences < 25 THEN 'high'
WHEN absences >= 25 THEN 'very high'
Else 'uknown' END as Absence,
ROUND(AVG(CAST(grade_1 as float)),2) as First_Quarter, ROUND(AVG(CAST(grade_2 as float)),2) as Second_Quarter, ROUND(AVG(CAST(final_grade as float)),2) as Final_Grade, COUNT(*) as Student_Number
From Math
GROUP BY CASE
WHEN absences < 5 THEN 'very low'
WHEN absences < 10 THEN 'low'
WHEN absences < 15 THEN 'medium'
WHEN absences < 25 THEN 'high'
WHEN absences >= 25 THEN 'very high'
Else 'uknown' END
-
Conditional Sorting Columns: Created two conditional sorting columns for travel time and study time to facilitate better visualization and analysis.
-
Data Type Conversion: Changed data types from numerical to string for columns such as student health, family relationship, free time, social, weekday, and weekend alcohol. This adjustment enhances readability and supports more intuitive visualizations.
-
Sorting Column Creation: Generated sorting columns for the aforementioned categorical variables using Power Query. These sorting columns streamline data organization and enable easier navigation within visuals.
-
Text Formatting: Capitalized each word in parent's education and jobs, ensuring consistency and improving visual aesthetics. Additionally, adjusted "at_home" to "At Home" for enhanced readability and clarity in visualizations.
-
Boolean to String Conversion: Converted the "internet access" column from a boolean to a string format. This modification helps align the data representation with visualization requirements and user understanding.
-
Study Time vs. Grades: There's a significant correlation between study time and average grades. The average grades that students with less than 2 hours of study a week get is 10 out of 20 and at 5 to 10 hours about 11.5 out of 20. It seems that the 5+ hours a week mark is what creates the biggest difference in between grades as Students who dedicate more than 5 hours per week to studying tend to achieve higher average grades, surpassing those who study less intensively.
-
Family Relationship Levels: The stacked column chart illustrates varying levels of family relationship among students. A considerable portion of students reports good to excellent relationships, while a smaller percentage indicates poor family relationships.
-
Impact of Family Support and Absences on Grades: The scatter plot highlights the influence of absences on grades, showing a negative correlation as there is not a one single student with more than 25 absences getting higher than 11 out of 20 for a final grade. Conversely, the impact of family support appears inconclusive and requires further investigation.
-
Student Health Distribution: Student health is distributed across various categories, with a notable percentage reporting average to high health. Understanding these health dynamics can inform interventions to support student well-being.
-
Family Size and Parental Education: Smaller families tend to correlate with better student grades, possibly due to increased parental attention. From the same graph we can also see that on average families where the parents are divorced/living apart usually harbour students with better grades. Additionally, parents' higher education levels very significantly impact student performance, with children of highly educated parents achieving better grades.
-
Father's Occupation and Guardian Distribution: Students with fathers employed in teaching exhibit higher average grades, likely benefiting from parental involvement in academic matters. Guardianship is predominantly held by mothers, followed by fathers and then other guardians.
-
Urban vs. Rural Living: The majority of students reside in urban areas, potentially indicating different educational and environmental factors compared to rural counterparts.
-
Internet Access and Travel Time: Students with internet access tend to achieve higher grades, suggesting a correlation between digital resources and academic success. Additionally, shorter travel times to school have a big positive impact on student performance, with most high-performing students having a commute of 15 minutes or less as can be observed by the table next to the Travel Time Graph.
-
Free Time Distribution: Free time distribution among students varies, with a significant portion reporting average to high free time availability. Understanding these dynamics can inform strategies to balance academic workload and leisure activities effectively.
The place a student calls home plays a pivotal role in their academic journey. In our analysis, students residing in Rural Areas achieve an average grade of 9.5 out of 20, whereas their Urban counterparts soar to an average of 10.6. This stark contrast underscores the profound effect of living environments on academic outcomes. Moreover, this disparity could be attributed to factors such as commute time, with students in Urban Areas enjoying shorter travel times to school. The urban locale of the school itself could also contribute, with easier access to educational resources and facilities.
Furthermore, it's essential to consider the broader context. Students in Rural Areas may face challenges beyond academics, including limited internet access—20% lack internet compared to 10% in Urban Areas. These multifaceted dynamics intertwine to shape student performance, emphasizing the need for comprehensive support mechanisms.
The identity of a student's guardian emerges as a significant influencer in academic performance. While students under paternal guardianship achieve an average grade of 10.7, maternal guardianship closely follows at 10.5 out of 20. However, those under 'other' guardianship exhibit a notable dip, averaging at 9.0. This divergence suggests that the guardian's role, likely the primary caregiver, profoundly impacts student outcomes.
This impact may extend beyond academic support, reflecting broader familial dynamics. Students under 'other' guardianship may navigate more challenging family situations, which could adversely affect their educational journey.
Our analysis underscores a compelling correlation between parental education levels and student aspirations. Offspring of highly educated parents demonstrate a staggering 99% inclination towards pursuing higher education. Conversely, students with parents educated up to primary level show a notable decline in aspiration, with 89% expressing a desire for further education. This marked disparity underscores the profound influence of parental educational attainment on shaping students' academic ambitions.
In the culmination of this project, a rich tapestry of insights has been woven, unveiling the intricate dynamics that underpin student performance analysis. Through meticulous examination of socio-demographic, familial, and lifestyle factors, the project has shed light on the multifaceted interplay between students' contexts and their academic outcomes.
From dissecting the rural-urban dichotomy to unraveling the nuances of familial guardianship, the findings underscore the pivotal role of context in shaping educational trajectories. It becomes evident that academic success is not solely a product of individual effort but is deeply intertwined with the environments in which students reside and learn. Factors such as access to resources, familial support structures, and socio-economic conditions emerge as influential determinants that can either propel or hinder educational attainment.
Furthermore, the analysis underscores the enduring impact of parental education on student aspirations, highlighting how the legacies of learning are transmitted across generations. The disparities in aspirations across different levels of parental education serve as poignant reminders of the profound influence of familial dynamics on shaping students' aspirations for the future.
In navigating the complexities of education, it becomes imperative to recognize that student success is intricately linked to broader socio-economic and familial contexts. By delving into these interconnected realms, the project not only offers insights into the drivers of academic achievement but also underscores the need for holistic, multi-dimensional approaches to supporting student success.
In conclusion, the project stands as a testament to the power of data-driven insights in unraveling the mysteries of education. By harnessing the potential of analytics, it embarks on a transformative journey toward fostering a more equitable and inclusive educational landscape—one where every student has the opportunity to thrive, irrespective of their circumstances. As the findings inspire action and advocacy, let them serve as beacons of hope in the pursuit of educational excellence for all.