This repository highlights my expertise, presents my projects, and documents my journey in data analytics and data science.

d-parkin/Data-Analysis-Portfolio

Data Analysis Portfolio

Introduction

Hello!

Welcome to my Data Analyst Project Portfolio!

I am a data analyst with experience using real-time data to support informed decisions based on patterns and trends. I am proficient in Python, SQL, Excel, and PowerBI. I am currently learning Tableau and polishing my skills in Python libraries such as Pandas and Matplotlib.

Below, I will showcase projects that display my data analysis skills and my process of extracting, cleaning, analyzing, and visualizing the datasets.

CSULB Student Reviews Sentiment Analysis

Summary

The goal of this project was to extract all reviews of professors at California State University, Long Beach from 1999 to 2023 and find out which attributes (grade, quality rating, etc.) correlated with positive or negative student reviews, broken down by college. After the reviews were collected and the dataset was cleaned, sentiment analysis was performed to classify each review as negative, neutral, or positive. I then drew conclusions and created visualizations; the complete process is described in more detail below.

After producing line plots and box plots with t-tests, I found that for some colleges (not all are shown below), COVID-era online courses are associated with lower quality ratings and more negative reviews.
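The classification step mentioned above can be sketched as follows. This is only an illustration: the write-up does not name the sentiment tool that was used, so the sketch assumes some scorer has already produced a polarity score in [-1, 1] for each review. The function name `label_sentiment`, the 0.05 threshold, and the sample scores are all hypothetical.

```python
def label_sentiment(score: float, threshold: float = 0.05) -> str:
    """Map a polarity score in [-1, 1] to a three-way sentiment label.

    The threshold is an assumed cut-off, not a value taken from the project.
    """
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical scores, not taken from the dataset.
scores = [0.62, -0.41, 0.01]
labels = [label_sentiment(s) for s in scores]
print(labels)
```

Widening the threshold sends more borderline reviews to the neutral class, so it is worth choosing with a few hand-labeled reviews in view.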

Research Questions

How did students feel about the overall quality and difficulty of online courses during the COVID-19 era (2020-2022)?

Which metrics are most important to a positive review (high quality, low difficulty, etc.)?

Conclusion

Based on my findings, I conclude that online learning platforms may not be effective for certain colleges and courses. A business or educational institution may want to invest more resources in improving the user experience and providing better student support for future online courses.

Description of image Description of image

Below, the p-values are both less than 0.05, so I can reject the null hypothesis (that there is no difference in students' quality ratings before and during online courses). The more negative reviews during COVID-19 are statistically significant.

T-Value | DF | P-Value
--- | --- | ---
3.43 | 2231 | 0.001

T-Value | DF | P-Value
--- | --- | ---
2.36 | 3345 | 0.019
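As a rough illustration of the kind of test behind these tables, here is a minimal two-sample t-test using only the Python standard library. It assumes equal variances (the write-up does not say which variant was run) and approximates the p-value with a normal distribution, which is only reasonable when the degrees of freedom are large, as they are in the tables above. The function names and the sample ratings are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sided_p(t: float) -> float:
    """Two-sided p-value from a normal approximation (large df only)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


def pooled_t_test(a, b):
    """Two-sample t-test assuming equal variances; returns (t, df, p)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / se
    return t, df, two_sided_p(t)


# Hypothetical quality ratings; far too few for the approximation to be
# trustworthy, shown only to demonstrate the call.
before = [4, 5, 3, 4, 5, 4]
during = [3, 2, 4, 3, 2, 3]
t, df, p = pooled_t_test(before, during)
print(f"t={t:.2f}, df={df}, p={p:.4f}")

# The reported t-values both fall below the 0.05 level under this approximation.
print(two_sided_p(3.43) < 0.05, two_sided_p(2.36) < 0.05)  # True True
```

For small samples, a proper t-distribution (for example from a statistics package) should replace the normal approximation.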

Description of image Description of image

Challenges

The most significant challenge came after collecting the reviews: the same professor often had more than one entry, either because students misspelled the name or because the professor taught courses in slightly different departments. To resolve this, I combined two string similarity metrics (Levenshtein and Jaro-Winkler) to perform name matching in Python and built a mapping table (a CSV edited in Excel) that links reviews for the same professor by their instructorIDs. The mapping table contains over 250 paired professors.
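A minimal sketch of this kind of name matching is shown below. The write-up does not say how the two metrics were combined, so averaging them and the 0.85 cut-off are assumptions, as are the function names and the sample instructors.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def jaro(a: str, b: str) -> float:
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ca:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as half-transpositions.
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a: str, b: str, p: float = 0.1) -> float:
    """Jaro similarity boosted by a shared prefix of up to four characters."""
    base = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return base + prefix * p * (1 - base)


def name_similarity(a: str, b: str) -> float:
    """Average of normalized Levenshtein similarity and Jaro-Winkler (assumed blend)."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    lev_sim = 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest
    return (lev_sim + jaro_winkler(a, b)) / 2


def build_mapping(professors, threshold: float = 0.85):
    """Return (duplicate_id, canonical_id) pairs for names scoring above the threshold."""
    mapping = []
    for i, (id_a, name_a) in enumerate(professors):
        for id_b, name_b in professors[i + 1:]:
            if name_similarity(name_a, name_b) >= threshold:
                mapping.append((id_b, id_a))
    return mapping


# Hypothetical instructors, not taken from the dataset.
professors = [(101, "Jonathan Smith"), (102, "Jonathon Smith"), (103, "Maria Lopez")]
print(build_mapping(professors))
```

Candidate pairs produced this way still benefit from a manual review, which fits the step of editing the mapping CSV by hand in Excel.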

Description of image

Data Collection and Preparation

To collect the required student review data, I used the RateMyProfessorAPI project (https://github.com/Nobelz/RateMyProfessorAPI) together with the Python package Selenium. The script used the XPath of each element on the page to decide what to extract, then stored the extracted attributes and instructors in a MySQL database. I then exported the database to a CSV so I could clean and manage the dataset in Excel.
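The extract, store, and export flow described above can be sketched in a self-contained way. To keep the example runnable without a browser or a database server, it swaps Selenium for the limited XPath support in `xml.etree.ElementTree` and MySQL for an in-memory `sqlite3` database, so it is not the project's actual script. The markup, class names, and column names are hypothetical.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a rendered review page.
PAGE = """
<html><body>
  <div class="review">
    <span class="quality">4.5</span>
    <span class="difficulty">2.0</span>
    <p class="comment">Clear lectures.</p>
  </div>
  <div class="review">
    <span class="quality">1.5</span>
    <span class="difficulty">4.0</span>
    <p class="comment">Hard to follow online.</p>
  </div>
</body></html>
"""


def extract_reviews(page: str):
    """Select each review block by XPath and pull out its fields."""
    root = ET.fromstring(page.strip())
    rows = []
    for node in root.findall(".//div[@class='review']"):
        rows.append((
            float(node.find("span[@class='quality']").text),
            float(node.find("span[@class='difficulty']").text),
            node.find("p[@class='comment']").text,
        ))
    return rows


def store_and_export(rows) -> str:
    """Insert rows into a database table, then dump the table as CSV text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reviews (quality REAL, difficulty REAL, comment TEXT)")
    conn.executemany("INSERT INTO reviews VALUES (?, ?, ?)", rows)
    conn.commit()
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["quality", "difficulty", "comment"])
    writer.writerows(conn.execute("SELECT quality, difficulty, comment FROM reviews"))
    conn.close()
    return out.getvalue()


rows = extract_reviews(PAGE)
csv_text = store_and_export(rows)
print(csv_text)
```

With Selenium, the same XPath expressions would be passed to the driver's element lookup instead, and real pages would need a tolerant HTML parser rather than a strict XML one.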

Description of image

To prepare my data for visualization in Minitab and PowerBI, I needed to organize it by college, aggregate the dates into years, and merge duplicate or misspelled majors (e.g., Computer Science and ComputerScience need to be one major). To accomplish this, I created pivot tables in Excel for each college, which I used to create the visualizations seen above.
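The same preparation was done with Excel pivot tables; as an illustration, an equivalent in plain Python might look like the sketch below. The field names, the date format, and the sample rows are assumptions, and the normalization shown only merges majors that differ in spacing or letter case, not genuine misspellings.

```python
from collections import defaultdict
from statistics import mean


def normalize_major(name: str) -> str:
    """Collapse spacing and case so 'ComputerScience' and 'Computer Science' share one key."""
    return "".join(name.split()).lower()


def pivot_by_college_year(rows):
    """Average quality rating per college and year, like a small pivot table.

    Each row is a dict with 'college', 'date' (assumed YYYY-MM-DD), and 'quality'.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for row in rows:
        year = int(row["date"][:4])  # aggregate full dates into years
        buckets[row["college"]][year].append(row["quality"])
    return {
        college: {year: mean(values) for year, values in sorted(years.items())}
        for college, years in buckets.items()
    }


def count_by_major(rows):
    """Number of reviews per normalized major."""
    counts = defaultdict(int)
    for row in rows:
        counts[normalize_major(row["major"])] += 1
    return dict(counts)


# Hypothetical rows, not taken from the dataset.
rows = [
    {"college": "Engineering", "major": "Computer Science", "date": "2019-10-02", "quality": 4.0},
    {"college": "Engineering", "major": "ComputerScience", "date": "2019-11-15", "quality": 2.0},
    {"college": "Engineering", "major": "Computer Science", "date": "2021-03-01", "quality": 3.0},
]
print(pivot_by_college_year(rows))
print(count_by_major(rows))
```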

PowerBI Interactive Dashboard

Below is a dashboard I created. A date slicer (top right) lets you analyze snapshots of the data between different dates.

Description of image
