Hello!
Welcome to my Data Analyst Project Portfolio!
I am a data analyst with experience using real-time data to support informed decisions based on patterns and trends. I am proficient in Python, SQL, Excel, and PowerBI, and I am currently learning Tableau and polishing my skills in Python libraries such as Pandas and Matplotlib.
Below, I will showcase projects that display my data analysis skills and my process of extracting, cleaning, analyzing, and visualizing the datasets.
The goal of this project was to extract all reviews of professors at California State University Long Beach from 1999 to 2023 and find out which attributes (grade, quality rating, etc.) correlated with positive or negative student reviews, analyzed by college. After the reviews were collected and the dataset was cleaned, I performed sentiment analysis to classify each review as negative, neutral, or positive. I then drew conclusions and created visualizations; the complete process is described in more detail below.
After producing line plots and box plots and running t-tests, I found that for some colleges (not all are shown below), COVID-era online courses are associated with lower quality ratings and more negative reviews.
How did students feel about the overall quality and difficulty of online courses during the COVID-19 era (2020-2022)?
Which metrics are most important to a positive review (high quality, low difficulty, etc.)?
Based on my findings, I conclude that online learning platforms may not be effective for certain colleges and courses. A business or educational institution may want to invest more resources in improving the user experience and providing better student support for future online courses.
Below, both p-values are less than 0.05, so I can reject the null hypothesis (that there is no difference between students' quality ratings before and during online courses); the increase in negative reviews during COVID-19 is statistically significant.

TValue | DF | PValue
--- | --- | ---
3.43 | 2231 | 0.001

TValue | DF | PValue
--- | --- | ---
2.36 | 3345 | 0.019

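For readers who want to see how a t-value, degrees of freedom, and p-value like those above can be computed, here is a minimal sketch using only the Python standard library. It is not the analysis that produced the tables. It assumes a pooled-variance two-sample t-test, the function name `pooled_t_test` and the sample ratings are hypothetical, and the p-value uses a normal approximation that is only reasonable when the degrees of freedom are large (as with thousands of reviews).

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def pooled_t_test(a, b):
    """Two-sample Student's t-test with pooled variance (hypothetical helper).

    Returns (t, df, approx_p). The two-sided p-value is approximated with
    the standard normal distribution, which is close to the t distribution
    only for large df.
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / std_err
    approx_p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, approx_p


# Made-up quality ratings, for illustration only.
before = [4.0, 5.0, 3.5, 4.5, 4.0, 5.0]
during = [3.0, 3.5, 2.5, 4.0, 3.0, 3.5]

t, df, p = pooled_t_test(before, during)
print(f"t = {t:.2f}, df = {df}, approx p = {p:.4f}")
```

A positive t here means the first group has the higher mean rating. With a sample this small, an exact t distribution should be used for the p-value instead of the approximation.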
The most significant challenge came after collecting all the reviews: the same professor often had more than one entry, either because students misspelled the name or because the professor taught courses in slightly different departments. To resolve this, I combined two string similarity metrics (Levenshtein and Jaro-Winkler) to perform name matching in Python and built a mapping table (a CSV edited in Excel) that links reviews belonging to the same professor through their instructorIDs. The mapping table contains over 250 paired professors.
To collect the student review data, I used the following API: https://github.com/Nobelz/RateMyProfessorAPI together with the Python package Selenium to extract the elements I needed. I used each element's XPath to tell the script what to extract, then stored the attributes and instructors in a MySQL database. Finally, I exported the database to a CSV to clean and manage the dataset in Excel.
To prepare the data for visualization in Minitab and PowerBI, I needed to organize it by college, aggregate the dates into years, and merge duplicate or misspelled majors (e.g., Computer Science and ComputerScience need to be one major). To accomplish this, I created a pivot table in Excel for each college, which I then used to create the visualizations seen above.
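The name-matching idea can be sketched in plain Python as follows. This is an illustrative sketch, not the exact script used in the project. The way the two metrics are combined (both must clear a threshold), the threshold values, and the function name `is_probable_match` are all assumptions made for the example.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def jaro(a, b):
    """Jaro similarity in [0, 1]."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ca:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a, b, scale=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(a, b)
    prefix = 0
    for ca, cb in zip(a[:4], b[:4]):
        if ca != cb:
            break
        prefix += 1
    return base + prefix * scale * (1 - base)


def is_probable_match(a, b, lev_min=0.8, jw_min=0.9):
    """Assumed rule: both similarities must clear hypothetical thresholds."""
    a, b = a.lower().strip(), b.lower().strip()
    lev_sim = 1 - levenshtein(a, b) / max(len(a), len(b), 1)
    return lev_sim >= lev_min and jaro_winkler(a, b) >= jw_min


print(is_probable_match("Jonathan Smith", "Jonathon Smith"))
print(is_probable_match("Jonathan Smith", "Maria Lopez"))
```

Requiring both metrics to agree is one way to cut down on false positives. Pairs flagged this way would still need a manual review before going into a mapping table.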
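The same preparation steps (merging variant major names and aggregating dates into years) can also be sketched in Python. The project itself used Excel pivot tables, so this is only an analogue. The row layout, the sample values, and the helper names `major_key` and `yearly_mean_quality` are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def major_key(name):
    """Collapse case and whitespace so 'ComputerScience' equals 'Computer Science'."""
    return "".join(name.split()).lower()


def yearly_mean_quality(rows):
    """Group (major, date, quality) rows by normalized major and year."""
    groups = defaultdict(list)
    for major, when, quality in rows:
        groups[(major_key(major), when.year)].append(quality)
    return {key: mean(values) for key, values in groups.items()}


# Made-up rows: (major, review date, quality rating).
reviews = [
    ("Computer Science", date(2019, 3, 1), 4.0),
    ("ComputerScience", date(2019, 9, 15), 5.0),
    ("Computer Science", date(2021, 2, 10), 3.0),
    ("Biology", date(2021, 5, 20), 4.5),
]

summary = yearly_mean_quality(reviews)
for (major, year), quality in sorted(summary.items()):
    print(major, year, quality)
```

Normalizing on a key like this only catches spacing and capitalization differences. True misspellings would need a fuzzy comparison or a manual mapping.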
Below is a dashboard I created that uses a slicer (top right) to show snapshots of the data between different dates.