
akshay307/MyProjects

MyProjects

This repository contains my academic research in data science, along with hands-on projects in data analytics and simple machine learning implemented in Python.

ACADEMIC RESEARCH PAPERS

  1. Visual Analysis of Grocery Expenses to Interpret Dietary Habits - Visualizing food expenditure can simplify decision making for healthier choices and reduce costs by revealing diet patterns. The research examines the relationships between expenditure on different food categories, the cost of each item, nutritional value, and dietary trends. The study targets two important aspects, dietary patterns and monthly expenditure, which can raise awareness of food consumption habits, promote healthier choices, and help track expenses. The analysis is based on data collected (for educational purposes only) from two apartments, one with male members and the other with female members. The visualization report focuses on three parts: (1) Individual Expense Trends, (2) Expenditure Comparison, and (3) Annual Intake Trend.

  2. Data Analysis of Airline On-time Performance - The most critical challenge and operational risk for the airline industry is the on-time performance of flights. The goal of this research is to perform a statistical analysis of transportation data to determine which feature has the highest impact on airline performance, which airline a customer should choose, and the time and day of the week on which a passenger should travel to avoid flight delays. The analysis aims to improve scheduling and logistics for airlines and to allow passengers to make informed decisions before traveling.
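The "best day to travel" and "best airline" questions above reduce to grouping flights and comparing average delays. The sketch below shows that grouping in plain Python, assuming each flight is a hypothetical `(airline, day_of_week, arrival_delay_minutes)` tuple; the rows are invented for illustration and are not from the study's dataset, and the study's actual statistical analysis may differ.

```python
from collections import defaultdict

# Hypothetical flight records: (airline, day_of_week, arrival_delay_minutes).
# Values are made up for illustration; they are not from the study's dataset.
flights = [
    ("AA", "Mon", 12.0),
    ("AA", "Fri", 35.0),
    ("DL", "Mon", 3.0),
    ("DL", "Fri", 20.0),
    ("UA", "Tue", -4.0),  # negative delay = arrived early
    ("UA", "Fri", 41.0),
]


def mean_delay_by(records, key_index):
    """Average arrival delay grouped by the field at key_index."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        key = record[key_index]
        totals[key] += record[2]
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


by_day = mean_delay_by(flights, 1)
by_airline = mean_delay_by(flights, 0)

# Under this simple metric, the lowest average delay is the "best" choice.
best_day = min(by_day, key=by_day.get)
best_airline = min(by_airline, key=by_airline.get)
print(by_day, best_day)
print(by_airline, best_airline)
```

A mean is sensitive to a few extreme delays, so a median or the share of flights delayed beyond a threshold may be a more robust comparison on real data.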

PROJECTS COMPLETED FROM DATACAMP

  1. PREDICT CREDIT CARD APPROVAL - Commercial banks receive many credit card applications, and many of them are rejected for reasons such as high loan balances, low income levels, or too many inquiries on an individual's credit report. Manually analyzing these applications is mundane, error-prone, and time-consuming, so in this project I use machine learning to demonstrate how to build an automated credit card approval predictor on sample data.

Dataset- UCI Machine Learning Repository - Credit Card Approval.

  • Import and inspect credit card applications data
  • Pre-process the data by handling missing values and other transformations
  • Split the dataset into train and test sets and further preprocess as required
  • Fit a logistic regression model to the train set
  • Make predictions and evaluate model performance
  • Grid search the hyperparameters to optimize model performance
  • Find the best-performing model
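The fit, evaluate, and grid-search steps above can be sketched without any libraries. This is a minimal illustration, not the notebook's code: it trains a logistic regression with plain batch gradient descent and grid-searches two hypothetical hyperparameters (`lr` and `epochs`) on toy, already-scaled data. All function names here are illustrative; a real project would more likely use a library implementation and cross-validated search.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr, epochs):
    """Fit weights and bias with batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, X):
    """Return 1 (approve) when the predicted probability is at least 0.5, else 0."""
    w, b = model
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5) for xi in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def grid_search(X_train, y_train, X_val, y_val, grid):
    """Try every (lr, epochs) pair; keep the first one with the best validation accuracy."""
    best = None
    for lr in grid["lr"]:
        for epochs in grid["epochs"]:
            model = fit_logistic(X_train, y_train, lr, epochs)
            score = accuracy(y_val, predict(model, X_val))
            if best is None or score > best[0]:
                best = (score, {"lr": lr, "epochs": epochs}, model)
    return best


# Toy, already-scaled single-feature data (hypothetical, not the UCI dataset).
X_train, y_train = [[-2.0], [-1.0], [1.0], [2.0]], [0, 0, 1, 1]
X_val, y_val = [[-0.5], [0.5]], [0, 1]
score, params, model = grid_search(X_train, y_train, X_val, y_val,
                                   {"lr": [0.1, 0.5], "epochs": [50, 200]})
print(score, params)
```

Scoring each candidate on held-out data rather than on the training set is what keeps the search from simply rewarding overfitting.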
  2. PREDICT SUPER BOWL WINNER - In this notebook, I analyze Super Bowl data on game scores, TV viewership, advertising, and halftime show performances. I will demonstrate how to look for insights in the data, such as whether lopsided games lose viewers and which musicians have dominated the halftime show.

Dataset- scraped and polished from Wikipedia.

  • Import dataset and explore data issues.
  • Analyze combined points for each Super Bowl by visualizing distributions and identify Super Bowls with the highest and lowest scores.
  • Do blowouts translate to lost viewers?
  • Compare viewership and the ad industry over time.
  • Who has the most halftime show appearances?
  • Who performed the most songs in a halftime show?
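The scoring and blowout questions above come down to two derived columns (combined points and point difference) and a correlation with viewership. The sketch below uses plain Python on invented games with a hypothetical `household_share` field; the numbers are not real Super Bowl results, so the correlation it prints says nothing about actual viewership.

```python
import math

# Hypothetical games: invented scores and TV household shares, NOT real Super Bowl data.
games = [
    {"game": "A", "winner_pts": 31, "loser_pts": 28, "household_share": 72},
    {"game": "B", "winner_pts": 45, "loser_pts": 10, "household_share": 65},
    {"game": "C", "winner_pts": 20, "loser_pts": 17, "household_share": 70},
    {"game": "D", "winner_pts": 38, "loser_pts": 7, "household_share": 63},
]

for g in games:
    g["combined_pts"] = g["winner_pts"] + g["loser_pts"]
    # A large point difference marks a blowout.
    g["difference_pts"] = g["winner_pts"] - g["loser_pts"]

highest = max(games, key=lambda g: g["combined_pts"])
lowest = min(games, key=lambda g: g["combined_pts"])


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# A negative r would suggest that bigger blowouts go with lower viewing share.
r = pearson([g["difference_pts"] for g in games],
            [g["household_share"] for g in games])
print(highest["game"], lowest["game"], round(r, 2))
```

Correlation alone cannot show that blowouts *cause* viewers to leave; era, matchup, and broadcast changes could all confound the relationship.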
  3. CLASSIFICATION OF SONG GENRES - Music streaming services categorize music to personalize recommendations based on a user's listening history. In this notebook, I will demonstrate how to explore a given dataset and classify songs as either 'Hip-Hop' or 'Rock' using a variety of metrics computed directly from each song's raw audio.

Dataset- Echo Nest

  • Import the data and plot pairwise relationships between continuous variables.
  • Normalize the feature data and perform Principal Component Analysis (PCA) on scaled data.
  • Train a decision tree to perform classification and compare performance with a logistic regression model.
  • Balance the data to reduce the model's bias toward the overrepresented genre.
  • Use cross-validation (CV) to evaluate model performance.
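The balancing and cross-validation steps above can be illustrated with the standard library. This sketch assumes balancing is done by randomly downsampling the larger class (one common approach; the notebook may differ) and splits the shuffled rows into contiguous k folds. The track rows and the `energy` feature values are hypothetical.

```python
import random


def downsample(rows, label_key, seed=0):
    """Randomly drop rows from larger classes so every class matches the smallest one."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    n_min = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)  # avoid class-sorted order before fold splitting
    return balanced


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size


# Hypothetical tracks: six Rock and two Hip-Hop rows with one made-up feature.
tracks = ([{"genre": "Rock", "energy": 0.7 + 0.02 * i} for i in range(6)]
          + [{"genre": "Hip-Hop", "energy": 0.5 + 0.02 * i} for i in range(2)])
balanced = downsample(tracks, "genre")
for train_idx, test_idx in k_fold_indices(len(balanced), k=2):
    print(len(train_idx), len(test_idx))
```

Each row lands in exactly one test fold, so averaging the k test scores gives a less optimistic estimate than a single train/test split.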
  4. ANALYZE ANDROID APP MARKET ON GOOGLE PLAY - In this notebook, I have performed a comprehensive analysis of the Android app market by comparing over ten thousand apps in Google Play across different categories. I will demonstrate how to look for insights in the data and devise strategies to drive growth and retention.

Dataset- open source Google play app data and reviews.

  • Import data for Google Play Store apps and reviews, and clean it as required for analysis.
  • Explore app categories and plot the distribution of app details to analyze the relationship between category and price.
  • Filter out "junk" apps and analyze popularity of paid apps vs free apps.
  • Perform sentiment analysis of user reviews.
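The cleaning and "junk" filtering steps above can be sketched as follows. The example assumes the raw `Installs` and `Price` columns hold display strings such as `"10,000+"` and `"$4.99"`, and uses a hypothetical `PRICE_CUTOFF` to drop implausibly expensive apps; the app rows are invented, and the notebook's exact rules may differ.

```python
def clean_numeric(value, chars=("+", ",", "$")):
    """Strip display formatting such as '10,000+' or '$4.99' and return a float."""
    for ch in chars:
        value = value.replace(ch, "")
    return float(value)


def mean_installs(group):
    return sum(app["Installs"] for app in group) / len(group)


# Hypothetical rows in the string format assumed for the raw CSV columns.
apps = [
    {"App": "Example Notes", "Installs": "10,000+", "Price": "0"},
    {"App": "Example Chess", "Installs": "500+", "Price": "$4.99"},
    {"App": "Example Luxury", "Installs": "1,000+", "Price": "$399.99"},
]
for app in apps:
    app["Installs"] = clean_numeric(app["Installs"])
    app["Price"] = clean_numeric(app["Price"])

PRICE_CUTOFF = 100.0  # hypothetical threshold: pricier apps are treated as "junk"
kept = [app for app in apps if app["Price"] < PRICE_CUTOFF]
paid = [app for app in kept if app["Price"] > 0]
free = [app for app in kept if app["Price"] == 0]
print(mean_installs(paid), mean_installs(free))
```

Because install counts are bucketed (`"10,000+"` means *at least* 10,000), the cleaned values are lower bounds, which is worth keeping in mind when comparing paid and free apps.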
  5. ANSWER KEY QUESTIONS ON GITHUB HISTORY OF SCALA LANGUAGE - Scala is an open-source project with a history of almost 30k commits over ten years. Open-source projects have the advantage that their entire development history (who made changes, what was changed, code reviews, etc.) is publicly available. In this project I will demonstrate how to answer several questions, including who has had the most influence on Scala's development and who the experts are.

Dataset- Previously mined and extracted from Git and GitHub.

  • Import Scala's real-world project repository data and preprocess it.
  • Merge dataframes and answer the following questions:
  • Is the project still actively maintained?
  • Is there camaraderie in the project?
  • What files were changed in the last ten pull requests?
  • Who made the most pull requests to a given file?
  • Who made the last ten pull requests on a given file?
  • Viewing pull requests of specific developers.
  • Visualizing the contributions of each developer.
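Several of the questions above follow from joining a table of pull requests with a table of the files each pull request touched. The plain-Python sketch below assumes hypothetical `pulls` rows (`pid`, `user`, `date`) and `pull_files` rows (`pid`, `file`) with invented users and paths, and ISO-formatted date strings so that string order matches chronological order.

```python
from collections import Counter

# Hypothetical rows shaped like a pulls table and a pull_files table (invented data).
pulls = [
    {"pid": 1, "user": "alice", "date": "2014-01-03"},
    {"pid": 2, "user": "bob", "date": "2014-01-20"},
    {"pid": 3, "user": "alice", "date": "2014-02-11"},
    {"pid": 4, "user": "carol", "date": "2014-03-02"},
]
pull_files = [
    {"pid": 1, "file": "src/Foo.scala"},
    {"pid": 2, "file": "src/Foo.scala"},
    {"pid": 3, "file": "src/Foo.scala"},
    {"pid": 3, "file": "src/Bar.scala"},
    {"pid": 4, "file": "src/Bar.scala"},
]

# Inner join on the pull request id: one row per (pull request, file) pair.
pull_by_pid = {p["pid"]: p for p in pulls}
merged = [{**pull_by_pid[f["pid"]], **f} for f in pull_files if f["pid"] in pull_by_pid]


def pulls_per_month(pulls):
    """Count pull requests per 'YYYY-MM' to see whether activity continues."""
    return Counter(p["date"][:7] for p in pulls)


def top_contributors(merged, file_name, n=3):
    """Users with the most pull requests touching file_name."""
    return Counter(r["user"] for r in merged if r["file"] == file_name).most_common(n)


def last_pulls_on_file(merged, file_name, n=10):
    """The n most recent pull requests (by ISO date string) that touched file_name."""
    rows = [r for r in merged if r["file"] == file_name]
    return sorted(rows, key=lambda r: r["date"], reverse=True)[:n]


print(pulls_per_month(pulls))
print(top_contributors(merged, "src/Foo.scala"))
print([r["user"] for r in last_pulls_on_file(merged, "src/Bar.scala")])
```

Counting on the merged table counts (pull request, file) pairs, which is the right unit for per-file questions; project-wide activity is better counted on `pulls` directly, as `pulls_per_month` does.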

About

Academic projects in data analytics
