Mooseburger1/Springboard-Data-Science-Immersive

Springboard-Data-Science-Immersive

This repository houses all code, data, and files related to my work in the Springboard Data Science Immersive program. The sections below act as a table of contents for the whole repository, with links to the respective work.

Capstone 1


Key Skills

  • Web Scraping
  • NLP - Natural Language Processing
  • Time Series Analysis
  • Deep Neural Networks

A custom sentiment analysis library created to facilitate overall sentiment analysis of cryptocurrency news articles scraped from the web. Combined with historical price data, the sentiment results are fed into a deep neural network to predict future pricing for a crypto coin of interest.

Capstone 2


Key Skills

  • Image Processing
  • Video Processing
  • H5 Storage
  • Object Oriented Programming
  • Tensorflow
  • Tensorboard
  • Convolutional Neural Networks
  • Object Detection

Exploring different image preprocessing techniques to speed up CNN training. As a positive side effect, transforming the original full-scale data reduces the memory footprint, both on disk and in RAM.

Clustering Methods


Key Skills

  • K-Means
  • PCA - Principal Component Analysis
  • Elbow Sum of Squares Method

Mini project on customer segmentation: identifying different types of customers, then figuring out how to find more individuals like them. The data comes from John Foreman's book Data Smart. The dataset contains information on marketing newsletters/e-mail campaigns (e-mail offers sent) and transaction-level data from customers (which offers customers responded to and what they bought).
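
As an illustration of the elbow sum-of-squares method listed above, here is a minimal sketch in plain Python. It is not the notebook's code: the helper names `sq_dist` and `kmeans_sse` and the toy points are hypothetical, and the real project most likely relies on a library implementation. The idea is to run K-Means for several values of k and look for the point where the within-cluster sum of squares stops dropping sharply.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sse(points, k, n_iter=50, seed=0):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        new_centroids = [
            tuple(sum(c) / len(members) for c in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Hypothetical 2-D points forming two tight groups (not the Data Smart data)
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]

# Print the sum of squares for each k; the "elbow" is where the drop flattens
for k in range(1, 5):
    print(k, kmeans_sse(points, k))
```

With data like this, the sum of squares falls steeply from k=1 to k=2 and only slightly afterwards, which is the pattern the elbow method looks for.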

Exploratory Data Analysis (EDA)


Key Skills

  • Central Limit Theorem
  • Statistical Analysis
  • Data Visualization
  • z-test
  • t-test
  • Margin of Error (MOE)
  • Chi-Squared Test
  • Bootstrap Statistics

Several EDAs performed on varying data categories. Hospital Readmittance statistically re-examines an earlier analysis to critique its validity. Human Temperature EDA uses bootstrap statistics to estimate the true average human body temperature for both males and females. Racial Discrimination statistically tests whether race has a meaningful impact on the callback rate of candidates who have submitted resumes to jobs of interest.
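
To make the bootstrap idea concrete, the following is a rough sketch of a percentile bootstrap confidence interval for a mean, using only the standard library. The function name `bootstrap_mean_ci` and the temperature values are made up for illustration; they are not taken from the Human Temperature notebook or its dataset.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and record the mean of each resample
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled means
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up body temperatures in degrees Fahrenheit (illustrative only)
temps = [98.6, 98.2, 97.9, 98.8, 98.4, 98.0, 98.7, 98.3]
print(bootstrap_mean_ci(temps))
```

The fixed `seed` keeps the interval reproducible between runs; a larger `n_resamples` gives a smoother estimate at the cost of run time.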

Machine Learning Algorithms

Key Skills

  • Logistic Regression
  • Linear Regression
  • Naive Bayes

Applying several machine learning algorithms in mini projects, such as: labeling an observation as either male or female based on height and weight data (Logistic Regression), estimating prices on the Boston Housing data (Linear Regression), and classifying movie reviews with Naive Bayes models.

PySpark

Performing several exercises utilizing MapReduce in PySpark (RDD), with a touch of MLlib.

Key Skills

  • PySpark
  • RDD
  • Spark Dataframes

SQL

Key Skills

  • SQL
  • Time Series Analysis
  • Applied Plotting and Charting

This is a SQL case study proposed by Mode Analytics at https://modeanalytics.com/. The Jupyter notebook in this repository is a cleaned-up version of the original case study, which contains all the original SQL queries and can be found here: https://modeanalytics.com/mooseburger/reports/14cbbb5670b8

JSON

Key Skills

  • JSON Manipulation and Extraction
  • Applied Plotting and Charting

An exercise of data extraction and exploration utilizing a JSON data source

Take Home Data Challenges

Key Skills

  • Full Stack Data Scientist

Relax Challenge - Defining an "adopted user" as a user who has logged into the product on three separate days in at least one seven-day period, identify which factors predict future user adoption. Two datasets are provided:

  1. A user table ("takehome_users") with data on 12,000 users who signed up for the product in the last two years.
  2. A usage summary table ("takehome_user_engagement") that has a row for each day that a user logged into the product.
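
The "adopted user" definition above can be checked per user with a short sliding-window test. This is only a sketch under assumptions: the function name `is_adopted` is hypothetical, and it expects one user's logins as `datetime.date` values, whereas the engagement table may store them in another form that needs converting first.

```python
from datetime import date, timedelta


def is_adopted(login_dates, window_days=7, min_days=3):
    """True if `min_days` distinct login days fall within one `window_days`-day period."""
    days = sorted(set(login_dates))  # distinct calendar days, in order
    # Compare each day with the day (min_days - 1) positions later:
    # if they are fewer than `window_days` apart, the window holds enough logins
    for i in range(len(days) - min_days + 1):
        if days[i + min_days - 1] - days[i] < timedelta(days=window_days):
            return True
    return False


# Hypothetical logins for one user
logins = [date(2014, 1, 1), date(2014, 1, 3), date(2014, 1, 7)]
print(is_adopted(logins))
```

Deduplicating with `set` matters because several logins on the same day count as a single day under the definition.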

Ultimate Challenge

  • Part 1 - Exploratory data analysis
  • Part 2 - Experiment and metrics design
  • Part 3 - Predictive Modelling