GitHub - ChiWang03/Various-Data-Science-Projects: A repository of short data science projects I worked on

Please feel free to contact me at [email protected] about potential opportunities, collaboration, or any questions about the repositories. Thanks!

Unsupervised Learning

Unsupervised Learning 1
- Done for data 573: Unsupervised Learning
- Hierarchical clustering using different linkage methods on the bank data set (continuous variables)
- Used Mclust and k-means clustering to find clusters in the lots data
- Compared a hand-coded k-means with the built-in Mclust to judge which is the better clustering method for this data, based on the nature of each algorithm and the Rand Index
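The Rand Index used in the comparison above is the fraction of point pairs on which two clusterings agree (both put the pair together, or both keep it apart). The sketch below is only an illustration, not the notebook's code: the function name `rand_index` and the toy label vectors are assumptions.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]  # pair together in clustering A?
        same_b = labels_b[i] == labels_b[j]  # pair together in clustering B?
        agree += same_a == same_b
        total += 1
    if total == 0:
        raise ValueError("need at least two points")
    return agree / total


# Hypothetical labels: a reference partition and two candidate clusterings.
reference = [0, 0, 0, 1, 1, 1]
kmeans_labels = [1, 1, 1, 0, 0, 0]   # same partition, cluster ids swapped
mixture_labels = [0, 0, 1, 1, 1, 1]  # one point assigned differently

print(rand_index(reference, kmeans_labels))
print(rand_index(reference, mixture_labels))
```

Because the index only looks at pairs, it does not matter how the cluster ids are numbered, which is why the swapped labelling still scores as a perfect match.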
Unsupervised Learning 2
- Hierarchical clustering using different linkages on the HouseVotes84 (mlbench) data; the categorical/binary variables were converted to dissimilarities with Gower's distance
- Performed Factor Analysis on the ability.cov data
- Used PCA and NMF on handwritten digits data to decide how many components to keep for digit recognition
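For purely categorical or binary records such as votes, Gower's distance reduces to the share of variables on which two records differ. The sketch below assumes every variable is treated as a nominal category and that variables with a missing value in either record are skipped; the function name and the toy records are hypothetical and are not taken from the notebook.

```python
def gower_distance(row_a, row_b):
    """Gower distance for two records of categorical values.

    Each variable contributes 0 if the values match and 1 otherwise;
    variables where either value is missing (None) are skipped.
    """
    compared = 0
    mismatches = 0
    for a, b in zip(row_a, row_b):
        if a is None or b is None:
            continue
        compared += 1
        mismatches += a != b
    if compared == 0:
        raise ValueError("no variables observed in both records")
    return mismatches / compared


# Hypothetical voting records: "y", "n", or None for a missing vote.
member_1 = ["y", "n", "y", None, "y"]
member_2 = ["y", "y", "y", "n", None]
member_3 = ["n", "y", "n", "n", "n"]

print(gower_distance(member_1, member_2))  # 3 votes compared, 1 differs
print(gower_distance(member_1, member_3))  # 4 votes compared, all differ
```

A full pairwise matrix of these distances is the kind of input a hierarchical clustering routine can take in place of Euclidean distances.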

Job Post Key Skills Extraction

Extracting key skills from Data Scientist and Machine Learning Engineer job posts

Scraped job posts from Indeed using BeautifulSoup
Ran topic modelling using ldamallet from gensim, and Non-negative Matrix Factorization and LDA from sklearn.
Interestingly, this project requires a good understanding of the data itself. When we think of the key skills of data scientists, we think of natural language processing, machine vision, machine learning, etc. These are not unigrams but bigrams or trigrams, which is why preprocessing the data and running the topic modelling methods on n-grams is so important for key skills extraction. Enjoy some interesting findings!
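To make the n-gram point concrete, here is a minimal sketch of counting bigrams and trigrams in tokenized text. It uses only the standard library instead of the gensim/sklearn pipeline, and the helper name `ngrams` and the two sample posts are made up for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token phrases in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical job-post snippets, already lowercased and stripped of punctuation.
posts = [
    "experience with machine learning and natural language processing",
    "strong machine learning background natural language processing a plus",
]

counts = Counter()
for post in posts:
    tokens = post.split()
    counts.update(ngrams(tokens, 2))  # bigrams
    counts.update(ngrams(tokens, 3))  # trigrams

print(counts["machine learning"])
print(counts["natural language processing"])
```

With unigrams alone, "machine", "learning", "natural" and "language" would be counted separately and the skill names would be lost.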

Customs Wait Time Analysis

Created multiple metrics to optimize customs wait time efficiency
EDA
- Exploratory data analysis on customs wait time
- Suggested ways to optimize efficiency for customs border patrol agents
Multivariate Outlier Identification
- Used the PyOD library in Python to identify outliers
- Two methods used: Cluster-Based Local Outlier Factor (CBLOF, with k-means clustering) and K Nearest Neighbors
Note: the comparison part of the outlier removal notebook is still slightly messy and needs to be updated.
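The idea behind k-nearest-neighbor outlier scoring can be sketched without any library: a point whose k-th nearest neighbor is far away sits in a sparse region and is a candidate outlier. This is a hand-rolled stand-in for illustration, not PyOD's implementation and not the notebook's code; the function name, the choice of k, and the toy observations are assumptions.

```python
import math


def knn_outlier_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbor."""
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical (wait minutes, passengers per booth) observations.
observations = [(10, 50), (12, 52), (11, 49), (13, 51), (90, 5)]
scores = knn_outlier_scores(observations, k=2)
most_unusual = max(range(len(scores)), key=scores.__getitem__)
print(observations[most_unusual])
```

In practice the features would usually be scaled first, since raw Euclidean distance lets the variable with the largest range dominate the score.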

Plotly Dash Zillow Housing Dashboard

Visualized Zillow's property dataset and the housing dataset from Kaggle (Boston property information)
Created a 4-tab dashboard using Plotly Dash for the visualizations
Includes: geolocation plots of property location, interactive volume bar plots, Lasso coefficient slider plots

Mining Association Rules on the OkCupid data set

This short notebook explores the OkCupid data set by mining association rules and finding latent information about dating profiles
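Association rules are usually ranked by support, confidence and lift. The sketch below computes these three metrics for a single rule over a handful of made-up profiles; the attribute names and the function `rule_metrics` are hypothetical and do not come from the notebook or the OkCupid data.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(antecedent <= t for t in transactions)
    n_c = sum(consequent <= t for t in transactions)
    n_both = sum((antecedent | consequent) <= t for t in transactions)
    if n == 0 or n_a == 0 or n_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = n_both / n              # share of profiles containing both sides
    confidence = n_both / n_a         # share of antecedent profiles with the consequent
    lift = confidence / (n_c / n)     # confidence relative to the consequent's base rate
    return support, confidence, lift


# Hypothetical profile attributes, one set per profile.
profiles = [
    {"has_dog", "likes_hiking"},
    {"has_dog", "likes_hiking", "vegetarian"},
    {"has_dog"},
    {"likes_hiking"},
    {"vegetarian"},
]

support, confidence, lift = rule_metrics(profiles, {"has_dog"}, {"likes_hiking"})
print(support, confidence, lift)
```

A lift above 1 means the consequent appears more often alongside the antecedent than its overall frequency would suggest.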

PyTorch Neural Networks

Multiple notebooks that use PyTorch to explore neural networks.
Lab work done for data 586: Advanced Machine Learning
- lab1: Multilayer Perceptron
- lab2: Convolutional Neural Networks
- lab3: Recurrent Neural Networks
- lab4: Stochastic Gradient Descent and Regularization

Various Visualizations

Seaborn and Plotly visualizations for the SierraLeoneAIMS data set.
The notebook cannot display the interactive Plotly graphs (links to the interactive versions are provided in the notebook)

Twitter API

A short script that pulls a given user's tweets from the Twitter API.
In this case, Elon Musk's tweets were pulled.

Introduction to Data Science

This was one of the first pandas EDAs I ever did (2016)
