Vango is an art marketplace where you can buy original art from independent artists.
Working with their leadership team, I scoped the following challenges to focus on:
- What’s the best timing to engage users?
- Can we predict such an abstract purchase?
- What are purchase habits like?
- How much money can we make?
![Pipeline](images/Screen%20Shot%202016-07-10%20at%2011.07.38%20AM.png)
- Aggregate event-based data from Mixpanel (12GB, 96 events tracking 141 different features) with user-based data from a Postgres database (3GB, 104 tables)
- Explore and process the data
- Engineer features, trying different engagement metrics
- Apply SMOTE to address the class-imbalance issue
- Fit a logistic regression to extract each feature's log-odds impact on purchase
- Grid-search Random Forest models to predict per-user purchase probabilities (see the sketch after this list)
- Set up an automatic script to run every 3 months (a cadence based on the purchase cycles discovered during EDA)
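A minimal sketch of the resampling and modeling steps, using scikit-learn and imbalanced-learn on a synthetic stand-in for the engineered features (the hyperparameter grid and all names here are illustrative, not the project's actual settings):

```python
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the engineered user features; the real target is
# "made a purchase", which is heavily imbalanced (most users never buy).
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.95],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=42)

# SMOTE oversamples the minority (purchaser) class on the training split only,
# so the test split keeps the true class balance.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

# Logistic regression: each coefficient is a feature's log-odds impact on purchase.
logit = LogisticRegression(max_iter=1000).fit(X_res, y_res)
log_odds = pd.Series(logit.coef_[0]).sort_values()

# Grid-searched Random Forest to predict per-user purchase probabilities.
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10, 20]},
    scoring="roc_auc",
    cv=5,
)
grid.fit(X_res, y_res)
purchase_probs = grid.best_estimator_.predict_proba(X_test)[:, 1]
```

Scoring on the untouched test split, rather than on resampled data, keeps the probability estimates honest.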
- What’s the best timing? Now.
- Can we predict such an abstract purchase? Yes.
- What are purchase habits like? The first purchase takes 103 days, repeat purchases come every 108 days, and 90% of users make only one purchase
- How much do they spend? About $250
- How much money can we make? Conservatively, +14.3% revenue
![GridSearched Random Forest](images/Screen%20Shot%202016-07-10%20at%2011.08.18%20AM.png) ![Who should we target](images/Screen%20Shot%202016-07-10%20at%2011.08.04%20AM.png)
-> With the script, it is now possible for them to target the users with the best chances of making a purchase, and to re-run the predictions at any time without retraining the entire model.
- Run the `mixpanel_export.py` script to export JSON files of Mixpanel events (specify the desired dates)
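  A minimal sketch of what this export looks like against Mixpanel's raw data export endpoint; the date range is an example, and the actual script's authentication may differ:

  ```python
  import requests

  API_SECRET = "YOUR_MIXPANEL_API_SECRET"  # placeholder credential

  # Mixpanel's raw export endpoint streams one JSON event per line.
  resp = requests.get(
      "https://data.mixpanel.com/api/2.0/export/",
      params={"from_date": "2016-01-01", "to_date": "2016-07-01"},
      auth=(API_SECRET, ""),  # API secret as the basic-auth username
      stream=True,
  )
  resp.raise_for_status()

  with open("events.json", "w") as f:
      for line in resp.iter_lines():
          if line:
              f.write(line.decode("utf-8") + "\n")
  ```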
- Restore the database dump:

  ```
  psql -c "CREATE DATABASE test_data;"
  psql -d test_data -f /Users/pauloarantes/Drive/galvanize/_capstone/art-project/prod_dump
  ```

  If it doesn't work, try a parallel restore of the directory-format dump (`-Fd`) with 8 jobs (`-j8`):

  ```
  pg_restore --create --clean --if-exists -Fd -j8 --no-owner -Upauloarantes -d test_data /Users/pauloarantes/Drive/galvanize/_capstone/art-project/prod_dump
  ```
- Run `load_json_postgres` to process the Mixpanel export and load it into the database as JSONB objects (a sketch of this step follows)
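  A hypothetical sketch of this loading step with psycopg2 (the table and file names are illustrative):

  ```python
  import json
  import psycopg2
  from psycopg2.extras import Json

  conn = psycopg2.connect(dbname="test_data")
  with conn, conn.cursor() as cur:
      cur.execute("""
          CREATE TABLE IF NOT EXISTS mixpanel_events (
              id    SERIAL PRIMARY KEY,
              event JSONB NOT NULL
          );
      """)
      # The export holds one JSON event per line; parse each line so bad
      # records fail loudly before they reach the database.
      with open("events.json") as f:
          for line in f:
              cur.execute(
                  "INSERT INTO mixpanel_events (event) VALUES (%s)",
                  (Json(json.loads(line)),),
              )
  ```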
- Run `prep-queries.py` to create helper tables in the database, reducing query time and memory use
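  For a sense of what such a helper table could look like, here is a hypothetical pre-aggregation over the raw events (the real tables are presumably tailored to the engagement metrics):

  ```python
  import psycopg2

  # Pre-aggregating per-user event counts once lets downstream queries hit a
  # small table instead of rescanning 12GB of raw JSONB.
  conn = psycopg2.connect(dbname="test_data")
  with conn, conn.cursor() as cur:
      cur.execute("""
          CREATE TABLE user_event_counts AS
          SELECT event -> 'properties' ->> 'distinct_id' AS user_id,
                 event ->> 'event'                       AS event_name,
                 count(*)                                AS n_events
          FROM mixpanel_events
          GROUP BY 1, 2;
      """)
  ```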
- Run `queries.py` to query the database and export two CSVs (`purchases.csv` and `dataset.csv`)
- Run `purchase_cycle_query.py` to query the database and export one CSV (`purchase_cycle.csv`); both export scripts follow the pattern sketched below
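  Both export scripts presumably follow the same query-then-dump pattern; a minimal sketch with pandas and SQLAlchemy, where the query is a placeholder for the real ones:

  ```python
  import pandas as pd
  from sqlalchemy import create_engine

  engine = create_engine("postgresql:///test_data")

  # Placeholder query; the real scripts assemble the purchases, dataset,
  # and purchase_cycle extracts from the helper tables.
  dataset = pd.read_sql("SELECT * FROM user_event_counts", engine)
  dataset.to_csv("dataset.csv", index=False)
  ```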
- I ran `models.py` once to train the best Random Forest classifier, so it does not need to be re-run
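  Training once only pays off if the fitted model is persisted; a sketch with joblib, assuming `grid` is the fitted search from the modeling sketch above (the filename is illustrative):

  ```python
  import joblib

  # Persist the grid-searched forest so predict.py can reload it later
  # without retraining.
  joblib.dump(grid.best_estimator_, "rf_model.pkl")
  ```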
- Run `predict.py` every three months or so to predict on new users. The results land in `purchase_probs.csv` with the user ID, the predicted purchase probability, and whether the user has actually bought
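  A sketch of the quarterly run, assuming the persisted model above and a fresh `dataset.csv`; the column names mirror the description but are otherwise assumptions:

  ```python
  import joblib
  import pandas as pd

  model = joblib.load("rf_model.pkl")
  features = pd.read_csv("dataset.csv")

  pd.DataFrame({
      "user_id": features["user_id"],
      # Probability of the positive (purchase) class for each user.
      "purchase_probability": model.predict_proba(
          features.drop(columns=["user_id", "purchased"])
      )[:, 1],
      # Ground truth where the user has already bought, for sanity-checking.
      "purchased": features["purchased"],
  }).to_csv("purchase_probs.csv", index=False)
  ```

  A cron entry along the lines of `0 0 1 */3 * python predict.py` would automate the every-three-months cadence.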