Vango is an art marketplace where you can buy original art from independent artists.
Working with their leadership team, I scoped the following challenges to focus on:
- What’s the best timing to engage users?
- Can we predict such an abstract purchase?
- What are purchase habits like?
- How much money can we make?
![Pipeline](images/Screen%20Shot%202016-07-10%20at%2011.07.38%20AM.png)
- Aggregate event-based data from Mixpanel (12GB, 96 events tracking 141 different features) with user-based data from a Postgres database (3GB, 104 tables)
- Explore and process the data
- Engineer features, trying different engagement metrics
- Apply SMOTE to address the class-imbalance issue
- Fit a logistic regression to extract each feature's log-odds impact on purchase
- Grid-search Random Forest models to predict per-user purchase probabilities (see the sketch after this list)
- Set up an automatic script to run every 3 months (a cadence based on the purchase cycles discovered during EDA)
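A minimal sketch of the resampling and modeling steps, using scikit-learn and imbalanced-learn on a synthetic stand-in for the engineered features (the hyperparameter grid and all names here are illustrative, not the project's actual settings):

```python
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the engineered user features; the real target is
# "made a purchase", which is heavily imbalanced (most users never buy).
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.95],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=42)

# SMOTE oversamples the minority (purchaser) class on the training split only,
# so the test split keeps the true class balance.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

# Logistic regression: each coefficient is a feature's log-odds impact on purchase.
logit = LogisticRegression(max_iter=1000).fit(X_res, y_res)
log_odds = pd.Series(logit.coef_[0]).sort_values()

# Grid-searched Random Forest to predict per-user purchase probabilities.
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10, 20]},
    scoring="roc_auc",
    cv=5,
)
grid.fit(X_res, y_res)
purchase_probs = grid.best_estimator_.predict_proba(X_test)[:, 1]
```

Scoring on the untouched test split, rather than on resampled data, keeps the probability estimates honest.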
- What’s the best timing? Now.
- Can we predict such an abstract purchase? Yes.
- What are purchase habits like? The first purchase takes 103 days, repeat purchases come every 108 days, and 90% of users make only one purchase
- How much do they spend? About $250
- How much money can we make? Conservatively, +14.3% revenue
![GridSearched Random Forest](images/Screen%20Shot%202016-07-10%20at%2011.08.18%20AM.png) ![Who should we target](images/Screen%20Shot%202016-07-10%20at%2011.08.04%20AM.png)
-> With the script, it is now possible for them to target the users with the best chances of making a purchase, and to re-run the predictions at any time without retraining the entire model.
- Run the `mixpanel_export.py` script to export JSON files of Mixpanel events (specify the desired dates)
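  A minimal sketch of what this export looks like against Mixpanel's raw data export endpoint; the date range is an example, and the actual script's authentication may differ:

  ```python
  import requests

  API_SECRET = "YOUR_MIXPANEL_API_SECRET"  # placeholder credential

  # Mixpanel's raw export endpoint streams one JSON event per line.
  resp = requests.get(
      "https://data.mixpanel.com/api/2.0/export/",
      params={"from_date": "2016-01-01", "to_date": "2016-07-01"},
      auth=(API_SECRET, ""),  # API secret as the basic-auth username
      stream=True,
  )
  resp.raise_for_status()

  with open("events.json", "w") as f:
      for line in resp.iter_lines():
          if line:
              f.write(line.decode("utf-8") + "\n")
  ```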
- Restore the database dump:

  ```
  psql -c "CREATE DATABASE test_data;"
  psql -d test_data -f /Users/pauloarantes/Drive/galvanize/_capstone/art-project/prod_dump
  ```

  If it doesn't work, try a parallel restore of the directory-format dump (`-Fd`) with 8 jobs (`-j8`):

  ```
  pg_restore --create --clean --if-exists -Fd -j8 --no-owner -Upauloarantes -d test_data /Users/pauloarantes/Drive/galvanize/_capstone/art-project/prod_dump
  ```
- Run `load_json_postgres` to process the Mixpanel export and load it into the database as JSONB objects (a sketch of this step follows)
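  A hypothetical sketch of this loading step with psycopg2 (the table and file names are illustrative):

  ```python
  import json
  import psycopg2
  from psycopg2.extras import Json

  conn = psycopg2.connect(dbname="test_data")
  with conn, conn.cursor() as cur:
      cur.execute("""
          CREATE TABLE IF NOT EXISTS mixpanel_events (
              id    SERIAL PRIMARY KEY,
              event JSONB NOT NULL
          );
      """)
      # The export holds one JSON event per line; parse each line so bad
      # records fail loudly before they reach the database.
      with open("events.json") as f:
          for line in f:
              cur.execute(
                  "INSERT INTO mixpanel_events (event) VALUES (%s)",
                  (Json(json.loads(line)),),
              )
  ```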
- Run `prep-queries.py` to create helper tables in the database, reducing query time and memory use
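  For a sense of what such a helper table could look like, here is a hypothetical pre-aggregation over the raw events (the real tables are presumably tailored to the engagement metrics):

  ```python
  import psycopg2

  # Pre-aggregating per-user event counts once lets downstream queries hit a
  # small table instead of rescanning 12GB of raw JSONB.
  conn = psycopg2.connect(dbname="test_data")
  with conn, conn.cursor() as cur:
      cur.execute("""
          CREATE TABLE user_event_counts AS
          SELECT event -> 'properties' ->> 'distinct_id' AS user_id,
                 event ->> 'event'                       AS event_name,
                 count(*)                                AS n_events
          FROM mixpanel_events
          GROUP BY 1, 2;
      """)
  ```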
- Run `queries.py` to query the database and export two CSVs (`purchases.csv` and `dataset.csv`)
- Run `purchase_cycle_query.py` to query the database and export one CSV (`purchase_cycle.csv`); both export scripts follow the pattern sketched below
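  Both export scripts presumably follow the same query-then-dump pattern; a minimal sketch with pandas and SQLAlchemy, where the query is a placeholder for the real ones:

  ```python
  import pandas as pd
  from sqlalchemy import create_engine

  engine = create_engine("postgresql:///test_data")

  # Placeholder query; the real scripts assemble the purchases, dataset,
  # and purchase_cycle extracts from the helper tables.
  dataset = pd.read_sql("SELECT * FROM user_event_counts", engine)
  dataset.to_csv("dataset.csv", index=False)
  ```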
- I ran `models.py` once to train the best Random Forest classifier, so it does not need to be re-run
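  Training once only pays off if the fitted model is persisted; a sketch with joblib, assuming `grid` is the fitted search from the modeling sketch above (the filename is illustrative):

  ```python
  import joblib

  # Persist the grid-searched forest so predict.py can reload it later
  # without retraining.
  joblib.dump(grid.best_estimator_, "rf_model.pkl")
  ```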
- Run `predict.py` every three months or so to predict on new users. The results land in `purchase_probs.csv` with the user ID, the predicted purchase probability, and whether the user has actually bought
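  A sketch of the quarterly run, assuming the persisted model above and a fresh `dataset.csv`; the column names mirror the description but are otherwise assumptions:

  ```python
  import joblib
  import pandas as pd

  model = joblib.load("rf_model.pkl")
  features = pd.read_csv("dataset.csv")

  pd.DataFrame({
      "user_id": features["user_id"],
      # Probability of the positive (purchase) class for each user.
      "purchase_probability": model.predict_proba(
          features.drop(columns=["user_id", "purchased"])
      )[:, 1],
      # Ground truth where the user has already bought, for sanity-checking.
      "purchased": features["purchased"],
  }).to_csv("purchase_probs.csv", index=False)
  ```

  A cron entry along the lines of `0 0 1 */3 * python predict.py` would automate the every-three-months cadence.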