##Supervised Prediction of High Value Retail E-Commerce Customers Using Survival and LTV Analyses##

###Introduction and Usage:

  • Together, the programs below produce a survival analysis and a Lifetime Value (LTV) analysis of retail e-commerce customers. Once an LTV has been computed for each customer, supervised learning techniques are used to predict high-value customers from user and behavioral attributes. The programs were built around a business model in which a user can "save" a third-party retail item they like to a database and purchase it later when it goes on sale. The retail e-commerce company then receives a commission from that sale.
  • To conduct the full survival analysis, Lifetime Value (LTV) analysis, and prediction of high-value customers, run the programs in the order in which they are listed below.

###frequencies.py

  • Main purpose: To calculate, for each user, the mean and median frequency of use, measured as the number of days between uses, along with the standard deviation of that interval. For instance, a user with a mean frequency of use of 7 days and a standard deviation of 0 days used the product exactly once a week, always on the same day of the week.

  • Input: A .csv file with the following columns:

    • id: This identifies the instance when the item was saved.
    • created_on: The date and time when the item was saved.
    • user_id: Identifies the user who saved the item.
  • Output: A .csv file with the following columns:

    • user_id, mean_freq, median_freq, std_freq, first_use_date, last_use_date, use_count
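
The per-user calculation described above can be sketched as follows. This is a minimal illustration, not the code in frequencies.py: the function name `usage_frequencies` is hypothetical, and it assumes that `created_on` is an ISO-formatted timestamp string, that frequency means the gap in days between consecutive saves, and that the population standard deviation is wanted. Users with a single save have no gaps, so their frequency fields are left as `None`.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median, pstdev


def usage_frequencies(rows):
    """Summarize save activity per user.

    rows: iterable of dicts with 'user_id' and 'created_on' keys
    (assumed ISO timestamps such as '2015-01-01 10:00:00').
    """
    by_user = defaultdict(list)
    for row in rows:
        by_user[row["user_id"]].append(datetime.fromisoformat(row["created_on"]))

    summary = {}
    for user_id, stamps in by_user.items():
        stamps.sort()
        # Gaps between consecutive saves, expressed in days.
        gaps = [(b - a).total_seconds() / 86400 for a, b in zip(stamps, stamps[1:])]
        summary[user_id] = {
            "mean_freq": mean(gaps) if gaps else None,
            "median_freq": median(gaps) if gaps else None,
            "std_freq": pstdev(gaps) if gaps else None,
            "first_use_date": stamps[0],
            "last_use_date": stamps[-1],
            "use_count": len(stamps),
        }
    return summary
```

Rows read with `csv.DictReader` from the input file could be passed to this function directly, since each row is a dict keyed by column name.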

###purchase_info.py

  • Main purpose: To summarize the purchasing habits at a user level.

  • Input: A .csv file with the following columns:

    • id: This identifies the instance when a saved item was purchased.
    • use_id: This identifies the instance when the item was saved.
    • user_id: Identifies the user who saved the item.
    • store_id: Third-party retailer from which the item was purchased.
    • transaction_date: The date and time when the item was purchased.
    • num_items: Number of items purchased.
    • total_order_value: Value at which the user purchased the item.
    • commission_value: The commission received from the purchase.
    • currency: Currency of the value at which the user purchased the item.
  • Output: One row per user with the following columns:

    • user_id, num_items_purch, total_order_value, commission_value, first_purchase_amount, last_purch_date, first_purch_date, most_used_store
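
A user-level purchase summary of this shape could be built roughly as shown below. This is a sketch under stated assumptions rather than the logic of purchase_info.py: the function name `summarize_purchases` is hypothetical, the totals are assumed to be plain sums, `first_purchase_amount` is assumed to be the order value of the earliest transaction, `transaction_date` is assumed to be an ISO-formatted string, and all amounts are assumed to share one currency (no conversion is attempted).

```python
from collections import Counter, defaultdict
from datetime import datetime


def summarize_purchases(rows):
    """Aggregate purchase rows (dicts keyed by the input column names) per user."""
    by_user = defaultdict(list)
    for row in rows:
        by_user[row["user_id"]].append(row)

    summary = {}
    for user_id, purchases in by_user.items():
        # Order each user's purchases chronologically.
        purchases.sort(key=lambda r: datetime.fromisoformat(r["transaction_date"]))
        stores = Counter(r["store_id"] for r in purchases)
        summary[user_id] = {
            "num_items_purch": sum(int(r["num_items"]) for r in purchases),
            "total_order_value": sum(float(r["total_order_value"]) for r in purchases),
            "commission_value": sum(float(r["commission_value"]) for r in purchases),
            "first_purchase_amount": float(purchases[0]["total_order_value"]),
            "first_purch_date": purchases[0]["transaction_date"],
            "last_purch_date": purchases[-1]["transaction_date"],
            # Store with the most purchases by this user.
            "most_used_store": stores.most_common(1)[0][0],
        }
    return summary
```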

###combine_freq_purch_info.py

  • Main purpose: To combine the output from frequencies.py with the output from purchase_info.py at a user level.
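
One way to picture the combination step is a join on `user_id`. The sketch below is illustrative only: the function name `combine_user_tables` is hypothetical, and it assumes a left join that keeps every user who saved an item, with purchase columns set to `None` for users who never purchased. Whether combine_freq_purch_info.py keeps or drops non-purchasers is not stated here.

```python
def combine_user_tables(freq_by_user, purch_by_user, purchase_columns):
    """Left-join per-user purchase summaries onto per-user frequency summaries.

    freq_by_user, purch_by_user: dicts mapping user_id -> dict of columns.
    purchase_columns: names of the purchase columns to carry over.
    """
    combined = {}
    for user_id, freq in freq_by_user.items():
        purch = purch_by_user.get(user_id, {})
        row = {"user_id": user_id, **freq}
        for col in purchase_columns:
            # Users without purchases get None in every purchase column.
            row[col] = purch.get(col)
        combined[user_id] = row
    return combined
```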

###survival_analysis.py

###lifetime_value.py

  • Main purpose: To conduct a customer Lifetime Value (LTV) analysis using the results from the survival analysis.
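
How the LTV is derived from the survival results is not spelled out here. A common formulation, shown below purely as an assumption, sums the expected revenue per period weighted by the probability that the customer is still active in that period, optionally discounted. The function name `lifetime_value` and its parameters are hypothetical and may not match lifetime_value.py.

```python
def lifetime_value(survival_probs, revenue_per_period, discount_rate=0.0):
    """Expected revenue over the horizon covered by survival_probs.

    survival_probs: survival probabilities S(0), S(1), ... per period.
    revenue_per_period: assumed constant expected revenue (for example,
    commission) from an active customer in one period.
    discount_rate: per-period discount rate; 0.0 means no discounting.
    """
    return sum(
        s * revenue_per_period / (1.0 + discount_rate) ** t
        for t, s in enumerate(survival_probs)
    )
```

For example, with survival probabilities `[1.0, 0.5, 0.25]`, a revenue of 10.0 per period, and no discounting, the terms are 10.0, 5.0, and 2.5.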

###model_pred.py

  • Main purpose: To predict high-value customers (using LTV labels) from user attributes and behaviors, such as the amount of the first purchase and the time between first use and first purchase.
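
The two example attributes named above, together with an LTV-based label, could be assembled into a training example along these lines. This is a sketch, not the code in model_pred.py: the function name `build_example` is hypothetical, the dates are assumed to be ISO-formatted strings, and the rule that a customer is high-value when their LTV meets a chosen threshold is an assumption.

```python
from datetime import datetime


def build_example(user, ltv, ltv_threshold):
    """Turn one combined per-user row into (features, label).

    user: dict with 'first_use_date', 'first_purch_date', and
    'first_purchase_amount' (column names taken from the outputs above).
    ltv_threshold: hypothetical cut-off defining a high-value customer.
    """
    first_use = datetime.fromisoformat(user["first_use_date"])
    first_purch = datetime.fromisoformat(user["first_purch_date"])
    features = [
        float(user["first_purchase_amount"]),
        # Days between the first save and the first purchase.
        (first_purch - first_use).total_seconds() / 86400,
    ]
    label = 1 if ltv >= ltv_threshold else 0
    return features, label
```

The resulting feature vectors and 0/1 labels could then be passed to any supervised classifier.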

###plotting.py

  • Main purpose: To plot survival functions, LTVs, and histograms of product use counts.