Created using Colaboratory #8

Open
wants to merge 3 commits into
base: master
352 changes: 351 additions & 1 deletion index.ipynb
@@ -1 +1,351 @@
{"cells": [{"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["# Website A/B Testing - Lab\n", "\n", "## Introduction\n", "\n", "In this lab, you'll get another chance to practice your skills at conducting a full A/B test analysis. It will also be a chance to practice your data exploration and processing skills! The scenario you'll be investigating is data collected from the homepage of a music app page for audacity.\n", "\n", "## Objectives\n", "\n", "You will be able to:\n", "* Analyze the data from a website A/B test to draw relevant conclusions\n", "* Explore and analyze web action data"]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["## Exploratory Analysis\n", "\n", "Start by loading in the dataset stored in the file 'homepage_actions.csv'. Then conduct an exploratory analysis to get familiar with the data."]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["> Hints:\n", " * Start investigating the id column:\n", " * How many viewers also clicked?\n", " * Are there any anomalies with the data; did anyone click who didn't view?\n", " * Is there any overlap between the control and experiment groups? 
\n", " * If so, how do you plan to account for this in your experimental design?"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["#Your code here"]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["## Conduct a Statistical Test\n", "\n", "Conduct a statistical test to determine whether the experimental homepage was more effective than that of the control group."]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["#Your code here"]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["## Verifying Results\n", "\n", "One sensible formulation of the data to answer the hypothesis test above would be to create a binary variable representing each individual in the experiment and control group. This binary variable would represent whether or not that individual clicked on the homepage; 1 for they did and 0 if they did not. \n", "\n", "The variance for the number of successes in a sample of a binomial variable with n observations is given by:\n", "\n", "## $n\\bullet p (1-p)$\n", "\n", "Given this, perform 3 steps to verify the results of your statistical test:\n", "1. Calculate the expected number of clicks for the experiment group, if it had the same click-through rate as that of the control group. \n", "2. Calculate the number of standard deviations that the actual number of clicks was from this estimate. \n", "3. Finally, calculate a p-value using the normal distribution based on this z-score."]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["### Step 1:\n", "Calculate the expected number of clicks for the experiment group, if it had the same click-through rate as that of the control group. 
"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["#Your code here"]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["### Step 2:\n", "Calculate the number of standard deviations that the actual number of clicks was from this estimate."]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["#Your code here"]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["### Step 3: \n", "Finally, calculate a p-value using the normal distribution based on this z-score."]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["#Your code here"]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["### Analysis:\n", "\n", "Does this result roughly match that of the previous statistical test?\n", "\n", "> Comment: **Your analysis here**"]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["## Summary\n", "\n", "In this lab, you continued to get more practice designing and conducting AB tests. This required additional work preprocessing and formulating the initial problem in a suitable manner. Additionally, you also saw how to verify results, strengthening your knowledge of binomial variables, and reviewing initial statistical concepts of the central limit theorem, standard deviation, z-scores, and their accompanying p-values."]}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6"}}, "nbformat": 4, "nbformat_minor": 2}
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "view-in-github"
},
"source": [
"<a href=\"https://colab.research.google.com/github/Sammywarah/dsc-website-ab-testing-lab/blob/master/Website_A_B_Testing_Lab.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ErmSh__9dumC"
},
"source": [
"# Website A/B Testing - Lab\n",
"\n",
"## Introduction\n",
"\n",
"In this lab, you'll get another chance to practice conducting a full A/B test analysis, along with your data exploration and processing skills. The scenario you'll be investigating uses data collected from the homepage of a music app for Audacity.\n",
"\n",
"## Objectives\n",
"\n",
"You will be able to:\n",
"* Analyze the data from a website A/B test to draw relevant conclusions\n",
"* Explore and analyze web action data"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "F1zZd7jcdumF"
},
"source": [
"## Exploratory Analysis\n",
"\n",
"Start by loading in the dataset stored in the file 'homepage_actions.csv'. Then conduct an exploratory analysis to get familiar with the data."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "rYyxHUl2dumF"
},
"source": [
"> Hints:\n",
" * Start investigating the id column:\n",
" * How many viewers also clicked?\n",
" * Are there any anomalies with the data; did anyone click who didn't view?\n",
" * Is there any overlap between the control and experiment groups?\n",
" * If so, how do you plan to account for this in your experimental design?"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "_AYCNLzLfqVD",
"outputId": "1aeef5cc-5cb2-4ab9-df04-de3b0602ef20"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" timestamp id group action\n",
"0 2016-09-24 17:42:27.839496 804196 experiment view\n",
"1 2016-09-24 19:19:03.542569 434745 experiment view\n",
"2 2016-09-24 19:36:00.944135 507599 experiment view\n",
"3 2016-09-24 19:59:02.646620 671993 control view\n",
"4 2016-09-24 20:26:14.466886 536734 experiment view\n",
"Number of viewers who clicked: 1860\n",
"Number of clicks with a matching view: 1860\n",
"Number of users in both control and experiment groups: 0\n"
]
}
],
"source": [
"# Import necessary libraries\n",
"import pandas as pd\n",
"\n",
"# Load the dataset\n",
"df = pd.read_csv('/content/homepage_actions.csv')\n",
"\n",
"# Display the first few rows of the dataset to get an overview\n",
"print(df.head())\n",
"\n",
"# Check the number of viewers who clicked\n",
"viewers_clicked = df[df['action'] == 'click']['id'].nunique()\n",
"print(\"Number of viewers who clicked:\", viewers_clicked)\n",
"\n",
"# Check for anomalies: count click rows whose id also has a view row.\n",
"# Any click rows beyond this count would be clicks without a view.\n",
"clicks_with_view = df[df['action'] == 'click']['id'].isin(df[df['action'] == 'view']['id']).sum()\n",
"print(\"Number of clicks with a matching view:\", clicks_with_view)\n",
"\n",
"# Check for overlap between control and experiment groups\n",
"control_group = df[df['group'] == 'control']['id'].unique()\n",
"experiment_group = df[df['group'] == 'experiment']['id'].unique()\n",
"overlap = len(set(control_group).intersection(experiment_group))\n",
"print(\"Number of users in both control and experiment groups:\", overlap)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4CDr2r2qdumH"
},
"source": [
"## Conduct a Statistical Test\n",
"\n",
"Conduct a statistical test to determine whether the experimental homepage was more effective than that of the control group."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "RKzHiNVpdumI",
"outputId": "c297f67b-5c7c-453d-e449-5cd3450456b4"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Control CTR: 0.2797118847539016\n",
"Experiment CTR: 0.3097463284379172\n",
"Z-score: -2.618563885349469\n",
"P-value: 0.008830075576595804\n"
]
}
],
"source": [
"# Two-proportion z-test on the click-through rates of both groups\n",
"import statsmodels.api as sm\n",
"\n",
"# Calculate the number of viewers and clickers in each group\n",
"control_viewers = df[(df['group'] == 'control') & (df['action'] == 'view')]['id'].nunique()\n",
"control_clickers = df[(df['group'] == 'control') & (df['action'] == 'click')]['id'].nunique()\n",
"\n",
"experiment_viewers = df[(df['group'] == 'experiment') & (df['action'] == 'view')]['id'].nunique()\n",
"experiment_clickers = df[(df['group'] == 'experiment') & (df['action'] == 'click')]['id'].nunique()\n",
"\n",
"# Calculate the click-through rates (CTR) for each group\n",
"control_ctr = control_clickers / control_viewers\n",
"experiment_ctr = experiment_clickers / experiment_viewers\n",
"\n",
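"# Perform the z-test (two-sided by default; control is listed first, so a negative z means the experiment CTR is higher)\n",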
"z_score, p_value = sm.stats.proportions_ztest([control_clickers, experiment_clickers], [control_viewers, experiment_viewers])\n",
"\n",
"# Print the results\n",
"print(\"Control CTR:\", control_ctr)\n",
"print(\"Experiment CTR:\", experiment_ctr)\n",
"print(\"Z-score:\", z_score)\n",
"print(\"P-value:\", p_value)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "od0se620dumI"
},
"source": [
"## Verifying Results\n",
"\n",
"One sensible formulation of the data to answer the hypothesis test above would be to create a binary variable for each individual in the experiment and control groups. This binary variable would represent whether or not that individual clicked on the homepage: 1 if they did and 0 if they did not.\n",
"\n",
"The variance for the number of successes in a sample of a binomial variable with n observations is given by:\n",
"\n",
"## $n\\bullet p (1-p)$\n",
"\n",
"Given this, perform 3 steps to verify the results of your statistical test:\n",
"1. Calculate the expected number of clicks for the experiment group, if it had the same click-through rate as that of the control group.\n",
"2. Calculate the number of standard deviations that the actual number of clicks was from this estimate.\n",
"3. Finally, calculate a p-value using the normal distribution based on this z-score."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7d-83O8xdumI"
},
"source": [
"### Step 1:\n",
"Calculate the expected number of clicks for the experiment group, if it had the same click-through rate as that of the control group."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "i3xfbTuZdumJ",
"outputId": "b4a2633e-441f-4de5-d247-490335efe26a"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Expected number of clicks for the experiment group: 838.0168067226891\n"
]
}
],
"source": [
"# Expected experiment clicks if the experiment group had the control click-through rate\n",
"expected_experiment_clicks = experiment_viewers * control_ctr\n",
"print(\"Expected number of clicks for the experiment group:\", expected_experiment_clicks)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CSfsTH_DdumJ"
},
"source": [
"### Step 2:\n",
"Calculate the number of standard deviations that the actual number of clicks was from this estimate."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "kkan0GH_dumK",
"outputId": "2f8f7abf-c4f0-46b3-f9fe-6517f0a9e1ed"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of standard deviations: 3.6625360854823588\n"
]
}
],
"source": [
"# Binomial standard deviation of the click count under the control click-through rate\n",
"import math\n",
"std_dev_experiment_clicks = math.sqrt(experiment_viewers * control_ctr * (1 - control_ctr))\n",
"\n",
"# Calculate the z-score\n",
"z_score = (experiment_clickers - expected_experiment_clicks) / std_dev_experiment_clicks\n",
"print(\"Number of standard deviations:\", z_score)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1Sg3bV4wdumK"
},
"source": [
"### Step 3:\n",
"Finally, calculate a p-value using the normal distribution based on this z-score."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "_fNg-UO0dumL",
"outputId": "a72b55a4-316e-4818-ef72-728b0b19c18d"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"P-value based on the z-score: 0.00012486528006949715\n"
]
}
],
"source": [
"# One-sided (upper-tail) p-value from the standard normal distribution\n",
"from scipy.stats import norm\n",
"\n",
"# Step 3: Calculate the p-value using the normal distribution\n",
"p_value = 1 - norm.cdf(z_score)\n",
"print(\"P-value based on the z-score:\", p_value)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Yq4YrdhHdumL"
},
"source": [
"### Analysis:\n",
"\n",
"Does this result roughly match that of the previous statistical test?\n",
"\n",
"> Comment: Both tests reject the null hypothesis at the conventional 0.05 significance level, so they agree in direction, but the values differ noticeably: the two-sample z-test gives |z| ≈ 2.62 (p ≈ 0.0088), while the verification gives z ≈ 3.66 (p ≈ 0.00012). The gap is expected. The verification treats the control click-through rate as a known constant and uses a one-sided p-value, whereas `proportions_ztest` pools both groups, accounts for sampling error in the control rate, and reports a two-sided p-value by default.\n",
"\n",
"Despite this difference, both approaches indicate that the experimental homepage had a higher click-through rate than the control homepage."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "mRSWmlBPdumM"
},
"source": [
"## Summary\n",
"\n",
"In this lab, you got more practice designing and conducting A/B tests. This required additional work preprocessing the data and formulating the initial problem in a suitable manner. You also saw how to verify results, strengthening your knowledge of binomial variables and reviewing the central limit theorem, standard deviation, z-scores, and their accompanying p-values."
]
}
],
"metadata": {
"colab": {
"include_colab_link": true,
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
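
Review note: the two z-scores in the notebook (about 2.62 and 3.66) can be reproduced side by side without the CSV. The sketch below is only an illustration, not the notebook's own code. The click and view counts are assumptions back-calculated from the printed CTRs and the expected-click figure (932 of 3332 for control, 928 of 2996 for experiment), and the helper names `normal_sf`, `pooled_two_sample_z`, and `one_sample_z` are hypothetical.

```python
import math

# Assumed counts, back-calculated from the notebook's printed outputs
# (not read from homepage_actions.csv).
CONTROL_CLICKS, CONTROL_VIEWS = 932, 3332
EXPERIMENT_CLICKS, EXPERIMENT_VIEWS = 928, 2996


def normal_sf(z):
    """Upper-tail probability P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def pooled_two_sample_z(x_c, n_c, x_e, n_e):
    """Two-sample z statistic with a pooled proportion (experiment minus control)."""
    p_pool = (x_c + x_e) / (n_c + n_e)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_e))
    return (x_e / n_e - x_c / n_c) / se


def one_sample_z(x_c, n_c, x_e, n_e):
    """z statistic that treats the control CTR as a known, fixed rate."""
    p_c = x_c / n_c
    expected_clicks = n_e * p_c
    std_dev = math.sqrt(n_e * p_c * (1 - p_c))
    return (x_e - expected_clicks) / std_dev


z_pooled = pooled_two_sample_z(CONTROL_CLICKS, CONTROL_VIEWS,
                               EXPERIMENT_CLICKS, EXPERIMENT_VIEWS)
z_fixed = one_sample_z(CONTROL_CLICKS, CONTROL_VIEWS,
                       EXPERIMENT_CLICKS, EXPERIMENT_VIEWS)

print("Pooled two-sample z:", z_pooled)
print("  two-sided p:", 2 * normal_sf(abs(z_pooled)))
print("  one-sided p:", normal_sf(z_pooled))
print("Fixed-control-rate z:", z_fixed)
print("  one-sided p:", normal_sf(z_fixed))
```

The pooled statistic is smaller because its standard error also reflects sampling noise in the control rate. Treating the control CTR as fixed removes that term, which is why the verification step looks more significant than the `proportions_ztest` result.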