learn-co-curriculum · Sammywarah · Apr 1, 2024 · Apr 1, 2024 · Apr 1, 2024
diff --git a/index.ipynb b/index.ipynb
@@ -1 +1,351 @@
-{"cells": [{"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["# Website A/B Testing - Lab\n", "\n", "## Introduction\n", "\n", "In this lab, you'll get another chance to practice your skills at conducting a full A/B test analysis. It will also be a chance to practice your data exploration and processing skills! The scenario you'll be investigating is data collected from the homepage of a music app page for audacity.\n", "\n", "## Objectives\n", "\n", "You will be able to:\n", "* Analyze the data from a website A/B test to draw relevant conclusions\n", "* Explore and analyze web action data"]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["## Exploratory Analysis\n", "\n", "Start by loading in the dataset stored in the file 'homepage_actions.csv'. Then conduct an exploratory analysis to get familiar with the data."]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["> Hints:\n", "    * Start investigating the id column:\n", "        * How many viewers also clicked?\n", "        * Are there any anomalies with the data; did anyone click who didn't view?\n", "        * Is there any overlap between the control and experiment groups? \n", "            * If so, how do you plan to account for this in your experimental design?"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["#Your code here"]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["## Conduct a Statistical Test\n", "\n", "Conduct a statistical test to determine whether the experimental homepage was more effective than that of the control group."]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["#Your code here"]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["## Verifying Results\n", "\n", "One sensible formulation of the data to answer the hypothesis test above would be to create a binary variable representing each individual in the experiment and control group. This binary variable would represent whether or not that individual clicked on the homepage; 1 for they did and 0 if they did not. \n", "\n", "The variance for the number of successes in a sample of a binomial variable with n observations is given by:\n", "\n", "## $n\\bullet p (1-p)$\n", "\n", "Given this, perform 3 steps to verify the results of your statistical test:\n", "1. Calculate the expected number of clicks for the experiment group, if it had the same click-through rate as that of the control group. \n", "2. Calculate the number of standard deviations that the actual number of clicks was from this estimate. \n", "3. Finally, calculate a p-value using the normal distribution based on this z-score."]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["### Step 1:\n", "Calculate the expected number of clicks for the experiment group, if it had the same click-through rate as that of the control group. "]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["#Your code here"]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["### Step 2:\n", "Calculate the number of standard deviations that the actual number of clicks was from this estimate."]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["#Your code here"]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["### Step 3: \n", "Finally, calculate a p-value using the normal distribution based on this z-score."]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["#Your code here"]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["### Analysis:\n", "\n", "Does this result roughly match that of the previous statistical test?\n", "\n", "> Comment: **Your analysis here**"]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["## Summary\n", "\n", "In this lab, you continued to get more practice designing and conducting AB tests. This required additional work preprocessing and formulating the initial problem in a suitable manner. Additionally, you also saw how to verify results, strengthening your knowledge of binomial variables, and reviewing initial statistical concepts of the central limit theorem, standard deviation, z-scores, and their accompanying p-values."]}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6"}}, "nbformat": 4, "nbformat_minor": 2}
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/Sammywarah/dsc-website-ab-testing-lab/blob/master/Website_A_B_Testing_Lab.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "ErmSh__9dumC"
+      },
+      "source": [
+        "# Website A/B Testing - Lab\n",
+        "\n",
+        "## Introduction\n",
+        "\n",
+        "In this lab, you'll get another chance to practice your skills at conducting a full A/B test analysis. It will also be a chance to practice your data exploration and processing skills! The scenario you'll be investigating is data collected from the homepage of a music app page for audacity.\n",
+        "\n",
+        "## Objectives\n",
+        "\n",
+        "You will be able to:\n",
+        "* Analyze the data from a website A/B test to draw relevant conclusions\n",
+        "* Explore and analyze web action data"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "F1zZd7jcdumF"
+      },
+      "source": [
+        "## Exploratory Analysis\n",
+        "\n",
+        "Start by loading in the dataset stored in the file 'homepage_actions.csv'. Then conduct an exploratory analysis to get familiar with the data."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "rYyxHUl2dumF"
+      },
+      "source": [
+        "> Hints:\n",
+        "    * Start investigating the id column:\n",
+        "        * How many viewers also clicked?\n",
+        "        * Are there any anomalies with the data; did anyone click who didn't view?\n",
+        "        * Is there any overlap between the control and experiment groups?\n",
+        "            * If so, how do you plan to account for this in your experimental design?"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 1,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "_AYCNLzLfqVD",
+        "outputId": "1aeef5cc-5cb2-4ab9-df04-de3b0602ef20"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "                    timestamp      id       group action\n",
+            "0  2016-09-24 17:42:27.839496  804196  experiment   view\n",
+            "1  2016-09-24 19:19:03.542569  434745  experiment   view\n",
+            "2  2016-09-24 19:36:00.944135  507599  experiment   view\n",
+            "3  2016-09-24 19:59:02.646620  671993     control   view\n",
+            "4  2016-09-24 20:26:14.466886  536734  experiment   view\n",
+            "Number of viewers who clicked: 1860\n",
+            "Number of clicks without view: 1860\n",
+            "Number of users in both control and experiment groups: 0\n"
+          ]
+        }
+      ],
+      "source": [
+        "# Import necessary libraries\n",
+        "import pandas as pd\n",
+        "\n",
+        "# Load the dataset\n",
+        "df = pd.read_csv('/content/homepage_actions.csv')\n",
+        "\n",
+        "# Display the first few rows of the dataset to get an overview\n",
+        "print(df.head())\n",
+        "\n",
+        "# Check the number of viewers who clicked\n",
+        "viewers_clicked = df[df['action'] == 'click']['id'].nunique()\n",
+        "print(\"Number of viewers who clicked:\", viewers_clicked)\n",
+        "\n",
+        "# Check for any anomalies - viewers who clicked without viewing\n",
+        "click_without_view = df[df['action'] == 'click']['id'].isin(df[df['action'] == 'view']['id']).sum()\n",
+        "print(\"Number of clicks without view:\", click_without_view)\n",
+        "\n",
+        "# Check for overlap between control and experiment groups\n",
+        "control_group = df[df['group'] == 'control']['id'].unique()\n",
+        "experiment_group = df[df['group'] == 'experiment']['id'].unique()\n",
+        "overlap = len(set(control_group).intersection(experiment_group))\n",
+        "print(\"Number of users in both control and experiment groups:\", overlap)\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "4CDr2r2qdumH"
+      },
+      "source": [
+        "## Conduct a Statistical Test\n",
+        "\n",
+        "Conduct a statistical test to determine whether the experimental homepage was more effective than that of the control group."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 3,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "RKzHiNVpdumI",
+        "outputId": "c297f67b-5c7c-453d-e449-5cd3450456b4"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "Control CTR: 0.2797118847539016\n",
+            "Experiment CTR: 0.3097463284379172\n",
+            "Z-score: -2.618563885349469\n",
+            "P-value: 0.008830075576595804\n"
+          ]
+        }
+      ],
+      "source": [
+        "#Your code here\n",
+        "import statsmodels.api as sm\n",
+        "\n",
+        "# Calculate the number of viewers and clickers in each group\n",
+        "control_viewers = df[(df['group'] == 'control') & (df['action'] == 'view')]['id'].nunique()\n",
+        "control_clickers = df[(df['group'] == 'control') & (df['action'] == 'click')]['id'].nunique()\n",
+        "\n",
+        "experiment_viewers = df[(df['group'] == 'experiment') & (df['action'] == 'view')]['id'].nunique()\n",
+        "experiment_clickers = df[(df['group'] == 'experiment') & (df['action'] == 'click')]['id'].nunique()\n",
+        "\n",
+        "# Calculate the click-through rates (CTR) for each group\n",
+        "control_ctr = control_clickers / control_viewers\n",
+        "experiment_ctr = experiment_clickers / experiment_viewers\n",
+        "\n",
+        "# Perform the z-test\n",
+        "z_score, p_value = sm.stats.proportions_ztest([control_clickers, experiment_clickers], [control_viewers, experiment_viewers])\n",
+        "\n",
+        "# Print the results\n",
+        "print(\"Control CTR:\", control_ctr)\n",
+        "print(\"Experiment CTR:\", experiment_ctr)\n",
+        "print(\"Z-score:\", z_score)\n",
+        "print(\"P-value:\", p_value)\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "od0se620dumI"
+      },
+      "source": [
+        "## Verifying Results\n",
+        "\n",
+        "One sensible formulation of the data to answer the hypothesis test above would be to create a binary variable representing each individual in the experiment and control group. This binary variable would represent whether or not that individual clicked on the homepage; 1 for they did and 0 if they did not.\n",
+        "\n",
+        "The variance for the number of successes in a sample of a binomial variable with n observations is given by:\n",
+        "\n",
+        "## $n\\bullet p (1-p)$\n",
+        "\n",
+        "Given this, perform 3 steps to verify the results of your statistical test:\n",
+        "1. Calculate the expected number of clicks for the experiment group, if it had the same click-through rate as that of the control group.\n",
+        "2. Calculate the number of standard deviations that the actual number of clicks was from this estimate.\n",
+        "3. Finally, calculate a p-value using the normal distribution based on this z-score."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "7d-83O8xdumI"
+      },
+      "source": [
+        "### Step 1:\n",
+        "Calculate the expected number of clicks for the experiment group, if it had the same click-through rate as that of the control group."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 4,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "i3xfbTuZdumJ",
+        "outputId": "b4a2633e-441f-4de5-d247-490335efe26a"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "Expected number of clicks for the experiment group: 838.0168067226891\n"
+          ]
+        }
+      ],
+      "source": [
+        "#Your code here\n",
+        "expected_experiment_clicks = experiment_viewers * control_ctr\n",
+        "print(\"Expected number of clicks for the experiment group:\", expected_experiment_clicks)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "CSfsTH_DdumJ"
+      },
+      "source": [
+        "### Step 2:\n",
+        "Calculate the number of standard deviations that the actual number of clicks was from this estimate."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 6,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "kkan0GH_dumK",
+        "outputId": "2f8f7abf-c4f0-46b3-f9fe-6517f0a9e1ed"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "Number of standard deviations: 3.6625360854823588\n"
+          ]
+        }
+      ],
+      "source": [
+        "#Your code here\n",
+        "import math\n",
+        "std_dev_experiment_clicks = math.sqrt(experiment_viewers * control_ctr * (1 - control_ctr))\n",
+        "\n",
+        "# Calculate the z-score\n",
+        "z_score = (experiment_clickers - expected_experiment_clicks) / std_dev_experiment_clicks\n",
+        "print(\"Number of standard deviations:\", z_score)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "1Sg3bV4wdumK"
+      },
+      "source": [
+        "### Step 3:\n",
+        "Finally, calculate a p-value using the normal distribution based on this z-score."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 7,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "_fNg-UO0dumL",
+        "outputId": "a72b55a4-316e-4818-ef72-728b0b19c18d"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "P-value based on the z-score: 0.00012486528006949715\n"
+          ]
+        }
+      ],
+      "source": [
+        "#Your code here\n",
+        "from scipy.stats import norm\n",
+        "\n",
+        "# Step 3: Calculate the p-value using the normal distribution\n",
+        "p_value = 1 - norm.cdf(z_score)\n",
+        "print(\"P-value based on the z-score:\", p_value)\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Yq4YrdhHdumL"
+      },
+      "source": [
+        "### Analysis:\n",
+        "\n",
+        "Does this result roughly match that of the previous statistical test?\n",
+        "\n",
+        "> Comment: While the values are not exactly the same, they are both relatively low (less than the conventional significance level of 0.05), indicating statistical significance. Additionally, both p-values suggest evidence against the null hypothesis, which is consistent with each other.\n",
+        "\n",
+        "Therefore, despite the slight discrepancy in the exact values, the results from both approaches align in indicating that the experimental homepage was more effective than the control group."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "mRSWmlBPdumM"
+      },
+      "source": [
+        "## Summary\n",
+        "\n",
+        "In this lab, you continued to get more practice designing and conducting AB tests. This required additional work preprocessing and formulating the initial problem in a suitable manner. Additionally, you also saw how to verify results, strengthening your knowledge of binomial variables, and reviewing initial statistical concepts of the central limit theorem, standard deviation, z-scores, and their accompanying p-values."
+      ]
+    }
+  ],
+  "metadata": {
+    "colab": {
+      "include_colab_link": true,
+      "provenance": []
+    },
+    "kernelspec": {
+      "display_name": "Python 3",
+      "language": "python",
+      "name": "python3"
+    },
+    "language_info": {
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "file_extension": ".py",
+      "mimetype": "text/x-python",
+      "name": "python",
+      "nbconvert_exporter": "python",
+      "pygments_lexer": "ipython3",
+      "version": "3.6.6"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 0
+}