From af67e9e970b3bd9b6ebd4869917094243b59ac3d Mon Sep 17 00:00:00 2001 From: Nupur Lal Date: Tue, 28 Jan 2025 11:57:39 +0000 Subject: [PATCH 1/3] new recipe notebook for attribution function --- .../ClearScape_Functions/Attribution.ipynb | 494 ++++++++++++++++++ 1 file changed, 494 insertions(+) create mode 100644 Recipes/ClearScape_Functions/Attribution.ipynb diff --git a/Recipes/ClearScape_Functions/Attribution.ipynb b/Recipes/ClearScape_Functions/Attribution.ipynb new file mode 100644 index 00000000..4234f88c --- /dev/null +++ b/Recipes/ClearScape_Functions/Attribution.ipynb @@ -0,0 +1,494 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "bc549e6c-0cc4-4188-94a3-a9bdd3ae3dfa", + "metadata": {}, + "source": [ + "
\n", + "

\n", + " Attribution function in Vantage\n", + "
\n", + " \"Teradata\"\n", + "

\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "7ae7611a-0795-4168-b716-01fee6880cbd", + "metadata": {}, + "source": [ + "

Introduction

\n", + "

Attribution refers to the process of assigning credit or responsibility to a specific event or entity that contributes to an outcome of interest. Specifically, the Attribution function is used for web page analysis, which refers to the process of assigning value or credit to different pages on a website for specific actions taken by visitors, such as making a purchase or filling out a form. The goal of attribution is to identify the most effective pages or content on a website that contribute to achieving business goals. By assigning weights or credit to different pages, organizations can optimize their website by improving or eliminating underperforming pages and investing more resources into the most effective ones. Attribution can be done using various methods, including rule-based attribution and data-driven attribution.
In this notebook we will see how we can use the Attribution function available in Vantage.

" + ] + }, + { + "cell_type": "markdown", + "id": "6b3a00b4-6661-4c91-9b2d-cb7b0b403140", + "metadata": {}, + "source": [ + "
\n", + "1. Initiate a connection to Vantage" + ] + }, + { + "cell_type": "markdown", + "id": "2346857f-e0d3-488a-8a3f-ac6dff752c2b", + "metadata": {}, + "source": [ + "

In this section, we import the required libraries and set environment variables and environment paths (if required)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c5af5af3-29d5-4f6a-8334-9df6924e7787", + "metadata": {}, + "outputs": [], + "source": [ + "from teradataml import *\n", + "\n", + "# Modify the following to match the specific client environment settings\n", + "display.max_rows = 5" + ] + }, + { + "cell_type": "markdown", + "id": "ad3dd7b4-831c-4fb3-ab71-719c8c99a71c", + "metadata": {}, + "source": [ + "


\n", + "

1.1 Connect to Vantage

\n", + "

You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2742444c-4349-4b0f-b4e5-b068a8785cd9", + "metadata": {}, + "outputs": [], + "source": [ + "%run -i ../../UseCases/startup.ipynb\n", + "eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)\n", + "print(eng)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e14915b0-7932-4e03-94ba-20f0599c3707", + "metadata": {}, + "outputs": [], + "source": [ + "%%capture\n", + "execute_sql('''SET query_band='DEMO=PP_Attribution_Python.ipynb;' UPDATE FOR SESSION; ''')" + ] + }, + { + "cell_type": "markdown", + "id": "efe2fd2d-63ff-4278-9157-8b9110d682e8", + "metadata": {}, + "source": [ + "

Begin running steps with Shift + Enter keys.

" + ] + }, + { + "cell_type": "markdown", + "id": "f003f332-7489-4bdd-a740-4af2a0a22280", + "metadata": {}, + "source": [ + "
\n", + "\n", + "

1.2 Getting Data for This Demo

\n", + "\n", + "

We have provided data for this demo on cloud storage. You can either run the demo using foreign tables, which access the data without using any storage in your environment, or download the data to local storage, which may yield faster execution but consumes available storage. The following cell contains two statements, one of which is commented out. To switch modes, change which statement is commented out.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "45c86176-734c-4b1c-ace0-d0c88657b4f8", + "metadata": {}, + "outputs": [], + "source": [ + "%run -i ../../UseCases/run_procedure.py \"call get_data('DEMO_Financial_cloud');\" # Takes 30 seconds\n", + "#%run -i ../../UseCases/run_procedure.py \"call get_data('DEMO_Financial_local');\" " + ] + }, + { + "cell_type": "markdown", + "id": "2401d6d3-4fcd-46fc-8a94-7cafcd1258b0", + "metadata": {}, + "source": [ + "

The next step is optional: run it if you want to see the status of the databases and tables created and the space used.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "87429200-db02-450d-9472-4d1e2030124d", + "metadata": {}, + "outputs": [], + "source": [ + "%run -i ../../UseCases/run_procedure.py \"call space_report();\" # Takes 10 seconds" + ] + }, + { + "cell_type": "markdown", + "id": "2a3762ac-ba27-4fa3-adba-d577262a4290", + "metadata": {}, + "source": [ + "
\n", + "2. Data Exploration\n", + "

Create a \"Virtual DataFrame\" that points to the data set in Vantage. Check the shape of the dataframe as well as the datatypes of all its columns.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3d936fab-7ca7-4e94-ba64-95c1da08b74f", + "metadata": {}, + "outputs": [], + "source": [ + "tdf = DataFrame(in_schema('DEMO_Financial', 'Customer_journey'))\n", + "print(\"Shape of the data: \", tdf.shape)\n", + "tdf" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3c5a0992-f651-49bc-9080-828fd9c0c982", + "metadata": {}, + "outputs": [], + "source": [ + "tdf.tdtypes" + ] + }, + { + "cell_type": "markdown", + "id": "6a94af31-89fb-4c0e-b478-c2c86f92539d", + "metadata": {}, + "source": [ + "

The Attribution function takes data and parameters from multiple tables and outputs attributions. Please refer to the Teradata Vantage™ - Analytics Database Analytic Functions documentation for more on the Attribution function, or use help(Attribution).

\n", + "\n", + "

Attribution Input :\n", + "

    \n", + "
  1. Input tables (maximum of five) (Contain data for computing attributions).
  2. \n", + "
  3. ConversionEventTable (Contains conversion events).
  4. \n", + "
  5. FirstModelTable (Defines type and distributions of model - we'll create one table per model)
\n", + "

\n", + "\n", + "

Attribution Syntax Elements:\n", + "

    \n", + "
  1. EventColumn specifies the name of the input column that contains the events.
  2. \n", + "
  3. TimeColumn specifies the name of the input column that contains the timestamps of the events.
  4. \n", + "
  5. WindowSize specifies how to determine the maximum window size for the attribution calculation
\n", + "
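To make the row-based WindowSize setting more concrete, here is a minimal plain-Python sketch (not the Vantage implementation) of how a `rows:10` window could select the rows considered for each conversion. It assumes that one customer's events are already ordered by timestamp and that every row in the window is a candidate. The helper `rows_window` and the event name `BROWSE` are hypothetical and used only for illustration.

```python
# Conversion event names taken from this notebook's conversion_events table.
CONVERSION_EVENTS = {'ACCOUNT_BOOKED_ONLINE', 'ACCOUNT_BOOKED_OFFLINE'}


def rows_window(events, max_rows=10):
    # events: one customer's event names, ordered by timestamp (assumption).
    # Returns {conversion_index: [indices of up to max_rows preceding rows]}.
    windows = {}
    for i, event in enumerate(events):
        if event in CONVERSION_EVENTS:
            windows[i] = list(range(max(0, i - max_rows), i))
    return windows


# Hypothetical journey: 12 browsing events followed by one conversion.
journey = ['BROWSE'] * 12 + ['ACCOUNT_BOOKED_ONLINE']
windows = rows_window(journey)
print(windows)
```

Only the ten rows immediately before the conversion fall inside the window; the two earliest rows of the journey are ignored.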

\n", + "\n", + "

We will create the model and conversion tables that allow us to send large numbers of parameters programmatically to the Attribution analytic function.

\n", + "\n", + "

Note: We use SQL statements to create the required tables and insert the values they need.

" + ] + }, + { + "cell_type": "markdown", + "id": "502f4e4a-b5cc-4fea-b76e-c367ba487a33", + "metadata": {}, + "source": [ + "

Detailed help is available by passing the function name to the built-in help function.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1e6fa334-3aa1-407a-9788-e4a5a43c8013", + "metadata": {}, + "outputs": [], + "source": [ + "help(Attribution)" + ] + }, + { + "cell_type": "markdown", + "id": "f8111d7e-7404-4a7e-be5d-aca35d05861d", + "metadata": {}, + "source": [ + "

As input to our Attribution function, let's create the conversion event table (that is, the events we want to track) and the model table for the type of attribution model we want to use.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4fccb96f-6650-410a-b1f7-6f58bcddcf38", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "query1 = '''\n", + "CREATE MULTISET TABLE conversion_events (\n", + " conversion_event VARCHAR(55)\n", + ")\n", + "NO PRIMARY INDEX;\n", + "'''\n", + "query2 = '''\n", + "INSERT INTO conversion_events VALUES('ACCOUNT_BOOKED_ONLINE');\n", + "INSERT INTO conversion_events VALUES('ACCOUNT_BOOKED_OFFLINE');\n", + "'''\n", + "execute_sql(query1)\n", + "execute_sql(query2)" + ] + }, + { + "cell_type": "markdown", + "id": "ba1fdb98-0928-4a71-b15f-579f30f12ab5", + "metadata": {}, + "source": [ + "

In our conversion event table, we added the 'ACCOUNT_BOOKED_ONLINE' and 'ACCOUNT_BOOKED_OFFLINE' events because we want to assign attribution scores based on these events. The next step is to define the attribution model.

\n", + "

The following methods are available in Attribution:\n", + "

\n", + "

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "acf86530-4119-4bb3-87c1-51a279bd80e8", + "metadata": {}, + "outputs": [], + "source": [ + "query1 = '''\n", + "CREATE MULTISET TABLE attribution_model (\n", + " id INTEGER,\n", + " model VARCHAR(100)\n", + ")\n", + "NO PRIMARY INDEX;\n", + "'''\n", + "query2 = '''\n", + "INSERT INTO attribution_model VALUES(0,'SEGMENT_ROWS');\n", + "INSERT INTO attribution_model VALUES(1,'3:0.5:UNIFORM:NA');\n", + "INSERT INTO attribution_model VALUES(2,'4:0.3:LAST_CLICK:NA');\n", + "INSERT INTO attribution_model VALUES(3,'3:0.2:FIRST_CLICK:NA');\n", + "'''\n", + "execute_sql(query1)\n", + "execute_sql(query2)" + ] + }, + { + "cell_type": "markdown", + "id": "4f52f8eb-717c-48dc-a7e3-20f5418d5cf1", + "metadata": {}, + "source": [ + "

In the attribution model above, the attribution score is assigned as follows. Attribution for a conversion event is divided among the attributable events in the 10 rows immediately preceding the conversion event.
If the conversion event is in row 11, the first model specification applies to rows 10, 9, and 8; the second applies to rows 7, 6, 5, and 4; and the third applies to rows 3, 2, and 1.
\n", + "Half attribution (5/10) is uniformly divided among rows 10, 9, and 8; 3/10 to last click in rows 7, 6, 5, and 4 (that is, in row 7), and 2/10 to first click in rows 3, 2, and 1." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "76b40377-f96c-42c8-92ed-62765d3a3b69", + "metadata": {}, + "outputs": [], + "source": [ + "ConvEvent_df = DataFrame(\"conversion_events\")\n", + "FirstModel_df = DataFrame(\"attribution_model\") " + ] + }, + { + "cell_type": "markdown", + "id": "8329bfce-3c00-419b-8d45-867e94b3e1ef", + "metadata": {}, + "source": [ + "
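The SEGMENT_ROWS split described for the attribution_model table can be checked with a small plain-Python calculation. This is only a sketch of the arithmetic, not how Vantage computes it: it assumes all ten rows before the conversion are attributable, and the helper `segment_rows_weights` is a hypothetical name introduced here.

```python
def segment_rows_weights(conversion_row, segments):
    # segments: model strings in the form 'rows:weight:MODEL:params',
    # applied in order, starting from the row just before the conversion.
    weights = {}
    next_row = conversion_row - 1
    for spec in segments:
        rows_str, weight_str, model, _params = spec.split(':')
        n, w = int(rows_str), float(weight_str)
        rows = list(range(next_row, next_row - n, -1))  # most recent row first
        next_row -= n
        if model == 'UNIFORM':
            for r in rows:
                weights[r] = w / n      # equal share for every row in the segment
        elif model == 'LAST_CLICK':
            weights[rows[0]] = w        # most recent row in the segment
        elif model == 'FIRST_CLICK':
            weights[rows[-1]] = w       # earliest row in the segment
        else:
            raise ValueError('unsupported model in this sketch: ' + model)
    return weights


weights = segment_rows_weights(
    11,
    ['3:0.5:UNIFORM:NA', '4:0.3:LAST_CLICK:NA', '3:0.2:FIRST_CLICK:NA'],
)
for row in sorted(weights, reverse=True):
    print(row, round(weights[row], 4))
```

Rows 10, 9, and 8 each receive one third of 0.5, row 7 receives 0.3, row 1 receives 0.2, and the weights sum to 1.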

The Attribution function requires the Event column to use the LATIN character set, so we convert that column from UNICODE to LATIN.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b2d7528a-b158-47fe-912f-864b4673afd6", + "metadata": {}, + "outputs": [], + "source": [ + "from teradataml import ConvertTo\n", + "converted_data = ConvertTo(data = tdf,\n", + " target_columns = ['interaction_type'],\n", + " target_datatype = [\"VARCHAR(charlen=100,charset=LATIN,casespecific=NO)\"])\n", + "convert_tdf=converted_data.result\n", + "convert_tdf.to_sql('convert_tdf')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f24ca149-ed21-4aa9-a755-31282af91acd", + "metadata": {}, + "outputs": [], + "source": [ + "attribution_out = Attribution(data=DataFrame('convert_tdf'),\n", + " data_partition_column=\"customer_identifier\",\n", + " data_order_column=\"interaction_timestamp\",\n", + " event_column=\"interaction_type\",\n", + " conversion_data=ConvEvent_df,\n", + " timestamp_column = \"interaction_timestamp\",\n", + " window_size = \"rows:10\",\n", + " model1_type=FirstModel_df)\n", + "\n", + "Attrdf = attribution_out.result\n", + "Attrdf.sort('attribution',ascending=False)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "abd032b6-ea5b-4485-9f00-53a06e1025db", + "metadata": {}, + "outputs": [], + "source": [ + "print(attribution_out.show_query())" + ] + }, + { + "cell_type": "markdown", + "id": "151d5db4-29a9-49d9-8a61-d53f9627a294", + "metadata": {}, + "source": [ + "
\n", + "2. Cleanup" + ] + }, + { + "cell_type": "markdown", + "id": "a562f058-fb24-4966-a25d-f2960e6ddfb8", + "metadata": {}, + "source": [ + "
\n", + "

Databases and Tables

\n", + "

The following code will clean up tables and databases created above.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "baa36112-f1fe-4c1a-a47d-c7709ec0f051", + "metadata": {}, + "outputs": [], + "source": [ + "tables = ['conversion_events', 'attribution_model', 'convert_tdf']\n", + "\n", + "# Loop through the list of tables and execute the drop table command for each table\n", + "for table in tables:\n", + " try:\n", + " db_drop_table(table_name = table, schema_name = 'demo_user') \n", + " except:\n", + " pass" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e6b3935b-47c2-4a96-bec2-68106d172116", + "metadata": {}, + "outputs": [], + "source": [ + "%run -i ../../UseCases/run_procedure.py \"call remove_data('DEMO_Financial_cloud');\" # Takes 10 seconds" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "157fe3d4-4e0e-4d92-b343-9f758f3bf690", + "metadata": {}, + "outputs": [], + "source": [ + "remove_context()" + ] + }, + { + "cell_type": "markdown", + "id": "4317a6cf-1479-4aa8-b30a-ee0a3b5231a8", + "metadata": {}, + "source": [ + "
\n", + "Dataset:\n", + "\n", + "`Customer_Journey`\n", + "\n", + "- `customer_skey`: customer key\n", + "- `customer_identifier`: unique customer identifier\n", + "- `customer_cookie`: cookie placed on customers device\n", + "- `customer_online_id`: boolean - does the customer have an online account\n", + "- `customer_offline_id`: customer account number\n", + "- `customer_type`: is this a high value customer or just a visitor browsing the website?\n", + "- `customer_days_active`: how long has the customer been active\n", + "- `interaction_session_number`: session identifier\n", + "- `interaction_timestamp`: timestamp for this event\n", + "- `interaction_source`: channel this event is from (online / offline, in branch etc.)\n", + "- `interaction_type`: type of event\n", + "- `sales_channel`: channel a sales event was in\n", + "- `conversion_id`: sales conversion identifier\n", + "- `product_category`: what type of product the event concerned (checking, savings, cd etc..)\n", + "- `product_type`: unused\n", + "- `conversion_sales`: unused\n", + "- `conversion_cost`: unused\n", + "- `conversion_margin`: unused\n", + "- `conversion_units`: unused\n", + "- `marketing_code`: marketing identifier\n", + "- `marketing_category`: marketing channel (inbranch, website, email etc..)\n", + "- `marketing_description`: marketing campaign name\n", + "- `marketing_placement`: specific marketing outlet (Google, Bloomberg.com etc..)\n", + "- `mobile_flag`: boolean was on a mobile device\n", + "- `updt`: unused\n", + "\n", + "

Links:

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "b2dcca28-5de5-44d7-88cb-45a12153b3f8", + "metadata": {}, + "source": [ + "" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} From 83e20e3b5dd074f296bb3da6ef75023546f385ce Mon Sep 17 00:00:00 2001 From: Nupur Lal Date: Mon, 3 Feb 2025 07:41:39 +0000 Subject: [PATCH 2/3] updates --- Recipes/ClearScape_Functions/Attribution.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Recipes/ClearScape_Functions/Attribution.ipynb b/Recipes/ClearScape_Functions/Attribution.ipynb index 4234f88c..d2d5ef25 100644 --- a/Recipes/ClearScape_Functions/Attribution.ipynb +++ b/Recipes/ClearScape_Functions/Attribution.ipynb @@ -398,7 +398,7 @@ "metadata": {}, "outputs": [], "source": [ - "%run -i ../../UseCases/run_procedure.py \"call remove_data('DEMO_Financial_cloud');\" # Takes 10 seconds" + "%run -i ../../UseCases/run_procedure.py \"call remove_data('DEMO_Financial');\" # Takes 10 seconds" ] }, { From 675382e64b9e833fecaf5d179b3c32151470a7ba Mon Sep 17 00:00:00 2001 From: Nupur Lal Date: Tue, 4 Feb 2025 06:00:51 +0000 Subject: [PATCH 3/3] correction in numbering --- Recipes/ClearScape_Functions/Attribution.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Recipes/ClearScape_Functions/Attribution.ipynb b/Recipes/ClearScape_Functions/Attribution.ipynb index d2d5ef25..0f5465f7 100644 --- a/Recipes/ClearScape_Functions/Attribution.ipynb +++ b/Recipes/ClearScape_Functions/Attribution.ipynb @@ -361,7 +361,7 @@ "metadata": {}, "source": [ "
\n", - "2. Cleanup" + "3. Cleanup" ] }, {