diff --git a/Recipes/ClearScape_Functions/OneClassSVMandOneClassSVMPredict.ipynb b/Recipes/ClearScape_Functions/OneClassSVMandOneClassSVMPredict.ipynb new file mode 100644 index 00000000..d61a518c --- /dev/null +++ b/Recipes/ClearScape_Functions/OneClassSVMandOneClassSVMPredict.ipynb @@ -0,0 +1,409 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "bc549e6c-0cc4-4188-94a3-a9bdd3ae3dfa", + "metadata": {}, + "source": [ + "
\n", + "

\n", + " OneClassSVM and OneClassSVMPredict functions in Vantage\n", + "
\n", + " \"Teradata\"\n", + "

\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "7ae7611a-0795-4168-b716-01fee6880cbd", + "metadata": {}, + "source": [ + "

Introduction

\n", + "

OneClassSVM is a linear support vector machine (SVM) that performs classification analysis on data sets to identify outliers or novelty in the data. This function supports the Classification (loss: hinge) model. During the training, all the data is assumed to belong to a single class (value 1), therefore ResponseColumn is not needed by the model. For OneClassSVMPredict, output values are 0 or 1. A value of 0 corresponds to an outlier, and 1 to a normal observation or instance.
In this notebook we will see how we can use the OneClassSVM and OneClassSVMPredict functions available in Vantage.

" + ] + }, + { + "cell_type": "markdown", + "id": "6b3a00b4-6661-4c91-9b2d-cb7b0b403140", + "metadata": {}, + "source": [ + "
\n", + "1. Initiate a connection to Vantage" + ] + }, + { + "cell_type": "markdown", + "id": "2346857f-e0d3-488a-8a3f-ac6dff752c2b", + "metadata": {}, + "source": [ + "

In the section, we import the required libraries and set environment variables and environment paths (if required)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c5af5af3-29d5-4f6a-8334-9df6924e7787", + "metadata": {}, + "outputs": [], + "source": [ + "from teradataml import *\n", + "\n", + "# Modify the following to match the specific client environment settings\n", + "display.max_rows = 5" + ] + }, + { + "cell_type": "markdown", + "id": "ad3dd7b4-831c-4fb3-ab71-719c8c99a71c", + "metadata": {}, + "source": [ + "


\n", + "

1.1 Connect to Vantage

\n", + "

You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2742444c-4349-4b0f-b4e5-b068a8785cd9", + "metadata": {}, + "outputs": [], + "source": [ + "%run -i ../../UseCases/startup.ipynb\n", + "eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)\n", + "print(eng)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e14915b0-7932-4e03-94ba-20f0599c3707", + "metadata": {}, + "outputs": [], + "source": [ + "%%capture\n", + "execute_sql('''SET query_band='DEMO=PP_OneClassSVM_and_OneClassSVMPredict_Python.ipynb;' UPDATE FOR SESSION; ''')" + ] + }, + { + "cell_type": "markdown", + "id": "efe2fd2d-63ff-4278-9157-8b9110d682e8", + "metadata": {}, + "source": [ + "

Begin running steps with Shift + Enter keys.

" + ] + }, + { + "cell_type": "markdown", + "id": "f003f332-7489-4bdd-a740-4af2a0a22280", + "metadata": {}, + "source": [ + "
\n", + "\n", + "

1.2 Getting Data for This Demo

\n", + "\n", + "

Here, we will get the data which is available in the teradataml library and use the same to show the usage of the function.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "45c86176-734c-4b1c-ace0-d0c88657b4f8", + "metadata": {}, + "outputs": [], + "source": [ + "load_example_data(\"teradataml\", [\"cal_housing_ex_raw\"])" + ] + }, + { + "cell_type": "markdown", + "id": "2401d6d3-4fcd-46fc-8a94-7cafcd1258b0", + "metadata": {}, + "source": [ + "

Next is an optional step – if you want to see the status of databases/tables created and space used.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "87429200-db02-450d-9472-4d1e2030124d", + "metadata": {}, + "outputs": [], + "source": [ + "%run -i ../../UseCases/run_procedure.py \"call space_report();\" # Takes 10 seconds" + ] + }, + { + "cell_type": "markdown", + "id": "2a3762ac-ba27-4fa3-adba-d577262a4290", + "metadata": {}, + "source": [ + "
\n", + "2. Data Exploration\n", + "

Create a \"Virtual DataFrame\" that points to the data set in Vantage. Check the shape of the dataframe as check the datatype of all the columns of the dataframe.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3d936fab-7ca7-4e94-ba64-95c1da08b74f", + "metadata": {}, + "outputs": [], + "source": [ + "tdf = DataFrame.from_table(\"cal_housing_ex_raw\")\n", + "print(\"Shape of the data: \", tdf.shape)\n", + "tdf" + ] + }, + { + "cell_type": "markdown", + "id": "12db89ee-95dc-4a07-b734-b7f6d76a7013", + "metadata": {}, + "source": [ + "

Scaling the data using Scalefit and ScaleTransform functions

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1befb2c6-b9cb-4382-8836-7da223dd0e7b", + "metadata": {}, + "outputs": [], + "source": [ + "# Scale \"target_columns\" with respect to 'STD' value of the column.\n", + "fit_obj = ScaleFit(data=tdf,\n", + " target_columns=['MedInc', 'HouseAge', 'AveRooms',\n", + " 'AveBedrms', 'Population', 'AveOccup',\n", + " 'Latitude', 'Longitude'],\n", + " scale_method=\"STD\")\n", + " \n", + " # Transform the data.\n", + "transform_obj = ScaleTransform(data=tdf,\n", + " object=fit_obj.output,\n", + " accumulate=[\"id\", \"MedHouseVal\"])" + ] + }, + { + "cell_type": "markdown", + "id": "0d0adaf2-461e-48ff-87ce-b6038db8254a", + "metadata": {}, + "source": [ + "

Creating OneClassSVM model to find anomalies.
Detailed help can be found by passing function name to built-in help function.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d413a344-a12d-46ae-a27b-6702733387c4", + "metadata": {}, + "outputs": [], + "source": [ + "help(OneClassSVM)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8db7efa9-25d9-4da9-a142-ceeb29a9273e", + "metadata": {}, + "outputs": [], + "source": [ + "# Train the input data by OneClassSVM which helps model\n", + "# to find anomalies in transformed data.\n", + "one_class_svm = OneClassSVM(data=transform_obj.result,\n", + " input_columns=['MedInc', 'HouseAge', 'AveRooms',\n", + " 'AveBedrms', 'Population', 'AveOccup',\n", + " 'Latitude', 'Longitude'],\n", + " local_sgd_iterations=537,\n", + " batch_size=1,\n", + " learning_rate='constant',\n", + " initial_eta=0.01,\n", + " lambda1=0.1,\n", + " alpha=0.0,\n", + " momentum=0.0,\n", + " iter_max=1\n", + " )" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4d6249f6-3ac4-4375-8640-3ec937c98b88", + "metadata": {}, + "outputs": [], + "source": [ + "# Print the result DataFrame.\n", + "one_class_svm.result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c5dd6bae-6f7b-4d97-8ecf-0a6acc5f3765", + "metadata": {}, + "outputs": [], + "source": [ + "one_class_svm.output_data" + ] + }, + { + "cell_type": "markdown", + "id": "07aa6f02-d870-4fd3-a1bd-6a6ac9fd6d82", + "metadata": {}, + "source": [ + "

Predict the values if they are anomalies or not using the model created above by OneClassSVMPredict function.
\n", + "Detailed help can be found by passing function name to built-in help function.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8880ac6a-8aa8-4201-93c9-c63287ab442a", + "metadata": {}, + "outputs": [], + "source": [ + "help(OneClassSVMPredict)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b8756eb9-bf0a-4fd8-b787-e9cf10833ee8", + "metadata": {}, + "outputs": [], + "source": [ + "OneClassSVMPredict_out = OneClassSVMPredict(newdata=transform_obj.result,\n", + " object=one_class_svm.result,\n", + " id_column=\"id\"\n", + " )" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d9259a2c-5de2-4f11-8eca-29097b711f85", + "metadata": {}, + "outputs": [], + "source": [ + "OneClassSVMPredict_out.result" + ] + }, + { + "cell_type": "markdown", + "id": "4182173a-0c1a-4dfa-aeb3-fe857840ff68", + "metadata": {}, + "source": [ + "

Check if data had anomalies by looking at the result.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "928f4460-4ee5-4b1f-8a20-7736f37489e8", + "metadata": {}, + "outputs": [], + "source": [ + "a = OneClassSVMPredict_out.result\n", + "a[a.prediction == 0]" + ] + }, + { + "cell_type": "markdown", + "id": "343cb73a-60fa-4592-86d8-c914b0dedd1f", + "metadata": {}, + "source": [ + "

From above result we can see the ids which had anomalies.

" + ] + }, + { + "cell_type": "markdown", + "id": "151d5db4-29a9-49d9-8a61-d53f9627a294", + "metadata": {}, + "source": [ + "
\n", + "3. Cleanup" + ] + }, + { + "cell_type": "markdown", + "id": "a562f058-fb24-4966-a25d-f2960e6ddfb8", + "metadata": {}, + "source": [ + "
\n", + "

Databases and Tables

\n", + "

The following code will clean up tables and databases created above.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e6b3935b-47c2-4a96-bec2-68106d172116", + "metadata": {}, + "outputs": [], + "source": [ + "db_drop_table(\"cal_housing_ex_raw\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "157fe3d4-4e0e-4d92-b343-9f758f3bf690", + "metadata": {}, + "outputs": [], + "source": [ + "remove_context()" + ] + }, + { + "cell_type": "markdown", + "id": "4317a6cf-1479-4aa8-b30a-ee0a3b5231a8", + "metadata": {}, + "source": [ + "
\n", + "

Links:

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "b2dcca28-5de5-44d7-88cb-45a12153b3f8", + "metadata": {}, + "source": [ + "" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}