diff --git a/Recipes/ClearScape_Functions/OneClassSVMandOneClassSVMPredict.ipynb b/Recipes/ClearScape_Functions/OneClassSVMandOneClassSVMPredict.ipynb
new file mode 100644
index 00000000..d61a518c
--- /dev/null
+++ b/Recipes/ClearScape_Functions/OneClassSVMandOneClassSVMPredict.ipynb
@@ -0,0 +1,409 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "bc549e6c-0cc4-4188-94a3-a9bdd3ae3dfa",
+ "metadata": {},
+ "source": [
+ " \n",
+ " OneClassSVM and OneClassSVMPredict functions in Vantage\n",
+ "
\n",
+ " \n",
+ "
Introduction
\n", + "OneClassSVM is a linear support vector machine (SVM) that performs classification analysis on data sets to identify outliers or novelties in the data. This function supports the Classification (loss: hinge) model. During training, all the data is assumed to belong to a single class (value 1), so a ResponseColumn is not needed by the model. OneClassSVMPredict outputs 0 or 1: a value of 0 corresponds to an outlier, and 1 to a normal observation.
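As a rough illustration of why no response column is needed, the sketch below evaluates the standard hinge loss max(0, 1 - y * score) for a generic linear score w . x + b, with the label y fixed to 1 for every row. The helper names, weights, bias, and rows are hypothetical, made-up values; Vantage's actual objective, regularization, and optimizer are not shown here.

```python
# Hypothetical sketch: hinge loss for a linear model when every training row
# is implicitly labelled 1, so no response column has to be supplied.

def linear_score(weights, bias, row):
    # w . x + b for a single observation
    return sum(w * x for w, x in zip(weights, row)) + bias

def one_class_hinge_loss(weights, bias, rows):
    # Mean hinge loss max(0, 1 - y * score) with the label y fixed to 1
    y = 1
    losses = [max(0.0, 1 - y * linear_score(weights, bias, r)) for r in rows]
    return sum(losses) / len(rows)

# Made-up parameters and observations for illustration only
weights, bias = [0.5, -0.25], 0.5
rows = [[2.0, 0.0], [0.0, 0.0], [-2.0, 2.0]]
print(one_class_hinge_loss(weights, bias, rows))
```

Rows that score well above 1 contribute nothing to the loss, while rows with low scores are penalized; this is the pressure that later lets low-scoring rows be flagged as outliers.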
In this notebook, we show how to use the OneClassSVM and OneClassSVMPredict functions available in Vantage.
In this section, we import the required libraries and set environment variables and environment paths (if required)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c5af5af3-29d5-4f6a-8334-9df6924e7787", + "metadata": {}, + "outputs": [], + "source": [ + "from teradataml import *\n", + "\n", + "# Modify the following to match the specific client environment settings\n", + "display.max_rows = 5" + ] + }, + { + "cell_type": "markdown", + "id": "ad3dd7b4-831c-4fb3-ab71-719c8c99a71c", + "metadata": {}, + "source": [ + "
1.1 Connect to Vantage
\n", + "You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2742444c-4349-4b0f-b4e5-b068a8785cd9", + "metadata": {}, + "outputs": [], + "source": [ + "%run -i ../../UseCases/startup.ipynb\n", + "eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)\n", + "print(eng)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e14915b0-7932-4e03-94ba-20f0599c3707", + "metadata": {}, + "outputs": [], + "source": [ + "%%capture\n", + "execute_sql('''SET query_band='DEMO=PP_OneClassSVM_and_OneClassSVMPredict_Python.ipynb;' UPDATE FOR SESSION; ''')" + ] + }, + { + "cell_type": "markdown", + "id": "efe2fd2d-63ff-4278-9157-8b9110d682e8", + "metadata": {}, + "source": [ + "Begin running steps with Shift + Enter keys.
" + ] + }, + { + "cell_type": "markdown", + "id": "f003f332-7489-4bdd-a740-4af2a0a22280", + "metadata": {}, + "source": [ + "1.2 Getting Data for This Demo
\n", + "\n", + "Here, we load example data that ships with the teradataml library and use it to demonstrate both functions.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "45c86176-734c-4b1c-ace0-d0c88657b4f8", + "metadata": {}, + "outputs": [], + "source": [ + "load_example_data(\"teradataml\", [\"cal_housing_ex_raw\"])" + ] + }, + { + "cell_type": "markdown", + "id": "2401d6d3-4fcd-46fc-8a94-7cafcd1258b0", + "metadata": {}, + "source": [ + "Next is an optional step: run it if you want to see the status of the databases and tables created and the space used.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "87429200-db02-450d-9472-4d1e2030124d", + "metadata": {}, + "outputs": [], + "source": [ + "%run -i ../../UseCases/run_procedure.py \"call space_report();\" # Takes 10 seconds" + ] + }, + { + "cell_type": "markdown", + "id": "2a3762ac-ba27-4fa3-adba-d577262a4290", + "metadata": {}, + "source": [ + "Create a \"Virtual DataFrame\" that points to the data set in Vantage. Check the shape of the DataFrame and preview its columns.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3d936fab-7ca7-4e94-ba64-95c1da08b74f", + "metadata": {}, + "outputs": [], + "source": [ + "tdf = DataFrame.from_table(\"cal_housing_ex_raw\")\n", + "print(\"Shape of the data: \", tdf.shape)\n", + "tdf" + ] + }, + { + "cell_type": "markdown", + "id": "12db89ee-95dc-4a07-b734-b7f6d76a7013", + "metadata": {}, + "source": [ + "Scale the data using the ScaleFit and ScaleTransform functions.
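For intuition about what the STD scale method does, here is a minimal pure-Python sketch of a fit/transform pair: the fit step learns a mean and a standard deviation per target column, and the transform step applies (x - mean) / std while passing the accumulate columns through unchanged. It assumes the population standard deviation (statistics.pstdev); Vantage may compute the scale differently, and the helper names and toy rows are hypothetical.

```python
from statistics import mean, pstdev

def scale_fit_std(rows, columns):
    # Fit step: learn (location, scale) = (mean, population std dev) per column
    stats = {}
    for c in columns:
        values = [r[c] for r in rows]
        stats[c] = (mean(values), pstdev(values))
    return stats

def scale_transform(rows, stats, accumulate):
    # Transform step: copy accumulate columns, then apply (x - location) / scale
    scaled_rows = []
    for r in rows:
        new_row = {a: r[a] for a in accumulate}
        for c, (location, scale) in stats.items():
            new_row[c] = (r[c] - location) / scale
        scaled_rows.append(new_row)
    return scaled_rows

# Made-up toy rows that reuse column names from cal_housing_ex_raw
rows = [
    {'id': 1, 'MedInc': 2.0, 'HouseAge': 10.0, 'MedHouseVal': 1.5},
    {'id': 2, 'MedInc': 4.0, 'HouseAge': 20.0, 'MedHouseVal': 2.5},
    {'id': 3, 'MedInc': 6.0, 'HouseAge': 30.0, 'MedHouseVal': 3.5},
]
stats = scale_fit_std(rows, ['MedInc', 'HouseAge'])
scaled = scale_transform(rows, stats, accumulate=['id', 'MedHouseVal'])
print(scaled[1])  # the middle row sits exactly on the mean, so both scaled values are 0.0
```

Splitting fit from transform mirrors the ScaleFit/ScaleTransform pattern: the statistics are learned once and can then be reapplied to any data set with the same columns.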
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1befb2c6-b9cb-4382-8836-7da223dd0e7b", + "metadata": {}, + "outputs": [], + "source": [ + "# Standardize the target columns with the 'STD' scale method (subtract the column mean, divide by the column standard deviation).\n", + "fit_obj = ScaleFit(data=tdf,\n", + " target_columns=['MedInc', 'HouseAge', 'AveRooms',\n", + " 'AveBedrms', 'Population', 'AveOccup',\n", + " 'Latitude', 'Longitude'],\n", + " scale_method=\"STD\")\n", + "\n", + "# Transform the data using the statistics learned by ScaleFit.\n", + "transform_obj = ScaleTransform(data=tdf,\n", + " object=fit_obj.output,\n", + " accumulate=[\"id\", \"MedHouseVal\"])" + ] + }, + { + "cell_type": "markdown", + "id": "0d0adaf2-461e-48ff-87ce-b6038db8254a", + "metadata": {}, + "source": [ + "Create a OneClassSVM model to find anomalies.
For detailed help, pass the function name to Python's built-in help() function.
Use the OneClassSVMPredict function with the model created above to predict whether each observation is an anomaly.
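A hypothetical sketch of how a linear decision score can be mapped to the 0/1 labels that OneClassSVMPredict returns (1 = normal observation, 0 = outlier). The weights, bias, rows, and the threshold of 0 are illustrative assumptions, not values or rules taken from Vantage's model table.

```python
# Hypothetical sketch: turn a linear decision score into OneClassSVMPredict-style
# labels (1 = normal, 0 = outlier). The threshold of 0.0 is an assumption.

def predict_one_class(weights, bias, rows, threshold=0.0):
    labels = []
    for r in rows:
        # Linear decision score w . x + b
        score = sum(w * x for w, x in zip(weights, r)) + bias
        # Scores at or above the threshold count as normal observations
        labels.append(1 if score >= threshold else 0)
    return labels

# Made-up parameters and observations for illustration only
weights, bias = [0.5, -0.25], 0.5
rows = [[2.0, 0.0], [0.0, 0.0], [-2.0, 2.0]]
print(predict_one_class(weights, bias, rows))  # [1, 1, 0]
```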
\n",
+ "For detailed help, pass the function name to Python's built-in help() function.
Check whether the data contains anomalies by inspecting the result: rows with a prediction of 0 are outliers.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "928f4460-4ee5-4b1f-8a20-7736f37489e8", + "metadata": {}, + "outputs": [], + "source": [ + "a = OneClassSVMPredict_out.result\n", + "# Keep only the rows predicted as outliers (prediction == 0)\n", + "a[a.prediction == 0]" + ] + }, + { + "cell_type": "markdown", + "id": "343cb73a-60fa-4592-86d8-c914b0dedd1f", + "metadata": {}, + "source": [ + "The result above lists the ids of the observations flagged as anomalies.
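The filter a[a.prediction == 0] keeps only the rows flagged as outliers. The plain-Python sketch below mirrors that logic on a made-up list of (id, prediction) rows standing in for OneClassSVMPredict_out.result, and also computes the share of rows that were flagged.

```python
# Made-up stand-in for OneClassSVMPredict_out.result: one dict per scored row
result = [
    {'id': 101, 'prediction': 1},
    {'id': 102, 'prediction': 0},
    {'id': 103, 'prediction': 1},
    {'id': 104, 'prediction': 0},
]

# Same idea as a[a.prediction == 0]: keep only rows predicted as outliers
outliers = [row for row in result if row['prediction'] == 0]
outlier_ids = [row['id'] for row in outliers]

# Fraction of observations flagged as anomalies
share = len(outliers) / len(result)
print(outlier_ids, share)  # [102, 104] 0.5
```

In the notebook itself the filter runs inside Vantage on the teradataml DataFrame, so only the flagged rows are brought back to the client.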
" + ] + }, + { + "cell_type": "markdown", + "id": "151d5db4-29a9-49d9-8a61-d53f9627a294", + "metadata": {}, + "source": [ + "Databases and Tables
\n", + "The following code drops the table created above and then closes the connection to Vantage.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e6b3935b-47c2-4a96-bec2-68106d172116", + "metadata": {}, + "outputs": [], + "source": [ + "db_drop_table(\"cal_housing_ex_raw\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "157fe3d4-4e0e-4d92-b343-9f758f3bf690", + "metadata": {}, + "outputs": [], + "source": [ + "remove_context()" + ] + }, + { + "cell_type": "markdown", + "id": "4317a6cf-1479-4aa8-b30a-ee0a3b5231a8", + "metadata": {}, + "source": [ + "Links:
\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "b2dcca28-5de5-44d7-88cb-45a12153b3f8", + "metadata": {}, + "source": [ + "" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}