Use Cases subfolder. This folder contains additional demos built on the parent environment framework.
Notebooks in this folder use the environment setup file vars.json in the parent directory. Each Use Case has its own data-loading script, which generally loads data from an S3 bucket; bucket information and keys are read from the vars.json environment file.
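For orientation, here is a minimal sketch of how a Use Case notebook might read vars.json and open a teradataml connection. The key names ("host", "username", "password") are assumptions for illustration only; check the actual vars.json in the parent directory for the names this repo uses.

```python
import json
from teradataml import create_context

# Read the shared environment file from the parent directory
with open("../vars.json") as f:
    env = json.load(f)

# Open a teradataml connection context (key names below are assumed)
eng = create_context(
    host=env["host"],          # assumed key name
    username=env["username"],  # assumed key name
    password=env["password"],  # assumed key name
)
```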
Available Demos List
Python 3.8 Notebook
Creates and evaluates a customer segmentation model on a retail data set, including exploratory feature engineering and "operational" feature engineering using Fit/Transform and Fit/ColumnTransformer functions (an illustrative code sketch follows the step list below).
- Log in as data_scientist user
- ColumnSummary
- Fit Tables - OutlierFilter, SimpleImputer, NonLinearCombine, ScaleFit
- Transform - ColumnTransformer, ScaleFitTransform
- KMeans, KMeansPredict
- Silhouette for model evaluation
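A minimal, hedged sketch of the clustering step. Table, column, and parameter names are assumptions for illustration; the notebook defines the real ones, and argument names may differ slightly between teradataml releases.

```python
from teradataml import DataFrame, KMeans, KMeansPredict

features = DataFrame("retail_features_scaled")       # assumed table name

# Train an in-database K-means model on the scaled features
km = KMeans(
    data=features,
    id_column="customer_id",                         # assumed id column
    target_columns=["tot_spend", "visit_cnt"],       # assumed feature columns
    num_clusters=5,
)

# Assign each customer to a cluster; the fit object itself may also be accepted
# as "object" depending on release. Silhouette is then run on the assignments
# to judge cluster quality.
segments = KMeansPredict(object=km.result, data=features)
print(segments.result.head())
```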
Python 3.8 Notebook
Creates and scales a numeric-only feature set from historic housing price data to predict sales price (an illustrative code sketch follows the step list below).
- Log in as data_scientist user
- AntiSelect to select features
- SimpleImputer and ScaleFit/Transform to prepare data set
- Native GLM/GLMPredict
- Regression Evaluator
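A hedged sketch of the regression flow. Table and column names are assumptions, and the argument names follow the teradataml GLM/TDGLMPredict/RegressionEvaluator wrappers but may vary by release, so treat this as illustrative rather than copy-paste ready.

```python
from teradataml import DataFrame, GLM, TDGLMPredict, RegressionEvaluator

train = DataFrame("housing_train")                 # assumed table names
test = DataFrame("housing_test")

# Fit a native GLM regression on the prepared numeric features
model = GLM(
    data=train,
    input_columns=["sqft", "beds", "baths"],       # assumed feature columns
    response_column="sale_price",                  # assumed target column
    family="GAUSSIAN",
)

# Score the held-out set
scored = TDGLMPredict(
    object=model.result,
    newdata=test,
    id_column="house_id",                          # assumed id column
    accumulate="sale_price",
)

# Evaluate predictions against actual sale prices
metrics = RegressionEvaluator(
    data=scored.result,
    observation_column="sale_price",
    prediction_column="prediction",                # assumed output column name
    metrics=["RMSE", "R2"],                        # metric names indicative
)
print(metrics.result)
```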
3. Sentiment Analysis using Native functions (Native-Sentiment-Analysis/Sentiment_Analysis_Python.ipynb)
Python 3.8 Notebook
Uses an Amazon Fine Foods data set to generate a sentiment score for each review, then compares the generated sentiment to the "star" rating the reviewer gave the product (an illustrative code sketch follows the step list below).
- Log in as data_scientist user
- Execute the SentimentExtractor Function
- Analyze the results - use Bincode Fit/Transform to create a categorical sentiment from the author "rating"
- Use ClassificationEvaluator to evaluate results, generate a heatmap.
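A hedged sketch of the sentiment scoring step. Table and column names are assumptions for illustration; the notebook uses the actual Amazon Fine Foods schema.

```python
from teradataml import DataFrame, SentimentExtractor

reviews = DataFrame("amazon_fine_foods")            # assumed table name

# Extract a sentiment label/score for each review text
sentiment = SentimentExtractor(
    data=reviews,
    text_column="review_text",                      # assumed text column
    accumulate=["review_id", "star_rating"],        # keep id and rating for comparison
)

# The demo then bins star_rating with Bincode Fit/Transform into a categorical
# label and compares it to the extracted sentiment with ClassificationEvaluator.
print(sentiment.result.head())
```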
4. Text Classification using Native functions (Native-Text-Classification/Python_Text_Classification.ipynb)
Python 3.8 Notebook
Uses an Amazon Fine Foods data set as input to a Naive Bayes Text Classification workflow that predicts the author rating value: clean up the data set, tokenize the review column, then train a native NBTC model and evaluate the results (an illustrative code sketch follows the step list below).
- Log in as data_scientist user
- ConvertTo to convert numeric to categorical rating
- TextParser to tokenize data
- Native NBTC train/predict
- Evaluate using Classification Evaluator, plot a heatmap
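A hedged sketch of the tokenize-then-train flow. Argument names are indicative of the teradataml wrappers for TD_TextParser and the Naive Bayes text classifier; table and column names are assumptions, so check the notebook for the exact usage.

```python
from teradataml import DataFrame, TextParser, NaiveBayesTextClassifierTrainer

reviews = DataFrame("amazon_fine_foods_train")        # assumed table name

# Tokenize the review text (argument names indicative only)
tokens = TextParser(
    data=reviews,
    text_column="review_text",                        # assumed text column
    convert_to_lowercase=True,
    accumulate=["doc_id", "rating_category"],         # assumed id / label columns
)

# Train the native Naive Bayes text classifier on the tokenized output
nbtc = NaiveBayesTextClassifierTrainer(
    data=tokens.result,
    token_column="token",                             # assumed token output column
    doc_category_column="rating_category",
)

# The notebook then scores a held-out set with NaiveBayesTextClassifierPredict
# and evaluates the predictions with ClassificationEvaluator.
```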
5. Churn Prediction using Native Data Prep, VAL, XGBoost model training, and scoring with BYOM or OAF (Churn-Prediction-OAF/Churn-Prediction-OAF.ipynb)
Python 3.8 Notebook
Note 2 - Just below the imports is a function definition that converts nPath output into a Plotly Sankey diagram, which is useful for visualizing user paths in the notebook (a sketch of this conversion appears after the demo description below). This cell can be hidden by selecting the cell and choosing View -> Collapse Selected Cell.
Uses the "Retail DSE" Data Set to show sessionize/npath to create churn predictors, Sentiment Analysis on comments to create sentiment polarity, then native functions for one hot encoding, VAL transformations, to train and test an open source xgboost model.
Data Prep and Model Training
- Log in as data_scientist user
- Connect to source data - use VALIDTIME for customer records
- Sessionize web data, use NPath to identify the final event and the two events preceding it
- Use VAL to One-hot Encode the events, use teradataml assign to binary-code Churn
- Use Native SentimentExtractor, OneHotEncodingFit/Transform, ConvertTo, ScaleFit to create sentiment polarity
- Use teradataml/pandas to join, VAL to fill NULLs and create final analytic data set
- Train/Test Split
- Train open-source XGBoost model, evaluate locally
- Save the model for use in OAF AND in BYOM
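A hedged sketch of the final client-side training step, assuming the prepared analytic data set is in a table called "churn_analytic_dataset" with a binary "churn" label (both names are assumptions; the notebook uses its own).

```python
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from teradataml import DataFrame
from xgboost import XGBClassifier

# Pull the prepared analytic data set client-side (table name assumed)
adf = DataFrame("churn_analytic_dataset").to_pandas()

X = adf.drop(columns=["churn"])                    # assumed label column
y = adf["churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Train and locally evaluate the open-source XGBoost classifier
model = XGBClassifier(n_estimators=200, max_depth=4)
model.fit(X_train, y_train)
print("local accuracy:", accuracy_score(y_test, model.predict(X_test)))

# The trained model is then saved twice: converted to PMML for BYOM scoring,
# and serialized for upload to an Open Analytics Framework (OAF) environment.
```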
**Model scoring using PMMLPredict**
1. Use teradataml methods for loading the model and metadata
2. Execute PMMLPredict as a teradataml DataFrame
3. Evaluate accuracy
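A hedged sketch of BYOM scoring with PMMLPredict. Model ids, table names, columns, and the BYOM install location are assumptions; the notebook stores and retrieves its own artifacts.

```python
from teradataml import DataFrame, PMMLPredict, configure, retrieve_byom, save_byom

configure.byom_install_location = "mldb"          # assumed BYOM database name

# Load the PMML file into a model table, then retrieve it for scoring
save_byom("churn_xgb", "churn_xgb.pmml", "byom_models")
model = retrieve_byom("churn_xgb", table_name="byom_models")

# Score the held-out table in-database via the PMML scoring engine
scored = PMMLPredict(
    modeldata=model,
    newdata=DataFrame("churn_test"),              # assumed test table
    accumulate=["customer_id", "churn"],          # assumed id / label columns
)
print(scored.result.head())
```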
**Model scoring using APPLY/OAF**
1. Connect to an existing runtime, or create a new one
2. Install libraries
3. Upload model and scoring code
4. Execute the APPLY table operator
5. Evaluate model accuracy
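A hedged sketch of the kind of scoring script the APPLY table operator could run inside an OAF user environment: it reads delimited rows from stdin, scores them with the uploaded XGBoost model, and writes predictions to stdout. The delimiter, column order, and model file name are assumptions; the notebook's uploaded script is the reference version.

```python
import sys

import pandas as pd
from xgboost import XGBClassifier

# Load the model file that was uploaded alongside this script (name assumed)
model = XGBClassifier()
model.load_model("churn_xgb.json")

# Read comma-delimited input rows from stdin (format assumed)
rows = [line.rstrip("\n").split(",") for line in sys.stdin if line.strip()]
if rows:
    df = pd.DataFrame(rows).astype(float)          # assumes all-numeric features
    ids = df.iloc[:, 0].astype(int)                # assumes id in first column
    preds = model.predict(df.iloc[:, 1:])
    for row_id, pred in zip(ids, preds):
        print(f"{row_id},{pred}")                  # emit id, prediction per row
```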
Python 3.8 Notebooks
Note 2 - The system scaling aspect of the demo requires a compute profile with scaling capabilities. If the user is using the default vars.json, the proper compute group and profile will be created automatically in Environment Setup and Automation.
**Demo 1 - Generate Workload.ipynb**
- Workload Profile Setup - Define the queries, concurrency, and duration of run
- Workload Execution - Submit the workload job for parallel execution
- Thread monitoring and control - Monitor the status of the connections, stop them if desired
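A minimal sketch of how a workload generator can submit queries in parallel. Connection details and the query list are placeholders; the notebook drives this from its own workload profile (queries, concurrency, duration).

```python
from concurrent.futures import ThreadPoolExecutor

import teradatasql

QUERIES = ["SELECT COUNT(*) FROM dbc.TablesV"] * 4    # placeholder workload

def run_query(sql):
    # Each worker opens its own connection so queries run concurrently
    with teradatasql.connect(host="HOST", user="USER", password="PASSWORD") as con:
        with con.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()

# max_workers corresponds to the concurrency setting in the workload profile
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_query, QUERIES))
```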
**Demo 2 - Real-Time Monitoring.ipynb**
1. Connect to the VantageCloud Lake System - Connect as a user with access to the metrics service and performance monitoring functions
2. Key Metrics Queries - Queries that monitor active users, Cluster CPU stats, and number of instances
3. Dashboard - Update and plot stats every three seconds
**Demo 3 - System Monitoring Queries.ipynb**
1. Connect to the VantageCloud Lake System - Connect as a user with access to the metrics service and performance monitoring functions
2. Current Resource Utilization - The current system utilization
3. Historic Resource Utilization - Queries showing how to query historical resource usage data
4. Cluster Events - Queries to analyze what compute resources were available when
5. Active User and Session Monitoring - For active sessions, queries that monitor users and SQL steps and text. This requires a running workload provided in Demo 1
6. Query Logging - Database Query Logs
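As an illustration of the query-logging step, a lookup against the standard DBQL view might look like the sketch below. DBC.QryLogV is a standard Teradata view, but the rows and columns actually populated depend on the query-logging options enabled on the system.

```python
from teradataml import DataFrame

# Queries logged in the last hour (assumes DBQL logging is enabled)
recent = DataFrame.from_query("""
    SELECT UserName, StartTime, AMPCPUTime, QueryText
    FROM DBC.QryLogV
    WHERE StartTime > CURRENT_TIMESTAMP - INTERVAL '1' HOUR
""")
print(recent.head())
```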
Python 3.8 Notebooks
This demonstration notebook illustrates how analysts can use location and proximity information at scale to determine which specific addresses fall within a given distance of flooding (an illustrative client-side sketch follows the list below).
- Uses flood zone data from the 2023 flooding in northern New Zealand as the basis for calculating flood risk for over 6 million individual addresses
- Shows common open-source and client-side approaches to Geospatial Analysis
- Contrasts the same capabilities in-database to run at extreme scale and performance
- Offers interactive map-based visualizations
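A hedged sketch of the client-side approach the demo contrasts with the in-database geospatial functions: a spatial join between address points and flood-zone polygons using GeoPandas. File names are placeholders.

```python
import geopandas as gpd

addresses = gpd.read_file("addresses.geojson")          # point geometries (placeholder file)
flood_zones = gpd.read_file("flood_zones.geojson")      # polygon geometries (placeholder file)

# Keep only addresses that fall inside a flood-zone polygon
at_risk = gpd.sjoin(addresses, flood_zones, predicate="intersects", how="inner")
print(len(at_risk), "addresses intersect a flood zone")
```

This works for modest data volumes but requires pulling all geometries to the client, which is the scale limitation the in-database approach avoids.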
8. Using Vector Embeddings for Customer Segmentation (UseCases/Vector-Embeddings-Segmentation/Segmentation...ipynb)
Python 3.8 Notebook
This demonstration notebook illustrates how vector embeddings can be used to create predictive features - in this case an optimal customer segmentation based on text similarity in customer reviews.
- Uses the native TD_WordEmbeddings function to vectorize retail customer comments based on the GloVe 6B 50d word vector model
- Passes this vector table to multiple iterations of the kmeans function
- Displays the series of WCSS values to identify the "elbow" point, which indicates an ideal number of segments (see the sketch after this list)
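A hedged sketch of the elbow search, shown client-side with scikit-learn for clarity; the notebook performs the same loop with the in-database KMeans function and reads the within-cluster sum of squares (WCSS) from its output.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder for the embedding vectors (the demo uses GloVe 6B 50d embeddings)
vectors = np.random.rand(1000, 50)

wcss = []
for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(vectors)
    wcss.append(km.inertia_)                # within-cluster sum of squares

# Plotting wcss against k reveals the "elbow": the point where adding more
# clusters stops reducing WCSS materially, suggesting an ideal segment count.
```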