Use Cases subfolder. This folder contains additional demos built on the parent environment framework.
Notebooks in this folder use the environment setup file vars.json in the parent directory. Each Use Case has its own data-loading script, which generally loads data from an S3 bucket; bucket information and keys are read from the vars.json environment file.
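For orientation, here is a minimal sketch of how a Use Case notebook might read vars.json and open a teradataml connection. The key names ("host", "username", "password") are assumptions for illustration only; check the actual vars.json in the parent directory for the names this repo uses.

```python
import json
from teradataml import create_context

# Read the shared environment file from the parent directory
with open("../vars.json") as f:
    env = json.load(f)

# Open a teradataml connection context (key names below are assumed)
eng = create_context(
    host=env["host"],          # assumed key name
    username=env["username"],  # assumed key name
    password=env["password"],  # assumed key name
)
```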
Available Demos List
Python 3.8 Notebook
Creates and evaluates a customer segmentation model on a retail data set, including exploratory feature engineering and "operational" feature engineering using Fit/Transform and Fit/ColumnTransformer functions (an illustrative code sketch follows the step list below).
- Log in as data_scientist user
- ColumnSummary
- Fit Tables - OutlierFilter, SimpleImputer, NonLinearCombine, ScaleFit
- Transform - ColumnTransformer, ScaleFitTransform
- KMeans, KMeansPredict
- Silhouette for model evaluation
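A minimal, hedged sketch of the clustering step. Table, column, and parameter names are assumptions for illustration; the notebook defines the real ones, and argument names may differ slightly between teradataml releases.

```python
from teradataml import DataFrame, KMeans, KMeansPredict

features = DataFrame("retail_features_scaled")       # assumed table name

# Train an in-database K-means model on the scaled features
km = KMeans(
    data=features,
    id_column="customer_id",                         # assumed id column
    target_columns=["tot_spend", "visit_cnt"],       # assumed feature columns
    num_clusters=5,
)

# Assign each customer to a cluster; the fit object itself may also be accepted
# as "object" depending on release. Silhouette is then run on the assignments
# to judge cluster quality.
segments = KMeansPredict(object=km.result, data=features)
print(segments.result.head())
```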
Python 3.8 Notebook
Creates and scales a numeric-only feature set from historic housing price data to predict sales price (an illustrative code sketch follows the step list below).
- Log in as data_scientist user
- AntiSelect to select features
- SimpleImputer and ScaleFit/Transform to prepare data set
- Native GLM/GLMPredict
- Regression Evaluator
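A hedged sketch of the regression flow. Table and column names are assumptions, and the argument names follow the teradataml GLM/TDGLMPredict/RegressionEvaluator wrappers but may vary by release, so treat this as illustrative rather than copy-paste ready.

```python
from teradataml import DataFrame, GLM, TDGLMPredict, RegressionEvaluator

train = DataFrame("housing_train")                 # assumed table names
test = DataFrame("housing_test")

# Fit a native GLM regression on the prepared numeric features
model = GLM(
    data=train,
    input_columns=["sqft", "beds", "baths"],       # assumed feature columns
    response_column="sale_price",                  # assumed target column
    family="GAUSSIAN",
)

# Score the held-out set
scored = TDGLMPredict(
    object=model.result,
    newdata=test,
    id_column="house_id",                          # assumed id column
    accumulate="sale_price",
)

# Evaluate predictions against actual sale prices
metrics = RegressionEvaluator(
    data=scored.result,
    observation_column="sale_price",
    prediction_column="prediction",                # assumed output column name
    metrics=["RMSE", "R2"],                        # metric names indicative
)
print(metrics.result)
```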
3. Sentiment Analysis using Native functions (Native-Sentiment-Analysis/Sentiment_Analysis_Python.ipynb)
Python 3.8 Notebook
Uses an Amazon Fine Foods data set to generate a sentiment score for each review, then compares the generated sentiment to the "star" rating the reviewer gave the product (an illustrative code sketch follows the step list below).
- Log in as data_scientist user
- Execute the SentimentExtractor Function
- Analyze the results - use Bincode Fit/Transform to create a categorical sentiment from the author "rating"
- Use ClassificationEvaluator to evaluate results, generate a heatmap.
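A hedged sketch of the sentiment scoring step. Table and column names are assumptions for illustration; the notebook uses the actual Amazon Fine Foods schema.

```python
from teradataml import DataFrame, SentimentExtractor

reviews = DataFrame("amazon_fine_foods")            # assumed table name

# Extract a sentiment label/score for each review text
sentiment = SentimentExtractor(
    data=reviews,
    text_column="review_text",                      # assumed text column
    accumulate=["review_id", "star_rating"],        # keep id and rating for comparison
)

# The demo then bins star_rating with Bincode Fit/Transform into a categorical
# label and compares it to the extracted sentiment with ClassificationEvaluator.
print(sentiment.result.head())
```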
4. Text Classification using Native functions (Native-Text-Classification/Python_Text_Classification.ipynb)
Python 3.8 Notebook
Uses an Amazon Fine Foods data set as input to a Naive Bayes Text Classification workflow that predicts the author rating value: clean up the data set, tokenize the review column, then train a native NBTC model and evaluate the results (an illustrative code sketch follows the step list below).
- Log in as data_scientist user
- ConvertTo to convert numeric to categorical rating
- TextParser to tokenize data
- Native NBTC train/predict
- Evaluate using Classification Evaluator, plot a heatmap
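A hedged sketch of the tokenize-then-train flow. Argument names are indicative of the teradataml wrappers for TD_TextParser and the Naive Bayes text classifier; table and column names are assumptions, so check the notebook for the exact usage.

```python
from teradataml import DataFrame, TextParser, NaiveBayesTextClassifierTrainer

reviews = DataFrame("amazon_fine_foods_train")        # assumed table name

# Tokenize the review text (argument names indicative only)
tokens = TextParser(
    data=reviews,
    text_column="review_text",                        # assumed text column
    convert_to_lowercase=True,
    accumulate=["doc_id", "rating_category"],         # assumed id / label columns
)

# Train the native Naive Bayes text classifier on the tokenized output
nbtc = NaiveBayesTextClassifierTrainer(
    data=tokens.result,
    token_column="token",                             # assumed token output column
    doc_category_column="rating_category",
)

# The notebook then scores a held-out set with NaiveBayesTextClassifierPredict
# and evaluates the predictions with ClassificationEvaluator.
```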
5. Churn Prediction using Native Data Prep, VAL, XGBoost model training, and scoring with BYOM or OAF (Churn-Prediction-OAF/Churn-Prediction-OAF.ipynb)
Python 3.8 Notebook
Note 2 - Just below the imports is a function definition that converts nPath output into a Plotly Sankey diagram, which is useful for visualizing user paths in the notebook (a sketch of this conversion appears after the demo description below). This cell can be hidden by selecting the cell and choosing View -> Collapse Selected Cell.
Uses the "Retail DSE" Data Set to show sessionize/npath to create churn predictors, Sentiment Analysis on comments to create sentiment polarity, then native functions for one hot encoding, VAL transformations, to train and test an open source xgboost model.
Data Prep and Model Training
- Log in as data_scientist user
- Connect to source data - use VALIDTIME for customer records
- Sessionize web data, use NPath to identify the final event and the two events preceding it
- Use VAL to One-hot Encode the events, use teradataml assign to binary-code Churn
- Use Native SentimentExtractor, OneHotEncodingFit/Transform, ConvertTo, ScaleFit to create sentiment polarity
- Use teradataml/pandas to join, VAL to fill NULLs and create final analytic data set
- Train/Test Split
- Train open-source XGBoost model, evaluate locally
- Save the model for use in OAF AND in BYOM
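A hedged sketch of the final client-side training step, assuming the prepared analytic data set is in a table called "churn_analytic_dataset" with a binary "churn" label (both names are assumptions; the notebook uses its own).

```python
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from teradataml import DataFrame
from xgboost import XGBClassifier

# Pull the prepared analytic data set client-side (table name assumed)
adf = DataFrame("churn_analytic_dataset").to_pandas()

X = adf.drop(columns=["churn"])                    # assumed label column
y = adf["churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Train and locally evaluate the open-source XGBoost classifier
model = XGBClassifier(n_estimators=200, max_depth=4)
model.fit(X_train, y_train)
print("local accuracy:", accuracy_score(y_test, model.predict(X_test)))

# The trained model is then saved twice: converted to PMML for BYOM scoring,
# and serialized for upload to an Open Analytics Framework (OAF) environment.
```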
**Model scoring using PMMLPredict**
1. Use teradataml methods for loading the model and metadata
2. Execute PMMLPredict as a teradataml DataFrame
3. Evaluate accuracy
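A hedged sketch of BYOM scoring with PMMLPredict. Model ids, table names, columns, and the BYOM install location are assumptions; the notebook stores and retrieves its own artifacts.

```python
from teradataml import DataFrame, PMMLPredict, configure, retrieve_byom, save_byom

configure.byom_install_location = "mldb"          # assumed BYOM database name

# Load the PMML file into a model table, then retrieve it for scoring
save_byom("churn_xgb", "churn_xgb.pmml", "byom_models")
model = retrieve_byom("churn_xgb", table_name="byom_models")

# Score the held-out table in-database via the PMML scoring engine
scored = PMMLPredict(
    modeldata=model,
    newdata=DataFrame("churn_test"),              # assumed test table
    accumulate=["customer_id", "churn"],          # assumed id / label columns
)
print(scored.result.head())
```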
**Model scoring using APPLY/OAF**
1. Connect to an existing runtime, or create a new one
2. Install libraries
3. Upload model and scoring code
4. Execute the APPLY table operator
5. Evaluate model accuracy
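A hedged sketch of the kind of scoring script the APPLY table operator could run inside an OAF user environment: it reads delimited rows from stdin, scores them with the uploaded XGBoost model, and writes predictions to stdout. The delimiter, column order, and model file name are assumptions; the notebook's uploaded script is the reference version.

```python
import sys

import pandas as pd
from xgboost import XGBClassifier

# Load the model file that was uploaded alongside this script (name assumed)
model = XGBClassifier()
model.load_model("churn_xgb.json")

# Read comma-delimited input rows from stdin (format assumed)
rows = [line.rstrip("\n").split(",") for line in sys.stdin if line.strip()]
if rows:
    df = pd.DataFrame(rows).astype(float)          # assumes all-numeric features
    ids = df.iloc[:, 0].astype(int)                # assumes id in first column
    preds = model.predict(df.iloc[:, 1:])
    for row_id, pred in zip(ids, preds):
        print(f"{row_id},{pred}")                  # emit id, prediction per row
```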
Python 3.8 Notebooks
Note 2 - The system scaling aspect of the demo requires a compute profile with scaling capabilities. If the user is using the default vars.json, the proper compute group and profile will be created automatically in Environment Setup and Automation.
**Demo 1 - Generate Workload.ipynb**
- Workload Profile Setup - Define the queries, concurrency, and duration of run
- Workload Execution - Submit the workload job for parallel execution
- Thread monitoring and control - Monitor the status of the connections, stop them if desired
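A minimal sketch of how a workload generator can submit queries in parallel. Connection details and the query list are placeholders; the notebook drives this from its own workload profile (queries, concurrency, duration).

```python
from concurrent.futures import ThreadPoolExecutor

import teradatasql

QUERIES = ["SELECT COUNT(*) FROM dbc.TablesV"] * 4    # placeholder workload

def run_query(sql):
    # Each worker opens its own connection so queries run concurrently
    with teradatasql.connect(host="HOST", user="USER", password="PASSWORD") as con:
        with con.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()

# max_workers corresponds to the concurrency setting in the workload profile
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_query, QUERIES))
```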
**Demo 2 - Real-Time Monitoring.ipynb**
1. Connect to the VantageCloud Lake System - Connect as a user with access to the metrics service and performance monitoring functions
2. Key Metrics Queries - Queries that monitor active users, Cluster CPU stats, and number of instances
3. Dashboard - Update and plot stats every three seconds
**Demo 3 - System Monitoring Queries.ipynb**
1. Connect to the VantageCloud Lake System - Connect as a user with access to the metrics service and performance monitoring functions
2. Current Resource Utilization - The current system utilization
3. Historic Resource Utilization - Queries showing how to query historical resource usage data
4. Cluster Events - Queries to analyze what compute resources were available when
5. Active User and Session Monitoring - For active sessions, queries that monitor users and SQL steps and text. This requires a running workload provided in Demo 1
6. Query Logging - Database Query Logs
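As an illustration of the query-logging step, a lookup against the standard DBQL view might look like the sketch below. DBC.QryLogV is a standard Teradata view, but the rows and columns actually populated depend on the query-logging options enabled on the system.

```python
from teradataml import DataFrame

# Queries logged in the last hour (assumes DBQL logging is enabled)
recent = DataFrame.from_query("""
    SELECT UserName, StartTime, AMPCPUTime, QueryText
    FROM DBC.QryLogV
    WHERE StartTime > CURRENT_TIMESTAMP - INTERVAL '1' HOUR
""")
print(recent.head())
```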
Python 3.8 Notebooks
This demonstration notebook illustrates how analysts can use location and proximity information at scale to determine which specific addresses fall within a given distance of flooding (an illustrative client-side sketch follows the list below).
- Uses flood zone data from the 2023 flooding in northern New Zealand as the basis for calculating flood risk for over 6 million individual addresses
- Shows common open-source and client-side approaches to Geospatial Analysis
- Contrasts the same capabilities in-database to run at extreme scale and performance
- Offers interactive map-based visualizations
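A hedged sketch of the client-side approach the demo contrasts with the in-database geospatial functions: a spatial join between address points and flood-zone polygons using GeoPandas. File names are placeholders.

```python
import geopandas as gpd

addresses = gpd.read_file("addresses.geojson")          # point geometries (placeholder file)
flood_zones = gpd.read_file("flood_zones.geojson")      # polygon geometries (placeholder file)

# Keep only addresses that fall inside a flood-zone polygon
at_risk = gpd.sjoin(addresses, flood_zones, predicate="intersects", how="inner")
print(len(at_risk), "addresses intersect a flood zone")
```

This works for modest data volumes but requires pulling all geometries to the client, which is the scale limitation the in-database approach avoids.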
8. Using Vector Embeddings for Customer Segmentation (UseCases/Vector-Embeddings-Segmentation/Segmentation...ipynb)
Python 3.8 Notebook
This demonstration notebook illustrates how vector embeddings can be used to create predictive features - in this case an optimal customer segmentation based on text similarity in customer reviews.
- Uses the native TD_WordEmbeddings function to vectorize retail customer comments based on the GloVe 6B 50d word vector model
- Passes this vector table to multiple iterations of the kmeans function
- Displays the series of WCSS values to identify the "elbow" point, which indicates an ideal number of segments (see the sketch after this list)
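A hedged sketch of the elbow search, shown client-side with scikit-learn for clarity; the notebook performs the same loop with the in-database KMeans function and reads the within-cluster sum of squares (WCSS) from its output.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder for the embedding vectors (the demo uses GloVe 6B 50d embeddings)
vectors = np.random.rand(1000, 50)

wcss = []
for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(vectors)
    wcss.append(km.inertia_)                # within-cluster sum of squares

# Plotting wcss against k reveals the "elbow": the point where adding more
# clusters stops reducing WCSS materially, suggesting an ideal segment count.
```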