Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSAE Bot for search across the notebooks #807

Merged
merged 9 commits into from
Feb 4, 2025
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions .CSAE_Bot.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
inputs:
- type: env
value: 'JUPYTER_NOTEBOOK_CI_OPEN_AI_KEY'
cell: 8
- type: text
value: 'exit'
prompt: "Enter your query here. To stop, type 'exit'. Query:"
361 changes: 361 additions & 0 deletions ExperienceBot/ExperienceBot.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,361 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "fc92e86f-c5a7-499d-89e2-49046aa01be7",
"metadata": {},
"source": [
"<header>\n",
" <p style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>\n",
" Experience-Bot: Quickly find your demos of interest by just typing\n",
" <br>\n",
" <img id=\"teradata-logo\" src=\"https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg\" alt=\"Teradata\" style=\"width: 125px; height: auto; margin-top: 20pt;\">\n",
" </p>\n",
"</header>"
]
},
{
"cell_type": "markdown",
"id": "d6c42cf3-ab6f-480d-943e-75be813f912b",
"metadata": {},
"source": [
"<hr style=\"height:2px;border:none;background-color:#00233C;\">\n",
"\n",
"<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>1. Install required libraries</b></p>"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d74eee36-e3e8-42ca-9253-4e5ce8178c7b",
"metadata": {},
"outputs": [],
"source": [
"%%capture\n",
"\n",
"!pip install openai langchain langchain-openai langchain-community requests faiss-cpu panel==1.3.4"
]
},
{
"cell_type": "markdown",
"id": "87d04ebc-4e5c-4e11-8094-fef11cf0b017",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-info\">\n",
" <p style='font-size:16px;font-family:Arial;color:#00233C'><b>Note:</b> Please restart the kernel. The simplest way is by typing <b>0 0</b> (zero zero) and then pressing <i><b>enter</b></i></p>\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "98cfddce-a2b3-4d7d-aae4-1f2dbc463022",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import json\n",
"import re\n",
"\n",
"# genAI\n",
"import openai\n",
"from langchain.document_loaders import DirectoryLoader\n",
"from langchain_openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import FAISS\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"from langchain.schema import Document\n",
"\n",
"from langchain_community.document_loaders import NotebookLoader\n",
"from langchain_community.document_loaders import DirectoryLoader\n",
"\n",
"from langchain_openai import ChatOpenAI\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain.chains import RetrievalQA\n",
"\n",
"\n",
"# display\n",
"from IPython.display import display, Markdown, HTML, Javascript\n",
"\n",
"#teradataml\n",
"from teradataml import *"
]
},
{
"cell_type": "markdown",
"id": "1320f4b6-15ab-4d5b-8bcd-56fb9154d5ca",
"metadata": {},
"source": [
"<hr style=\"height:1px;border:none;background-color:#00233C;\">\n",
"\n",
"<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>1.1 Connect to Vantage</b></p>\n",
"<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We will be prompted to provide the password. We will enter the password, press the Enter key, and then use the down arrow to go to the next cell.</p>"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "882f4318-bbb0-4214-ab97-316ded717794",
"metadata": {},
"outputs": [],
"source": [
"%run -i ../UseCases/startup.ipynb\n",
"eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)\n",
"print(eng)\n",
"execute_sql('''SET query_band='DEMO= ExperienceBot.ipynb;' UPDATE FOR SESSION;''')"
]
},
{
"cell_type": "markdown",
"id": "fd6a96fb-38bb-4f7c-b01b-a6fff11dbbc6",
"metadata": {},
"source": [
"<hr style=\"height:1px;border:none;background-color:#00233C;\">\n",
"\n",
"<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>1.2 Set OpenAI key to environment</b></p>"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "63f65a6a-cb8e-4ce4-a8ba-c59c95e55db3",
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"if not os.getenv(\"OPENAI_API_KEY\"):\n",
" os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"Enter your OpenAI API key: \")"
]
},
{
"cell_type": "markdown",
"id": "b4fe3299-d15d-48c8-ae02-a80968c9ad1f",
"metadata": {},
"source": [
"<hr style=\"height:2px;border:none;background-color:#00233C;\">\n",
"\n",
"<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>2. Create the embeddings or load from existing</b></p>"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b661c301-cfe3-4389-a05e-e996336684e8",
"metadata": {},
"outputs": [],
"source": [
"from chat_helper_db import *"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2f34de3e-b917-4b5d-9448-c9341a9980fa",
"metadata": {},
"outputs": [],
"source": [
"def load_emb():\n",
" # Load the FAISS index with dangerous deserialization enabled\n",
" # Create vector store using embeddings\n",
" embeddings = OpenAIEmbeddings(model=\"text-embedding-3-large\") \n",
"\n",
" return FAISS.load_local(\n",
" \"notebooks_index\", embeddings, allow_dangerous_deserialization=True\n",
" )"
]
},
{
"cell_type": "markdown",
"id": "2f65df37-f51b-45cf-9aba-7058af4447b8",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
"<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Note: You do not have to run the next cell multiple times. Each time it is executed it will generate over 1M embeddings and the charge is typically $0.02USD / 1M tokens</b>.</p></div>\n",
" \n",
"<div class=\"alert alert-block alert-info\">\n",
"<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Note: If you have previously run this notebook, please select <b>No</b> in response to the following question.</p></div>"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c357a955-d49d-4fec-99a5-6c42f9d204bf",
"metadata": {},
"outputs": [],
"source": [
"# Request user's input\n",
"generate = input(\"Do you want to generate embeddings? ('yes'/'no'): \")\n",
"\n",
"# Check the user's input\n",
"if generate.lower() == \"yes\":\n",
" vector_store = generate_emb()\n",
"elif generate.lower() == \"no\":\n",
" try:\n",
" print('Loading existing embeddings...')\n",
" vector_store = load_emb()\n",
" print('Embeddings are loaded now...')\n",
" except:\n",
" print('Embeddings not found, generating now..')\n",
" generate_emb()\n",
" vector_store = load_emb()"
]
},
{
"cell_type": "markdown",
"id": "5d754281-77b6-4f68-a41d-2323cdc020ad",
"metadata": {},
"source": [
"<hr style=\"height:2px;border:none;background-color:#00233C;\">\n",
"<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>3. Define a RAG Pipeline</b></p>"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6dbb1de5-2966-474f-86a4-a1aff224e9e8",
"metadata": {},
"outputs": [],
"source": [
"# Custom Prompt Template\n",
"CUSTOM_PROMPT = \"\"\"\n",
"You are a helpful assistant. Use the following retrieved information from Jupyter notebooks to provide:\n",
"1. A **clean and concise textual explanation** based on the question and notebook markdown.\n",
"2. **Relevant Clean Python code** extracted from the notebooks' code cells that are related to the question. Please filter the code that is related to the query.\n",
"3. Extract the source documents.\n",
"If no relevant information is found, politely say so.\n",
"\n",
"*Critical*: start by greeting only if user starts with greeting, just say, \"Hey there! 😊 Welcome to our chatbot!\"\n",
"\n",
"Context:\n",
"{context}\n",
"\n",
"Question:\n",
"{question}\n",
"\n",
"Your response should be in below format:\n",
"##Answer:\n",
"##Relevant Code:\n",
"##Source documents:\n",
"\"\"\"\n",
"\n",
"prompt = PromptTemplate(\n",
" input_variables=[\"context\", \"question\"],\n",
" template=CUSTOM_PROMPT\n",
")\n",
"\n",
"# Make sure to use a Chat model like 'gpt-4' or 'gpt-3.5-turbo'\n",
"chat_model = ChatOpenAI(model=\"gpt-4o-mini\")\n",
"\n",
"# Retrieval QA Chain\n",
"qa_chain = RetrievalQA.from_chain_type(\n",
" llm=chat_model,\n",
" retriever=vector_store.as_retriever(search_type=\"mmr\", search_kwargs={\"k\": 10}), # Assuming vector_store is your vector database\n",
" return_source_documents=True,\n",
" chain_type_kwargs={\"prompt\": prompt}\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8b8e3755-2398-4e51-9479-3fe810f34beb",
"metadata": {},
"outputs": [],
"source": [
"import panel as pn\n",
"pn.extension(design=\"material\")\n",
"\n",
"# panel callback function\n",
"def callback(contents, user, instance):\n",
" response = qa_chain.invoke(contents)\n",
" result = response['result']\n",
" \n",
" source_docs = response.get(\"source_documents\", [])\n",
" sources = \"\\n\".join(set([doc.metadata.get(\"source\", \"Unknown\") for doc in source_docs]))\n",
" html_output = extract_answer_code_references(sources)\n",
" result = result + \"\\n\\n\" + html_output\n",
" return result\n",
"\n",
"\n",
"pn.chat.ChatInterface(\n",
" callback=callback,\n",
" show_rerun=False,\n",
" show_undo=False,\n",
" show_clear=False,\n",
" width=1200,\n",
" height=400,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "28cdf87b-3b88-4acb-8a8d-4d9f2c6c3b97",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-info\">\n",
" <p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Note: </b><i>To ensure that the Chatbot interface reflects the latest changes, please reload the page by clicking the 'Reload' button or pressing F5 on your keyboard for <b>first-time only</b> This will update the notebook with the latest modifications, and you'll be able to interact with the Chatbot using the new libraries.</i></p>\n",
" </div>"
]
},
{
"cell_type": "markdown",
"id": "f443eeaf-b913-4dd4-839b-e950b228a3c5",
"metadata": {},
"source": [
"<hr style=\"height:2px;border:none;background-color:#00233C;\">\n",
"<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>4. You can try your own question</b></p>\n",
"\n",
"\n",
"<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Here are some sample questions that you can try out:</p>\n",
"\n",
"<ol style = 'font-size:16px;font-family:Arial;color:#00233C'>\n",
" <li>How VectorDistance works?</li>\n",
" <li>What is Script table operator?</li>\n",
" <li>Give me demos which have AWS Bedrock?</li>\n",
" <li>What is GEOSEQUENCE? Show me some examples</li>\n",
" <li>Which notebooks are using OpenAI?</li>\n",
" <li>Which notebooks are about fraud detection?</li>\n",
" <li>How to use TDApiClient to generate the embeddings?</li>\n",
" <li>Show me demo for Broken digital Journey?</li>\n",
"</ol>"
]
},
{
"cell_type": "markdown",
"id": "2fc3a070-33cf-4787-8c66-cc221075fdd5",
"metadata": {},
"source": [
"<footer style=\"padding-bottom:35px; background:#f9f9f9; border-bottom:3px solid #00233C\">\n",
" <div style=\"float:left;margin-top:14px\">ClearScape Analytics™</div>\n",
" <div style=\"float:right;\">\n",
" <div style=\"float:left; margin-top:14px\">\n",
" Copyright © Teradata Corporation - 2025. All Rights Reserved\n",
" </div>\n",
" </div>\n",
"</footer>"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Loading