diff --git a/images/Grounding_new01.png b/images/Grounding_new01.png
new file mode 100644
index 0000000..695c058
Binary files /dev/null and b/images/Grounding_new01.png differ
diff --git a/images/Grounding_new02.png b/images/Grounding_new02.png
new file mode 100644
index 0000000..36dd02f
Binary files /dev/null and b/images/Grounding_new02.png differ
diff --git a/images/Grounding_new03.png b/images/Grounding_new03.png
new file mode 100644
index 0000000..85f2f1a
Binary files /dev/null and b/images/Grounding_new03.png differ
diff --git a/images/VertexAIIntro01.png b/images/VertexAIIntro01.png
new file mode 100644
index 0000000..a9c2dde
Binary files /dev/null and b/images/VertexAIIntro01.png differ
diff --git a/images/VertexAIIntro02.png b/images/VertexAIIntro02.png
new file mode 100644
index 0000000..732caa8
Binary files /dev/null and b/images/VertexAIIntro02.png differ
diff --git a/images/VertexAIIntro03.png b/images/VertexAIIntro03.png
new file mode 100644
index 0000000..db7a9ce
Binary files /dev/null and b/images/VertexAIIntro03.png differ
diff --git a/images/VertexAIIntro04.png b/images/VertexAIIntro04.png
new file mode 100644
index 0000000..f830505
Binary files /dev/null and b/images/VertexAIIntro04.png differ
diff --git a/images/VertexAIIntro05.png b/images/VertexAIIntro05.png
new file mode 100644
index 0000000..63455a8
Binary files /dev/null and b/images/VertexAIIntro05.png differ
diff --git a/images/VertexAIStudioGCP_new01.png b/images/VertexAIStudioGCP_new01.png
new file mode 100644
index 0000000..cfe662d
Binary files /dev/null and b/images/VertexAIStudioGCP_new01.png differ
diff --git a/images/VertexAIStudioGCP_new05.png b/images/VertexAIStudioGCP_new05.png
new file mode 100644
index 0000000..3608e81
Binary files /dev/null and b/images/VertexAIStudioGCP_new05.png differ
diff --git a/images/VertexAIStudioGCP_new06.png b/images/VertexAIStudioGCP_new06.png
new file mode 100644
index 0000000..2f1585f
Binary files /dev/null and b/images/VertexAIStudioGCP_new06.png differ
diff --git a/images/VertexAIStudioGCP_new07.png b/images/VertexAIStudioGCP_new07.png
new file mode 100644
index 0000000..8392f73
Binary files /dev/null and b/images/VertexAIStudioGCP_new07.png differ
diff --git a/images/gcp_rag_structure_data_01.png b/images/gcp_rag_structure_data_01.png
new file mode 100644
index 0000000..a9cedd8
Binary files /dev/null and b/images/gcp_rag_structure_data_01.png differ
diff --git a/notebooks/GenAI/GCP_Grounding.ipynb b/notebooks/GenAI/GCP_Grounding.ipynb
index 163c483..2f8b6a5 100644
--- a/notebooks/GenAI/GCP_Grounding.ipynb
+++ b/notebooks/GenAI/GCP_Grounding.ipynb
@@ -121,7 +121,7 @@
"id": "d2d4afa0-bdf7-4afc-8b74-d3e791bbaebd",
"metadata": {},
"source": [
- "![grounding3](../../images/grounding_3.png)"
+ "![grounding3](../../images/Grounding_new01.png)"
]
},
{
@@ -169,7 +169,7 @@
"id": "1df54267-b65c-44cd-a869-9723b7302771",
"metadata": {},
"source": [
- "For this tutorial we'll select our data as unstructured but it also supports structured data in JSONL format and CSVs. Next choose your bucket and click **'CONTINUE'**."
+ "For this tutorial we'll select our data as unstructured but it also supports structured data in CSVs format. Next choose your bucket and click **'CONTINUE'**."
]
},
{
@@ -177,7 +177,7 @@
"id": "74e88355-7146-44a5-b30d-9907676dbb02",
"metadata": {},
"source": [
- "![grounding6](../../images/grounding_6.png)"
+ "![grounding6](../../images/Grounding_new02.png)"
]
},
{
@@ -249,7 +249,7 @@
"id": "369c112d-2376-43fa-9552-9498479be9fd",
"metadata": {},
"source": [
- "On the console go to Vertex AI and head down to Vertex AI Studio where the playground is. Select the **Language** playground. Grounding works on Text Prompt and Text Chat but for this tutorial we will use **Text Chat**."
+ "On the console go to Vertex AI and head down to Vertex AI Studio where the playground is. Select the **Chat** playground. "
]
},
{
@@ -257,7 +257,7 @@
"id": "8c971978-4925-427c-900c-0e8a996c0c91",
"metadata": {},
"source": [
- "![grounding11](../../images/grounding_11.png)"
+ "![grounding11](../../images/Grounding_new03.png)"
]
},
{
diff --git a/notebooks/GenAI/GCP_RAG_for_Structure_Data.ipynb b/notebooks/GenAI/GCP_RAG_for_Structure_Data.ipynb
new file mode 100644
index 0000000..a5698c7
--- /dev/null
+++ b/notebooks/GenAI/GCP_RAG_for_Structure_Data.ipynb
@@ -0,0 +1,625 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "6e38b3f2-2b29-4788-a29a-f9e5d10f8365",
+ "metadata": {},
+ "source": [
+ "##### __Skill Level:__ Intermediate"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6c0533e3-4fbd-47b1-8458-fd193bcb263f",
+ "metadata": {},
+ "source": [
+ "# Creating a Chatbot for Structure Data using a RAG"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "76cf4b02-6422-4a48-882c-c00315e34c8a",
+ "metadata": {},
+ "source": [
+ "## Overview"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7b7870dd-09e0-4f6a-91a9-7c30529af720",
+ "metadata": {},
+ "source": [
+ "Generative AI (GenAI) represents a transformative technology capable of producing human-like text, images, code, and various other types of content. While much of the focus has been on unstructured data—such as PDFs, text documents, image files, and websites—many GenAI implementations rely on a parameter called \"top K.\" This algorithm retrieves only the highest-scoring pieces of content relevant to a user's query, which can be limiting. Users seeking insights from structured data formats like CSV and JSON often require access to all relevant occurrences, rather than just a subset.\n",
+ "\n",
+ "In this tutorial, we use a RAG agent together with a VertexAI chat model gemini-1.5-pro to query the BigQuery table. A Retrieval Augmented Generation (RAG) agent is a key part of a RAG application that enhances the capabilities of the large language models (LLMs) by integrating external data retrieval. AI agents empower LLMs to interact with the world through actions and tools. The agent connects to BigQuery, runs the query, and then sends the results to the model. The model processes the data and generates the final output that it is displayed for the user.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d513a9fe-1ac4-4610-b53c-6696e461f828",
+ "metadata": {},
+ "source": [
+ "## Prerequisites"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fbdb09aa-73d6-4cee-b3e7-5d6b243e3d58",
+ "metadata": {},
+ "source": [
+ "In this tutorial, we will be using Google Gemini Pro 1.5, which does not require deployment. However, if you prefer to use a different model, you can select one from the Model Garden via the console. This will allow you to add the model to your registry, create an endpoint (or utilize an existing one), and deploy the model—all in a single step. Here is a link for more information: [model deployment](https://cloud.google.com/vertex-ai/docs/general/deployment).\n",
+ "\n",
+ "Before we begin, you'll need to create a Vertex AI RAG Data Service Agent service account. To do this, go to the IAM section of the console. Ensure you check the box for \"Include Google-provided role grant.\" If the role is not listed, click \"Grant Access\" and add \"Vertex AI RAG Data Service Agent\" as a role. Additionally, to create datasets and tables in BigQuery, the user must have the __bigquery.datasets.create__ and __bigquery.tables.create__ permissions.\n",
+ "\n",
+ "We are using a e2-standard-4 (Efficient Instance: 4 vCPUs, 16 GB RAM) for this tutorial. "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "eeb24716-0db8-462f-8faa-0b8ad9d61feb",
+ "metadata": {},
+ "source": [
+ "## Learning objectives"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f5216a43-31ee-4c3e-821a-4b81f4be8e54",
+ "metadata": {},
+ "source": [
+ "In this tutorial you will learn about:\n",
+ "- How to set up a BigQuery dataset and a table.\n",
+ "- How to load data to a BigQuery table.\n",
+ "- How to use the langchain ChatVertexAI agent to extract information from the table. \n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "348910aa-9802-4318-8fcb-86603158923b",
+ "metadata": {},
+ "source": [
+ "## Pricing"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0dca31b2-880d-4cbc-bf76-ede96998e371",
+ "metadata": {},
+ "source": [
+ "If you are following this tutorial in one sitting it will cost ~$0.80 . Completing the process in multiple sessions or using a method different from the tutorial may result in increased costs."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c7d9e3ee-1c7e-4baa-a818-c13330c665dd",
+ "metadata": {},
+ "source": [
+ "## Get Started"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "137defcd-fd2b-421d-9b2e-fe2b878fb325",
+ "metadata": {},
+ "source": [
+ "### Install Packages"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "55c14a38-ba51-466d-90b5-1512f5a7fae0",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!pip install google-cloud-bigquery\n",
+ "!pip install gcloud\n",
+ "!pip install langchain\n",
+ "!pip install -U langchain-community\n",
+ "!pip install -qU langchain_google_vertexai\n",
+ "!pip install pandas-gbq\n",
+ "!pip install sqlalchemy-bigquery"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "64e22773-9a03-47de-9b6d-7946abcba722",
+ "metadata": {},
+ "source": [
+ "Set your project id, location, and bucket variables."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7193a9ac-3169-45c1-9c5d-ba5e2f31458f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "project_id=''\n",
+ "location=' (e.g.us-east4)'\n",
+ "bucket = ''\n",
+ "dataset_name = ''\n",
+ "table_name = \"\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "df0d9eda-ee54-4c4c-86e6-9166b2b82e2d",
+ "metadata": {},
+ "source": [
+ "Create a bucket where the data used for tutorial is going to be placed. It is important to keep in mind that the name of the bucket has to be unique."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7b583234-2d92-4e5b-841d-c7d79dd87d45",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# make bucket\n",
+ "!gsutil mb -l {location} gs://{bucket}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5781a07c-84ed-416c-a782-b1dc881e5c18",
+ "metadata": {},
+ "source": [
+ "Once the bucket is created, we need to access the CSV source file. In this tutorial, we transferred the data file to our Jupyter notebook by simply dragging and dropping it from my local folder. Next, we need to specify the bucket name and the path of the data source in order to upload the CSV file to the bucket. \n",
+ "\n",
+ "We are using the [health screening](https://www.kaggle.com/datasets/drateendrajha/health-screening-data) dataset from kaggle for this tutorial."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4829094e-61a6-4dd7-854f-6b21dc19b1b7",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!gsutil cp '' gs://{bucket}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3cb53613-326d-4608-8c55-cf9bef115d88",
+ "metadata": {},
+ "source": [
+ "### Create a database in BigQuery"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7a00dc83-8713-4ea5-9345-08a9f3d785d0",
+ "metadata": {},
+ "source": [
+ "The next step is to create a database in BigQuery. To accomplish this, we need to define a dataset_id and construct a dataset object that will be sent to the API for creation. Note that in BigQuery, a dataset is analogous to a database."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "1750e13d-ec05-4335-82a2-ffe54c5b5f32",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from google.cloud import bigquery\n",
+ "\n",
+ "client = bigquery.Client()\n",
+ "\n",
+ "# Set dataset_id to the ID of the dataset to create.\n",
+ "dataset_id = f'{project_id}.{dataset_name}'\n",
+ "\n",
+ "# Construct a full Dataset object to send to the API.\n",
+ "dataset = bigquery.Dataset(dataset_id)\n",
+ "\n",
+ "# Specify the geographic location where the dataset should reside.\n",
+ "dataset.location = location\n",
+ "\n",
+ "# Send the dataset to the API for creation, with an explicit timeout.\n",
+ "# Raises google.api_core.exceptions.Conflict if the Dataset already\n",
+ "# exists within the project.\n",
+ "dataset = client.create_dataset(dataset, timeout=30) # Make an API request.\n",
+ "print(\"Created dataset {}.{}\".format(client.project, dataset.dataset_id))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "35a1499e-0439-48ab-8042-cc7b8625efd8",
+ "metadata": {},
+ "source": [
+ "Now you have two options to create your table (please pick one):"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b421e99e-e493-42b0-9bba-5ffa9d9f1f7b",
+ "metadata": {},
+ "source": [
+ "#### Option 1: Auto detect "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4e18540d-defa-414b-8b78-b86bce3dd46a",
+ "metadata": {},
+ "source": [
+ "The following code will auto detect the table schema and upload the data to the table in one step!"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "671aa2a3-d259-42e3-8cc3-8ab6b1e4a7d1",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "table_id = f'{dataset_id}.{table_name}'\n",
+ "\n",
+ "job_config = bigquery.LoadJobConfig(\n",
+ " autodetect=True, source_format=bigquery.SourceFormat.CSV, skip_leading_rows=1,\n",
+ ")\n",
+ "uri = f\"gs://{bucket}/ds_salaries.csv\"\n",
+ "load_job = client.load_table_from_uri(\n",
+ " uri, table_id, job_config=job_config\n",
+ ") # Make an API request.\n",
+ "load_job.result() # Waits for the job to complete.\n",
+ "destination_table = client.get_table(table_id)\n",
+ "print(\"Loaded {} rows.\".format(destination_table.num_rows))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cf413f17-c0ab-4d4a-b333-94b05d0bd376",
+ "metadata": {},
+ "source": [
+ "#### Option 2: Manually create schema and upload CSV file"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "544c191c-2bc0-4a47-a0a7-cea81035597a",
+ "metadata": {},
+ "source": [
+ "Alternatively, if we choose to define the table schema and create the dataset table beforehand, we can use the following code. Afterward, we can upload the CSV dataset into a pandas DataFrame and then insert it into the newly created BigQuery table."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "68206b14-1b5e-49c3-9f07-52ba6c70a93e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Set table_id to the ID of the table to create.\n",
+ "\n",
+ "client = bigquery.Client()\n",
+ "\n",
+ "table_id = f'{project_id}.{dataset_name}.{table_name}'\n",
+ "\n",
+ "schema = [\n",
+ " bigquery.SchemaField(\"id\", \"INTEGER\", mode=\"NULLABLE\"),\n",
+ " bigquery.SchemaField(\"age\", \"INTEGER\", mode=\"NULLABLE\"), \n",
+ " bigquery.SchemaField(\"gender\", \"INTEGER\", mode=\"NULLABLE\"),\n",
+ " bigquery.SchemaField(\"height\", \"INTEGER\", mode=\"NULLABLE\"),\n",
+ " bigquery.SchemaField(\"weight\", \"FLOAT\", mode=\"NULLABLE\"),\n",
+ " bigquery.SchemaField(\"ap_hi\", \"INTEGER\", mode=\"NULLABLE\"),\n",
+ " bigquery.SchemaField(\"ap_lo\", \"INTEGER\", mode=\"NULLABLE\"),\n",
+ " bigquery.SchemaField(\"cholesterol\", \"INTEGER\", mode=\"NULLABLE\"),\n",
+ " bigquery.SchemaField(\"gluc\", \"INTEGER\", mode=\"NULLABLE\"),\n",
+ " bigquery.SchemaField(\"smoke\", \"INTEGER\", mode=\"NULLABLE\"),\n",
+ " bigquery.SchemaField(\"alco\", \"INTEGER\", mode=\"NULLABLE\"),\n",
+ " bigquery.SchemaField(\"active\", \"INTEGER\", mode=\"NULLABLE\"),\n",
+ " bigquery.SchemaField(\"cardio\", \"INTEGER\", mode=\"NULLABLE\"),\n",
+ " bigquery.SchemaField(\"ageinyr\", \"INTEGER\", mode=\"NULLABLE\"),\n",
+ " bigquery.SchemaField(\"bmi\", \"FLOAT\", mode=\"NULLABLE\"),\n",
+ " bigquery.SchemaField(\"bmicat\", \"STRING\", mode=\"NULLABLE\"),\n",
+ " bigquery.SchemaField(\"agegroup\", \"STRING\", mode=\"NULLABLE\"),\n",
+ "\n",
+ "]\n",
+ "\n",
+ "table = bigquery.Table(table_id, schema=schema)\n",
+ "table = client.create_table(table) # Make an API request.\n",
+ "print(\n",
+ " \"Created table {}.{}.{}\".format(table.project, table.dataset_id, table.table_id)\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0c87a0f1-de43-4cce-a5e2-8952ce2283a0",
+ "metadata": {},
+ "source": [
+ "We load our dataset located in our bucket to a pandas dataframe."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2486d4e3-b462-42de-85ab-3c6d5a8ba363",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from google.cloud import storage\n",
+ "import pandas as pd\n",
+ "from io import StringIO\n",
+ "\n",
+ "path = \"\"\n",
+ "df = pd.read_csv(path)\n",
+ "df.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2dad23f6-cc3f-4bc6-b5ff-a66d17740a65",
+ "metadata": {},
+ "source": [
+ "We need to use the \"to_gbq\" function from the pandas-gbq library, which allows us to write a Pandas DataFrame to a Google BigQuery table. This enables us to populate our BigQuery table with the data from the DataFrame."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7cfd1d66-a0fb-403d-a9b1-27250cc81f91",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "df.to_gbq(table_id, project_id=project_id)\n",
+ "\n",
+ "client.load_table_from_dataframe(df, table_id).result()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6e97b4c6-429c-4d36-a4c1-3963b079af6a",
+ "metadata": {},
+ "source": [
+ "This step is optional. We can make an API request to verify whether the dataset has been successfully created under our project."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "72f11cfd-8b20-48bf-b157-aaf791cd3a9f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "datasets = list(client.list_datasets()) # Make an API request.\n",
+ "project = project_id\n",
+ "\n",
+ "if datasets:\n",
+ " print(\"Datasets in project {}:\".format(project))\n",
+ " for dataset in datasets:\n",
+ " print(\"\\t{}\".format(dataset.dataset_id))\n",
+ "else:\n",
+ " print(\"{} project does not contain any datasets.\".format(project))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4b16a292-afdc-415c-a8a7-071bc90281c3",
+ "metadata": {},
+ "source": [
+ "![image.png](../../images/gcp_rag_structure_data_01.png)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ca9014cb-2226-4168-b96e-d01021bee230",
+ "metadata": {},
+ "source": [
+ "### Optional: Run a query"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "22b4dd65-9bd9-48ef-9c98-85e3e9a82701",
+ "metadata": {},
+ "source": [
+ "This step is also optional. We can execute a simple query on the BigQuery table to count the number of records."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "29e91fc1-735f-4d6c-9fcf-3a2d745363c1",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "query = f\"\"\"\n",
+ " SELECT count(gender) as cgender\n",
+ " FROM `{project_id.dataset_id.table_id}`\n",
+ "\"\"\"\n",
+ "\n",
+ "# Execute the query\n",
+ "query_job = client.query(query)\n",
+ "\n",
+ "# Fetch the results\n",
+ "results = query_job.result()\n",
+ "\n",
+ "# Iterate through the rows\n",
+ "for row in results:\n",
+ " #print(f\"gender: {row.name}, age: {row.age}\")\n",
+ " print(f\"cgender: {row.cgender}\")\n",
+ "\n",
+ "try:\n",
+ " query_job = client.query(query)\n",
+ " results = query_job.result()\n",
+ " print(results)\n",
+ "except Exception as e:\n",
+ " print(f\"Error executing query: {str(e)}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ce65ead3-694d-4d46-9376-38d35b8d0852",
+ "metadata": {},
+ "source": [
+ "cgender: 139920\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b0cd8a0d-9845-4d84-abf6-2610ed946b7e",
+ "metadata": {},
+ "source": [
+ "### Connect our model and Big Query database and run a query"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b6fb53b8-9f93-4f37-8730-33c5871720fc",
+ "metadata": {},
+ "source": [
+ "To interact with the BigQuery table using a Pythonic domain language, we utilize SQLAlchemy. SQLAlchemy is a Python SQL toolkit that enables developers to access and manage SQL databases, allowing users to write queries as strings or chain Python objects for similar queries. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d1d29b90-7bcd-46b8-82b9-dc5e58f9edee",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from google.cloud import bigquery\n",
+ "from sqlalchemy import *\n",
+ "from sqlalchemy.engine import create_engine\n",
+ "from sqlalchemy.schema import *\n",
+ "import os\n",
+ "from langchain_community.agent_toolkits import create_sql_agent\n",
+ "from langchain.agents import create_sql_agent\n",
+ "from langchain.agents.agent_toolkits import SQLDatabaseToolkit\n",
+ "from langchain.sql_database import SQLDatabase\n",
+ "\n",
+ "sqlalchemy_url = f'bigquery://{project_id}/{dataset_name}'\n",
+ "print(sqlalchemy_url)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9194cc1d-7ef1-4c4e-8e65-e7d9ada5be24",
+ "metadata": {},
+ "source": [
+ "Next, we import the __ChatVertexAI__ agent from Langchain and configure it with the appropriate hyperparameters. In this instance, we are using the __gemini-1.5-pro__ LLM model, after which we create the SQL agent to enable querying the BigQuery table using a natural language string as a prompt. Temperature regulates randomness, with higher temperatures resulting in more varied and unpredictable outputs. Top-k sampling selects from the k most probable next tokens at each step, where a lower k emphasizes higher-probability tokens. The max tokens hyperparameter specifies the maximum number of tokens in the response from the large language model. Max retries indicates how many responses we will receive from the model."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ee1899a2-dd79-4cf4-a02d-c7c35b99040b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from langchain_google_vertexai import VertexAI, ChatVertexAI\n",
+ "\n",
+ "llm = ChatVertexAI(\n",
+ " model=\"gemini-1.5-pro\",\n",
+ " temperature=0,\n",
+ " max_tokens=8190,\n",
+ " timeout=None,\n",
+ " max_retries=2,\n",
+ ")\n",
+ "db = SQLDatabase.from_uri(sqlalchemy_url)\n",
+ "toolkit = SQLDatabaseToolkit(db=db, llm=llm)\n",
+ "agent_executor = create_sql_agent(\n",
+ " llm=llm,\n",
+ " toolkit=toolkit,\n",
+ " verbose=True,\n",
+ " top_k=100000\n",
+ " )\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f3551426-8644-48ba-b8c1-bf7d5eb843b3",
+ "metadata": {},
+ "source": [
+ "For our initial query, we check the number of rows to compare the result with what we obtained in a previous step using a simple SQL query against the BigQuery table. We can confirm that we received the same number of rows: 139,920."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e9753533-e82b-47af-936a-a401d512308a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "agent_executor.run(\"How many rows are in the table screening? \")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f2b62012-b2d2-48c1-ae69-6f8816065551",
+ "metadata": {},
+ "source": [
+ "Next, we pose a question that requires applying a filter, and we successfully obtain the accurate number of obese individuals in the table."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "5563d317-81ab-4a37-8225-e1b0dce20b36",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "agent_executor.run(\"How many obese are in the table screen?\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b96ea414-3f2d-4e66-ab0b-a034dfe507b0",
+ "metadata": {},
+ "source": [
+ "As a more complex query, we inquired about the number of female smokers, and the SQL agent accurately returned the answer of 1,626 female smokers."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a456ff90-a3a5-4ee4-81da-fba4fcdf7f2d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "agent_executor.run(\"How many smokers are female in the table screen? the value 1 in smoke means that is a smoker and the value 2 in gender means that is a female\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ca0fb657-19e0-4488-b9f6-25ca3971064f",
+ "metadata": {},
+ "source": [
+ "## Conclusion"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7f85e560-231f-4841-b1d0-334326e4b347",
+ "metadata": {},
+ "source": [
+ "You have learned how to create a dataset and a table in BigQuery, as well as how to set up an agent that enables you to query the table using natural language instead of SQL queries, allowing you to obtain the results you need."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "41573388-7feb-44b6-9e8a-0511cc8ae62e",
+ "metadata": {},
+ "source": [
+ "## Clean Up"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1f0edc0b-f2cc-478c-855f-9acff3792f4d",
+ "metadata": {},
+ "source": [
+ "Please remember to delete or stop your Jupyter notebook and delete your BigQuery dataset and table to prevent incurring charges. And if you have created any other services like buckets, please remember to delete them as well."
+ ]
+ }
+ ],
+ "metadata": {},
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/notebooks/GenAI/Gemini_Intro.ipynb b/notebooks/GenAI/Gemini_Intro.ipynb
index b6dda99..c6a24d8 100644
--- a/notebooks/GenAI/Gemini_Intro.ipynb
+++ b/notebooks/GenAI/Gemini_Intro.ipynb
@@ -514,7 +514,7 @@
"id": "614883f2-e53a-4ba5-855c-209d755a6e6f",
"metadata": {},
"source": [
- "You can also use Gemini Pro and Pro Vision in Vertex AI's playground called **Vertex AI Studio**. To locate Vertex AI Studio search Vertex AI and on the left hand side locate Vertex AI Studio as the image below shows. To utilize Gemini Pro Vision locate and click **Multimodal** you will have the option to use your own prompt or explore some of the other set prompts such as Extract text from images, image question answering , etc."
+ "You can also use Gemini Pro and Pro Vision in Vertex AI's playground called **Vertex AI Studio**. To locate Vertex AI Studio search Vertex AI and on the left hand side locate Vertex AI Studio as the image below shows. To utilize Gemini Pro Vision locate and click **Overview** and then click on **Open FreeForm** option. You will have the option to use your own prompt or explore some of the other set prompts such as Extract text from images, image question answering , etc."
]
},
{
@@ -522,15 +522,15 @@
"id": "8a634e11-3bd5-4048-b6e4-46d5aac6ce34",
"metadata": {},
"source": [
- "![Gemini1](../../images/Gemini_1.png)"
+ "![Gemini1](../../images/VertexAIIntro01.png)"
]
},
{
"cell_type": "markdown",
- "id": "c3716d9d-49af-40c9-acce-757c53be9a12",
+ "id": "12c359d3-9d82-441f-aff8-da1353925210",
"metadata": {},
"source": [
- "For this tutorial we will select Open on the **Prompt Design** option. We will upload the COVID image we downloaded before by clicking **INSERT MEDIA** and selecting our file. Then we will ask it a question, here we asked \"Describe treatments for the item in this image\"."
+ "When the free form opens, you have to select the gemini-1.0-pro-vision-001 model on the right side of the screen. We will upload the COVID image we downloaded before by clicking **Insert Media** and selecting our file. Then we will ask it a question, her we ask \"Describe treatment for the item in this image\"."
]
},
{
@@ -538,7 +538,7 @@
"id": "ec72f9cd-0d06-47e4-b67c-45039372d967",
"metadata": {},
"source": [
- "![Gemini3](../../images/Gemini_3.png)"
+ "![Gemini3](../../images/VertexAIIntro03.png)"
]
},
{
@@ -546,7 +546,7 @@
"id": "4a88f06b-770a-43be-b875-b4054c3ccbf3",
"metadata": {},
"source": [
- "To utilize Gemini Pro locate and click **Language** on the left side menu. You have the option to use a prompt or chat and if you would like to focus on text or code."
+ "To utilize Gemini Pro locate and click **Chat** on the left side menu. "
]
},
{
@@ -554,7 +554,7 @@
"id": "1952cd5c-c93d-428f-8036-ac658eebfba4",
"metadata": {},
"source": [
- "![Gemini2](../../images/Gemini_2.png)"
+ "![Gemini4](../../images/VertexAIIntro04.png)"
]
},
{
@@ -562,7 +562,7 @@
"id": "7810c0db-d145-4acd-a2cc-b6ce8231fa14",
"metadata": {},
"source": [
- "Here we picked the **TEXT CHAT** option and asked the bot to describe covid and how it works."
+ "We ask the bot to describe covid and how it works."
]
},
{
@@ -570,13 +570,45 @@
"id": "36b4635d-d8f3-42df-959d-dff92259813c",
"metadata": {},
"source": [
- "![Gemini4](../../images/Gemini_4.png)"
+ "![Gemini5](../../images/VertexAIIntro05.png)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c0c5d0ad-2235-471e-98e6-8d5f295cc724",
+ "metadata": {},
+ "source": [
+ "## Conclusion"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3d66220a-b17e-4b9d-8f85-f2c682d7a27f",
+ "metadata": {},
+ "source": [
+ "You’ve learned how to interact with Gemini through code and the Vertex AI Studio interface. We covered how to generate text from images and videos using the Gemini-pro-vision model in a Python script. Additionally, we explored the FreeForm and Chat options in Vertex AI Studio to extract text from images and to engage in question-and-answer interactions within the Chat prompt."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0a1fedcb-75d7-411f-9475-9ecb777c9dbb",
+ "metadata": {},
+ "source": [
+ "## Clean UP"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f30d1136-f67c-4bb1-8ed4-09b88ef67a1e",
+ "metadata": {},
+ "source": [
+ "Please remember to delete or stop your Jupyter notebook and if you have created any other services like buckets, please delete them as well."
]
},
{
"cell_type": "code",
"execution_count": null,
- "id": "acc53b99-e6ed-45f1-a0e1-9b146e18e46c",
+ "id": "9ee938bb-6dbb-4e01-88ae-e8511ada3ff7",
"metadata": {},
"outputs": [],
"source": []
diff --git a/notebooks/GenAI/VertexAIStudioGCP.ipynb b/notebooks/GenAI/VertexAIStudioGCP.ipynb
index 219f1dc..d217321 100644
--- a/notebooks/GenAI/VertexAIStudioGCP.ipynb
+++ b/notebooks/GenAI/VertexAIStudioGCP.ipynb
@@ -63,18 +63,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Go to the Vertex AI Studio console by navigating to Vertex AI via the search bar on the console. On the left side menu scroll down to Vertex AI Studio, click **Language**.\n",
+ "Go to the Vertex AI Studio console by navigating to Vertex AI via the search bar on the console. On the left side menu scroll down to Vertex AI Studio, click **Chat**.\n",
"\n",
- " \n",
+ " \n",
"\n",
"\n",
- "Scroll down **\"Prompt examples\"** then to **Summarization** and click **\"Open\"** on **Article Summary**. You will see a prompt session were you will need to enter in the contents of your article as the console does not allow you to upload files. For this tutorial this article is about how gut microbiota affects Alzeheimer's disease because of the gut-brain-microbiota axis network [here](https://www.aging-us.com/article/102930/pdf).\n",
+ "Click on **\"Browse prompt gallery\"** then scroll down to **Summarizing texts** and click on it. You will see the system instructions and the prompt on the chat that are populated with the summarizing example. Since we want to summarize our own article, we clear the prompt by clicking **Clear prompt** on the top right and then **Insert Media**, select **ByURL** if you want to upload a document from the web or **Upload** if you want to upload it from your local machine. Paste the url link and press **Insert**. The document will be loaded into the Prompt and you will have to enter some instruction of what to do with the uploaded document i.e. \"Please summarize the uploaded document.\" For this tutorial this article is about how gut microbiota affects Alzeheimer's disease because of the gut-brain-microbiota axis network [here](https://www.aging-us.com/article/102930/pdf).\n",
"\n",
- " \n",
+ " \n",
"\n",
- "To the right you can control the parameters that we have been using before this is a great way to test what each parameter does and how they effect each other. Once you are done click **submit**, you should have a similar output as below. For explainations on the parameters **temperature, Output token limit, top p, and top k** see the following article [here](https://cloud.google.com/vertex-ai/docs/generative-ai/text/test-text-prompts#generative-ai-test-text-prompt-drest).\n",
+ "To the right you can control the parameters that we have been using before this is a great way to test what each parameter does and how they effect each other. Once you are done click the submit arrow, you should have a similar output as below. For explainations on the parameters **temperature, Output token limit, top p, and top k** see the following article [here](https://cloud.google.com/vertex-ai/docs/generative-ai/text/test-text-prompts#generative-ai-test-text-prompt-drest).\n",
"\n",
- " \n",
+ " \n",
" \n"
]
},
@@ -89,7 +89,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- ""
+ ""
]
},
{