Update genai notebooks - Cameron J. #94

Open
wants to merge 6 commits into
base: main
Changes from 4 commits
5 changes: 5 additions & 0 deletions notebooks/GenAI/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
__pycache__
.venv
.env
microsoft-earnings_embeddings.csv
embedding_demos/p1.py
234 changes: 234 additions & 0 deletions notebooks/GenAI/azure_infra_setup/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,234 @@
# Setting Up Azure Environment for Azure GenAI Cloud Lab
Collaborator

Could you add an estimate of how much running this README in one setting could cost a user?

Collaborator Author

Executing the ARM template in this README does not incur any cost. However, the resources deployed by the ARM template will incur costs over time, based on Azure's pricing for each resource. These are the same resources that were originally used in the Cloud Lab around AOAI, such as AI Search, Blob Storage, and Azure OpenAI service. The ARM template serves as an automated way to deploy these resources to a resource group, rather than having to manually deploy each resource from the Azure portal. A "## Resources and Cost Breakdown" section has been added to the README with estimated costs from the Azure Pricing Calculator, based on the SKUs found in the ARM template for each resource.

Collaborator

Please structure the tutorial to follow the outline laid out in the tutorial checklist

Collaborator Author

Tutorial has been structured to follow the outline laid out in checklist. Skill level identification has not yet been added. Where should skill level identification be placed in the tutorial structure?


Welcome! This guide will help you set up your Azure environment to complete the activities in the [Azure GenAI](../) directory of the NIH Cloud Lab. We will walk you through the steps required to configure PowerShell, deploy the necessary resources using an ARM template, upload local files to an Azure Storage Account, and acquire the keys and secrets for your `.env` variables.
Collaborator

I believe that if the user is using Azure Machine Learning and creates a notebook there then the CLI is already installed there. It might be the same for the VMs created in Azure. If so, then that would be a helpful note to add here.

Collaborator Author

Added note to Prerequisites, advising the user of such circumstance. If using such environment, user is encouraged to skip step 1 and move directly to step 2.

Collaborator Author

Proposing we also incorporate steps for deploying Azure resources manually from the Azure portal in this tutorial. That way, we have a unified landing zone where users can configure their working environments for all tutorials in the GenAI notebooks, using either the ARM template or manual deployment. In the prerequisites of each tutorial, we can then link users to this landing zone if they have not yet configured the environment and resources.

Collaborator

Could you explain what an ARM template is? Some of our users are very new to the cloud and/or scripting.

Collaborator Author

Overview of what an ARM template is has been added to the "## Resources and Cost Breakdown" section of README.


## Prerequisites

- An active Azure subscription
- PowerShell installed on your machine (option 1)
Collaborator

is there a particular reason why the user should use powershell?

Collaborator Author

Choosing between Azure CLI and PowerShell comes down to personal preference and the working environment. Brief overview added to the Prerequisites section for helping users choose.

- Azure CLI installed (option 2)
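
To see which of the two options is already available on your machine, a quick check along these lines may help. This is a minimal sketch: `have_cmd` is a hypothetical helper name, and `pwsh` is assumed to be the PowerShell executable (on Windows PowerShell 5.1 it is `powershell` instead).

```bash
# Report whether each prerequisite tool can be found on the PATH.
# `have_cmd` is a hypothetical helper, not part of any Azure tooling.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

for tool in pwsh az; do
  if have_cmd "$tool"; then
    echo "$tool: found"
  else
    echo "$tool: not found"
  fi
done
```

If only one of the two is reported as found, follow the matching option in each step below.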

## Steps

### 1. Setting Up the Azure Module in PowerShell

First, you need to install the Azure module in PowerShell to connect to your Azure account.

```powershell
# Install the Az module (if using PowerShell)
Install-Module -Name Az -AllowClobber -Force

# Import the Az module into the current PowerShell session
Import-Module Az
```

### 2. Logging into Azure

You can log into your Azure account either using PowerShell or Azure CLI.

**Using PowerShell**
```powershell
# Log into your Azure account
Connect-AzAccount
```
**Using Azure CLI**
```bash
# Log into your Azure account
az login
```

### 3. Setting Variables

Set the following variables, which you'll need throughout the setup process.

**Using PowerShell**
```powershell
# Variables
$resourceGroupName="nihcloudlabrg"
$location="eastus2"
$templateFilePath="Path To ./arm_resources.json"
$storageAccountName="cloudlabstgacct"
$containerName="cloudlabdocuments"
$localFilePath="Path To ../search_documents"
$searchServiceName="cloudlabsearch"
$openAIResourceName="cloudlabaoai"
```
**Using Azure CLI**
```bash
# Variables
resourceGroupName="nihcloudlabrg"
location="eastus2"
templateFilePath="Path To ./arm_resources.json"
storageAccountName="cloudlabstgacct"
containerName="cloudlabdocuments"
localFilePath="Path To ../search_documents"
searchServiceName="cloudlabsearch"
openAIResourceName="cloudlabaoai"
```
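
Storage account names must be 3 to 24 characters long and contain only lowercase letters and digits, so it can be worth checking your chosen value before deploying. The sketch below assumes a hypothetical helper named `is_valid_storage_name`. It checks the format only, not whether the name is globally unique.

```bash
# Check a candidate storage account name against Azure's format rules:
# 3-24 characters, lowercase letters and digits only.
# `is_valid_storage_name` is a hypothetical helper for illustration.
is_valid_storage_name() (
  LC_ALL=C            # keep the a-z range limited to ASCII lowercase
  name=$1
  [ "${#name}" -ge 3 ] && [ "${#name}" -le 24 ] || exit 1
  case $name in
    *[!a-z0-9]*) exit 1 ;;   # any character outside a-z or 0-9 is rejected
  esac
)

storageAccountName="cloudlabstgacct"
if is_valid_storage_name "$storageAccountName"; then
  echo "'$storageAccountName' is a well-formed storage account name"
else
  echo "'$storageAccountName' breaks the naming rules" >&2
fi
```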

### 4. Creating an Empty Resource Group

Create an empty resource group where the ARM template will deploy the necessary resources.

**Using PowerShell**
```powershell
# Create a resource group
New-AzResourceGroup -Name $resourceGroupName -Location $location
```
**Using Azure CLI**
```bash
# Create a resource group
az group create --name $resourceGroupName --location $location
```

### 5. Deploying the ARM Template

Deploy the [ARM template](arm_resources.json) to create the Azure Storage Account, Azure AI Search, and Azure OpenAI resources.
Collaborator

link didn't work please change to arm_resources.json

Collaborator Author

Adjustment made to arm_resources.json.


**Using PowerShell**
```powershell
# Deploy the ARM template
New-AzResourceGroupDeployment -ResourceGroupName $resourceGroupName -TemplateFile $templateFilePath
```
**Using Azure CLI**
```bash
# Deploy the ARM template
az deployment group create --resource-group $resourceGroupName --template-file $templateFilePath
```
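
Before deploying, you may want to check the template and preview what it would change. The sketch below wraps two Azure CLI subcommands, `az deployment group validate` and `az deployment group what-if`. The wrapper name `preview_deployment` is hypothetical, and the function assumes you have already run `az login`.

```bash
# Validate the template, then preview its changes, without deploying anything.
# `preview_deployment` is a hypothetical wrapper name for illustration.
preview_deployment() {
  rg=$1
  template=$2
  az deployment group validate --resource-group "$rg" --template-file "$template" &&
    az deployment group what-if --resource-group "$rg" --template-file "$template"
}

# Usage (requires a prior `az login`):
#   preview_deployment "$resourceGroupName" "$templateFilePath"
```

The preview only runs if validation succeeds, so a malformed template is reported first.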

### 6. Uploading Local Files to Azure Storage

Upload your local files to the blob container in the Azure Storage Account.

**Using PowerShell**
```powershell
# Get storage account context
$storageContext = (Get-AzStorageAccount -ResourceGroupName $resourceGroupName -Name $storageAccountName).Context

# Upload all files in the directory
Get-ChildItem -Path $localFilePath -File | ForEach-Object {
    Set-AzStorageBlobContent -File $_.FullName -Container $containerName -Context $storageContext
}
```
**Using Azure CLI**
```bash
# Get storage account key
storageAccountKey=$(az storage account keys list --resource-group $resourceGroupName --account-name $storageAccountName --query "[0].value" --output tsv)

# Upload all files in the directory
for file in "$localFilePath"/*; do
    az storage blob upload --account-name "$storageAccountName" --account-key "$storageAccountKey" --container-name "$containerName" --file "$file" --name "$(basename "$file")"
done
```
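
To preview which blob names an upload loop like this would produce, without contacting Azure, something like the following sketch could be used. `preview_upload` is a hypothetical helper, and the demo directory is a throwaway created only for illustration.

```bash
# Print the blob name each file in a directory would get, without uploading.
# `preview_upload` is a hypothetical helper; blob names come from basename.
preview_upload() {
  dir=$1
  for file in "$dir"/*; do
    [ -f "$file" ] || continue   # skip subdirectories and an unmatched glob
    printf '%s -> %s\n' "$file" "$(basename "$file")"
  done
}

# Example with a throwaway directory
demo_dir=$(mktemp -d)
touch "$demo_dir/report.pdf" "$demo_dir/notes.txt"
preview_upload "$demo_dir"
rm -rf "$demo_dir"
```

Quoting `"$file"` throughout keeps file names that contain spaces intact.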

### 7. Retrieving API Keys

Retrieve the API keys for each service created by the ARM template deployment. These secrets are confidential and should be handled appropriately. Once you have the output, add the values to your `.env` file, which should be created in the `./notebooks/GenAI` directory. This `.env` file is already listed in `.gitignore`, so Git will not track it and your secrets will not be committed to the repository.
Collaborator

Could you explain what .gitignore does for beginner users?

Collaborator Author

Description added to this section of the ReadMe, explaining what .gitignore does and why it is important that the .env file has been added.


**Azure Storage Account**

***Using PowerShell***
```powershell
# Get the storage account key
$storageAccountKey = (Get-AzStorageAccountKey -ResourceGroupName $resourceGroupName -Name $storageAccountName)[0].Value
# Construct the Blob connection string
$connectionString = "DefaultEndpointsProtocol=https;AccountName=$storageAccountName;AccountKey=$storageAccountKey;EndpointSuffix=core.windows.net"
# Output the connection string
Write-Output $connectionString
```
***Using Azure CLI***
```bash
# Get the storage account key
storageAccountKey=$(az storage account keys list --resource-group $resourceGroupName --account-name $storageAccountName --query '[0].value' --output tsv)
echo $storageAccountKey
# Construct the Blob connection string
connectionString="DefaultEndpointsProtocol=https;AccountName=$storageAccountName;AccountKey=$storageAccountKey;EndpointSuffix=core.windows.net"
echo $connectionString
```
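
The connection string is plain string assembly, so it can be factored into a small function. Because the result contains a secret, a masked form is handy when you only want to confirm that a key was retrieved. Both helper names below are hypothetical, and this is a sketch, not part of the Azure tooling.

```bash
# Assemble a Blob connection string from an account name and key.
# `build_blob_connection_string` and `mask_secret` are hypothetical helpers.
build_blob_connection_string() {
  printf 'DefaultEndpointsProtocol=https;AccountName=%s;AccountKey=%s;EndpointSuffix=core.windows.net\n' "$1" "$2"
}

# Show only the last four characters of a secret.
mask_secret() {
  secret=$1
  printf '****%s\n' "${secret#"${secret%????}"}"
}

# Example with a placeholder key (not a real secret)
build_blob_connection_string "cloudlabstgacct" "PLACEHOLDERKEY"
mask_secret "PLACEHOLDERKEY"
```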

You now have the secrets to set the following .env variables in your local file. Copy the values to your `.env`:
- ***BLOB_CONTAINER_NAME*** = Use the value of `$containerName` or `containerName`.
- ***BLOB_CONNECTION_STRING*** = Use the value of `$connectionString` or `connectionString`.
- ***BLOB_ACCOUNT_NAME*** = Use the value of `$storageAccountName` or `storageAccountName`.

**Azure AI Search**

***Using PowerShell***
```powershell
# Acquire the AI Search admin keys and keep the primary one
$adminKeys = Get-AzSearchAdminKeyPair -ResourceGroupName $resourceGroupName -ServiceName $searchServiceName
$searchServiceKey = $adminKeys.Primary
Write-Output $searchServiceKey
# Construct the AI Search endpoint
$searchServiceEndpoint="https://$searchServiceName.search.windows.net"
Write-Output $searchServiceEndpoint
```
***Using Azure CLI***
```bash
# Acquire the AI Search Admin Key
searchServiceKey=$(az search admin-key show --resource-group $resourceGroupName --service-name $searchServiceName --query primaryKey -o tsv)
Collaborator

Do you know what pricing tier this command would use to create the Azure AI Search Service?

Collaborator Author

Basic pricing tier used for Azure AI Search deployment with this command. The tier is identified in the ARM template logic as basic sku. The tier has also been added to the "## Resources and Cost Breakdown" section along with cost.

echo $searchServiceKey
# Construct the AI Search endpoint
searchServiceEndpoint="https://$searchServiceName.search.windows.net"
echo $searchServiceEndpoint
```

You now have the secrets to set the following .env variables in your local file. Copy the values to your `.env`:
- ***AZURE_SEARCH_ENDPOINT*** = Use the value of `$searchServiceEndpoint` or `searchServiceEndpoint`.
- ***AZURE_SEARCH_ADMIN_KEY*** = Use the value of `$searchServiceKey` or `searchServiceKey`.

**Azure OpenAI**

***Using PowerShell***
```powershell
# Get the Azure OpenAI key 1 (this line calls the Azure CLI, so `az` must be installed and logged in)
$openAIKey = az cognitiveservices account keys list --resource-group $resourceGroupName --name $openAIResourceName --query "key1" --output tsv
Write-Output $openAIKey
# Construct the Azure OpenAI endpoint
$openAIEndpoint = "https://$openAIResourceName.openai.azure.com/"
Write-Output $openAIEndpoint
```
***Using Azure CLI***
```bash
# Get the Azure OpenAI key
openAIKey=$(az cognitiveservices account keys list --resource-group $resourceGroupName --name $openAIResourceName --query "key1" --output tsv)
echo $openAIKey
# Construct the Azure OpenAI endpoint
openAIEndpoint="https://$openAIResourceName.openai.azure.com/"
echo $openAIEndpoint
```

You now have the secrets to set the following .env variables in your local file. Copy the values to your `.env`:
- ***AZURE_OPENAI_ENDPOINT*** = Use the value of `$openAIEndpoint` or `openAIEndpoint`.
- ***AZURE_OPENAI_KEY*** = Use the value of `$openAIKey` or `openAIKey`.
- ***AZURE_GPT_DEPLOYMENT*** = Use the value of `gpt-4o-mini`.
- ***AZURE_EMBEDDINGS_DEPLOYMENT*** = Use the value of `text-embedding-3-small`.

**Note**: To find the ***API version (AZURE_OPENAI_VERSION)*** for your resource in the Azure OpenAI playground, follow these steps:
1. **Navigate to Deployments**: In the left side panel of the Azure OpenAI playground, click on “Deployments.”
2. **Select the Model Deployment**: Click on the specific model deployment you are working with.
3. **Locate the Endpoint Section**: In the endpoint section, you will see the Target URI.
4. **Find the API Version**: Look for the `api-version` query parameter in the URL, for example `api-version=2024-08-01-preview`. The value after the `=` is your API version.
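
If you have copied the Target URI, the version can also be pulled out of it with shell parameter expansion. The sketch below assumes a hypothetical helper named `extract_api_version`, and the URI shown is only an illustrative example of the general shape. Your own Target URI will differ.

```bash
# Extract the value of the api-version query parameter from a Target URI.
# `extract_api_version` is a hypothetical helper for illustration.
extract_api_version() {
  uri=$1
  case $uri in
    *api-version=*) ;;
    *) return 1 ;;                # no api-version parameter present
  esac
  version=${uri##*api-version=}   # drop everything up to and including "api-version="
  version=${version%%&*}          # drop any query parameters that follow
  printf '%s\n' "$version"
}

# Illustrative URI only; copy the real one from the Deployments page
targetUri="https://cloudlabaoai.openai.azure.com/openai/deployments/gpt-4o-mini/chat/completions?api-version=2024-08-01-preview"
extract_api_version "$targetUri"   # prints 2024-08-01-preview
```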

Your final local `.env` file should look something like this:
```sh
AZURE_OPENAI_VERSION = "Your Azure OpenAI API version"
AZURE_OPENAI_ENDPOINT = "Your Azure OpenAI API endpoint"
AZURE_OPENAI_KEY = "Your Azure OpenAI API key"
AZURE_GPT_DEPLOYMENT = "Your Azure OpenAI deployed GPT model name"
AZURE_EMBEDDINGS_DEPLOYMENT = "Your Azure OpenAI deployed embeddings model name"
AZURE_SEARCH_ENDPOINT = "Your Azure AI Search API endpoint"
AZURE_SEARCH_ADMIN_KEY = "Your Azure AI Search API key"
BLOB_CONTAINER_NAME = "Your Azure Blob Container name hosting files from /search_documents"
BLOB_CONNECTION_STRING = "Your Azure Blob connection string"
BLOB_ACCOUNT_NAME = "Your Azure Storage Account name"
```
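
Once the file is written, a quick check that no variable was missed can save debugging time later. The sketch below assumes a hypothetical helper named `missing_env_keys`. It only looks for each name followed by `=` at the start of a line, and does not validate the values.

```bash
# Print every required variable name that is absent from a .env file.
# `missing_env_keys` is a hypothetical helper for illustration.
missing_env_keys() {
  env_file=$1
  shift
  for key in "$@"; do
    # Accept both "KEY=value" and "KEY = value"
    grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$env_file" || printf '%s\n' "$key"
  done
}

# Usage, run from the directory that holds your .env file:
#   missing_env_keys .env AZURE_OPENAI_VERSION AZURE_OPENAI_ENDPOINT AZURE_OPENAI_KEY \
#     AZURE_GPT_DEPLOYMENT AZURE_EMBEDDINGS_DEPLOYMENT AZURE_SEARCH_ENDPOINT \
#     AZURE_SEARCH_ADMIN_KEY BLOB_CONTAINER_NAME BLOB_CONNECTION_STRING
```

No output means every listed variable is present.
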
## Conclusion

Congratulations on completing the Azure setup! During this process, we established a new resource group dedicated to the NIH Cloud Lab environment and configured three Azure resources in your tenant using an ARM template file. The resources include:

- An Azure Storage Account with a deployed Blob container and files uploaded from `../search_documents`
- Azure AI Search
- Azure OpenAI with deployed `gpt-4o-mini` and `text-embedding-3-small` models

Additionally, we configured `.env` variables in your local `.env` file, which is added to `.gitignore` by default.

You are now ready to proceed with the GenAI activities in the NIH Cloud Lab.