Please merge to main with the new slides #7

Open · wants to merge 53 commits into base: main

Commits (53)
e3fb899
Update lab_2.md
kaushik-microsoft Apr 26, 2023
b3ca130
Merge pull request #1 from kaushik-microsoft/patch-1
kaushik-microsoft Apr 26, 2023
00b2e31
Update lab_2.md
kaushik-microsoft Apr 26, 2023
71fa32d
Update lab_2.md
kaushik-microsoft Apr 26, 2023
24cfaa5
Add files via upload
kaushik-microsoft Apr 26, 2023
c9fdf44
Add files via upload
kaushik-microsoft Apr 26, 2023
ff736d3
Update lab_2.md
kaushik-microsoft Apr 26, 2023
5420cce
uploaded new pictures
vp-ms Apr 26, 2023
24e9eca
Update lab_2.md
kaushik-microsoft Apr 26, 2023
5ee64b6
Merge branch 'main' of https://github.com/MSUSAzureAccelerators/Azure…
vp-ms Apr 26, 2023
fa5ef2c
Update lab_2.md
kaushik-microsoft Apr 26, 2023
8542519
Add files via upload
kaushik-microsoft Apr 26, 2023
d9a603f
Update lab_2.md
kaushik-microsoft Apr 26, 2023
1350c94
Add files via upload
kaushik-microsoft Apr 26, 2023
d202b7a
Update lab_2.md
kaushik-microsoft Apr 26, 2023
2fd164f
Add files via upload
kaushik-microsoft Apr 26, 2023
fb73132
Add files via upload
kaushik-microsoft Apr 26, 2023
d491999
Add files via upload
kaushik-microsoft Apr 26, 2023
2264c23
Update lab_2.md
kaushik-microsoft Apr 26, 2023
0f1dbf7
Update lab_2.md
kaushik-microsoft Apr 26, 2023
23a1431
Add files via upload
kaushik-microsoft Apr 27, 2023
502e0f5
Add files via upload
kaushik-microsoft Apr 27, 2023
269d310
Update lab_2.md
vpatil-ms Apr 27, 2023
325ecdc
Update lab_3.md
vpatil-ms Apr 27, 2023
6f6c8b3
Update lab_2.md
kaushik-microsoft Apr 27, 2023
68541a5
Delete lab_1.md
kaushik-microsoft Apr 27, 2023
15fefc5
Rename lab_2.md to lab_1.md
kaushik-microsoft Apr 27, 2023
61d680b
Rename lab_3.md to lab_2.md
kaushik-microsoft Apr 27, 2023
38e34cd
Add files via upload
kaushik-microsoft Apr 27, 2023
c43ec23
Update lab_2.md
vpatil-ms Apr 27, 2023
24cbb4b
Update lab_2.md
vpatil-ms Apr 27, 2023
6ff577b
Add files via upload
kaushik-microsoft Apr 27, 2023
543f96c
Add files via upload
vpatil-ms Apr 27, 2023
33b108a
Update lab_2.md
vpatil-ms Apr 27, 2023
9818a3d
Add files via upload
vpatil-ms Apr 27, 2023
4bcf4a4
Update lab_2.md
vpatil-ms Apr 27, 2023
ca3a488
Add files via upload
vpatil-ms Apr 27, 2023
98746be
Update lab_2.md
vpatil-ms Apr 27, 2023
bb24962
Add files via upload
vpatil-ms Apr 27, 2023
2779669
Update lab_2.md
vpatil-ms Apr 27, 2023
2895dca
Update lab_2.md
vpatil-ms Apr 27, 2023
d24267d
Add files via upload
vpatil-ms Apr 27, 2023
8a777c9
Update lab_2.md
vpatil-ms Apr 27, 2023
57035a2
Update lab_2.md
vpatil-ms Apr 27, 2023
0cb3c4e
Add files via upload
vpatil-ms Apr 27, 2023
a7fd148
Add files via upload
vpatil-ms Apr 27, 2023
8f2ef60
Update lab_2.md
vpatil-ms Apr 27, 2023
d374597
Update lab_1.md
vpatil-ms Apr 27, 2023
171c7b4
Add files via upload
kaushik-microsoft Jun 12, 2023
8b83e55
Updating Lab 1 and Lab 2 instructions with new captures and steps.
gurkamaldeep Jun 27, 2023
e444939
Discarding BPAHomepageSSA conflict
gurkamaldeep Jun 27, 2023
35d2031
Merge pull request #2 from gurkamaldeep/main
kaushik-microsoft Jun 27, 2023
8b8185c
Add files via upload
kaushik-microsoft Dec 7, 2023
Binary file added SampleInvoices/Extras/Lab3 Sample Data/7454.pdf
Binary file added images/2.2.png
Binary file added images/3.1.png
Binary file added images/3.2.png
Binary file added images/BPA 2.png
Binary file added images/BPA 3.png
Binary file added images/BPA ingest documents.png
Binary file added images/customermodelprojectcreation.png
Binary file added images/searchconfig.png
Binary file added images/selectblobstorage.png
Binary file added images/selectblobstorage.png.png
Binary file added images/selectcontainer.png
Binary file added images/selectcontainerfolder.png
Binary file modified images/selectimportdata.png
Binary file added images/selectimportdata.png.png
Binary file added images/step1b playground replacement.png
160 changes: 160 additions & 0 deletions lab_instructions/Extras/lab_1.md
@@ -0,0 +1,160 @@
# Create and Deploy a Form Recognizer Custom Model

### Overview
In this lab, you will create (train) an Azure Form Recognizer custom model using a sample training dataset. Custom models extract and analyze data from forms and documents specific to your business. To create a custom model, you label a dataset of documents with the values you want extracted and train the model on the labeled dataset. You only need five examples of the same form or document type to get started. For this lab, you will use the dataset provided at [Custom Model Sample Files](/SampleInvoices/Lab3%20Sample%20Data).


### Goal
* Use a sample training dataset to train a custom model in the Azure Form Recognizer Studio
* Label the training documents with custom fields of interest
* Test the trained model on test data and view the results and confidence scores in the Studio
* Use the custom model in the BPA pipeline from Lab 1


### Pre-requisites
* The accelerator is deployed and ready in the resource group
* You have an Azure subscription and permission to create a Form Recognizer Resource
* You have access to the sample invoices folder containing the invoices to upload


### Instructions

#### Create a Custom Model
- [Step 1 - Create a Form Recognizer Resource](#step-1---create-a-form-recognizer-resource)
- [Step 2 - Open Form Recognizer Studio and Create a Custom Labeling Project ](#step-2---open-form-recognizer-studio-and-create-a-custom-labeling-project)
- [Step 3 - Import the Sample Data](#step-3---import-the-sample-data)
- [Step 4 - Train the model](#step-4---train-the-model)
- [Step 5 - Test the Model on Test Data](#step-5---test-the-model-on-test-data)

#### Step 1 - Create a Form Recognizer Resource
![](images/step1a-create-form-rec-resource.png)
![](images/step1b-create-form-rec-resource.png)
![](images/step1c-create-form-rec-resource.png)

#### Step 2 - Open Form Recognizer Studio and Create a Custom Labeling Project

![](images/step2a-Create-custom-labeling-project.png)

Select the **Custom Extraction Model** from the bottom of the list of options

![](images/step2b-Create-custom-labeling-project.png)

Create Custom Model Project

![](images/step2c-Create-custom-labeling-project.png)
![](images/customermodelprojectcreation.png)

Provide the storage account and container holding the forms data that you would like to label

![](images/step2e-Create-custom-labeling-project.png)
![](images/step2f-Create-custom-labeling-project.png)
![](images/step2g-Create-custom-labeling-project.png)

#### Step 3 - Import the Sample Data
Use the data folder on the VM desktop, go to **Custom Model Sample Files**, and pick the 5 files marked as **train**
![](images/step3a-import-sample-data.png)
![](images/step3b-import-sample-data.png)

Create a new field that you would like to label

![](images/step3c-import-sample-data.png)
In this example, the label is named "Organization_sample"

![](images/step3d-import-sample-data.png)

Apply the custom label to form fields
![](images/step3e-import-sample-data.png)
Apply the labels to all forms by repeating the process in the previous step
![](images/step3f-import-sample-data.png)
#### Step 4 - Train the model
After labeling the forms, click **Train** and provide the information shown below. Note that the **Neural** build mode takes longer to train but may be necessary for mostly unstructured files. If your data is mostly structured, you can use the **Template** mode to make training faster. For this workshop, we will use the Template mode to train the model.
![](images/step4a-train-the-model.png)
![](images/step4b-train-the-model.png)
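
The Studio handles the training run end to end, but the same build can also be started from code. The following is a minimal sketch, assuming the `azure-ai-formrecognizer` Python SDK (3.3 or later) and a blob container SAS URL pointing at the labeled training files produced by the Studio; the endpoint, key, SAS URL, and model ID are placeholders to replace with your own values.

```python
# Minimal sketch: build a custom extraction model from labeled training data.
# Assumes azure-ai-formrecognizer >= 3.3 and a SAS URL for the container that
# holds the documents and label files created by Form Recognizer Studio.
from azure.ai.formrecognizer import DocumentModelAdministrationClient
from azure.core.credentials import AzureKeyCredential

endpoint = "https://<your-form-recognizer>.cognitiveservices.azure.com/"  # placeholder
key = "<your-form-recognizer-key>"                                        # placeholder
training_sas_url = "<sas-url-of-labeled-training-container>"              # placeholder

admin_client = DocumentModelAdministrationClient(endpoint, AzureKeyCredential(key))

# "template" trains quickly on structured forms; "neural" handles more
# variation but takes longer — mirroring the choice offered in the Studio.
poller = admin_client.begin_build_document_model(
    build_mode="template",
    blob_container_url=training_sas_url,
    model_id="custom-invoice-model",  # optional; autogenerated if omitted
)
model = poller.result()
print("Trained model ID:", model.model_id)
```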
#### Step 5 - Test the Model on Test Data
Use the sample files marked as **test** from the same location where you picked the files for training
![](images/step5a-test-the-model.png)
![](images/step5b-test-the-model.png)
Load the test file and click "Analyze"
![](images/step5c-test-the-model.png)
The results are displayed along with the confidence scores
![](images/step5d-test-the-model.png)


#### Build new pipeline with custom model module in BPA
After you are satisfied with the custom model's performance, you can retrieve the **model ID** and use it in a new BPA pipeline with the Custom Model module in the next step.
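
As a quick sanity check before building the pipeline, you can also call the trained model directly with its model ID. This is an illustrative sketch using the `azure-ai-formrecognizer` Python SDK; the endpoint, key, model ID, and file name are placeholders.

```python
# Minimal sketch: run the trained custom model against one test invoice
# and print each labeled field with its confidence score.
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

endpoint = "https://<your-form-recognizer>.cognitiveservices.azure.com/"  # placeholder
key = "<your-form-recognizer-key>"                                        # placeholder
model_id = "custom-invoice-model"                                         # the ID shown in the Studio

client = DocumentAnalysisClient(endpoint, AzureKeyCredential(key))

with open("test_invoice.pdf", "rb") as f:  # any file marked as "test"
    poller = client.begin_analyze_document(model_id, document=f)
result = poller.result()

for doc in result.documents:
    for name, field in doc.fields.items():
        print(f"{name}: {field.content} (confidence {field.confidence})")
```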

#### Launch BPA Accelerator
1. Launch the accelerator from the Static Web App in the resource group.
1. To do this, go to the [Azure Portal](https://portal.azure.com) in a web browser and click on the resource group created for this lab.
![resourcegroup.png](/images/resourcegroup.png)
Click on the resource group created for this lab; you should see the resources deployed as part of the Business Process Automation accelerator deployment.

> **Note:** The resource and resource group names in your lab will differ and will not exactly match the names shown here.

![resourceswithinresourcegroup.png](/images/resourceswithinresourcegroup.png)

1. Look for the Static Web App under **Type**. This is what we will use as part of Lab 1. Click on the Static Web App.

![staticwebappresource.png](/images/staticwebappresource.png)

Click on the URL to launch the accelerator.
![swaurl.png](/images/swaurl.png)

1. Please create the following pipeline:
![](images/step6a-deploy-custom-model.png)
![](images/step6b-deploy-custom-model.png)
![](images/step6c-deploy-custom-model.png)

1. Retrieve the trained **custom model ID** from the Form Recognizer Studio and enter it into the following window:
![](images/step6d-deploy-custom-model.png)
![](images/step6e-deploy-custom-model.png)

1. Check the newly created pipeline using the **View Pipeline** option
![](images/step6f-deploy-custom-model.png)

1. Ingest data for the new pipeline from the BPA homepage. Make sure you select the pipeline before ingesting the files. For smaller files, use the **Upload A Single Document** option; for larger files, use the **Split Document By Page And Process** option.
![](images/step6g-deploy-custom-model.png)


1. To view the results, go back to the [Azure Portal](https://portal.azure.com) in your browser and navigate to the resource group as you did earlier in Step 1. In the resource group, click on the resource of type **Search Service**.

![searchservicetype.png](/images/searchservicetype.png)

1. Click on **Import Data** and select **Azure Blob Storage** from the data source dropdown.
![selectblobstorage.png](/images/selectblobstorage.png)
1. Provide a name for the data source, change the parsing mode to **JSON**, click on **Choose an existing connection** for **Connection String**, select the storage account related to your project, and choose the **document** container
![selectcontainer.png](/images/selectcontainer.png)

1. Keep the default for **Managed identity Authentication**, which is **None**.

1. In the **Blob folder** field, provide the name of **your pipeline**

![selectcontainerfolder.png](/images/selectcontainerfolder.png)

1. Click **Next: Add cognitive skills (Optional)**. This validates and creates the index schema.

1. On the next screen (**Add cognitive skills (Optional)**), click **Skip to: Customize target index**.
![customizetargetindex.png](/images/customizetargetindex.png)

1. Make all fields **Retrievable** and **Searchable**
![searchconfig.png](/images/searchconfig.png)

1. Provide a name for the Index and click on **Next: Create an indexer**
![indexname.png](/images/indexname.png)

1. Provide a name for the indexer and click **Submit**

![createindexer.png](/images/createindexer.png)

1. You will get a notification that the import was configured successfully

1. Now, go back to the accelerator URL that you retrieved in Step 1 and click on **Sample Search Application**.
![samplesearchapplication.png](/images/samplesearchapplication.png)

This opens the sample search application
![searchlandingpage.png](/images/searchlandingpage.png)

1. You can now filter and search on items and the other fields you configured; a minimal programmatic query sketch follows below.
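
If you also want to query the new index outside the sample UI, the sketch below shows one possible approach with the `azure-search-documents` Python SDK. It is illustrative only: the service endpoint, query key, index name, and field names are assumptions to replace with the values from your own deployment.

```python
# Minimal sketch: query the index created by the Import Data wizard and
# print a few matching documents with their relevance scores.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

endpoint = "https://<your-search-service>.search.windows.net"  # placeholder
query_key = "<your-search-query-key>"                          # placeholder
index_name = "<your-index-name>"                               # the index created above

client = SearchClient(endpoint, index_name, AzureKeyCredential(query_key))

results = client.search(search_text="invoice", top=5)
for doc in results:
    # Field names depend on the schema the wizard generated; adjust as needed.
    print(doc.get("metadata_storage_name"), doc.get("@search.score"))
```
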
## More Resources
- [Getting Started with Form Recognizer Studio](https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/form-recognizer-studio-overview?view=form-recog-3.0.0)
- [Form Recognizer Documentation](https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-invoice?view=form-recog-3.0.0)
77 changes: 32 additions & 45 deletions lab_instructions/lab_3.md → lab_instructions/Extras/lab_2.md
@@ -16,7 +16,7 @@ In this lab, you will use unstructured data files like contract documents, lease
### Instructions


#### **Step 1a - Create a OpenAI Generic Pipeline**
#### **Step 1 - Create A Generic Pipeline**
![](images/BPAHomepage.png)

![](images/Lab3NewPipeline.png)
@@ -25,34 +25,6 @@

![](images/Lab3OCR2Txt.png)

![](images/Lab3OpenAIGeneric.png)

### **Step 1b - Get Sample Configurations from GPT-3 Playground**

![](images/Lab3SelectOAIResource.png)

![](images/Lab3OAIExplore.png)

![](images/Lab3OAIClickGPT3.png)

At this stage, we select the model we want to use and the feature we want to leverage. In this case we will be using the Davinci model and the Summarize feature. The playground brings in a sample in the editor. Select the content of the 'Conversation' section and replace it with ${document} to ensure the dynamic content is used at runtime.
After that click on 'View Code' on top right.

![](images/Lab3OAIPlayground.png)

On the pop-up, there will be a drop-down menu with 'Python' selected by default. Change that to 'json' and copy the code snippet.

![](images/CopySamplejson.png)

Go back to the BPA tab and replace the default text on the Generic OpenAI component opened earlier with the copied text.

![](images/Lab3OAISampleCode.png)

That completes the pipeline

![](images/Lab3FinishPipeline.png)


#### **Step 2 - Ingest Data for the pipeline**

There are 2 options for ingesting the data for the pipeline:
@@ -70,20 +42,21 @@ There are 2 options for ingesting the data for the pipeline:
1. Click on **Import Data**.
![selectimportdata.png](/images/selectimportdata.png)

1. Select **Azure Cosmos DB** from the dropdown in datasource.
![selectazurecosmosdb.png](/images/selectazurecosmosdb.png)

1. Provide a name for the data source and click on **Choose an existing connection** for **Connection String**. The Azure Cosmos DB resource created as part of the BPA accelerator deployment will be one of the sources you can choose from.
![selectcosmosdb.png](/images/selectcosmosdb.png)


1. Keep the default for **Managed identity Authentication**, which is **None**. For **Databases** and **Collection**, use the dropdown to select the same name as the Cosmos DB you selected in step 15.
1. Select **Azure Blob Storage** from the dropdown in datasource.

1. Under Query, use the following query. The pipeline name should match the pipeline name you used in step 3:
> SELECT * FROM c WHERE c.pipeline = 'YOUR-PIPELINE-NAME' AND c._ts > @HighWaterMark

![](images/Lab3LoadData.png)
1. Provide a name for the data source and click on **Choose an existing connection** for **Connection String**. The Azure Blob Storage resource created as part of the BPA accelerator deployment will be one of the sources you can choose from.
![lab3-import-data-1.png](images/lab3-import-data-1.png)
1. After you select your storage account, select the **results** container from the list and click the **Select** button, as shown in the screenshot below.
![lab3-import-data-2.png](images/lab3-import-data-2.png)

1. On the Import data screen, make sure you have the following:
- Your data source is **Azure Blob Storage**
- You have provided a data source name, for example **storagedatasource**
- You selected **JSON** as the parsing mode
- Your container name is **results**
- Your pipeline name is entered as the blob folder; for instance, if your pipeline name is **pipeline-name**, enter pipeline-name
- Keep the default for **Managed identity Authentication**, which is **None**
![lab3-import-data-3.png](images/lab3-import-data-3.png)

1. Click **Next: Add cognitive skills (Optional)**. This validates and creates the index schema.

@@ -111,16 +84,30 @@
1. Select the Semantic Configuration and click on Create new.

On the pop-up, do the following:
- Give a name to the Semantic Search Config
- Give a name to the Semantic Search Config. **For this lab, the name must be 'default'**
- For the Title field, select 'filename'
- Select the 'content' field and any other relevant fields for Content Fields
- Select Save

![](images/Lab3SemSearchConfig.png)
![](images/Lab3SemSearchConfig_default.png)
It is important that you name your Semantic Search Config **default** for this lab

![](images/lab3-semantic-config-save.png)

Do not forget to click **Save** again on the index screen; otherwise the Semantic Search Config will not be applied.

![](images/Lab3SemSearchConfigSave.png)
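
To verify that the semantic configuration is picked up, you can issue a semantic query from code. The following is a rough sketch using the `azure-search-documents` Python SDK (11.4 or later); the endpoint, key, index name, and field names are placeholders, while `default` matches the configuration name required above.

```python
# Minimal sketch: run a semantic query against the index using the
# semantic configuration named "default" created in this step.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

endpoint = "https://<your-search-service>.search.windows.net"  # placeholder
key = "<your-search-query-key>"                                # placeholder
index_name = "<your-index-name>"                               # placeholder

client = SearchClient(endpoint, index_name, AzureKeyCredential(key))

results = client.search(
    search_text="What are the termination clauses?",
    query_type="semantic",
    semantic_configuration_name="default",
    query_caption="extractive",
    top=3,
)
for doc in results:
    captions = doc.get("@search.captions") or []
    snippet = captions[0].text if captions else ""
    # "filename" matches the Title field chosen above; adjust to your schema.
    print(doc.get("filename"), "-", snippet)
```
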
#### **Step 5 - Perform Azure OpenAI Search**
1. Now, go back to the accelerator URL that you retrieved in Step 1 and click on **Search Application**.
![BPAHomepageSampleSearch](images/BPAHomePageSearchApp.png)

#### **Step 5 - Perform Semantic Search**
This opens the Azure OpenAI search application

- Select the index from the top drop-down. In this case, the index created earlier, **azureblob-index**, is selected.
- Provide a search query based on your document, like:
- 'Tell me the 7454 installation instructions'
![Lab5-openai-search](images/Lab5-openai-search.png)
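
Under the hood, the application follows a search-then-summarize pattern: it retrieves matching content from the index and asks an Azure OpenAI deployment to answer from that context. The sketch below is an assumed, simplified version of that pattern using the `azure-search-documents` and `openai` (v1+) Python packages; every endpoint, key, deployment name, and field name is a placeholder, and the accelerator's actual implementation may differ.

```python
# Rough sketch of the search-then-summarize pattern: retrieve matching
# content from the index, then ask an Azure OpenAI deployment to answer
# using only that context.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

search_client = SearchClient(
    "https://<your-search-service>.search.windows.net",  # placeholder
    "azureblob-index",                                    # placeholder index name
    AzureKeyCredential("<search-query-key>"),             # placeholder
)
aoai = AzureOpenAI(
    azure_endpoint="https://<your-openai-resource>.openai.azure.com",  # placeholder
    api_key="<azure-openai-key>",                                      # placeholder
    api_version="2024-02-01",
)

question = "Tell me the 7454 installation instructions"
hits = search_client.search(search_text=question, top=3)
context = "\n\n".join(str(hit.get("content", "")) for hit in hits)  # field name assumed

response = aoai.chat.completions.create(
    model="<your-gpt-deployment-name>",  # placeholder deployment
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(response.choices[0].message.content)
```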

#### **Step 6 - Perform Semantic Search**
1. Now, go back to the accelerator URL that you retrieved in Step 1 and click on **Sample Search Application**.
![](images/BPAHomepageSSA.png)
