Commit
Add ADI Based Skillset (#11)
Co-authored-by: Ben Constable <[email protected]>
priyal1508 and BenConstable9 authored Sep 11, 2024
1 parent 65a3d2d commit 45f2023
Showing 22 changed files with 2,376 additions and 12 deletions.
6 changes: 6 additions & 0 deletions .vscode/extensions.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
{
"recommendations": [
"ms-azuretools.vscode-azurefunctions",
"ms-python.python"
]
}
15 changes: 15 additions & 0 deletions .vscode/launch.json
@@ -0,0 +1,15 @@
{
"configurations": [
{
"connect": {
"host": "localhost",
"port": 9091
},
"name": "Attach to Python Functions",
"preLaunchTask": "func: host start",
"request": "attach",
"type": "debugpy"
}
],
"version": "0.2.0"
}
7 changes: 7 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,7 @@
{
"azureFunctions.projectLanguage": "Python",
"azureFunctions.projectLanguageModel": 2,
"azureFunctions.projectRuntime": "~4",
"azureFunctions.scmDoBuildDuringDeployment": true,
"debug.internalConsoleOptions": "neverOpen"
}
15 changes: 15 additions & 0 deletions .vscode/tasks.json
@@ -0,0 +1,15 @@
{
"tasks": [
{
"command": "host start",
"isBackground": true,
"label": "func: host start",
"options": {
"cwd": "${workspaceFolder}/ai_search_with_adi_function_app"
},
"problemMatcher": "$func-python-watch",
"type": "func"
}
],
"version": "2.0.0"
}
3 changes: 2 additions & 1 deletion README.md
@@ -7,7 +7,8 @@ It is intended that the plugins and skills provided in this repository, are adap
## Components

- `./text2sql` contains a Multi-Shot implementation for Text2SQL generation and querying which can be used to answer questions backed by a database as a knowledge base.
- `./ai_search_with_adi` contains code for linking Azure Document Intelligence with AI Search to process complex documents with charts and images, and uses multi-modal models (gpt4o) to interpret and understand these.
- `./ai_search_with_adi_function_app` contains code for linking Azure Document Intelligence with AI Search to process complex documents with charts and images, and uses multi-modal models (gpt4o) to interpret and understand these.
- `./deploy_ai_search` provides an easy Python based utility for deploying an index, indexer and corresponding skillset for AI Search.

The above components have been successfully used on production RAG projects to increase the quality of responses. The code provided in this repo is a sample of the implementation and should be adjusted before being used in production.

8 changes: 8 additions & 0 deletions adi_function_app/.funcignore
@@ -0,0 +1,8 @@
.git*
.vscode
__azurite_db*__.json
__blobstorage__
__queuestorage__
local.settings.json
test
.venv
32 changes: 21 additions & 11 deletions ai_search_with_adi/README.md → adi_function_app/README.md
@@ -38,35 +38,46 @@ The properties returned from the ADI Custom Skill are then used to perform the f

## Provided Notebooks \& Utilities

- `./ai_search.py`, `./deployment.py` provide an easy Python based utility for deploying an index, indexer and corresponding skillset for AI Search.
- `./function_apps/indexer` provides a pre-built Python function app that communicates with Azure Document Intelligence, Azure OpenAI etc to perform the Markdown conversion, extraction of figures, figure understanding and corresponding cleaning of Markdown.
- `./ai_search_with_adi_function_app` provides a pre-built Python function app that communicates with Azure Document Intelligence, Azure OpenAI etc to perform the Markdown conversion, extraction of figures, figure understanding and corresponding cleaning of Markdown.
- `./rag_with_ai_search.ipynb` provides an example of how to utilise the AI Search plugin to query the index.

## Deploying AI Search Setup

To deploy the pre-built index and associated indexer / skillset setup, see instructions in `./ai_search/README.md`.

## ADI Custom Skill

Deploy the associated function app and required resources. You can then experiment with the custom skill by sending an HTTP request in the AI Search JSON format to the `/adi_2_ai_search` HTTP endpoint.
Deploy the associated function app and required resources. You can then experiment with the custom skill by sending an HTTP request in the AI Search JSON format to the `/adi_2_deploy_ai_search` HTTP endpoint.

To use with an index, either use the utility to configure an indexer in the provided form, or integrate the skill with your skillset pipeline.
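
As an illustration of the request described above, here is a minimal Python sketch that builds a request in the AI Search custom skill envelope (`values`, each with a `recordId` and a `data` object). It is not taken from the repository. The local URL assumes the Azure Functions Core Tools defaults (port 7071, `/api` route prefix), the `source` input name and the sample document URL are placeholders, and the `chunk_by_page` header value is only an example. The helper names `build_skill_payload` and `build_request` are hypothetical.

```python
import json
import urllib.request

# Hypothetical local URL: port 7071 and the /api prefix are the Core Tools defaults.
ENDPOINT = "http://localhost:7071/api/adi_2_deploy_ai_search"


def build_skill_payload(records):
    """Wrap each record's inputs in the custom skill envelope."""
    return {
        "values": [
            {"recordId": str(index), "data": data}
            for index, data in enumerate(records)
        ]
    }


def build_request(url, payload, headers=None):
    """Prepare (but do not send) a JSON POST request for the skill endpoint."""
    all_headers = {"Content-Type": "application/json"}
    all_headers.update(headers or {})
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=all_headers,
        method="POST",
    )


# "source" is a placeholder input name; use the inputs your skillset maps to the skill.
payload = build_skill_payload([{"source": "https://example.com/sample.pdf"}])
request = build_request(ENDPOINT, payload, headers={"chunk_by_page": "True"})

# To call a running function app, uncomment the lines below:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The request is built separately from sending it so the payload can be inspected first. A deployed function app will usually also need a function key or identity based authentication, which this sketch leaves out.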

### function_app.py
### Deployment Steps

1. Update the `.env` file with the associated values. Not all values are required; which ones you need depends on whether you are using System / User Assigned Identities or Key based authentication. Use this template to update the environment variables in the function app.
2. Make sure the infrastructure and required identities are set up. This setup requires Azure Document Intelligence and GPT4o.
3. [Deploy your function app](https://learn.microsoft.com/en-us/azure/azure-functions/functions-deployment-technologies?tabs=windows) and test with an HTTP request.

`./function_apps/indexer/function_app.py` contains the HTTP entrypoints for the ADI skill and the other provided utility skills.
### Code Files

### adi_2_aisearch
#### function_app.py

`./function_apps/indexer/adi_2_aisearch.py` contains the methods for content extraction with ADI. The key methods are:
`./indexer/ai_search_with_adi_function_app.py` contains the HTTP entrypoints for the ADI skill and the other provided utility skills.

#### analyse_document
#### adi_2_aisearch

`./indexer/adi_2_aisearch.py` contains the methods for content extraction with ADI. The key methods are:

##### analyse_document

This method takes the passed file, uploads it to ADI and retrieves the Markdown format.

#### process_figures_from_extracted_content
##### process_figures_from_extracted_content

This method takes the detected figures, and crops them out of the page to save them as images. It uses the `understand_image_with_vlm` to communicate with Azure OpenAI to understand the meaning of the extracted figure.

`update_figure_description` is used to update the original Markdown content with the description and meaning of the figure.

#### clean_adi_markdown
##### clean_adi_markdown

This method performs the final cleaning of the Markdown contents. In this method, the section headings and page numbers are extracted for the content to be returned to the indexer.
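
The cleaning step described above could look roughly like the following sketch. It is not the repository's `clean_adi_markdown`. It assumes page numbers appear in the Markdown as HTML comments of the form `<!-- PageNumber="1" -->`, which you should verify against your own ADI output. The function name `clean_markdown_sketch` and the returned keys are hypothetical.

```python
import re

# Markdown ATX headings: one to six '#' characters followed by the heading text.
HEADING_PATTERN = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)
# Assumed marker format; check what your ADI output actually contains.
PAGE_NUMBER_PATTERN = re.compile(r'<!--\s*PageNumber="(\d+)"\s*-->')
# Any HTML comment, including page markers.
COMMENT_PATTERN = re.compile(r"<!--.*?-->", re.DOTALL)


def clean_markdown_sketch(markdown):
    """Extract headings and page numbers, then strip comment markers."""
    sections = HEADING_PATTERN.findall(markdown)
    page_numbers = [int(n) for n in PAGE_NUMBER_PATTERN.findall(markdown)]
    cleaned = COMMENT_PATTERN.sub("", markdown)
    # Collapse the blank-line runs left behind by removed markers.
    cleaned = re.sub(r"\n{3,}", "\n\n", cleaned).strip()
    return {
        "content": cleaned,
        "sections": sections,
        "page_numbers": page_numbers,
    }


sample = (
    "# Intro\n\nSome text.\n\n"
    '<!-- PageNumber="1" -->\n<!-- PageBreak -->\n\n'
    "## Details\n\nMore text.\n"
)
result = clean_markdown_sketch(sample)
```

Here `result["sections"]` holds the heading text in document order and `result["content"]` holds the Markdown with the comment markers removed.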

Expand Down Expand Up @@ -181,7 +192,6 @@ If `chunk_by_page` header is `False`:

**Page wise analysis in ADI is recommended to avoid splitting tables / figures across multiple chunks, when the chunking is performed.**


## Production Considerations

Below are some of the considerations that should be made before using this custom skill in production: