Split Image Processing for multiple stage processing and fix bugs #140

Merged on Jan 23, 2025 (65 commits)
Commits
9023ffc
Bug fixes for deployment
BenConstable9 Jan 13, 2025
0ebfdf9
Update image processing
BenConstable9 Jan 13, 2025
aa0cdfd
Update function app
BenConstable9 Jan 13, 2025
a3d84c0
Update code
BenConstable9 Jan 13, 2025
c920377
Remove entry points
BenConstable9 Jan 13, 2025
f0912ad
Update cleaner
BenConstable9 Jan 13, 2025
02f0bba
Update local settings
BenConstable9 Jan 13, 2025
127d161
Update readme
BenConstable9 Jan 14, 2025
77c4094
Update rag documents
BenConstable9 Jan 14, 2025
b3e3c04
Update layout holders
BenConstable9 Jan 14, 2025
8e6b61d
Update ai search
BenConstable9 Jan 14, 2025
0b9370f
Update image processing work
BenConstable9 Jan 16, 2025
eef8b6f
Update precommit
BenConstable9 Jan 16, 2025
b47ccf8
Update code
BenConstable9 Jan 16, 2025
b120adf
Update directory
BenConstable9 Jan 16, 2025
54aec5d
Try new export
BenConstable9 Jan 16, 2025
600ae3b
Update pre commit
BenConstable9 Jan 16, 2025
6a47d39
Update req
BenConstable9 Jan 16, 2025
c357ddd
Update requirements
BenConstable9 Jan 16, 2025
d6386e9
Update return
BenConstable9 Jan 16, 2025
2521e76
Update documents
BenConstable9 Jan 16, 2025
be744e5
Update rag examples
BenConstable9 Jan 16, 2025
1ef26cd
Update README
BenConstable9 Jan 16, 2025
a67a695
Merge main
BenConstable9 Jan 21, 2025
6049d7b
Update readme and model
BenConstable9 Jan 21, 2025
27e5251
Update ai search deploy
BenConstable9 Jan 21, 2025
84f9572
Update readme and model
BenConstable9 Jan 21, 2025
967ccfb
Update example
BenConstable9 Jan 21, 2025
37f3a6b
Update mappings
BenConstable9 Jan 22, 2025
a4ef19b
Update chunker
BenConstable9 Jan 22, 2025
769e60f
Update location of exported requirements
BenConstable9 Jan 22, 2025
674bd80
Update toml
BenConstable9 Jan 22, 2025
28c1076
Update README
BenConstable9 Jan 22, 2025
908037d
Update
BenConstable9 Jan 22, 2025
d42d3c1
Update
BenConstable9 Jan 22, 2025
7f1b72c
Update lock
BenConstable9 Jan 22, 2025
c3aecf1
Update lock
BenConstable9 Jan 22, 2025
63c4915
Update req
BenConstable9 Jan 22, 2025
e000569
Update
BenConstable9 Jan 22, 2025
f270a57
Update func ignore
BenConstable9 Jan 22, 2025
8c46364
Update settings
BenConstable9 Jan 22, 2025
4a2bf99
Update requirements
BenConstable9 Jan 22, 2025
f99c883
udpate
BenConstable9 Jan 22, 2025
a6f5a94
Remove extra
BenConstable9 Jan 22, 2025
aab7b41
new pre commit
BenConstable9 Jan 22, 2025
5b965f6
Update
BenConstable9 Jan 22, 2025
dead887
Bump version
BenConstable9 Jan 22, 2025
0a9c4bc
Don't update lock
BenConstable9 Jan 22, 2025
7a0b590
Update
BenConstable9 Jan 22, 2025
9aa4a77
Update
BenConstable9 Jan 22, 2025
f3dcbb2
Update
BenConstable9 Jan 22, 2025
64c5cc7
Add export
BenConstable9 Jan 22, 2025
d702442
Update
BenConstable9 Jan 22, 2025
f317f08
Update
BenConstable9 Jan 22, 2025
5f8bec3
Update
BenConstable9 Jan 22, 2025
7f0a7fd
Change diff
BenConstable9 Jan 22, 2025
dd6a7ba
Update
BenConstable9 Jan 22, 2025
f343746
Update
BenConstable9 Jan 22, 2025
19ccc74
trey
BenConstable9 Jan 22, 2025
f1c6a4c
Change imports
BenConstable9 Jan 22, 2025
afa4f7d
Update
BenConstable9 Jan 22, 2025
ecece23
Update code
BenConstable9 Jan 23, 2025
a5a8722
Update code and flows
BenConstable9 Jan 23, 2025
9858d37
Update mapping
BenConstable9 Jan 23, 2025
9209d5e
final updates
BenConstable9 Jan 23, 2025
16 changes: 16 additions & 0 deletions .funcignore
@@ -0,0 +1,16 @@
.git*
.vscode
__azurite_db*__.json
__blobstorage__
__queuestorage__
local.settings.json
test
.venv
.github/*
.devcontainer/*
.ruff_cache/*
deploy_ai_search_indexes/*
text_2_sql/*
documentation/*
images/
__pycache__
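
To make the effect of these patterns concrete, here is a minimal Python sketch that approximates which paths the entries above might exclude from a deployment package. It assumes `.funcignore` follows gitignore-like matching and handles only the three pattern shapes that appear here (bare names or globs, `dir/`, and `dir/*`). `PATTERNS`, `is_ignored`, and the sample paths are hypothetical names for illustration, not part of the Azure Functions tooling.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Patterns copied from the .funcignore above.
PATTERNS = [
    ".git*", ".vscode", "__azurite_db*__.json", "__blobstorage__",
    "__queuestorage__", "local.settings.json", "test", ".venv",
    ".github/*", ".devcontainer/*", ".ruff_cache/*",
    "deploy_ai_search_indexes/*", "text_2_sql/*", "documentation/*",
    "images/", "__pycache__",
]


def is_ignored(path: str, patterns: list[str] = PATTERNS) -> bool:
    """Approximate check only; real gitignore-style matching has more rules."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/*"):
            # "dir/*": anything inside the top-level directory "dir".
            if len(parts) > 1 and parts[0] == pattern[:-2]:
                return True
        else:
            # "name", "glob*" or "dir/": match any single path component.
            name = pattern.rstrip("/")
            if any(fnmatch(part, name) for part in parts):
                return True
    return False


# Hypothetical sample paths, used only to exercise the matcher.
for candidate in ["local.settings.json", "text_2_sql/plugin.py", "function_app.py"]:
    print(candidate, is_ignored(candidate))
```

A simplified matcher like this is only useful for reasoning about the list; the deployment tooling remains the authority on what is actually packaged.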
2 changes: 1 addition & 1 deletion .github/workflows/ci-checks.yaml
@@ -10,7 +10,7 @@ on:
 - "*" # Run on all branches

 env:
-MIN_PYTHON_VERSION: 3.11
+MIN_PYTHON_VERSION: 3.12

 jobs:
 job-pre-commit-check:
14 changes: 8 additions & 6 deletions .pre-commit-config.yaml
@@ -45,9 +45,11 @@ repos:
 args: [--fix, --ignore, UP007]
 exclude: samples

-# - repo: https://github.com/astral-sh/uv-pre-commit
-# # uv version.
-# rev: 0.5.5
-# hooks:
-# # Update the uv lockfile
-# - id: uv-lock
+- repo: https://github.com/astral-sh/uv-pre-commit
+# uv version.
+rev: 0.5.20
+hooks:
+# Update the uv lockfile
+- id: uv-lock
+- id: uv-export
+args: [--frozen, --no-hashes, --no-editable, --no-sources, --verbose, --no-group, dev, --directory, image_processing, -o, src/image_processing/requirements.txt]
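
For orientation, the `uv-export` hook arguments above appear to correspond to a plain `uv export` invocation. The sketch below assembles that command line in Python so it can be inspected or run manually. It assumes the hook simply forwards its `args` to `uv export`; the helper name `build_uv_export_command` is hypothetical.

```python
import shlex

# Arguments copied from the uv-export hook in the config above.
HOOK_ARGS = [
    "--frozen", "--no-hashes", "--no-editable", "--no-sources", "--verbose",
    "--no-group", "dev", "--directory", "image_processing",
    "-o", "src/image_processing/requirements.txt",
]


def build_uv_export_command(args: list[str]) -> list[str]:
    """Prefix the hook arguments with the uv subcommand (assumed mapping)."""
    return ["uv", "export", *args]


command = build_uv_export_command(HOOK_ARGS)
# Print a copy-pasteable shell command; nothing is executed here.
print(shlex.join(command))
```

Passing `command` to `subprocess.run(command, check=True)` from the repository root would run the export, provided `uv` is installed.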
2 changes: 1 addition & 1 deletion .python-version
@@ -1 +1 @@
-3.12
+3.11
2 changes: 2 additions & 0 deletions .vscode/settings.json
@@ -1,7 +1,9 @@
{
"azureFunctions.deploySubpath": "image_processing/src/image_processing",
"azureFunctions.projectLanguage": "Python",
"azureFunctions.projectLanguageModel": 2,
"azureFunctions.projectRuntime": "~4",
"azureFunctions.pythonVenv": ".venv",
"azureFunctions.scmDoBuildDuringDeployment": true,
"debug.internalConsoleOptions": "neverOpen"
}
2 changes: 1 addition & 1 deletion .vscode/tasks.json
@@ -5,7 +5,7 @@
 "isBackground": true,
 "label": "func: host start",
 "options": {
-"cwd": "${workspaceFolder}/ai_search_with_adi_function_app"
+"cwd": "${workspaceFolder}/image_processing/src/image_processing"
 },
 "problemMatcher": "$func-python-watch",
 "type": "func"
14 changes: 5 additions & 9 deletions README.md
@@ -11,18 +11,14 @@ It is intended that the plugins and skills provided in this repository, are adap
 ## Components

 - `./text_2_sql` contains an three Multi-Shot implementations for Text2SQL generation and querying which can be used to answer questions backed by a database as a knowledge base. A **prompt based** and **vector based** approach are shown, both of which exhibit great performance in answering sql queries. Additionally, a further iteration on the vector based approach is shown which uses a **query cache** to further speed up generation. With these plugins, your RAG application can now access and pull data from any SQL table exposed to it to answer questions.
-- `./adi_function_app` contains code for linking **Azure Document Intelligence** with AI Search to process complex documents with charts and images, and uses **multi-modal models (gpt4o)** to interpret and understand these. With this custom skill, the RAG application can **draw insights from complex charts** and images during the vector search. This function app also contains a **Semantic Text Chunking** method that aims to intelligently group similar sentences, retaining figures and tables together, whilst separating out distinct sentences.
-- `./deploy_ai_search` provides an easy Python based utility for deploying an index, indexer and corresponding skillset for AI Search and for Text2SQL.
+- `./image_processing` contains code for linking **Azure Document Intelligence** with AI Search to process complex documents with charts and images, and uses **multi-modal models (gpt4o)** to interpret and understand these. With this custom skill, the RAG application can **draw insights from complex charts** and images during the vector search. This function app also contains a **Semantic Text Chunking** method that aims to intelligently group similar sentences, retaining figures and tables together, whilst separating out distinct sentences.
+- `./deploy_ai_search_indexes` provides an easy Python based utility for deploying an index, indexer and corresponding skillset for AI Search and for Text2SQL.

 The above components have been successfully used on production RAG projects to increase the quality of responses.

-_The code provided in this repo is a sample of the implementation and should be adjusted before being used in production._
-
-## High Level Implementation
-
-The following diagram shows a workflow for how the Text2SQL and AI Search plugin would be incorporated into a RAG application. Using the plugins available, alongside the Function Calling capabilities of LLMs, the LLM can do Chain of Thought reasoning to determine the steps needed to answer the question. This allows the LLM to recognise intent and therefore pick appropriate data sources based on the intent of the question, or a combination of both.
-
-![High level workflow for a plugin driven RAG application](./images/Plugin%20Based%20RAG%20Flow.png "High Level Workflow")
+> [!WARNING]
+>
+> - The code provided in this repo is a accelerator of the implementation and should be review / adjusted before being used in production.

 ## Contributing

12 changes: 0 additions & 12 deletions adi_function_app/.env

This file was deleted.

10 changes: 0 additions & 10 deletions adi_function_app/GETTING_STARTED.md

This file was deleted.
