This project proposes an architecture and demo for building a multimodal RAG on AWS as a serverless application, deployed with the AWS CDK.
The code in the two projects below shows how to build a multimodal RAG:
- aws-bedrock-examples GitHub: multimodal-rag-pdf.ipynb
- aws-ai-ml-workshop-kr GitHub: 05_0_load_complex_pdf_kr_opensearch.ipynb
To make the multimodal RAG implemented in the notebooks above available to applications through event-driven invocation, this project migrates the code to a serverless environment. The code is split into one Lambda function per module, and the functions are orchestrated with AWS Step Functions; a minimal CDK sketch of this structure follows.
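As a rough illustration (not the repository's actual code), the CDK sketch below wires three Lambda functions into a Step Functions state machine, with a Map state fanning out the summarization work. All construct names, function names, and asset paths (`LoadDocumentFn`, `SummarizeFn`, `IndexFn`, `lambda/...`) are hypothetical.

```python
# Hypothetical CDK sketch: one Lambda per pipeline module, orchestrated by
# Step Functions. Names and asset paths are illustrative assumptions.
from aws_cdk import Duration, Stack
from aws_cdk import aws_lambda as _lambda
from aws_cdk import aws_stepfunctions as sfn
from aws_cdk import aws_stepfunctions_tasks as tasks
from constructs import Construct


class MultimodalRagStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        def make_fn(name: str) -> _lambda.Function:
            # Each pipeline module becomes its own Lambda function.
            return _lambda.Function(
                self, name,
                runtime=_lambda.Runtime.PYTHON_3_12,
                handler="index.handler",
                code=_lambda.Code.from_asset(f"lambda/{name.lower()}"),
                timeout=Duration.minutes(15),
            )

        load_task = tasks.LambdaInvoke(
            self, "LoadDocuments", lambda_function=make_fn("LoadDocumentFn"))
        summarize_task = tasks.LambdaInvoke(
            self, "SummarizeChunk", lambda_function=make_fn("SummarizeFn"))
        index_task = tasks.LambdaInvoke(
            self, "EmbedAndIndex", lambda_function=make_fn("IndexFn"))

        # The Map state processes each image/table chunk in its own Lambda
        # invocation, so no single invocation risks hitting the timeout.
        summarize_map = sfn.Map(
            self, "SummarizeMap",
            items_path="$.Payload.chunks",
            max_concurrency=10,
        )
        summarize_map.item_processor(summarize_task)

        definition = load_task.next(summarize_map).next(index_task)
        sfn.StateMachine(
            self, "MultimodalRagStateMachine",
            definition_body=sfn.DefinitionBody.from_chainable(definition),
        )
```

Keeping each module in its own Lambda lets each step scale and fail independently, while the Map state bounds how much work any single invocation performs.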
The Step Functions workflow builds a multimodal RAG in three steps:
- Load unstructured files using UnstructuredFileLoader (or S3FileLoader); see the first sketch after this list
- Summarize images and tables with Anthropic Claude 3 Sonnet, fanning each item out through a Step Functions Map state to avoid Lambda timeouts (second sketch below)
- Generate vector embeddings with the Amazon Titan Text Embeddings V2 model and index them in Amazon OpenSearch Serverless (third sketch below)
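The sketches below suggest what each step's Lambda handler might look like; they are illustrations under assumptions, not the repository's actual handlers. First, loading: a minimal handler that downloads a file from S3 and parses it with LangChain's `UnstructuredFileLoader` (`S3FileLoader` would combine the download and parse). The event shape (`bucket`, `key`) and output chunk layout are assumptions.

```python
# Hypothetical handler for the document-loading step. The event fields
# and the returned chunk layout are illustrative assumptions.
import boto3
from langchain_community.document_loaders import UnstructuredFileLoader

s3 = boto3.client("s3")


def handler(event, context):
    bucket, key = event["bucket"], event["key"]
    local_path = f"/tmp/{key.rsplit('/', 1)[-1]}"
    s3.download_file(bucket, key, local_path)

    # mode="elements" keeps tables and images as separate elements, so the
    # summarization step can treat them differently from plain text.
    loader = UnstructuredFileLoader(local_path, mode="elements")
    docs = loader.load()

    return {
        "chunks": [
            {"text": d.page_content, "category": d.metadata.get("category")}
            for d in docs
        ]
    }
```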
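Second, summarization: one Map-state iteration that summarizes a single image with Claude 3 Sonnet through the Bedrock Messages API. The input fields (`image_base64`, `media_type`) and the prompt are assumptions.

```python
# Hypothetical handler for one Map-state iteration: summarize a single
# image with Claude 3 Sonnet on Amazon Bedrock. The event shape is assumed.
import json

import boto3

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"


def handler(event, context):
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image", "source": {
                    "type": "base64",
                    # media_type/image_base64 are assumed event fields.
                    "media_type": event.get("media_type", "image/png"),
                    "data": event["image_base64"],
                }},
                {"type": "text",
                 "text": "Summarize this image for retrieval. Describe key "
                         "facts, figures, and any table contents."},
            ],
        }],
    }
    response = bedrock.invoke_model(modelId=MODEL_ID, body=json.dumps(body))
    result = json.loads(response["body"].read())
    return {"summary": result["content"][0]["text"]}
```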
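Third, embedding and indexing: each text chunk or summary is embedded with Titan Text Embeddings V2 and written to an OpenSearch Serverless collection. The environment variables, index name, and document fields are assumptions.

```python
# Hypothetical handler for the final step: embed chunks with Titan Text
# Embeddings V2 and index them into OpenSearch Serverless. Endpoint,
# index name, and event shape are illustrative assumptions.
import json
import os

import boto3
from opensearchpy import AWSV4SignerAuth, OpenSearch, RequestsHttpConnection

region = os.environ.get("AWS_REGION", "us-east-1")
bedrock = boto3.client("bedrock-runtime", region_name=region)

# OpenSearch Serverless uses SigV4 auth with the "aoss" service name.
auth = AWSV4SignerAuth(boto3.Session().get_credentials(), region, "aoss")
client = OpenSearch(
    hosts=[{"host": os.environ["COLLECTION_ENDPOINT"], "port": 443}],
    http_auth=auth,
    use_ssl=True,
    connection_class=RequestsHttpConnection,
)


def embed(text: str) -> list:
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text, "dimensions": 1024}),
    )
    return json.loads(response["body"].read())["embedding"]


def handler(event, context):
    for chunk in event["chunks"]:
        client.index(
            index=os.environ.get("INDEX_NAME", "multimodal-rag"),
            body={"text": chunk["text"], "vector": embed(chunk["text"])},
        )
    return {"indexed": len(event["chunks"])}
```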
TBD
- Yoonseo Kim, AWS Associate Solutions Architect
- TBD