
Multimodal RAG Made Easy

This project proposes an architecture and a demo for building a multimodal RAG on AWS as a serverless application, deployed with the AWS CDK.

The code in the two projects below shows how to build a multimodal RAG.

Architecture

To make the multimodal RAG implemented in the notebook (ipynb) environment above available to applications by invoking it through events, this project migrates it to a serverless environment. To do this, the code is split into multiple Lambda functions, one per module, which are orchestrated with AWS Step Functions.
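A minimal sketch of what such an orchestration could look like with the AWS CDK in Python follows; the Lambda names, asset paths, and state names are illustrative assumptions, not the repository's actual layout.

```python
from aws_cdk import Stack, aws_lambda as _lambda, aws_stepfunctions as sfn, aws_stepfunctions_tasks as tasks
from constructs import Construct

class MultimodalRagStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # One Lambda per module (handler code paths are hypothetical)
        load_fn = _lambda.Function(self, "LoadFn",
            runtime=_lambda.Runtime.PYTHON_3_12,
            handler="index.handler",
            code=_lambda.Code.from_asset("lambda/load"))
        summarize_fn = _lambda.Function(self, "SummarizeFn",
            runtime=_lambda.Runtime.PYTHON_3_12,
            handler="index.handler",
            code=_lambda.Code.from_asset("lambda/summarize"))
        embed_fn = _lambda.Function(self, "EmbedFn",
            runtime=_lambda.Runtime.PYTHON_3_12,
            handler="index.handler",
            code=_lambda.Code.from_asset("lambda/embed"))

        # Step 2 runs once per image/table through a Map state, so no single
        # Lambda invocation has to summarize every element before timing out
        summarize_map = sfn.Map(self, "SummarizeEach",
            items_path="$.elements",
            max_concurrency=5)
        summarize_map.iterator(tasks.LambdaInvoke(self, "Summarize", lambda_function=summarize_fn))

        definition = (
            tasks.LambdaInvoke(self, "Load", lambda_function=load_fn)
            .next(summarize_map)
            .next(tasks.LambdaInvoke(self, "Embed", lambda_function=embed_fn))
        )

        sfn.StateMachine(self, "MultimodalRagPipeline", definition=definition)
```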

The Step Functions workflow builds the multimodal RAG in three steps; illustrative sketches of each step follow the list.

  1. Load unstructured files using UnstructuredFileLoader (or S3FileLoader)
  2. Generate summaries of images or tables, using a Step Functions Map state to avoid Lambda timeouts (summaries produced by Anthropic Claude 3 Sonnet)
  3. Generate vector embeddings with the Amazon Titan Text Embeddings V2 model and index them in Amazon OpenSearch Serverless
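
Step 1 could look roughly like the following inside the loader Lambda, assuming LangChain's S3FileLoader is used; the bucket, key, and event shape are placeholders.

```python
from langchain_community.document_loaders import S3FileLoader, UnstructuredFileLoader

def handler(event, context):
    # Load a document straight from S3; S3FileLoader downloads the object
    # and parses it with the unstructured library under the hood.
    loader = S3FileLoader(bucket=event["bucket"], key=event["key"])
    docs = loader.load()

    # Alternatively, for a file already on local disk (e.g. /tmp in Lambda):
    # docs = UnstructuredFileLoader("/tmp/report.pdf").load()

    return {"elements": [d.page_content for d in docs]}
```

Step 2's summarization Lambda might call Claude 3 Sonnet through Amazon Bedrock as sketched below; the prompt text and the event field carrying the image are assumptions.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def handler(event, context):
    # event["image_bytes_b64"] is assumed to carry a base64-encoded image
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image", "source": {
                    "type": "base64", "media_type": "image/png",
                    "data": event["image_bytes_b64"]}},
                {"type": "text", "text": "Summarize this image for retrieval."},
            ],
        }],
    }
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        body=json.dumps(body))
    result = json.loads(response["body"].read())
    return {"summary": result["content"][0]["text"]}
```

Step 3's embedding-and-indexing Lambda could combine Titan Text Embeddings V2 with OpenSearch Serverless roughly as follows; the region, collection endpoint, index name, and field names are placeholders.

```python
import json
import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

REGION = "us-east-1"                                    # placeholder
HOST = "my-collection-id.us-east-1.aoss.amazonaws.com"  # placeholder collection endpoint

bedrock = boto3.client("bedrock-runtime", region_name=REGION)
auth = AWSV4SignerAuth(boto3.Session().get_credentials(), REGION, "aoss")
client = OpenSearch(
    hosts=[{"host": HOST, "port": 443}],
    http_auth=auth,
    use_ssl=True,
    connection_class=RequestsHttpConnection,
)

def embed(text: str) -> list[float]:
    # Titan Text Embeddings V2 returns the vector under the "embedding" key
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}))
    return json.loads(response["body"].read())["embedding"]

def handler(event, context):
    for summary in event["summaries"]:
        client.index(index="multimodal-rag", body={
            "text": summary,
            "vector": embed(summary),
        })
    return {"indexed": len(event["summaries"])}
```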

Architecture diagram: demogo-ottlseo-0720-advanced-multimodal-rag.drawio

Issues

TBD

Contributors

  • Yoonseo Kim, AWS Associate Solutions Architect
  • TBD