How to effectively embed PDFs with images for a RAG LLM? #4031
Unanswered
lucasmirachi asked this question in Q&A
Replies: 1 comment
I have added examples here: https://github.com/andysingal/llm-course
I’m working on a project where I need to embed PDF documents that contain images (which may or may not be relevant to the response) to create a vector database for later retrieval in a Retrieval-Augmented Generation (RAG) LLM. Currently, I’m using Unstructured + FAISS, but I’m not getting satisfactory results for the images in the PDFs.
Here are some details about my approach:
- I’m using the Unstructured library to parse the PDFs.
- FAISS is being used to create and manage the vector database.
- Text embeddings are working fine, but image embeddings are not yielding good results.
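To make the setup above concrete, here is a minimal sketch of one option that is sometimes used for mixed text/image PDFs: each image is represented by a text caption or summary, so text chunks and images share a single vector space and a single index. This is an illustration under stated assumptions, not the actual code from this project. `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, `TinyIndex` is a brute-force stand-in for a FAISS index, and the captions are assumed to come from some earlier step (for example a vision model or the text surrounding the figure).

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    kind: str     # "text" or "image"
    content: str  # chunk text, or a caption standing in for an image (assumption)
    source: str   # hypothetical locator such as a page or figure reference


def toy_embed(text):
    """Stand-in for a real embedding model: L2-normalised bag of words."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(value * b.get(word, 0.0) for word, value in a.items())


class TinyIndex:
    """Brute-force cosine search; a stand-in for a FAISS index."""

    def __init__(self):
        self.items = []

    def add(self, chunk):
        # Text chunks and image captions go through the same embedding function.
        self.items.append((toy_embed(chunk.content), chunk))

    def search(self, query, k=3):
        q = toy_embed(query)
        scored = [(cosine(q, vec), chunk) for vec, chunk in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# Made-up example data: two text chunks and one image represented by a caption.
index = TinyIndex()
index.add(Chunk("text", "The pump must be serviced every six months.", "page 3"))
index.add(Chunk("image", "Diagram of the pump wiring with labelled terminals.", "page 4, figure 2"))
index.add(Chunk("text", "Warranty claims require the original invoice.", "page 9"))

hits = index.search("wiring diagram for the pump", k=2)
for score, chunk in hits:
    print(f"{score:.2f}  [{chunk.kind}]  {chunk.source}")
```

Keeping `kind` and `source` alongside each vector means a retrieved image hit can be traced back to the original figure and passed to the LLM, or filtered out if it turns out to be irrelevant.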
Questions:
1. What are the best practices for embedding PDFs that contain both text and images?
2. Are there any specific techniques or libraries that handle image embeddings within PDFs more effectively?
3. How can I improve the integration of image embeddings with text embeddings in my current setup?
Any advice or suggestions would be greatly appreciated!