File uploads #441
-
Thanks to @matthewbennink for asking this on my behalf! I will just add that I find it quite useful to be able to upload PDF documents so that the model can summarize certain contents, gather data from various places inside them, etc.
-
Hey guys, it’s definitely doable and I want the app to support this, but it’s not a trivial feature.

All the major LLMs support images natively: when message text is passed to the LLM, a base64-encoded image can be included right alongside it, and the LLM knows how to make sense of that to answer questions about it. Every other file type, though, needs to be handled by the app rather than by the LLM. RAG (retrieval-augmented generation) is the popular acronym for this: a file is uploaded, parsed, and turned into vectors, and then, when you’re about to send a message related to that document, you quickly run a vector and/or keyword search to pull out the relevant parts.

However, as I write this, I realize that for documents up to a certain size this could be simplified by just including the full text of the document at the time it’s attached. That would make this a much simpler task to implement.

If we want to do true RAG, there are two options:

1. Handle it within the app. This is the Ruby library I’ve been planning to dig into, since it has a big focus on different vector databases: https://github.com/patterns-ai-core/langchainrb The advantage of handling it within the app is that it would work with every LLM.
2. Alternatively, OpenAI has a new API in which they’ll do the RAG for you. You simply submit docs on a conversation thread and then you can submit messages like normal. I started the process of moving the app over to these new v2 chat endpoints, but they’re more complex than the v1 endpoints because v1 is stateless: currently, the app’s database is the source of truth and the app submits whatever is needed each time. v2, by contrast, stores everything on OpenAI’s servers (assistant settings, conversation history, etc.), so every time we make a change in the app database we’d need to submit the same change to their servers so things stay in sync. And it doesn’t look like any other providers are copying OpenAI on this.
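To make the image case concrete, here’s roughly what a v1 chat payload with an inline base64 image looks like. This is a sketch following OpenAI’s multimodal chat completions format; the model name is just a placeholder for any vision-capable model:

```ruby
require "base64"

# Build an OpenAI-style v1 chat completions payload that inlines an image
# next to the message text as a base64 data URL.
def chat_payload_with_image(text, image_bytes, mime: "image/png")
  encoded = Base64.strict_encode64(image_bytes)
  {
    model: "gpt-4o", # placeholder: any vision-capable model
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: text },
          { type: "image_url",
            image_url: { url: "data:#{mime};base64,#{encoded}" } }
        ]
      }
    ]
  }
end
```

Other providers accept payloads that are very similar in shape, which is part of why the v1-style endpoints are so portable.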
All providers have really similar v1 chat endpoints, but v2 is really “proprietary” to OpenAI.

After writing all this out, I think the best first step would be a naive implementation where attachments are treated almost like images: they’re allowed to be uploaded, they’re stored in Active Storage, their text contents are extracted and turned into a conversation Message, and then they’re submitted like all the other chat messages. This wouldn’t be too tricky, just a little thought into all the different file formats (pdf, doc, docx, etc.) and making sure there is an appropriate Ruby text extractor for each one.

All this said, I don’t have an ETA. I haven’t even started this and I’m pretty deep in tool calling at the moment. But I’m happy to help support and steer if one of you wants to take the lead on an implementation. I can help spec things out.
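A minimal sketch of that naive flow might look like the following: pick a text extractor by file extension, pull out plain text, and hand it back so the caller can store it as an ordinary conversation Message. The specific gems (pdf-reader, docx) are assumptions here, not a settled choice:

```ruby
# Extract plain text from an uploaded attachment, dispatching on file
# extension. Plain-text formats are read directly; binary formats are
# delegated to a format-specific gem (gem choices are assumptions).
def extract_text(path)
  case File.extname(path).downcase
  when ".txt", ".md", ".csv"
    File.read(path)
  when ".pdf"
    require "pdf-reader" # assumed gem
    PDF::Reader.new(path).pages.map(&:text).join("\n")
  when ".docx"
    require "docx" # assumed gem
    Docx::Document.open(path).paragraphs.map(&:text).join("\n")
  else
    raise ArgumentError, "no text extractor for #{File.extname(path)}"
  end
end
```

The extracted string could then be saved as a Message on the conversation and submitted with the rest of the chat history, exactly like a typed message.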
-
As each LLM might support different file uploads (e.g. Claude now supports PDFs natively), perhaps support could be enabled per model, e.g. for Claude 3-based models.
-
How feasible is it to add support for non-image file uploads, such as PDFs or CSV files? It seems like documents are assumed to be images at the moment.