File uploads #441
-
Thanks to @matthewbennink for asking this on my behalf! I will just add that I find it quite useful to be able to upload PDF documents so that the model can summarize certain contents, gather data from various places inside them, etc.
-
Hey guys, it’s definitely doable and I want the app to support this, but it’s not a trivial feature.

All the major LLMs support images natively: when message text is passed to the LLM, a base64-encoded image can be included right alongside it, and the LLM knows how to make sense of that to answer questions about it. Every other file type, though, needs to be handled by the app rather than by the LLM. RAG (retrieval-augmented generation) is the popular acronym for this: a file is uploaded, parsed, and turned into vectors, and then, when you’re about to send a message related to that document, you quickly run a vector and/or keyword search to pull out the relevant parts.

However, as I write this, I realize that for documents up to a certain size this could be simplified by just including the full text of the document at the time it’s attached. That would make this a much simpler task to implement.

If we want to do true RAG, there are two options:

1. Handle it within the app. This is the Ruby library I’ve been planning to dig into, since it has a big focus on different vector databases: https://github.com/patterns-ai-core/langchainrb The advantage of handling it within the app is that it would work with every LLM.
2. Alternatively, OpenAI has a new API in which they’ll do the RAG for you. You simply submit docs on a conversation thread and then you can submit messages like normal. I started the process of moving the app over to these new v2 chat endpoints, but they’re more complex than the v1 endpoints because v1 is stateless: currently, the app’s database is the source of truth and the app submits whatever is needed each time. v2, by contrast, stores everything on OpenAI’s servers (assistant settings, conversation history, etc.), so every time we make a change in the app database we’d need to submit the same change to their servers so things stay in sync. And it doesn’t look like any other providers are copying OpenAI on this.
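To make the image case concrete, here’s roughly what a v1 chat payload with an inline base64 image looks like. This is a sketch following OpenAI’s multimodal chat completions format; the model name is just a placeholder for any vision-capable model:

```ruby
require "base64"

# Build an OpenAI-style v1 chat completions payload that inlines an image
# next to the message text as a base64 data URL.
def chat_payload_with_image(text, image_bytes, mime: "image/png")
  encoded = Base64.strict_encode64(image_bytes)
  {
    model: "gpt-4o", # placeholder: any vision-capable model
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: text },
          { type: "image_url",
            image_url: { url: "data:#{mime};base64,#{encoded}" } }
        ]
      }
    ]
  }
end
```

Other providers accept payloads that are very similar in shape, which is part of why the v1-style endpoints are so portable.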
All providers have really similar v1 chat endpoints, but v2 is really “proprietary” to OpenAI.

After writing all this out, I think the best first step would be a naive implementation where attachments are treated almost like images: they’re allowed to be uploaded, they’re stored in Active Storage, their text contents are extracted and turned into a conversation Message, and then they’re submitted like all the other chat messages. This wouldn’t be too tricky, just a little thought into all the different file formats (pdf, doc, docx, etc.) and making sure there is an appropriate Ruby text extractor for each one.

All this said, I don’t have an ETA. I haven’t even started this and I’m pretty deep in tool calling at the moment. But I’m happy to help support and steer if one of you wants to take the lead on an implementation. I can help spec things out.
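A minimal sketch of that naive flow might look like the following: pick a text extractor by file extension, pull out plain text, and hand it back so the caller can store it as an ordinary conversation Message. The specific gems (pdf-reader, docx) are assumptions here, not a settled choice:

```ruby
# Extract plain text from an uploaded attachment, dispatching on file
# extension. Plain-text formats are read directly; binary formats are
# delegated to a format-specific gem (gem choices are assumptions).
def extract_text(path)
  case File.extname(path).downcase
  when ".txt", ".md", ".csv"
    File.read(path)
  when ".pdf"
    require "pdf-reader" # assumed gem
    PDF::Reader.new(path).pages.map(&:text).join("\n")
  when ".docx"
    require "docx" # assumed gem
    Docx::Document.open(path).paragraphs.map(&:text).join("\n")
  else
    raise ArgumentError, "no text extractor for #{File.extname(path)}"
  end
end
```

The extracted string could then be saved as a Message on the conversation and submitted with the rest of the chat history, exactly like a typed message.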
-
As each LLM might support different file uploads (e.g. Claude now supports PDFs natively), perhaps support could be enabled per model, e.g. for Claude 3-based models.
-
How feasible is it to add support for non-image file uploads, such as PDFs or CSV files? It seems like documents are assumed to be images at the moment.