
Add support for prompt augmentation via external API #909


Open
oxaronick opened this issue Mar 6, 2024 · 8 comments
Labels
enhancement New feature or request

Comments

@oxaronick

Problem

I love HF chat-ui, and I'd like to deploy it for more teams. However, the lack of RAG features prevents me from deploying it as widely as I'd like.

Some teams need RAG with PDFs as a data source. Many of these PDFs are oddly formatted, and different types of PDFs require different kinds of parsing/chunking/embedding/whatever. This is not chat-ui's concern - it's mine - and no feature of any LLM chat UI will ever solve this problem in the way I need it solved.

Possible solution

What I'd really like is a feature where I can tell chat-ui to take a prompt from the user, call a REST API to have the prompt translated/augmented/whatever, and then send the resulting prompt to the LLM. (Some way to hover and see what the actual, augmented prompt looked like would also be nice, in case something weird happens and the user wants to know why.)

I'll build the indexing system and present the REST API to augment prompts; I just need a UI that will use it. I would even be happy to use an existing API as a reference or adopt a standard if one exists, but I haven't seen one. Maybe we'll set the standard here.
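To make the proposed hook concrete, here is a minimal sketch of what the chat-ui side of such a call could look like. The endpoint path, request body, and `augmentedPrompt` response field are all assumptions for illustration; no such API exists in chat-ui today.

```typescript
// Hypothetical prompt-augmentation hook: POST the raw user prompt to an
// external service and use the augmented prompt it returns in its place.
// The endpoint and response shape are assumptions, not an existing API.
interface AugmentResponse {
  augmentedPrompt: string;
}

async function augmentPrompt(
  endpoint: string,
  prompt: string,
  fetchFn: typeof fetch = fetch
): Promise<string> {
  const res = await fetchFn(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  if (!res.ok) {
    // Fall back to the original prompt if the augmenter is unavailable.
    return prompt;
  }
  const data = (await res.json()) as AugmentResponse;
  return data.augmentedPrompt ?? prompt;
}
```

Falling back to the original prompt on failure keeps the chat usable even when the external service is down.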

My plan

I was thinking of forking chat-ui to add hooks for "prompt translation" or "prompt augmentation" or whatever the best name turns out to be.

Questions

Any advice on where I should start? Where in the code would this logic go if I added it?

Are maintainers open to the idea of merging a feature like this if it works?

(Because you never know) Is this already present in chat-ui and I just haven't noticed?

@nsarrazin
Collaborator

So the main difference from a RAG API (if I understand correctly) is that you want your system to return different things based on the content of the user prompt. (While RAG will just fetch a static asset, be it a web page or a PDF, even if it does sentence similarity afterwards.)

Seems to me that this could be accomplished using some kind of function calling API? I think this feature would be a nice next-step for chat-ui, but we should discuss what it would look like.

In the meantime if you want to fork and add something a bit more custom built for your own use case, maybe have a look in:

src/lib/server/websearch/searchWeb.ts (which is the code we use to go from search query to list of URLs to parse, maybe you can use that to hook in to your indexing system)

Let me know if you need any other help!

@nsarrazin nsarrazin added the enhancement New feature or request label Mar 7, 2024
@oxaronick
Author

So the main difference with a RAG API (if I understand correctly) is that you want your system to return different things based on the content of the user prompt. (While RAG will just fetch a static asset, be it web page or PDF, even if it does sentence similarity afterwards).

Yes, the other system would take the prompt and do a similarity search in a bunch of pre-indexed material (a library of sorts) and augment the prompt with the results of that search. That's a little different from some RAG flows (like chat-ui's web search or open-webui's PDF upload) where material is fetched and indexed as part of a conversation. But we're still augmenting the prompt with relevant results from a data store.

From chat-ui's perspective, though, it doesn't matter what I do in the library, which is good separation of concerns IMO. Someone else could implement what makes sense for them, as long as they follow the API spec.
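A toy sketch of the "library" side described above: embed pre-indexed chunks, score them against the prompt embedding, and hand back the best matches for augmentation. The cosine-similarity ranking shown here is one common choice; the actual embedding model and index live entirely outside chat-ui.

```typescript
// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank pre-indexed chunks against the prompt embedding and return the
// top-k chunk texts to append to the prompt.
function topChunks(
  promptVec: number[],
  index: { text: string; vec: number[] }[],
  k = 3
): string[] {
  return index
    .map((c) => ({ text: c.text, score: cosine(promptVec, c.vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((c) => c.text);
}
```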

I'll have a look at the web search code you mentioned and see what I can do. I'm sure I'll have questions. :)

@oxaronick
Author

I've managed to get the basic flow working, but I haven't surfaced anything in the UI yet.

I was picturing a "Library search" toggle at the bottom of the conversation, next to the "Web search" toggle. If enabled via environment variables, the toggle would be present but off by default.

If you turn it on, you'd still see the prompt exactly as you typed it, but there could be updates in the UpdatePad showing how the prompt was augmented. Alternatively, the augmented prompt could be shown at the bottom of the message, similar to the WebSearch sources.

Thoughts, @nsarrazin ?

@nsarrazin
Collaborator

nsarrazin commented Mar 19, 2024

Would a custom search engine work for you as part of the web search instead of having a different feature?

The way it works is currently as follows: (entry point here)

  1. We generate the query from the user conversation [generateQuery]
    webSearch.searchQuery = await generateQuery(messages);

  2. We pass the search query to a search provider, which returns a list of relevant URLs:
    const results = await searchWeb(webSearch.searchQuery);

  3. We fetch & parse the URLs from above (can be plain-text or HTML), chunk them, then do sentence similarity to augment the prompt with the top 8 chunks. (currently hardcoded but we could make it configurable)

Seems to me like for your use case, you could hook your feature in at step 2 and replace the Google search with a custom search engine that takes a search query and returns links to the relevant chunks (in plain text) hosted on your server. If you return 8 chunks or fewer, they will all get added to the prompt.

Currently, all search engines are hardcoded, but it should be very easy to add your own there.
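The custom search engine suggested above could be sketched roughly like this. The `WebSearchSource` shape and the `/search` and `/chunks/...` endpoints are assumptions for illustration, not chat-ui's actual internal types or a real API.

```typescript
// A "search engine" in the shape described above: take a search query,
// return up to 8 URLs pointing at pre-indexed chunks hosted on your own
// server, so every chunk makes it into the prompt.
interface WebSearchSource {
  title: string;
  link: string;
}

async function searchLibrary(
  query: string,
  baseUrl: string,
  fetchFn: typeof fetch = fetch
): Promise<WebSearchSource[]> {
  const res = await fetchFn(`${baseUrl}/search?q=${encodeURIComponent(query)}`);
  const hits = (await res.json()) as { id: string; title: string }[];
  // Cap at 8 results so that all returned chunks get added to the prompt.
  return hits.slice(0, 8).map((hit) => ({
    title: hit.title,
    link: `${baseUrl}/chunks/${hit.id}`,
  }));
}
```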

@secondtruth
Contributor

secondtruth commented Mar 19, 2024

Related:

@oxaronick
Author

Integrating it as another search engine would be easier, and have minimal impact on the UI. I'll try that route.

The only drawback I can think of is that it's not technically "web" search anymore... :D

@oxaronick
Author

I've got the basic flow working. The only place where Web Search felt like a poor fit was that I needed to skip generateQuery. The "You are tasked with generating web search queries..." prompt doesn't make sense for this use case.

I will probably also look for a way to skip the chunking, since the remote server will be returning chunks rather than whole documents. Do you think some sort of size check makes sense here? Like, if the document is already under 200 characters, don't bother chunking?
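The size check proposed above could look something like this. The 200-character threshold and the naive fixed-width splitter are illustrative only; chat-ui's real chunking is more sophisticated.

```typescript
// Skip chunking when the document is already chunk-sized; otherwise
// split it into fixed-width pieces. Threshold and splitter are toy
// placeholders for the real chunking logic.
function maybeChunk(text: string, maxLen = 200): string[] {
  if (text.length <= maxLen) {
    // Already small enough: treat the whole document as a single chunk.
    return [text];
  }
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxLen) {
    chunks.push(text.slice(i, i + maxLen));
  }
  return chunks;
}
```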

@oxaronick oxaronick mentioned this issue Mar 20, 2024
6 tasks
@spew

spew commented Jun 25, 2024

Another thing this feature should support is forwarding the identity of the logged-in chat user (if chat-ui is configured with OpenID).
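One way this identity forwarding could work is to attach the user's OpenID subject claim as a request header when calling the augmentation service. The `X-Forwarded-User` header name and the `ChatUser` shape are assumptions; a production setup would more likely forward a signed token.

```typescript
// Build headers for the augmentation request, attaching the user's
// identity only when an OpenID login is present. Header name and user
// shape are hypothetical.
interface ChatUser {
  sub: string; // OpenID subject identifier
  email?: string;
}

function buildAugmentHeaders(user?: ChatUser): Record<string, string> {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
  };
  if (user) {
    // Only attach identity when OpenID login is configured.
    headers["X-Forwarded-User"] = user.sub;
  }
  return headers;
}
```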
