implement streaming for assistants #215

Merged
merged 22 commits into main on Jan 17, 2024
Conversation


@pmeier pmeier commented Nov 20, 2023

Addresses the Python API part of #185.


pmeier commented Dec 13, 2023

The current state implements the idea in #185 (comment). The UX is now as follows:

main

import asyncio
from ragna import Rag, assistants, source_storages

async def main():
    async with Rag().chat(
        documents=["ragna.txt"],
        source_storage=source_storages.Chroma,
        assistant=assistants.ClaudeInstant,
    ) as chat:
        message = await chat.answer("What is Ragna in ten words or less?")
        print(message)

asyncio.run(main())
 Ragna is an open source framework for building RAG-based AI applications.

PR

import asyncio
from ragna import Rag, assistants, source_storages

async def main():
    async with Rag().chat(
        documents=["ragna.txt"],
        source_storage=source_storages.Chroma,
        assistant=assistants.ClaudeInstant,
    ) as chat:
        message1 = await chat.answer("What is Ragna in ten words or less?")
        print(message1)
        print()
        print(await message1.content())
        print()
        print(message1)

        print("#" * 20)

        message2 = await chat.answer("What is Ragna in ten words or less?")
        print(message2)
        print()
        async for chunk in message2:
            print(chunk, end="")
        print()
        print()
        print(message2)

asyncio.run(main())
...

 Ragna is an open source framework for building RAG-based AI applications.

 Ragna is an open source framework for building RAG-based AI applications.
####################
...

 Ragna is an open source framework for building RAG-based AI applications.

 Ragna is an open source framework for building RAG-based AI applications.

So the trade-off for the ability to stream the answer is that the user needs to await twice: once to get the message object from the chat, and a second time to get its content (either await message.content() or async for chunk in message). Printing the message when it hasn't been awaited yet currently just shows ..., but that could also be a source of confusion.

I'm not 100% happy with the result. One option might be to put a stream: bool = False parameter on Chat.answer() and, unless the user explicitly sets it to True, exhaust the stream inside the function, thus removing the requirement to await again.
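A minimal sketch of that idea; the Message internals and the _answer_stream helper are hypothetical stand-ins, not Ragna's actual implementation:

from typing import AsyncIterator

class Message:
    def __init__(self, chunks: AsyncIterator[str]) -> None:
        self._chunks = chunks
        self._content = None

    async def read(self) -> str:
        # Exhaust the chunk stream once and cache the full content.
        if self._content is None:
            self._content = "".join([chunk async for chunk in self._chunks])
        return self._content

class Chat:
    async def answer(self, prompt: str, stream: bool = False) -> Message:
        message = Message(self._answer_stream(prompt))
        if not stream:
            # Exhaust the stream eagerly so the caller only has to await once.
            await message.read()
        return message

    async def _answer_stream(self, prompt: str) -> AsyncIterator[str]:
        # Stand-in for the actual assistant call.
        for chunk in ["Ragna ", "is ", "an ", "open ", "source ", "framework."]:
            yield chunk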

Thoughts @nenb?


pmeier commented Dec 13, 2023

One thing that also bugs me is that we can no longer access the message.content attribute from the outside. After the message is awaited, we can only get to the content by "awaiting Message.content() again" (which just returns the stored value) or via Message.__str__. I'll have another look at how the response classes of HTTP clients handle this.
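For reference, httpx already follows a similar pattern: accessing the body of a streamed response before it has been read raises httpx.ResponseNotRead. A minimal sketch:

import asyncio
import httpx

async def main():
    async with httpx.AsyncClient() as client:
        async with client.stream("GET", "https://example.com") as response:
            # Accessing response.text here would raise httpx.ResponseNotRead.
            await response.aread()  # consume the stream
            print(response.text)    # now a plain attribute access

asyncio.run(main())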


pmeier commented Dec 13, 2023

Ok, now I'm actually happy with it 🚀

import asyncio
from ragna import Rag, assistants, source_storages

async def main():
    async with Rag().chat(
        documents=["ragna.txt"],
        source_storage=source_storages.Chroma,
        assistant=assistants.ClaudeInstant,
    ) as chat:
        print("Without stream=True")
        message = await chat.answer("What is Ragna in ten words or less?")
        print(message)

        print("#" * 20)

        print("With stream=True and iterating")
        message = await chat.answer("What is Ragna in ten words or less?", stream=True)

        try:
            print(message)
        except RuntimeError as error:
            print(f"{type(error).__name__}: {error}")

        async for chunk in message:
            print(chunk, end="")
        print()
        print(message)
        print(message.content)

        print("#" * 20)

        print("With stream=True but no iterating")

        message = await chat.answer("What is Ragna in ten words or less?", stream=True)
        print(await message.read())
        print(message)
        print(message.content)

asyncio.run(main())

Compared to #215 (comment), by default users can simply print the message and get its content without awaiting a second time. With stream=True, users now get control over how they want to read the message: either in chunks, i.e. by iterating over it, or by simply .read()ing everything in one go.

Plus, #215 (comment) is resolved as well. We now have the Message.content property again. Accessing it when the content stream has not been consumed yet will raise an error. The same goes for Message.__str__ and Message.__repr__, which internally rely on Message.content.
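A condensed sketch of that pattern; this is a simplified illustration, not the actual implementation:

from typing import AsyncIterator

class Message:
    def __init__(self, content_stream: AsyncIterator[str]) -> None:
        self._content_stream = content_stream
        self._content = None

    async def __aiter__(self):
        if self._content is None:
            chunks = []
            async for chunk in self._content_stream:
                chunks.append(chunk)
                yield chunk
            self._content = "".join(chunks)
        else:
            # Stream already consumed; yield the cached content in one go.
            yield self._content

    async def read(self) -> str:
        # Consume the whole stream in one go.
        async for _ in self:
            pass
        return self._content

    @property
    def content(self) -> str:
        if self._content is None:
            raise RuntimeError(
                "Message content is not available yet. "
                "Use `await message.read()` or `async for chunk in message`."
            )
        return self._content

    def __str__(self) -> str:
        return self.content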


nenb commented Dec 17, 2023

@pmeier Nice work. The code snippet looks quite sensible and intuitive to me.

Let me know if you need me to look over the PR in more depth (but a quick skim seemed fine to me!)


pmeier commented Dec 18, 2023

@peachkeel I could use some input from you on this. You rightfully advocated in #242 for trimming the output of the answer REST API endpoint to avoid sending unnecessary data. I'm currently in the process of deciding what the "message chunk" object should look like when one streams from the answer endpoint. As of the latest commit we are using

class Message(BaseModel):
    id: uuid.UUID = Field(default_factory=uuid.uuid4)
    content: str
    role: ragna.core.MessageRole
    sources: list[Source] = Field(default_factory=list)
    timestamp: datetime.datetime = Field(
        default_factory=lambda: datetime.datetime.utcnow()
    )

in which the content attribute is just the message chunk. That means we transmit the full source objects with every chunk. This is already wasteful, and it will be greatly amplified by #248, which we agreed to implement. So we need to come up with a better scheme.

One option could be to transmit the chunk on its own for most of the stream and only transmit the remaining message data together with the first or last chunk. If we go for this, I would prefer sending the message data with the first chunk (see the sketch below), for two reasons:

  1. All the other information is available when we get the first chunk. Thus, there is no reason not to send it to the client right away.
  2. This would match the Python API, where we have direct access to the .sources etc. attributes as soon as the object is created.
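To make that concrete, the chunk models could be split along these lines; this is illustrative only, with plain stand-ins for Ragna's own types:

import datetime
import uuid

from pydantic import BaseModel, Field

class MessageChunk(BaseModel):
    # Sent for every chunk after the first: only the new content.
    content: str

class FirstMessageChunk(MessageChunk):
    # Sent once, at the start of the stream, with the full message metadata.
    id: uuid.UUID = Field(default_factory=uuid.uuid4)
    role: str  # stand-in for ragna.core.MessageRole
    sources: list[dict] = Field(default_factory=list)  # stand-in for list[Source]
    timestamp: datetime.datetime = Field(
        default_factory=lambda: datetime.datetime.utcnow()
    )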

Happy for other opinions on this or completely different suggestions on how to handle streaming for the REST API.

@peachkeel (Contributor)

Your preference for how to implement this feature is eminently reasonable. The only reason not to send sources with the first chunk might be the added latency. If that's a factor for people, maybe having an option like /chats/{id}/answer?sources=none or /chats/{id}/answer?sources=last would be the best way to go so all bases are covered.


pmeier commented Dec 18, 2023

If that's a factor for people, maybe having an option like /chats/{id}/answer?sources=none or /chats/{id}/answer?sources=last would be the best way to go so all bases are covered.

Let's wait for demand on this. Thanks for the input.

@pmeier pmeier mentioned this pull request Dec 21, 2023

pmeier commented Jan 4, 2024

With #261 finally merged, adding streaming support for the UI was a breeze. Or so I thought. It kinda works, but we have a horrible "blinking" due to the rapid updates:

Screencast.from.2024-01-05.00-13-32.webm

@ahuang11 any idea where this is coming from or how to fix it?

@pmeier pmeier marked this pull request as ready for review January 4, 2024 23:22
@pierrotsmnrd (Contributor)

I have had a look and my understanding is that the flickering is due to the repeated calls to ChatMessage._update_object_pane when updating message.object here.

I've tried various things to fix this but haven't found a proper solution yet.


ahuang11 commented Jan 9, 2024

My first thought is that it looks like the CSS is being reset and then reloaded every time, so it jumps around. Can you check the browser console to see if that's true?

Also, this is how we stream internally: https://github.com/holoviz/panel/blob/main/panel/chat/feed.py#L379-L387


pmeier commented Jan 9, 2024

@ahuang11

Can you check the browser console to see if that's true?

How would I do that? What are you expecting there to be in case you are right?


ahuang11 commented Jan 9, 2024

On Chrome, if the stylesheets are updated, it should show flashing / highlighting, I think?

(flashing like this, but for the stylesheets):

Screen.Recording.2024-01-09.at.1.04.37.PM.mov

I am expecting the styles to be cached, but if not, it could be a bug in Panel.


ahuang11 commented Jan 9, 2024

Also, I find it strange that your Copy / Source Info buttons disappear and re-appear. Are they implemented in ReactiveHTML?


pmeier commented Jan 10, 2024

@ahuang11

If on Chrome, if the stylesheets are updated, it should show flashing / highlighting I think?

The previous screengrab was from Firefox. The one below is from Chrome. I see some flashing style attributes. More importantly, on Chrome the message is not rendered at all until the very end:

Screencast.from.2024-01-10.21-21-47.webm

Also, I find it strange that your Copy / Source Info buttons disappear and re-appear. Are they implemented in ReactiveHTML?

They are:

class CopyToClipboardButton(ReactiveHTML):
    title = param.String(default=None, doc="The title of the button")
    value = param.String(default=None, doc="The text to copy to the clipboard.")

    _template = """
    <div type="button"
         id="copy-button"
         onclick="${script('copy_to_clipboard')}"
         class="container"
         style="cursor: pointer;"
    >
        <svg xmlns="http://www.w3.org/2000/svg" class="icon icon-tabler icon-tabler-clipboard" width="16" height="16"
             viewBox="0 0 24 24" stroke-width="2" stroke="gray" fill="none" stroke-linecap="round" stroke-linejoin="round">
            <path stroke="none" d="M0 0h24v24H0z" fill="none"/>
            <path d="M9 5h-2a2 2 0 0 0 -2 2v12a2 2 0 0 0 2 2h10a2 2 0 0 0 2 -2v-12a2 2 0 0 0 -2 -2h-2" />
            <path d="M9 3m0 2a2 2 0 0 1 2 -2h2a2 2 0 0 1 2 2v0a2 2 0 0 1 -2 2h-2a2 2 0 0 1 -2 -2z" />
        </svg>
        <span>${title}</span>
    </div>
    """

    _scripts = {
        "copy_to_clipboard": """navigator.clipboard.writeText(`${data.value}`);"""
    }

    _stylesheets = [
        """
        div.container {
            display: flex;
            flex-direction: row;
            margin: 7px 10px;
            color: gray;
        }
        svg {
            margin-right: 5px;
        }
        """
    ]

and

def _copy_and_source_view_buttons(self) -> pn.Row:
    return pn.Row(
        CopyToClipboardButton(
            value=self.object,
            title="Copy",
            stylesheets=[
                ui.CHAT_INTERFACE_CUSTOM_BUTTON,
            ],
        ),
        pn.widgets.Button(
            name="Source Info",
            icon="info-circle",
            stylesheets=[
                ui.CHAT_INTERFACE_CUSTOM_BUTTON,
            ],
            on_click=lambda event: self.on_click_source_info_callback(
                event, self.sources
            ),
        ),
    )

We didn't find a better way to move them below our message than by replacing the original object pane with a column that includes the message and the buttons:

def _update_object_pane(self, event=None):
    super()._update_object_pane(event)
    if self.sources:
        assert self.role == "assistant"
        css_class = "message-content-assistant-with-buttons"
        self._object_panel = self._center_row[0] = pn.Column(
            self._object_panel,
            self._copy_and_source_view_buttons(),
            css_classes=["message", css_class],
            stylesheets=ui.stylesheets(
                (f":host(.{css_class})", self._content_style_declarations)
            ),
        )


ahuang11 commented Jan 12, 2024

Thanks! Can you expand that flashing style attribute?

If I understand Ragna's codebase correctly, rather than calling _update_object_pane repeatedly through message.object += chunk["content"], I think you can store a _contents_pane internally:

_contents_pane = pn.pane.Markdown()
message.object = pn.Column(_contents_pane, ...)  # include the other two things you have

Then, call:

message._contents_pane.object += ...

That way you're only updating the contents pane instead of re-rendering the entire column each time.

For reference, here's how we stream:
https://github.com/holoviz/panel/blob/main/panel/chat/feed.py#L428-L452
https://github.com/holoviz/panel/blob/main/panel/chat/feed.py#L370-L411


pmeier commented Jan 16, 2024

I've refactored the message class a little after an offline chat with @ahuang11. Previously, we created the message with a str as object and had a renderer set that turned it into a pn.pane.Markdown down the line. Now, we do this inside the constructor and pass the result up to the base class constructor. This has two benefits:

  1. We don't need to overwrite private functionality of the base class to get our buttons in. In case we have a message from an assistant, we just wrap the markdown into a pn.Column right away.
  2. We can create a self.content_pane object that can be updated while streaming the message without re-rendering anything else (see the sketch below).
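Roughly, the pattern looks like this; a sketch under assumptions about Panel's ChatMessage API, not the exact code:

import panel as pn

class RagnaChatMessage(pn.chat.ChatMessage):
    def __init__(self, content: str, *, role: str, **params):
        # The pane we mutate while streaming; nothing else is re-rendered.
        self.content_pane = pn.pane.Markdown(content)

        if role == "assistant":
            # Wrap the buttons in right away instead of patching the layout
            # in _update_object_pane later.
            obj = pn.Column(self.content_pane)  # plus the button row
        else:
            obj = self.content_pane

        super().__init__(object=obj, **params)

# While streaming, only the markdown pane is updated:
#     message.content_pane.object += chunk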

This is the result:

Screencast.from.2024-01-16.23-56-44.webm

@pierrotsmnrd pierrotsmnrd left a comment


Looks good to me. Tested locally and it works.

@pmeier pmeier merged commit c08a223 into main Jan 17, 2024
10 checks passed
@pmeier pmeier deleted the streaming branch January 17, 2024 14:06
@pmeier pmeier mentioned this pull request Mar 19, 2024