-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement streaming for assistants #215
Conversation
The current state implements the idea in #185 (comment). The UX is now as follows:
|
One thing that also bugs me is that we no longer can access the |
Ok, now I'm actually happy with it 🚀 import asyncio
from ragna import Rag, assistants, source_storages
async def main():
async with Rag().chat(
documents=["ragna.txt"],
source_storage=source_storages.Chroma,
assistant=assistants.ClaudeInstant,
) as chat:
print("Without stream=True")
message = await chat.answer("What is Ragna in ten words or less?")
print(message)
print("#" * 20)
print("With stream=True and iterating")
message = await chat.answer("What is Ragna in ten words or less?", stream=True)
try:
print(message)
except RuntimeError as error:
print(f"{type(error).__name__}: {error}")
async for chunk in message:
print(chunk, end="")
print()
print(message)
print(message.content)
print("#" * 20)
print("With stream=True but no iterating")
message = await chat.answer("What is Ragna in ten words or less?", stream=True)
print(await message.read())
print(message)
print(message.content)
asyncio.run(main()) Compared to #215 (comment), by default users can simply print the message and get its content without awaiting a second time. With Plus, #215 (comment) is resolved as well. We now have |
@pmeier Nice work. The code snippet looks quite sensible and intuitive to me. Let me know if you need me to look over the PR in more depth (but a quick skim seemed fine to me!) |
@peachkeel I could use some input from you on this. You rightfully advocated in #242 to trim the output of the answer REST API endpoint to avoid unnecessary data being sent. I'm currently in the process of deciding what the "message chunk" object looks like if one streams from answer endpoint. As of the latest commit we are using ragna/ragna/deploy/_api/schemas.py Lines 50 to 57 in c9fd83d
in which the One option could be to transmit the chunk on its own for most of the stream and only on the first or last chunk transmit the remaining message data as well. If we go for this, I would prefer sending the message data together with the first chunk for a two reasons:
Happy for other opinions on this or completely different suggestions on how to handle streaming for the REST API. |
Your preference for how to implement this feature is eminently reasonable. The only reason not to send |
Let's wait for demand on this. Thanks for the input. |
I have had a look and my understanding is that the flickering is due to the repeated calls to I've tried various things to fix this but haven't found a proper solution yet. |
My first thought is that it looks like CSS being reset then reloaded every time so it jumps around. Can you check the browser console to see if that's true? Also, this is how we stream internally: https://github.com/holoviz/panel/blob/main/panel/chat/feed.py#L379-L387 |
How would I do that? What are you expecting there to be in case you are right? |
If on Chrome, if the stylesheets are updated, it should show flashing / highlighting I think? (flashing like this, but for the stylesheets): Screen.Recording.2024-01-09.at.1.04.37.PM.movI am expecting the styles to be cached, but if not, it could be a bug in Panel. |
Also, I find it strange that your Copy / Source Info buttons disappear and re-appear. Are they implemented in ReactiveHTML? |
Previous screengrab was from Firefox. The one below is from Chrome. I see some flashing style attributes. More important, on Chrome the message is not rendered at all until the very end: Screencast.from.2024-01-10.21-21-47.webm
They are:
We didn't find a better way to move them below our message than by replacing the original object pane with a column that includes the message and the buttons: ragna/ragna/deploy/_ui/central_view.py Lines 137 to 149 in 20ecade
|
Thanks! Can you expand that flashing style attribute? If I understand Ragna's codebase correctly, I think rather than calling _update_object_pane repeatedly, through
Then, call So you're only affecting the contents pane, and not updating an entire column each time. For reference, here's how we stream: |
I've refactored the message class a little after an offline chat with @ahuang11. Previously, we created with a
This is the result: Screencast.from.2024-01-16.23-56-44.webm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. tested locally and it works.
Addresses the Python API part of #185.