Support actual streaming for AzureAI #1054
Conversation
Also a remark: this PR is working nicely with a fairly big professional project with >50 functions and streaming. I do not have access to the Images service, though, so it would be nice if you could also check whether that is working for you.
Out of date now. A follow-up PR will come.
Reopening this as a reference until @timostark submits the new PR. Looking forward to it :)
Not sure I follow?
@tzolov: So I had to debug this for quite a while now. The point is that streaming is actually working (unfortunately not really smoothly, due to a very strange implementation on the Azure side). So the question is why it is not working in the REST service: as far as I can see, the issue is that the reactive web services are not sending out the stream results (with the example provided on the Spring AI page), because the Flux is blocked by the sync approach. I am unfortunately not an expert at all in the usage of Schedulers / Flux and threads, but my assumption is that the event-stream sending is blocked by the Azure OpenAI call on the same thread and cannot send out the events until the event stream is done. The usage of Schedulers is not required when using the Azure OpenAI async library, because there is no blocking call on the thread of the REST call. Alternatives would be
What is your opinion here? I would tend toward option (c), but I am not sure whether that has side effects. Code that is working with the current Spring AI implementation:
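The blocked-thread behavior described above can be illustrated without any Spring or Azure dependencies. In this minimal, hypothetical sketch, a single-threaded executor stands in for the request thread: because the blocking "model call" occupies that thread, the task that would flush events to the client cannot run until every token has already been produced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SameThreadBlockingDemo {

    // Runs a blocking "model call" and an "event flush" on the same single
    // thread and records the order in which the work actually executes.
    static List<String> run() throws InterruptedException {
        ExecutorService requestThread = Executors.newSingleThreadExecutor();
        List<String> order = new ArrayList<>();

        // The blocking call occupies the thread until all tokens are produced...
        requestThread.submit(() -> {
            for (int i = 1; i <= 3; i++) {
                order.add("token-" + i);
            }
        });
        // ...so the event flush can only run after the whole stream is done.
        requestThread.submit(() -> order.add("flush"));

        requestThread.shutdown();
        requestThread.awaitTermination(5, TimeUnit.SECONDS);
        return order;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // flush comes last, after every token
    }
}
```

With the async client, or with the blocking consumption moved to another thread, the flush work could interleave with token production instead of waiting for the whole response.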
"Wrong" documentation: https://docs.spring.io/spring-ai/reference/api/chat/azure-openai-chat.html |
@tzolov OK, I would need a little help to create a PR. Using subscribeOn / publishOn (as mentioned above) I was not able to find a working solution. With that approach (either in Spring AI or in my own code), the WebFlux implementation gets stuck after ~100-300 tokens: flatMapIterable is called, but map((choice)) is not called anymore (in Spring AI). That only happens when the result is forwarded to WebFlux; it might be threading or backpressure. I am not deep enough into the details of (Web)Flux / threading / Schedulers to understand the root cause. For me the WebFlux stream works nicely if I isolate the handling to its own thread (probably the same thing the Azure OpenAiAsync library is doing). See example (sorry, this would probably look much nicer with some built-in Flux operators; as said, I am really not an expert here):
This is of course very ugly to have outside of the library code. So, as said above: do you have a preferred solution?
Just a quick comment, we should avoid an implementation that uses
@timostark if the Async Client is working as expected for streaming responses, can't we use both clients: the AzureClient for call and the Async Azure Client for stream?
@tzolov yup, certainly. I will adjust the code accordingly in the next days and raise a PR.
It looks like the bug fix was done a month ago; when can we expect it to make it into a PR?
@tzolov @timostark
@hy4470 @tzolov Sorry, I forgot about that issue. I am currently blocked by my work projects, but just FYI: there is a functional workaround to get streaming working with the current state of Spring AI (Azure) by simply running azureOpenAiChatModel.stream inside an isolated Runnable with a separate event sink. That is really not nice code, but IMO good enough as a functional workaround. Regarding the suggested change: I am really not sure whether that is the general way to go for a central library like Spring AI. I was not able to find meaningful documentation for the Azure async client (what is the difference between async and non-async?), so I am worried that switching everything to the Azure async client has side effects I am not aware of (e.g. bad performance, missed tokens, simply deprecated stuff, ...). Extremely simplified functional code of the workaround:
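A minimal sketch of the pattern described above, using only the JDK: SubmissionPublisher stands in for Reactor's Sinks.Many, and the hypothetical blockingChatStream() stands in for the blocking azureOpenAiChatModel.stream consumption; both names are illustrative, not taken from the actual workaround code.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class StreamingWorkaroundSketch {

    // Hypothetical stand-in for the blocking Azure / Spring AI token stream.
    static List<String> blockingChatStream() {
        return List.of("Hello", " ", "world");
    }

    // The workaround pattern: subscribe first, then consume the blocking
    // stream on an isolated thread, pushing each token into the sink so the
    // HTTP endpoint can return the publisher immediately instead of blocking.
    static String collect() throws InterruptedException {
        SubmissionPublisher<String> sink = new SubmissionPublisher<>();
        CountDownLatch done = new CountDownLatch(1);
        List<String> received = new CopyOnWriteArrayList<>();

        sink.subscribe(new Flow.Subscriber<String>() {
            public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
            public void onNext(String token) { received.add(token); }
            public void onError(Throwable t) { done.countDown(); }
            public void onComplete() { done.countDown(); }
        });

        // Isolated worker thread: the blocking call never touches the caller.
        Thread worker = new Thread(() -> {
            try {
                for (String token : blockingChatStream()) {
                    sink.submit(token); // each token is delivered as it arrives
                }
            } finally {
                sink.close(); // signals onComplete to the subscriber
            }
        });
        worker.start();

        done.await();
        return String.join("", received);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(collect());
    }
}
```

In the actual WebFlux workaround you would presumably use Reactor's Sinks.Many (e.g. Sinks.many().unicast().onBackpressureBuffer()) and return sink.asFlux() from the endpoint, which matches the next comment about returning the value from a TEXT_EVENT_STREAM_VALUE mapping.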
You can return this value in your TEXT_EVENT_STREAM_VALUE GetMapping.
Hello all! Great work on the library all around, by the way! We are happily using it in production and it works like a charm! I'm asking about the timeline because we have some use cases that would benefit from streaming being integrated into the Azure bean, just like it works for the Bedrock/Anthropic ones. Once again, really great work so far!
This is working now, see PR #1447 for the details. Thanks for everyone's patience!
I conducted a test with the modified version and found that many tokens are being lost. Has anyone else experienced this issue?
@hy4470 I'll give this a try when milestone 3 is officially released.
@tzolov @markpollack Is milestone 3 officially available? I'll try this out if it is 🥳
Yes, M3 is available now. https://spring.io/blog/2024/10/08/spring-ai-1-0-0-m3-released
Yes, very awesome!! I've just seen this now; I was knee-deep in implementing stuff and this slipped under my radar. I will try it out today! Thank you! 🙏
Follow up of #1042
As already raised in a few PRs, streaming is not really working for Azure at the moment. This PR contains the following fixes:
Can you have a look at whether this is the right direction for you?