Description
In the latest round of architectural changes to Advisors in #1422, there are now two types of advisors:
CallAroundAdvisor
StreamAroundAdvisor
With the non-streaming one, it is easy to take action based on the entire response. The streaming one, however, operates on the stream as a whole, and the next advisor in the chain can manipulate that stream as well. If a stream advisor in the middle of the chain acts on each chunk of the response, all should be fine. However, if the advisor is only interested in the complete aggregation, it has to modify the stream so that everything is aggregated in a side channel, e.g. using the org.springframework.ai.chat.model.MessageAggregator class. If multiple advisors perform the same type of aggregation, this is inefficient in terms of both time and memory.
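To make the duplicated work concrete, here is a minimal sketch in plain Java. It is not the Spring AI API: SideChannelAdvisor is a hypothetical stand-in, and the stream is modelled as a list of string chunks rather than a reactive stream of chat responses. It only illustrates that every advisor interested in the full response ends up keeping its own copy of it.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a stream advisor that only cares about the full response.
// Assumption: chunks are plain strings; the real chunks are chat response objects.
final class SideChannelAdvisor implements UnaryOperator<List<String>> {

    // Private side channel holding this advisor's own copy of the whole response.
    final StringBuilder buffer = new StringBuilder();

    @Override
    public List<String> apply(List<String> chunks) {
        for (String chunk : chunks) {
            buffer.append(chunk); // every advisor of this kind repeats this work
        }
        return chunks; // the chunks themselves are passed on unchanged
    }
}
```

Chaining two such advisors over the same chunks leaves two identical buffers in memory, and the cost grows with every additional advisor of this kind.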
Given that, I propose a new interface, StreamAggregationAdvisor. Instances of this type would be fed an aggregation of the original stream of chunks coming back from the model on their way to the application, before any other advisor has a chance to manipulate the stream. The aggregation would then be performed only once and would operate on the unaltered view of the exchange. This behaviour could be implemented by utilizing the innermost StreamAroundAdvisor that is created in the DefaultChatClient.
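A rough sketch of how this could look, again in plain Java and under stated assumptions: the onAggregatedResponse method, the InnermostAggregator class, and the use of string chunks are all hypothetical and chosen for illustration only. The actual interface signature and the integration with DefaultChatClient are open for discussion.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical shape of the proposed interface; the method name and parameter are assumptions.
interface StreamAggregationAdvisor {
    void onAggregatedResponse(String aggregatedText);
}

// Hypothetical stand-in for the innermost stream advisor.
// It forwards every chunk untouched and aggregates the response exactly once.
final class InnermostAggregator {

    private final List<StreamAggregationAdvisor> advisors;

    InnermostAggregator(List<StreamAggregationAdvisor> advisors) {
        this.advisors = List.copyOf(advisors);
    }

    // Assumption: chunks are plain strings and arrive via an Iterable instead of a reactive stream.
    void relay(Iterable<String> modelChunks, Consumer<String> downstream) {
        StringBuilder aggregate = new StringBuilder();
        for (String chunk : modelChunks) {
            aggregate.append(chunk);  // single shared aggregation
            downstream.accept(chunk); // chunk continues unchanged towards the outer advisors
        }
        String full = aggregate.toString();
        // Each aggregation advisor sees the same unaltered result, after the last chunk.
        for (StreamAggregationAdvisor advisor : advisors) {
            advisor.onAggregatedResponse(full);
        }
    }
}
```

With this arrangement, the number of aggregation advisors no longer affects how often the response is buffered: the chunks are appended once, and only the finished result is shared.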