Optimized aggregation advisors for streaming scenarios #1439

Open
@chemicL

Description

In the latest round of architectural changes to Advisors in #1422, there are now two types of advisors:

  • CallAroundAdvisor
  • StreamAroundAdvisor

With the non-streaming one, it's easy to act on the entire response. The streaming one, however, manipulates an entire stream, and the next advisor in the chain can manipulate that stream too. If a stream advisor in the middle of the chain acts on each chunk of the response, all is fine. However, if the advisor is only interested in the aggregated response, it has to modify the stream so that everything is aggregated in a side channel, e.g. using the org.springframework.ai.chat.model.MessageAggregator class. If multiple advisors perform the same type of aggregation, this is inefficient in terms of both time and memory.
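To illustrate the duplication, here is a minimal, dependency-free sketch. It is not the Spring AI API: a plain `java.util.stream.Stream<String>` stands in for the reactive stream of response chunks, `onClose` stands in for the stream's completion signal, and `aggregatingAdvisor` and `buffersCreated` are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;
import java.util.stream.Stream;

class SideChannelAggregation {

    // Counts how many separate aggregation buffers get allocated (illustration only).
    static int buffersCreated = 0;

    // Hypothetical stand-in for a stream advisor that only cares about the full response:
    // it passes every chunk through untouched, copies it into its own buffer,
    // and hands the aggregate to the callback once the stream is closed.
    static UnaryOperator<Stream<String>> aggregatingAdvisor(Consumer<String> onComplete) {
        return stream -> {
            StringBuilder buffer = new StringBuilder();
            buffersCreated++;
            return stream
                    .peek(buffer::append)
                    .onClose(() -> onComplete.accept(buffer.toString()));
        };
    }

    // Chains two such advisors and returns the aggregate each of them produced.
    static List<String> run(List<String> chunks) {
        buffersCreated = 0;
        List<String> aggregates = new ArrayList<>();
        UnaryOperator<Stream<String>> first = aggregatingAdvisor(aggregates::add);
        UnaryOperator<Stream<String>> second = aggregatingAdvisor(aggregates::add);
        // Closing the outer stream runs the close handlers of the whole pipeline.
        try (Stream<String> out = second.apply(first.apply(chunks.stream()))) {
            out.forEach(chunk -> { });
        }
        return aggregates;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("Hel", "lo")));
        System.out.println("buffers: " + buffersCreated);
    }
}
```

Each advisor in the chain keeps its own copy of the full response, so the buffering cost grows with the number of aggregating advisors.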

Given that, I propose a new interface, StreamAggregationAdvisor. Instances of this type would be fed an aggregation of the original stream of chunks coming back from the model on its way into the application, before any other advisor has a chance to manipulate the stream. The aggregation would then be performed only once and would operate on an unaltered view of the exchange. This behaviour could be implemented using the innermost StreamAroundAdvisor that is created in the DefaultChatClient.
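A rough sketch of the idea, under the same simplifications: a plain `java.util.stream.Stream<String>` replaces the reactive stream, and `onClose` replaces the completion signal. The method name `onAggregate`, the `innermost` helper and the upper-casing advisor are assumptions made for illustration; the actual interface shape is still open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SingleAggregationSketch {

    // Hypothetical shape of the proposed interface; the method name is an assumption.
    interface StreamAggregationAdvisor {
        void onAggregate(String fullResponse);
    }

    // Stand-in for the innermost stream advisor: it sees the raw model chunks,
    // aggregates them exactly once, and notifies every aggregation advisor
    // when the stream is closed.
    static Stream<String> innermost(Stream<String> modelChunks,
                                    List<StreamAggregationAdvisor> advisors) {
        StringBuilder buffer = new StringBuilder();
        return modelChunks
                .peek(buffer::append)
                .onClose(() -> {
                    String aggregate = buffer.toString();
                    advisors.forEach(advisor -> advisor.onAggregate(aggregate));
                });
    }

    // Runs the chain: innermost aggregation first, then an outer advisor that alters chunks.
    // Returns what the application receives; 'seen' collects what the aggregation advisors got.
    static List<String> run(List<String> chunks, List<String> seen) {
        List<StreamAggregationAdvisor> advisors = new ArrayList<>();
        advisors.add(seen::add);
        advisors.add(seen::add);
        // An ordinary stream advisor further out in the chain that modifies each chunk.
        UnaryOperator<Stream<String>> upperCasing = s -> s.map(String::toUpperCase);
        try (Stream<String> out = upperCasing.apply(innermost(chunks.stream(), advisors))) {
            return out.collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        System.out.println(run(List.of("Hel", "lo"), seen));
        System.out.println(seen);
    }
}
```

Only one buffer is allocated regardless of how many aggregation advisors are registered, and they all receive the response as the model produced it, even though an outer advisor alters the chunks the application sees.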
