
1 sec. extra latency added since v.0.0.57 #1319

Open

aomegaai opened this issue Feb 28, 2025 · 2 comments

@aomegaai
Hi pipecat team. Thanks for all your amazing work!

In v0.0.57 you added this in `processors/aggregators/llm_response.py`:

class LLMUserContextAggregator(LLMContextResponseAggregator):
    """This is a user LLM aggregator that uses an LLM context to store the
    conversation. It aggregates transcriptions from the STT service and it has
    logic to handle multiple scenarios where transcriptions are received between
    VAD events (`UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame`) or
    even outside or no VAD events at all.

    """

    def __init__(
        self,
        context: OpenAILLMContext,
        aggregation_timeout: float = 1.0,
        bot_interruption_timeout: float = 2.0,
        ...

The unfortunate and undocumented behavior induced by `aggregation_timeout=1.0` is that it adds an extra second of latency to every response, which cannot be avoided through the usual `llm.create_context_aggregator()` API and is unnecessary. I suspect this is an unintended side effect of the new functionality (which is itself not well documented)?

FYI, we tested this with your own twilio-chatbot example using Silero and `VADParams(confidence=0.01, min_volume=0.01, start_secs=0.0001, stop_secs=0.0001)` in v0.0.54, v0.0.55, v0.0.56, v0.0.57, and v0.0.58, running the exact same code (copied from v0.0.58).
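
For reference, the VAD setup in that test looked roughly like this (a minimal sketch; the import paths are assumed from the pipecat 0.0.5x layout, and the rest of the transport wiring is elided):

    from pipecat.audio.vad.silero import SileroVADAnalyzer
    from pipecat.audio.vad.vad_analyzer import VADParams

    # Near-zero VAD thresholds so voice-activity detection itself contributes
    # essentially no delay; any remaining latency has to come from elsewhere.
    vad_analyzer = SileroVADAnalyzer(
        params=VADParams(
            confidence=0.01,
            min_volume=0.01,
            start_secs=0.0001,
            stop_secs=0.0001,
        )
    )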

We can of course just set `aggregation_timeout=0` (which we did), but the natural way of instantiating the user context aggregator is something like this (e.g. in bot.py):

    context = OpenAILLMContext()
    context_aggregator = llm.create_context_aggregator(context)

Thus, I would recommend setting the default value for `aggregation_timeout` to zero, or adding it as a keyword parameter to `llm.create_context_aggregator()`.
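
For completeness, the direct instantiation we used as a workaround looks roughly like this (a sketch based on the constructor quoted above; the `OpenAILLMContext` import path is an assumption, and the assistant-side aggregator still has to be wired up separately):

    from pipecat.processors.aggregators.llm_response import LLMUserContextAggregator
    from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext

    context = OpenAILLMContext()

    # Construct the user aggregator directly instead of via the factory,
    # so the 1.0 s debounce window can be disabled explicitly.
    user_aggregator = LLMUserContextAggregator(context, aggregation_timeout=0.0)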

Thanks!

@aconchillo
Contributor
> Hi pipecat team. Thanks for all your amazing work!
>
> In v0.0.57 you added this in `processors/aggregators/llm_response.py`:
>
>     class LLMUserContextAggregator(LLMContextResponseAggregator):
>         """This is a user LLM aggregator that uses an LLM context to store the
>         conversation. It aggregates transcriptions from the STT service and it has
>         logic to handle multiple scenarios where transcriptions are received between
>         VAD events (`UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame`) or
>         even outside or no VAD events at all.
>
>         """
>
>         def __init__(
>             self,
>             context: OpenAILLMContext,
>             aggregation_timeout: float = 1.0,
>             bot_interruption_timeout: float = 2.0,
>             ...
>
> The unfortunate and undocumented behavior induced by `aggregation_timeout=1.0` is that it adds an extra second of latency to every response, which cannot be avoided through the usual `llm.create_context_aggregator()` API and is unnecessary. I suspect this is an unintended side effect of the new functionality (which is itself not well documented)?
>
> FYI, we tested this with your own twilio-chatbot example using Silero and `VADParams(confidence=0.01, min_volume=0.01, start_secs=0.0001, stop_secs=0.0001)` in v0.0.54, v0.0.55, v0.0.56, v0.0.57, and v0.0.58, running the exact same code (copied from v0.0.58).

The reason is that, before this change, it was possible for transcriptions to be received outside of `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame`; sometimes VAD doesn't detect that the user is speaking but the STT does, in which case you would only get a transcription. For example, short utterances like "yeah", "yes", "right", "sure" were usually missed, which caused a lot of confusion because the bot stopped speaking.
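
Conceptually, the aggregator now debounces transcriptions: after each final transcription it waits `aggregation_timeout` for stragglers before pushing the context to the LLM. A minimal sketch of that pattern (illustrative only, not the actual pipecat implementation; all names here are made up):

    import asyncio


    class TranscriptDebouncer:
        """Illustrative only: collect transcription fragments and emit them
        once no new fragment has arrived for `timeout` seconds."""

        def __init__(self, timeout: float = 1.0):
            self._timeout = timeout
            self._parts: list[str] = []
            self._flush_task: asyncio.Task | None = None

        async def on_transcription(self, text: str):
            self._parts.append(text)
            # Restart the timer on every fragment, so late or out-of-VAD
            # transcriptions ("yeah", "sure", ...) get merged instead of lost.
            if self._flush_task is not None:
                self._flush_task.cancel()
            self._flush_task = asyncio.create_task(self._flush_after_timeout())

        async def _flush_after_timeout(self):
            # This sleep is the trade-off: every user turn now pays up to
            # `timeout` extra seconds of latency before the LLM is triggered.
            await asyncio.sleep(self._timeout)
            await self.push_to_llm(" ".join(self._parts))
            self._parts.clear()

        async def push_to_llm(self, text: str):
            print(f"LLM gets: {text!r}")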

I totally agree that the 1 second timeout is unfortunate, but see below. 😅

> We can of course just set `aggregation_timeout=0` (which we did), but the natural way of instantiating the user context aggregator is something like this (e.g. in bot.py):
>
>     context = OpenAILLMContext()
>     context_aggregator = llm.create_context_aggregator(context)
>
> Thus, I would recommend setting the default value for `aggregation_timeout` to zero, or adding it as a keyword parameter to `llm.create_context_aggregator()`.

I was just in the middle of doing just that 😄

@aconchillo aconchillo self-assigned this Mar 1, 2025
@aconchillo
Contributor

See #1321
