v0.8.0 changelog

v0.8.0 is our biggest release yet, featuring significant reliability improvements to VoiceAssistant. This update includes a few breaking API changes that will impact the way you build your agents. We strive to minimize breaking changes and will stabilize the API as we approach version 1.0.

Migrating to v0.8.0 (Breaking Changes)

Job and Worker

entrypoint moved from req.accept() to WorkerOptions

Previously, the job entrypoint was passed to the req.accept() method call. It is now set through the entrypoint_fnc field of WorkerOptions.

namespace removed

The namespace field of WorkerOptions has been removed. A replacement will be introduced in a future release.

explicit connection to the room

You now need to call ctx.connect() to initiate the connection to the room. This allows for pre-connect setup (such as callback registrations) to avoid race conditions.

The following shows a minimal_worker.py example:

from livekit.agents import JobContext, WorkerOptions, cli

async def job_entrypoint(ctx: JobContext):
    await ctx.connect()
    ...

if __name__ == "__main__":
    cli.run_app(
        WorkerOptions(entrypoint_fnc=job_entrypoint)
    )
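A small, self-contained sketch can show the race condition that explicit connection avoids. HypotheticalRoom below is a toy stand-in, not a LiveKit class, and its event name is an assumption. Its connect() fires an event for every participant already in the room. A handler registered after connecting therefore never sees those participants, while one registered before connecting (which ctx.connect() now allows) does.

```python
import asyncio
from typing import Callable


class HypotheticalRoom:
    """Toy stand-in (not livekit.rtc.Room): connect() fires events for existing participants."""

    def __init__(self, existing: list[str]) -> None:
        self._existing = existing
        self._handlers: dict[str, list[Callable[[str], None]]] = {}

    def on(self, event: str, callback: Callable[[str], None]) -> None:
        self._handlers.setdefault(event, []).append(callback)

    async def connect(self) -> None:
        await asyncio.sleep(0)  # simulate the connection handshake
        for identity in self._existing:
            for callback in self._handlers.get("participant_connected", []):
                callback(identity)


async def entrypoint(register_before_connect: bool) -> list[str]:
    room = HypotheticalRoom(existing=["alice", "bob"])
    seen: list[str] = []
    if register_before_connect:
        room.on("participant_connected", seen.append)
        await room.connect()
    else:
        await room.connect()
        room.on("participant_connected", seen.append)  # too late: events already fired
    return seen


print(asyncio.run(entrypoint(register_before_connect=True)))   # ['alice', 'bob']
print(asyncio.run(entrypoint(register_before_connect=False)))  # []
```

In a real agent, the equivalent is registering room callbacks on ctx before awaiting ctx.connect().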

LLM

💡 These changes may not be relevant to users of the VoiceAssistant class.

The LLM class has been restructured to enhance ergonomics and improve the function calling experience.

Function/tool calling

Function calling has been completely overhauled in v0.8.0. Most of the changes are additive and are listed in the New features section.

The primary breaking change is that function calls are no longer invoked automatically when iterating the LLM stream. You now need to call LLMStream.execute_functions() to run them.
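The sketch below models the new behavior with hypothetical stand-ins (HypotheticalLLMStream, HypotheticalFunctionCall and get_weather are illustrative, not LiveKit APIs). Iterating the stream yields text only, and collected calls run only when execute_functions() is called explicitly. The real method's return type may differ from the plain list of results used here.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, AsyncIterator, Callable


@dataclass
class HypotheticalFunctionCall:
    name: str
    fnc: Callable[..., Any]
    arguments: dict[str, Any]


@dataclass
class HypotheticalLLMStream:
    """Toy stand-in for LLMStream: iteration yields text and never invokes functions."""

    chunks: list[str]
    function_calls: list[HypotheticalFunctionCall] = field(default_factory=list)

    def __aiter__(self) -> AsyncIterator[str]:
        return self._iterate()

    async def _iterate(self) -> AsyncIterator[str]:
        for chunk in self.chunks:
            yield chunk  # calls are only collected, not run

    def execute_functions(self) -> list[Any]:
        # explicit, opt-in invocation of every collected call
        return [call.fnc(**call.arguments) for call in self.function_calls]


invoked: list[str] = []


def get_weather(location: str) -> str:
    invoked.append(location)
    return f"sunny in {location}"


async def main() -> tuple[str, list[Any]]:
    invoked.clear()  # start each run from a clean slate
    stream = HypotheticalLLMStream(
        chunks=["Let me check ", "the weather."],
        function_calls=[HypotheticalFunctionCall("get_weather", get_weather, {"location": "Paris"})],
    )
    text = "".join([chunk async for chunk in stream])
    assert invoked == []  # iterating did not run get_weather
    return text, stream.execute_functions()


print(asyncio.run(main()))  # ('Let me check the weather.', ['sunny in Paris'])
```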


LLM.chat() is no longer an async method

Previously, LLM.chat() was an async method that returned an LLMStream (which itself was an AsyncIterable).

We found it simpler and less confusing for LLM.chat() to be synchronous while still returning the same AsyncIterable LLMStream. Call sites should therefore drop the await before LLM.chat().
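The following is a minimal sketch of the new calling convention. It uses hypothetical stand-ins (HypotheticalLLM, HypotheticalLLMStream), not the library classes. chat() returns the stream immediately without being awaited, no work starts until the stream is iterated, and consuming it is still asynchronous.

```python
import asyncio
from typing import AsyncIterator


class HypotheticalLLMStream:
    """Toy stand-in for LLMStream: an async iterable that does nothing until iterated."""

    def __init__(self, tokens: list[str]) -> None:
        self._tokens = tokens
        self.started = False

    def __aiter__(self) -> AsyncIterator[str]:
        return self._run()

    async def _run(self) -> AsyncIterator[str]:
        self.started = True  # runs on the first iteration step, not when chat() is called
        for token in self._tokens:
            await asyncio.sleep(0)  # simulate waiting on the provider
            yield token


class HypotheticalLLM:
    def chat(self, *, chat_ctx: list[str]) -> HypotheticalLLMStream:
        # synchronous: build and return the stream, nothing is awaited here
        return HypotheticalLLMStream([f"You said: {chat_ctx[-1]}"])


async def main() -> str:
    llm = HypotheticalLLM()
    # before v0.8.0: stream = await llm.chat(history=...)
    stream = llm.chat(chat_ctx=["hello"])  # v0.8.0: no await
    assert not stream.started
    return "".join([token async for token in stream])  # consumption is still async


print(asyncio.run(main()))  # You said: hello
```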

LLM.chat() history argument renamed to chat_ctx

The history argument of LLM.chat() has been renamed to chat_ctx to improve consistency (VoiceAssistant also takes a chat_ctx) and reduce confusion.
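For call sites, the migration is a keyword rename. The stub below is hypothetical and mirrors only a keyword-only call shape. messages is a plain list here, whereas the real chat_ctx is a ChatContext. The stub shows that the old keyword is rejected and the new one is accepted.

```python
class HypotheticalLLM:
    def chat(self, *, chat_ctx: list[str]) -> list[str]:
        return list(chat_ctx)  # stand-in: the real chat() returns an LLMStream


llm = HypotheticalLLM()
messages = ["What's the weather?"]

# before v0.8.0: llm.chat(history=messages)
try:
    llm.chat(history=messages)
except TypeError as err:
    print(f"old keyword rejected: {err}")

# v0.8.0
print(llm.chat(chat_ctx=messages))  # ["What's the weather?"]
```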


STT

💡 These changes may not be relevant to users of the VoiceAssistant class.

SpeechStream.flush()

Previously, to signal to an STT provider that you had sent enough input for it to produce a result, you could call push_frame(None).

In v0.8.0 that API has been removed and replaced with flush().

SpeechStream.end_input()

end_input() signals to the STT provider that the input is complete and that no additional input will follow. Previously, this was done with aclose(wait=True).

SpeechStream.aclose()

The wait argument of aclose() has been removed in favor of SpeechStream.end_input() (see above). If you now call SpeechStream.aclose() without first calling end_input(), the request is cancelled.

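async def transcribe(my_stt_instance, audio_stream):
    stt_stream = my_stt_instance.stream()
    async for ev in audio_stream:
        stt_stream.push_frame(ev.frame)
        # optionally flush when enough frames have been pushed
        stt_stream.flush()

    # no more audio will follow; aclose() then waits instead of cancelling
    stt_stream.end_input()
    await stt_stream.aclose()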
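To make the flush() / end_input() / aclose() semantics concrete, here is a toy model of the lifecycle. HypotheticalSpeechStream is illustrative, not a LiveKit class: "frames" are words, and each flushed batch becomes one "transcript". One detail is an assumption, that end_input() flushes pending input before closing. The model also shows the cancellation rule described above. Closing without end_input() cancels the request, and in this toy even the already-flushed segment is lost.

```python
import asyncio


class HypotheticalSpeechStream:
    """Toy model of the v0.8.0 stream lifecycle (words in, joined 'transcripts' out)."""

    _FLUSH = object()
    _END = object()

    def __init__(self) -> None:
        self._input: asyncio.Queue = asyncio.Queue()
        self._input_ended = False
        self.results: list[str] = []
        self.cancelled = False
        self._task = asyncio.create_task(self._run())  # requires a running event loop

    def push_frame(self, frame: str) -> None:
        self._input.put_nowait(frame)

    def flush(self) -> None:
        # request a result for everything pushed since the last flush
        self._input.put_nowait(self._FLUSH)

    def end_input(self) -> None:
        # assumption: pending input is flushed, then the input side is closed
        self.flush()
        self._input.put_nowait(self._END)
        self._input_ended = True

    async def _run(self) -> None:
        buffer: list[str] = []
        while True:
            item = await self._input.get()
            if item is self._END:
                return
            if item is self._FLUSH:
                if buffer:
                    self.results.append(" ".join(buffer))
                    buffer.clear()
            else:
                buffer.append(item)

    async def aclose(self) -> None:
        if not self._input_ended:
            self._task.cancel()  # no end_input(): the request is cancelled
            self.cancelled = True
        try:
            await self._task  # after end_input(): wait for the remaining results
        except asyncio.CancelledError:
            pass


async def demo(call_end_input: bool) -> tuple[list[str], bool]:
    stream = HypotheticalSpeechStream()
    for word in ["hello", "there"]:
        stream.push_frame(word)
    stream.flush()
    stream.push_frame("goodbye")
    if call_end_input:
        stream.end_input()
    await stream.aclose()
    return stream.results, stream.cancelled


print(asyncio.run(demo(call_end_input=True)))   # (['hello there', 'goodbye'], False)
print(asyncio.run(demo(call_end_input=False)))  # ([], True)
```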

TTS

💡 These changes may not be relevant to users of the VoiceAssistant class.

SynthesizedAudio changed and SynthesisEvent removed

Most of the fields of the SynthesizedAudio dataclass have been changed:

from dataclasses import dataclass

from livekit import rtc

# New SynthesizedAudio dataclass
@dataclass
class SynthesizedAudio:
    request_id: str
    """Request ID (one segment could be made up of multiple requests)"""
    segment_id: str
    """Segment ID, each segment is separated by a flush"""
    frame: rtc.AudioFrame
    """Synthesized audio frame"""
    delta_text: str = ""
    """Current segment of the synthesized audio"""

# Old SynthesizedAudio dataclass
@dataclass
class SynthesizedAudio:
    text: str
    data: rtc.AudioFrame

SynthesisEvent has been removed entirely. All of its uses have been replaced with SynthesizedAudio.

SynthesizeStream.flush()

Similar to the STT changes, flush() coaxes the TTS provider into generating audio for the text pushed so far. SynthesizedAudio emitted after a call to flush() carries a new segment_id.
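The toy below illustrates how segment_id relates to flush(). It is hypothetical throughout. FakeSynthesizedAudio mirrors the new fields but holds placeholder bytes instead of an rtc.AudioFrame. The "seg-N" ID format and the choice of one request per segment are assumptions made only for this sketch.

```python
import uuid
from dataclasses import dataclass


@dataclass
class FakeSynthesizedAudio:
    """Mirrors the new SynthesizedAudio fields; bytes stand in for rtc.AudioFrame."""

    request_id: str
    segment_id: str
    frame: bytes
    delta_text: str = ""


class HypotheticalSynthesizeStream:
    """Toy stream: each flush() closes the current segment and starts a new one."""

    def __init__(self) -> None:
        self._pending: list[str] = []
        self._segment_index = 0
        self.output: list[FakeSynthesizedAudio] = []

    def push_text(self, text: str) -> None:
        self._pending.append(text)

    def flush(self) -> None:
        if not self._pending:
            return  # nothing buffered: keep the current segment open
        request_id = uuid.uuid4().hex  # toy choice: one provider request per segment
        segment_id = f"seg-{self._segment_index}"  # hypothetical ID format
        for text in self._pending:
            fake_audio = text.encode()  # placeholder bytes instead of real samples
            self.output.append(FakeSynthesizedAudio(request_id, segment_id, fake_audio, delta_text=text))
        self._pending.clear()
        self._segment_index += 1


stream = HypotheticalSynthesizeStream()
stream.push_text("This is the first sentence")
stream.flush()
stream.push_text("This is the second sentence")
stream.flush()
for audio in stream.output:
    print(audio.segment_id, audio.delta_text)
# seg-0 This is the first sentence
# seg-1 This is the second sentence
```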

SynthesizeStream.end_input()

Similar to the STT changes, this replaces aclose(wait=True).

SynthesizeStream.aclose()

Similar to the STT changes, the wait arg has been removed.

async def synthesize(my_tts_instance):
    tts_stream = my_tts_instance.stream()
    tts_stream.push_text("This is the first sentence")
    tts_stream.flush()  # audio after this point gets a new segment_id
    tts_stream.push_text("This is the second sentence")
    tts_stream.end_input()
    await tts_stream.aclose()

VAD

flush(), end_input(), aclose()

The same changes made to STT and TTS have also been made to VAD.

async def detect(my_vad_instance, audio_stream):
    vad_stream = my_vad_instance.stream()
    async for ev in audio_stream:
        vad_stream.push_frame(ev.frame)
        # optionally flush when enough frames have been pushed
        vad_stream.flush()

    vad_stream.end_input()
    await vad_stream.aclose()
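The following is a rough illustration of why end_input() matters for a VAD stream. The toy below uses a plain amplitude threshold, which is not Silero's model. The event names and the HypotheticalVADEvent type are hypothetical. When input ends mid-speech, the stream closes the open segment, so every start event has a matching end event.

```python
from dataclasses import dataclass


@dataclass
class HypotheticalVADEvent:
    type: str  # "start_of_speech" or "end_of_speech" (hypothetical names)
    frame_index: int


class HypotheticalVADStream:
    """Toy VAD: a frame counts as speech when its amplitude reaches the threshold."""

    def __init__(self, threshold: float = 0.5) -> None:
        self._threshold = threshold
        self._speaking = False
        self._index = 0
        self._input_ended = False
        self.events: list[HypotheticalVADEvent] = []

    def push_frame(self, amplitude: float) -> None:
        if self._input_ended:
            raise RuntimeError("push_frame() called after end_input()")
        is_speech = amplitude >= self._threshold
        if is_speech and not self._speaking:
            self.events.append(HypotheticalVADEvent("start_of_speech", self._index))
        elif not is_speech and self._speaking:
            self.events.append(HypotheticalVADEvent("end_of_speech", self._index))
        self._speaking = is_speech
        self._index += 1

    def end_input(self) -> None:
        # close a speech segment that is still open so consumers see a matching end
        if self._speaking:
            self.events.append(HypotheticalVADEvent("end_of_speech", self._index))
            self._speaking = False
        self._input_ended = True


stream = HypotheticalVADStream(threshold=0.5)
for amplitude in [0.1, 0.8, 0.9, 0.2, 0.7, 0.6]:
    stream.push_frame(amplitude)
stream.end_input()
print([(event.type, event.frame_index) for event in stream.events])
# [('start_of_speech', 1), ('end_of_speech', 3), ('start_of_speech', 4), ('end_of_speech', 6)]
```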

VoiceAssistant

Much of the VoiceAssistant API remains unchanged, despite significant improvements to functionality and internals. However, there have been changes to the configuration.

Initialization args

Removed
- base_volume
- debug
- sentence_tokenizer, word_tokenizer, hyphenate_word
Changed
- transcription-related options are now grouped under the transcription arg (AssistantTranscriptionOptions)

class VoiceAssistant(utils.EventEmitter[EventTypes]):
    def __init__(
        self,
        *,
        vad: vad.VAD,
        stt: stt.STT,
        llm: LLM,
        tts: tts.TTS,
        chat_ctx: ChatContext | None = None,
        fnc_ctx: FunctionContext | None = None,
        allow_interruptions: bool = True,
        interrupt_speech_duration: float = 0.6,
        interrupt_min_words: int = 0,
        preemptive_synthesis: bool = True,
        transcription: AssistantTranscriptionOptions = AssistantTranscriptionOptions(),
        will_synthesize_assistant_reply: WillSynthesizeAssistantReply = _default_will_synthesize_assistant_reply,
        plotting: bool = False,
        loop: asyncio.AbstractEventLoop | None = None,
    ) -> None:
        ...

New features

Job and Worker

- New prewarm_fnc in WorkerOptions that can be used to set up agent subprocesses before the agent joins the room, e.g. to load model weights.
- New num_idle_processes in WorkerOptions keeps a pool of idle processes available for subsequent agents. This reduces the time it takes agents to join rooms and be ready to participate.
- The health server now listens on 0.0.0.0 by default instead of localhost.

LLM

- AI functions can now be added at runtime.
- AI functions can now return values and raise exceptions. Return values and exceptions are automatically added to the chat_ctx so the LLM is aware of them.
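The next sketch shows the pattern these two items describe: a function registered mid-session, with its return value or exception recorded in the chat context instead of crashing the agent. HypotheticalFunctionContext, run_call and the message dict shape are assumptions for illustration. In the library, chat_ctx is a ChatContext, not a list of dicts.

```python
from typing import Any, Callable


class HypotheticalFunctionContext:
    """Toy registry; functions can be added at any point while the agent runs."""

    def __init__(self) -> None:
        self.functions: dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fnc: Callable[..., Any]) -> None:
        self.functions[name] = fnc


def run_call(
    fnc_ctx: HypotheticalFunctionContext,
    chat_ctx: list[dict[str, str]],
    name: str,
    **arguments: Any,
) -> None:
    """Invoke a function and record its result or its exception in the chat context."""
    try:
        result = fnc_ctx.functions[name](**arguments)
        chat_ctx.append({"role": "tool", "name": name, "content": str(result)})
    except Exception as exc:  # the LLM sees the failure instead of the agent crashing
        chat_ctx.append({"role": "tool", "name": name, "content": f"error: {exc}"})


fnc_ctx = HypotheticalFunctionContext()
chat_ctx: list[dict[str, str]] = []

fnc_ctx.register("divide", lambda a, b: a / b)  # added at runtime
run_call(fnc_ctx, chat_ctx, "divide", a=6, b=3)
run_call(fnc_ctx, chat_ctx, "divide", a=1, b=0)
for message in chat_ctx:
    print(message["content"])
# 2.0
# error: division by zero
```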

VAD

livekit-plugins-silero
- ONNX Runtime is now used directly, which removes the PyTorch dependency
- Model weights are included in the Python package itself, so you no longer need to download them as a build step
- The model has been updated to the latest Silero model (v5), which has improved accuracy
- Fixes to the inference and hidden-state logic improve accuracy

TTS

- A new Cartesia plugin has been introduced
- SynthesizeStream now has flush() and end_input() for finer control over how text input is synchronized with audio output
- SynthesizedAudio now has a segment_id, making it clearer which audio corresponds to which input text

VoiceAssistant

- Major improvements and bug fixes to interruption logic
- Bug fixes for duplicated responses
- Bug fixes for stuck responses

RAG

New livekit-plugins-rag package to help with RAG-related tasks
- Index builder for creating searchable indexes
- Nearest-neighbor search on indexes, based on Spotify's Annoy library
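Here is a conceptual sketch of what a nearest-neighbor query over an index returns. This is not the plugin's API, which this changelog does not show. It is an exact brute-force cosine-similarity scan, a deliberately simpler technique than Annoy's forest of random-projection trees, which trades some accuracy for much faster approximate queries. The vectors are hand-written; a real RAG pipeline would get them from an embedding model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # assumes both vectors are non-zero
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


class BruteForceIndex:
    """Exact nearest-neighbor search by scanning every item (Annoy approximates this faster)."""

    def __init__(self) -> None:
        self._items: list[tuple[int, list[float], str]] = []

    def add_item(self, item_id: int, vector: list[float], text: str) -> None:
        self._items.append((item_id, vector, text))

    def query(self, vector: list[float], n: int) -> list[str]:
        ranked = sorted(self._items, key=lambda item: cosine_similarity(vector, item[1]), reverse=True)
        return [text for _, _, text in ranked[:n]]


index = BruteForceIndex()
index.add_item(0, [1.0, 0.0], "cats purr")
index.add_item(1, [0.0, 1.0], "dogs bark")
index.add_item(2, [0.9, 0.1], "kittens nap")
print(index.query([1.0, 0.0], n=2))  # ['cats purr', 'kittens nap']
```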