Ultravox audio streaming #278

Open
FelixNeutatzMainWebSolutions opened this issue Feb 4, 2025 · 1 comment
@FelixNeutatzMainWebSolutions

Hi everyone,

I am currently experimenting with streaming audio into the model. The idea is to improve latency by ingesting parts of the audio before the user's utterance is finished. Since computation already runs while the user is speaking, the model can answer faster.

This is what I came up with:

# Import paths assume the ultravox repo layout.
from ultravox.data import datasets
from ultravox.inference import ultravox_infer

inference = ultravox_infer.UltravoxInference(
    "fixie-ai/ultravox-v0_4",
    device=None,
    data_type=None,
    conversation_mode=True,
)

# Feed the early audio chunks with max_tokens=1 so the model ingests them
# without generating a full reply, then drop the placeholder assistant turn.
user_audio_prompt = datasets.VoiceSample.from_prompt_and_file("<|audio|>", "part0.mp3")
inference.infer(user_audio_prompt, max_tokens=1)
del inference.past_messages[-1]

user_audio_prompt = datasets.VoiceSample.from_prompt_and_file("<|audio|>", "part1.mp3")
inference.infer(user_audio_prompt, max_tokens=1)
del inference.past_messages[-1]

# Generate the full answer only on the final chunk.
user_audio_prompt = datasets.VoiceSample.from_prompt_and_file("<|audio|>", "part2.mp3")
output = inference.infer(user_audio_prompt)
print(output)

Is there any more efficient approach to this?

Thank you for your help.

Best regards,
Felix

@zqhuang211 (Contributor)

That’s one way to do it, but it’s not quite right, since the audio segments are encoded separately. Additionally, they are treated as separate speaker turns, and you waste compute on the inference of the intermediate segments. The latency incurred by these additional inference steps would be significantly higher than that of speech encoding.

We experimented with block-wise unidirectional encoding for the speech encoder. You can find a config here:

exp_name: "ultravox-streaming-experiments-1s"

We haven’t done much work on this feature yet, so it could break training/inference or hurt model performance. But it’s a more viable solution.
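To make the block-wise unidirectional idea concrete, here is a minimal sketch of the attention-mask pattern it implies: frames attend bidirectionally within their own block and to all earlier blocks, but never to future blocks. This is an illustrative NumPy sketch, not the actual Ultravox encoder implementation; the function name and block size are assumptions.

```python
import numpy as np

def block_causal_mask(seq_len: int, block_size: int) -> np.ndarray:
    """Build a block-wise unidirectional (block-causal) attention mask.

    Entry [i, j] is True iff position i may attend to position j:
    allowed within the same block and for any earlier block,
    disallowed for future blocks.
    """
    blocks = np.arange(seq_len) // block_size
    # Position i may attend to position j iff j's block is not later than i's.
    return blocks[None, :] <= blocks[:, None]

# Example: 6 frames split into 3 blocks of 2 frames each.
mask = block_causal_mask(seq_len=6, block_size=2)
```

With such a mask, encoding a new 1-second block reuses the already-encoded past blocks instead of re-encoding the whole utterance from scratch, which is what makes it a better fit for streaming than repeated full inference calls.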
