chore(usm-streaming): add Vapi docs content #302

Merged: 4 commits, Jun 2, 2025
Binary file added fern/assets/img/vapi/Vapi-Step1.png
Binary file added fern/assets/img/vapi/Vapi-Step2.png
Binary file added fern/assets/img/vapi/Vapi-Step3.png
6 changes: 5 additions & 1 deletion fern/docs.yml
@@ -68,6 +68,9 @@ navigation:
- page: Pipecat
path: pages/02-speech-to-text/pipecat-intro-guide.mdx
slug: /voice-agents/pipecat-intro-guide
- page: Vapi
path: pages/02-speech-to-text/universal-streaming/voice-agents/vapi.mdx
slug: /vapi
- page: Introducing Slam-1
path: pages/01-getting-started/slam-1.mdx
slug: /getting-started/slam-1
@@ -484,6 +487,8 @@ navigation:
path: pages/02-speech-to-text/universal-streaming/voice-agents/livekit.mdx
- page: Pipecat
path: pages/02-speech-to-text/universal-streaming/voice-agents/pipecat.mdx
- page: Vapi
path: pages/02-speech-to-text/universal-streaming/voice-agents/vapi.mdx
- section: LangChain
path: pages/06-integrations/langchain.mdx
slug: /langchain
@@ -643,7 +648,6 @@ navigation:
- page: Vapi
path: pages/02-speech-to-text/universal-streaming/voice-agents/vapi.mdx
slug: /vapi
hidden: true
- page: Turn detection
path: pages/02-speech-to-text/universal-streaming/turn-detection.mdx
slug: /turn-detection
@@ -1003,8 +1003,9 @@ Utilizing our ongoing transcriptions in this manner will allow you to achieve th
<Card title="Pipecat" icon={<img src="https://assemblyaiassets.com/images/Pipecat.svg" alt="Pipecat logo"/>} href="/docs/speech-to-text/universal-streaming/pipecat">
View our Pipecat integration guide.
</Card>
{/* <Card title="Vapi" icon={<img src="https://assemblyaiassets.com/images/Vapi.svg" alt="Vapi logo"/>} href="https://docs.vapi.ai/providers/transcriber/assembly-ai">
View Vapi's AssemblyAI STT plugin documentation. TO DO: ADD VAPI */}
<Card title="Vapi" icon={<img src="https://assemblyaiassets.com/images/Vapi.svg" alt="Vapi logo"/>} href="/docs/speech-to-text/universal-streaming/vapi">
View our Vapi integration guide.
</Card>
</CardGroup>


@@ -1,58 +1,46 @@
---
title: "Turn detection"
description: "Intelligent turn detection with Streaming Speech-to-Text"
title: "Vapi"
description: "Vapi voice agent integration"
---

### Overview
## Overview

AssemblyAI's end-of-turn detection functionality is integrated into our Streaming STT model, leveraging both acoustic and semantic features, and is coupled with a traditional silence-based heuristic approach. Both mechanisms work jointly and either can trigger end-of-turn detection throughout the audio stream. This joint approach significantly enhances the speed and accuracy of end-of-turn detection while allowing this functionality to fall back to the traditional method when the model makes a misprediction.
Vapi is a developer platform for building voice AI agents. It handles the complex backend of voice agents for you, so you can focus on creating great voice experiences. In this guide, we'll show you how to integrate AssemblyAI's Streaming Speech-to-Text model into your Vapi voice agent.

<Note>
End-of-turn and end-of-utterances refer to the same thing and may be used
interchangeably in these docs.
</Note>
<Card
title="Vapi"
icon={<img src="https://assemblyaiassets.com/images/Vapi.svg" alt="Vapi logo"/>}
href="https://docs.vapi.ai/providers/transcriber/assembly-ai"
>
View Vapi's AssemblyAI STT provider documentation.
</Card>

### Model-based detection
## Quick start

Triggers when **all** conditions are met:
<Steps>
**Head to the "Assistants" tab in your Vapi dashboard.**

#### EOT token predicted
<Frame>
<img src="/assets/img/vapi/Vapi-Step1.png" />
</Frame>

- Model predicts semantic end-of-turn with a probability greater than `end_of_turn_confidence_threshold`
- Default: 0.5 (user configurable)
**Click on your assistant and then the "Transcriber" tab.**

#### Minimum silence duration has passed
<Frame>
<img src="/assets/img/vapi/Vapi-Step2.png" />
</Frame>

- After the last non-silence word token, `min_end_of_turn_silence_when_confident` milliseconds must pass
- Default: 2400ms (user configurable)
**Select "assembly-ai" from the Provider dropdown.**

#### Minimum speech duration spoken
<Frame>
<img src="/assets/img/vapi/Vapi-Step3.png" />
</Frame>
</Steps>

- The user must speak for at least 80ms since the last end-of-turn (ensures at least one word)
- Set to 80 ms (internal)
Your voice agent now uses **AssemblyAI** for speech-to-text (STT) processing.
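If you prefer to configure the assistant programmatically rather than through the dashboard, the same transcriber setting can be sketched as an API call. This is a hypothetical sketch: the endpoint path, HTTP method, and payload shape are assumptions about Vapi's API and should be verified against Vapi's API reference; only the `assembly-ai` provider value comes from the dropdown step above.

```python
import json
import urllib.request

def build_transcriber_update(assistant_id: str, api_key: str) -> urllib.request.Request:
    """Sketch of an assistant update that sets AssemblyAI as the transcriber.

    NOTE: endpoint path and payload shape are assumptions, not taken from
    this PR -- check Vapi's API reference before relying on them.
    """
    # "assembly-ai" is the same provider value selected in the dropdown above
    payload = {"transcriber": {"provider": "assembly-ai"}}
    return urllib.request.Request(
        url=f"https://api.vapi.ai/assistant/{assistant_id}",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )

# Build (but don't send) the request; urllib.request.urlopen(req) would apply it.
req = build_transcriber_update("my-assistant-id", "dummy-key")
```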

#### Word finalized
<Info>
New to Vapi? Visit the [Quickstart Guide](https://docs.vapi.ai/quickstart/introduction) to explore example voice agent workflows and get up and running quickly. For the easiest way to test a voice agent, follow this [simple phone-based guide](https://docs.vapi.ai/quickstart/phone).
</Info>

- Last word in `turn.words` has been finalized
- Internal configuration

### Silence-based detection

Triggers when **all** conditions are met:

#### Minimum speech duration spoken

- The user must speak for at least 80ms since the last end-of-turn (ensures at least one word)
- Set to 80 ms (internal)

#### Maximum silence duration has passed

- After the last non-silence word token, `max_turn_silence` milliseconds must pass
- Default: 2400ms (user configurable)

### Important notes

- Silence-based detection can override model-based detection even with high EOT confidence thresholds
- Word finalization always takes precedence — endpointing won't occur until the last word is finalized
- We define end-of-turn detection as the process of detecting the end of sustained speech activity, often called end-pointing in the Voice Agents context.
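The model-based and silence-based conditions above can be restated as a conceptual sketch. This is an illustration of the documented rules, not AssemblyAI's internal implementation; the parameter names and default values come from the text, while the function shape is hypothetical.

```python
def is_end_of_turn(
    eot_probability: float,     # model's semantic end-of-turn confidence
    silence_ms: int,            # silence since the last non-silence word token
    speech_ms: int,             # speech since the last end-of-turn
    last_word_finalized: bool,  # has the last word in turn.words been finalized?
    end_of_turn_confidence_threshold: float = 0.5,
    min_end_of_turn_silence_when_confident: int = 2400,
    max_turn_silence: int = 2400,
    min_speech_ms: int = 80,    # internal minimum speech duration
) -> bool:
    # Word finalization always takes precedence, and at least ~80ms of
    # speech (one word) is required since the last end-of-turn.
    if not last_word_finalized or speech_ms < min_speech_ms:
        return False
    # Model-based detection: confident semantic EOT plus minimum silence.
    model_based = (
        eot_probability > end_of_turn_confidence_threshold
        and silence_ms >= min_end_of_turn_silence_when_confident
    )
    # Silence-based fallback: maximum silence duration has passed.
    silence_based = silence_ms >= max_turn_silence
    # Either mechanism can trigger end-of-turn.
    return model_based or silence_based
```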

11 changes: 9 additions & 2 deletions fern/pages/06-integrations/index.mdx
@@ -5,18 +5,25 @@ AssemblyAI seamlessly integrates with a variety of tools and platforms to enhanc
<CardGroup>
<Card
title="Livekit"
icon="forward-fast"
icon={<img src="https://assemblyaiassets.com/images/Livekit.svg" alt="Livekit logo"/>}
href="/docs/integrations/livekit"
>
Use AssemblyAI with Livekit's voice agent orchestrator.
</Card>
<Card
title="Pipecat"
icon="cat"
icon={<img src="https://assemblyaiassets.com/images/Pipecat.svg" alt="Pipecat logo"/>}
href="/docs/integrations/pipecat"
>
Use AssemblyAI with Pipecat's voice agent orchestrator.
</Card>
<Card
title="Vapi"
icon={<img src="https://assemblyaiassets.com/images/Vapi.svg" alt="Vapi logo"/>}
href="/docs/integrations/vapi"
>
Use AssemblyAI with Vapi's voice agent orchestrator.
</Card>
</CardGroup>

## No-Code Integrations