[docs sprint] Updates docs for using transcribers #9

Merged
merged 1 commit on Jun 10, 2024
23 changes: 19 additions & 4 deletions docs/open-source/using-transcribers.mdx
@@ -33,7 +33,7 @@ from vocode.streaming.models.transcriber import DeepgramTranscriberConfig, Punct
 server = InboundCallServer(
     ...
     transcriber_config=DeepgramTranscriberConfig.from_telephone_input_device(
-        endpointing_config=PunctuationEndpointingConfig()
+        endpointing_config=DeepgramEndpointingConfig()
     ),
     ...
 )
@@ -56,7 +56,7 @@ async def main():
         output_device=speaker_output,
         transcriber=DeepgramTranscriber(
             DeepgramTranscriberConfig.from_input_device(
-                microphone_input, endpointing_config=PunctuationEndpointingConfig()
+                microphone_input, endpointing_config=DeepgramEndpointingConfig()
             )
         ),
         ...
@@ -70,7 +70,22 @@ The method takes a `microphone_input` object as an argument and extracts the `sa

Endpointing is the process of understanding when someone has finished speaking. The `EndpointingConfig` controls how this is done. There are a couple of different ways to configure endpointing:

We provide `DeepgramEndpointingConfig()` which has some reasonable defaults and knobs to suit most use-cases (but only works with the Deepgram transcriber).

```python
class DeepgramEndpointingConfig(EndpointingConfig, type="deepgram"): # type: ignore
vad_threshold_ms: int = 500
utterance_cutoff_ms: int = 1000
time_silent_config: Optional[TimeSilentConfig] = Field(default_factory=TimeSilentConfig)
use_single_utterance_endpointing_for_first_utterance: bool = False
```

- `vad_threshold_ms`: translates to [Deepgram's `endpointing` feature](https://developers.deepgram.com/docs/endpointing#enable-feature)
- `utterance_cutoff_ms`: uses [Deepgram's Utterance End feature](https://developers.deepgram.com/docs/utterance-end)
- `time_silent_config`: a Vocode-specific parameter that marks an utterance as final if we haven't seen any new words in X seconds
- `use_single_utterance_endpointing_for_first_utterance`: Uses `is_final` instead of `speech_final` for endpointing for the first utterance (works really well for outbound conversations, where the user's first utterance is something like "Hello?") - see [this doc on Deepgram](https://developers.deepgram.com/docs/understand-endpointing-interim-results) for more info.
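
As a minimal sketch of tuning these knobs when building the transcriber config (the import path for `DeepgramEndpointingConfig` and the specific values are assumptions for illustration, and your setup may require additional arguments omitted here):

```python
# NOTE: the import path is an assumption; adjust to wherever DeepgramEndpointingConfig
# lives in your vocode version.
from vocode.streaming.models.transcriber import (
    DeepgramEndpointingConfig,
    DeepgramTranscriberConfig,
)

# Illustrative values only: tighten endpointing for snappier turn-taking.
endpointing_config = DeepgramEndpointingConfig(
    vad_threshold_ms=300,      # shorter silence before Deepgram endpoints
    utterance_cutoff_ms=800,   # cut utterances off sooner via Utterance End
    use_single_utterance_endpointing_for_first_utterance=True,  # helps with a bare "Hello?" on outbound calls
)

transcriber_config = DeepgramTranscriberConfig.from_telephone_input_device(
    endpointing_config=endpointing_config,
)
```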

Endpointing is highly use-case specific: building a realistic experience depends greatly on the person speaking to the AI. Here are a few paradigms we've used to help you along the way, sketched in code after this list:

- Time-based endpointing: This method considers the speaker to be finished when there is a certain duration of silence.
- Punctuation-based endpointing: This method considers the speaker to be finished when there is a certain duration of silence after a punctuation mark.
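
As a rough, library-agnostic illustration of the difference (this is not Vocode's implementation, just the decision rule each paradigm applies to a stream of interim transcripts; `InterimTranscript`, `time_based_is_final`, and `punctuation_based_is_final` are hypothetical names for this sketch):

```python
import time
from dataclasses import dataclass

@dataclass
class InterimTranscript:
    text: str
    received_at: float  # unix timestamp of the last word we heard

def time_based_is_final(latest: InterimTranscript, silence_seconds: float = 1.0) -> bool:
    # Speaker is considered done once no new words have arrived for `silence_seconds`.
    return time.time() - latest.received_at >= silence_seconds

def punctuation_based_is_final(latest: InterimTranscript, silence_seconds: float = 0.4) -> bool:
    # Same silence check, but only after the transcript ends in terminal punctuation,
    # so mid-sentence pauses don't end the turn prematurely.
    ends_with_punctuation = latest.text.rstrip().endswith((".", "!", "?"))
    return ends_with_punctuation and time.time() - latest.received_at >= silence_seconds
```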

In the first example above, `DeepgramEndpointingConfig` is used to configure the Deepgram transcriber's endpointing.