Commit 7401140

[docs sprint] Updates docs for using transcribers (#9)

1 parent cfd6226 · commit 7401140

File tree

1 file changed: +19 -4 lines changed


docs/open-source/using-transcribers.mdx (+19 -4)
@@ -33,7 +33,7 @@ from vocode.streaming.models.transcriber import DeepgramTranscriberConfig, Punct
 server = InboundCallServer(
     ...
     transcriber_config=DeepgramTranscriberConfig.from_telephone_input_device(
-        endpointing_config=PunctuationEndpointingConfig()
+        endpointing_config=DeepgramEndpointingConfig()
     ),
     ...
 )
@@ -56,7 +56,7 @@ async def main():
         output_device=speaker_output,
         transcriber=DeepgramTranscriber(
             DeepgramTranscriberConfig.from_input_device(
-                microphone_input, endpointing_config=PunctuationEndpointingConfig()
+                microphone_input, endpointing_config=DeepgramEndpointingConfig()
             )
         ),
         ...
@@ -70,7 +70,22 @@ The method takes a `microphone_input` object as an argument and extracts the `sa
 
 Endpointing is the process of understanding when someone has finished speaking. The `EndpointingConfig` controls how this is done. There are a couple of different ways to configure endpointing:
 
+We provide `DeepgramEndpointingConfig()`, which has reasonable defaults and knobs to suit most use-cases (but it only works with the Deepgram transcriber).
+
+```
+class DeepgramEndpointingConfig(EndpointingConfig, type="deepgram"):  # type: ignore
+    vad_threshold_ms: int = 500
+    utterance_cutoff_ms: int = 1000
+    time_silent_config: Optional[TimeSilentConfig] = Field(default_factory=TimeSilentConfig)
+    use_single_utterance_endpointing_for_first_utterance: bool = False
+```
+
+- `vad_threshold_ms`: translates to [Deepgram's `endpointing` feature](https://developers.deepgram.com/docs/endpointing#enable-feature)
+- `utterance_cutoff_ms`: uses [Deepgram's Utterance End feature](https://developers.deepgram.com/docs/utterance-end)
+- `time_silent_config`: a Vocode-specific parameter that marks an utterance final if we haven't seen any new words in X seconds
+- `use_single_utterance_endpointing_for_first_utterance`: uses `is_final` instead of `speech_final` to endpoint the first utterance (works well for outbound conversations, where the user's first utterance is something like "Hello?") - see [this doc on Deepgram](https://developers.deepgram.com/docs/understand-endpointing-interim-results) for more info
+
+Endpointing is highly use-case specific - building a realistic experience greatly depends on the person speaking to the AI. Here are a few paradigms that we've used to help you along the way:
+
 - Time-based endpointing: This method considers the speaker to be finished when there is a certain duration of silence.
 - Punctuation-based endpointing: This method considers the speaker to be finished when there is a certain duration of silence after a punctuation mark.
-
-In the first example, the `PunctuationEndpointingConfig` is used to configure the Deepgram transcriber for punctuation-based endpointing.
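A quick usage sketch of the new config (not part of this commit): the import path for `DeepgramEndpointingConfig` below is an assumption, and the tuned values are only illustrative.

```
# Hedged sketch: tuning the DeepgramEndpointingConfig knobs added in this commit.
# Assumption: the import path below; adjust it to wherever DeepgramEndpointingConfig
# lives in your vocode version.
from vocode.streaming.models.transcriber import DeepgramTranscriberConfig
from vocode.streaming.transcriber.deepgram_transcriber import DeepgramEndpointingConfig

transcriber_config = DeepgramTranscriberConfig.from_telephone_input_device(
    endpointing_config=DeepgramEndpointingConfig(
        vad_threshold_ms=300,  # shorter silence window than the 500ms default
        utterance_cutoff_ms=800,  # cut utterances off sooner than the 1000ms default
        use_single_utterance_endpointing_for_first_utterance=True,  # snappier "Hello?" handling
    ),
)
```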

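To make the time-based paradigm in the new docs concrete, here is a minimal, self-contained illustration (a sketch of the idea, not vocode's implementation; all names are hypothetical):

```
import time


class TimeBasedEndpointer:
    """Toy time-based endpointing: an utterance is considered final once no
    new words have arrived for `silence_ms` milliseconds."""

    def __init__(self, silence_ms: int = 500) -> None:
        self.silence_ms = silence_ms
        self._last_word_at = time.monotonic()

    def on_words(self, words: str) -> None:
        # Any new (non-empty) transcript text resets the silence clock.
        if words.strip():
            self._last_word_at = time.monotonic()

    def is_final(self) -> bool:
        elapsed_ms = (time.monotonic() - self._last_word_at) * 1000
        return elapsed_ms >= self.silence_ms
```

A punctuation-based variant would additionally require the most recent transcript to end in sentence-final punctuation before applying a (typically shorter) silence window.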