Commit ab12329

Merge pull request #129 from pipecat-ai/mb/rime-websockets-docs
Update Rime docs for RimeTTSService
2 parents 03e2b44 + 7ac31db commit ab12329

2 files changed (+97, -19 lines)

server/services/supported-services.mdx

+1 -1
@@ -57,7 +57,7 @@ description: "AI services integrated with Pipecat and their setup requirements"
 | [LMNT](/server/services/tts/lmnt) | `pip install pipecat-ai[lmnt]` |
 | [OpenAI](/server/services/tts/openai) | `pip install pipecat-ai[openai]` |
 | [PlayHT](/server/services/tts/playht) | `pip install pipecat-ai[playht]` |
-| [Rime](/server/services/tts/rime) | No install required |
+| [Rime](/server/services/tts/rime) | `pip install pipecat-ai[rime]` |
 | [XTTS](/server/services/tts/xtts) | `pip install pipecat-ai[xtts]` |

 ## Speech-to-Speech

server/services/tts/rime.mdx

+96 -18
@@ -1,17 +1,72 @@
 ---
 title: "Rime"
-description: "Text-to-speech service implementation using Rime AI"
+description: "Text-to-speech service implementations using Rime AI"
 ---

 ## Overview

-`RimeHttpTTSService` provides text-to-speech capabilities using Rime AI's TTS service. It supports streaming audio output and various speech customization options.
+Rime AI's text-to-speech capabilities are available through two service implementations:
+
+- `RimeTTSService`: WebSocket-based implementation with word-level timing and interruption support
+- `RimeHttpTTSService`: HTTP-based implementation for simpler use cases

 <Tip>
 You can obtain a Rime API key by signing up at [Rime](https://rime.ai/signup).
 </Tip>

-## Configuration
+## RimeTTSService (WebSocket Service)
+
+Uses Rime's WebSocket JSON API for real-time speech synthesis with word-level timing information.
+
+### Constructor Parameters
+
+<ParamField path="api_key" type="str" required>
+Rime API key
+</ParamField>
+
+<ParamField path="voice_id" type="str" required>
+Rime voice identifier
+</ParamField>
+
+<ParamField path="url" type="str" default="wss://users-ws.rime.ai/ws2">
+Rime WebSocket API endpoint
+</ParamField>
+
+<ParamField path="model" type="str" default="mistv2">
+Model ID to use for synthesis
+</ParamField>
+
+<ParamField path="sample_rate" type="int" default="None">
+Output audio sample rate in Hz
+</ParamField>
+
+<ParamField path="params" type="InputParams" default="InputParams()">
+Speech generation parameters
+<Expandable title="properties">
+<ParamField path="language" type="Language" default="Language.EN">
+Target language for synthesis
+</ParamField>
+
+<ParamField path="speed_alpha" type="float" default="1.0">
+Speech rate multiplier (1.0 is normal speed)
+</ParamField>
+
+<ParamField path="reduce_latency" type="bool" default="false">
+Trade accuracy for lower latency
+</ParamField>
+
+</Expandable>
+</ParamField>
+
+### Features
+
+- Word-level timing information
+- Support for interruptions
+- Context tracking across multiple messages
+- Real-time audio streaming
+- Proper sentence aggregation
+
+## RimeHttpTTSService (HTTP Service)

 ### Constructor Parameters

@@ -66,6 +121,8 @@ description: "Text-to-speech service implementation using Rime AI"

 ## Output Frames

+Both services generate the following frames:
+
 ### Control Frames

 <ParamField path="TTSStartedFrame" type="Frame">
@@ -79,8 +136,14 @@ description: "Text-to-speech service implementation using Rime AI"
 ### Audio Frames

 <ParamField path="TTSAudioRawFrame" type="Frame">
-Contains generated audio data with: - PCM audio format - Specified sample rate
-- Single channel (mono)
+Contains generated audio data: - PCM audio format - Specified sample rate -
+Single channel (mono)
+</ParamField>
+
+### Text Frames (WebSocket only)
+
+<ParamField path="TTSTextFrame" type="Frame">
+Contains word-level text with timing information
 </ParamField>

 ### Error Frames
@@ -92,10 +155,23 @@ description: "Text-to-speech service implementation using Rime AI"
 ## Usage Example

 ```python
+# WebSocket Service
+from pipecat.services.rime import RimeTTSService
+
+ws_tts = RimeTTSService(
+    api_key="your-rime-api-key",
+    voice_id="cove",
+    model="mistv2",
+    params=RimeTTSService.InputParams(
+        language=Language.EN,
+        speed_alpha=1.0
+    )
+)
+
+# HTTP Service
 from pipecat.services.rime import RimeHttpTTSService

-# Configure service
-tts_service = RimeHttpTTSService(
+http_tts = RimeHttpTTSService(
     api_key="your-rime-api-key",
     voice_id="eva",
     model="mist",
@@ -109,7 +185,7 @@ tts_service = RimeHttpTTSService(
 pipeline = Pipeline([
     ...,
     llm,
-    tts,
+    ws_tts, # or http_tts
     transport.output(),
 ])
 ```
@@ -118,26 +194,28 @@ pipeline = Pipeline([

 ```mermaid
 graph TD
-A[TextFrame] --> B[RimeHttpTTSService]
+A[TextFrame] --> B[RimeTTSService]
 B --> C[TTSStartedFrame]
 B --> D[TTSAudioRawFrame]
+B --> G[TTSTextFrame]
 B --> E[TTSStoppedFrame]
 B --> F[ErrorFrame]
 ```

 ## Metrics Support

-The service collects processing metrics:
+Both services collect processing metrics:

 - Time to First Byte (TTFB)
 - Character usage statistics

-## Notes
+## Service Comparison

-- Supports streaming audio output
-- Configurable speech speed
-- Latency optimization options
-- Bracket-based text processing
-- Thread-safe processing
-- Automatic error handling
-- Chunked audio delivery
+| Feature              | WebSocket | HTTP |
+| -------------------- | --------- | ---- |
+| Word timing          | ✓         | -    |
+| Interruption support | ✓         | -    |
+| Bracket-based pauses | -         | ✓    |
+| Phoneme control      | -         | ✓    |
+| Inline speed control | -         | ✓    |
+| Streaming audio      | ✓         | ✓    |
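
The updated Rime page describes two frames that `RimeTTSService` emits side by side: `TTSAudioRawFrame` (PCM audio, single channel, at the configured sample rate) and `TTSTextFrame` (word-level text with timing information). A minimal sketch of why that pairing helps with interruptions follows: from the number of audio bytes already played, it estimates the playback position and keeps only the words that had started by then. It is not Pipecat code. `WordTiming`, `played_seconds`, and `spoken_words` are hypothetical names, and the 16-bit sample width, start times in seconds from the beginning of the utterance, and the 24 kHz rate in the example are assumptions.

```python
from dataclasses import dataclass

BYTES_PER_SAMPLE = 2  # assumption: 16-bit PCM samples
CHANNELS = 1  # TTSAudioRawFrame audio is single channel (mono)


@dataclass
class WordTiming:
    """Hypothetical record: a word and its start time in seconds from utterance start."""

    word: str
    start: float


def played_seconds(bytes_played: int, sample_rate: int) -> float:
    """Convert a count of raw PCM bytes into playback time in seconds."""
    return bytes_played / (sample_rate * BYTES_PER_SAMPLE * CHANNELS)


def spoken_words(timings: list[WordTiming], bytes_played: int, sample_rate: int) -> list[str]:
    """Return the words whose start time is before the current playback position."""
    cutoff = played_seconds(bytes_played, sample_rate)
    return [t.word for t in timings if t.start < cutoff]


if __name__ == "__main__":
    timings = [
        WordTiming("Hello", 0.0),
        WordTiming("there,", 0.4),
        WordTiming("how", 0.9),
        WordTiming("are", 1.2),
        WordTiming("you?", 1.5),
    ]
    # 24000 Hz * 2 bytes * 1 channel = 48000 bytes per second, so 48000 bytes = 1.0 s
    print(spoken_words(timings, bytes_played=48000, sample_rate=24000))
```

With 48,000 bytes played at 24 kHz, the cutoff is 1.0 s, so the example prints `['Hello', 'there,', 'how']`. The HTTP service has no word-timing column in the comparison table, which is why this kind of bookkeeping is only sketched for the WebSocket variant.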

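The Service Comparison table can be read as a lookup: each feature maps to the services marked as supporting it, and the services usable for a given set of requirements are the intersection of those sets. A small sketch of that reading, with the "✓" cells transcribed as data; the feature keys and the `candidate_services` helper are illustrative only and are not part of Pipecat.

```python
# Features marked as supported in the Service Comparison table.
# The feature keys are hypothetical labels, not Pipecat identifiers.
SUPPORT: dict[str, set[str]] = {
    "word_timing": {"RimeTTSService"},
    "interruption_support": {"RimeTTSService"},
    "bracket_pauses": {"RimeHttpTTSService"},
    "phoneme_control": {"RimeHttpTTSService"},
    "inline_speed_control": {"RimeHttpTTSService"},
    "streaming_audio": {"RimeTTSService", "RimeHttpTTSService"},
}

ALL_SERVICES = {"RimeTTSService", "RimeHttpTTSService"}


def candidate_services(required: set[str]) -> set[str]:
    """Return the names of the Rime services that support every required feature."""
    unknown = required - set(SUPPORT)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    result = set(ALL_SERVICES)
    for feature in required:
        result &= SUPPORT[feature]  # keep only services that also support this feature
    return result


if __name__ == "__main__":
    print(candidate_services({"word_timing", "streaming_audio"}))
```

For example, `candidate_services({"word_timing", "phoneme_control"})` returns an empty set, since no single service in the table offers both.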