Merge pull request #129 from pipecat-ai/mb/rime-websockets-docs

markbackman · web-flow · commit ab12329041fe · 2025-02-12T09:52:02.000-05:00
Update Rime docs for RimeTTSService
diff --git a/server/services/supported-services.mdx b/server/services/supported-services.mdx
@@ -57,7 +57,7 @@ description: "AI services integrated with Pipecat and their setup requirements"
 | [LMNT](/server/services/tts/lmnt)             | `pip install pipecat-ai[lmnt]`       |
 | [OpenAI](/server/services/tts/openai)         | `pip install pipecat-ai[openai]`     |
 | [PlayHT](/server/services/tts/playht)         | `pip install pipecat-ai[playht]`     |
-| [Rime](/server/services/tts/rime)             | No install required                  |
+| [Rime](/server/services/tts/rime)             | `pip install pipecat-ai[rime]`       |
 | [XTTS](/server/services/tts/xtts)             | `pip install pipecat-ai[xtts]`       |
 
 ## Speech-to-Speech
diff --git a/server/services/tts/rime.mdx b/server/services/tts/rime.mdx
@@ -1,17 +1,72 @@
 ---
 title: "Rime"
-description: "Text-to-speech service implementation using Rime AI"
+description: "Text-to-speech service implementations using Rime AI"
 ---
 
 ## Overview
 
-`RimeHttpTTSService` provides text-to-speech capabilities using Rime AI's TTS service. It supports streaming audio output and various speech customization options.
+Rime AI's text-to-speech capabilities are available through two service implementations:
+
+- `RimeTTSService`: WebSocket-based implementation with word-level timing and interruption support
+- `RimeHttpTTSService`: HTTP-based implementation for simpler use cases
 
 <Tip>
   You can obtain a Rime API key by signing up at [Rime](https://rime.ai/signup).
 </Tip>
 
-## Configuration
+## RimeTTSService (WebSocket Service)
+
+Uses Rime's WebSocket JSON API for real-time speech synthesis with word-level timing information.
+
+### Constructor Parameters
+
+<ParamField path="api_key" type="str" required>
+  Rime API key
+</ParamField>
+
+<ParamField path="voice_id" type="str" required>
+  Rime voice identifier
+</ParamField>
+
+<ParamField path="url" type="str" default="wss://users-ws.rime.ai/ws2">
+  Rime WebSocket API endpoint
+</ParamField>
+
+<ParamField path="model" type="str" default="mistv2">
+  Model ID to use for synthesis
+</ParamField>
+
+<ParamField path="sample_rate" type="int" default="None">
+  Output audio sample rate in Hz
+</ParamField>
+
+<ParamField path="params" type="InputParams" default="InputParams()">
+  Speech generation parameters
+  <Expandable title="properties">
+    <ParamField path="language" type="Language" default="Language.EN">
+      Target language for synthesis
+    </ParamField>
+
+    <ParamField path="speed_alpha" type="float" default="1.0">
+      Speech rate multiplier (1.0 is normal speed)
+    </ParamField>
+
+    <ParamField path="reduce_latency" type="bool" default="false">
+      Trade accuracy for lower latency
+    </ParamField>
+
+  </Expandable>
+</ParamField>
+
+### Features
+
+- Word-level timing information
+- Support for interruptions
+- Context tracking across multiple messages
+- Real-time audio streaming
+- Sentence aggregation of incoming text
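Interruption support and context tracking come down to the same job: stopping audio for an utterance the user has talked over. The sketch below shows that general pattern with plain asyncio. It is not the RimeTTSService implementation. `fake_audio_stream` and `InterruptibleSpeaker` are hypothetical names, and tagging chunks with an incrementing context id is an assumed design for discarding stale audio.

```python
import asyncio
from typing import AsyncIterator, List, Optional


async def fake_audio_stream(n_chunks: int, delay: float) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for audio chunks arriving from a TTS WebSocket."""
    for i in range(n_chunks):
        await asyncio.sleep(delay)
        yield bytes([i]) * 320  # dummy PCM payload


class InterruptibleSpeaker:
    """Illustrative only: tags each utterance with a context id and drops
    audio that belongs to a context that has been interrupted."""

    def __init__(self) -> None:
        self.context_id = 0
        self.played: List[int] = []  # context id of every chunk passed downstream
        self._task: Optional[asyncio.Task] = None

    def _on_chunk(self, context_id: int, chunk: bytes) -> None:
        if context_id != self.context_id:
            return  # stale audio from an interrupted utterance
        self.played.append(context_id)

    async def _run(self, context_id: int, n_chunks: int) -> None:
        async for chunk in fake_audio_stream(n_chunks, delay=0.01):
            self._on_chunk(context_id, chunk)

    def speak(self, n_chunks: int) -> asyncio.Task:
        self.context_id += 1  # new utterance, new context
        self._task = asyncio.create_task(self._run(self.context_id, n_chunks))
        return self._task

    async def interrupt(self) -> None:
        self.context_id += 1  # invalidate the in-flight context
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass


async def demo() -> List[int]:
    speaker = InterruptibleSpeaker()
    speaker.speak(n_chunks=10)       # context 1
    await asyncio.sleep(0.035)       # user starts talking mid-utterance
    await speaker.interrupt()        # context id moves to 2
    await speaker.speak(n_chunks=3)  # context 3
    return speaker.played


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Cancelling the task stops further chunks from being read. The context-id check is a second line of defense for chunks delivered after the context changed, which matters when audio is received on a separate loop.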
+
+## RimeHttpTTSService (HTTP Service)
 
 ### Constructor Parameters
 
@@ -66,6 +121,8 @@ description: "Text-to-speech service implementation using Rime AI"
 
 ## Output Frames
 
+Both services generate the following frames:
+
 ### Control Frames
 
 <ParamField path="TTSStartedFrame" type="Frame">
@@ -79,8 +136,14 @@ description: "Text-to-speech service implementation using Rime AI"
 ### Audio Frames
 
 <ParamField path="TTSAudioRawFrame" type="Frame">
-  Contains generated audio data with: - PCM audio format - Specified sample rate
-  - Single channel (mono)
+  Contains generated audio data:
+  - PCM audio format
+  - Specified sample rate
+  - Single channel (mono)
+</ParamField>
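Because the audio is single-channel PCM at a known sample rate, the playback duration of a chunk follows from its byte length: duration = bytes / (sample_rate × bytes_per_sample × channels). The helper below is a hypothetical sketch; its default of 16-bit samples (2 bytes each) is an assumption, since the frame description does not state the bit depth.

```python
def pcm_duration_seconds(
    num_bytes: int, sample_rate: int, sample_width: int = 2, channels: int = 1
) -> float:
    """Duration of raw PCM audio. sample_width=2 assumes 16-bit samples."""
    if num_bytes % (sample_width * channels):
        raise ValueError("byte count is not a whole number of sample frames")
    bytes_per_second = sample_rate * sample_width * channels
    return num_bytes / bytes_per_second


if __name__ == "__main__":
    # One second of 16-bit mono audio at 24 kHz is 48,000 bytes.
    print(pcm_duration_seconds(48_000, 24_000))  # 1.0
```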
+
+### Text Frames (WebSocket only)
+
+<ParamField path="TTSTextFrame" type="Frame">
+  Contains word-level text with timing information
 </ParamField>
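Word-level timing makes it possible to tell which words were actually heard before playback stopped. A minimal sketch, assuming timing is available as `(word, start_seconds)` pairs measured from the start of the utterance's audio; that shape and the helper name `words_spoken_before` are hypothetical, not necessarily how `TTSTextFrame` carries the data.

```python
from bisect import bisect_right
from typing import List, Tuple


def words_spoken_before(
    timed_words: List[Tuple[str, float]], elapsed: float
) -> List[str]:
    """Return the words whose start time is at or before `elapsed` seconds.

    `timed_words` must be sorted by start time.
    """
    starts = [start for _, start in timed_words]
    cutoff = bisect_right(starts, elapsed)  # index of the first word not yet started
    return [word for word, _ in timed_words[:cutoff]]


if __name__ == "__main__":
    timeline = [("Hello", 0.0), ("there,", 0.32), ("how", 0.71), ("are", 0.9), ("you?", 1.05)]
    # Playback was interrupted 0.8 s into the utterance.
    print(words_spoken_before(timeline, 0.8))  # ['Hello', 'there,', 'how']
```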
 
 ### Error Frames
@@ -92,10 +155,23 @@ description: "Text-to-speech service implementation using Rime AI"
 ## Usage Example
 
 ```python
+# WebSocket Service
+from pipecat.services.rime import RimeTTSService
+from pipecat.transcriptions.language import Language
+
+ws_tts = RimeTTSService(
+    api_key="your-rime-api-key",
+    voice_id="cove",
+    model="mistv2",
+    params=RimeTTSService.InputParams(
+        language=Language.EN,
+        speed_alpha=1.0
+    )
+)
+
+# HTTP Service
 from pipecat.services.rime import RimeHttpTTSService
 
-# Configure service
-tts_service = RimeHttpTTSService(
+http_tts = RimeHttpTTSService(
     api_key="your-rime-api-key",
     voice_id="eva",
     model="mist",
@@ -109,7 +185,7 @@ tts_service = RimeHttpTTSService(
 pipeline = Pipeline([
     ...,
     llm,
-    tts,
+    ws_tts,  # or http_tts
     transport.output(),
 ])
 ```
@@ -118,26 +194,28 @@ pipeline = Pipeline([
 
 ```mermaid
 graph TD
-    A[TextFrame] --> B[RimeHttpTTSService]
+    A[TextFrame] --> B[RimeTTSService]
     B --> C[TTSStartedFrame]
     B --> D[TTSAudioRawFrame]
+    B --> G[TTSTextFrame]
     B --> E[TTSStoppedFrame]
     B --> F[ErrorFrame]
 ```
 
 ## Metrics Support
 
-The service collects processing metrics:
+Both services collect processing metrics:
 
 - Time to First Byte (TTFB)
 - Character usage statistics
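TTFB is the delay between submitting text for synthesis and receiving the first audio chunk; character usage is the amount of text submitted. As an illustration only, both could be measured around any async audio stream as below. `TTSMetrics`, `fake_tts`, and `synthesize_with_metrics` are hypothetical and are not Pipecat's metrics API.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import AsyncIterator, Optional


@dataclass
class TTSMetrics:
    ttfb_seconds: Optional[float] = None  # time until the first audio chunk
    characters: int = 0                   # characters submitted for synthesis


async def fake_tts(text: str) -> AsyncIterator[bytes]:
    """Hypothetical TTS stream: some startup latency, then audio chunks."""
    await asyncio.sleep(0.05)
    for _ in range(3):
        yield b"\x00" * 320
        await asyncio.sleep(0.01)


async def synthesize_with_metrics(text: str) -> TTSMetrics:
    metrics = TTSMetrics(characters=len(text))
    start = time.perf_counter()
    async for _chunk in fake_tts(text):
        if metrics.ttfb_seconds is None:
            metrics.ttfb_seconds = time.perf_counter() - start
        # A real service would forward the chunk downstream here.
    return metrics


if __name__ == "__main__":
    m = asyncio.run(synthesize_with_metrics("Hello from Rime!"))
    print(m.characters, m.ttfb_seconds)
```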
 
-## Notes
+## Service Comparison
 
-- Supports streaming audio output
-- Configurable speech speed
-- Latency optimization options
-- Bracket-based text processing
-- Thread-safe processing
-- Automatic error handling
-- Chunked audio delivery
+| Feature              | WebSocket | HTTP |
+| -------------------- | --------- | ---- |
+| Word timing          | ✓         | -    |
+| Interruption support | ✓         | -    |
+| Bracket-based pauses | -         | ✓    |
+| Phoneme control      | -         | ✓    |
+| Inline speed control | -         | ✓    |
+| Streaming audio      | ✓         | ✓    |
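The comparison table can also be read as a selection rule. The hypothetical helper below encodes the table's rows as data and returns which of the two services support every requested feature; the feature keys are invented labels for the table rows.

```python
from typing import Dict, List, Set

# Rows of the comparison table: feature label -> services that support it.
SUPPORT: Dict[str, Set[str]] = {
    "word_timing": {"RimeTTSService"},
    "interruptions": {"RimeTTSService"},
    "bracket_pauses": {"RimeHttpTTSService"},
    "phoneme_control": {"RimeHttpTTSService"},
    "inline_speed": {"RimeHttpTTSService"},
    "streaming_audio": {"RimeTTSService", "RimeHttpTTSService"},
}


def services_supporting(required: List[str]) -> List[str]:
    """Return the services (sorted by name) that support all required features."""
    candidates = {"RimeTTSService", "RimeHttpTTSService"}
    for feature in required:
        candidates &= SUPPORT[feature]  # raises KeyError for an unknown label
    return sorted(candidates)


if __name__ == "__main__":
    print(services_supporting(["word_timing", "streaming_audio"]))  # ['RimeTTSService']
    print(services_supporting(["word_timing", "phoneme_control"]))  # []
```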