add cartesia synthesizer #17

rjheeta · 2024-06-08T14:28:44Z

Add support for Cartesia (https://docs.cartesia.ai/)

macwilk · 2024-06-10T21:03:43Z

vocode/streaming/models/synthesizer.py

+class CartesiaSynthesizerConfig(SynthesizerConfig, type=SynthesizerType.CARTESIA.value):  # type: ignore
+    model_id: str = DEFAULT_CARTESIA_MODEL_ID
+    voice_id: str = DEFAULT_CARTESIA_VOICE_ID
+    output_format: str = DEFAULT_CARTESIA_OUTPUT_FORMAT


We should use the default `audio_encoding` field on SynthesizerConfig to define the audio encoding, and map it to the string the Cartesia API expects, similar to here
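One way to read this suggestion, as a minimal sketch: keep the generic encoding and sampling rate on the config, and derive the provider-specific format string from them in one place. The `AudioEncoding` enum below is a local stand-in, and the resulting strings (`pcm_16000`, `ulaw_8000`) are placeholders for illustration, not confirmed Cartesia values.

```python
from enum import Enum


class AudioEncoding(str, Enum):
    # Local stand-in for the library's encoding enum (assumption).
    LINEAR16 = "linear16"
    MULAW = "mulaw"


# Placeholder prefixes; the real strings must come from the provider docs.
_FORMAT_PREFIX = {
    AudioEncoding.LINEAR16: "pcm",
    AudioEncoding.MULAW: "ulaw",
}


def to_provider_output_format(encoding: AudioEncoding, sampling_rate: int) -> str:
    """Derive a provider format string from the generic config fields."""
    try:
        prefix = _FORMAT_PREFIX[encoding]
    except KeyError:
        raise ValueError(f"Unsupported audio encoding: {encoding!r}") from None
    return f"{prefix}_{sampling_rate}"
```

With this shape the config no longer needs its own `output_format` field, so a caller cannot set an encoding and a format string that disagree.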

macwilk · 2024-06-10T21:06:43Z

vocode/streaming/synthesizer/cartesia_synthesizer.py

+    cartesia_tts = None
+
+    def __init__(
+        self,
+        synthesizer_config: CartesiaSynthesizerConfig,
+    ):
+        super().__init__(synthesizer_config)
+
+        # Lazy import the cartesia module
+        if CartesiaSynthesizer.cartesia_tts is None:
+            from cartesia.tts import AsyncCartesiaTTS
+            CartesiaSynthesizer.cartesia_tts = AsyncCartesiaTTS


Suggested change

-    cartesia_tts = None
-
-    def __init__(
-        self,
-        synthesizer_config: CartesiaSynthesizerConfig,
-    ):
-        super().__init__(synthesizer_config)
-
-        # Lazy import the cartesia module
-        if CartesiaSynthesizer.cartesia_tts is None:
-            from cartesia.tts import AsyncCartesiaTTS
-            CartesiaSynthesizer.cartesia_tts = AsyncCartesiaTTS
+    def __init__(
+        self,
+        synthesizer_config: CartesiaSynthesizerConfig,
+    ):
+        super().__init__(synthesizer_config)
+
+        # Lazy import the cartesia module
+        try:
+            from cartesia.tts import AsyncCartesiaTTS
+        except ImportError as e:
+            raise ImportError(
+                "Missing required dependencies for CartesiaSynthesizer"
+            ) from e
+        self.cartesia_tts = AsyncCartesiaTTS
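The lazy-import pattern in this suggestion can be sketched in isolation as follows. This is an illustration under assumptions, not the code from the PR: `import_optional` and `LazyClient` are hypothetical names, and the standard-library `json` module stands in for the optional package so the example runs anywhere.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, raising a descriptive error if absent."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        raise ImportError(
            f"Missing required dependency '{module_name}'. "
            f"Install the '{extra}' extra to use this feature."
        ) from e


class LazyClient:
    def __init__(self) -> None:
        # The import runs only when an instance is created, so users who
        # never construct this class do not need the package installed.
        self.json = import_optional("json", extra="synthesizers")
```

Raising with `from e` keeps the original traceback attached, so the underlying import failure stays visible alongside the friendlier message.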

macwilk · 2024-06-10T21:11:28Z

vocode/streaming/synthesizer/cartesia_synthesizer.py

+        self.voice_id = synthesizer_config.voice_id
+        self.sampling_rate = synthesizer_config.sampling_rate
+        self.output_format = synthesizer_config.output_format
+        self.client = AsyncCartesiaTTS(api_key=self.api_key)


Suggested change

-        self.client = AsyncCartesiaTTS(api_key=self.api_key)
+        self.client = self.cartesia_tts(api_key=self.api_key)

macwilk · 2024-06-10T21:20:14Z

vocode/streaming/synthesizer/cartesia_synthesizer.py

+    async def create_speech(
+        self,
+        message: BaseMessage,
+        chunk_size: int,
+        is_first_text_chunk: bool = False,
+        is_sole_text_chunk: bool = False,
+    ) -> SynthesisResult:
+        generator = await self.client.generate(
+            transcript=message.text,
+            voice=self.voice_embedding,
+            stream=True,
+            model_id=self.model_id,
+            data_rtype='bytes',
+            output_format=self.output_format
+        )
+
+        sample_rate = self.sampling_rate
+        audio_file = io.BytesIO()
+
+        with wave.open(audio_file, 'wb') as wav_file:
+            wav_file.setnchannels(1)
+            wav_file.setsampwidth(2)
+            wav_file.setframerate(sample_rate)
+            async for chunk in generator:
+                raw_data = chunk['audio']
+                wav_file.writeframes(raw_data)
+        audio_file.seek(0)
+
+        result = self.create_synthesis_result_from_wav(
+            synthesizer_config=self.synthesizer_config,
+            file=audio_file,
+            message=message,
+            chunk_size=chunk_size,
+        )
+
+        return result


Could you convert this to a create_speech_uncached implementation?

Check out our Eleven Labs implementation here as a good example of how to achieve this. It shouldn't require many changes to the existing code to get it up and running!
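The practical difference from the diff above is that audio is handed on as it arrives instead of being buffered into a complete WAV file first. A minimal sketch of that idea, without any of the library's types: `rechunk` and `fake_provider_stream` are hypothetical names, and the real `create_speech_uncached` signature is not shown in this thread.

```python
import asyncio
from typing import AsyncIterator, List


async def rechunk(source: AsyncIterator[bytes], chunk_size: int) -> AsyncIterator[bytes]:
    """Yield fixed-size chunks as soon as enough bytes have arrived."""
    buffer = bytearray()
    async for piece in source:
        buffer.extend(piece)
        while len(buffer) >= chunk_size:
            yield bytes(buffer[:chunk_size])
            del buffer[:chunk_size]
    if buffer:
        # Flush the remainder, which may be shorter than chunk_size.
        yield bytes(buffer)


async def fake_provider_stream() -> AsyncIterator[bytes]:
    # Stand-in for the provider's streaming response (hypothetical).
    for piece in (b"abc", b"defgh", b"ij"):
        yield piece


async def collect(chunk_size: int) -> List[bytes]:
    return [chunk async for chunk in rechunk(fake_provider_stream(), chunk_size)]
```

Calling `asyncio.run(collect(4))` drains the stand-in stream. In a real synthesizer the consumer would play each chunk as it is yielded rather than collecting them into a list.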

rjheeta · 2024-06-11T10:42:38Z

vocode/streaming/synthesizer/cartesia_synthesizer.py

+        self.api_key = getenv("CARTESIA_API_KEY")
+        self.model_id = synthesizer_config.model_id
+        self.voice_id = synthesizer_config.voice_id
+        self.sampling_rate = synthesizer_config.sampling_rate


@macwilk I just realized this will have unintended consequences (specifically, slowed-down-sounding audio) if we initialize the Synthesizer with from_telephone_output_device(), because it will try to pass DEFAULT_SAMPLING_RATE (8000).
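The slowdown can be estimated with simple arithmetic, as a rough sketch: if audio is generated at one rate but played back as if it were recorded at a lower one, its duration stretches by the ratio of the two rates. The 44100 Hz generation rate below is an assumed example value; only the 8000 Hz figure comes from this thread.

```python
def playback_seconds(num_samples: int, playback_rate_hz: int) -> float:
    """Duration of a mono clip when played at the given sample rate."""
    return num_samples / playback_rate_hz


def slowdown_factor(generated_rate_hz: int, playback_rate_hz: int) -> float:
    """How many times longer the clip lasts when the rates disagree."""
    return generated_rate_hz / playback_rate_hz


# One second of audio generated at an assumed 44100 Hz ...
samples = 44100
# ... treated as 8000 Hz telephone audio on playback.
stretched = playback_seconds(samples, 8000)
factor = slowdown_factor(44100, 8000)
```

Under these assumed rates, one second of speech would last about 5.5 seconds, which matches the slowed-down symptom described above.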

ajar98

pretty simple fixes that you can auto apply from the UI, then you should be g2g!

ajay approved

rjheeta · 2024-06-12T20:13:59Z

@ajar98 the test/build failed with this notice.

Run python -m poetry install -E all
Creating virtualenv vocode-zqFtuvVc-py3.10 in /home/runner/.cache/pypoetry/virtualenvs
Installing dependencies from lock file
Warning: poetry.lock is not consistent with pyproject.toml. You may be getting improper dependencies. Run `poetry lock [--no-update]` to fix it.

So I just ran poetry lock --no-update and checked it in

* Update Readme with Preview Info (#1)
* Update Readme with Preview Info
* We're not quite that far along
* Update Structure to be more Pleasing to the Eyes
* Add changelog to readme
---------
Co-authored-by: srhinos <[email protected]>
Co-authored-by: Adnaan Sachidanandan <[email protected]>
* The Big Diff (#2)
* The Big Diff
* remove tests on 3.8 and 3.9
* Update README.md
* Update README.md
* fix turn based quickstart (#3)
* [hotfix] remove unused import (#4)
* Update README.md
* Update README.md
* Remove create_speech() from rime synthesizer (#6)
* Fix default factory for elevenlabs WS (#12)
* dispatch into elvenlabsws if experimental_websocket is on
* fix mypy
* Merge In Recent Fixes (#14)
* [docs sprint] Updates docs for using transcribers (#9)
* [docs sprint] phrase trigger documentation (#16)
* [docs sprint] update open source quickstarts (#15)
* [docs sprint] Add Documentation on Using Vocode's Loguru Implementation (#19)
* [docs sprint] Add Documentation on Using Vocode's Loguru Implementation
* Remove Tracing
---------
Co-authored-by: srhinos <[email protected]>
* [docs sprint] Updates docs for using synthesizers (#8)
* [docs sprint] using synthesizers docs update
* update docs for elevenlabs ws
* Apply suggestions from code review
Co-authored-by: Adnaan Sachidanandan <[email protected]>
---------
Co-authored-by: Adnaan Sachidanandan <[email protected]>
* [docs sprint] Updates docs for react quickstart (#10)
* [docs sprint] Updates docs for react quickstart
* PR feedback
* changes azure to override create_speech_uncached (#21)
* [docs sprint] Adds docs for conversation mechanics and moves endpointing docs from transcribers (#11)
* [docs sprint] Updates docs for using transcribers
* Adds docs for conversation mechanics and moves endpointing docs from transcribers
* Update docs/open-source/conversation-mechanics.md
Co-authored-by: Adnaan Sachidanandan <[email protected]>
* use mdx
* PR feedback
---------
Co-authored-by: Adnaan Sachidanandan <[email protected]>
* updates docs for events manager (#7)
* add cartesia synthesizer (#17)
* add cartesia synthesizer
* make Cartesia dependency optional, add it to the synthesizers extra group
* lazy import cartesia
* improved lazy loading, and added api_key as a config parameter
* improvements to cartesia synth
* use create_speech_uncached
* use existing abstractions default encoding and sample rates
* Remove redundant api_key assignment
Co-authored-by: Ajay Raj <[email protected]>
* Remove default setting of sampling rate
Co-authored-by: Ajay Raj <[email protected]>
* Remove default setting of audio_encoding
Co-authored-by: Ajay Raj <[email protected]>
* remove default setting of sampling rate
Co-authored-by: Ajay Raj <[email protected]>
* Remove redundant setting of audio enconding the output device handles this
Co-authored-by: Ajay Raj <[email protected]>
* build failed with poetry.lock file. re-updating it
---------
Co-authored-by: Ajay Raj <[email protected]>
* Unset docs / README changes
* Unset docs changes (cont.)
* unset poetry version change
* update poetry.lock
---------
Co-authored-by: Mac Wilkinson <[email protected]>
Co-authored-by: srhinos <[email protected]>
Co-authored-by: Adnaan Sachidanandan <[email protected]>
Co-authored-by: rjheeta <[email protected]>

rjheeta added 3 commits June 8, 2024 10:22

add cartesia synthesizer

5dc4a8b

make Cartesia dependency optional, add it to the synthesizers extra group

fd15758

lazy import cartesia

336171e

macwilk previously requested changes Jun 10, 2024

improved lazy loading, and added api_key as a config parameter

3118163

rjheeta commented Jun 11, 2024

improvements to cartesia synth

d78af56

* use create_speech_uncached
* use existing abstractions default encoding and sample rates

ajar98 requested changes Jun 12, 2024

rjheeta and others added 5 commits June 12, 2024 15:13

Remove redundant api_key assignment

13ca11f

Co-authored-by: Ajay Raj <[email protected]>

Remove default setting of sampling rate

c2eaa4c

Co-authored-by: Ajay Raj <[email protected]>

Remove default setting of audio_encoding

a90a4cf

Co-authored-by: Ajay Raj <[email protected]>

remove default setting of sampling rate

731b112

Co-authored-by: Ajay Raj <[email protected]>

Remove redundant setting of audio enconding

12f3167

the output device handles this

Co-authored-by: Ajay Raj <[email protected]>

ajar98 previously approved these changes Jun 12, 2024

build failed with poetry.lock file. re-updating it

2fe3809

rjheeta dismissed ajar98’s stale review via 2fe3809 June 12, 2024 20:12

ajar98 approved these changes Jun 12, 2024

ajar98 merged commit 5dc841a into ajar98:main Jun 12, 2024
2 checks passed

ajar98 mentioned this pull request Jun 12, 2024

feat: add cartesia synthesizer #24

Closed
