add cartesia synthesizer #17

Merged
merged 11 commits into from
Jun 12, 2024
Conversation

@rjheeta rjheeta commented Jun 8, 2024

Add support for Cartesia (https://docs.cartesia.ai/)

class CartesiaSynthesizerConfig(SynthesizerConfig, type=SynthesizerType.CARTESIA.value):  # type: ignore
    model_id: str = DEFAULT_CARTESIA_MODEL_ID
    voice_id: str = DEFAULT_CARTESIA_VOICE_ID
    output_format: str = DEFAULT_CARTESIA_OUTPUT_FORMAT
Collaborator

We should use the SynthesizerConfig default audio_encoding field to define audio encodings, and set it to the string the Cartesia API expects, similar to here
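
A minimal sketch of the mapping this comment asks for, assuming a shared encoding enum on the config. The enum below is a local stand-in, and the provider format strings are hypothetical placeholders rather than values confirmed by the Cartesia docs:

```python
from enum import Enum


class AudioEncoding(Enum):
    # Stand-in for the library's encoding enum; member names are assumptions.
    LINEAR16 = "linear16"
    MULAW = "mulaw"


# Hypothetical provider-side format names; check the provider docs for the real ones.
PROVIDER_FORMATS = {
    AudioEncoding.LINEAR16: "pcm_s16le",
    AudioEncoding.MULAW: "pcm_mulaw",
}


def provider_output_format(encoding: AudioEncoding) -> str:
    """Translate the shared config encoding into the provider's format string."""
    try:
        return PROVIDER_FORMATS[encoding]
    except KeyError:
        raise ValueError(f"Unsupported audio encoding: {encoding!r}") from None
```

Keeping the translation in one place means the config only carries the shared audio_encoding value, and a provider-specific output_format field is no longer needed.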

Comment on lines 11 to 22
    cartesia_tts = None

    def __init__(
        self,
        synthesizer_config: CartesiaSynthesizerConfig,
    ):
        super().__init__(synthesizer_config)

        # Lazy import the cartesia module
        if CartesiaSynthesizer.cartesia_tts is None:
            from cartesia.tts import AsyncCartesiaTTS
            CartesiaSynthesizer.cartesia_tts = AsyncCartesiaTTS
Collaborator

Suggested change
-    cartesia_tts = None
-    def __init__(
-        self,
-        synthesizer_config: CartesiaSynthesizerConfig,
-    ):
-        super().__init__(synthesizer_config)
-        # Lazy import the cartesia module
-        if CartesiaSynthesizer.cartesia_tts is None:
-            from cartesia.tts import AsyncCartesiaTTS
-            CartesiaSynthesizer.cartesia_tts = AsyncCartesiaTTS
+    def __init__(
+        self,
+        synthesizer_config: CartesiaSynthesizerConfig,
+    ):
+        super().__init__(synthesizer_config)
+        # Lazy import the cartesia module
+        try:
+            from cartesia.tts import AsyncCartesiaTTS
+        except ImportError as e:
+            raise ImportError(
+                "Missing required dependencies for CartesiaSynthesizer"
+            ) from e
+        self.cartesia_tts = AsyncCartesiaTTS
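
The lazy-import pattern in this suggestion can be sketched in a generic, self-contained form. The helper name `lazy_import` and its `feature` parameter are hypothetical and not part of this PR; the sketch only illustrates deferring an optional import and chaining the original error:

```python
import importlib
from types import ModuleType


def lazy_import(module_name: str, feature: str) -> ModuleType:
    """Import an optional dependency on first use, with a readable error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        # Chain the original error so the real cause stays visible in the traceback.
        raise ImportError(
            f"Missing required dependencies for {feature}: could not import {module_name!r}"
        ) from e
```

Importing inside `__init__` keeps the package importable for users who never installed the optional extra, and the failure only surfaces when the synthesizer is actually constructed.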

        self.voice_id = synthesizer_config.voice_id
        self.sampling_rate = synthesizer_config.sampling_rate
        self.output_format = synthesizer_config.output_format
        self.client = AsyncCartesiaTTS(api_key=self.api_key)
Collaborator

Suggested change
-        self.client = AsyncCartesiaTTS(api_key=self.api_key)
+        self.client = self.cartesia_tts(api_key=self.api_key)

Comment on lines 32 to 67
    async def create_speech(
        self,
        message: BaseMessage,
        chunk_size: int,
        is_first_text_chunk: bool = False,
        is_sole_text_chunk: bool = False,
    ) -> SynthesisResult:
        generator = await self.client.generate(
            transcript=message.text,
            voice=self.voice_embedding,
            stream=True,
            model_id=self.model_id,
            data_rtype='bytes',
            output_format=self.output_format
        )

        sample_rate = self.sampling_rate
        audio_file = io.BytesIO()

        with wave.open(audio_file, 'wb') as wav_file:
            wav_file.setnchannels(1)
            wav_file.setsampwidth(2)
            wav_file.setframerate(sample_rate)
            async for chunk in generator:
                raw_data = chunk['audio']
                wav_file.writeframes(raw_data)
        audio_file.seek(0)

        result = self.create_synthesis_result_from_wav(
            synthesizer_config=self.synthesizer_config,
            file=audio_file,
            message=message,
            chunk_size=chunk_size,
        )

        return result
Collaborator

Could you convert this to a create_speech_uncached implementation?

Check out our Eleven Labs implementation here as a good example of how to achieve this; it shouldn't require many changes to the existing code to get it up and running!
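
As a rough illustration of why a streaming implementation helps, the sketch below yields fixed-size chunks as soon as enough bytes arrive instead of buffering the whole WAV first. It is not the library's create_speech_uncached signature; `rechunk`, `fake_provider_stream`, and `collect` are hypothetical names used only for this example:

```python
import asyncio
from typing import AsyncIterator


async def rechunk(source: AsyncIterator[bytes], chunk_size: int) -> AsyncIterator[bytes]:
    """Yield fixed-size chunks as soon as enough bytes have arrived."""
    buffer = bytearray()
    async for piece in source:
        buffer.extend(piece)
        while len(buffer) >= chunk_size:
            yield bytes(buffer[:chunk_size])
            del buffer[:chunk_size]
    if buffer:
        # Flush the final, possibly shorter, chunk.
        yield bytes(buffer)


async def fake_provider_stream() -> AsyncIterator[bytes]:
    # Hypothetical stand-in for a provider's streaming response.
    for piece in (b"abc", b"defgh", b"ij"):
        yield piece


async def collect(chunk_size: int) -> list[bytes]:
    return [chunk async for chunk in rechunk(fake_provider_stream(), chunk_size)]


if __name__ == "__main__":
    print(asyncio.run(collect(4)))
```

With this shape, playback can begin after the first chunk rather than after the provider has finished generating the full utterance.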

        self.api_key = getenv("CARTESIA_API_KEY")
        self.model_id = synthesizer_config.model_id
        self.voice_id = synthesizer_config.voice_id
        self.sampling_rate = synthesizer_config.sampling_rate
Author

@macwilk I just realized this will have an unintended consequence (specifically, slowed-down audio) if the Synthesizer is initialized with from_telephone_output_device(), because it will then try to pass DEFAULT_SAMPLING_RATE (8000).
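
One plausible reading of the slowdown: audio generated at a higher rate gets wrapped in a WAV header that declares 8000 Hz, so the player stretches it. The sketch below shows the arithmetic with the standard-library wave module; the 44100 Hz produced rate is an assumption for illustration, not a value taken from this PR:

```python
import io
import wave

PRODUCED_RATE = 44_100  # assumed rate the provider actually generated audio at
DECLARED_RATE = 8_000   # rate written into the WAV header (telephone default)


def wav_duration_seconds(pcm: bytes, declared_rate: int) -> float:
    """Wrap raw 16-bit mono PCM in a WAV header and report the playback length."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(declared_rate)
        wav_file.writeframes(pcm)
    buffer.seek(0)
    with wave.open(buffer, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


one_second_of_audio = b"\x00\x00" * PRODUCED_RATE  # 1 s of silence at 44.1 kHz
correct = wav_duration_seconds(one_second_of_audio, PRODUCED_RATE)
mislabeled = wav_duration_seconds(one_second_of_audio, DECLARED_RATE)
```

Under these assumptions the same samples play for 1.0 s when labeled correctly and about 5.5 s when labeled as 8000 Hz, which matches the slowed-down symptom described above.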

* use create_speech_uncached
* use existing abstractions default encoding and sample rates
Owner

@ajar98 ajar98 left a comment

pretty simple fixes that you can auto apply from the UI, then you should be g2g!

ajar98 previously approved these changes Jun 12, 2024
@ajar98 ajar98 dismissed macwilk’s stale review June 12, 2024 20:07

ajay approved

@rjheeta
Author

rjheeta commented Jun 12, 2024

@ajar98 the test/build failed with this notice.

Run python -m poetry install -E all
Creating virtualenv vocode-zqFtuvVc-py3.10 in /home/runner/.cache/pypoetry/virtualenvs
Installing dependencies from lock file
Warning: poetry.lock is not consistent with pyproject.toml. You may be getting improper dependencies. Run `poetry lock [--no-update]` to fix it.

So I just ran poetry lock --no-update and checked it in

@ajar98 ajar98 merged commit 5dc841a into ajar98:main Jun 12, 2024
2 checks passed
ajar98 added a commit that referenced this pull request Jun 14, 2024
* Update Readme with Preview Info (#1)

* Update Readme with Preview Info

* We're not quite that far along

* Update Structure to be more Pleasing to the Eyes

* Add changelog to readme

---------

Co-authored-by: srhinos <[email protected]>
Co-authored-by: Adnaan Sachidanandan <[email protected]>

* The Big Diff (#2)

* The Big Diff

* remove tests on 3.8 and 3.9

* Update README.md

* Update README.md

* fix turn based quickstart (#3)

* [hotfix] remove unused import (#4)

* Update README.md

* Update README.md

* Remove create_speech() from rime synthesizer (#6)

* Fix default factory for elevenlabs WS (#12)

* dispatch into elvenlabsws if experimental_websocket is on

* fix mypy

* Merge In Recent Fixes (#14)

* [docs sprint] Updates docs for using transcribers (#9)

* [docs sprint] phrase trigger documentation (#16)

* [docs sprint] update open source quickstarts (#15)

* [docs sprint] Add Documentation on Using Vocode's Loguru Implementation (#19)

* [docs sprint] Add Documentation on Using Vocode's Loguru Implementation

* Remove Tracing

---------

Co-authored-by: srhinos <[email protected]>

* [docs sprint] Updates docs for using synthesizers (#8)

* [docs sprint] using synthesizers docs update

* update docs for elevenlabs ws

* Apply suggestions from code review

Co-authored-by: Adnaan Sachidanandan <[email protected]>

---------

Co-authored-by: Adnaan Sachidanandan <[email protected]>

* [docs sprint] Updates docs for react quickstart (#10)

* [docs sprint] Updates docs for react quickstart

* PR feedback

* changes azure to override create_speech_uncached (#21)

* [docs sprint] Adds docs for conversation mechanics and moves endpointing docs from transcribers (#11)

* [docs sprint] Updates docs for using transcribers

* Adds docs for conversation mechanics and moves endpointing docs from transcribers

* Update docs/open-source/conversation-mechanics.md

Co-authored-by: Adnaan Sachidanandan <[email protected]>

* use mdx

* PR feedback

---------

Co-authored-by: Adnaan Sachidanandan <[email protected]>

* updates docs for events manager (#7)

* add cartesia synthesizer (#17)

* add cartesia synthesizer

* make Cartesia dependency optional, add it to the synthesizers extra group

* lazy import cartesia

* improved lazy loading, and added api_key as a config parameter

* improvements to cartesia synth
* use create_speech_uncached
* use existing abstractions default encoding and sample rates

* Remove redundant api_key assignment

Co-authored-by: Ajay Raj <[email protected]>

* Remove default setting of sampling rate

Co-authored-by: Ajay Raj <[email protected]>

* Remove default setting of audio_encoding

Co-authored-by: Ajay Raj <[email protected]>

* remove default setting of sampling rate

Co-authored-by: Ajay Raj <[email protected]>

* Remove redundant setting of audio encoding

the output device handles this

Co-authored-by: Ajay Raj <[email protected]>

* build failed with poetry.lock file. re-updating it

---------

Co-authored-by: Ajay Raj <[email protected]>

* Unset docs / README changes

* Unset docs changes (cont.)

* unset poetry version change

* update poetry.lock

---------

Co-authored-by: Mac Wilkinson <[email protected]>
Co-authored-by: srhinos <[email protected]>
Co-authored-by: Adnaan Sachidanandan <[email protected]>
Co-authored-by: rjheeta <[email protected]>
3 participants