add cartesia synthesizer #17
class CartesiaSynthesizerConfig(SynthesizerConfig, type=SynthesizerType.CARTESIA.value):  # type: ignore
    model_id: str = DEFAULT_CARTESIA_MODEL_ID
    voice_id: str = DEFAULT_CARTESIA_VOICE_ID
    output_format: str = DEFAULT_CARTESIA_OUTPUT_FORMAT
We should use the SynthesizerConfig default of audio_encoding to define audio encodings, and map it to the string expected by the Cartesia API, similar to here.
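A minimal sketch of the mapping the reviewer describes, assuming vocode-style `AudioEncoding` values (`linear16`, `mulaw`) and Cartesia format strings of the form `pcm_<rate>` and `mulaw_8000`. The enum below is a stand-in for the real one, and the supported-rate set is an assumption; check both against the actual `SynthesizerConfig` and the Cartesia docs before relying on them.

```python
from enum import Enum


class AudioEncoding(str, Enum):
    # Stand-in for vocode's AudioEncoding enum (values assumed)
    LINEAR16 = "linear16"
    MULAW = "mulaw"


# Assumed set of PCM sample rates Cartesia accepts; verify against the API docs.
_PCM_RATES = {16000, 22050, 44100}


def cartesia_output_format(audio_encoding: AudioEncoding, sampling_rate: int) -> str:
    """Map a SynthesizerConfig-style (encoding, rate) pair to an assumed Cartesia format string."""
    if audio_encoding == AudioEncoding.LINEAR16:
        if sampling_rate not in _PCM_RATES:
            raise ValueError(f"Unsupported PCM sampling rate: {sampling_rate}")
        return f"pcm_{sampling_rate}"
    if audio_encoding == AudioEncoding.MULAW:
        # Assumption: mu-law output is only offered at the telephony rate.
        if sampling_rate != 8000:
            raise ValueError("mu-law output is assumed to be 8 kHz only")
        return "mulaw_8000"
    raise ValueError(f"Unsupported audio encoding: {audio_encoding}")
```

Deriving the string from the shared config fields, rather than from a separate output_format default, keeps the encoding and rate the output device asks for in one place.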
    cartesia_tts = None

    def __init__(
        self,
        synthesizer_config: CartesiaSynthesizerConfig,
    ):
        super().__init__(synthesizer_config)

        # Lazy import the cartesia module
        if CartesiaSynthesizer.cartesia_tts is None:
            from cartesia.tts import AsyncCartesiaTTS
            CartesiaSynthesizer.cartesia_tts = AsyncCartesiaTTS
Suggested change, replacing:

    cartesia_tts = None

    def __init__(
        self,
        synthesizer_config: CartesiaSynthesizerConfig,
    ):
        super().__init__(synthesizer_config)

        # Lazy import the cartesia module
        if CartesiaSynthesizer.cartesia_tts is None:
            from cartesia.tts import AsyncCartesiaTTS
            CartesiaSynthesizer.cartesia_tts = AsyncCartesiaTTS

with:

    def __init__(
        self,
        synthesizer_config: CartesiaSynthesizerConfig,
    ):
        super().__init__(synthesizer_config)

        # Lazy import the cartesia module
        try:
            from cartesia.tts import AsyncCartesiaTTS
        except ImportError as e:
            raise ImportError(
                "Missing required dependencies for CartesiaSynthesizer"
            ) from e
        self.cartesia_tts = AsyncCartesiaTTS
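The suggested try/except ImportError pattern can be factored into a small helper. `require_optional` below is hypothetical (not part of vocode); it uses `importlib.import_module` and points users at the `synthesizers` extra that this PR adds the Cartesia dependency to.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, feature: str, extra: str) -> ModuleType:
    """Import an optional dependency, failing with an actionable message.

    Hypothetical helper: chains the original ImportError so the real cause
    (missing package, broken install, etc.) stays visible in the traceback.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        raise ImportError(
            f"Missing required dependencies for {feature}; "
            f"install them with the '{extra}' extra"
        ) from e
```

Usage inside the constructor might look like `self.cartesia_tts = require_optional("cartesia.tts", "CartesiaSynthesizer", "synthesizers").AsyncCartesiaTTS`, so the import cost is only paid when a Cartesia synthesizer is actually built.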
        self.voice_id = synthesizer_config.voice_id
        self.sampling_rate = synthesizer_config.sampling_rate
        self.output_format = synthesizer_config.output_format
        self.client = AsyncCartesiaTTS(api_key=self.api_key)
Suggested change, replacing:

        self.client = AsyncCartesiaTTS(api_key=self.api_key)

with:

        self.client = self.cartesia_tts(api_key=self.api_key)
    async def create_speech(
        self,
        message: BaseMessage,
        chunk_size: int,
        is_first_text_chunk: bool = False,
        is_sole_text_chunk: bool = False,
    ) -> SynthesisResult:
        generator = await self.client.generate(
            transcript=message.text,
            voice=self.voice_embedding,
            stream=True,
            model_id=self.model_id,
            data_rtype='bytes',
            output_format=self.output_format,
        )

        sample_rate = self.sampling_rate
        audio_file = io.BytesIO()

        with wave.open(audio_file, 'wb') as wav_file:
            wav_file.setnchannels(1)
            wav_file.setsampwidth(2)
            wav_file.setframerate(sample_rate)
            async for chunk in generator:
                raw_data = chunk['audio']
                wav_file.writeframes(raw_data)
        audio_file.seek(0)

        result = self.create_synthesis_result_from_wav(
            synthesizer_config=self.synthesizer_config,
            file=audio_file,
            message=message,
            chunk_size=chunk_size,
        )

        return result
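The WAV-wrapping step in create_speech can be exercised without the Cartesia SDK. This sketch feeds it a hypothetical stand-in generator (`fake_cartesia_stream`) that yields dicts with an `"audio"` key, as the PR code expects, and assumes the bytes are 16-bit mono PCM, matching `setsampwidth(2)` and `setnchannels(1)`.

```python
import asyncio
import io
import wave
from typing import AsyncIterator, Dict


async def fake_cartesia_stream(n_chunks: int, frames_per_chunk: int) -> AsyncIterator[Dict[str, bytes]]:
    # Hypothetical stand-in for the streamed SDK generator: each chunk
    # carries 16-bit mono PCM silence under the "audio" key.
    for _ in range(n_chunks):
        yield {"audio": b"\x00\x00" * frames_per_chunk}


async def pcm_stream_to_wav(stream: AsyncIterator[Dict[str, bytes]], sample_rate: int) -> io.BytesIO:
    """Collect raw PCM chunks into an in-memory WAV file."""
    audio_file = io.BytesIO()
    # wave does not close a file object it did not open, so the buffer survives the block.
    with wave.open(audio_file, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 16-bit samples
        wav_file.setframerate(sample_rate)
        async for chunk in stream:
            wav_file.writeframes(chunk["audio"])
    audio_file.seek(0)  # rewind so the caller reads from the RIFF header
    return audio_file


if __name__ == "__main__":
    buf = asyncio.run(pcm_stream_to_wav(fake_cartesia_stream(3, 100), 16000))
    with wave.open(buf, "rb") as reader:
        print(reader.getnframes(), reader.getframerate())
```

Note that the header's frame rate is whatever the caller passes in; if it differs from the rate the provider actually produced, the file still parses but plays back at the wrong speed.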
Could you convert this to a create_speech_uncached implementation? Check out our ElevenLabs implementation here for a good example of how to do this; it shouldn't require many changes to the existing code to get it up and running!
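The request to implement create_speech_uncached suggests the base synthesizer wraps that method with a cache. The sketch below illustrates that split with hypothetical stand-in classes (`BaseSynthesizerSketch`, `EchoSynthesizerSketch`); it is not vocode's actual base class, and the real cache key, storage, and return types are assumptions here.

```python
import asyncio
from typing import Dict, Tuple


class BaseSynthesizerSketch:
    """Hypothetical base class: create_speech consults a cache and only
    falls back to create_speech_uncached on a miss."""

    def __init__(self) -> None:
        self._cache: Dict[Tuple[str, int], bytes] = {}

    async def create_speech(self, text: str, chunk_size: int) -> bytes:
        key = (text, chunk_size)
        if key not in self._cache:
            self._cache[key] = await self.create_speech_uncached(text, chunk_size)
        return self._cache[key]

    async def create_speech_uncached(self, text: str, chunk_size: int) -> bytes:
        raise NotImplementedError


class EchoSynthesizerSketch(BaseSynthesizerSketch):
    """Hypothetical provider subclass: only the uncached path is implemented."""

    def __init__(self) -> None:
        super().__init__()
        self.api_calls = 0

    async def create_speech_uncached(self, text: str, chunk_size: int) -> bytes:
        self.api_calls += 1  # stands in for a network call to the TTS provider
        return text.encode("utf-8")


async def demo() -> int:
    synth = EchoSynthesizerSketch()
    await synth.create_speech("hello", 1024)
    await synth.create_speech("hello", 1024)
    return synth.api_calls


if __name__ == "__main__":
    print(asyncio.run(demo()))  # prints 1: the second call hits the cache
```

The appeal of this design is that each provider (ElevenLabs, Cartesia, and so on) only writes the network-facing method, while caching behaviour lives in one shared place.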
        self.api_key = getenv("CARTESIA_API_KEY")
        self.model_id = synthesizer_config.model_id
        self.voice_id = synthesizer_config.voice_id
        self.sampling_rate = synthesizer_config.sampling_rate
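One commit in this PR adds `api_key` as a config parameter, while this snippet reads `CARTESIA_API_KEY` from the environment. A sketch of combining the two, under the assumption that an explicit config value should take precedence; `resolve_api_key` is a hypothetical helper, not part of vocode.

```python
import os
from typing import Optional


def resolve_api_key(config_api_key: Optional[str], env_var: str = "CARTESIA_API_KEY") -> str:
    """Prefer an explicit config value, then fall back to the environment.

    Hypothetical helper: raises early instead of letting the client fail
    later with an opaque authentication error.
    """
    api_key = config_api_key or os.getenv(env_var)
    if not api_key:
        raise ValueError(f"No API key in the synthesizer config or ${env_var}")
    return api_key
```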
@macwilk I just realized this will have unintended consequences (specifically, slowed-down-sounding audio) if the synthesizer was initialized with from_telephone_output_device(), because it will try to pass DEFAULT_SAMPLING_RATE (8000).
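Why a rate mismatch sounds slowed down: a WAV player consumes samples at the rate written in the header, not the rate at which they were generated. Assuming Cartesia returns samples at a higher rate than the 8000 Hz header (16 kHz is used below purely as an illustrative figure), speech is stretched and pitched down by the ratio of the two rates.

```python
def playback_seconds(num_samples: int, header_rate: int) -> float:
    """How long a player takes to play num_samples at the declared rate."""
    return num_samples / header_rate


def slowdown_factor(generated_rate: int, header_rate: int) -> float:
    """Values above 1.0 mean playback is slower (and lower-pitched) than intended."""
    return generated_rate / header_rate


# One second of speech generated at an assumed 16 kHz...
samples = 16000 * 1
# ...but labelled with the telephone default of 8 kHz.
print(playback_seconds(samples, 8000))  # 2.0
print(slowdown_factor(16000, 8000))     # 2.0
```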
* use create_speech_uncached
* use existing abstractions default encoding and sample rates
Pretty simple fixes that you can auto-apply from the UI; then you should be good to go!
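Remove redundant api_key assignment
Co-authored-by: Ajay Raj <[email protected]>
Remove default setting of sampling rate
Co-authored-by: Ajay Raj <[email protected]>
Remove default setting of audio_encoding
Co-authored-by: Ajay Raj <[email protected]>
Remove default setting of sampling rate
Co-authored-by: Ajay Raj <[email protected]>
Remove redundant setting of audio encoding
the output device handles this
Co-authored-by: Ajay Raj <[email protected]>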
@ajar98 the test/build failed with this notice.
So I just ran
* Update Readme with Preview Info (#1)
* Update Readme with Preview Info
* We're not quite that far along
* Update Structure to be more Pleasing to the Eyes
* Add changelog to readme

---------

Co-authored-by: srhinos <[email protected]>
Co-authored-by: Adnaan Sachidanandan <[email protected]>
* The Big Diff (#2)
* The Big Diff
* remove tests on 3.8 and 3.9
* Update README.md
* Update README.md
* fix turn based quickstart (#3)
* [hotfix] remove unused import (#4)
* Update README.md
* Update README.md
* Remove create_speech() from rime synthesizer (#6)
* Fix default factory for elevenlabs WS (#12)
* dispatch into elevenlabsws if experimental_websocket is on
* fix mypy
* Merge In Recent Fixes (#14)
* [docs sprint] Updates docs for using transcribers (#9)
* [docs sprint] phrase trigger documentation (#16)
* [docs sprint] update open source quickstarts (#15)
* [docs sprint] Add Documentation on Using Vocode's Loguru Implementation (#19)
* [docs sprint] Add Documentation on Using Vocode's Loguru Implementation
* Remove Tracing

---------

Co-authored-by: srhinos <[email protected]>
* [docs sprint] Updates docs for using synthesizers (#8)
* [docs sprint] using synthesizers docs update
* update docs for elevenlabs ws
* Apply suggestions from code review
  Co-authored-by: Adnaan Sachidanandan <[email protected]>

---------

Co-authored-by: Adnaan Sachidanandan <[email protected]>
* [docs sprint] Updates docs for react quickstart (#10)
* [docs sprint] Updates docs for react quickstart
* PR feedback
* changes azure to override create_speech_uncached (#21)
* [docs sprint] Adds docs for conversation mechanics and moves endpointing docs from transcribers (#11)
* [docs sprint] Updates docs for using transcribers
* Adds docs for conversation mechanics and moves endpointing docs from transcribers
* Update docs/open-source/conversation-mechanics.md
  Co-authored-by: Adnaan Sachidanandan <[email protected]>
* use mdx
* PR feedback

---------

Co-authored-by: Adnaan Sachidanandan <[email protected]>
* updates docs for events manager (#7)
* add cartesia synthesizer (#17)
* add cartesia synthesizer
* make Cartesia dependency optional, add it to the synthesizers extra group
* lazy import cartesia
* improved lazy loading, and added api_key as a config parameter
* improvements to cartesia synth
* use create_speech_uncached
* use existing abstractions default encoding and sample rates
* Remove redundant api_key assignment
  Co-authored-by: Ajay Raj <[email protected]>
* Remove default setting of sampling rate
  Co-authored-by: Ajay Raj <[email protected]>
* Remove default setting of audio_encoding
  Co-authored-by: Ajay Raj <[email protected]>
* remove default setting of sampling rate
  Co-authored-by: Ajay Raj <[email protected]>
* Remove redundant setting of audio encoding
  the output device handles this
  Co-authored-by: Ajay Raj <[email protected]>
* build failed with poetry.lock file. re-updating it

---------

Co-authored-by: Ajay Raj <[email protected]>
* Unset docs / README changes
* Unset docs changes (cont.)
* unset poetry version change
* update poetry.lock

---------

Co-authored-by: Mac Wilkinson <[email protected]>
Co-authored-by: srhinos <[email protected]>
Co-authored-by: Adnaan Sachidanandan <[email protected]>
Co-authored-by: rjheeta <[email protected]>
Add support for Cartesia (https://docs.cartesia.ai/)