v1.0.0

@sauhardjain sauhardjain released this 25 Jun 00:37

v1.0.0 (06-24-2024)

A new major version release of the Cartesia Python client that overhauls the library structure.

Refer to the migration guide for full details on the changes.

Features

  • Adds support for model_id=sonic-multilingual to generate multilingual audio
  • Endpoint-specific methods for generating and streaming audio

Breaking changes

  • Renames CartesiaTTS and AsyncCartesiaTTS to Cartesia and AsyncCartesia, respectively
  • Replaces client.generate with endpoint-specific methods for Text-to-Speech
  • output_format must be specified as an OutputFormat object, which is a dict with the keys container, encoding, and sample_rate
  • SSE and WebSocket responses no longer include sampling_rate. Audio is generated at the sample_rate given in the OutputFormat passed in.
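
As a rough illustration of the OutputFormat change, the sketch below builds a plain dict with the three keys named above and reads the sample rate back from it, since responses no longer carry sampling_rate. The helper names make_output_format and missing_keys are hypothetical and not part of the client, and the example values ("raw", "pcm_f32le", 44100) are assumptions rather than values stated in these notes.

```python
# Keys that an OutputFormat dict must specify, per the release notes.
REQUIRED_KEYS = ("container", "encoding", "sample_rate")


def make_output_format(container, encoding, sample_rate):
    """Hypothetical helper: build a dict with the three OutputFormat keys."""
    return {
        "container": container,
        "encoding": encoding,
        "sample_rate": int(sample_rate),
    }


def missing_keys(output_format):
    """Hypothetical helper: list the required keys absent from the dict."""
    return [key for key in REQUIRED_KEYS if key not in output_format]


# Example values are assumptions, not taken from the release notes.
output_format = make_output_format("raw", "pcm_f32le", 44100)

# Responses no longer include sampling_rate, so keep the value you requested.
sample_rate = output_format["sample_rate"]
```

Keeping the dict you passed in is the simplest way to know the sample rate when decoding or playing back the audio later.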

Adds

  • client.tts.sse methods for generating audio using Server-Sent Events
  • client.tts.websocket methods for managing a WebSocket connection and generating audio
  • client.tts.get_output_format() to obtain OutputFormat object from output format name
  • client.tts.get_sample_rate() to obtain sample_rate from output format name
  • client.voices.list() to fetch a list of all available voices
  • client.voices.get() to fetch a VoiceMetadata object from voice_id
  • client.voices.clone() to clone a voice by specifying a filepath
  • client.voices.create() to create a new voice
  • Specifies cartesia_version=2024-06-10 as the default header for HTTP and WebSocket requests

Removes

  • client.get_voices()
  • client.get_voice_embedding()
  • client.generate()
  • Ability to specify a NumPy array as the return type. We recommend converting the returned audio with np.frombuffer and the appropriate dtype instead.
  • Ability to clone voices using a link
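
For the np.frombuffer recommendation above, a minimal sketch of the conversion might look like the following. The bytes here are a locally generated stand-in, not output from the client, and the dtype assumes little-endian 32-bit float PCM; use the dtype that matches the encoding you actually requested.

```python
import struct

import numpy as np

# Stand-in for an audio payload: four little-endian float32 samples.
# In real use this would be the raw audio bytes you received.
raw_audio = struct.pack("<4f", 0.0, 0.5, -0.5, 1.0)

# Interpret the bytes as float32 samples without copying.
# "<f4" is an assumption that matches the stand-in data above.
samples = np.frombuffer(raw_audio, dtype="<f4")
```

np.frombuffer returns a read-only view over the original bytes; call samples.copy() if you need to modify the audio in place.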