Releases: cartesia-ai/cartesia-python

v1.0.12 (08-13-2024)

Features

  • Adds support for voices.mix(...) to mix multiple voices and return the combined embedding
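
A minimal sketch of the new mixing call, assuming a list of voices with per-voice weights and a dict return value carrying the embedding; the exact argument and key names may differ from the released signature.

```python
from cartesia import Cartesia

client = Cartesia(api_key="your-api-key")

# Assumed call shape: a list of voices, each with a relative weight.
mixed = client.voices.mix(
    voices=[
        {"id": "voice-id-1", "weight": 0.6},
        {"id": "voice-id-2", "weight": 0.4},
    ]
)

# Assumed return shape: the mixed embedding, usable wherever a voice
# embedding is accepted (e.g. voice_embedding on the TTS endpoints).
embedding = mixed["embedding"]
```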

Fixes

  • Removes a duplicate import of websockets.sync.client.connect

v1.0.11 (08-07-2024)

Features

  • Supports timestamps on _TTSContext.send() on the synchronous Cartesia client
  • Allows speed in _experimental_voice_controls to be specified as a float in the range [-1.0, 1.0]
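
A sketch combining both items on the synchronous client; add_timestamps, _experimental_voice_controls, and the float-valued speed come from the notes, while the remaining argument names (model_id, voice_id, output_format) are assumptions about _TTSContext.send().

```python
from cartesia import Cartesia

client = Cartesia(api_key="your-api-key")
ws = client.tts.websocket()
ctx = ws.context()

output_generator = ctx.send(
    model_id="sonic-english",
    transcript=iter(["Hello there. ", "Timestamps now work on the sync context too."]),
    voice_id="your-voice-id",
    output_format={"container": "raw", "encoding": "pcm_f32le", "sample_rate": 44100},
    add_timestamps=True,
    _experimental_voice_controls={"speed": 0.5},  # float in [-1.0, 1.0]
)

for out in output_generator:
    if "word_timestamps" in out:
        ts = out["word_timestamps"]
        print(list(zip(ts["words"], ts["start"], ts["end"])))
```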

Adds

  • Expands the test suite

v1.0.9 (07-25-2024)

Adds

  • enhance param to the voices.clone method, which controls whether the submitted sample clip is enhanced before voice cloning
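
A short sketch, assuming voices.clone() takes the sample via a filepath argument and returns the cloned voice embedding:

```python
from cartesia import Cartesia

client = Cartesia(api_key="your-api-key")

# enhance=True cleans up the sample clip before cloning;
# enhance=False clones from the raw recording as-is.
embedding = client.voices.clone(filepath="path/to/sample.wav", enhance=True)
```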

Chores

  • Emits a warning from the synchronous _WebSocket class if the installed websockets version is older than 12.0

v1.0.7 (07-15-2024)

Features

  • Supports generating timestamps on the WebSocket endpoint for detailed timing information on each word in the input transcript.
  • Experimental support for applying speed and emotion controls to voices.

Adds

  • add_timestamps param to _WebSocket.send(), _AsyncWebSocket.send(), _TTSContext.send() and _AsyncTTSContext.send() methods for generating timestamps corresponding to input transcripts.
    • Timestamp results are returned in a word_timestamps object with the keys words, start, and end
  • _experimental_voice_controls param to all send() methods which accepts an object with speed and emotion fields
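
A sketch of the direct WebSocket path; add_timestamps, _experimental_voice_controls, and the word_timestamps keys come from the notes above, while the other arguments and the particular speed/emotion values shown are assumptions.

```python
from cartesia import Cartesia

client = Cartesia(api_key="your-api-key")
ws = client.tts.websocket()

output_generator = ws.send(
    model_id="sonic-english",
    transcript="Hello from the timestamped WebSocket endpoint.",
    voice_id="your-voice-id",
    output_format={"container": "raw", "encoding": "pcm_f32le", "sample_rate": 44100},
    stream=True,
    add_timestamps=True,
    _experimental_voice_controls={"speed": "fast", "emotion": ["positivity"]},
)

for out in output_generator:
    if "audio" in out:
        chunk = out["audio"]  # raw audio bytes
    if "word_timestamps" in out:
        ts = out["word_timestamps"]  # keys: words, start, end
```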

Chores

  • Adds usage examples for feature updates to README

v1.0.5 (07-12-2024)

Features

  • Support for audio continuations on the synchronous Cartesia client. Users can pass in a text generator to receive streaming audio.

Adds

  • New _TTSContext class and context() method to _WebSocket for supporting input streaming use cases with continuations.
    • send() method that takes an Iterator as the transcript and returns a Generator that streams audio data.
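
A sketch of the sync continuation flow described above; the Iterator-in / Generator-out behaviour is from the notes, and the remaining argument names are assumptions.

```python
from cartesia import Cartesia

client = Cartesia(api_key="your-api-key")
ws = client.tts.websocket()
ctx = ws.context()

def text_chunks():
    # e.g. text arriving from an upstream LLM
    yield "Hello, this is the first part of a sentence, "
    yield "and this is its continuation."

for out in ctx.send(
    model_id="sonic-english",
    transcript=text_chunks(),  # any Iterator of strings
    voice_id="your-voice-id",
    output_format={"container": "raw", "encoding": "pcm_f32le", "sample_rate": 44100},
):
    audio_bytes = out["audio"]  # play or buffer each chunk as it arrives
```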

v1.0.4 (07-06-2024)

Features

  • Support for audio continuations for seamless speech synthesis. Allows real-time audio generation and playback as text becomes available.

Adds

  • New _AsyncTTSContext class and context() method on _AsyncWebSocket for managing streaming sessions
    • send() method for streaming text inputs
    • no_more_inputs() method to signal the end of text input by sending a message with continue_ = False. If it is not called, the context times out after 5 seconds of inactivity.
    • receive() method returns AsyncGenerator for asynchronous audio chunk retrieval
  • Support for specifying custom base_url when initializing Cartesia or AsyncCartesia
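
A sketch of the async session flow; context(), send(), no_more_inputs(), receive(), and base_url come from the notes, while continue_=True on send() and the other argument names are assumptions.

```python
import asyncio
from cartesia import AsyncCartesia

async def main():
    client = AsyncCartesia(api_key="your-api-key")  # base_url="..." can also be passed here
    ws = await client.tts.websocket()
    ctx = ws.context()

    for sentence in ["First streamed sentence. ", "Second streamed sentence."]:
        await ctx.send(
            model_id="sonic-english",
            transcript=sentence,
            voice_id="your-voice-id",
            output_format={"container": "raw", "encoding": "pcm_f32le", "sample_rate": 44100},
            continue_=True,
        )
    await ctx.no_more_inputs()  # otherwise the context times out after 5 s of inactivity

    async for out in ctx.receive():
        audio_bytes = out["audio"]  # handle each audio chunk as it arrives

    await client.close()

asyncio.run(main())
```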

Changes

  • Modifies _AsyncWebSocket to internally use the _AsyncTTSContext class. No change in usage.

Bug Fixes

  • Removes the Content-Type header from filepath-based cloning so it works with httpx.post
  • Fixes client.tts.get_output_format for deprecated output format names

v1.0.3 (06-25-2024)

Changes

  • Fixes an undefined import issue for cartesia.utils by modifying setup.py to include subdirectories

v1.0.2 (06-25-2024)

Chores

  • Adds __init__.py to cartesia/utils to make it an importable package

v1.0.1 (06-25-2024)

Changes

  • Updates OutputFormatMapping with more clearly-defined names that will be supported going forward.
    • This deprecates the old string-based names and moves them to DeprecatedOutputFormatMapping; these will be removed in v1.2.0.
    • Usage remains the same: call client.tts.get_output_format
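
A sketch of the unchanged call with both name styles; the specific format names shown are illustrative, not a verified list.

```python
from cartesia import Cartesia

client = Cartesia(api_key="your-api-key")

# New-style, clearly-defined name (illustrative) -> OutputFormat dict with
# container, encoding and sample_rate keys.
output_format = client.tts.get_output_format("raw_pcm_f32le_44100")

# Old string-based name (illustrative) still resolves for now but is
# deprecated and scheduled for removal in v1.2.0.
legacy_format = client.tts.get_output_format("fp32_44100")
```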

Chores

  • Adds utils.deprecated, which provides a @deprecated decorator for functions/methods that will be deprecated in future versions
  • Adds usage docs for client.tts.get_output_format
  • Updates documentation

v1.0.0 (06-24-2024)

A new major version release of the Cartesia Python client that overhauls the library structure.

Refer to the migration guide for more thorough details on changes.

Features

  • Adds support for model_id=sonic-multilingual to generate multilingual audio
  • Endpoint-specific methods for generating and streaming audio

Breaking changes

  • Renames CartesiaTTS and AsyncCartesiaTTS -> Cartesia and AsyncCartesia
  • Replaces client.generate with endpoint-specific methods for Text-to-Speech
  • output_format must be specified as an OutputFormat object, which is a dict with the keys container, encoding, and sample_rate
  • SSE and WebSocket requests no longer return sampling_rate in their output; they respect the sample_rate of the OutputFormat passed in.

Adds

  • client.tts.sse methods for generating audio using Server-Sent Events
  • client.tts.websocket methods for managing a WebSocket connection and generating audio
  • client.tts.get_output_format() to obtain an OutputFormat object from an output format name
  • client.tts.get_sample_rate() to obtain the sample_rate from an output format name
  • client.voices.list() to fetch a list of all available voices
  • client.voices.get() to fetch a VoiceMetadata object from voice_id
  • client.voices.clone() to clone a voice by specifying a filepath
  • client.voices.create() to create a new voice
  • Specifies cartesia_version=2024-06-10 as default header for HTTP and WS requests
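
A sketch pulling the new pieces together (voices.get, get_output_format, and tts.sse); argument names such as transcript, voice_embedding, and stream, plus the format name shown, are assumptions about the 1.0 surface beyond what is listed above.

```python
from cartesia import Cartesia

client = Cartesia(api_key="your-api-key")

voice = client.voices.get(id="your-voice-id")
output_format = client.tts.get_output_format("raw_pcm_f32le_44100")  # illustrative name

# Stream audio over Server-Sent Events; model_id="sonic-multilingual"
# enables multilingual generation.
for out in client.tts.sse(
    model_id="sonic-multilingual",
    transcript="Bonjour le monde!",
    voice_embedding=voice["embedding"],
    output_format=output_format,
    stream=True,
):
    audio_bytes = out["audio"]
```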

Removes

  • client.get_voices()
  • client.get_voice_embedding()
  • client.generate()
  • Ability to specify a NumPy array as a return type. We recommend using np.frombuffer with the appropriate dtype (see the sketch after this list).
  • Ability to clone voices using a link
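
With NumPy return types removed, a minimal sketch of the recommended conversion; the float32 dtype assumes a pcm_f32le output format, and the bytes here are a placeholder for the raw audio payload returned by the client.

```python
import numpy as np

# Placeholder payload: 441 float32 samples of silence. In practice this is
# the raw audio returned for a pcm_f32le output format; other encodings
# need a matching dtype.
audio_bytes = bytes(4 * 441)
audio = np.frombuffer(audio_bytes, dtype=np.float32)
print(audio.shape)  # (441,)
```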