Releases: cartesia-ai/cartesia-python
v1.0.12
v1.0.11 (08-07-2024)
Features
- Support timestamps on `_TTSContext.send` on the sync `Cartesia` client
- Allow `speed` in `_experimental_voice_controls` to be specified as a float in the range [-1.0, 1.0]
Adds
- Expands testing suite
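A float `speed` in `_experimental_voice_controls` can be sketched as a plain dict plus a range check. This is a minimal illustration of the documented [-1.0, 1.0] constraint; the helper name and validation are assumptions, not the SDK's own code:

```python
def make_voice_controls(speed: float, emotion=None) -> dict:
    """Build an _experimental_voice_controls-style dict.

    Per the release note, speed must lie in [-1.0, 1.0]; this check is
    illustrative and not necessarily what the SDK does internally.
    """
    if not -1.0 <= speed <= 1.0:
        raise ValueError(f"speed {speed} is outside [-1.0, 1.0]")
    controls = {"speed": speed}
    if emotion is not None:
        controls["emotion"] = emotion
    return controls

print(make_voice_controls(0.3))
```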
v1.0.9 (07-25-2024)
Adds
- `enhance` param to the `tts.voice.clone` method, which controls whether the submitted sample clip is enhanced prior to voice cloning
Chores
- Returns a warning on the synchronous `_WebSocket` class if the user's `websockets` version is < 12.0
v1.0.7 (07-15-2024)
Features
- Supports generating timestamps on the WebSocket endpoint to get detailed timing information for each word in input transcripts.
- Experimental support for applying speed & emotion controls on voices.
Adds
- `add_timestamps` param to the `_WebSocket.send()`, `_AsyncWebSocket.send()`, `_TTSContext.send()` and `_AsyncTTSContext.send()` methods for generating timestamps corresponding to input transcripts
- Timestamp results are returned in a `word_timestamps` object with the keys `words`, `start` and `end`
- `_experimental_voice_controls` param to all `send()` methods, which accepts an object with `speed` and `emotion` fields
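Consuming a `word_timestamps` object can be sketched as zipping its three parallel lists. The list-of-lists shape follows the release note; the sample values and the helper name are illustrative:

```python
def pair_word_timings(word_timestamps: dict) -> list:
    """Zip the parallel words/start/end lists into (word, start, end) tuples."""
    return list(zip(
        word_timestamps["words"],
        word_timestamps["start"],
        word_timestamps["end"],
    ))

# Illustrative payload shaped like the documented word_timestamps object.
sample = {
    "words": ["Hello", "world"],
    "start": [0.0, 0.42],
    "end": [0.40, 0.85],
}
print(pair_word_timings(sample))
```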
Chores
- Adds usage examples for feature updates to the README
v1.0.5 (07-12-2024)
Features
- Support for audio continuations on the synchronous `Cartesia` client. Users can pass in a text generator to receive streaming audio.
Adds
- New `_TTSContext` class and `context()` method on `_WebSocket` for supporting input-streaming use cases with continuations
- `send()` method that takes in an `Iterator` object as the `transcript` and returns a `Generator` that streams out audio data
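The text-generator pattern that `send()` consumes can be sketched without the SDK. The chunk strings and function name below are illustrative; no network call is shown:

```python
from typing import Iterator

def transcript_chunks() -> Iterator[str]:
    """Yield transcript text incrementally, as an LLM or other
    upstream producer might while the user is still waiting."""
    for chunk in ["Hello, ", "this is ", "a streamed transcript."]:
        yield chunk

# A continuation-style consumer pulls chunks lazily; here we simply
# join them to show they arrive in order.
print("".join(transcript_chunks()))
```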
v1.0.4 (07-06-2024)
Features
- Support for audio continuations for seamless speech synthesis. Allows real-time audio generation and playback as text becomes available.
Adds
- New `_AsyncTTSContext` class and `context()` method on `_AsyncWebsocket` for managing streaming sessions:
  - `send()` method for streaming text inputs
  - `no_more_inputs()` method to signal the end of text input, which sends a message with `continue_ = False`. Otherwise, the context times out after 5 seconds of inactivity.
  - `receive()` method that returns an `AsyncGenerator` for asynchronous audio chunk retrieval
- Support for specifying a custom `base_url` when initializing `Cartesia` or `AsyncCartesia`
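The `receive()` consumption pattern (an `AsyncGenerator` of audio chunks) can be sketched with a stand-in generator. All names here are illustrative stand-ins, not the SDK's objects:

```python
import asyncio
from typing import AsyncGenerator

async def fake_receive() -> AsyncGenerator[bytes, None]:
    """Stand-in for a context's receive(): yields audio chunks as they
    become available. Real chunks would come from the WebSocket."""
    for chunk in [b"\x00\x01", b"\x02\x03"]:
        await asyncio.sleep(0)  # simulate awaiting the socket
        yield chunk

async def collect() -> bytes:
    """Accumulate chunks with `async for`, as a playback loop might."""
    audio = b""
    async for chunk in fake_receive():
        audio += chunk
    return audio

print(asyncio.run(collect()))
```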
Changes
- Modifies `AsyncWebsocket` to internally use the `AsyncContext` class. No change in usage.
Bug Fixes
- Removes the `Content-Type` header from filepath cloning so it works with `httpx.post`
- Fixes `client.tts.get_output_format` for deprecated output format names
v1.0.3 (06-25-2024)
Changes
- Fixes undefined import issue for `cartesia.utils` by modifying `setup.py` to include subdirectories
v1.0.2 (06-25-2024)
Chores
- Adds `__init__` to `cartesia/utils` to make it a module
v1.0.1 (06-25-2024)
Changes
- Updates `OutputFormatMapping` with more clearly defined names that will be supported going forward
  - This deprecates the old string-based names and moves them to `DeprecatedOutputFormatMapping`. These will be removed in v1.2.0.
  - The usage remains the same by calling `client.tts.get_output_format`
Chores
- Adds `utils.deprecated` to allow using a `@deprecated` decorator for functions/methods that will be deprecated in future versions
- Adds usage docs for `client.tts.get_output_format`
- Updates documentation
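A `@deprecated` decorator of this kind can be sketched with the standard library. This is a generic illustration of the pattern, not the actual `utils.deprecated` implementation, whose signature may differ:

```python
import functools
import warnings

def deprecated(reason: str):
    """Mark a callable as deprecated: emit a DeprecationWarning when it
    runs, while leaving its behavior unchanged."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated: {reason}",
                DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator

@deprecated("use the new name instead")  # hypothetical example function
def old_helper(x):
    return x * 2

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_helper(3)
print(result, len(caught))
```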
v1.0.0 (06-24-2024)
A new major version release of the Cartesia Python client that overhauls the library structure.
Refer to the migration guide for more thorough details on changes.
Features
- Adds support for `model_id=sonic-multilingual` to generate multilingual audio
- Endpoint-specific methods for generating and streaming audio
Breaking changes
- Renames `CartesiaTTS` and `AsyncCartesiaTTS` to `Cartesia` and `AsyncCartesia`
- Replaces `client.generate` with endpoint-specific methods for Text-to-Speech
- `output_format` must be specified as an `OutputFormat` object, which is a dict specifying the keys `container`, `encoding` and `sample_rate`
- Both SSE and WebSocket requests no longer return `sampling_rate` in their output. They respect the `sample_rate` corresponding to the `OutputFormat` passed in.
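An `OutputFormat`-style dict with the three documented keys can be sketched as follows. The concrete `container`/`encoding`/`sample_rate` values shown are illustrative assumptions, not an exhaustive or authoritative list:

```python
# Sketch of an OutputFormat dict with the keys named in the release
# note. The specific values are assumptions for illustration.
output_format = {
    "container": "raw",
    "encoding": "pcm_f32le",
    "sample_rate": 44100,
}

# Sanity-check that exactly the documented keys are present.
assert set(output_format) == {"container", "encoding", "sample_rate"}
print(output_format["sample_rate"])
```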
Adds
- `client.tts.sse` methods for generating audio using Server-Sent Events
- `client.tts.websocket` methods for managing a WebSocket connection and generating audio
- `client.tts.get_output_format()` to obtain an `OutputFormat` object from an output format name
- `client.tts.get_sample_rate()` to obtain the `sample_rate` from an output format name
- `client.voices.list()` to fetch a list of all available voices
- `client.voices.get()` to fetch a `VoiceMetadata` object from a `voice_id`
- `client.voices.clone()` to clone a voice by specifying a filepath
- `client.voices.create()` to create a new voice
- Specifies `cartesia_version=2024-06-10` as the default header for HTTP and WS requests
Removes
- `client.get_voices()`
- `client.get_voice_embedding()`
- `client.generate()`
- Ability to specify a NumPy array as a return type. We recommend using `np.frombuffer` with the appropriate `dtype`.
- Ability to clone voices using a link
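The recommended `np.frombuffer` conversion can be sketched as below. The `float32` dtype is an assumption for illustration; in practice it must match the `encoding` you requested in your `output_format`:

```python
import numpy as np

# Fabricate raw little-endian float32 PCM bytes standing in for audio
# returned by the client, then decode them back into a NumPy array.
raw_bytes = np.array([0.0, 0.5, -0.5], dtype=np.float32).tobytes()

# The dtype passed to frombuffer must match the requested encoding.
audio = np.frombuffer(raw_bytes, dtype=np.float32)
print(audio.tolist())
```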