Note: the API is not fully stable yet and may change in any new version. Many methods, types and internal data structures are not yet exposed.
To import the echogarden package as a Node.js module:
Install as a dependency in your project:
npm install echogarden
Import with:
import * as Echogarden from 'echogarden'
All methods, properties and arguments have TypeScript type information. You can use it to get more detailed and up-to-date type information that may not be covered here.
- Options reference
- List of all supported engines
- Quick guide to the command line interface
- WebSocket server reference
Synthesizes the given input.
- input: text to synthesize, can be a string, or a string[]. When given an array of strings, the elements of the array would be seen as predefined segments (this is useful if you would like to have more control over how segments are split, or your input has a special format requiring a custom splitting method).
- options: synthesis options object
- onSegment: a callback that is called whenever a segment has been synthesized (optional)
- onSentence: a callback that is called whenever a sentence has been synthesized (optional)
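As an illustration of the predefined-segments case, here is a minimal sketch of a custom splitter that turns each blank-line-separated paragraph into one segment. The function name splitIntoParagraphSegments and the splitting rule are assumptions made for this example, not part of the Echogarden API.

```typescript
// Hypothetical helper: split text on blank lines so that every paragraph
// becomes one predefined segment. The rule is only an example of a
// "custom splitting method"; adapt it to your own input format.
function splitIntoParagraphSegments(text: string): string[] {
  return text
    .split(/\r?\n\s*\r?\n/) // one or more blank lines separate paragraphs
    .map((part) => part.replace(/\s+/g, ' ').trim()) // collapse inner line breaks
    .filter((part) => part.length > 0) // drop empty leftovers
}

const segments = splitIntoParagraphSegments(
  'First paragraph,\nstill first.\n\nSecond paragraph.'
)
console.log(segments)

// The resulting array could then be passed in place of a plain string, e.g.:
// const { audio } = await Echogarden.synthesize(segments, { engine: 'espeak' })
```

Doing the split yourself keeps segment boundaries predictable, which may matter when the callbacks are used to track progress per segment.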
{
audio: RawAudio | Buffer
timeline: Timeline
language: string
}
audio may be either:
- A RawAudio object, which is a structure containing the sample rate and raw 32-bit float channels:
{
sampleRate: number
channels: Float32Array[]
}
- A Buffer containing the audio in encoded form, in the case a particular codec was specified in the outputAudioFormat.codec option.
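Since audio can take two forms, calling code usually needs to tell them apart. The sketch below shows one possible way to do that. The names SynthesizedAudio, isRawAudio and describeAudio are hypothetical, and the RawAudio interface is re-declared locally only to keep the example self-contained.

```typescript
// Local copy of the structure described above.
interface RawAudio {
  sampleRate: number
  channels: Float32Array[]
}

// A Node.js Buffer is a subclass of Uint8Array, so Uint8Array is used here
// to keep the sketch free of Node-specific type declarations.
type SynthesizedAudio = RawAudio | Uint8Array

// Hypothetical type guard: anything that is not a byte array is treated as RawAudio.
function isRawAudio(audio: SynthesizedAudio): audio is RawAudio {
  return !(audio instanceof Uint8Array)
}

// Hypothetical helper that summarizes either form of audio.
function describeAudio(audio: SynthesizedAudio): string {
  if (isRawAudio(audio)) {
    const sampleCount = audio.channels[0]?.length ?? 0
    const seconds = sampleCount / audio.sampleRate
    return `raw audio: ${audio.channels.length} channel(s), ${seconds.toFixed(2)}s`
  }
  return `encoded audio: ${audio.byteLength} bytes`
}

const rawExample: RawAudio = { sampleRate: 16000, channels: [new Float32Array(32000)] }
console.log(describeAudio(rawExample)) // raw audio: 1 channel(s), 2.00s
console.log(describeAudio(new Uint8Array(1024))) // encoded audio: 1024 bytes
```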
You can optionally pass two async callbacks to synthesize: onSegment and onSentence.
For example:
async function onSegment(data: SynthesisSegmentEventData) {
console.log(data.transcript)
}
const { audio } = await Echogarden.synthesize("Hello World!", { engine: 'espeak' }, onSegment)
SynthesisSegmentEventData is an object with the structure:
{
index: number // Index of part
total: number // Total number of parts
audio: RawAudio | Buffer // Audio for part
timeline: Timeline // Timeline for part
transcript: string // Transcript for part
language: string // Language for part
peakDecibelsSoFar: number // Peak decibels measured for all synthesized audio, so far
}
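As a rough sketch of how these fields might be used for progress reporting, the example below formats one event into a log line. The interface is re-declared locally (with Timeline left opaque and Uint8Array standing in for Buffer), and summarizeSegment is a hypothetical helper, not something the package provides.

```typescript
interface RawAudio {
  sampleRate: number
  channels: Float32Array[]
}

// Mirrors the fields listed above. Timeline is typed as unknown because its
// shape is not needed here; Uint8Array stands in for Buffer.
interface SegmentEventData {
  index: number
  total: number
  audio: RawAudio | Uint8Array
  timeline: unknown
  transcript: string
  language: string
  peakDecibelsSoFar: number
}

// Hypothetical helper: build a one-line progress summary for a part.
function summarizeSegment(data: SegmentEventData): string {
  const peak = data.peakDecibelsSoFar.toFixed(1)
  return `part ${data.index} of ${data.total} [${data.language}] peak so far ${peak} dB: ${data.transcript}`
}

// An async callback of this shape could be passed as onSegment or onSentence.
async function onSegment(data: SegmentEventData): Promise<void> {
  console.log(summarizeSegment(data))
}

const sampleEvent: SegmentEventData = {
  index: 1,
  total: 3,
  audio: { sampleRate: 22050, channels: [new Float32Array(0)] },
  timeline: [],
  transcript: 'Hello World!',
  language: 'en',
  peakDecibelsSoFar: -3.5,
}
void onSegment(sampleEvent)
```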
Requests a list of voices for a particular engine.
- options: voice list request options object
{
voiceList: SynthesisVoice[]
bestMatchingVoice: SynthesisVoice
}
Applies speech recognition to the input.
- input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
- options: recognition options object
{
transcript: string
timeline: Timeline
wordTimeline: Timeline
language: string
inputRawAudio: RawAudio
isolatedRawAudio?: RawAudio
backgroundRawAudio?: RawAudio
}
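The timeline and wordTimeline fields are typically what you post-process after recognition. The sketch below prints word timings as text lines. It assumes that a Timeline is an array of entries with type, text, startTime and endTime fields, with times in seconds; that shape is not spelled out in this section, so check it against the package's TypeScript types. formatWordTimeline is a hypothetical helper.

```typescript
// Assumed entry shape (not specified in this section); times assumed to be seconds.
interface TimelineEntry {
  type: string
  text: string
  startTime: number
  endTime: number
}

type Timeline = TimelineEntry[]

// Hypothetical helper: one "start-end text" line per word entry.
function formatWordTimeline(wordTimeline: Timeline): string[] {
  return wordTimeline.map(
    (entry) => `${entry.startTime.toFixed(2)}-${entry.endTime.toFixed(2)} ${entry.text}`
  )
}

// Stand-in data in place of a real recognition result.
const sampleWordTimeline: Timeline = [
  { type: 'word', text: 'Hello', startTime: 0, endTime: 0.5 },
  { type: 'word', text: 'world', startTime: 0.5, endTime: 1 },
]

console.log(formatWordTimeline(sampleWordTimeline).join('\n'))
```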
Aligns input audio with the given transcript.
- input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
- transcript: the transcript to align to
- options: alignment options object
{
timeline: Timeline
wordTimeline: Timeline
transcript: string
language: string
inputRawAudio: RawAudio
isolatedRawAudio?: RawAudio
backgroundRawAudio?: RawAudio
}
Translates speech audio directly to a transcript in a different language (only English is currently supported as the target language).
- input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
- options: speech translation options object
{
transcript: string
timeline: Timeline
wordTimeline?: Timeline
sourceLanguage: string
targetLanguage: string
inputRawAudio: RawAudio
isolatedRawAudio?: RawAudio
backgroundRawAudio?: RawAudio
}
Translates text to text.
- input: string
- options: text translation options object
{
text: string
translatedText: string
translationPairs: TranslationPair[]
sourceLanguage: string
targetLanguage: string
}
translationPairs is an array of objects corresponding to individual segments of the text and their translations.
Aligns input audio with the given translated transcript.
- input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
- translatedTranscript: the translated transcript to align to
- options: translation alignment options object
{
timeline: Timeline
wordTimeline: Timeline
translatedTranscript: string
sourceLanguage: string
targetLanguage: string
inputRawAudio: RawAudio
isolatedRawAudio?: RawAudio
backgroundRawAudio?: RawAudio
}
Aligns input audio to both the native language transcript and a translated one.
- input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
- transcript: the transcript to align to, in the native speech language
- translatedTranscript: the translated transcript to align to
- options: transcript and translation alignment options object
{
timeline: Timeline
wordTimeline: Timeline
translatedTimeline: Timeline
translatedWordTimeline: Timeline
transcript: string
translatedTranscript: string
sourceLanguage: string
targetLanguage: string
inputRawAudio: RawAudio
isolatedRawAudio?: RawAudio
backgroundRawAudio?: RawAudio
}
Aligns a given timeline with its translated transcript.
- inputTimeline: input timeline in the native language
- translatedTranscript: the translated transcript to align to
- options: timeline translation alignment options object
{
timeline: Timeline
wordTimeline: Timeline
sourceLanguage?: string
targetLanguage: string
rawAudio?: RawAudio
}
Detects language of spoken audio.
- input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
- options: speech language detection options object
{
detectedLanguage: string
detectedLanguageName: string
detectedLanguageProbabilities: LanguageDetectionResults
}
Detects language of text.
- input: input text as string
- options: text language detection options object
{
detectedLanguage: string
detectedLanguageName: string
detectedLanguageProbabilities: LanguageDetectionResults
}
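Both language detection results carry a detectedLanguageProbabilities field, which can be used to look beyond the single best guess. The sketch below assumes that LanguageDetectionResults is an array of objects with language, languageName and probability fields. That shape is an assumption for illustration and should be verified against the package's TypeScript types; topCandidates is a hypothetical helper.

```typescript
// Assumed element shape of LanguageDetectionResults (not specified in this section).
interface LanguageCandidate {
  language: string
  languageName: string
  probability: number
}

// Hypothetical helper: keep candidates above a threshold, most probable first.
function topCandidates(
  results: LanguageCandidate[],
  minProbability = 0.1,
  limit = 3
): LanguageCandidate[] {
  return [...results] // copy so the original result is not reordered
    .filter((candidate) => candidate.probability >= minProbability)
    .sort((a, b) => b.probability - a.probability)
    .slice(0, limit)
}

// Stand-in data in place of a real detection result.
const sampleProbabilities: LanguageCandidate[] = [
  { language: 'de', languageName: 'German', probability: 0.2 },
  { language: 'en', languageName: 'English', probability: 0.7 },
  { language: 'fr', languageName: 'French', probability: 0.05 },
  { language: 'nl', languageName: 'Dutch', probability: 0.05 },
]

for (const candidate of topCandidates(sampleProbabilities)) {
  console.log(`${candidate.languageName} (${candidate.language}): ${candidate.probability}`)
}
```

Keeping a few runner-up candidates can be useful when the input is short or noisy and the top probability is not decisive.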
Detects voice activity in audio (non-real-time).
- input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
- options: voice activity detection options object
{
timeline: Timeline
}
Tries to reduce background noise in spoken audio.
- input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
- options: denoising options object
{
denoisedAudio: RawAudio
}
Attempts to isolate an individual audio stem, like human voice, or one or more musical instruments (depending on model training), from the given waveform.
- input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
- options: source separation options object
{
inputRawAudio: RawAudio
isolatedRawAudio: RawAudio
backgroundRawAudio: RawAudio
}
Converts a timeline to subtitles.
- timeline: timeline object
- options: subtitles configuration object
Subtitle file content, as a string.
Converts subtitles to a timeline.
- subtitles: subtitle file content, as a string
Note: This function simply converts each individual cue to a segment entry in a timeline. Since subtitle cues may contain parts of sentences or phrases, this may not produce very useful results for your needs. However, you can use it as a means to parse a subtitle file (srt or vtt), and apply your own processing later.
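One kind of later processing is to merge cues that split a sentence. The sketch below joins consecutive cue entries until the accumulated text ends with terminal punctuation. It assumes timeline entries have type, text, startTime and endTime fields (a shape not specified in this section), and both mergeCuesIntoSentences and the 'sentence' type label are chosen for illustration only.

```typescript
// Assumed entry shape (not specified in this section).
interface TimelineEntry {
  type: string
  text: string
  startTime: number
  endTime: number
}

// Hypothetical helper: merge consecutive cue entries into sentence entries.
function mergeCuesIntoSentences(cues: TimelineEntry[]): TimelineEntry[] {
  const sentences: TimelineEntry[] = []
  let current: TimelineEntry | undefined

  for (const cue of cues) {
    if (current === undefined) {
      // Start a new sentence from this cue
      current = { ...cue, type: 'sentence', text: cue.text.trim() }
    } else {
      // Extend the open sentence with this cue's text and end time
      current.text = `${current.text} ${cue.text.trim()}`
      current.endTime = cue.endTime
    }

    // Close the sentence once its text ends with ., ! or ? (optionally followed by a quote or bracket)
    if (/[.!?]["')\]]?$/.test(current.text)) {
      sentences.push(current)
      current = undefined
    }
  }

  // Keep any trailing text that never reached terminal punctuation
  if (current !== undefined) {
    sentences.push(current)
  }

  return sentences
}

// Stand-in data in place of a timeline parsed from a subtitle file.
const sampleCues: TimelineEntry[] = [
  { type: 'segment', text: 'This is a sentence', startTime: 0, endTime: 1.5 },
  { type: 'segment', text: 'split across cues.', startTime: 1.5, endTime: 3 },
  { type: 'segment', text: 'Next one.', startTime: 3, endTime: 4 },
]

console.log(mergeCuesIntoSentences(sampleCues))
```

A punctuation-based rule is deliberately simple and will mis-split on abbreviations; it is meant as a starting point rather than a complete sentence splitter.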
Timeline object.
Sets a global option.
See the options reference for more details about the available global options.
Gets a global option.
The value associated with the given key.
- Expose more methods that may be useful for developers, like phonemization, etc.
- Expose audio playback used in CLI, possibly with timeline synchronization support.