C# SDK Reference
As of April 2025, the AssemblyAI C# SDK has been discontinued and is no longer maintained. While the SDK will no longer be updated, any previously published releases will remain available.
Going forward, see the C# Pre-Recorded Audio page for information on how to integrate with our API directly.
We know this is a disruptive change. If you need help with this transition, reach out to our Support team at [email protected] and we'll help you in any way we can.
Our Speech-to-Text model enables you to transcribe pre-recorded audio into written text.
On top of the transcription, you can enable other features and models, such as Speaker Diarization, by adding additional parameters to the same transcription request.
Choose between Best and Nano based on the cost and performance tradeoffs best suited for your application.
The following example transcribes an audio file from a URL.
using AssemblyAI;
using AssemblyAI.Transcripts;
var client = new AssemblyAIClient("<YOUR_API_KEY>");
// You can use a local file:
/*
var transcript = await client.Transcripts.TranscribeAsync(
    new FileInfo("./example.mp3")
);
*/
// Or use a publicly-accessible URL:
const string audioUrl = "https://assembly.ai/wildfires.mp3";
var transcript = await client.Transcripts.TranscribeAsync(new TranscriptParams
{
    AudioUrl = audioUrl
});
if (transcript.Status == TranscriptStatus.Error)
{
    Console.WriteLine(transcript.Error);
    Environment.Exit(1);
}
// Alternatively, you can use the EnsureStatusCompleted() method
// to throw an exception if the transcription status is not "completed".
// transcript.EnsureStatusCompleted();
Console.WriteLine(transcript.Text);
Example output
Smoke from hundreds of wildfires in Canada is triggering air quality alerts
throughout the US. Skylines from Maine to Maryland to Minnesota are gray and
smoggy. And...
The response also includes an array with information about each word:
foreach (var word in transcript.Words!)
{
    Console.WriteLine(
        "Word: {0}, Start: {1}, End: {2}, Confidence: {3}",
        word.Text, word.Start, word.End, word.Confidence
    );
}
Word: Smoke, Start: 250, End: 650, Confidence: 0.73033
Word: from, Start: 730, End: 1022, Confidence: 0.99996
...
After you've submitted a file for transcription, your transcript has one of the following statuses:
| Status | Description |
| --- | --- |
| processing | The audio file is being processed. |
| queued | The audio file is waiting to be processed. |
| completed | The transcription has completed successfully. |
| error | An error occurred while processing the audio file. |
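If you submit the file without waiting for the result, you can check the status yourself. The following is a minimal sketch, assuming the SDK exposes SubmitAsync and GetAsync methods on client.Transcripts (TranscribeAsync submits and polls for you):
// Submit the file without waiting for the transcription to finish
var submitted = await client.Transcripts.SubmitAsync(new TranscriptParams
{
    AudioUrl = audioUrl
});
// Poll until the transcript reaches a terminal status
var polled = await client.Transcripts.GetAsync(submitted.Id);
while (polled.Status != TranscriptStatus.Completed &&
       polled.Status != TranscriptStatus.Error)
{
    await Task.Delay(TimeSpan.FromSeconds(3));
    polled = await client.Transcripts.GetAsync(submitted.Id);
}
Console.WriteLine(polled.Status);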
If the transcription fails, the status of the transcript is error, and the transcript includes an error property explaining what went wrong.
if (transcript.Status == TranscriptStatus.Error)
{
    Console.WriteLine(transcript.Error);
    Environment.Exit(1);
}
// Alternatively, you can use the EnsureStatusCompleted() method
// to throw an exception if the transcription status is not "completed".
// transcript.EnsureStatusCompleted();
A transcription may fail for various reasons:
- Unsupported file format
- Missing audio in file
- Unreachable audio URL
If a transcription fails due to a server error, we recommend that you resubmit the file for transcription to allow another server to process the audio.
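As a rough sketch of that retry (this simply resubmits the same parameters whenever the previous attempt errored, without inspecting the specific error message):
if (transcript.Status == TranscriptStatus.Error)
{
    // Resubmit the same audio so a different server can process it
    transcript = await client.Transcripts.TranscribeAsync(new TranscriptParams
    {
        AudioUrl = audioUrl
    });
}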
We use a combination of models to produce your results. You can select the class of models to use in order to make cost-performance tradeoffs best suited for your application. You can visit our pricing page for more information on our model tiers.
| Name | SDK Parameter | Description |
| --- | --- | --- |
| Best (default) | SpeechModel.Best | Use our most accurate and capable models with the best results, recommended for most use cases. |
| Nano | SpeechModel.Nano | Use our less accurate, but much lower cost models to produce your results. |
You can change the model by setting SpeechModel in the transcription parameters:
var transcript = await client.Transcripts.TranscribeAsync(new TranscriptParams
{
    AudioUrl = audioUrl,
    SpeechModel = SpeechModel.Nano
});
For a list of the supported languages for each model, see Supported languages.
The default region is US, with base URL api.assemblyai.com. For EU data residency requirements, you can use our EU base URL at api.eu.assemblyai.com.
The base URL for EU is currently only available for Async transcription.
| Region | Base URL |
| --- | --- |
| US (default) | api.assemblyai.com |
| EU | api.eu.assemblyai.com |
To use the EU endpoint, set the BaseUrl in the ClientOptions object like this:
// Create ClientOptions object
var options = new ClientOptions
{
    ApiKey = Environment.GetEnvironmentVariable("YOUR_API_KEY")!,
    BaseUrl = "https://api.eu.assemblyai.com"
};
// Initialize client with options
var client = new AssemblyAIClient(options);
By default, the API automatically punctuates the transcription text, formats proper nouns, and converts numbers to their numerical form.
To disable punctuation and text formatting, set Punctuate and FormatText to false in the transcription config.
var transcript = await client.Transcripts.TranscribeAsync(new TranscriptParams
{
    AudioUrl = audioUrl,
    Punctuate = false,
    FormatText = false
});
Identify the dominant language spoken in an audio file and use it during the transcription. Enable it to detect any of the supported languages.
To reliably identify the dominant language, the file must contain at least 50 seconds of spoken audio.
To enable it, set LanguageDetection to true in the transcription parameters.
var transcript = await client.Transcripts.TranscribeAsync(new TranscriptParams
{
    AudioUrl = audioUrl,
    LanguageDetection = true
});
Confidence score
If language detection is enabled, the API returns a confidence score for the detected language. The score ranges from 0.0 (low confidence) to 1.0 (high confidence).
Console.WriteLine(transcript.LanguageConfidence);
Set a language confidence threshold
You can set the confidence threshold that must be reached if language detection is enabled. An error will be returned if the language confidence is below this threshold. Valid values are in the range [0,1] inclusive.
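A minimal sketch of applying a threshold, assuming the SDK surfaces the API's language_confidence_threshold parameter as a LanguageConfidenceThreshold property on TranscriptParams:
var transcript = await client.Transcripts.TranscribeAsync(new TranscriptParams
{
    AudioUrl = audioUrl,
    LanguageDetection = true,
    // Assumed property name; the transcript errors if confidence falls below 0.4
    LanguageConfidenceThreshold = 0.4f
});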
If you already know the dominant language of your audio, you can skip detection and set it directly with LanguageCode in the transcription parameters:
var transcript = await client.Transcripts.TranscribeAsync(new TranscriptParams
{
    AudioUrl = audioUrl,
    LanguageCode = TranscriptLanguageCode.Es
});
To see all supported languages and their codes, see Supported languages.
Custom Spelling lets you customize how words are spelled or formatted in the transcript.
To use Custom Spelling, include CustomSpelling in your transcription parameters. The parameter should be an array of objects, with each object specifying a mapping from a word or phrase to a new spelling or format.
var transcript = await client.Transcripts.TranscribeAsync(new TranscriptParams
{
    AudioUrl = audioUrl,
    CustomSpelling =
    [
        new TranscriptCustomSpelling
        {
            From = ["gettleman"],
            To = "Gettleman"
        },
        new TranscriptCustomSpelling
        {
            From = ["Sequel"],
            To = "SQL"
        }
    ]
});
The value in the to key is case-sensitive, but the value in the from key isn't. Additionally, the to key must only contain one word, while the from key can contain multiple words.
To improve the transcription accuracy, you can boost certain words or phrases that appear frequently in your audio file.
To boost words or phrases, include the word_boost parameter in the transcription config.
You can also control how much weight to apply to each keyword or phrase. Include boost_param in the transcription config with a value of low, default, or high.
var transcript = await client.Transcripts.TranscribeAsync(new TranscriptParams
{
    AudioUrl = audioUrl,
    WordBoost = ["aws", "azure", "google cloud"],
    BoostParam = TranscriptBoostParam.High
});
Follow formatting guidelines for custom vocabulary to ensure the best results:
- Remove all punctuation except apostrophes.
- Make sure each word is in its spoken form. For example, iphone seven instead of iphone 7.
- Remove spaces between letters in acronyms.
Additionally, the model still accepts words with unique characters such as é, but converts them to their ASCII equivalent.
You can boost a maximum of 1,000 unique keywords and phrases, where each of them can contain up to 6 words.
If you have a multichannel audio file with multiple speakers, you can transcribe each of them separately.
To enable it, set Multichannel to true in the transcription parameters.
var transcript = await client.Transcripts.TranscribeAsync(new TranscriptParams
{
    AudioUrl = audioUrl,
    Multichannel = true
});
foreach (var utterance in transcript.Utterances!)
{
    Console.WriteLine($"Speaker: {utterance.Speaker}, Text: {utterance.Text}");
}
Multichannel audio increases the transcription time by approximately 25%.
The response includes an audio_channels property with the number of different channels, and an additional utterances property containing a list of turn-by-turn utterances. Each utterance contains channel information, starting at 1. Additionally, each word in the words array contains the channel identifier.
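As a rough sketch of reading that channel information (assuming the transcript object exposes an AudioChannels property and each word exposes a Channel property):
Console.WriteLine($"Audio channels: {transcript.AudioChannels}");
foreach (var word in transcript.Words!)
{
    // Channel identifiers start at 1
    Console.WriteLine($"Channel {word.Channel}: {word.Text}");
}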
Dual-channel transcription (the dual_channel parameter) is deprecated. Use Multichannel instead.
You can export completed transcripts in SRT or VTT format, which can be used for subtitles and closed captions in videos.
You can also customize the maximum number of characters per caption by specifying the chars_per_caption parameter.
var transcript = await client.Transcripts.TranscribeAsync(new TranscriptParams
{
    AudioUrl = audioUrl
});
var srt = await client.Transcripts.GetSubtitlesAsync(transcript.Id, SubtitleFormat.Srt);
srt = await client.Transcripts.GetSubtitlesAsync(transcript.Id, SubtitleFormat.Srt, charsPerCaption: 32);
var vtt = await client.Transcripts.GetSubtitlesAsync(transcript.Id, SubtitleFormat.Vtt);
vtt = await client.Transcripts.GetSubtitlesAsync(transcript.Id, SubtitleFormat.Vtt, charsPerCaption: 32);
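You can then save the returned subtitle strings wherever you need them, for example to local files (the file names here are only illustrative):
await File.WriteAllTextAsync("subtitles.srt", srt);
await File.WriteAllTextAsync("subtitles.vtt", vtt);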
You can retrieve transcripts that are automatically segmented into paragraphs or sentences, for a more reader-friendly experience.
The text of the transcript is broken down by either paragraphs or sentences, along with additional metadata.
var transcript = await client.Transcripts.TranscribeAsync(new TranscriptParams
{
    AudioUrl = audioUrl
});
var sentencesResponse = await client.Transcripts.GetSentencesAsync(transcript.Id);
foreach (var sentence in sentencesResponse.Sentences) {
    Console.WriteLine(sentence.Text);
}
var paragraphsResponse = await client.Transcripts.GetParagraphsAsync(transcript.Id);
foreach (var paragraph in paragraphsResponse.Paragraphs) {
    Console.WriteLine(paragraph.Text);
}
The response is an array of objects, each representing a sentence or a paragraph in the transcript. See the API reference for more info.
The following filler words are removed by default:
- "um"
- "uh"
- "hmm"
- "mhm"
- "uh-huh"
- "ah"
- "huh"
- "hm"
- "m"
If you want to keep filler words in the transcript, you can set the disfluencies parameter to true in the transcription config.
var transcript = await client.Transcripts.TranscribeAsync(new TranscriptParams
{
    AudioUrl = audioUrl,
    Disfluencies = true
});
You can automatically filter out profanity from the transcripts by setting filter_profanity to true in your transcription config.
Any profanity in the returned text will be replaced with asterisks.
var transcript = await client.Transcripts.TranscribeAsync(new TranscriptParams
{
    AudioUrl = audioUrl,
    FilterProfanity = true
});
If you only want to transcribe a portion of your file, you can set the audio_start_from and the audio_end_at parameters (in milliseconds) in your transcription config.
var transcript = await client.Transcripts.TranscribeAsync(new TranscriptParams
{
    AudioUrl = audioUrl,
    AudioStartFrom = 5000,
    AudioEndAt = 15000
});
To only transcribe files that contain at least a specified percentage of spoken audio, you can set the speech_threshold parameter. You can pass any value between 0 and 1.
If the percentage of speech in the audio file is below the provided threshold, the value of text is null and the response contains an error message:
Audio speech threshold 0.9461 is below the requested speech threshold value 1.0
var transcript = await client.Transcripts.TranscribeAsync(new TranscriptParams
{
    AudioUrl = audioUrl,
    SpeechThreshold = 0.1f
});
You can search through a completed transcript for a specific set of keywords, which is useful for quickly finding relevant information.
The parameter can be a list of words, numbers, or phrases of up to five words.
var transcript = await client.Transcripts.TranscribeAsync(new TranscriptParams
{
    AudioUrl = audioUrl,
});
var matchesResponse = await client.Transcripts.WordSearchAsync(
    transcript.Id,
    ["foo", "bar", "foo bar", "42"]
);
foreach (var match in matchesResponse.Matches)
{
    Console.WriteLine($"Found '{match.Text}' {match.Count} times in the transcript");
}
You can remove the data from the transcript and mark it as deleted.
await client.Transcripts.DeleteAsync("1234");
As of the feature launch date:
- The TTL is set to 3 days (subject to change).
- Customers can still manually delete transcripts before the TTL period by using the deletion endpoint. However, they cannot keep transcripts on the platform after the TTL period has expired.
BAAs are limited to customers who process PHI, subject to HIPAA. If you are processing PHI and require a BAA, please reach out to [email protected].
The Speaker Diarization model lets you detect multiple speakers in an audio file and what each speaker said.
If you enable Speaker Diarization, the resulting transcript will return a list of utterances, where each utterance corresponds to an uninterrupted segment of speech from a single speaker.
Speaker Diarization doesn't support multichannel transcription. Enabling both Speaker Diarization and multichannel will result in an error.
Quickstart
To enable Speaker Diarization, set SpeakerLabels to true in the transcription parameters.
using AssemblyAI;
using AssemblyAI.Transcripts;
var client = new AssemblyAIClient("<YOUR_API_KEY>");
// You can use a local file:
/*
var transcript = await client.Transcripts.TranscribeAsync(
    new FileInfo("./example.mp3"),
    new TranscriptOptionalParams
    {
        SpeakerLabels = true
    }
);
*/
// Or use a publicly-accessible URL:
const string audioUrl = "https://assembly.ai/wildfires.mp3";
var transcript = await client.Transcripts.TranscribeAsync(new TranscriptParams
{
    AudioUrl = audioUrl,
    SpeakerLabels = true
});
foreach (var utterance in transcript.Utterances!)
{
    Console.WriteLine($"Speaker {utterance.Speaker}: {utterance.Text}");
}
Example output
Speaker A: Smoke from hundreds of wildfires in Canada is triggering air quality alerts throughout the US. Skylines from Maine to Maryland to Minnesota are gray and smoggy. And in some places, the air quality warnings include the warning to stay inside. We wanted to better understand what's happening here and why, so we called Peter DiCarlo, an associate professor in the Department of Environmental Health and Engineering at Johns Hopkins University. Good morning, professor.
Speaker B: Good morning.
Speaker A: So what is it about the conditions right now that have caused this round of wildfires to affect so many people so far away?
Speaker B: Well, there's a couple of things. The season has been pretty dry already, and then the fact that we're getting hit in the US. Is because there's a couple of weather systems that are essentially channeling the smoke from those Canadian wildfires through Pennsylvania into the Mid Atlantic and the Northeast and kind of just dropping the smoke there.
Speaker A: So what is it in this haze that makes it harmful? And I'm assuming it is.
...
Set number of speakers
If you know the number of speakers in advance, you can improve the diarization performance by setting the SpeakersExpected parameter.
var transcript = await client.Transcripts.TranscribeAsync(new TranscriptParams
{
    AudioUrl = audioUrl,
    SpeakerLabels = true,
    SpeakersExpected = 3
});
The SpeakersExpected parameter is ignored for audio files with a duration less than 2 minutes.