Update SpeechToText docs, Add OfflineSpeechToText #489

Draft. Wants to merge 2 commits into base: main.
91 changes: 42 additions & 49 deletions docs/maui/essentials/speech-to-text.md
@@ -7,7 +7,7 @@ ms.date: 05/26/2023

# SpeechToText

The `SpeechToText` API provides the ability to convert speech to text.
The `SpeechToText` API provides the ability to convert speech to text using online recognition. For offline recognition, use `OfflineSpeechToText`.

![Screenshot of SpeechText implemented on macOS](../images/essentials/speech-to-text-mac.gif "SpeechToText on macOS")

@@ -20,6 +20,8 @@ Add permissions to `AndroidManifest.xml`:
<uses-permission android:name="android.permission.RECORD_AUDIO" />
```

For `OfflineSpeechToText`, Android 13 (API level 33) or higher is required.

# [iOS/MacCatalyst](#tab/ios)

Add permissions to `Info.plist`
@@ -31,6 +33,8 @@ Add permissions to `Info.plist`
<string>SpeechToText requires microphone usage</string>
```

For `OfflineSpeechToText`, iOS 13.0 or higher is required.

# [Windows](#tab/windows)

Add permissions to `Package.appxmanifest`
@@ -57,36 +61,6 @@ Add permissions to `tizen-manifest.xml`:

The `SpeechToText` can be used as follows in C#:

```csharp
async Task Listen(CancellationToken cancellationToken)
{
var isGranted = await speechToText.RequestPermissions(cancellationToken);
if (!isGranted)
{
await Toast.Make("Permission not granted").Show(CancellationToken.None);
return;
}

var recognitionResult = await speechToText.ListenAsync(
CultureInfo.GetCultureInfo(Language),
new Progress<string>(partialText =>
{
RecognitionText += partialText + " ";
}), cancellationToken);

if (recognitionResult.IsSuccessful)
{
RecognitionText = recognitionResult.Text;
}
else
{
await Toast.Make(recognitionResult.Exception?.Message ?? "Unable to recognize speech").Show(CancellationToken.None);
}
}
```

or using events:

```csharp
async Task StartListening(CancellationToken cancellationToken)
{
@@ -99,7 +73,7 @@ async Task StartListening(CancellationToken cancellationToken)

speechToText.RecognitionResultUpdated += OnRecognitionTextUpdated;
speechToText.RecognitionResultCompleted += OnRecognitionTextCompleted;
await speechToText.StartListenAsync(CultureInfo.CurrentCulture, CancellationToken.None);
await speechToText.StartListenAsync(new SpeechToTextOptions { Culture = CultureInfo.CurrentCulture, ShouldReportPartialResults = true }, CancellationToken.None);
}

async Task StopListening(CancellationToken cancellationToken)
@@ -125,31 +99,48 @@ void OnRecognitionTextCompleted(object? sender, SpeechToTextRecognitionResultCom
|Method |Description |
|---------|---------|
| RequestPermissions | Asks for permission. |
| ListenAsync | Starts speech recognition. |
| StartListenAsync | Starts the SpeechToText service. Real-time speech recognition results are surfaced via `RecognitionResultUpdated` and `RecognitionResultCompleted`. |
| StopListenAsync | Stops the SpeechToText service. The final speech recognition result is surfaced via `RecognitionResultCompleted`. |

### SpeechToTextResult

The result returned from the `ListenAsync` method. This can be used to verify whether the recognition was successful, and also access any exceptions that may have occurred during the speech recognition.

#### Properties
## Properties

|Property |Type |Description |
|---------|---------|---------|
| Text | `string` | The recognized text. |
| Exception | `Exception` | Gets the `Exception` if the speech recognition operation failed. |
| IsSuccessful | `bool` | Gets a value determining whether the operation was successful. |
| CurrentState | `SpeechToTextState` | Gets the current listening state. |

#### Events
## Events

|EventName |EventArgs |Description |
|---------|---------|---------|
| RecognitionResultUpdated | `SpeechToTextRecognitionResultUpdatedEventArgs` | Triggers when SpeechToText has real-time updates. |
| RecognitionResultCompleted | `SpeechToTextRecognitionResultCompletedEventArgs` | Triggers when SpeechToText has completed. |
| StateChanged | `SpeechToTextStateChangedEventArgs` | Triggers when `CurrentState` has changed. |
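
The `StateChanged` event and `CurrentState` property above can be combined to keep UI state in sync with the recognizer. The following is a minimal sketch, not toolkit code: `ListeningStateTracker` and `StateText` are hypothetical names, and the `CommunityToolkit.Maui.Media` namespace for the event args is an assumption. It reads `CurrentState` from the service rather than relying on any particular property of the event args.

```csharp
using System;
using CommunityToolkit.Maui.Media; // assumed namespace for ISpeechToText and its event args

// Hypothetical helper (not part of the toolkit): mirrors the listening state into a string.
public sealed class ListeningStateTracker : IDisposable
{
    readonly ISpeechToText speechToText;

    public ListeningStateTracker(ISpeechToText speechToText)
    {
        this.speechToText = speechToText;
        this.speechToText.StateChanged += OnStateChanged;
    }

    // Text representation of the most recently observed state.
    public string StateText { get; private set; } = string.Empty;

    void OnStateChanged(object? sender, SpeechToTextStateChangedEventArgs args)
    {
        // Read the state from the service itself, as documented in the Properties table.
        StateText = speechToText.CurrentState.ToString();
    }

    // Unsubscribe so the tracker does not keep receiving events after disposal.
    public void Dispose() => speechToText.StateChanged -= OnStateChanged;
}
```

Unsubscribing in `Dispose` matters when the service is registered as a singleton, since the service would otherwise hold a reference to every tracker ever created.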


### SpeechToTextOptions

The `SpeechToTextOptions` class provides the ability to configure the speech recognition service.

#### Properties

|Property |Type |Description |
|---------|---------|---------|
| Culture | `CultureInfo` | The spoken language to use for speech recognition. |
| ShouldReportPartialResults | `bool` | Gets or sets whether partial results are reported. `true` by default. |


### SpeechToTextResult

The result returned from the `RecognitionResultCompleted` event. This can be used to verify whether the recognition was successful, and also access any exceptions that may have occurred during the speech recognition.

#### Properties

|Property |Type |Description |
|---------|---------|---------|
| Text | `string` | The recognized text. |
| Exception | `Exception` | Gets the `Exception` if the speech recognition operation failed. |
| IsSuccessful | `bool` | Gets a value determining whether the operation was successful. |

#### Methods

|Method |Description |
@@ -159,6 +150,7 @@ The result returned from the `ListenAsync` method. This can be used to verify wh
> [!WARNING]
> `EnsureSuccess` will throw an `Exception` if the recognition operation was unsuccessful.
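
Because `EnsureSuccess` throws, callers usually either wrap it in a `try`/`catch` or check `IsSuccessful` first. The sketch below shows both styles side by side. `RecognitionResultHelpers` and its methods are hypothetical names introduced for illustration, and the `CommunityToolkit.Maui.Media` namespace is an assumption.

```csharp
using System;
using CommunityToolkit.Maui.Media; // assumed namespace for SpeechToTextResult

// Hypothetical helpers (not part of the toolkit) showing two ways to consume a result.
public static class RecognitionResultHelpers
{
    // Exception-based style: EnsureSuccess throws when recognition failed.
    public static string GetTextOrThrowMessage(SpeechToTextResult result)
    {
        try
        {
            result.EnsureSuccess();
            return result.Text ?? string.Empty;
        }
        catch (Exception ex)
        {
            return ex.Message;
        }
    }

    // Check-based style: inspect IsSuccessful and Exception without throwing.
    public static string GetTextOrFallback(SpeechToTextResult result)
    {
        return result.IsSuccessful
            ? result.Text ?? string.Empty
            : result.Exception?.Message ?? "Unable to recognize speech";
    }
}
```

The check-based style avoids the cost of throwing in the common "no speech detected" case, while `EnsureSuccess` suits call sites that already handle errors through exceptions.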


## Dependency Registration

If you want to inject the service, you first need to register it.
@@ -174,12 +166,16 @@ public static class MauiProgram
.UseMauiApp<App>()
.UseMauiCommunityToolkit();

builder.Services.AddSingleton<ISpeechToText>(SpeechToText.Default);
builder.Services.AddSingleton<ISpeechToText>(SpeechToText.Default);
// For offline recognition
// builder.Services.AddSingleton<IOfflineSpeechToText>(OfflineSpeechToText.Default);
return builder.Build();
}
}
```

> If you need to register both `SpeechToText` and `OfflineSpeechToText`, you can use keyed services (`KeyedService`).
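
A keyed registration could look like the sketch below. It rests on one significant assumption: that `OfflineSpeechToText.Default` can be registered under the same `ISpeechToText` interface as `SpeechToText.Default` (the commented registration in this diff uses `IOfflineSpeechToText` instead, in which case keyed services are not needed). The key names, `SpeechToTextRegistration`, and `SpeechViewModel` are hypothetical. `AddKeyedSingleton` and `[FromKeyedServices]` come from `Microsoft.Extensions.DependencyInjection` in .NET 8 or later.

```csharp
using CommunityToolkit.Maui.Media; // assumed namespace for the speech-to-text types
using Microsoft.Extensions.DependencyInjection;

// Hypothetical registration helper; key names are illustrative only.
public static class SpeechToTextRegistration
{
    public const string OnlineKey = "online";
    public const string OfflineKey = "offline";

    public static IServiceCollection AddSpeechToTextServices(this IServiceCollection services)
    {
        services.AddKeyedSingleton<ISpeechToText>(OnlineKey, SpeechToText.Default);
        // Assumption: OfflineSpeechToText.Default is assignable to ISpeechToText.
        services.AddKeyedSingleton<ISpeechToText>(OfflineKey, OfflineSpeechToText.Default);
        return services;
    }
}

// Hypothetical consumer: each constructor parameter picks an implementation by key.
public class SpeechViewModel
{
    readonly ISpeechToText online;
    readonly ISpeechToText offline;

    public SpeechViewModel(
        [FromKeyedServices(SpeechToTextRegistration.OnlineKey)] ISpeechToText online,
        [FromKeyedServices(SpeechToTextRegistration.OfflineKey)] ISpeechToText offline)
    {
        this.online = online;
        this.offline = offline;
    }

    // Chooses the recognizer to use based on current connectivity.
    public ISpeechToText Select(bool hasNetwork) => hasNetwork ? online : offline;
}
```

With this in place, `builder.Services.AddSpeechToTextServices();` would replace the plain `AddSingleton` call in `MauiProgram`, and the consuming type would need to be registered with the container as well.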

Now you can inject the service like this:

```csharp
@@ -202,12 +198,7 @@ public partial class MainPage : ContentPage
return;
}

var recognitionResult = await speechToText.ListenAsync(
CultureInfo.GetCultureInfo("uk-ua"),
new Progress<string>(), cancellationToken);

recognitionResult.EnsureSuccess();
await Toast.Make($"RecognizedText: {recognitionResult.Text}").Show(cancellationToken);
await speechToText.StartListenAsync(new SpeechToTextOptions { Culture = CultureInfo.CurrentCulture, ShouldReportPartialResults = true }, CancellationToken.None);
}
}
```
@@ -216,6 +207,8 @@ public partial class MainPage : ContentPage

You can find an example of `SpeechToText` in action in the [.NET MAUI Community Toolkit Sample Application](https://github.com/CommunityToolkit/Maui/blob/main/samples/CommunityToolkit.Maui.Sample/Pages/Essentials/SpeechToTextPage.xaml).

For an example of offline recognition, see the `OfflineSpeechToText` page in the [.NET MAUI Community Toolkit Sample Application](https://github.com/CommunityToolkit/Maui/blob/main/samples/CommunityToolkit.Maui.Sample/Pages/Essentials/OfflineSpeechToTextPage.xaml).

## API

You can find the source code for `SpeechToText` over on the [.NET MAUI Community Toolkit GitHub repository](https://github.com/CommunityToolkit/Maui/blob/main/src/CommunityToolkit.Maui.Core/Essentials/SpeechToText/ISpeechToText.shared.cs).