22 Aug 14:38

reuben

02e4c76

DeepSpeech 0.8.2

General

This is the 0.8.2 release of Deep Speech, an open speech-to-text engine. In accord with semantic versioning, this version is not completely backwards compatible with earlier versions. However, models exported for 0.7.X should work with this release. As with previous releases, this release includes the source code:

v0.8.2.tar.gz

and the acoustic models:

deepspeech-0.8.2-models.pbmm
deepspeech-0.8.2-models.tflite

all under the MPL-2.0 license.

The model with the ".pbmm" extension is memory mapped and thus memory efficient and fast to load. The model with the ".tflite" extension is converted to use TFLite, has post-training quantization enabled, and is more suitable for resource constrained environments.

The acoustic models were trained on American English, and the pbmm model achieves a 5.97% word error rate on the LibriSpeech clean test corpus.
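For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The following is a minimal sketch of that definition, not the project's evaluation script; the function name `word_error_rate` is hypothetical, and the release's own scoring may differ in details such as text normalization or corpus-level aggregation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a 5.97% word error rate corresponds to roughly six word-level edits per hundred reference words.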

Note that the model currently performs best in low-noise environments with clear recordings and has a bias towards US male accents. This does not mean the model cannot be used outside of these conditions, but that accuracy may be lower. Some users may need to train the model further to meet their intended use-case.

In addition we release the scorer:

deepspeech-0.8.2-models.scorer

which takes the place of the language model and trie in older releases and which is also under the MPL-2.0 license.

We also include example audio files:

audio-0.8.2.tar.gz

which can be used to test the engine, and checkpoint files:

deepspeech-0.8.2-checkpoint.tar.gz

which are under the MPL-2.0 license and can be used as the basis for further fine-tuning.
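The model, scorer, and example audio listed above can be exercised from Python roughly as follows. This is a sketch under assumptions: it presumes the `deepspeech` and `numpy` packages are installed, that the audio is 16-bit mono PCM, and the helper names `read_wav` and `transcribe` are hypothetical. The `Model`, `enableExternalScorer`, `sampleRate`, and `stt` calls are the ones used by the project's Python client; consult the documentation for the authoritative API.

```python
import wave


def read_wav(path):
    """Return (sample_rate, raw_pcm_bytes) for a 16-bit mono WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM audio")
        return wav.getframerate(), wav.readframes(wav.getnframes())


def transcribe(model_path, scorer_path, wav_path):
    """Run speech-to-text on one WAV file and return the transcript."""
    # Imported lazily so read_wav() can be used without these packages.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    model.enableExternalScorer(scorer_path)

    sample_rate, pcm = read_wav(wav_path)
    if sample_rate != model.sampleRate():
        # This sketch does not resample; it simply refuses mismatched audio.
        raise ValueError(
            f"audio is {sample_rate} Hz, model expects {model.sampleRate()} Hz"
        )
    return model.stt(np.frombuffer(pcm, dtype=np.int16))
```

A call might look like `transcribe("deepspeech-0.8.2-models.pbmm", "deepspeech-0.8.2-models.scorer", "audio/example.wav")`, where `audio/example.wav` is a placeholder for one of the files unpacked from audio-0.8.2.tar.gz.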

Notable changes from the previous release

Fixed incorrect minimum OS version in macOS binaries (#3259)
Fixed bug in metadata output for Python package client (#3264)
Added ElectronJS v9.2 support (#3266)

Training Regimen + Hyperparameters for fine-tuning

The hyperparameters used to train the model are useful for fine-tuning. Thus, we document them here along with the training regimen, the hardware used (a server with 8 Quadro RTX 6000 GPUs, each with 24GB of VRAM), and our use of cuDNN RNN.

In contrast to some previous releases, training for this release occurred in several phases, each with a lower learning rate than the phase before it.

The initial phase used the hyperparameters:

train_files Fisher, LibriSpeech, Switchboard, Common Voice English, and approximately 1700 hours of transcribed WAMU (NPR) radio shows explicitly licensed to use as training corpora.
dev_files LibriSpeech clean dev corpus.
test_files LibriSpeech clean test corpus
train_batch_size 128
dev_batch_size 128
test_batch_size 128
n_hidden 2048
learning_rate 0.0001
dropout_rate 0.40
epochs 125

The weights with the best validation loss were selected at the end of 125 epochs using --noearly_stop.

The second phase was started using the weights with the best validation loss from the previous phase. This second phase used the same hyperparameters as the first but with the following changes:

learning_rate 0.00001
epochs 100

The weights with the best validation loss were selected at the end of 100 epochs using --noearly_stop.

Like the second, the third phase was started using the weights with the best validation loss from the previous phase. This third phase used the same hyperparameters as the second but with the following changes:

learning_rate 0.000005

The weights with the best validation loss were selected at the end of 100 epochs using --noearly_stop. The model selected under this process was trained for a sum total of 732522 steps over all phases.
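The three-phase regimen described above can be summarized as a small data structure. This is only an illustrative sketch: the values are taken from these notes, but the names `BASE`, `PHASES`, and `phase_flags` are hypothetical, and rendering each hyperparameter as a `--name value` pair is an assumption about the training script's command line. Dataset paths are omitted because the notes do not give them.

```python
# Hyperparameters shared by every phase, as listed in these notes.
BASE = {
    "train_batch_size": 128,
    "dev_batch_size": 128,
    "test_batch_size": 128,
    "n_hidden": 2048,
    "dropout_rate": 0.40,
}

# Per-phase overrides: each phase restarts from the previous phase's
# best-validation-loss weights and lowers the learning rate.
PHASES = [
    {"learning_rate": 0.0001, "epochs": 125},
    {"learning_rate": 0.00001, "epochs": 100},
    {"learning_rate": 0.000005, "epochs": 100},
]


def phase_flags(phase_index):
    """Return a flag list for one phase (the flag layout is an assumption)."""
    params = {**BASE, **PHASES[phase_index]}
    flags = []
    for name, value in params.items():
        flags += [f"--{name}", str(value)]
    flags.append("--noearly_stop")  # named explicitly in these notes
    return flags


if __name__ == "__main__":
    for index in range(len(PHASES)):
        print(f"phase {index + 1}:", " ".join(phase_flags(index)))
```

Keeping the shared values separate from the per-phase overrides makes it easy to see that only the learning rate and epoch count change between phases.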

Subsequently, lm_optimizer.py was run with the following parameters:

lm_alpha_max 5
lm_beta_max 5
n_trials 2400
test_files LibriSpeech clean dev corpus.

to determine the optimal lm_alpha and lm_beta with respect to the LibriSpeech clean dev corpus. This resulted in:

lm_alpha 0.931289039105002
lm_beta 1.1834137581510284
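To give a sense of what these two values control: in the scoring formulation commonly associated with Deep Speech decoding, `lm_alpha` weights the language model log-probability and `lm_beta` rewards each additional word in a candidate transcript. The sketch below assumes that formulation; the function name `combined_score` is hypothetical, and the actual decoder in the native client may differ in its details.

```python
# Values reported in these release notes.
LM_ALPHA = 0.931289039105002
LM_BETA = 1.1834137581510284


def combined_score(acoustic_logprob, lm_logprob, word_count,
                   alpha=LM_ALPHA, beta=LM_BETA):
    """Score a candidate transcript (assumed formulation, see lead-in).

    acoustic_logprob: log-probability from the acoustic model
    lm_logprob:       log-probability from the language model (scorer)
    word_count:       number of words in the candidate
    """
    return acoustic_logprob + alpha * lm_logprob + beta * word_count
```

With `alpha` and `beta` both set to zero the score reduces to the acoustic log-probability alone, which is one way to see what the scorer contributes during beam search.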

Bindings

This release also includes a Python-based command line tool, deepspeech, installed through:

pip install deepspeech

Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:

pip install deepspeech-gpu

On Linux, macOS and Windows, the DeepSpeech package does not use TFLite by default. A TFLite version of the package on those platforms is available as:

pip install deepspeech-tflite

It also exposes bindings for the following languages:

Python (Versions 3.5, 3.6, 3.7 and 3.8) installed via
```
pip install deepspeech
```
Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:
```
pip install deepspeech-gpu
```
On Linux (AMD64), macOS and Windows, the DeepSpeech package does not use TFLite by default. A TFLite version of the package on those platforms is available as:
```
pip install deepspeech-tflite
```
NodeJS (Versions 10.x, 11.x, 12.x, 13.x and 14.x) installed via
```
npm install deepspeech
```
Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:
```
npm install deepspeech-gpu
```
On Linux (AMD64), macOS and Windows, the DeepSpeech package does not use TFLite by default. A TFLite version of the package on those platforms is available as:
```
npm install deepspeech-tflite
```
ElectronJS versions 5.0, 6.0, 6.1, 7.0, 7.1, 8.0, 9.0, 9.1 and 9.2 are also supported.
C, which requires that the appropriate shared objects be installed from native_client.tar.xz (see the section in the main README that describes native_client.tar.xz installation).
.NET, which is installed by following the instructions on the NuGet package page.

In addition, there are third-party bindings supported by external developers, for example:

Rust, which is installed by following the instructions on the external Rust repo.
Go, which is installed by following the instructions on the external Go repo.
V, which is installed by following the instructions on the external Vlang repo.

Supported Platforms

Windows 8.1, 10, and Server 2012 R2, 64-bit (requires at least AVX support and the Visual C++ 2015 Update 3 Redistributable (64-bit) at runtime).
OS X 10.10, 10.11, 10.12, 10.13, 10.14, and 10.15
Linux x86 64 bit with a modern CPU (at least AVX/FMA)
Linux x86 64 bit with a modern CPU (at least AVX/FMA) + NVIDIA GPU (Compute Capability at least 3.0, see NVIDIA docs)
Raspbian Buster on Raspberry Pi 3, Pi 4
Linux/ARM64 built against Debian/ARMbian Buster and tested on LePotato boards
Java Android (7.0-11.0) bindings (+ demo app). Tested on Google Pixel 2, Sony Xperia Z Premium, and Nokia 1.3; TFLite model only.
iOS with Swift bindings (experimental). Tested on iPhone Xs.
The TFLite Delegation API is included as a preview: do not expect released models to work out of the box, but feedback and PRs are welcome.
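On Linux, the AVX/FMA requirement listed above can be checked before installing. The sketch below parses the text of `/proc/cpuinfo` and looks for the corresponding CPU flags; the function name `has_required_cpu_flags` is hypothetical, and the assumption that the kernel reports these features as `avx` and `fma` on a `flags` line applies to x86 Linux only.

```python
def has_required_cpu_flags(cpuinfo_text, required=("avx", "fma")):
    """Return True if every required flag appears on the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # Compare whole tokens so that 'avx2' does not count as 'avx'.
            present = set(value.split())
            return all(flag in present for flag in required)
    return False  # no 'flags' line found (for example, not x86 Linux)
```

On a Linux machine it could be called as `has_required_cpu_flags(open("/proc/cpuinfo").read())`. The GPU compute capability requirement is not covered by this check.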

Documentation

Documentation is available on deepspeech.readthedocs.io.

Contact/Getting Help

FAQ - We have a list of common questions, and their answers, in our FAQ. When just getting started, it's best to first check the FAQ to see if your question is addressed.
Discourse Forums - If your question is not addressed in the FAQ, the Discourse Forums is the next place to look. They contain conversations on [General Topics](https://discou...


18 Aug 14:09

v0.9.0-alpha.7

02afc2a

v0.9.0-alpha.7 Pre-release


Merge pull request #3254 from lissyx/bump-v0.9.0a7

Bump VERSION to 0.9.0-alpha.7


12 Aug 18:15

v0.9.0-alpha.6

a6f40a3

v0.9.0-alpha.6 Pre-release


Merge pull request #3244 from lissyx/bump-v0.9.0a6

Bump VERSION to 0.9.0-alpha.6


11 Aug 08:25

lissyx

v0.8.1

fa883eb

DeepSpeech 0.8.1

General

This is the 0.8.1 release of Deep Speech, an open speech-to-text engine. In accord with semantic versioning, this version is not completely backwards compatible with earlier versions. However, models exported for 0.7.X should work with this release. As with previous releases, this release includes the source code:

v0.8.1.tar.gz

and the acoustic models:

deepspeech-0.8.1-models.pbmm
deepspeech-0.8.1-models.tflite

all under the MPL-2.0 license.

The acoustic models were trained on American English, and the pbmm model achieves a 5.97% word error rate on the LibriSpeech clean test corpus.

In addition we release the scorer:

deepspeech-0.8.1-models.scorer

which takes the place of the language model and trie in older releases and which is also under the MPL-2.0 license.

We also include example audio files:

audio-0.8.1.tar.gz

which can be used to test the engine, and checkpoint files:

deepspeech-0.8.1-checkpoint.tar.gz

which are under the MPL-2.0 license and can be used as the basis for further fine-tuning.

Notable changes from the previous release

Fixed references to older models in the docs and swift code (#3216)
Fixed incorrect linkage, -shared was forced (#3207)

Training Regimen + Hyperparameters for fine-tuning

In contrast to some previous releases, training for this release occurred in several phases, each with a lower learning rate than the phase before it.

The initial phase used the hyperparameters:

train_files Fisher, LibriSpeech, Switchboard, Common Voice English, and approximately 1700 hours of transcribed WAMU (NPR) radio shows explicitly licensed to use as training corpora.
dev_files LibriSpeech clean dev corpus.
test_files LibriSpeech clean test corpus
train_batch_size 128
dev_batch_size 128
test_batch_size 128
n_hidden 2048
learning_rate 0.0001
dropout_rate 0.40
epochs 125

The weights with the best validation loss were selected at the end of 125 epochs using --noearly_stop.

The second phase was started using the weights with the best validation loss from the previous phase. This second phase used the same hyperparameters as the first but with the following changes:

learning_rate 0.00001
epochs 100

The weights with the best validation loss were selected at the end of 100 epochs using --noearly_stop.

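Like the second, the third phase was started using the weights with the best validation loss from the previous phase. This third phase used the same hyperparameters as the second but with the following changes:

learning_rate 0.000005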

Subsequently, lm_optimizer.py was run with the following parameters:

lm_alpha_max 5
lm_beta_max 5
n_trials 2400
test_files LibriSpeech clean dev corpus.

to determine the optimal lm_alpha and lm_beta with respect to the LibriSpeech clean dev corpus. This resulted in:

lm_alpha 0.931289039105002
lm_beta 1.1834137581510284

Bindings

This release also includes a Python-based command line tool, deepspeech, installed through:

pip install deepspeech

Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:

pip install deepspeech-gpu

On Linux, macOS and Windows, the DeepSpeech package does not use TFLite by default. A TFLite version of the package on those platforms is available as:

pip install deepspeech-tflite

It also exposes bindings for the following languages:

Python (Versions 3.5, 3.6, 3.7 and 3.8) installed via
```
pip install deepspeech
```
Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:
```
pip install deepspeech-gpu
```
On Linux (AMD64), macOS and Windows, the DeepSpeech package does not use TFLite by default. A TFLite version of the package on those platforms is available as:
```
pip install deepspeech-tflite
```
NodeJS (Versions 10.x, 11.x, 12.x, 13.x and 14.x) installed via
```
npm install deepspeech
```
Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:
```
npm install deepspeech-gpu
```
On Linux (AMD64), macOS and Windows, the DeepSpeech package does not use TFLite by default. A TFLite version of the package on those platforms is available as:
```
npm install deepspeech-tflite
```
ElectronJS versions 5.0, 6.0, 6.1, 7.0, 7.1, 8.0, 9.0, and 9.1 are also supported.
C, which requires that the appropriate shared objects be installed from native_client.tar.xz (see the section in the main README that describes native_client.tar.xz installation).
.NET, which is installed by following the instructions on the NuGet package page.

In addition, there are third-party bindings supported by external developers, for example:

Rust, which is installed by following the instructions on the external Rust repo.
Go, which is installed by following the instructions on the external Go repo.
V, which is installed by following the instructions on the external Vlang repo.

Supported Platforms

Windows 8.1, 10, and Server 2012 R2, 64-bit (requires at least AVX support and the Visual C++ 2015 Update 3 Redistributable (64-bit) at runtime).
OS X 10.10, 10.11, 10.12, 10.13, 10.14, and 10.15
Linux x86 64 bit with a modern CPU (at least AVX/FMA)
Linux x86 64 bit with a modern CPU (at least AVX/FMA) + NVIDIA GPU (Compute Capability at least 3.0, see NVIDIA docs)
Raspbian Buster on Raspberry Pi 3, Pi 4
Linux/ARM64 built against Debian/ARMbian Buster and tested on LePotato boards
Java Android (7.0-11.0) bindings (+ demo app). Tested on Google Pixel 2, Sony Xperia Z Premium, and Nokia 1.3; TFLite model only.
iOS with Swift bindings (experimental). Tested on iPhone Xs.

Documentation

Documentation is available on deepspeech.readthedocs.io.

Contact/Getting Help

FAQ - We have a list of common questions, and their answers, in our FAQ. When just getting started, it's best to first check the FAQ to see if your question is addressed.
Discourse Forums - If your question is not addressed in the FAQ, the Discourse Forums is the next place to look. They contain conversations on General Topics, Using Deep Speech, Alternative Platforms, and [Deep Speech Developm...


07 Aug 12:44

v0.9.0-alpha.5

457198c

v0.9.0-alpha.5 Pre-release


Merge pull request #3232 from lissyx/bump-v0.9.0-alpha.5

Bump VERSION to 0.9.0-alpha.5


06 Aug 13:25

v0.9.0-alpha.4

c31df0f

v0.9.0-alpha.4 Pre-release


Bump VERSION to 0.9.0-alpha.4


30 Jul 17:16

kdavis-mozilla

v0.8.0

f56b07d

DeepSpeech 0.8.0

General

This is the 0.8.0 release of Deep Speech, an open speech-to-text engine. In accord with semantic versioning, this version is not completely backwards compatible with earlier versions. However, models exported for 0.7.X should work with this release. As with previous releases, this release includes the source code:

v0.8.0.tar.gz

and the acoustic models:

deepspeech-0.8.0-models.pbmm
deepspeech-0.8.0-models.tflite

all under the MPL-2.0 license.

The acoustic models were trained on American English, and the pbmm model achieves a 5.97% word error rate on the LibriSpeech clean test corpus.

In addition we release the scorer:

deepspeech-0.8.0-models.scorer

which takes the place of the language model and trie in older releases and which is also under the MPL-2.0 license.

We also include example audio files:

audio-0.8.0.tar.gz

which can be used to test the engine, and checkpoint files:

deepspeech-0.8.0-checkpoint.tar.gz

which are under the MPL-2.0 license and can be used as the basis for further fine-tuning.

Notable changes from the previous release

Removed scorer file from Git LFS (#3192)
Added iOS microphone streaming (#3191)
Added ability to reverse data set order to quickly probe OOM conditions (#3177)
Built and published iOS framework in GitHub release files (#3173)
Added iOS support (#3150)
Added CSV output to SDB building (#3147)
Added augmentation support to SDB building (#3145)
Fixed some style inconsistencies in Java bindings (#3135)
Added methods to check for label presence in the Alphabet (#3131)
Fixed some regressions from Alphabet refactoring (#3125)
Re-wrote generate_package.py in C++ to avoid training dependencies (#3113)
Added building of kenlm in training container image (#3108)
Added TensorFlow as a submodule (#3107)
Used TensorFlow r2.2 and built TFLite with Ruy (enables threaded computations on TFLite models) (#2952)
Enabled TFLite delegate support (#3100)
Added UWP NuGet packaging support (#3100)
Added warp augmentation (#3091)
Fixed overlay augmentation hang after first epoch (#3090)

Training Regimen + Hyperparameters for fine-tuning

In contrast to some previous releases, training for this release occurred in several phases, each with a lower learning rate than the phase before it.

The initial phase used the hyperparameters:

train_files Fisher, LibriSpeech, Switchboard, Common Voice English, and approximately 1700 hours of transcribed WAMU (NPR) radio shows explicitly licensed to use as training corpora.
dev_files LibriSpeech clean dev corpus.
test_files LibriSpeech clean test corpus
train_batch_size 128
dev_batch_size 128
test_batch_size 128
n_hidden 2048
learning_rate 0.0001
dropout_rate 0.40
epochs 125

The weights with the best validation loss were selected at the end of 125 epochs using --noearly_stop.

The second phase was started using the weights with the best validation loss from the previous phase. This second phase used the same hyperparameters as the first but with the following changes:

learning_rate 0.00001
epochs 100

The weights with the best validation loss were selected at the end of 100 epochs using --noearly_stop.

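Like the second, the third phase was started using the weights with the best validation loss from the previous phase. This third phase used the same hyperparameters as the second but with the following changes:

learning_rate 0.000005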

Subsequently, lm_optimizer.py was run with the following parameters:

lm_alpha_max 5
lm_beta_max 5
n_trials 2400
test_files LibriSpeech clean dev corpus.

to determine the optimal lm_alpha and lm_beta with respect to the LibriSpeech clean dev corpus. This resulted in:

lm_alpha 0.931289039105002
lm_beta 1.1834137581510284

Bindings

This release also includes a Python-based command line tool, deepspeech, installed through:

pip install deepspeech

Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:

pip install deepspeech-gpu

On Linux, macOS and Windows, the DeepSpeech package does not use TFLite by default. A TFLite version of the package on those platforms is available as:

pip install deepspeech-tflite

It also exposes bindings for the following languages:

Python (Versions 3.5, 3.6, 3.7 and 3.8) installed via
```
pip install deepspeech
```
Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:
```
pip install deepspeech-gpu
```
On Linux (AMD64), macOS and Windows, the DeepSpeech package does not use TFLite by default. A TFLite version of the package on those platforms is available as:
```
pip install deepspeech-tflite
```
NodeJS (Versions 10.x, 11.x, 12.x, 13.x and 14.x) installed via
```
npm install deepspeech
```
Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:
```
npm install deepspeech-gpu
```
On Linux (AMD64), macOS and Windows, the DeepSpeech package does not use TFLite by default. A TFLite version of the package on those platforms is available as:
```
npm install deepspeech-tflite
```
ElectronJS versions 5.0, 6.0, 6.1, 7.0, 7.1, 8.0, 9.0, and 9.1 are also supported.
C, which requires that the appropriate shared objects be installed from native_client.tar.xz (see the section in the main README that describes native_client.tar.xz installation).
.NET, which is installed by following the instructions on the NuGet package page.

In addition, there are third-party bindings supported by external developers, for example:

Rust, which is installed by following the instructions on the external Rust repo.
Go which is installed by following the...


27 Jul 14:30

v0.8.0-alpha.8

03b5689

v0.8.0-alpha.8 Pre-release


Merge pull request #3186 from lissyx/update-r0.8

Update r0.8


15 Jul 20:33

v0.9.0-alpha.3

78ae08c

v0.9.0-alpha.3 Pre-release


Merge pull request #3161 from lissyx/bump-v0.9.0-alpha.3

Bump VERSION to 0.9.0-alpha.3


15 Jul 22:18

v0.8.0-alpha.7

32e185f

v0.8.0-alpha.7 Pre-release


Merge pull request #3162 from lissyx/update-r0.8

Update r0.8



Releases: mozilla/DeepSpeech
