Releases: mozilla/DeepSpeech
Deep Speech 0.1.1
General
This is the 0.1.1 release of Deep Speech, an open speech-to-text engine. This release includes source code
v0.1.1.tar.gz
and a model, not yet optimized for size,
deepspeech-0.1.1-models.tar.gz
trained on American English, which achieves a 5.6% word error rate on the LibriSpeech clean test corpus (note that the language model included some test data), and example audio
audio-0.1.1.tar.gz
which can be used to test the engine, and checkpoint files
deepspeech-0.1.1-checkpoint.tar.gz
which can be used as the basis for further fine-tuning. Unfortunately, licensing issues prevent us from releasing the text used to train the language model.
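As a minimal sketch, the model and example audio archives can be fetched and unpacked as follows, assuming the standard GitHub release asset URLs for the v0.1.1 tag:
# Download the released model and example audio (URLs assume the usual
# GitHub release asset layout for the v0.1.1 tag).
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.1.1/deepspeech-0.1.1-models.tar.gz
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.1.1/audio-0.1.1.tar.gz
# Unpack both archives into the current directory.
tar -xzf deepspeech-0.1.1-models.tar.gz
tar -xzf audio-0.1.1.tar.gz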
Notable changes from the previous release
- Rust bindings were contributed by RustAudio
- Lowered the instruction set requirement from AVX2 to AVX (mozilla/tensorflow#46)
- Pre-built binaries now work with upstream TensorFlow 1.4 (mozilla/tensorflow#43)
- Switched the GPU build to CUDA 8.0 / cuDNN v6 (mozilla/tensorflow#43)
- Added support for Node.js 7/8/9 (#1042)
- Initializing a training run from a frozen graph (e.g. a release model) is now easier (#1149); see the sketch after this list
- The Python package no longer holds the GIL during inference and can be used in multi-threaded Python programs (#1164)
- The Python package now works on macOS 10.10 and 10.11 (#1065)
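As a rough sketch of initializing a training run from the released frozen graph: the flag name --initialize_from_frozen_model and all file paths below are assumptions for illustration, not documented on this page.
# Fine-tune starting from the released frozen graph (hypothetical flag
# name; placeholder paths).
python -u DeepSpeech.py \
    --initialize_from_frozen_model models/output_graph.pb \
    --checkpoint_dir /path/to/fine_tune_checkpoints \
    --train_files my_train.csv \
    --dev_files my_dev.csv \
    --test_files my_test.csv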
Hyperparameters for fine-tuning
The hyperparameters used to train the model are useful for fine-tuning, so we document them here along with the hardware used: a two-node cluster where each node has 8 TitanX Pascal GPUs.
- train_files: Fisher, LibriSpeech, and Switchboard training corpora
- dev_files: LibriSpeech clean dev corpus
- test_files: LibriSpeech clean test corpus
- train_batch_size: 12
- dev_batch_size: 8
- test_batch_size: 8
- epoch: 13
- learning_rate: 0.0001
- display_step: 0
- validation_step: 1
- dropout_rate: 0.2367
- default_stddev: 0.046875
- checkpoint_step: 1
- log_level: 0
- checkpoint_dir: value specific to hardware setup
- wer_log_pattern: "GLOBAL LOG: logwer('${COMPUTE_ID}', '%s', '%s', %f)"
- decoder_library_path: value specific to hardware setup
- n_hidden: 2048
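A minimal sketch of passing these values to the training script, assuming each hyperparameter above maps directly to a command-line flag of the same name; the corpus CSV paths are placeholders, and the setup-specific wer_log_pattern and decoder_library_path values are omitted:
# Placeholder corpus paths; each listed hyperparameter is passed as a
# flag of the same name (this one-to-one mapping is an assumption).
python -u DeepSpeech.py \
    --train_files fisher_train.csv,librispeech_train.csv,switchboard_train.csv \
    --dev_files librispeech_dev_clean.csv \
    --test_files librispeech_test_clean.csv \
    --train_batch_size 12 \
    --dev_batch_size 8 \
    --test_batch_size 8 \
    --epoch 13 \
    --learning_rate 0.0001 \
    --display_step 0 \
    --validation_step 1 \
    --dropout_rate 0.2367 \
    --default_stddev 0.046875 \
    --checkpoint_step 1 \
    --log_level 0 \
    --checkpoint_dir /path/to/checkpoints \
    --n_hidden 2048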
Bindings
This release also includes a Python-based command-line tool, deepspeech, installed through
pip install deepspeech
Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:
pip install deepspeech-gpu
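As an illustration, a hypothetical invocation of the deepspeech tool against the released model and example audio; the positional argument order (model, audio, alphabet, language model, trie) and the models/ and audio/ paths are assumptions, not taken from this page:
# Transcribe one of the example WAV files with the released model
# (placeholder file names; argument order is an assumption).
deepspeech models/output_graph.pb audio/example.wav models/alphabet.txt models/lm.binary models/trie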
Also, the release exposes bindings for the following languages:
- Python (versions 2.7, 3.4, 3.5, and 3.6), installed via
pip install deepspeech
or, for quicker inference on a supported NVIDIA GPU on Linux, via the GPU-specific package:
pip install deepspeech-gpu
- Node.js (versions 4.x, 5.x, 6.x, 7.x, 8.x, and 9.x), installed via
npm install deepspeech
or, for quicker inference on a supported NVIDIA GPU on Linux, via the GPU-specific package:
npm install deepspeech-gpu
- C++, which requires the appropriate shared objects to be installed from
native_client.tar.xz
(See the section in the main README which describes native_client.tar.xz installation.)
In addition, there are third-party bindings supported by external developers, for example:
- Rust, which is installed by following the instructions on the external Rust repo.
Supported Platforms
- OS X 10.10, 10.11, 10.12 and 10.13
- Linux x86 64 bit with a modern CPU (Needs at least AVX/FMA)
- Linux x86 64 bit with a modern CPU + NVIDIA GPU (Compute Capability at least 3.0, see NVIDIA docs)
- Raspbian Jessie on Raspberry Pi 3
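On Linux, a quick way to check whether the CPU meets the AVX/FMA requirement is to inspect the flags reported by the kernel:
# Prints "avx" and "fma" once each if the CPU supports them; no output
# means the pre-built binaries will not run on this machine.
grep -Ewo 'avx|fma' /proc/cpuinfo | sort -u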
Contact/Getting Help
- FAQ - We have a list of common questions, and their answers, in our FAQ. When just getting started, it's best to first check the FAQ to see if your question is addressed.
- Discourse Forums - If your question is not addressed in the FAQ, the Discourse Forums is the next place to look. They contain conversations on General Topics, Using Deep Speech, Alternative Platforms, and Deep Speech Development.
- IRC - If your question is not addressed by either the FAQ or Discourse Forums, you can contact us on the #machinelearning channel on Mozilla IRC; people there can try to answer/help.
- Issues - Finally, if all else fails, you can open an issue in our repo if there is a bug with the current code base.
Contributors to 0.1.1 release
Other tags
- v0.2.0-alpha.9: Bump to v0.2.0-alpha.9
- v0.2.0-alpha.8: Merge pull request #1470 from lissyx/alpha-8 (Bump to version 0.2.0-alpha.8)
- v0.2.0-alpha.7: Merge pull request #1449 from lissyx/bump-0.2.0-a7 (Bump to version 0.2.0-alpha.7)
- v0.2.0-prod-ctcdecode: Merge pull request #1 from mozilla/master (Update fork)
- v0.2.0-prod: Merge pull request #1 from mozilla/master (Update fork)
- v0.0.1-alpha: Merge pull request #1 from mozilla/master (Update fork)
- v0.2.0-alpha.6: Merge pull request #1396 from lissyx/python-37+node-10 (Add NodeJS v10.x)
- v0.2.0-alpha.5: Merge pull request #1379 from lissyx/fix-markdown (Fix markdown)
- v0.2.0-alpha.4: Merge pull request #1377 from lissyx/packages-readme (Improve packages docs references and links)