- Spoken language identification (LId) out of the box using TensorFlow.
- Models implemented with
tf.keras
. - Metadata handling with pandas
DataFrames
. - High-performance, parallel preprocessing pipelines with
tf.data
- Simple spectral and cepstral feature extraction on the GPU with
tf.signal
. - Average detection cost (
C_avg
) implemented as atf.keras.metrics.Metric
subclass. - Angular proximity loss implemented as a
tf.keras.losses.Loss
subclass.
- You need a simple, deep learning based speech classification pipeline. For example: waveform -> VAD filter -> augment audio data -> serialize all data to a single binary file -> extract log-scale Mel-spectra or MFCC -> use DNN/CNN/LSTM/GRU/attention (etc.) to classify by signal labels
- You want to train a language vector/embedding extractor model (e.g. x-vector) on large amounts of data.
- You have a TensorFlow/Keras model that you train on the GPU and want the
tf.data.Dataset
extraction pipeline to also be on the GPU - You want an end-to-end pipeline that uses TensorFlow 2 as much as possible
- You are happy doing everything with Kaldi or some other toolkits
- You don't want to debug by reading the source code when something goes wrong
- You don't want to install TensorFlow 2 and configure its dependencies (CUDA etc.)
- You want to train phoneme recognizers or use CTC
Python 3.7 or 3.8 is required.
python3 -m pip install https://github.com/py-lidbox/lidbox/archive/master.zip
python3 -m pip install 'lidbox==1.0.0rc0'
TensorFlow 2 is not included in the package requirements because you might want to do custom configuration to get the GPU working etc.
If you don't want to customize anything and instead prefer something that just works for now, the following should be enough:
python3 -m pip install tensorflow
If you plan on making changes to the code, it is easier to install lidbox
as a Python package in setuptools develop mode:
git clone --depth 1 https://github.com/py-lidbox/lidbox.git
python3 -m pip install --editable ./lidbox
Then, if you make changes to the code, there's no need to reinstall the package since the changes are reflected immediately.
Just be careful not to make changes when lidbox
is running, because TensorFlow will use its autograph
package to convert some of the Python functions to TF graphs, which might fail if the code changes suddenly.
@inproceedings{Lindgren2020,
author={Matias Lindgren and Tommi Jauhiainen and Mikko Kurimo},
title={{Releasing a Toolkit and Comparing the Performance of Language Embeddings Across Various Spoken Language Identification Datasets}},
year=2020,
booktitle={Proc. Interspeech 2020},
pages={467--471},
doi={10.21437/Interspeech.2020-2706},
url={http://dx.doi.org/10.21437/Interspeech.2020-2706}
}