update readme and bump version
matiaslindgren committed Jul 4, 2020
1 parent 3a9ba61 commit 9323393
Showing 2 changed files with 22 additions and 10 deletions.
30 changes: 21 additions & 9 deletions README.md
@@ -8,7 +8,7 @@
* Average detection cost (`C_avg`) implemented as a `tf.keras.metrics.Metric`.
* You can also try `lidbox` for speaker recognition, since no assumptions will be made of the signal labels. E.g. use utt2speaker as utt2label and see what happens.

-[Here](./examples/common-voice/common-voice-4.ipynb) is an example notebook showing `lidbox` in action.
+[Here](./examples/common-voice/common-voice-4.ipynb) is a full example notebook showing what `lidbox` can do.

## Why would I want to use this?

@@ -27,31 +27,43 @@

## Installing

Install TensorFlow 2.1 or 2.2 (both have been tested).

Clone the repo and install `lidbox` as a Python package (note the explicit `./`).
This will install all other required dependencies, but not TensorFlow.
```
git clone --depth 1 https://github.com/matiaslindgren/lidbox.git
pip install ./lidbox
```
-Check that the command line entry point is working
+Check that the command line entry point is working:
```
lidbox -h
```
If not, make sure the `setuptools` entry point scripts (e.g. directory `$HOME/.local/bin`) are on your path.
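
As a rough sketch of that check, the following Python snippet reports whether a scripts directory is listed on `PATH` and whether the `lidbox` executable can be resolved. The helper name `dir_on_path` is hypothetical and not part of `lidbox`, and `~/.local/bin` is only the example directory mentioned above; the actual location may differ on your system.

```python
import os
import shutil


def dir_on_path(directory, path_env=None):
    """Return True if `directory` is one of the entries of a PATH-style string."""
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    target = os.path.normpath(os.path.expanduser(directory))
    entries = [
        os.path.normpath(os.path.expanduser(entry))
        for entry in path_env.split(os.pathsep)
        if entry  # skip empty entries such as a trailing separator
    ]
    return target in entries


if __name__ == "__main__":
    # Assumption: user-level scripts are installed under ~/.local/bin.
    scripts_dir = "~/.local/bin"
    print("scripts dir on PATH:", dir_on_path(scripts_dir))
    # shutil.which returns None when the executable cannot be found.
    print("lidbox executable:", shutil.which("lidbox"))
```

If the first line prints `False`, adding the directory to `PATH` in your shell profile is the usual remedy.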

Then, install TensorFlow 2.1 or 2.2 (both should work), unless it is already installed.
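
To confirm which TensorFlow release is installed, a small check along these lines may help. It is only a sketch: `is_tested_tf_version` is a hypothetical helper, not part of `lidbox`, and the set of accepted versions (2.1 and 2.2) is taken directly from the sentence above.

```python
def is_tested_tf_version(version):
    """Return True if a version string such as '2.2.0' is a 2.1.x or 2.2.x release."""
    parts = version.split(".")
    if len(parts) < 2:
        return False
    try:
        major, minor = int(parts[0]), int(parts[1])
    except ValueError:
        return False
    # Assumption: only the 2.1 and 2.2 release series are known to work.
    return (major, minor) in {(2, 1), (2, 2)}


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed")
    else:
        status = "a tested" if is_tested_tf_version(tf.__version__) else "an untested"
        print("TensorFlow", tf.__version__, "is", status, "version according to this README")
```

Other versions may still work; the check only flags releases outside the range named here.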

If everything is working, see [this](./examples/common-voice) for a simple example to get started.

-### Note
+### Language embeddings

If you want to use language embeddings, install the [PLDA package](https://github.com/RaviSoji/plda) from [here](https://github.com/matiaslindgren/plda/tree/as-setuptools-package):
```
pip install plda@https://github.com/matiaslindgren/plda/archive/as-setuptools-package.zip#egg=plda-0.1.0
```

### Editable install

If you plan on making changes to the code, it is easier to install `lidbox` as a Python package in setuptools develop mode:
```
pip install --editable ./lidbox
git clone --depth 1 https://github.com/matiaslindgren/lidbox.git
pip install ./lidbox
```
Then, if you make changes to the code, there's no need to reinstall the package since the changes are reflected immediately.
Just be careful not to make changes when `lidbox` is running, because TensorFlow will use its `autograph` package to convert some of the Python functions to TF graphs, which might fail if the code changes suddenly.

-### X-vector embeddings from a trained model for 4 languages
+## X-vector embeddings

One benefit of deep learning classifiers is that you can first train them on large amounts of data and then use them as feature extractors to produce low-dimensional, fixed-length language vectors from speech.
See e.g. the [x-vector](http://danielpovey.com/files/2018_odyssey_xvector_lid.pdf) approach by Snyder et al.

Below is a visualization of test set language embeddings for 4 languages in 2-dimensional space.
Each data point represents 2 seconds of speech in one of the 4 languages.

![2-dimensional PCA plot of 400 random x-vectors for 4 Common Voice languages](./examples/common-voice/img/embeddings-PCA-2D.png)
2 changes: 1 addition & 1 deletion setup.py
@@ -5,7 +5,7 @@

setuptools.setup(
name="lidbox",
-version="0.5.0",
+version="0.6.0",
description="End-to-end spoken language identification (LID) on TensorFlow",
long_description=readmefile_contents,
long_description_content_type="text/markdown",