
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, speaker embedding


ICTatRTI/pyannote-audio

 
 


pyannote-audio | neural building blocks for speaker diarization


pyannote.audio is an open-source toolkit written in Python for speaker diarization. Built on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines.

pyannote.audio also comes with pretrained models covering a wide range of domains for voice activity detection, speaker change detection, overlapped speech detection, and speaker embedding.

Installation

pyannote.audio supports only Python 3.7 (or later) on Linux and macOS. It might work on Windows, but there is no guarantee that it does, nor any plan to add official Windows support.

The instructions below assume that PyTorch has already been installed by following the instructions at https://pytorch.org.

Until a proper release of pyannote.audio is available on PyPI, it must be installed from source:

$ git clone https://github.com/pyannote/pyannote-audio.git
$ cd pyannote-audio
$ git checkout voice_type_classifier
$ conda env create -f env.yml   # This will create a conda environment called `pyannote`
$ conda activate pyannote       # You must activate this environment each time you want to run a pyannote command
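
Before running the commands above, it can help to confirm that the prerequisites stated in this section are met. The following is a minimal sketch, not part of pyannote.audio: the function name `check_environment` and its parameters are hypothetical, and it only checks the Python version, the operating system, and whether the `torch` package can be found.

```python
import importlib.util
import platform
import sys

MIN_PYTHON = (3, 7)                      # minimum version stated in this README
SUPPORTED_SYSTEMS = {"Linux", "Darwin"}  # platform.system() reports "Darwin" on macOS


def check_environment(version=None, system=None, torch_installed=None):
    """Return a list of human-readable problems; an empty list means none were found.

    Hypothetical helper: arguments default to values detected on the current
    machine and can be overridden to try out other configurations.
    """
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    system = platform.system() if system is None else system
    if torch_installed is None:
        # find_spec returns None when the top-level package cannot be located
        torch_installed = importlib.util.find_spec("torch") is not None

    problems = []
    if version < MIN_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} is older than 3.7")
    if system not in SUPPORTED_SYSTEMS:
        problems.append(f"{system} is not officially supported")
    if not torch_installed:
        problems.append("PyTorch (torch) is not importable")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

Running the script prints one warning per unmet prerequisite and prints nothing when all three checks pass. An unsupported operating system is reported as a warning rather than an error, since the toolkit might still work there.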

Documentation

Part of the API is described in this tutorial.

Documentation is a work in progress and is scheduled to be ready by the end of April 2020.

Tutorials

Citation

If you use pyannote.audio, please cite the following:

@inproceedings{Bredin2020,
  Title = {{pyannote.audio: neural building blocks for speaker diarization}},
  Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
  Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
  Address = {Barcelona, Spain},
  Month = {May},
  Year = {2020},
}
