Merge branch 'release/3.2.0'
hbredin committed May 8, 2024
2 parents 2a72067 + bb4dd2e commit 70a8507
Showing 89 changed files with 6,018 additions and 4,135 deletions.
56 changes: 56 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.yml
@@ -0,0 +1,56 @@
name: Bug report
description: Report a bug in pyannote.audio
body:

- type: markdown
attributes:
value: |
When reporting bugs, please follow the guidelines in this template. This helps identify the problem precisely and thus enables contributors to fix it faster.
- Write a descriptive issue title above.
- The golden rule is to **always open *one* issue for *one* bug**. If you notice several bugs and want to report them, make sure to create one new issue for each of them.
- Search [open](https://github.com/pyannote/pyannote-audio/issues) and [closed](https://github.com/pyannote/pyannote-audio/issues?q=is%3Aissue+is%3Aclosed) issues to ensure it has not already been reported. If you don't find a relevant match or if you're unsure, don't hesitate to **open a new issue**. The bugsquad will handle it from there if it's a duplicate.
- Please always check if your issue is reproducible in the latest version – it may already have been fixed!
- If you use a custom build, please test if your issue is reproducible in official releases too.
- type: textarea
attributes:
label: Tested versions
description: |
To properly fix a bug, we need to identify if the bug was recently introduced in the engine, or if it was always present.
- Please specify the pyannote.audio version you found the issue in, including the **Git commit hash** if using a development build.
- If you can, **please test earlier pyannote.audio versions** and, if applicable, newer versions (development branch). Mention whether the bug is reproducible or not in the versions you tested.
- The aim is for us to identify whether a bug is a **regression**, i.e. an issue that didn't exist in a previous version, but was introduced later on, breaking existing functionality. For example, if a bug is reproducible in 3.2 but not in 3.0, we would like you to test intermediate 3.1 to find which version is the first one where the issue can be reproduced.
placeholder: |
- Reproducible in: 3.1, 3.2, and later
- Not reproducible in: 3.0
validations:
required: true

- type: input
attributes:
label: System information
description: |
- Specify the OS version, and when relevant hardware information.
- For issues that are likely OS-specific and/or GPU-related, please specify the GPU model and architecture.
- **Bug reports not including the required information may be closed at the maintainers' discretion.** If in doubt, always include all the requested information; it's better to include too much information than not enough information.
placeholder: macOS 13.6 - pyannote.audio 3.1.1 - M1 Pro
validations:
required: true

- type: textarea
attributes:
label: Issue description
description: |
Describe your issue briefly. What doesn't work, and how do you expect it to work instead?
You can include audio, images or videos with drag and drop, and format code blocks or logs with <code>```</code> tags.
validations:
required: true

- type: input
attributes:
label: Minimal reproduction example (MRE)
description: |
Having reproducible issues is a prerequisite for contributors to be able to solve them.
Include a link to minimal reproduction example using [this Google Colab notebook](https://colab.research.google.com/github/pyannote/pyannote-audio/blob/develop/tutorials/MRE_template.ipynb) as a starting point.
validations:
required: true
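The "Tested versions" field asks reporters to narrow a regression down to the first affected release by testing intermediate versions. When the list of candidate versions is long, a binary search keeps the number of installs logarithmic. The sketch below is illustrative only: `first_affected_version` and the `reproduces` callback are hypothetical names, the version list is an example, and it assumes that once a bug appears it persists in every later version.

```python
from typing import Callable, Optional, Sequence


def first_affected_version(
    versions: Sequence[str], reproduces: Callable[[str], bool]
) -> Optional[str]:
    """Return the earliest entry of `versions` (ordered oldest to newest) for
    which `reproduces` is True, or None if the bug reproduces in none of them.

    Assumes the bug is monotonic: absent up to some version, present from then on.
    """
    lo, hi = 0, len(versions)
    while lo < hi:
        mid = (lo + hi) // 2
        if reproduces(versions[mid]):
            hi = mid  # bug present: first affected version is mid or earlier
        else:
            lo = mid + 1  # bug absent: first affected version is after mid
    return versions[lo] if lo < len(versions) else None


# Example mirroring the template's placeholder (reproducible in 3.1 and later, not in 3.0).
tested = ["3.0", "3.1", "3.1.1", "3.2"]
affected = {"3.1", "3.1.1", "3.2"}
print(first_affected_version(tested, lambda v: v in affected))
```

In practice each `reproduces` call means installing that version and rerunning the MRE, which is why halving the search range pays off.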
15 changes: 15 additions & 0 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,15 @@
blank_issues_enabled: false

contact_links:

- name: Feature request
url: https://github.com/pyannote/pyannote-audio/discussions
about: Suggest an idea for this project.

- name: Consulting
url: https://herve.niderb.fr/consulting
about: Using pyannote.audio in production? Make the most of it thanks to our consulting services.

- name: Premium models
url: https://forms.office.com/e/GdqwVgkZ5C
about: We are considering selling premium models, extensions, or services around pyannote.audio.
20 changes: 0 additions & 20 deletions .github/ISSUE_TEMPLATE/feature_request.md

This file was deleted.

29 changes: 0 additions & 29 deletions .github/workflows/new_issue.yml

This file was deleted.

2 changes: 1 addition & 1 deletion .github/workflows/test.yml
@@ -30,4 +30,4 @@ jobs:
pip install -e .[dev,testing]
- name: Test with pytest
run: |
- pytest
+ pytest -k "not test_cli.py"
33 changes: 33 additions & 0 deletions .github/workflows/test_cli.yml
@@ -0,0 +1,33 @@
name: CLI tests

on:
push:
branches: [develop]
pull_request:
branches: [develop]

jobs:
build:
timeout-minutes: 20
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest]
python-version: ["3.10"]
steps:
- uses: actions/checkout@v2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Install libsndfile
if: matrix.os == 'ubuntu-latest'
run: |
sudo apt-get update
sudo apt-get install libsndfile1
- name: Install pyannote.audio
run: |
pip install -e .[dev,testing,cli]
- name: Test with pytest
run: |
pytest tests/test_cli.py
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -14,7 +14,7 @@ repos:

# Sort imports
- repo: https://github.com/PyCQA/isort
- rev: 5.10.1
+ rev: 5.12.0
hooks:
- id: isort
args: ["--profile", "black"]
35 changes: 35 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,40 @@
# Changelog

## Version 3.2.0 (2024-05-08)

### New features

- feat(task): add option to cache task training metadata to speed up training (with [@clement-pages](https://github.com/clement-pages/))
- feat(model): add `receptive_field`, `num_frames` and `dimension` to models (with [@Bilal-Rahou](https://github.com/Bilal-Rahou))
- feat(model): add `fbank_only` property to `WeSpeaker` models
- feat(util): add `Powerset.permutation_mapping` to help with permutation in powerset space (with [@FrenchKrab](https://github.com/FrenchKrab))
- feat(sample): add sample file at `pyannote.audio.sample.SAMPLE_FILE`
- feat(metric): add `reduce` option to `diarization_error_rate` metric (with [@Bilal-Rahou](https://github.com/Bilal-Rahou))
- feat(pipeline): add `Waveform` and `SampleRate` preprocessors

### Fixes

- fix(task): fix random generators and their reproducibility (with [@FrenchKrab](https://github.com/FrenchKrab))
- fix(task): fix estimation of training set size (with [@FrenchKrab](https://github.com/FrenchKrab))
- fix(hook): fix `torch.Tensor` support in `ArtifactHook`
- fix(doc): fix typo in `Powerset` docstring (with [@lukasstorck](https://github.com/lukasstorck))

### Improvements

- improve(metric): add support for number of speakers mismatch in `diarization_error_rate` metric
- improve(pipeline): track both `Model` and `nn.Module` attributes in `Pipeline.to(device)`
- improve(io): switch to `torchaudio >= 2.2.0`
- improve(doc): update tutorials (with [@clement-pages](https://github.com/clement-pages/))

## Breaking changes

- BREAKING(model): get rid of `Model.example_output` in favor of `num_frames` method, `receptive_field` property, and `dimension` property
- BREAKING(task): custom tasks need to be updated (see "Add your own task" tutorial)

## Community contributions

- community: add tutorial for offline use of `pyannote/speaker-diarization-3.1` (by [@simonottenhauskenbun](https://github.com/simonottenhauskenbun))

## Version 3.1.1 (2023-12-01)

### TL;DR
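The CHANGELOG entry for `Powerset.permutation_mapping` concerns permutations "in powerset space". A powerset model predicts one class per frame, each class being a set of simultaneously active speakers, so relabelling the speakers also permutes the classes. The standalone sketch below illustrates that idea; it is not pyannote's code, and `powerset_classes`, `permutation_mapping`, and the class ordering (empty set first, then by increasing set size) are assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence


def powerset_classes(num_speakers: int, max_set_size: int) -> List[FrozenSet[int]]:
    """Enumerate every set of at most `max_set_size` active speakers,
    starting with the empty set (nobody speaking). Ordering is an assumption."""
    return [
        frozenset(combo)
        for size in range(max_set_size + 1)
        for combo in combinations(range(num_speakers), size)
    ]


def permutation_mapping(
    classes: Sequence[FrozenSet[int]], permutation: Sequence[int]
) -> List[int]:
    """For each class index i, return the index of the class obtained by
    relabelling every speaker s in classes[i] as permutation[s]."""
    index: Dict[FrozenSet[int], int] = {c: i for i, c in enumerate(classes)}
    return [index[frozenset(permutation[s] for s in c)] for c in classes]


classes = powerset_classes(num_speakers=3, max_set_size=2)
# Swapping speakers 0 and 1 swaps {0}<->{1} and {0,2}<->{1,2}; {0,1} maps to itself.
print(permutation_mapping(classes, (1, 0, 2)))
```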
2 changes: 2 additions & 0 deletions MANIFEST.in
@@ -1,4 +1,6 @@
recursive-include pyannote *.py
recursive-include pyannote *.yaml
recursive-include pyannote *.wav
recursive-include pyannote *.rttm
global-exclude *.pyc
global-exclude __pycache__
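The two new `recursive-include` lines presumably ensure that the `.wav` and `.rttm` files behind the new `pyannote.audio.sample.SAMPLE_FILE` ship with source distributions. The sketch below roughly approximates how such filename patterns select files, using `fnmatch`; setuptools applies its own matching rules, and the sample paths are hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import Sequence

# Filename patterns from the `recursive-include pyannote ...` lines, before and after this commit.
PATTERNS_BEFORE = ["*.py", "*.yaml"]
PATTERNS_AFTER = ["*.py", "*.yaml", "*.wav", "*.rttm"]


def included(path: str, patterns: Sequence[str]) -> bool:
    """Rough approximation of `recursive-include pyannote <patterns>`:
    the file must live under pyannote/ and its name must match one pattern."""
    p = PurePosixPath(path)
    return p.parts[0] == "pyannote" and any(fnmatch(p.name, pat) for pat in patterns)


# Hypothetical paths for illustration only.
for path in ["pyannote/audio/sample/sample.wav", "pyannote/audio/sample/sample.rttm"]:
    print(path, included(path, PATTERNS_BEFORE), included(path, PATTERNS_AFTER))
```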
32 changes: 18 additions & 14 deletions README.md
@@ -70,26 +70,30 @@ for turn, _, speaker in diarization.itertracks(yield_label=True):
- Videos
- [Introduction to speaker diarization](https://umotion.univ-lemans.fr/video/9513-speech-segmentation-and-speaker-diarization/) / JSALT 2023 summer school / 90 min
- [Speaker segmentation model](https://www.youtube.com/watch?v=wDH2rvkjymY) / Interspeech 2021 / 3 min
- [First releaase of pyannote.audio](https://www.youtube.com/watch?v=37R_R82lfwA) / ICASSP 2020 / 8 min
- [First release of pyannote.audio](https://www.youtube.com/watch?v=37R_R82lfwA) / ICASSP 2020 / 8 min
- Community contributions (not maintained by the core team)
- 2024-04-05 > [Offline speaker diarization (speaker-diarization-3.1)](tutorials/community/offline_usage_speaker_diarization.ipynb) by [Simon Ottenhaus](https://github.com/simonottenhauskenbun)

## Benchmark

Out of the box, `pyannote.audio` speaker diarization [pipeline](https://hf.co/pyannote/speaker-diarization-3.1) v3.1 is expected to be much better (and faster) than v2.x.
Those numbers are diarization error rates (in %):

| Benchmark | [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) | [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) | [Premium](https://forms.office.com/e/GdqwVgkZ5C) |
| ---------------------- | ------------------------------------------------------ | ------------------------------------------------------ | ---------------------------------------------- |
| AISHELL-4 | 14.1 | 12.3 | 11.9 |
| AliMeeting (channel 1) | 27.4 | 24.5 | 22.5 |
| AMI (IHM) | 18.9 | 18.8 | 16.6 |
| AMI (SDM) | 27.1 | 22.6 | 20.9 |
| AVA-AVD | 66.3 | 50.0 | 39.8 |
| CALLHOME (part 2) | 31.6 | 28.4 | 22.2 |
| DIHARD 3 (full) | 26.9 | 21.4 | 17.2 |
| Ego4D (dev.) | 61.5 | 51.2 | 43.8 |
| MSDWild | 32.8 | 25.4 | 19.8 |
| REPERE (phase2) | 8.2 | 7.8 | 7.6 |
| VoxConverse (v0.3) | 11.2 | 11.2 | 9.4 |
| Benchmark | [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) | [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) | [Premium](https://forms.office.com/e/GdqwVgkZ5C) |
| --------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------ | ------------------------------------------------------ | ------------------------------------------------ |
| [AISHELL-4](https://arxiv.org/abs/2104.03603) | 14.1 | 12.2 | 11.9 |
| [AliMeeting](https://www.openslr.org/119/) (channel 1) | 27.4 | 24.4 | 22.5 |
| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM) | 18.9 | 18.8 | 16.6 |
| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (SDM) | 27.1 | 22.4 | 20.9 |
| [AVA-AVD](https://arxiv.org/abs/2111.14448) | 66.3 | 50.0 | 39.8 |
| [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) ([part 2](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1)) | 31.6 | 28.4 | 22.2 |
| [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477)) | 26.9 | 21.7 | 17.2 |
| [Earnings21](https://github.com/revdotcom/speech-datasets) | 17.0 | 9.4 | 9.0 |
| [Ego4D](https://arxiv.org/abs/2110.07058) (dev.) | 61.5 | 51.2 | 43.8 |
| [MSDWild](https://github.com/X-LANCE/MSDWILD) | 32.8 | 25.3 | 19.8 |
| [RAMC](https://www.openslr.org/123/) | 22.5 | 22.2 | 18.4 |
| [REPERE](https://www.islrn.org/resources/360-758-359-485-0/) (phase2) | 8.2 | 7.8 | 7.6 |
| [VoxConverse](https://github.com/joonson/voxconverse) (v0.3) | 11.2 | 11.3 | 9.4 |

[Diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %)

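The benchmark reports diarization error rates, which pyannote.metrics defines as (false alarm + missed detection + speaker confusion) divided by the total duration of reference speech. The sketch below encodes that formula and, as a worked example, computes the relative change from v2.1 to v3.1 for three rows of the updated table. The helper names are illustrative and are not pyannote.metrics API.

```python
def diarization_error_rate(
    false_alarm: float, missed_detection: float, confusion: float, total: float
) -> float:
    """DER = (false alarm + missed detection + confusion) / total reference speech.
    All durations must share one unit (e.g. seconds)."""
    return (false_alarm + missed_detection + confusion) / total


def relative_reduction(before: float, after: float) -> float:
    """Relative DER reduction; negative when `after` is worse than `before`."""
    return (before - after) / before


# v2.1 and v3.1 DER (in %) copied from the updated benchmark table.
ROWS = {
    "AISHELL-4": (14.1, 12.2),
    "Earnings21": (17.0, 9.4),
    "VoxConverse (v0.3)": (11.2, 11.3),
}
for name, (v21, v31) in ROWS.items():
    print(f"{name}: {100 * relative_reduction(v21, v31):+.1f}% relative")
```

On these rows, Earnings21 improves by roughly 45% relative, while VoxConverse moves from 11.2 to 11.3, a small relative increase.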
10 changes: 5 additions & 5 deletions pyannote/audio/augmentation/mix.py
@@ -60,10 +60,10 @@ def __init__(
max_snr_in_db: float = 5.0,
mode: str = "per_example",
p: float = 0.5,
- p_mode: str = None,
- sample_rate: int = None,
- target_rate: int = None,
- max_num_speakers: int = None,
+ p_mode: Optional[str] = None,
+ sample_rate: Optional[int] = None,
+ target_rate: Optional[int] = None,
+ max_num_speakers: Optional[int] = None,
output_type: str = "tensor",
):
super().__init__(
@@ -80,7 +80,7 @@ def __init__(

def randomize_parameters(
self,
- samples: Tensor = None,
+ samples: Optional[Tensor] = None,
sample_rate: Optional[int] = None,
targets: Optional[Tensor] = None,
target_rate: Optional[int] = None,
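The `mix.py` changes replace annotations such as `p_mode: str = None` with `p_mode: Optional[str] = None`. PEP 484 no longer recommends that type checkers treat a `None` default as implying `Optional`, and recent mypy versions reject the implicit form by default. The hypothetical `configure` function below mirrors the explicit pattern and shows what runtime introspection reports for it; it is a sketch, not code from `mix.py`.

```python
from typing import Optional, Union, get_args, get_type_hints


def configure(p: float = 0.5, sample_rate: Optional[int] = None) -> str:
    """Hypothetical function: None explicitly means "not provided yet"."""
    if sample_rate is None:
        return f"p={p}, sample_rate resolved later"
    return f"p={p}, sample_rate={sample_rate}"


hints = get_type_hints(configure)
print(hints["sample_rate"] == Union[int, None])  # Optional[int] is Union[int, None]
print(get_args(hints["sample_rate"]))            # the member types: int and NoneType
print(configure(sample_rate=16000))
```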
16 changes: 11 additions & 5 deletions pyannote/audio/cli/evaluate.py
@@ -25,7 +25,7 @@

import hydra
from omegaconf import DictConfig
- from pyannote.database import FileFinder, ProtocolFile, get_protocol
+ from pyannote.database import FileFinder, ProtocolFile, registry
from rich.progress import Progress

from pyannote.audio import Inference, Model
@@ -41,8 +41,16 @@ def evaluate(cfg: DictConfig) -> Optional[float]:
(device,) = get_devices(needs=1)
model = Model.from_pretrained(cfg.model, device=device)

+ # load databases into registry if it was specified
+ if "registry" in cfg:
+ for database_yml in cfg.registry.split(","):
+ registry.load_database(database_yml)
+
# load evaluation files
- protocol = get_protocol(cfg.protocol, preprocessors={"audio": FileFinder()})
+ protocol = registry.get_protocol(
+ cfg.protocol, preprocessors={"audio": FileFinder()}
+ )
+
files = list(getattr(protocol, cfg.subset)())

# load evaluation metric
@@ -53,7 +61,7 @@ def evaluate(cfg: DictConfig) -> Optional[float]:
main_task = progress.add_task(protocol.name, total=len(files))
file_task = progress.add_task("Processing", total=1.0)

- def progress_hook(completed: int = None, total: int = None):
+ def progress_hook(completed: Optional[int] = None, total: Optional[int] = None):
progress.update(file_task, completed=completed / total)

inference = Inference(model, device=device)
@@ -65,8 +73,6 @@ def hypothesis(file: ProtocolFile):
warm_up=(warm_up, warm_up),
)

- metric = DiscreteDiarizationErrorRate()
-
for file in files:
progress.update(file_task, description=file["uri"])
reference = file["annotation"]
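In `evaluate.py`, `progress_hook` now declares its arguments as `Optional[int]`, but its body still computes `completed / total`, which would raise a `TypeError` if either argument were actually left at `None` (and a `ZeroDivisionError` if `total` were 0). A defensive variant might look like the hypothetical helper below; the name `progress_fraction` and the choice to report 0.0 for missing values and cap the result at 1.0 are assumptions, not part of this commit.

```python
from typing import Optional


def progress_fraction(completed: Optional[int] = None, total: Optional[int] = None) -> float:
    """Fraction of work done in [0, 1], tolerant of missing or zero totals."""
    if completed is None or not total:
        return 0.0  # nothing meaningful to report yet
    return min(completed / total, 1.0)


# Such a value could then be passed to progress.update(file_task, completed=...).
print(progress_fraction(3, 4), progress_fraction(), progress_fraction(5, 0))
```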
3 changes: 2 additions & 1 deletion pyannote/audio/cli/evaluate_config/hydra/default.yaml
@@ -22,7 +22,8 @@ help:
template: |-
${hydra.help.header}
- pyannote-audio-eval protocol={protocol_name}
+ pyannote-audio-eval registry={path_to_database.yml}
+ protocol={protocol_name}
subset={test | development | train}
model={path_to_pretrained_model}
warm_up={warm_up_duration_in_seconds}
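Per the updated help template and `evaluate.py`, the evaluation CLI now accepts a `registry` option whose value is split on commas, each piece being loaded with `registry.load_database`. Plain `str.split(",")` keeps surrounding spaces (`"a.yml, b.yml"` yields `" b.yml"`) and produces an empty string for a trailing comma. The hypothetical `parse_registry_option` below is a slightly more defensive sketch of that parsing step, not code from this commit. Note also that Hydra's override syntax may interpret an unquoted comma-separated value as a sweep, so passing several files likely requires quoting the value.

```python
from typing import List


def parse_registry_option(value: str) -> List[str]:
    """Split a comma-separated `registry=` value into database YAML paths,
    trimming whitespace and skipping empty entries."""
    return [path.strip() for path in value.split(",") if path.strip()]


# Each returned path would then be handed to registry.load_database(...).
print(parse_registry_option("databases/ami.yml, databases/dihard.yml,"))
```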