AgglomerativeClustering not honoring num_cluster parameter #1525

olvb · 2023-11-02T15:24:11Z

I have a very easy test case on which the "pyannote/speaker-diarization-3.0" pipeline is failing: a short audio file with 2 very different voices speaking one turn each, with a 1 sec silence in between, that I pass to SpeakerDiarization.apply() with both min_speakers and max_speakers set to 2. The pipeline detects 2 speech segments but only one speaker.

Funnily enough, calling SpeakerDiarization.apply() without min_speakers and max_speakers gives the expected result (2 speech segments with 2 speakers).

I managed to narrow the issue down to the clustering stage:

import numpy as np
from pyannote.audio.pipelines.clustering import AgglomerativeClustering

clustering = AgglomerativeClustering().instantiate(
    {
        "method": "centroid",
        "min_cluster_size": 0,
        "threshold": 0.0,
    }
)

# 2 embeddings different enough
embeddings = np.asarray([[1.0, 1.0, 1.0, 1.0], [1.0, 2.0, 1.0, 2.0]])

# call without num_clusters
clusters = clustering.cluster(
    embeddings=embeddings, min_clusters=2, max_clusters=2, num_clusters=None
)
# succeeds
assert clusters.tolist() == [0, 1]

# call with num_clusters=2
clusters = clustering.cluster(
    embeddings=embeddings, min_clusters=2, max_clusters=2, num_clusters=2
)
# fails (we get [0, 0])
assert clusters.tolist() == [0, 1]

I won't pretend to understand everything that's going on in AgglomerativeClustering.cluster(), but the problem seems to arise in the branch beginning at https://github.com/pyannote/pyannote-audio/blob/develop/pyannote/audio/pipelines/clustering.py#L389. Before this step we already have the expected number of clusters, but we still try to match the target number of clusters even though we don't need to. Changing the condition to num_clusters is not None and num_large_clusters != num_clusters does the trick here, but I don't know if there is a deeper underlying issue in the algorithm.
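To make the proposed condition concrete, here is a minimal standalone sketch of the guard. It is not the pyannote.audio implementation: should_force_num_clusters is a hypothetical helper name, and it assumes num_large_clusters is the number of clusters left after thresholding, as described above.

```python
from typing import Optional


def should_force_num_clusters(
    num_clusters: Optional[int], num_large_clusters: int
) -> bool:
    """Hypothetical helper: return True only when a target count is set
    and the thresholded clustering does not already match it."""
    return num_clusters is not None and num_large_clusters != num_clusters


# No target requested: keep the thresholded clustering as-is.
no_target = should_force_num_clusters(None, 2)

# Target already met (the case from this report): nothing to force.
target_met = should_force_num_clusters(2, 2)

# Target differs from what thresholding produced: forcing is needed.
target_missed = should_force_num_clusters(2, 3)

print(no_target, target_met, target_missed)
```

With a guard like this, the forcing branch would be skipped in the test case above, because thresholding already yields the 2 requested clusters.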

github-actions · 2023-11-02T15:24:33Z

Thank you for your issue. You might want to check the FAQ if you haven't done so already.

Feel free to close this issue if you found an answer in the FAQ.

If your issue is a feature request, please read this first and update your request accordingly, if needed.

If your issue is a bug report, please provide a minimum reproducible example as a link to a self-contained Google Colab notebook containing everything needed to reproduce the bug:

installation
data preparation
model download
etc.

Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users).

Companies relying on pyannote.audio in production may contact me via email regarding:

paid scientific consulting around speaker diarization and speech processing in general;
custom models and tailored features (via the local tech transfer office).

This is an automated reply, generated by FAQtory

hbredin · 2023-11-05T15:07:47Z

> Changing the condition to num_clusters is not None and num_large_clusters != num_clusters does the trick here but I don't know if there is a deeper underlying issue in the algorithm.

Thanks, I think this should do the trick. Can you contribute this change via a PR?

olvb mentioned this issue Nov 6, 2023

fix: force nb clusters to match target only when needed #1531

Merged

hbredin closed this as completed in #1531 Nov 7, 2023
