
Fix instantaneous speaker count exceeding max_speakers or detected number of clusters #1351

Merged: 11 commits merged into pyannote:develop on Nov 16, 2023

Conversation

@flyingleafe (Contributor) commented Apr 27, 2023

The number of labels in the diarization result was based on the output of the speaker_count method, which accounted for neither the max_speakers variable nor the number of speakers actually detected after clustering. This introduced spurious extra speakers into the diarization result: the output could contain more labels than the provided max_speakers, or more labels than the number of clusters found by the clustering algorithm. This PR fixes that.

Additionally, the extra binarization step was removed from the speaker_count method: it now accepts segmentations that are already binarized, so binarization is no longer performed twice in the pipeline.
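The two changes can be sketched roughly as follows. This is an illustrative standalone function, not the actual pyannote API: the function signature, argument names, and array shapes are assumptions. The idea is that speaker_count receives already-binarized segmentations (frames x speakers, values 0/1) and caps the per-frame count by both the user-provided max_speakers and the number of detected clusters.

```python
import numpy as np

def speaker_count(binarized, max_speakers=np.inf, num_clusters=np.inf):
    """Instantaneous speaker count from already-binarized segmentations.

    binarized: (num_frames, num_speakers) array of 0/1 speaker activations.
    The per-frame count is capped by both the user-provided max_speakers
    and the number of clusters found by the clustering step.
    """
    count = np.sum(binarized, axis=1)      # active speakers per frame
    cap = min(max_speakers, num_clusters)  # tightest upper bound
    return np.minimum(count, cap)

# Example: segmentation claims 3 simultaneous speakers in one frame,
# but clustering only found 2 clusters, so the count is capped at 2.
binarized = np.array([[1, 0, 0],
                      [1, 1, 0],
                      [1, 1, 1]])
print(speaker_count(binarized, num_clusters=2))  # [1 2 2]
```

Without the cap, the last frame would report 3 speakers even though only 2 labels can ever appear in the output, which is exactly the bug described above.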

codecov bot commented Apr 27, 2023

Codecov Report

Patch coverage has no change and project coverage change: -0.21 ⚠️

Comparison is base (a581536) 33.00% compared to head (e289d18) 32.79%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1351      +/-   ##
===========================================
- Coverage    33.00%   32.79%   -0.21%     
===========================================
  Files           65       65              
  Lines         4109     4135      +26     
===========================================
  Hits          1356     1356              
- Misses        2753     2779      +26     
Impacted Files Coverage Δ
pyannote/audio/pipelines/clustering.py 0.00% <0.00%> (ø)
pyannote/audio/pipelines/speaker_diarization.py 0.00% <0.00%> (ø)
pyannote/audio/pipelines/utils/diarization.py 0.00% <0.00%> (ø)


@hbredin (Member) commented Jun 23, 2023

There is a lot going on in this PR.

Can we just focus on fixing speaker_count to honor max_speakers constraint?

@flyingleafe (Contributor, Author) replied:

@hbredin Okay, no problem, I will move the k-means fallback work into a separate branch.

@flyingleafe force-pushed the fix-max-speakers-count branch 2 times, most recently from 6b118f9 to 6391cd6 on July 4, 2023 09:42
@flyingleafe (Contributor, Author) commented:

@hbredin I only left the first commit. Now the PR only does the following:

  • speaker_count accepts binarized segmentations (one of your TODOs, and very straightforward: binarization is no longer done twice)
  • speaker_count cannot exceed max_speakers or number of centroids.

The other change (falling back to k-means when agglomerative clustering finds no good cluster assignment) is indeed a separate part of the problem; I can open a separate PR for it later, in its own branch, with an argument for why I think it is necessary.

@hbredin (Member) commented Jul 4, 2023

SpeakerDiarizationMixin.speaker_count is also used in Resegmentation pipeline, which is therefore broken by this PR.

Two options:

  1. update Resegmentation pipeline to also use this new API
  2. narrow down this PR to only fix the max_speakers thing.

@flyingleafe (Contributor, Author) replied:

@hbredin Good catch; I went with option 1.

BTW, I am willing to contribute some basic smoke tests for the pipelines in a separate PR. Currently, test coverage of the entire pyannote.audio.pipelines module is 0%, which is why such silly bugs are not caught immediately by CI.

Comment on lines 533 to 537
# quick sanity check
assert (
    num_different_speakers >= min_speakers
    and num_different_speakers <= max_speakers
)
@hbredin (Member) commented:

Does that assert ever fail?

If it does, we should handle it or this will raise an error.

@flyingleafe (Contributor, Author) replied:

@hbredin Good catch. With the k-means fallback removed from this PR (I use it in my application anyway), this assert can definitely fail, and surely will in some circumstances due to the min_cluster_size parameter. The best we can do here, I think, is to issue a warning.
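A warning in place of the assert could look something like this. This is a sketch only; the function name and message are illustrative, not the actual pyannote code.

```python
import warnings

def check_speaker_count(num_different_speakers, min_speakers, max_speakers):
    """Warn, instead of asserting, when the detected number of speakers
    falls outside the [min_speakers, max_speakers] range requested by the
    user. This can legitimately happen, e.g. because of the
    min_cluster_size parameter of agglomerative clustering.
    """
    if not min_speakers <= num_different_speakers <= max_speakers:
        warnings.warn(
            f"Detected number of speakers ({num_different_speakers}) is "
            f"outside the given bounds [{min_speakers}, {max_speakers}]. "
            "This can happen if the bounds are too narrow for the data."
        )
```

Unlike the assert, this keeps the pipeline running while still surfacing the constraint violation to the user.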

# during counting, we could possibly overcount the number of instantaneous
# speakers due to segmentation errors, so we cap the maximum instantaneous number
# of speakers by the number of detected clusters
count.data = np.minimum(count.data, num_different_speakers)
@hbredin (Member) commented Jul 6, 2023:

This needs further investigation on my side (e.g. to check that this change does not degrade the overall performance on my usual benchmarks) which might take some time...

Indeed, there might be cases where the embedding/clustering step would (wrongly) decide that there is only 1 speaker in the recording, while the segmentation step actually (correctly) detects 2 overlapping speakers for certain frames.

@flyingleafe (Contributor, Author) replied:

@hbredin I removed the assert. In the meantime, did you have time to run the benchmarks? I can do so myself if you tell me which data you use and how exactly you run them.

@hbredin (Member) commented Jul 16, 2023

> @hbredin I removed the assert. In the meanwhile, did you have time to run benchmarks? I can do so myself if you tell which data you use and how exactly you run those.

I just ran a few benchmarks (with neither min_speakers nor max_speakers constraint) and this change actually slightly degrades the performance (though the difference is definitely not statistically significant). More precisely, it increases missed detection rate more than it reduces false alarm rate.

That being said, I do think you have a point in the case where max_speakers is provided by the user: the pipeline does need to honor this constraint. Can you please update the PR to only cap count.data when it goes above max_speakers?

Also, I think having the change in both Pipeline.speaker_count() and Pipeline.apply() is a bit redundant. You should remove it from Pipeline.speaker_count() and only do the change just before Pipeline.reconstruct().
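The two requests above (cap the count only when it exceeds max_speakers, and apply the cap in one place, just before Pipeline.reconstruct()) could be sketched as follows. The helper name and call site are assumptions for illustration, not the actual pyannote code.

```python
import numpy as np

def cap_instantaneous_count(count_data, max_speakers):
    """Cap the per-frame speaker count at max_speakers, leaving frames
    already within the bound untouched. Intended to be applied exactly
    once, just before the reconstruction step of the pipeline.
    """
    return np.minimum(count_data, max_speakers)

# Frames where segmentation claims 4 simultaneous speakers are capped at 3;
# all other frames pass through unchanged.
count = np.array([0, 1, 3, 2, 4])
print(cap_instantaneous_count(count, max_speakers=3))  # [0 1 3 2 3]
```

Doing this in a single place avoids the redundancy of capping in both speaker_count() and apply(), and without a user-provided max_speakers the count is left entirely alone, which matches the benchmark observation above.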

@flyingleafe (Contributor, Author) replied:

@hbredin Sorry for taking so long to get back to this; your last comment has been addressed.

@hbredin (Member) left a review:

A few comments and a question.

Two review threads on pyannote/audio/pipelines/speaker_diarization.py (outdated; resolved)
@flyingleafe (Contributor, Author) replied:

@hbredin Sorry for the long delay; my primary work was really heavy these last couple of months. I have addressed your last comments.

@hbredin hbredin merged commit bbc8044 into pyannote:develop Nov 16, 2023
3 checks passed
@hbredin (Member) commented Nov 16, 2023

🎉 Merged! Thanks @flyingleafe!
