Study diarization: separating the participants in a conversation #1

MatMercer · 2024-07-06T20:26:27Z

Research

We have a few options

Plans

Use 2 models

https://github.com/MahmoudAshraf97/whisper-diarization

This one in particular uses 2 models: NeMo and Whisper.

https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/intro.html

There is a limitation: "overlapping speech (2 people talking at the same time) is not handled yet. One way to improve this would be to split the audio into 2 files that isolate each participant, using another model, but that greatly increases processing."
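Any pipeline that pairs an ASR model (Whisper) with a separate diarization model (NeMo, pyannote) has to align the two outputs. The transcript has word timestamps and the diarizer has speaker turns. Below is a minimal sketch of one common approach. It assumes both outputs were already converted to plain start/end times in seconds, and it gives each word the speaker whose turn overlaps it the most. `Word`, `Turn` and `assign_speakers` are hypothetical names for illustration, not the API of whisper-diarization or NeMo.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    """A speaker turn from the diarization model (hypothetical type, seconds)."""
    start: float
    end: float
    speaker: str


@dataclass
class Word:
    """A word with timestamps from the ASR model (hypothetical type, seconds)."""
    start: float
    end: float
    text: str
    speaker: Optional[str] = None


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: List[Word], turns: List[Turn]) -> List[Word]:
    """Label each word with the speaker whose turn overlaps it the most."""
    for word in words:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            ov = overlap(word.start, word.end, turn.start, turn.end)
            if ov > best_overlap:
                best_speaker, best_overlap = turn.speaker, ov
        word.speaker = best_speaker  # stays None if no turn covers the word
    return words


if __name__ == "__main__":
    turns = [Turn(0.0, 2.0, "SPEAKER_00"), Turn(2.0, 4.0, "SPEAKER_01")]
    words = [Word(0.1, 0.5, "oi"), Word(1.8, 2.6, "tudo"), Word(3.0, 3.4, "bem")]
    for w in assign_speakers(words, turns):
        print(w.speaker, w.text)
```

A word that falls inside overlapping speech still receives a single label here, which reflects the limitation quoted above.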

Use "Insanely fast whisper", which apparently supports diarization

https://github.com/Vaibhavs10/insanely-fast-whisper

  --diarization_model DIARIZATION_MODEL
                        Name of the pretrained model/ checkpoint to perform diarization. (default: pyannote/speaker-diarization)
  --num-speakers NUM_SPEAKERS
                        Specifies the exact number of speakers present in the audio file. Useful when the exact number of participants in the conversation is known. Must be at least 1. Cannot be used together with --min-speakers or --max-speakers. (default: None)
  --min-speakers MIN_SPEAKERS
                        Sets the minimum number of speakers that the system should consider during diarization. Must be at least 1. Cannot be used together with --num-speakers. Must be less than or equal to --max-speakers if both are specified. (default: None)
  --max-speakers MAX_SPEAKERS
                        Defines the maximum number of speakers that the system should consider in diarization. Must be at least 1. Cannot be used together with --num-speakers. Must be greater than or equal to --min-speakers if both are specified. (default: None)
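The help text above sets three rules. Every value must be at least 1, `--num-speakers` cannot be combined with the min/max pair, and min must be <= max. These rules can be checked before a run is launched. Below is a sketch. `speaker_kwargs` is a hypothetical helper, not part of insanely-fast-whisper, and its keyword names simply mirror the CLI flags.

```python
from typing import Dict, Optional


def speaker_kwargs(
    num_speakers: Optional[int] = None,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
) -> Dict[str, int]:
    """Hypothetical helper: validate speaker-count options using the rules
    from the --help text, then return only the options that were set."""
    for name, value in (("num_speakers", num_speakers),
                        ("min_speakers", min_speakers),
                        ("max_speakers", max_speakers)):
        if value is not None and value < 1:
            raise ValueError(f"{name} must be at least 1")
    if num_speakers is not None and (min_speakers is not None or max_speakers is not None):
        raise ValueError("num_speakers cannot be combined with min_speakers/max_speakers")
    if min_speakers is not None and max_speakers is not None and min_speakers > max_speakers:
        raise ValueError("min_speakers must be less than or equal to max_speakers")
    options = {"num_speakers": num_speakers,
               "min_speakers": min_speakers,
               "max_speakers": max_speakers}
    return {k: v for k, v in options.items() if v is not None}


if __name__ == "__main__":
    # A two-person conversation: the exact count is known.
    print(speaker_kwargs(num_speakers=2))
```

When the number of participants is known, as in a two-person conversation, passing the exact count is the tighter constraint. The min/max pair fits cases where only a range is known.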

Split into 2 audio tracks, then merge the subtitles

There may be algorithms for separating voices. It has been a very common problem for people working with audio since before AI.
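Suppose each participant ends up in their own audio track and is transcribed separately. Merging the subtitles then mostly means sorting all cues by start time and tagging each cue with its speaker. Below is a sketch under that assumption. `Cue`, `fmt_time` and `merge_tracks` are hypothetical helpers, and SRT is an assumed choice of output format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Cue:
    """One subtitle cue from a single speaker's track (hypothetical type)."""
    start: float  # seconds
    end: float    # seconds
    text: str


def fmt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def merge_tracks(tracks: Dict[str, List[Cue]]) -> str:
    """Merge per-speaker cue lists into a single SRT ordered by start time."""
    labeled = sorted(
        (cue.start, cue.end, speaker, cue.text)
        for speaker, cues in tracks.items()
        for cue in cues
    )
    blocks = [
        f"{i}\n{fmt_time(start)} --> {fmt_time(end)}\n[{speaker}] {text}\n"
        for i, (start, end, speaker, text) in enumerate(labeled, start=1)
    ]
    return "\n".join(blocks)


if __name__ == "__main__":
    print(merge_tracks({
        "A": [Cue(1.0, 2.0, "oi"), Cue(4.0, 5.0, "tudo bem?")],
        "B": [Cue(0.5, 1.5, "olá")],
    }))
```

Cues from different tracks may overlap in time. The SRT format does not forbid overlapping cues, so crosstalk is kept rather than dropped, although how a given player renders it varies.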

WhisperX

https://github.com/m-bain/whisperX

pyannote

https://github.com/pyannote/pyannote-audio

High error rate, around 25%.
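This sketch assumes the 25% refers to the diarization error rate (DER), the usual metric for these tools. DER adds up false-alarm speech, missed speech and speaker confusion, then divides by the total reference speech time. The code below is a simplified frame-level version with several assumptions: one speaker per frame, and hypothesis labels already mapped to reference labels. Real scorers such as pyannote.metrics also search for the best label mapping and handle overlapping speech. `frame_der` is a hypothetical helper, and the example labels are made up.

```python
from typing import List, Optional


def frame_der(reference: List[Optional[str]], hypothesis: List[Optional[str]]) -> float:
    """Simplified frame-level DER (hypothetical helper).

    Each list holds one speaker label per frame, with None meaning no speech.
    Assumes hypothesis labels already use the reference speaker names.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    false_alarm = missed = confusion = speech = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            speech += 1
            if hyp is None:
                missed += 1      # speech the system did not detect
            elif hyp != ref:
                confusion += 1   # speech attributed to the wrong speaker
        elif hyp is not None:
            false_alarm += 1     # speech detected where there was none
    if speech == 0:
        raise ValueError("reference contains no speech")
    return (false_alarm + missed + confusion) / speech


if __name__ == "__main__":
    # Made-up labels: 8 speech frames, 1 confusion and 1 miss.
    ref = ["A", "A", "A", "A", "B", "B", "B", "B"]
    hyp = ["A", "A", "A", "B", "B", "B", "B", None]
    print(frame_der(ref, hyp))
```

With these made-up labels, 2 of the 8 speech frames are wrong, so the result is 0.25, which is the same order as the error rate noted above.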
