
Bug - Offline use of "speechbrain/spkrec-ecapa-voxceleb " does not work #1427

Closed
asusdisciple opened this issue Jul 5, 2023 · 4 comments

asusdisciple commented Jul 5, 2023

I am trying to use the speaker-diarization pipeline offline. The problem occurs when I try to load the model for speaker embeddings. I found out that the problem is that the model comes from SpeechBrain (https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb/tree/main), which is also referenced in the pyannote speaker-diarization config.yaml on Hugging Face:

pipeline:
  name: pyannote.audio.pipelines.SpeakerDiarization
  params:
    clustering: AgglomerativeClustering
    embedding: speechbrain/spkrec-ecapa-voxceleb
    embedding_batch_size: 32
    embedding_exclude_overlap: true
    segmentation: pyannote/[email protected]
    segmentation_batch_size: 32

I looked into the .pkl file and it seems that pyannote tags its own models with metadata that is not present in the SpeechBrain checkpoint, so the lookup pyannote uses to extract the module name, module_name: str = loaded_checkpoint["pyannote.audio"]["architecture"]["module"], fails.
So if I use a pyannote speaker embedding model, everything works fine (tested it), but if I try to run the speaker-diarization pipeline offline with the aforementioned SpeechBrain model, it does not work.
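
For illustration, this is my understanding of what happens under the hood; a minimal sketch that loads the local embedding checkpoint directly (models/emb_pya.ckpt is the path from my config further down, and torch.load stands in for the pl_load call pyannote uses):

import torch

# Load the checkpoint roughly the same way pyannote's Model.from_pretrained does.
# A native pyannote checkpoint carries a "pyannote.audio" metadata entry describing
# its architecture; the SpeechBrain ECAPA checkpoint does not, hence the KeyError.
loaded_checkpoint = torch.load("models/emb_pya.ckpt", map_location="cpu")

print("pyannote.audio" in loaded_checkpoint)  # False for the SpeechBrain checkpoint
module_name: str = loaded_checkpoint["pyannote.audio"]["architecture"]["module"]  # raises KeyError: 'pyannote.audio'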

Maybe you have an idea for a workaround? The error related to this issue is:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[164], line 9
      6 audio = Audio(sample_rate=16000, mono="downmix")
      8 #emb_model = Model.from_pretrained("config/config_pyannote.yaml")
----> 9 pipeline = Pipeline.from_pretrained("config/config_pyannote.yaml")
     11 # Diarization to Annotation object
     12 diarization = pipeline(file)

File ~/PycharmProjects/envs/diary/lib/python3.10/site-packages/pyannote/audio/core/pipeline.py:126, in Pipeline.from_pretrained(cls, checkpoint_path, hparams_file, use_auth_token, cache_dir)
    124 params = config["pipeline"].get("params", {})
    125 params.setdefault("use_auth_token", use_auth_token)
--> 126 pipeline = Klass(**params)
    128 # freeze  parameters
    129 if "freeze" in config:

File ~/PycharmProjects/envs/diary/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_diarization.py:163, in SpeakerDiarization.__init__(self, segmentation, segmentation_duration, segmentation_step, embedding, embedding_exclude_overlap, clustering, embedding_batch_size, segmentation_batch_size, der_variant, use_auth_token)
    160     metric = "not_applicable"
    162 else:
--> 163     self._embedding = PretrainedSpeakerEmbedding(
    164         self.embedding, device=emb_device, use_auth_token=use_auth_token
    165     )
    166     self._audio = Audio(sample_rate=self._embedding.sample_rate, mono=True)
    167     metric = self._embedding.metric

File ~/PycharmProjects/envs/diary/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_verification.py:471, in PretrainedSpeakerEmbedding(embedding, device, use_auth_token)
    468     return NeMoPretrainedSpeakerEmbedding(embedding, device=device)
    470 else:
--> 471     return PyannoteAudioPretrainedSpeakerEmbedding(
    472         embedding, device=device, use_auth_token=use_auth_token
    473     )

File ~/PycharmProjects/envs/diary/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_verification.py:391, in PyannoteAudioPretrainedSpeakerEmbedding.__init__(self, embedding, device, use_auth_token)
    388 self.embedding = embedding
    389 self.device = device
--> 391 self.model_: Model = get_model(self.embedding, use_auth_token=use_auth_token)
    392 self.model_.eval()
    393 self.model_.to(self.device)

File ~/PycharmProjects/envs/diary/lib/python3.10/site-packages/pyannote/audio/pipelines/utils/getter.py:75, in get_model(model, use_auth_token)
     72     pass
     74 elif isinstance(model, Text):
---> 75     model = Model.from_pretrained(
     76         model, use_auth_token=use_auth_token, strict=False
     77     )
     79 elif isinstance(model, Mapping):
     80     model.setdefault("use_auth_token", use_auth_token)

File ~/PycharmProjects/envs/diary/lib/python3.10/site-packages/pyannote/audio/core/model.py:853, in Model.from_pretrained(cls, checkpoint, map_location, hparams_file, strict, use_auth_token, cache_dir, **kwargs)
    851 # obtain model class from the checkpoint
    852 loaded_checkpoint = pl_load(path_for_pl, map_location=map_location)
--> 853 module_name: str = loaded_checkpoint["pyannote.audio"]["architecture"]["module"]
    854 module = import_module(module_name)
    855 class_name: str = loaded_checkpoint["pyannote.audio"]["architecture"]["class"]

KeyError: 'pyannote.audio'

My local .yaml file looks like this:

pipeline:
  name: pyannote.audio.pipelines.SpeakerDiarization
  params:
    clustering: AgglomerativeClustering
    embedding: models/emb_pya.ckpt
    embedding_batch_size: 32
    embedding_exclude_overlap: true
    segmentation: models/seg_pya.bin
    segmentation_batch_size: 32

params:
  clustering:
    method: centroid
    min_cluster_size: 15
    threshold: 0.7153814381597874
  segmentation:
    min_duration_off: 0.5817029604921046
    threshold: 0.4442333667381752

I call the pipeline with

pipeline = Pipeline.from_pretrained("config/config_pyannote.yaml")

All of the paths are correct and all models were downloaded from Hugging Face. Do you have any idea why this could happen?

github-actions bot commented Jul 5, 2023

Thank you for your issue. You might want to check the FAQ if you haven't done so already.

Feel free to close this issue if you found an answer in the FAQ.

If your issue is a feature request, please read this first and update your request accordingly, if needed.

If your issue is a bug report, please provide a minimum reproducible example as a link to a self-contained Google Colab notebook containing everything needed to reproduce the bug:

  • installation
  • data preparation
  • model download
  • etc.

Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users).

We also offer paid scientific consulting services around speaker diarization (and speech processing in general).

This is an automated reply, generated by FAQtory

@asusdisciple asusdisciple changed the title Bug - Open model locally with config.yaml file Bug - Can not load speechbrain model offline, because of tags by pyannote Jul 6, 2023
@asusdisciple asusdisciple changed the title Bug - Can not load speechbrain model offline, because of tags by pyannote Bug - Offline use of "speechbrain/spkrec-ecapa-voxceleb " does not work Jul 6, 2023
@haiderasad

@asusdisciple any luck? I'm having the same error.

@haiderasad

Found the solution at #1294: the path given for embedding has to contain "speechbrain", e.g.
embedding: /home/haider/Documents/speechbrain/spkrec-ecapa-voxceleb
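
It seems pyannote decides which backend wrapper to use by checking whether the string "speechbrain" appears in the embedding value. Applied to the config from the original post, it would look roughly like this (the directory name models/speechbrain/spkrec-ecapa-voxceleb is just an example; the rest of the yaml stays unchanged):

pipeline:
  name: pyannote.audio.pipelines.SpeakerDiarization
  params:
    clustering: AgglomerativeClustering
    embedding: models/speechbrain/spkrec-ecapa-voxceleb  # local path must contain "speechbrain"
    embedding_batch_size: 32
    embedding_exclude_overlap: true
    segmentation: models/seg_pya.bin
    segmentation_batch_size: 32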


stale bot commented Feb 7, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Feb 7, 2024