Language and Speaker ID mapping issue #327
Comments
Hey there, @tushar-31093! 👋 I'm here to help you with any bugs, questions, or contributions. I'm your friendly neighborhood bot, ready to assist while you wait for a human maintainer.

To resolve the issue with the Language and Speaker ID mapping without hardcoding it for a specific speaker, you can dynamically iterate over the available speaker IDs for each language model. The approach involves iterating over speaker_ids.keys() for each language model, which allows you to process each speaker without hardcoding specific speaker IDs. Here's a relevant code snippet that demonstrates this approach:

    import torch
    from melo.api import TTS

    # texts, device, src_path, speed, output_dir, tone_color_converter and
    # target_se are defined in earlier cells of the demo notebook [1].
    for language, text in texts.items():
        model = TTS(language=language, device=device)
        speaker_ids = model.hps.data.spk2id

        for speaker_key in speaker_ids.keys():
            speaker_id = speaker_ids[speaker_key]
            # Normalize the key so it matches the embedding file name
            speaker_key = speaker_key.lower().replace('_', '-')

            source_se = torch.load(f'checkpoints_v2/base_speakers/ses/{speaker_key}.pth', map_location=device)
            model.tts_to_file(text, speaker_id, src_path, speed=speed)
            save_path = f'{output_dir}/output_v2_{speaker_key}.wav'

            # Run the tone color converter
            encode_message = "@MyShell"
            tone_color_converter.convert(
                audio_src_path=src_path,
                src_se=source_se,
                tgt_se=target_se,
                output_path=save_path,
                message=encode_message)

This code dynamically loads the speaker embeddings and processes each speaker, providing flexibility in using different English speaker models [1].

[1] https://github.com/myshell-ai/OpenVoice/blob/main/demo_part3.ipynb

This is exactly the same code I am already using, and it doesn't work if I specify a speaker ID.
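One way to avoid both the full loop and a hardcoded ID is to filter the speaker mapping by the requested key before synthesizing. The following is only a minimal sketch, not part of OpenVoice or MeloTTS: `normalize_key` and `select_speakers` are hypothetical helpers, and the sample mapping (including its ID values) is invented for illustration. With a real model, the mapping would be `model.hps.data.spk2id`, accessed here only through `.keys()` and indexing, as in the snippet above.

```python
def normalize_key(speaker_key):
    """Normalize a speaker key the way the demo loop does ('EN_INDIA' -> 'en-india')."""
    return speaker_key.lower().replace('_', '-')


def select_speakers(spk2id, wanted):
    """Return {normalized_key: speaker_id} for the requested speakers only.

    `spk2id` stands in for model.hps.data.spk2id. Names in `wanted` may use
    either spelling ('EN_INDIA' or 'en-india'). Unknown names raise KeyError.
    """
    by_normalized = {normalize_key(k): spk2id[k] for k in spk2id.keys()}
    selected = {}
    for name in wanted:
        key = normalize_key(name)
        if key not in by_normalized:
            raise KeyError(f"unknown speaker {name!r}; available: {sorted(by_normalized)}")
        selected[key] = by_normalized[key]
    return selected


# Hypothetical mapping; real keys and IDs come from the loaded model.
example_spk2id = {'EN-US': 0, 'EN-BR': 1, 'EN_INDIA': 2, 'EN-AU': 3}
print(select_speakers(example_spk2id, ['en-india']))
```

Iterating over the returned mapping instead of `speaker_ids.keys()` would restrict synthesis to the chosen speaker, and an unknown name fails early with the list of available keys rather than at the `torch.load` call.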
If specifying a speaker ID doesn't work, there might be an issue with how the speaker IDs are being mapped or used in the code. Here are a few things to check:

1. Speaker ID Mapping: Ensure that the speaker_ids dictionary is correctly populated and that the keys match the expected format. The keys should be processed by converting them to lowercase and replacing underscores with hyphens, as shown in the code snippet.
2. Speaker Embeddings: Verify that the speaker embeddings are correctly loaded from the specified path. The file path should match the processed speaker key.
3. Error Handling: Check if there are any error messages or exceptions when specifying a speaker ID. This might give clues about what is going wrong.
4. Model Configuration: Ensure that the model configuration supports the specified speaker ID. There might be limitations or specific configurations required for certain speaker models.

If these checks don't resolve the issue, consider logging the values of speaker_ids, speaker_key, and speaker_id to ensure they are as expected. This can help identify where the process might be failing.

If I specify only EN in texts and pass it as

    texts = {'EN': 'Hello'}

it generates audio for every English speaker model, so the underlying code seems to be the issue. If I put en-india, en-br, or any other value directly in place of the speaker_key variable, it works, but the iterator and mapping seem to be faulty.
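The checks suggested above (logging speaker_ids, speaker_key, and speaker_id, and confirming that each embedding file exists) can be sketched as follows. This is an illustrative helper, not part of OpenVoice: `report_speakers` is a hypothetical name, the sample mapping is invented, and the `ses_dir` default simply mirrors the path used in the snippet.

```python
from pathlib import Path


def report_speakers(spk2id, ses_dir='checkpoints_v2/base_speakers/ses'):
    """Return one row per speaker: (raw key, normalized key, id, embedding file exists)."""
    rows = []
    for raw_key in spk2id.keys():
        # Same normalization as the demo loop: lowercase, underscores to hyphens
        normalized = raw_key.lower().replace('_', '-')
        embedding = Path(ses_dir) / f'{normalized}.pth'
        rows.append((raw_key, normalized, spk2id[raw_key], embedding.is_file()))
    return rows


# Hypothetical mapping; with a real model, pass model.hps.data.spk2id instead.
for raw_key, normalized, speaker_id, found in report_speakers({'EN-US': 0, 'EN_INDIA': 2}):
    status = 'found' if found else 'MISSING'
    print(f'{raw_key!r} -> key={normalized!r}, id={speaker_id}, embedding {status}')
```

A row marked MISSING would point to a mismatch between the normalized key and the file names under the embeddings directory, which is one possible reason a specific speaker fails while the others work.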
It seems like the issue might be related to how the speaker keys are being processed and mapped in the code. Here are a few steps to help troubleshoot and potentially resolve the issue:
By following these steps, you should be able to identify where the mapping might be failing and make the necessary adjustments to support additional English variants.
I simply wanted to try another speaker model. I configured it the way it should have worked, but something under the hood seems broken.
If we leave it at EN-NEWEST and the other languages, it works fine. If we tweak it to work with the other English models, it fails. I can hardcode it to run for a specific speaker, but ideally that should not be necessary.
Any suggestions, or should I simply hardcode it?