
adding a new language #128

Open
tculjaga opened this issue Sep 1, 2024 · 10 comments

tculjaga commented Sep 1, 2024

Hi, is it possible to add support for a new language such as Slovenian, Croatian, or Serbian?

Do you have a procedure we can follow to train the model for those languages?

ylacombe (Collaborator) commented Sep 2, 2024

Hey @tculjaga, it's possible but there's no guide yet! I don't have the bandwidth for the next few weeks though


Oyemade commented Sep 3, 2024

Would you mind sharing some pointers? I don't mind taking a stab at it. I just successfully fine-tuned MMS for some West African languages and I'm hoping to build off of that.

tculjaga (Author) commented Sep 3, 2024

Hi @ylacombe, thanks for the quick response. I'd like to give it a try. All we need is a small, short guide (a bullet list is enough for a start) on how you do it for a specific language; then we can do our best to battle through it :)

@SherryS997

> Would you mind sharing some pointers? I don't mind taking a stab at it. I just successfully fine-tuned MMS for some Western African languages and hoping to build off of that.

Hello, we're actually trying to train Parler-TTS for Indian languages, and our pilot test gave great results. We followed exactly the same training process and only changed the text encoder to mT5. It worked well!
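The change @SherryS997 describes can be sketched as a config diff. This is a hypothetical illustration, not the repo's actual schema: the field names (`text_encoder_name`, `description_tokenizer_name`) and checkpoint ids are placeholders; the point is that only the text-encoder and tokenizer entries change while the rest of the training setup stays the same.

```python
# Hedged sketch (illustrative field names, not Parler-TTS's exact schema):
# per this thread, the only architectural change for multilingual training
# was swapping the English-centric T5 text encoder for multilingual mT5.
base_config = {
    "text_encoder_name": "google/flan-t5-base",          # English-centric
    "description_tokenizer_name": "google/flan-t5-base",
}

multilingual_config = {
    **base_config,
    # mT5 was pretrained on ~101 languages; note SherryS997's later caveat
    # that it still covers only 12 of the 23 languages they target.
    "text_encoder_name": "google/mt5-base",
    "description_tokenizer_name": "google/mt5-base",
}

print(multilingual_config["text_encoder_name"])
```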

@ylacombe (Collaborator)

Hey @SherryS997,
thanks for giving this feedback. I actually trained a version of Mini v1 which uses mT5 instead of T5, but it's only in English.
Do you think you'd be able to either open-source your model, or to share some snippet?
Thanks!


SherryS997 commented Sep 18, 2024

Hi @ylacombe,
Yes, we plan to open-source our model, along with the dataset, code, and captions, in the next month or two once we’re fully satisfied with the results. This work is part of the TTS research at AI4Bharat, and the model will be designed to support all 23 official languages of India, including English. We're also experimenting with various English accents and are optimistic that the model will handle those effectively as well.
In our pilot training, we used mT5, which delivered excellent results. However, we are now experimenting with other tokenizers, as mT5 supports only 12 of the 23 languages we’re targeting. These alternatives will help us achieve broader language coverage for the project.

@ylacombe (Collaborator)

Hey @SherryS997, let's speak over mail if that's okay with you: yoach [at] huggingface.co


showgan commented Sep 20, 2024

Hi @ylacombe and @SherryS997,
I'm also very interested in training for a low-resource language that isn't supported by any existing tokenizer. I'd really appreciate it if you could share some advice with me as well.
Thanks!

@Strive-for-excellence

> Yes, we plan to open-source our model, along with the dataset, code, and captions, in the next month or two once we're fully satisfied with the results. […]

Great job. Looking forward to your open-source release.

@SherryS997

> I'm also very interested in training for a low resource language which is not supported by any tokenizer. I'd really appreciate it if you could share with me some advice as well.

You may need to build a tokenizer, or better yet, extend the FlanT5 tokenizer. Then all you need to do is point to this tokenizer in the JSON config you start training with.
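The "build a tokenizer" route can be sketched with the Hugging Face `tokenizers` library. This is a minimal, hypothetical example: the tiny in-memory corpus and the vocab size are placeholders, not a real training setup. The "extend FlanT5" route mentioned above would instead load the FlanT5 tokenizer, call `add_tokens(...)` with the new language's pieces, and resize the text encoder's embeddings to match the new vocab size.

```python
# Hedged sketch: training a small BPE tokenizer from scratch for a language
# the stock tokenizer doesn't cover. Corpus and vocab_size are placeholders.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Tiny illustrative corpus (here Croatian, one of the languages from the
# original question).
corpus = [
    "dobar dan, kako ste?",
    "hvala lijepa, dobro sam",
    "vidimo se sutra",
]

tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.BpeTrainer(
    vocab_size=300,
    # Special tokens would need to match what the model's config expects.
    special_tokens=["<unk>", "<pad>", "</s>"],
)
tokenizer.train_from_iterator(corpus, trainer)

# In-corpus text should tokenize into known subword pieces, not <unk>.
encoding = tokenizer.encode("dobar dan")
print(encoding.tokens)
```

After `tokenizer.save_pretrained(...)` (via the `transformers` wrapper), the saved directory is what the JSON config would point at.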
