Stress on the vowel in Russian language. #791

Open
6 tasks done
AngryBearr opened this issue Dec 28, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@AngryBearr
AngryBearr commented Dec 28, 2024

Self Checks

  • I have thoroughly reviewed the project documentation (installation, training, inference) but couldn't find any relevant information that meets my needs.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

1. Is this request related to a challenge you're experiencing? Tell us your story.

I love the model and your work overall, but there is a problem with the Russian language (I guess because of the lack of data the model was trained on). It's not an issue with the model itself, but a problem specific to Russian.
Russian has vowel stress, and the same spelling can turn into a different word depending on which vowel the stress falls on.
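
A well-known example is замок, which means "castle" when the first vowel is stressed (за́мок) and "lock" when the second one is (замо́к). Below is a minimal Python sketch of why plain text loses this information, assuming stress is written with the combining acute accent U+0301. The helper name `strip_stress` is made up for illustration and is not part of the project.

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent, commonly used to mark stress


def strip_stress(word: str) -> str:
    """Remove combining acute accents, leaving the unmarked spelling."""
    decomposed = unicodedata.normalize("NFD", word)
    # Recompose afterwards so letters such as "й" and "ё" stay intact.
    return unicodedata.normalize("NFC", decomposed.replace(ACUTE, ""))


castle = "за" + ACUTE + "мок"  # stress on the first vowel
lock = "замо" + ACUTE + "к"    # stress on the second vowel

# The marked forms are different strings...
print(castle != lock)
# ...but without the marks both collapse to the same spelling.
print(strip_stress(castle) == strip_stress(lock))
```

So a model that only sees the unmarked spelling has to guess the stress from context or from its training data.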

2. What is your suggested solution?

I would suggest training the model with a special character placed before a vowel to mark it as stressed. In the long run, someone could create a vocabulary of words with marked stress, and the model could read this vocabulary before initialization and pronounce the listed words the way they appear there.
I know that this is just for one language and may be too much work, but maybe you can guide someone with more knowledge than me to write the code for fine-tuning the LLAMA model to add that special character.
As a reference, I can suggest looking at another model, based on GPT-SoVITS, that was trained for Russian with the addition of such a stress symbol.
Here is a link to the repo: vosk-tts. Here is the model from HF. You put the model into the folder vosk-tts/gpt-sovitst/pretrained_models. The dictionary mentioned above is also on HF; it contains the vocabulary with stress encoded as 1 if the stress falls on a vowel and 0 if the vowel should be read as is.
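
The idea of a stress vocabulary with 1/0 flags could be applied as a text preprocessing step, roughly like the sketch below. Everything here is an assumption for illustration: the in-memory layout of `STRESS_DICT` (one flag per vowel), the `+` marker, and the function names are hypothetical and do not reflect the actual vosk-tts dictionary format or anything in this project.

```python
import re

RUSSIAN_VOWELS = "аеёиоуыэюяАЕЁИОУЫЭЮЯ"
STRESS_MARK = "+"  # hypothetical marker; a trained model may expect another symbol

# Hypothetical entries: one flag per vowel, 1 = stressed, 0 = read as is.
STRESS_DICT = {
    "замок": [0, 1],      # "lock", stress on the second vowel
    "молоко": [0, 0, 1],  # "milk", stress on the last vowel
}


def mark_word(word, flags):
    """Insert STRESS_MARK before every vowel whose flag is 1."""
    out = []
    vowel_index = 0
    for ch in word:
        if ch in RUSSIAN_VOWELS:
            if vowel_index < len(flags) and flags[vowel_index] == 1:
                out.append(STRESS_MARK)
            vowel_index += 1
        out.append(ch)
    return "".join(out)


def mark_text(text, stress_dict=STRESS_DICT):
    """Mark stress in every Cyrillic word that has a dictionary entry."""
    def repl(match):
        word = match.group(0)
        flags = stress_dict.get(word.lower())
        return mark_word(word, flags) if flags else word

    return re.sub(r"[а-яёА-ЯЁ]+", repl, text)


print(mark_text("Замок и молоко"))  # -> Зам+ок и молок+о
```

A plain word-to-flags lookup like this can hold only one stress pattern per spelling, so words whose stress depends on meaning would still need context or manual marking.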

Hope that makes sense.

3. Additional context or comments

Maybe there is a way to add stress to vowels that I don't know about. Here is what I have tried so far to put at least some stress on vowels in words that the model pronounces incorrectly:

  1. Add another vowel (e.g. if the letter is o, add another o).
  2. Add a grave accent symbol before the vowel (e.g. `o).
  3. Add the special character that is used in Russian teaching literature to show how to read words, called the acute accent (e.g. ó).
  4. Use a capital letter for the vowel that needs stress (e.g. sOme wOrd).
  5. Duplicate the vowel and add a hyphen (e.g. so-ome wo-ord).
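
To compare these workarounds on the same word, the five spellings could be generated with a small helper. This is only a sketch: `stress_variants` and its key names are made up for illustration, and `pos` is assumed to be the character index of the vowel to stress.

```python
def stress_variants(word: str, pos: int) -> dict:
    """Return the five workaround spellings for the vowel at index pos."""
    head, vowel, tail = word[:pos], word[pos], word[pos + 1:]
    return {
        "doubled": head + vowel * 2 + tail,               # 1. extra vowel
        "grave_before": head + "`" + vowel + tail,        # 2. grave accent before the vowel
        "acute": head + vowel + "\u0301" + tail,          # 3. combining acute accent on the vowel
        "capital": head + vowel.upper() + tail,           # 4. capital letter
        "hyphenated": head + vowel + "-" + vowel + tail,  # 5. duplicate plus hyphen
    }


for name, text in stress_variants("some", 1).items():
    print(name, text)
```

Each variant can then be passed to the model and the results compared by ear.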

4. Can you help us with this feature?

  • I am interested in contributing to this feature.

AngryBearr added the enhancement label on Dec 28, 2024

@PoTaTo-Mika
Collaborator

Thanks for your advice!
