Multilingual language branches #187

Open
StuartIanNaylor opened this issue Feb 15, 2025 · 0 comments
Labels
question Further information is requested

Comments

StuartIanNaylor commented Feb 15, 2025

Just wondering whether mixing language branches in small multilingual non-autoregressive models such as SenseVoice-Small reduces accuracy.
Being English, my base language branch could be considered West Germanic (https://en.wikipedia.org/wiki/West_Germanic_languages). I'm curious whether accuracy would be better if a model stayed inside a single language branch, or at least only included similar ones such as the North Germanic languages in this case, rather than mixing in the considerably different branches that Mandarin, Cantonese, Japanese, and Korean may be considered part of.

Just curiosity, but if English were omitted from SenseVoice-Small during training, would the accuracy for Mandarin, Cantonese, Japanese, and Korean increase or stay the same? Or would the model need to be purely Sino-Tibetan, or wider or narrower in language branch?
Also, when focusing on a single language branch, does this allow languages with far fewer resources to be included?

As said, this is just curiosity: I was wondering how a multilingual model restricted to one language branch would perform...

@StuartIanNaylor StuartIanNaylor added the question Further information is requested label Feb 15, 2025