Multilingual language branches #187

StuartIanNaylor · 2025-02-15T22:43:41Z

Just wondering whether mixing language branches in small multilingual non-autoregressive models such as SenseVoice-Small reduces accuracy.
I'm English, so my base language branch could be considered West Germanic (https://en.wikipedia.org/wiki/West_Germanic_languages). I'm curious whether accuracy would be better if a model stayed inside a single language branch, or at least only included similar ones such as the North Germanic languages in this case, rather than mixing in what could be considered considerably different language branches, such as those that Mandarin, Cantonese, Japanese, and Korean belong to.

Just out of curiosity: if English had been omitted when SenseVoice-Small was trained, would the accuracy for Mandarin, Cantonese, Japanese, and Korean increase or stay the same? Or would the model need to be purely Sino-Tibetan, or cover a wider or narrower language branch, for accuracy to change?
Also, does focussing on a single language branch make it possible to include languages that have far fewer resources?

This is just curiosity; as I said, I was wondering how a multilingual model confined to one language branch would perform...

StuartIanNaylor added the "question" label (Further information is requested) on Feb 15, 2025
