Unicode support #94

Open
slbinilkumar opened this issue Jul 31, 2017 · 6 comments

Comments

@slbinilkumar

Hi,
What modifications are needed for Unicode support? I need to do this for Indian languages.

@stephenvxx

Change the dictionary to one for your Indian language, then modify this line in DeepSpeechModel.lua:
fullyConnected:add(nn.Linear(rnnHiddenSize, dict_size))

Set dict_size to the length of the dictionary; for example, the length of dictionary_english is 29.
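As a rough sketch of the change described above, dict_size could be derived from the dictionary file instead of being hard-coded. This assumes the dictionary holds one character per line; `readDictionary` and `buildAlphabet` are hypothetical helpers, not functions from the project, and the project's own mapper may number tokens differently (for example, by reserving an index for the CTC blank).

```lua
-- Hypothetical helper (not part of the project): reads a dictionary file that is
-- assumed to contain one character per line, dropping Windows "\r" endings and
-- skipping empty lines.
local function readDictionary(path)
  local entries = {}
  for line in io.lines(path) do
    local entry = line:gsub('\r$', '')
    if entry ~= '' then
      table.insert(entries, entry)
    end
  end
  return entries
end

-- Hypothetical helper: maps each distinct dictionary entry to a 1-based token id
-- and returns both lookup tables plus the number of entries (the dict_size).
local function buildAlphabet(entries)
  local alphabet2token, token2alphabet = {}, {}
  local dictSize = 0
  for _, character in ipairs(entries) do
    if alphabet2token[character] == nil then
      dictSize = dictSize + 1
      alphabet2token[character] = dictSize
      token2alphabet[dictSize] = character
    end
  end
  return alphabet2token, token2alphabet, dictSize
end
```

The returned size would then take the place of the hard-coded value in the output layer, e.g. fullyConnected:add(nn.Linear(rnnHiddenSize, dictSize)), so the model stays consistent when characters are added to or removed from the dictionary.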

@slbinilkumar
Author

slbinilkumar commented Aug 24, 2017 via email

@slbinilkumar
Author

slbinilkumar commented Aug 24, 2017 via email

@SeanNaren
Owner

If you could open a PR with those changes, that would be awesome :)

@stephenvxx

stephenvxx commented Aug 28, 2017

@slbinilkumar Don't worry: use a Lua UTF-8 library instead of the string library.
* Use utf8.lower instead of string.lower.
* Change the for loop at line 29 to:
    for _, c in utf8.codes(line) do
        local character = utf8.char(c)
        table.insert(label, self.alphabet2token[character])
    end
Please install the UTF-8 library: https://github.com/starwing/luautf8
* Don't copy your dictionary from the Internet or other sources; write the Indian-language dictionary yourself.
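The loop change above can be sketched as a self-contained function. It assumes starwing's luautf8, loaded with require 'lua-utf8' (the LuaJIT runtime used by Torch has no built-in utf8 module), and falls back to the standard utf8 library of Lua 5.3+. `textToLabel` is a hypothetical name, and the alphabet2token table stands in for the project's self.alphabet2token.

```lua
-- Prefer starwing/luautf8 (needed under LuaJIT, which has no utf8 module);
-- otherwise fall back to the standard utf8 library shipped with Lua 5.3+.
local ok, luautf8 = pcall(require, 'lua-utf8')
local utf8 = ok and luautf8 or utf8
assert(utf8 and utf8.codes, 'install luautf8 or run under Lua 5.3+')

-- Hypothetical helper: converts a transcript into a list of token ids, one per
-- Unicode code point, instead of one per byte as the string library would.
local function textToLabel(line, alphabet2token)
  -- luautf8 provides utf8.lower; the standard utf8 library does not.
  if utf8.lower then
    line = utf8.lower(line)
  end
  local label = {}
  for _, c in utf8.codes(line) do
    local character = utf8.char(c)
    local token = alphabet2token[character]
    assert(token, 'character missing from dictionary: ' .. character)
    table.insert(label, token)
  end
  return label
end
```

Because the loop works per code point rather than per visible syllable, a conjunct such as स्त yields three tokens (स, the virama ्, and त). The dictionary therefore needs separate entries for combining marks such as vowel signs and the virama.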

@slbinilkumar
Author

slbinilkumar commented Aug 28, 2017 via email
