
I am not getting any text for decoding #32

Open
viju2008 opened this issue Dec 17, 2017 · 14 comments

Comments

@viju2008

I have followed the steps given.

However, I always get the following output from the ASR server:

{"status":"ok","data":[{"confidence":0.862751,"text":""}],"interrupted":"endofspeech","time":1080}

Please guide me on how to check the ASR logs.

@viju2008
Author

Sometimes I get only the text "NO".

@mikenewman1

I think you might be seeing the same problem that I posted about in #31.
If I switch in a model that I built in January, the recognition is great; with the latest Kaldi I get nothing but [NOISE] tokens.
I posted a question to kaldi-help
https://groups.google.com/forum/#!topic/kaldi-help/1N4aVb75IdU
but DP did not have any ideas.

@mikenewman1

I found the problem. In order to run with the latest (batchnorm) models, you need to add a line after loading the model:

    {
      bool binary;
      kaldi::Input ki(nnet3_rxfilename_, &binary);
      trans_model_->Read(ki.Stream(), binary);
      nnet_->Read(ki.Stream(), binary);

      // This is the crucial line
      SetBatchnormTestMode(true, &(nnet_->GetNnet()));
    }

Note that this only affects newer models (built using Kaldi source from after about March 2017).
For full compatibility with the latest Kaldi, these two calls are probably a good idea as well:

      SetDropoutTestMode(true, &(nnet_->GetNnet()));
      kaldi::nnet3::CollapseModel(kaldi::nnet3::CollapseModelConfig(), &(nnet_->GetNnet()));

This is shamelessly lifted from (e.g.) kaldi/src/online2bin/online2-wav-nnet3-latgen-faster.cc.

@formigone

I put some details on #37 about what helped me get past this same issue.

@dpny518

dpny518 commented Oct 23, 2018

In which file do we add this line?
SetBatchnormTestMode(true, &(nnet_->GetNnet()));

@mikenewman1

In Nnet3LatgenFasterDecoder.cc

(in the function Nnet3LatgenFasterDecoder::Initialize)

@hc038

hc038 commented Nov 11, 2020

@viju2008 I am in the same situation now. Did you solve the problem?

@mikenewman1

See the posts above. The code needed updating to support batchnorm. After this fix everything worked fine. Note however that I haven't used this code in years so it may be broken again.

@mikenewman1

mikenewman1 commented Nov 11, 2020 via email

@hc038

hc038 commented Nov 12, 2020

I am trying to do recognition with the system mic (through the web browser). Does it automatically convert the audio to the 16000 Hz format?

@realill
Contributor

realill commented Nov 12, 2020

The JavaScript code downsamples the browser input to 16000 Hz: https://github.com/dialogflow/asr-server/blob/master/asr-html/res/recorderWorker.js#L70
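For reference, what that worker does can be sketched roughly as follows. This is a simplified linear-interpolation resampler, not the exact code in recorderWorker.js (which averages the input samples that fall into each output bucket); the `Downsample` helper name is purely illustrative:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of browser-side downsampling to 16 kHz.
// Maps each output sample to a fractional input position and
// linearly interpolates between the two neighboring samples.
std::vector<float> Downsample(const std::vector<float>& in,
                              double in_rate, double out_rate) {
  if (out_rate >= in_rate || in.empty()) return in;  // no upsampling here
  const double ratio = in_rate / out_rate;
  const std::size_t out_len =
      static_cast<std::size_t>(in.size() * out_rate / in_rate);
  std::vector<float> out(out_len);
  for (std::size_t i = 0; i < out_len; ++i) {
    const double pos = i * ratio;                    // fractional input index
    const std::size_t idx = static_cast<std::size_t>(pos);
    const double frac = pos - idx;
    const float next = (idx + 1 < in.size()) ? in[idx + 1] : in[idx];
    out[i] = static_cast<float>(in[idx] * (1.0 - frac) + next * frac);
  }
  return out;
}
```

So a one-second 44100 Hz browser capture becomes 16000 samples before being sent to the server.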

@hc038

hc038 commented Nov 13, 2020

Thanks, Ilya.

@hc038

hc038 commented Nov 13, 2020

This server is working fine with the "curl" command, but with the system mic (recognition via the web browser) I only get this:
[screenshot of browser output omitted]
Any suggestions?

@realill
Contributor

realill commented Nov 13, 2020

  • Ensure the browser records data correctly; I believe there is a way to import the recorded stream.
  • Use the Chrome Developer Console to debug the JavaScript.
  • I do not remember whether the JavaScript client uses multipart to send data to the server, but this may be a difference between curl and JavaScript.
  • You can emulate multipart data sending with curl as well and see if it works.

At the end of the day, if curl works you can write your own code to emulate what it does. But without multipart you won't be able to productionize it very well: multipart allows "online" decoding, where the stream is decoded as you speak. So you'd better figure it out. ;)
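For anyone trying the multipart emulation, a multipart/form-data body is just the payload wrapped between boundary markers. A minimal sketch of building one (the field name "audio" and the boundary string are hypothetical; check what the JavaScript client actually sends):

```cpp
#include <string>

// Build a minimal multipart/form-data body around one audio payload.
// The field name and boundary are illustrative, not what asr-server requires.
std::string BuildMultipartBody(const std::string& boundary,
                               const std::string& field,
                               const std::string& payload) {
  std::string body;
  body += "--" + boundary + "\r\n";  // opening boundary line
  body += "Content-Disposition: form-data; name=\"" + field + "\"\r\n";
  body += "Content-Type: application/octet-stream\r\n\r\n";
  body += payload + "\r\n";
  body += "--" + boundary + "--\r\n";  // closing boundary line
  return body;
}
```

In practice you rarely build this by hand: curl's -F option generates an equivalent multipart/form-data request (boundary, headers, and all) from a file argument.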
