
Different decode results when decode batch_size=1 and >1 #21

Open
xinq2016 opened this issue Mar 13, 2017 · 3 comments
xinq2016 commented Mar 13, 2017
I found different decode results for the same utterance when decoding with batch_size=1 versus batch_size=16.

With batch_size=1, the argmax of the network output looks like this:

blank C C blank A B B Z D blank A blank blank blank T T blank

After collapsing repeats and removing blanks, the argmax decoding gives: cabzdat

but the ground truth is: cat

But when I decode the same utterance with batch_size=16 (the test JSON contains more than two utterances), the result is just "cat".

Why does this happen?

Many thanks
Xin.q.

srvinay (Collaborator) commented Apr 4, 2017

This may be due to the batch-normalization layers. Could you retrain a network without batch normalization? Keras is now at 2.0 and no longer supports the `mode` flag, so you could also try upgrading; this tutorial is quite old.
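To see why batch normalization can make decoding depend on batch size, here is a minimal NumPy sketch (not the project's code) of the two normalization behaviours: per-batch statistics, as in training or Keras 1.x `mode=2`, versus fixed running averages, the standard inference behaviour of `mode=0`. With per-batch statistics, the other utterances in the batch change the normalization applied to yours.

```python
import numpy as np

def batchnorm(x, running_mean, running_var, use_batch_stats, eps=1e-3):
    """Normalize x of shape (batch, features).

    use_batch_stats=True mimics per-batch statistics (training behaviour);
    False mimics inference with fixed running averages.
    """
    if use_batch_stats:
        mean, var = x.mean(axis=0), x.var(axis=0)
    else:
        mean, var = running_mean, running_var
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
batch = rng.normal(size=(16, 4))      # 16 utterances, 4 features each
mu, var = np.zeros(4), np.ones(4)     # pretend these are running averages

# Decode the first utterance alone vs. inside a batch of 16:
alone = batchnorm(batch[:1], mu, var, use_batch_stats=True)
in_batch = batchnorm(batch, mu, var, use_batch_stats=True)[:1]
print(np.allclose(alone, in_batch))   # False: per-batch stats depend on batchmates

alone = batchnorm(batch[:1], mu, var, use_batch_stats=False)
in_batch = batchnorm(batch, mu, var, use_batch_stats=False)[:1]
print(np.allclose(alone, in_batch))   # True: running averages are fixed
```

If the trained model effectively uses batch statistics at test time, batch_size=1 is a degenerate case (each feature is normalized against itself), which would explain the garbled argmax output above.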

xf4fresh commented May 8, 2017

@srvinay @xinq2016
I encountered the same problem. When training the model, the parameter mb_size (mini-batch size) defaults to 16, but during test, the prediction results will be different if mb_size is modified to other values, such as 1, 8.

I thought that setting the value of mode 0 would solve the problem. Experiments show that this does not work.

        mode: integer, 0, 1 or 2.
            - 0: feature-wise normalization.
                Each feature map in the input will
                be normalized separately. The axis on which
                to normalize is specified by the `axis` argument.
                Note that if the input is a 4D image tensor
                using Theano conventions (samples, channels, rows, cols)
                then you should set `axis` to `1` to normalize along
                the channels axis.
                During training we use per-batch statistics to normalize
                the data, and during testing we use running averages
                computed during the training phase.
            - 1: sample-wise normalization. This mode assumes a 2D input.
            - 2: feature-wise normalization, like mode 0, but
                using per-batch statistics to normalize the data during both
                testing and training.

I am using Keras 1.1.2. If I upgrade to Keras 2.0, how do I modify the code? I would be very grateful for a code snippet; right now I do not know which parts of the project need to change after the upgrade.
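For the `mode` flag specifically, Keras 2 removed it: mode-0 behaviour (batch statistics during training, running averages at test time) is the only behaviour. A call site such as `BatchNormalization(mode=0, axis=-1)` becomes `BatchNormalization(axis=-1)`. A minimal sketch of a compatibility shim, where `bn_kwargs` is a hypothetical helper and not part of Keras:

```python
def bn_kwargs(**kwargs):
    """Hypothetical helper: drop the `mode` argument, which was
    removed from BatchNormalization in Keras 2, and pass the rest
    through unchanged."""
    kwargs.pop("mode", None)
    return kwargs

# Keras 1.x call site:
#   BatchNormalization(mode=0, axis=-1, momentum=0.99)
# Keras 2 equivalent:
#   BatchNormalization(**bn_kwargs(mode=0, axis=-1, momentum=0.99))
print(bn_kwargs(mode=0, axis=-1, momentum=0.99))  # {'axis': -1, 'momentum': 0.99}
```

In practice it is simpler to just delete `mode=...` wherever the project constructs `BatchNormalization` layers; other arguments like `axis` and `momentum` carry over.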

reith commented Aug 20, 2017

@xf4fresh Besides dropping mode on v2, have you tried setting the learning phase to False during testing?
https://github.com/baidu-research/ba-dls-deepspeech/blob/master/visualize.py#L40
