
Bugs report and problems #10

Open

Liu0329 opened this issue Jun 29, 2016 · 7 comments

Comments

@Liu0329

Liu0329 commented Jun 29, 2016

@jazzsaxmafia
Have you trained a good model with your code?

There might be a bug:
In the build_model and build_generator functions in model_tensorflow.py,

h = o * tf.nn.tanh(new_c)

should be replaced by

h = o * tf.nn.tanh(c)

Another question is about context_encode: is it the same as in the original code?
Moreover, I think the data should be shuffled every epoch; the code seems to shuffle it only once.
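
For reference, here is a minimal sketch of the LSTM step in question, assuming the pre-activation is split into i, f, o, new_c as in model_tensorflow.py; the shapes and the modern tf.split argument order are illustrative, not the repo's verbatim code.

# lstm_preactive: [batch, 4*dim_hidden]; c: previous cell state (illustrative names)
i, f, o, new_c = tf.split(lstm_preactive, 4, axis=1)
i = tf.nn.sigmoid(i)       # input gate
f = tf.nn.sigmoid(f)       # forget gate
o = tf.nn.sigmoid(o)       # output gate
new_c = tf.nn.tanh(new_c)  # candidate cell value, not the updated cell state
c = f * c + i * new_c      # updated cell state
h = o * tf.nn.tanh(c)      # hidden state comes from the updated cell c, not the candidate new_c

And a small sketch of per-epoch shuffling (np.random.permutation is standard NumPy; the array names are illustrative):

for epoch in range(n_epochs):
    perm = np.random.permutation(len(captions))
    captions, feats = captions[perm], feats[perm]
    # ... then iterate over minibatches of the reshuffled data ...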

@jazzsaxmafia
Owner

Yes, you are totally right. I should have corrected my code, but I have been too busy lately.
I will work on it as soon as possible.

Thank you for pointing the bugs out.
-Taeksoo

@RutgersHan

RutgersHan commented Aug 11, 2016

@Liu0329
@jazzsaxmafia
Agree with Liu0329.
Isn't there another bug on the next line?

logits = tf.matmul(h, self.decode_lstm_W) + self.decode_lstm_b

should be replaced by

logits = tf.matmul(h, self.decode_lstm_W) + self.decode_lstm_b \
         + tf.matmul(weighted_context, self.decode_lstm_image_W) \
         + tf.matmul(word_emb, self.decode_lstm_word_W)

According to Eq. (7) in the original paper, shouldn't the logits be computed as W^T concat(weighted_context, word_emb, h)?
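
For what it's worth, the sum of the three matmuls above is equivalent to a single projection of the concatenated vector, which is how Eq. (7) writes it. A hedged sketch (L_o and b_o are illustrative names, standing for the stacked decode weights and bias):

deep_out = tf.concat([h, weighted_context, word_emb], axis=1)  # [batch, dim_h + dim_ctx + dim_emb]
logits = tf.matmul(deep_out, L_o) + b_o                        # L_o stacks decode_lstm_W, decode_lstm_image_W, decode_lstm_word_W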

@Liu0329
Author

Liu0329 commented Aug 12, 2016

@RutgersHan Have you trained a model that works well?

@RutgersHan

@Liu0329 I did not use the attention model for image captioning; I used it for another task. For me, the results were not very good, only slightly better than the same model without attention. I am not sure whether that is due to my implementation or something else, which is why I asked the question above: do you think there is a bug in the logits computation in this implementation?

@Liu0329
Author

Liu0329 commented Aug 20, 2016

@RutgersHan Yes, according to the paper the logits have the three parts you showed. I will now try training again. This repo seems unfinished, with many parts simplified compared with the original code. We can help improve it.

@2g-XzenG

@Liu0329
@jazzsaxmafia
@RutgersHan

Hello, I am currently reading the code, and thanks for the post! For the 'alpha' calculation, aren't we supposed to use the context ('context_encode') and the previous hidden state? But in the implementation, it seems that it is summing up all the previous hidden states:

context_encode = context_encode + \
    tf.expand_dims(tf.matmul(h, self.hidden_att_W), 1) + \
    self.pre_att_b
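
For comparison, here is a hedged sketch of one soft-attention step following the paper; context_encode, hidden_att_W, and pre_att_b are the names from the snippet above, while att_W, att_b, context, L, and dim are illustrative (e.g. L annotation locations, dim hidden size). It recomputes the score from the original projected context at every step rather than accumulating into context_encode.

att = tf.nn.tanh(context_encode                                  # [batch, L, dim]
                 + tf.expand_dims(tf.matmul(h, self.hidden_att_W), 1)
                 + self.pre_att_b)
e = tf.reshape(tf.matmul(tf.reshape(att, [-1, dim]), self.att_W) + self.att_b,
               [-1, L])                                          # unnormalized scores, [batch, L]
alpha = tf.nn.softmax(e)                                         # attention weights over the L locations
weighted_context = tf.reduce_sum(context * tf.expand_dims(alpha, 2), axis=1)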

Thanks!

@sjksong

sjksong commented Mar 3, 2019

@Liu0329
@jazzsaxmafia
@RutgersHan
@1230pitchanqw
Hello, I'm an undergraduate from Communication University of China, and I'm currently working on this area for my graduation thesis. Could you please tell me whether you have obtained a satisfactory result based on this repo? If so, how many epochs did you need to train a good model, and what else did you change from the original code?
I would appreciate it if you could give me a reply. Thank you very much.
