Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Entailment pr #39

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Entailment pr #39

wants to merge 6 commits into from

Conversation

vyskoto4
Copy link
Contributor

The entailment model implementation is not 100% final, I had to replace some hardcoded code by the SequenceSplit layer and the performance is not as good as before.

model.add_node(RepeatVector(spad), name='Wh_n_cross_e', input='Wh_n')
model.add_node(TimeDistributedDense(N,W_regularizer=l2(l2reg)), name='WY', input='Y')
model.add_node(Activation('tanh'), name='M', inputs=['Wh_n_cross_e', 'WY'], merge_mode='sum')
model.add_node(TimeDistributedDense(1,activation='softmax'), name='alpha', input='M')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

softmax on 1 dim?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, it was a typo, replaced it with linear

@pasky
Copy link
Member

pasky commented Sep 27, 2016

Thanks a lot for the update! I'm a lot happier I think, a bit wary about the ugly looking code in rnn_input but I understand the neccessity and we'll be able to throw away that complexity again with the Keras1.0 port. So it's okay. :)

But I'm still a bit leery about the ptscorer business. I think the basic sanity check is:

  • What happens when we use a different model with the rte task? (E.g. re the sigmoid -> linear activation change. But I don't understand that anyway?)
  • What happens when we use the model with a different task?
    Are these cases both okay?

The other problem I see is that we use completely different ptscoring strategies when the to_n is involved - I don't understand that. I think we discussed just making the output dimension a parameter of mlp_ptscorer? Would that be inferior to the current solution?

@vyskoto4
Copy link
Contributor Author

vyskoto4 commented Oct 3, 2016

  • The sigmoid-linear change in RTE task has been done only because of the new model, as the reached accuracy was the best with these settings. I also tried RNN, AVG (and maybe CNN) and their performance seemed to be the same as before.
  • I did not test it with a different task. That is on my todo list.
  • I'l test the modified mlp_scorer as soon as I get some GPU time.

EDIT:

  • I deleted the custom scorer and the mlp_scorer is being used. I did not notice any significant performance changes after this.
  • Now testing the new (ok, its actually old today) on anssel/wang. I did not run full eval but the best val MRR during traning was 0.845.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants