To Do
Andrew Runge edited this page Mar 11, 2018 · 12 revisions
baseline:
- attention decoder: CUDA + minibatch capable
- cross-entropy loss (note: turned out we didn't need to do this)
- gradient clipping
- learning rate decay by 0.5 after every 10 epochs
- beam search (beam size=5)
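
A minimal sketch of two of the baseline items above, the step learning-rate decay and gradient clipping, in plain Python. The function names, the 0-indexed epoch convention, the choice of global L2-norm clipping, and the `max_norm` value are assumptions for illustration; the list does not say which clipping variant or threshold is intended.

```python
import math


def decayed_lr(base_lr, epoch, factor=0.5, every=10):
    """Step decay: multiply the rate by `factor` once per `every` completed epochs.

    `epoch` is assumed to be 0-indexed, so epochs 0-9 use base_lr,
    epochs 10-19 use base_lr * 0.5, and so on.
    """
    return base_lr * factor ** (epoch // every)


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm.

    Gradients whose norm is already within the limit are returned unchanged.
    """
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm or total == 0.0:
        return list(grads)
    scale = max_norm / total
    return [g * scale for g in grads]
```

In a real training loop the clipping would be applied to all parameter gradients together, after the backward pass and before the optimizer step, with the rate from `decayed_lr` set at the start of each epoch.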
not baseline:
- morph-tag data: apply BPE to it
maybe:
- initialize decoder with the mean encoder hidden state instead of the last one (I vote we try with and without)
- linear layer between embeddings and hidden state (personally I'd like to try this with and without to compare)
- maxibatches
- conditional GRU for first decoder layer
- early stopping? Nematus default is to stop after 10. I think we're okay not doing this and just doing 2 restarts
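
To make the "mean encoder hidden instead of last" comparison concrete, here is a rough sketch of the two candidate summaries for a single padded sequence, in plain Python. The helper names and the list-of-lists layout (one inner list per time step) are hypothetical; the point is only that both variants should ignore padding positions when sequences are minibatched.

```python
def last_state(states, length):
    """Return the encoder state at the last real (unpadded) time step."""
    return list(states[length - 1])


def mean_state(states, length):
    """Average the encoder states over the first `length` time steps only.

    Positions at index >= length are assumed to be padding and are skipped.
    """
    dim = len(states[0])
    return [
        sum(states[t][d] for t in range(length)) / length
        for d in range(dim)
    ]


# Example: 3 time steps, the last one is padding, hidden size 2.
states = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
init_from_last = last_state(states, length=2)
init_from_mean = mean_state(states, length=2)
```

Either vector would then feed whatever produces the decoder's initial hidden state, so switching between the two for the with/without comparison only changes this one step.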