esalesky edited this page Mar 16, 2018
unanswered:
- maxi-batches in PyTorch: how do we implement them?
- why does it break if init_hidden is not assigned to None?
- can the decoder act on a whole sequence for teacher forcing? (tried it; appears not to work, why?)
- should we normalize the loss by batch_size? (going with yes for now)
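On the maxi-batch question above: one common approach (the bucketing scheme used by loaders in toolkits such as Nematus) is to read a large buffer of examples (the maxi-batch), sort it by length so padding is minimized, slice it into mini-batches, and shuffle the mini-batch order. A minimal sketch, with the function name and sizes being illustrative assumptions, not a fixed API:

```python
import random

def maxi_batches(examples, maxi_size=1000, mini_size=32):
    """Yield mini-batches of similar-length examples.

    Reads `maxi_size` examples at a time (the maxi-batch), sorts the
    buffer by length so padding within a mini-batch is minimized,
    slices it into mini-batches, and shuffles the mini-batch order.
    """
    for start in range(0, len(examples), maxi_size):
        buf = sorted(examples[start:start + maxi_size], key=len)
        minis = [buf[i:i + mini_size] for i in range(0, len(buf), mini_size)]
        random.shuffle(minis)  # avoid always seeing the shortest batches first
        yield from minis

# Toy usage: 100 sequences of lengths 1-5.
examples = [list(range(n % 5 + 1)) for n in range(100)]
batches = list(maxi_batches(examples, maxi_size=50, mini_size=8))
```

Shuffling at the mini-batch level (rather than the example level) keeps each batch length-homogeneous while still randomizing training order.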
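On whether the decoder can act on a sequence under teacher forcing: for a plain RNN decoder with no attention over its own previous outputs, it can, because the inputs at every step are the gold tokens, which are all known up front. A minimal sketch, assuming a GRU decoder with hypothetical sizes and a zero vector standing in for the encoder's final state:

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only.
vocab, emb, hid, batch, tgt_len = 100, 16, 32, 4, 7

embed = nn.Embedding(vocab, emb)
decoder = nn.GRU(emb, hid, batch_first=True)
out_proj = nn.Linear(hid, vocab)

targets = torch.randint(0, vocab, (batch, tgt_len))

# Teacher forcing: feed the gold prefix targets[:, :-1] in one call;
# nn.GRU unrolls over the whole sequence internally.
dec_in = embed(targets[:, :-1])           # (batch, tgt_len-1, emb)
enc_final = torch.zeros(1, batch, hid)    # stand-in for the encoder's final state
dec_out, _ = decoder(dec_in, enc_final)   # (batch, tgt_len-1, hid)
logits = out_proj(dec_out)                # (batch, tgt_len-1, vocab)
```

If the decoder feeds its own *predictions* back in (or attends over them), the per-step loop is unavoidable, which may be why the single-call version appeared not to work.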
answered:
- Should we primarily/exclusively use teacher forcing to train the MT model? (can ask the TA or Graham before/after class)
  - Graham's answer: Generally yes; can also explore alternatives, or MRT with BLEU. The fancy stuff shouldn't be necessary, though.
- PyTorch normally averages the losses of a batch inside loss_fn. When masking, should we average only over the non-zero (non-pad) elements?
  - TA's answer: Yes.
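Following the TA's answer: with `reduction='mean'` and `ignore_index` set, PyTorch's `nn.CrossEntropyLoss` already averages over the non-ignored target positions only, which also settles the batch_size normalization question. A sketch with an assumed pad index of 0 and random tensors in place of real model output:

```python
import torch
import torch.nn as nn

PAD = 0  # assumed padding index
logits = torch.randn(4, 6, 100)            # (batch, tgt_len, vocab)
targets = torch.randint(1, 100, (4, 6))
targets[2, 3:] = PAD                       # pretend this sentence is shorter

# With reduction='mean' (the default) and ignore_index set, the loss is
# averaged over non-pad target positions only.
loss_fn = nn.CrossEntropyLoss(ignore_index=PAD)
loss = loss_fn(logits.reshape(-1, 100), targets.reshape(-1))

# Equivalent manual computation with an explicit mask:
per_tok = nn.CrossEntropyLoss(reduction='none')(
    logits.reshape(-1, 100), targets.reshape(-1))
mask = (targets.reshape(-1) != PAD).float()
manual = (per_tok * mask).sum() / mask.sum()
```

Both routes give the same number, so the built-in `ignore_index` path is the simpler choice unless a custom weighting is needed.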