esalesky edited this page Mar 16, 2018 · 8 revisions

unanswered:

  • maxi-batches in PyTorch: how do we implement them?
  • why does it break if init_hidden is not assigned to None?
  • can the decoder act on a whole sequence for teacher forcing? (tried it; appears not, but why?)
  • should we normalize the loss by batch_size? (going with yes for now)
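For the maxi-batch question, one common approach (a hypothetical sketch, not from the answers below): read a large buffer of examples, sort the buffer by length so each mini-batch groups similarly long sequences (less padding), then shuffle the order of the resulting mini-batches. Plain-Python sketch; the name `maxi_batches` and its interface are made up.

```python
import random

def maxi_batches(examples, mini_batch_size, maxi_batch_size):
    """Yield mini-batches of similar-length examples.

    Hypothetical sketch: take a large "maxi-batch" buffer, sort it by
    length so each mini-batch pads similarly long sequences together,
    then shuffle the mini-batch order so training doesn't always see
    the shortest sequences first.
    """
    for start in range(0, len(examples), maxi_batch_size):
        buf = sorted(examples[start:start + maxi_batch_size], key=len)
        minis = [buf[i:i + mini_batch_size]
                 for i in range(0, len(buf), mini_batch_size)]
        random.shuffle(minis)  # randomize mini-batch order within the buffer
        yield from minis
```

In PyTorch this would sit between the dataset and the training loop, yielding lists of examples that then get padded and tensorized per mini-batch.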

answered:

  • Should we primarily/exclusively use teacher forcing to train the MT model? (can ask a TA or Graham before/after class)
    • Graham's answer: Generally yes; alternatives, or MRT with BLEU, can be explored as well, but the fancy stuff shouldn't be necessary.
  • Normally PyTorch averages the losses over the batch inside loss_fn. When masking, should we average losses only over the non-zero (non-padding) elements?
    • TA's answer: Yes
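Following the TA's answer on masking, a minimal sketch of averaging only over non-padded positions. Plain Python for clarity; `masked_mean_loss` is a made-up name. On PyTorch tensors the same computation is `(loss * mask).sum() / mask.sum()`, and `nn.NLLLoss(ignore_index=PAD_IDX)` likewise excludes padding targets from the average.

```python
def masked_mean_loss(token_losses, mask):
    """Average per-token losses over non-padded positions only.

    token_losses: per-token loss values, shape (batch x time).
    mask: same shape, 1.0 for real tokens, 0.0 for padding.
    Padding positions contribute nothing to the sum, and the divisor
    counts only real tokens, so padding cannot dilute the loss.
    """
    total = sum(l * m
                for row_l, row_m in zip(token_losses, mask)
                for l, m in zip(row_l, row_m))
    count = sum(m for row in mask for m in row)
    return total / count
```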
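On the teacher-forcing questions above, a toy sketch of the usual training-time loop, where each decoder step is conditioned on the gold previous token rather than the model's own prediction. `step_fn` is a hypothetical stand-in for one decoder step (a real PyTorch decoder would also thread hidden state through the loop and return logits for the loss); feeding back the model's predictions instead is why per-step decoding is needed at inference time.

```python
import random

def decode_with_teacher_forcing(step_fn, gold_targets, bos, forcing_ratio=1.0):
    """Run a decoder one step at a time.

    With probability `forcing_ratio` the next step is conditioned on
    the gold token (1.0 = pure teacher forcing); otherwise on the
    model's own prediction. step_fn(prev_token) -> predicted_token is
    a hypothetical stand-in for one decoder step.
    """
    predictions = []
    prev = bos
    for gold in gold_targets:
        pred = step_fn(prev)
        predictions.append(pred)
        # teacher forcing: condition the next step on the gold token,
        # not on the model's (possibly wrong) prediction
        prev = gold if random.random() < forcing_ratio else pred
    return predictions
```

A `forcing_ratio` strictly between 0 and 1 gives a scheduled-sampling-style mix of the two regimes.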