I have a question but not sure if this is the right forum.
The encoder starts from word embeddings and builds up a sentence embedding. The closer the network gets to the sentence embedding, the further it is from the original language. The same applies, in reverse, to the decoder.
Has anybody considered a 2-stage encoder + 2-stage decoder where:
first stage of the encoder is language-specific (LS)
second stage of the encoder is language-agnostic (LA)
the decoder is 'symmetric' (LA stage first, then an LS stage; see the sketch after this list)
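To make the idea concrete, here is a minimal sketch of where the LS/LA boundary would sit. All module names (TwoStageEncoder, TwoStageDecoder) and sizes are hypothetical, and the decoder conditioning is deliberately crude; this is an illustration of the split, not a proposed implementation.

```python
# Rough sketch of the proposed 2-stage split, in PyTorch. Hidden sizes are arbitrary.
import torch
import torch.nn as nn

class TwoStageEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=512, hid_dim=512):
        super().__init__()
        # Stage 1: language-specific (LS) -- embeddings + first recurrent layer.
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.ls_rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True, bidirectional=True)
        # Stage 2: language-agnostic (LA) -- sees only the LS states, never the words.
        self.la_rnn = nn.LSTM(2 * hid_dim, hid_dim, batch_first=True, bidirectional=True)

    def forward(self, src_tokens):
        ls_states, _ = self.ls_rnn(self.embed(src_tokens))  # language-specific states
        la_states, _ = self.la_rnn(ls_states)               # language-agnostic representation
        return la_states

class TwoStageDecoder(nn.Module):
    """Symmetric mirror: LA stage first, then an LS stage that emits target words."""
    def __init__(self, vocab_size, emb_dim=512, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.la_rnn = nn.LSTM(emb_dim + 2 * hid_dim, hid_dim, batch_first=True)
        self.ls_rnn = nn.LSTM(hid_dim, hid_dim, batch_first=True)
        self.proj = nn.Linear(hid_dim, vocab_size)

    def forward(self, tgt_tokens, la_states):
        # Crude conditioning: concatenate a pooled encoder summary to every target step.
        # A real model would attend over la_states instead.
        ctx = la_states.mean(dim=1, keepdim=True).expand(-1, tgt_tokens.size(1), -1)
        x = torch.cat([self.embed(tgt_tokens), ctx], dim=-1)
        la_out, _ = self.la_rnn(x)   # language-agnostic decoding stage
        ls_out, _ = self.ls_rnn(la_out)  # language-specific stage producing target words
        return self.proj(ls_out)
```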
Why would that be useful?
The first intuition is that this split allows two separate training processes: one stage is highly LS, the other is not. Use same-language translation to train the two LA stages (and only them), and use language pairs to train the two LS stages. Here, same-language translation just means that the sentence fed into the encoder has to be reproduced by the decoder, i.e. an autoencoding objective. It is hard to imagine a bigger corpus of training examples, since any monolingual text qualifies. A sketch of the two regimes follows.
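As a sketch of what "train only those stages" could mean in practice, one simple option is to give each regime an optimizer that only sees the relevant parameters. This continues the hypothetical TwoStageEncoder/TwoStageDecoder sketch above and is only one possible way to enforce the split.

```python
# Two training regimes: same-language autoencoding updates only the LA stages,
# parallel pairs update only the LS stages. All names are hypothetical.
import itertools
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

def autoencode_step(enc, dec, sentence, opt_la):
    """Same-language 'translation': the decoder must reproduce the encoder's input.
    Only the language-agnostic parameters are in opt_la, so only they move."""
    logits = dec(sentence[:, :-1], enc(sentence))
    loss = criterion(logits.reshape(-1, logits.size(-1)), sentence[:, 1:].reshape(-1))
    opt_la.zero_grad()
    loss.backward()
    opt_la.step()
    return loss.item()

def translation_step(enc, dec, src, tgt, opt_ls):
    """Parallel-pair training: only the language-specific parameters are updated."""
    logits = dec(tgt[:, :-1], enc(src))
    loss = criterion(logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1))
    opt_ls.zero_grad()
    loss.backward()
    opt_ls.step()
    return loss.item()

def build_optimizers(enc, dec, lr=1e-3):
    # The split is enforced purely by which parameters each optimizer sees.
    opt_la = torch.optim.Adam(itertools.chain(enc.la_rnn.parameters(),
                                              dec.la_rnn.parameters()), lr=lr)
    opt_ls = torch.optim.Adam(itertools.chain(enc.embed.parameters(), enc.ls_rnn.parameters(),
                                              dec.embed.parameters(), dec.ls_rnn.parameters(),
                                              dec.proj.parameters()), lr=lr)
    return opt_la, opt_ls
```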
The second intuition is that the two processes can run in parallel, and that training two half-models is cheaper than training the full one.
An alternative way to think about it is to 'explode' the current point where the encoder output passes into the decoder unchanged. Take the encoder output, run it through a few extra stages (BLSTM, with or without attention) and only then feed it into the decoder. The existing encoder/decoder parts would be trained on same-language pairs, and the intermediate stage on language pairs. A rough sketch follows.
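A minimal sketch of this variant, assuming a pre-existing encoder/decoder whose parameters are frozen while only the inserted bridge is trained on language pairs. The Bridge module and the freezing helper are hypothetical illustrations, not part of any existing toolkit.

```python
# 'Exploded' variant: insert a trainable stage between a frozen encoder and decoder.
import torch.nn as nn

class Bridge(nn.Module):
    """Intermediate stage between encoder output and decoder input,
    trained on language pairs while the outer parts stay frozen."""
    def __init__(self, dim):
        super().__init__()
        # Bidirectional LSTM whose output matches its input width (for even dim),
        # so it drops into the existing encoder->decoder interface unchanged.
        self.rnn = nn.LSTM(dim, dim // 2, batch_first=True, bidirectional=True)

    def forward(self, enc_states):
        out, _ = self.rnn(enc_states)
        return out

def freeze(module):
    # The outer encoder/decoder were already trained on same-language pairs;
    # only the bridge's parameters receive gradients.
    for p in module.parameters():
        p.requires_grad = False
```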
Let me know if that's been done elsewhere or if I should ask on another forum.
Thanks.