I used ESM-1b to get the pretrained embeddings (extracted from the 33rd layer, following the tutorial) for a batch of sequences, and I found that the L2 norms of the embeddings are about 25. Is it generally recommended to normalize the embeddings before using them for downstream tasks?
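For reference, a minimal sketch of this extraction, following the embedding example in the esm README; the example sequence is a placeholder, and the final unit-norm line is just one possible normalization, not a recommendation:

```python
import torch
import esm

# Load ESM-1b and its batch converter, as in the README example.
model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
batch_converter = alphabet.get_batch_converter()
model.eval()  # disable dropout for deterministic embeddings

data = [("protein1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")]  # placeholder sequence
_, _, batch_tokens = batch_converter(data)

# Extract per-residue representations from the final (33rd) layer.
with torch.no_grad():
    results = model(batch_tokens, repr_layers=[33])
token_reps = results["representations"][33]  # (batch, seq_len, 1280)

# Per-token L2 norms; for ESM-1b these come out around 25.
print(token_reps.norm(dim=-1))

# One option: rescale each token embedding to unit L2 norm.
unit_reps = token_reps / token_reps.norm(dim=-1, keepdim=True)
```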
Replies: 1 comment
Yes, it's definitely been said in the deep learning literature that whitening input features is a good idea, and it's definitely important if you're combining features. However, we've seen in the past that when feeding the outer concatenation of the embeddings into a ResNet (for supervised contact prediction), the input scaling didn't make any difference. Maybe it's just trivial for the first layer of the convnet to adapt. I'd be interested to hear if you find any difference in your downstream experiments.
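To make that concrete, here is a hedged sketch of per-dimension standardization (a diagonal approximation to full whitening) applied to pooled embeddings before a downstream model; all names and shapes here are illustrative, not from the esm codebase:

```python
import torch

def fit_standardizer(train_embeddings: torch.Tensor, eps: float = 1e-6):
    """Fit per-dimension mean/std on training embeddings and return a transform."""
    mean = train_embeddings.mean(dim=0)
    std = train_embeddings.std(dim=0)

    def standardize(x: torch.Tensor) -> torch.Tensor:
        # Zero mean, unit variance per embedding dimension.
        return (x - mean) / (std + eps)

    return standardize

# Placeholder stand-in for mean-pooled ESM-1b embeddings (dim 1280).
train_embeddings = torch.randn(100, 1280) * 25.0

# Fit on training data only, then reuse the same transform on
# validation/test embeddings to avoid leakage.
standardize = fit_standardizer(train_embeddings)
x_train = standardize(train_embeddings)
```

Whether this helps will likely depend on the downstream architecture; as noted above, a convnet's first layer may simply absorb the scaling.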