I used ESM-1b to get the pretrained embeddings (extracted from the 33rd layer, following the tutorial) for a batch of sequences, and I found that the L2 norms of the embeddings are about 25. Is it generally recommended to normalize the embeddings before using them for downstream tasks?
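For reference, a minimal sketch of this extraction, following the embedding example in the esm README; the example sequence is a placeholder, and the final unit-norm line is just one possible normalization, not a recommendation:

```python
import torch
import esm

# Load ESM-1b and its batch converter, as in the README example.
model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
batch_converter = alphabet.get_batch_converter()
model.eval()  # disable dropout for deterministic embeddings

data = [("protein1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")]  # placeholder sequence
_, _, batch_tokens = batch_converter(data)

# Extract per-residue representations from the final (33rd) layer.
with torch.no_grad():
    results = model(batch_tokens, repr_layers=[33])
token_reps = results["representations"][33]  # (batch, seq_len, 1280)

# Per-token L2 norms; for ESM-1b these come out around 25.
print(token_reps.norm(dim=-1))

# One option: rescale each token embedding to unit L2 norm.
unit_reps = token_reps / token_reps.norm(dim=-1, keepdim=True)
```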
Replies: 1 comment
Yes, it's definitely been said in the deep learning literature that whitening input features is a good idea, and it's definitely important if you're combining features. However, we've seen in the past that when feeding the outer concatenation of the embeddings into a ResNet (for supervised contact prediction), the input scaling didn't make any difference. Maybe it's just trivial for the first layer of the convnet to adapt. I'd be interested to hear if you find any difference in your downstream experiments.
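To make that concrete, here is a hedged sketch of per-dimension standardization (a diagonal approximation to full whitening) applied to pooled embeddings before a downstream model; all names and shapes here are illustrative, not from the esm codebase:

```python
import torch

def fit_standardizer(train_embeddings: torch.Tensor, eps: float = 1e-6):
    """Fit per-dimension mean/std on training embeddings and return a transform."""
    mean = train_embeddings.mean(dim=0)
    std = train_embeddings.std(dim=0)

    def standardize(x: torch.Tensor) -> torch.Tensor:
        # Zero mean, unit variance per embedding dimension.
        return (x - mean) / (std + eps)

    return standardize

# Placeholder stand-in for mean-pooled ESM-1b embeddings (dim 1280).
train_embeddings = torch.randn(100, 1280) * 25.0

# Fit on training data only, then reuse the same transform on
# validation/test embeddings to avoid leakage.
standardize = fit_standardizer(train_embeddings)
x_train = standardize(train_embeddings)
```

Whether this helps will likely depend on the downstream architecture; as noted above, a convnet's first layer may simply absorb the scaling.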