Reduce protein embedding dimensions #612

lincoln-harris · 2023-09-12T17:26:07Z

I'm using the ESM-2 model to generate protein embeddings. I'm following the instructions in the README: batching the data, generating per-residue representations, and averaging those to get per-sequence representations. Is there a way to generate an embedding vector for a protein with fewer than 1280 dimensions? I have a small deep neural network that may struggle to learn linear-layer parameters for such high-dimensional inputs. Using ESM-2 to generate, say, 32-dimensional protein embeddings would be super useful to me.
Thanks!
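
For readers following along, the pooling step described in the question can be sketched as below. This is a minimal illustration, not the ESM code itself: random arrays stand in for the model's per-residue output, `mean_pool` is a hypothetical helper name, and the token layout (a start token at index 0, then the residues, then an end token and padding) is an assumption about how the batch is laid out.

```python
import numpy as np

def mean_pool(token_reps, seq_lens):
    """Average per-residue vectors into one vector per sequence.

    token_reps: (batch, max_tokens, dim) array. Index 0 is assumed to hold a
                start token, followed by the residues, then end token/padding.
    seq_lens:   number of residues in each sequence.
    """
    pooled = np.empty((token_reps.shape[0], token_reps.shape[2]), dtype=token_reps.dtype)
    for i, n in enumerate(seq_lens):
        # Skip the start token at index 0 and everything after the last residue.
        pooled[i] = token_reps[i, 1 : n + 1].mean(axis=0)
    return pooled

rng = np.random.default_rng(0)
seq_lens = [5, 8, 3]
# Stand-in for model output: 3 sequences, up to 8 residues + 2 special tokens.
token_reps = rng.normal(size=(3, 10, 1280)).astype(np.float32)
sequence_reps = mean_pool(token_reps, seq_lens)
print(sequence_reps.shape)  # (3, 1280)
```

Each row of `sequence_reps` is the 1280-dimensional per-sequence vector the question refers to.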

ptynecki · 2023-09-12T18:17:39Z

@lincoln-harris did you try applying a dimensionality-reduction method (PCA, UMAP, or t-SNE) to the original embeddings before feeding them to the network? It should meet your requirements.
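
A minimal sketch of the PCA variant of this suggestion, assuming the per-sequence embeddings are already stacked in a NumPy array. The data here is random and only stands in for real embeddings; `fit_pca` and `transform` are hypothetical helper names, and the reduction is done with a plain SVD rather than any particular library's PCA class.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the feature mean and the top principal axes of X (n_samples, n_features)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def transform(X, mean, components):
    """Project X onto the fitted principal axes."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 1280))  # stand-in for 200 protein embeddings
mean, components = fit_pca(embeddings, n_components=32)
reduced = transform(embeddings, mean, components)
print(reduced.shape)  # (200, 32)
```

PCA is used in this sketch because it yields a fixed linear map: fitting `mean` and `components` on the training proteins and reusing them for validation and test proteins keeps the three sets in the same 32-dimensional space. Note that 32 components require at least 32 training embeddings.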

2 replies

lincoln-harris Sep 18, 2023
Author

Dimensionality reduction with PCA is definitely an option; I was wondering if there was a way to do this within the ESM-2 model code itself. I'm guessing the answer is no?

cclark1e Jun 30, 2024

@lincoln-harris Hi Lincoln, did you ever find a way to get this going? I'm facing a similar issue and dimensionality reduction is dropping too much information to be viable.
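
One way to put a number on how much information a linear reduction drops is the cumulative explained variance of the principal components. The sketch below uses synthetic data with an assumed low-rank structure, so the figures it prints say nothing about real ESM-2 embeddings; `cumulative_explained_variance` is a hypothetical helper name.

```python
import numpy as np

def cumulative_explained_variance(X):
    """Fraction of total variance captured by the first k principal components, for every k."""
    Xc = X - X.mean(axis=0)
    singular_values = np.linalg.svd(Xc, compute_uv=False)
    var = singular_values ** 2
    return np.cumsum(var) / var.sum()

rng = np.random.default_rng(0)
# Synthetic stand-in: 20 latent factors mixed into 1280 features, plus small noise.
latent = rng.normal(size=(300, 20))
mixing = rng.normal(size=(20, 1280))
embeddings = latent @ mixing + 0.1 * rng.normal(size=(300, 1280))

cum = cumulative_explained_variance(embeddings)
# Smallest number of components whose cumulative variance reaches 95%.
k95 = int(np.searchsorted(cum, 0.95) + 1)
print(f"variance kept by 32 components: {cum[31]:.3f}")
print(f"components needed for 95% variance: {k95}")
```

Running the same function on real embeddings shows whether 32 components keep most of the variance or whether a larger target dimension is needed. Explained variance measures only linear structure, so it is a rough guide rather than a guarantee about downstream performance.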
