
Why is the model's accuracy so low when using residue embeddings from pre-trained model? #156

Gift-OYS opened this issue Dec 5, 2024 · 1 comment


Gift-OYS commented Dec 5, 2024

I’m using the pre-trained model esm3-sm-open-v1 to extract residue embeddings via link. However, the accuracy of my downstream model is unexpectedly low. For reference, here’s a small snippet of the residue embeddings (shape: [num_tokens, embedding_dim]):

tensor([[ 175.0000,  102.5000,  -99.0000,  ..., -106.5000,  -35.5000,
           86.0000],
        [ 205.0000,  103.5000, -139.0000,  ..., -328.0000, -224.0000,
          134.0000],
        [ 130.0000,   49.2500,  -26.2500,  ..., -202.0000, -161.0000,
          134.0000],
        ...,
        [  65.0000,  -75.0000,   54.5000,  ...,  -62.0000,  -26.2500,
         -102.5000],
        [ 173.0000,  -60.0000,  205.0000,  ...,   43.0000,  -89.0000,
         -115.5000],
        [  -6.0000,  170.0000,  113.0000,  ...,  -44.0000, -115.0000,
           43.7500]])
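As a first step when embeddings look off, it can help to check their dtype and scale. This is a generic diagnostic sketch (the `embeddings` array below is a stand-in for the real extracted tensor, not output from esm3-sm-open-v1): values stepping in quarters (e.g. 49.25, -26.25) together with magnitudes in the hundreds can hint at a reduced-precision dtype or an unnormalized hidden layer.

```python
import numpy as np

# Stand-in for the real extracted embeddings of shape
# [num_tokens, embedding_dim]; replace with your own array.
embeddings = np.array([[175.0, 102.5, -99.0],
                       [205.0, 103.5, -139.0],
                       [130.0, 49.25, -26.25]])

print(embeddings.dtype)                    # check for a low-precision dtype
print(embeddings.min(), embeddings.max())  # overall value range
print(np.abs(embeddings).mean())           # typical magnitude per element
```

If the real tensor is float16/bfloat16 or the magnitudes are far outside the usual O(1) range of normalized hidden states, that is worth reporting alongside the snippet above.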

thnhan commented Feb 16, 2025


I get the same embeddings as in the original post. I’m also puzzled about why the element values of these embeddings are so much larger than the embeddings produced by other models such as ProtT5 or ESMC.
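One thing worth trying while the scale question is open (a sketch under the assumption that the large raw magnitudes are hurting downstream training, not a confirmed fix for this model): standardize each embedding dimension across tokens before fitting the downstream classifier, so that values in the hundreds do not dominate the loss.

```python
import numpy as np

def standardize_embeddings(emb: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Z-score each embedding dimension across tokens.

    emb: array of shape [num_tokens, embedding_dim].
    Returns an array of the same shape with (approximately) zero mean
    and unit variance per dimension.
    """
    mean = emb.mean(axis=0, keepdims=True)
    std = emb.std(axis=0, keepdims=True)
    return (emb - mean) / (std + eps)

# Toy input with magnitudes similar to the reported embeddings.
emb = np.array([[175.0, 102.5, -99.0],
                [205.0, 103.5, -139.0],
                [130.0, 49.25, -26.25]])
norm = standardize_embeddings(emb)
print(norm.mean(axis=0))  # close to 0 in every dimension
print(norm.std(axis=0))   # close to 1 in every dimension
```

If the downstream model is sensitive to vector scale rather than per-dimension distribution, per-token L2 normalization (`emb / np.linalg.norm(emb, axis=1, keepdims=True)`) is a common alternative; in either case the statistics should be computed on the training split only.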
