
Feature extraction - ESM embeddings #13

Merged
merged 13 commits into from
Jul 26, 2023

Conversation

jyaacoub
Owner

This development line introduces ESM embeddings into my feature set for the graph convolutions.

This fixes #8!

Storing ESM embeddings won't work, since they are 320-d vectors PER amino acid...

Instead, the better approach is to store just the sequence strings and leave the ESM embedding calculation to the model side of things.
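A quick back-of-the-envelope calculation shows why caching per-residue embeddings is impractical. The sketch below uses the 320-d float32 dimension mentioned above; the average sequence length and dataset size are hypothetical numbers for illustration only.

```python
# Rough storage cost of caching per-residue ESM embeddings vs. raw sequences.
BYTES_PER_FLOAT32 = 4
EMBED_DIM = 320  # per-residue ESM embedding size, as stated above

def embedding_bytes(seq_len: int) -> int:
    """Bytes needed to store one sequence's per-residue embeddings."""
    return seq_len * EMBED_DIM * BYTES_PER_FLOAT32

def sequence_bytes(seq_len: int) -> int:
    """Bytes needed to store the raw amino-acid string (1 byte per residue)."""
    return seq_len

avg_len = 400        # hypothetical average protein length
n_proteins = 20_000  # hypothetical dataset size

emb_total = n_proteins * embedding_bytes(avg_len)
seq_total = n_proteins * sequence_bytes(avg_len)

print(f"embeddings: {emb_total / 1e9:.1f} GB")  # ~10.2 GB
print(f"sequences:  {seq_total / 1e6:.1f} MB")  # ~8.0 MB
```

Storing the strings and computing embeddings on the fly trades a one-time disk cost of gigabytes for a forward pass at training time.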

See #8 for more
ESM embedding now works
Not too great; outliers ruining training?
This has to be done since argparse doesn't handle boolean choices well through the CLI.
The main thing is to freeze the ESM layers, since the model is too large to train.
This is useful since epochs would otherwise be cluttered with transformer tokenizer warning logs.
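The freezing step mentioned in the commits above can be sketched as follows. This is a minimal illustration, not the PR's actual code: a small `nn.TransformerEncoder` stands in for the real pretrained ESM model, but the freezing loop is the same pattern regardless of which encoder is plugged in.

```python
# Sketch of freezing a pretrained encoder so only the downstream head trains.
# A tiny TransformerEncoder is a stand-in for the (much larger) ESM model.
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=320, nhead=8, batch_first=True)
esm_stand_in = nn.TransformerEncoder(encoder_layer, num_layers=2)

# Freeze every parameter of the (stand-in) ESM encoder: gradients are
# neither computed nor applied for these weights.
for param in esm_stand_in.parameters():
    param.requires_grad = False

# The downstream head (e.g. the graph-convolution side) stays trainable.
head = nn.Linear(320, 1)

trainable = [p for p in list(esm_stand_in.parameters()) + list(head.parameters())
             if p.requires_grad]
# Only the head's weight and bias remain trainable.
print(len(trainable))
```

Passing only the trainable parameters to the optimizer (or relying on `requires_grad=False`) keeps memory and compute manageable when the frozen encoder is large.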
@jyaacoub jyaacoub added the enhancement New feature or request label Jul 26, 2023
@jyaacoub jyaacoub merged commit 9e39fa5 into main Jul 26, 2023
1 check failed
Successfully merging this pull request may close these issues.

Add esm features
1 participant