This repository has been archived by the owner on Aug 1, 2024. It is now read-only.
Bug description
FASTA headers containing special characters (such as /) produce inconsistent output paths when using the extract entry point (scripts/extract.py).
https://github.com/facebookresearch/esm/blob/2b369911bb5b4b0dda914521b9475cad1656b2ac/scripts/extract.py#L105C1-L105C67
Reproduction steps
If a FASTA file with this input is used:
Then it generates a .pt file like this ->
embeds/sp|Q99966|CITE1_HUMAN Cbp/p300-interacting transactivator 1 OS=Homo sapiens OX=9606 GN=CITED1 PE=1 SV=2.pt
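The split most likely comes from joining the label onto the output directory. Below is a minimal sketch of that behavior. It assumes (not quoted from extract.py) that the output path is built roughly as output_dir / f"{label}.pt", with the label being the full header text after ">". Joining with pathlib treats every "/" inside the label as a path separator:

```python
from pathlib import PurePosixPath

# Assumed label: the FASTA header above without the leading ">".
label = (
    "sp|Q99966|CITE1_HUMAN Cbp/p300-interacting transactivator 1 "
    "OS=Homo sapiens OX=9606 GN=CITED1 PE=1 SV=2"
)
output_dir = PurePosixPath("embeds")

# The "/" in "Cbp/p300" is parsed as a separator, adding a directory level.
output_file = output_dir / f"{label}.pt"

print(output_file.parts)   # three components instead of two
print(output_file.parent)  # embeds/sp|Q99966|CITE1_HUMAN Cbp
```

If the script then creates output_file.parent with parents=True before saving, the intermediate "sp|Q99966|CITE1_HUMAN Cbp" directory is created, which would match the observed layout.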
Expected behavior
The expected behavior is that the special character (/) is handled correctly and the output .pt file is written at the top level of the output directory, instead of being split into nested directories.
GOT ->
embeds
  /sp|Q99966|CITE1_HUMAN Cbp
    /p300-interacting transactivator 1 OS=Homo sapiens OX=9606 GN=CITED1 PE=1 SV=2.pt
EXPECTED ->
embeds
  /sp|Q99966|CITE1_HUMAN Cbp/p300-interacting transactivator 1 OS=Homo sapiens OX=9606 GN=CITED1 PE=1 SV=2.pt
OR throw a warning and generate:
embeds
  /sp|Q99966|CITE1_HUMAN Cbp_p300-interacting transactivator 1 OS=Homo sapiens OX=9606 GN=CITED1 PE=1 SV=2.pt
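Because "/" cannot appear in a file name on POSIX file systems, the second option is the practical one. Here is a minimal sketch of it. It uses a hypothetical helper, safe_output_path (not part of esm), that replaces path separators in the label with "_" and emits a warning when it does so:

```python
import os
import warnings
from pathlib import Path


def safe_output_path(output_dir: Path, label: str, suffix: str = ".pt") -> Path:
    """Hypothetical helper: map a FASTA label to a file directly inside output_dir.

    Any path separator in the label is replaced with "_", so the label can
    never introduce extra directory levels.
    """
    # "/" always, plus the platform separator(s) ("\\" on Windows).
    separators = {"/", os.sep}
    if os.altsep:
        separators.add(os.altsep)

    safe_label = label
    for sep in separators:
        safe_label = safe_label.replace(sep, "_")

    # Warn only when the label actually had to be changed.
    if safe_label != label:
        warnings.warn(
            f"Label {label!r} contains path separators; saving as {safe_label!r}",
            stacklevel=2,
        )
    return output_dir / f"{safe_label}{suffix}"
```

With the header above, safe_output_path(Path("embeds"), label) returns a path whose parent is embeds and whose file name contains Cbp_p300 instead of Cbp/p300. Labels without separators are passed through unchanged and trigger no warning.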
LMK if you would like a PR for it!