In the get_esm_embedding.py>process_fasta, get_esm_if_embedding.py>embedding, data_process.py>prep_test_dataset, and utils.py>process_fasta_file functions, fasta ids are parsed like this:
# in get_esm_embedding.py>process_fasta
ID_list.append(rec.id.split("|")[1])
# in get_esm_if_embedding.py>embedding
ids = [rec.id.split("|")[1] for rec in recs]
seqs = {rec.id.split("|")[1]: str(rec.seq) for rec in recs}
# in data_process.py>prep_test_dataset
ID_list = [rec.id for rec in recs]
# in utils.py>process_fasta_file
for i in range(0, len(lines), 3):  # hard-coded fasta format, not robust
    id = lines[i].strip().replace(">", "")
I use get_esm*_embedding.py to generate embeddings (see.npy) from a fasta file like:
>|see # a header like `>sea` (no `|`) makes get_esm*_embedding.py raise IndexError: list index out of range
some sequence
>|sea
some sequence
When I use inference.py, the id is parsed as |sea and the script fails. I adjusted data_process.py to make it work.
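The mismatch can be reproduced with plain strings, without the repository or Biopython. This is only a sketch: `record_ids` stands in for the `rec.id` values of the two headers above, and the variable names are mine, not the project's.

```python
# Record ids as they would appear for the headers ">|see" and ">|sea".
record_ids = ["|see", "|sea"]

# Strategy used by the embedding scripts: take the field after the first "|".
embedding_ids = [rid.split("|")[1] for rid in record_ids]

# Strategy used by data_process.py>prep_test_dataset: use the id unchanged.
inference_ids = list(record_ids)

print(embedding_ids)  # ['see', 'sea']
print(inference_ids)  # ['|see', '|sea']

# A header without "|" breaks the first strategy.
try:
    "sea".split("|")[1]
except IndexError as exc:
    print("IndexError:", exc)
```

The embeddings are saved under `see` and `sea`, while inference looks for `|see` and `|sea`, so the lookup fails.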
I suggest:
Make the scripts above use the same strategy to parse ids.
Refactor the code so that all of them call a single function, to keep parsing consistent.
Add an optional argument key that accepts a Callable, so users can decide how to parse rec.id, like Python's list.sort(key=None).
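A minimal sketch of what such a shared parser could look like. The names `read_fasta` and `pipe_field` are hypothetical and do not exist in the repository, and the reader is a simplified stand-in for whatever fasta parsing the scripts actually use.

```python
from typing import Callable, Dict, Iterable, Optional


def read_fasta(
    lines: Iterable[str],
    key: Optional[Callable[[str], str]] = None,
) -> Dict[str, str]:
    """Parse fasta lines into {id: sequence}.

    `key` maps the raw id (first word of the header, without ">") to the
    id that is stored. With key=None the raw id is kept unchanged.
    """
    records: Dict[str, str] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            raw_id = words[0] if words else ""
            current = key(raw_id) if key is not None else raw_id
            if current in records:
                raise ValueError(f"duplicate fasta id: {current!r}")
            records[current] = ""
        elif current is not None:
            # Sequence lines may be wrapped, so append to the open record.
            records[current] += line
    return records


def pipe_field(raw_id: str, index: int = 1) -> str:
    """Return the "|"-separated field at `index`, or the raw id if absent."""
    parts = raw_id.split("|")
    return parts[index] if len(parts) > index else raw_id


fasta_lines = [">|see", "ACDE", ">|sea", "FGHI"]
raw = read_fasta(fasta_lines)                  # ids kept as "|see", "|sea"
split = read_fasta(fasta_lines, key=pipe_field)  # ids become "see", "sea"
```

If every script called the same function with the same `key`, the embedding and inference steps would agree on ids, and a header without `|` would fall back to the raw id instead of raising IndexError.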
Thanks for testing the code and pointing out inconsistencies. They resulted from handling different fasta ID conventions when I developed the pipeline. I will refactor the code to use a single function to parse the fasta file to be consistent.