This repository has been archived by the owner on Aug 1, 2024. It is now read-only.

ESMFold crashing with specific FASTA files #698

Open
npcooley opened this issue Jul 26, 2024 · 0 comments

Bug description
I am deploying ESMFold on the Open Science Pool (OSPool), and some FASTA files seem to crash every time. They also crash when run locally in a Docker container. This doesn't appear to be a resource issue, but that can be difficult to tell from the Condor logs.

Reproduction steps
I run ESMFold out of a Docker container, using a command that generically looks like:

conda run -n py39-esmfold esm-fold -i <seqs.fa> -o <you/can/send/this/wherever> -m <some/mounted/volume> --cpu-only > result.txt

Expected behavior
This command completes cleanly for some input files but not for others. When run interactively, the failing files consistently fail with the error pasted below. I've attached two FASTA files: one for which the command runs cleanly and one for which it fails.
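To compare a passing and a failing input, a minimal sketch along these lines could list the sequence lengths in each FASTA file. It assumes that sequence length is the relevant difference between the files, which has not been confirmed here. The helper names `fasta_lengths` and `report` are hypothetical and not part of ESMFold.

```python
from pathlib import Path


def fasta_lengths(text):
    """Return {record_id: sequence_length} for FASTA-formatted text.

    Hypothetical helper for illustration; records that share an id
    overwrite one another.
    """
    lengths = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the id.
            fields = line[1:].split()
            current = fields[0] if fields else f"record_{len(lengths) + 1}"
            lengths[current] = 0
        elif current is not None:
            # Sequence data may span several lines; accumulate the total.
            lengths[current] += len(line)
    return lengths


def report(path):
    """Print per-record lengths for one FASTA file, longest first."""
    lengths = fasta_lengths(Path(path).read_text())
    for name, n in sorted(lengths.items(), key=lambda kv: -kv[1]):
        print(f"{path}\t{name}\t{n}")
```

Calling `report("UserData/id0001partners00002.fa")` and `report("UserData/id0001partners00011.fa")` would print one line per record, which makes it easy to see whether the failing file contains noticeably longer sequences.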

Logs
Failure output, for an interactive job:
(base) root@6843f707dd31:/# conda run -n py39-esmfold esm-fold -i UserData/id0001partners00002.fa -o . -m ESModels --cpu-only
24/07/26 14:46:28 | INFO | root | Reading sequences from UserData/id0001partners00002.fa
24/07/26 14:46:28 | INFO | root | Loaded 2 sequences from UserData/id0001partners00002.fa
24/07/26 14:46:28 | INFO | root | Loading model
24/07/26 14:48:03 | INFO | root | Starting Predictions

/tmp/tmplazwrhgo: line 3: 39 Killed esm-fold -i UserData/id0001partners00002.fa -o . -m ESModels --cpu-only

ERROR conda.cli.main_run:execute(125): conda run esm-fold -i UserData/id0001partners00002.fa -o . -m ESModels --cpu-only failed. (See above for error)

Success output, for an interactive job:
(base) root@6843f707dd31:/# conda run -n py39-esmfold esm-fold -i UserData/id0001partners00011.fa -o . -m ESModels --cpu-only
24/07/26 15:14:04 | INFO | root | Reading sequences from UserData/id0001partners00011.fa
24/07/26 15:14:04 | INFO | root | Loaded 2 sequences from UserData/id0001partners00011.fa
24/07/26 15:14:04 | INFO | root | Loading model
24/07/26 15:16:31 | INFO | root | Starting Predictions
24/07/26 15:30:19 | INFO | root | Predicted structure for 1_1_3688 with length 335, pLDDT 91.6, pTM 0.726 in 414.3s (amortized, batch size 2). 1 / 2 completed.
24/07/26 15:30:19 | INFO | root | Predicted structure for 2_1_23 with length 335, pLDDT 91.8, pTM 0.719 in 414.3s (amortized, batch size 2). 2 / 2 completed.


Additional context
Technically, when these jobs run on the OSPool they run out of Singularity containers rather than Docker containers, though I don't know how much that matters. I get different kill codes on the OSPool, though that could be site-specific. For example, when I check my Condor logs for jobs that Condor believes did not go over memory, I get:

$ cat LogFilesCB/out.2.err
/srv/tmpgo8p_04o: line 3: 34 Killed esm-fold -i id0001partners00002.fa -o structs -m ESModels --cpu-only

ERROR conda.cli.main_run:execute(125): conda run esm-fold -i id0001partners00002.fa -o structs -m ESModels --cpu-only failed. (See above for error)

$ cat LogFilesCB/out.137.err
/srv/tmpdklsx5w4: line 3: 34 Bus error (core dumped) esm-fold -i id0001partners00137.fa -o structs -m ESModels --cpu-only

ERROR conda.cli.main_run:execute(125): conda run esm-fold -i id0001partners00137.fa -o structs -m ESModels --cpu-only failed. (See above for error)
/srv//Run.sh: line 35: 24 Bus error (core dumped) conda run -n py39-esmfold esm-fold -i "$TARGET" -o structs -m ESModels --cpu-only > "$INFILE1"

$ cat LogFilesCB/out.173.err
/srv/tmp0pg3lc07: line 3: 36 Killed esm-fold -i id0001partners00173.fa -o structs -m ESModels --cpu-only

ERROR conda.cli.main_run:execute(125): conda run esm-fold -i id0001partners00173.fa -o structs -m ESModels --cpu-only failed. (See above for error)
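The shell prints "Killed" when a process receives SIGKILL and "Bus error" when it receives SIGBUS. A minimal sketch for recording which signal ended a run is shown below. The function names `describe_exit` and `run_and_describe` are hypothetical, and treating a return code above 128 as "128 plus the signal number" is a shell convention rather than a guarantee. The sketch does not identify what sent the signal.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Translate a process return code into a human-readable description."""
    if returncode < 0:
        # subprocess reports death-by-signal as the negated signal number.
        signum = -returncode
    elif returncode > 128:
        # Shells conventionally report death-by-signal as 128 + signal number.
        signum = returncode - 128
    else:
        return f"exited with status {returncode}"
    try:
        return f"terminated by {signal.Signals(signum).name}"
    except ValueError:
        # Not a known signal number on this platform; report the raw status.
        return f"exited with status {returncode}"


def run_and_describe(cmd):
    """Run cmd (a list of arguments) and describe how it ended."""
    proc = subprocess.run(cmd)
    return describe_exit(proc.returncode)
```

For example, with the `py39-esmfold` environment already activated, `run_and_describe(["esm-fold", "-i", "id0001partners00002.fa", "-o", "structs", "-m", "ESModels", "--cpu-only"])` would return a string naming the signal if the process is killed. Launching `esm-fold` directly instead of through `conda run` is an assumption made here so that the return code belongs to `esm-fold` itself.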

The '00002' and '00011' FASTA files have been attached as 'txt' files because of file extension restrictions.
id0001partners00002.txt
id0001partners00011.txt

EDIT
One additional piece of context: when these jobs complete successfully, CPU usage is near the maximum for the requested resources. When they fail like this, CPU usage is close to minimal.
