
omamer_run terminated with an error exit status (254) #51

Open
vincentkaiheng opened this issue Jan 13, 2025 · 6 comments

@vincentkaiheng

Hi,
Thank you for developing such a useful tool!
I encountered some problems when running it on my dataset. My command was:
nextflow run FastOMA.nf -profile standard --input_folder ./in_folder --output_folder ./out_folder --omamer_db ./LUCA.h5 --report
The following is the error message:

ERROR ~ Error executing process > 'omamer_run (B.vulgaris.fa)'

Caused by:
  Process `omamer_run (B.vulgaris.fa)` terminated with an error exit status (254)


Command executed:

  if [ -f hogmap_in/B.vulgaris.fa.hogmap ] ; then
      cp hogmap_in/B.vulgaris.fa.hogmap  B.vulgaris.fa.hogmap
  else
      omamer search -n 10 --db LUCA.h5 --query B.vulgaris.fa --out B.vulgaris.fa.hogmap
  fi

Command exit status:
  254

Command output:
  (empty)

Command error:
  .command.run: fork: retry: No child processes
  .command.run: fork: retry: No child processes
  .command.run: fork: retry: No child processes
  .command.run: fork: retry: No child processes
  .command.run: fork: Resource temporarily unavailable

Work dir:
  /home/yankh/extra/OMA/work/7e/bcd9288ed8d1c4aa96795a7bb0a0f2

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details

Attached below are one of the input FASTA files, the tree file, and '.nextflow.log':
B.vulgaris.fa.txt
species_tree.nwk.txt
nextflow.log.txt

I would be very grateful for any help!

Best wishes,
Vincent

@sinamajidian
Collaborator

Hi Vincent
Thank you for using FastOMA. I ran omamer on the FASTA file you shared and it looks OK.

1- Have you tried running FastOMA on the test dataset provided on our GitHub?

2- Are you running FastOMA on a Mac or Linux, on a laptop or a university cluster? It seems that your system ran out of processes. FastOMA uses Nextflow, which generates hundreds of tasks. There is some discussion here. I'm not sure about their suggestions, but at least rebooting your laptop is an option. You could also compare the output of ps -e | awk '{print $4" "$5" "$6}' | sort | uniq -c | sort -n | tail with the maximum number of processes allowed on your system, given by ulimit -u.

3- Could you please share these hidden files with me: .command.out, .command.log and .command.err, in the folder /home/yankh/extra/OMA/work/7e/bcd9288ed8d1c4aa96795a7bb0a0f2/
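
For example, you could list and view them from the work directory given in the error message (something like this should work):

cd /home/yankh/extra/OMA/work/7e/bcd9288ed8d1c4aa96795a7bb0a0f2/
ls -a               # hidden files start with a dot
cat .command.err    # the stderr captured for this task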

4- You could also limit the number of tasks that Nextflow runs in parallel by editing the nextflow.config file in the same folder as FastOMA.nf and adding executor { queueSize = 5 } to the standard section:

standard {
    executor { queueSize = 5 }
    process.executor = 'local'
}

then rerun FastOMA by adding -resume to your nextflow run command.
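
With your original options, that would be something like:

nextflow run FastOMA.nf -profile standard --input_folder ./in_folder --output_folder ./out_folder --omamer_db ./LUCA.h5 --report -resume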

Hope it helps.
Best,
Sina

@vincentkaiheng
Author

Hi Sina,
Thank you for your timely reply!

1- I have tried running the test data, and it worked OK.
2- I am running FastOMA on a university cluster. I also ran the commands you mentioned; before I reconnected to the cluster, the maximum number of processes on the system had not been reached.
4- I disconnected and reconnected to the cluster, set up nextflow.config as you suggested, and continued my run. So far everything seems to be working fine; I am now at the second step:

executor >  local (13)
[92/df5df9] process > check_input (1)                     [100%] 1 of 1, cached: 1 ✔
[f7/750259] process > omamer_run (E.japonica_B.fa)        [ 23%] 8 of 34
[-        ] process > infer_roothogs                      -
[-        ] process > batch_roothogs                      -
[-        ] process > hog_big                             -
[-        ] process > hog_rest                            -
[-        ] process > collect_subhogs                     -
[-        ] process > extract_pairwise_ortholog_relations -
[-        ] process > fastoma_report                      -

By the way, does executor{queueSize=5} in the config file specify the number of parallel tasks? Since I have more than a hundred input files, can I increase that value a bit?

3- Below are my .command.err and .command.log, but I can't find a .command.out file in the folder.
command.err.txt
command.log.txt

Thanks again for your help! I may ask again if I run into other questions in the future; I hope that's not too much trouble.
Best,
Vincent

@vincentkaiheng
Author

Hi,
I encountered another error:

executor >  local (38)
[92/df5df9] process > check_input (1)                     [100%] 1 of 1, cached: 1 ✔
[8d/db5b76] process > omamer_run (E.japonica_A.fa)        [100%] 34 of 34 ✔
[93/15c9df] process > infer_roothogs (1)                  [ 75%] 3 of 4, failed: 3, retries: 3
[-        ] process > batch_roothogs                      -
[-        ] process > hog_big                             -
[-        ] process > hog_rest                            -
[-        ] process > collect_subhogs                     -
[-        ] process > extract_pairwise_ortholog_relations -
[-        ] process > fastoma_report                      -

ERROR ~ Error executing process > 'infer_roothogs (1)'

Caused by:
  Process `infer_roothogs (1)` terminated with an error exit status (1)


Command executed:

  fastoma-infer-roothogs  --proteomes proteome                                --hogmap hogmaps                                --splice splice                                --out-rhog-folder "omamer_rhogs"                                --min-sequence-length 40                                -vv

Command exit status:
  1

Command output:
  230499
  There are 18130 candidate pairs of rhogs for merging.
  There are 4001 clusters.
  ** the recursion limit is 1000
  There are 4444 selected clusters.
  len(rhogs_prots) is  29900
  22629 715640
  109869
  2833

Command error:
  2025-01-14 13:31:32 DEBUG    The roothog E0737906 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:32 DEBUG    The roothog E0592599 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:32 DEBUG    The roothog E0621638 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:32 DEBUG    The roothog E0779940 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:32 DEBUG    The roothog E0216848 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:32 DEBUG    The roothog E0478059 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:32 DEBUG    The roothog E0745579 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:32 DEBUG    The roothog E0431155 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:32 DEBUG    The roothog E0410240 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:32 DEBUG    The roothog E0281001 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:32 DEBUG    The roothog E0639257 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:32 DEBUG    The roothog E0218398 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:32 DEBUG    The roothog E0613804 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:32 DEBUG    The roothog E0695505 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:32 DEBUG    The roothog E0145579 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:32 DEBUG    The roothog E0140176 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:32 DEBUG    The roothog E0706763 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:32 DEBUG    The roothog E0988691 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:32 DEBUG    The roothog E0623030 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:32 DEBUG    The roothog E0815490 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:32 DEBUG    The roothog E0257854 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:32 DEBUG    The roothog E0338657 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:32 DEBUG    The roothog E0255413 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:32 DEBUG    The roothog E0746685 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:32 DEBUG    The roothog E0270213 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:32 DEBUG    The roothog E0070964 was too small with size of 1 which is smaller than threshold 2
  2025-01-14 13:31:36 INFO     Writing Sequences of 19796 roothogs finished.
  2025-01-14 13:31:37 DEBUG    linclust rooting startedmmseqs easy-linclust --threads 10 singleton_unmapped.fa singleton_unmapped tmp_linclust
  tmp_linclust/15935287327383337746/clu_tmp/13112024653720786912/linclust.sh: line 76: 142289 Segmentation fault      (core dumped) $RUNNER "$MMSEQS" "${ALIGN_MODULE}" "$INPUT" "$INPUT" "$RESULTDB" "${TMP_PATH}/aln" ${ALIGNMENT_PAR}
  2025-01-14 13:31:38 DEBUG     linclust is done done
  Traceback (most recent call last):
  230499
  There are 18130 candidate pairs of rhogs for merging.
  There are 4001 clusters.
  ** the recursion limit is 1000
  There are 4444 selected clusters.
  len(rhogs_prots) is  29900

Here are my .nextflow.log file and some of the files from the folder /home/yankh/extra/OMA/work/93/15c9df735f45ed377690dd8c9e4216
nextflow.log.txt
command.err.txt
command.out.txt
command.log.txt

Could you help to figure out what's wrong with this run?

@sinamajidian
Collaborator

No worries, I'd be happy to help and thanks for the updates.

Yes, queueSize limits the number of parallel tasks; a small value increases the overall wait time. On our SLURM cluster I set it to 500 to make FastOMA run as fast as possible.
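
For reference, that could look roughly like this in nextflow.config (a sketch mirroring the standard block above; the exact profile layout in FastOMA's config may differ):

slurm {
    executor { queueSize = 500 }
    process.executor = 'slurm'
}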

The new error in linclust might again be related to a lack of resources, since it requests 10 CPUs (--threads 10). Unfortunately, a segmentation fault is a broad symptom and hard to debug. People have already reported similar segmentation faults on the linclust GitHub. Could you run the following to check whether it is a one-time issue or not?

# work in a fresh directory
mkdir test; cd test
# copy the file that made linclust crash
cp /home/yankh/extra/OMA/work/93/15c9df735f45ed377690dd8c9e4216/singleton_unmapped.fa .
# count the input sequences and check that the file ends cleanly
grep ">" singleton_unmapped.fa | wc -l
tail -2 singleton_unmapped.fa
# rerun the exact command that failed
mmseqs easy-linclust --threads 10 singleton_unmapped.fa singleton_unmapped tmp_linclust

# the output should contain the same number of sequences
grep ">" singleton_unmapped_all_seqs.fasta | wc -l
tail -n2 singleton_unmapped_all_seqs.fasta

If it didn't work, please share the file singleton_unmapped.fa with me.
When mmseqs is installed on the system, FastOMA uses it, which I believe provides slight improvements. So, as a last resort, you could uninstall mmseqs.
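
For example, to check whether mmseqs is visible on your PATH and to remove it (assuming it was installed with conda/mamba; adjust to your setup):

which mmseqs          # is mmseqs on the PATH?
mmseqs version        # which version is installed
conda remove mmseqs2  # remove it from the active conda/mamba environment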

You mentioned you are using a university cluster. Is it one machine (with, for example, 48 CPUs) shared by many people, or are there several compute nodes managed by SLURM? Are you requesting compute nodes with Sinteractive? How many CPUs and how much RAM are you requesting?
If it supports SLURM, you can use -profile slurm instead of -profile standard with nextflow run and try it on the test data. That way, Nextflow requests one compute node per task, and you can track the submitted jobs with squeue.
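
With your original options, that would be something like:

nextflow run FastOMA.nf -profile slurm --input_folder ./in_folder --output_folder ./out_folder --omamer_db ./LUCA.h5 --report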

@sinamajidian
Collaborator

sinamajidian commented Jan 15, 2025

If you get an error when running mmseqs easy-linclust --threads 10 singleton_unmapped.fa singleton_unmapped tmp_linclust, you could try this:

conda install -c conda-forge -c bioconda mmseqs2=14.7e284

I was able to reproduce an almost identical error with mmseqs version 16, but when I installed the older version it worked.
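
If the pinned version conflicts with other packages in your FastOMA environment, it could also be installed into its own environment (the environment name here is just illustrative):

conda create -n mmseqs14 -c conda-forge -c bioconda mmseqs2=14.7e284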

@vincentkaiheng
Author

Sorry for the late reply. Thanks again for your enthusiastic help!

I ran the command you mentioned and found that it reported an error. I then downgraded mmseqs2 to 14.7e284 (in a separate environment), and it worked. However, this 14.7e284 version of mmseqs2 conflicts with the software that omamer depends on, so I currently have no way to install it properly in the mamba environment where I run FastOMA.

In the end, I uninstalled mmseqs2. The previous problem is now solved, and I have reached this step:

executor >  local (13)
[c6/b326d2] process > check_input (1)                     [100%] 1 of 1, cached: 1 ✔
[34/5568bf] process > omamer_run (A.rusticana_S2_h1.fa)   [100%] 34 of 34, cached: 34 ✔
[02/28e956] process > infer_roothogs (1)                  [100%] 1 of 1 ✔
[0c/f87e9a] process > batch_roothogs (1)                  [100%] 1 of 1 ✔
[cc/cfa521] process > hog_big (8)                         [  0%] 1 of 138
[1d/712daa] process > hog_rest (77)                       [  0%] 0 of 276
[-        ] process > collect_subhogs                     -
[-        ] process > extract_pairwise_ortholog_relations -
[-        ] process > fastoma_report                      -

By the way, I'm using one machine with 112 CPUs and 503 GB RAM, shared with a few people, and it doesn't support SLURM. I think the standard mode is suitable for me.
