nextflow error #46

Open
Song-10-YF opened this issue Jan 6, 2025 · 7 comments

Comments

@Song-10-YF

executor > local (4)
[6f/512ff1] check_input (1) | 3 of 4, failed: 3, retries: 3
[- ] omamer_run -
[- ] infer_roothogs -
[- ] batch_roothogs -
[- ] hog_big -
[- ] hog_rest -
[- ] collect_subhogs -
[- ] extract_pairwise_ortholog_relations -
[- ] fastoma_report -
[da/21a398] NOTE: Process check_input (1) terminated with an error exit status (1) -- Execution is retried (1)
[cf/393603] NOTE: Process check_input (1) terminated with an error exit status (1) -- Execution is retried (2)
[b6/087b8b] NOTE: Process check_input (1) terminated with an error exit status (1) -- Execution is retried (3)
ERROR ~ Error executing process > 'check_input (1)'

Caused by:
Process check_input (1) terminated with an error exit status (1)

Command executed:

fastoma-check-input --proteomes proteome --species-tree species_tree.nwk --out-tree species_tree_checked.nwk --splice splice --hogmap hogmap_in --omamer_db LUCA.h5 -vv

Command exit status:
1

executor > local (4)
[6f/512ff1] check_input (1) | 4 of 4, failed: 4, retries: 3 ✘
[- ] omamer_run -
[- ] infer_roothogs -
[- ] batch_roothogs -
[- ] hog_big -
[- ] hog_rest -
[- ] collect_subhogs -
[- ] extract_pairwise_ortholog_relations -
[- ] fastoma_report -
[da/21a398] NOTE: Process check_input (1) terminated with an error exit status (1) -- Execution is retried (1)
[cf/393603] NOTE: Process check_input (1) terminated with an error exit status (1) -- Execution is retried (2)
[b6/087b8b] NOTE: Process check_input (1) terminated with an error exit status (1) -- Execution is retried (3)
ERROR ~ Error executing process > 'check_input (1)'

Caused by:
Process check_input (1) terminated with an error exit status (1)

Command executed:

fastoma-check-input --proteomes proteome --species-tree species_tree.nwk --out-tree species_tree_checked.nwk --splice splice --hogmap hogmap_in --omamer_db LUCA.h5 -vv

Command exit status:
1

Command output:
(empty)

Command error:
2025-01-06 21:21:03 DEBUG Arguments: Namespace(proteomes='proteome', species_tree='species_tree.nwk', out_tree='species_tree_checked.nwk', splice='splice', hogmap='hogmap_in', omamer_db='LUCA.h5', v=2)
2025-01-06 21:21:03 INFO There are 3 files in the proteome folder.
2025-01-06 21:21:03 WARNING We expect that only fa/fasta files are in the proteome folder. Better to remove these ['TCP.pep', 'FN.pep', 'DCP.pep']
2025-01-06 21:21:03 ERROR There are not enough proteomes in the folder
2025-01-06 21:21:03 ERROR Check input failed. FastOMA halted!
2025-01-06 21:21:03 ERROR Halting FastOMA because of invalid proteome input data

Work dir:
/home/songyf/software/FastOMA/work/6f/512ff1e24ecfb95498cfe18fdda78f

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

-- Check '.nextflow.log' file for details
Hello, I encountered the above errors while running locally. Why is this happening?

@sinamajidian
Collaborator

Hi @Song-10-YF
The input proteome files should be in fasta format, ending with .fa. I guess your file names are TCP.pep, FN.pep, DCP.pep.
Note that a (rough) species tree in newick format is also needed.

Best,
Sina
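
A minimal sketch of the rename step, assuming the .pep files already contain plain FASTA records and only the extension is wrong. The function name rename_to_fa and the folder path in the usage note are hypothetical; this is not part of FastOMA.

```python
from pathlib import Path


def rename_to_fa(folder, old_suffix=".pep", dry_run=True):
    """Rename files ending in old_suffix to .fa; return (old, new) name pairs.

    With dry_run=True nothing is changed on disk; the planned renames
    are only reported so they can be reviewed first.
    """
    planned = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.suffix != old_suffix:
            continue
        target = path.with_suffix(".fa")
        if target.exists():
            # Refuse to overwrite an existing proteome file.
            raise FileExistsError(f"{target} already exists")
        planned.append((path.name, target.name))
        if not dry_run:
            path.rename(target)
    return planned
```

For example, rename_to_fa("Cpal/proteome") would list the planned renames (the path is an assumption based on the --input_folder value used in this thread), and passing dry_run=False would apply them.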

@Song-10-YF
Author

Thanks!
But after changing the file extension, the staging of the foreign file https://omabrowser.org/All/LUCA.h5 has been stuck for nearly 10 hours.

@alpae
Member

alpae commented Jan 7, 2025

Hi,

The LUCA.h5 file is ~8.8 GB. Depending on your internet connection, this might take some time. Also, if you're running this on an HPC cluster, please ensure that the node from which you run nextflow actually has access to the internet.

To check whether the pipeline works otherwise, you could also use a smaller OMAmer database, e.g. https://omabrowser.org/All/Primates.h5 (~100 MB).
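
One way to take the download out of the pipeline run is to fetch the database once, by hand, and keep the local copy. The following is a rough sketch using only the Python standard library; the function name fetch_database and the .part temporary suffix are hypothetical choices, not anything FastOMA or Nextflow provides.

```python
import urllib.request
from pathlib import Path


def fetch_database(url, dest, chunk_size=1 << 20):
    """Stream url to dest in chunks; return the number of bytes written.

    The data goes to a temporary '<dest>.part' file first, so an
    interrupted download never leaves a truncated file under the final name.
    """
    dest = Path(dest)
    partial = dest.with_name(dest.name + ".part")
    written = 0
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    partial.replace(dest)  # expose the file only once it is complete
    return written
```

For example, fetch_database("https://omabrowser.org/All/Primates.h5", "Primates.h5") would store the small database in the current folder. The local file can then be given to the pipeline with --omamer_db, as mentioned further down in this thread.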

@Song-10-YF
Author

[2790.780s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
executor > local (52)
[0a/aa740b] check_input (1) | 1 of 1 ✔
[07/b2443c] omamer_run (TCP.fa) | 3 of 3 ✔
[9f/0cc29f] infer_roothogs (1) | 1 of 1 ✔
[39/ad775d] batch_roothogs (1) | 1 of 1 ✔
[b2/8feefb] hog_big (11) | 0 of 13
[28/050d34] hog_rest (40) | 0 of 43
[- ] collect_subhogs -
[- ] ext…airwise_ortholog_relations -
[- ] fastoma_report -
[2790.839s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[2790.848s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[2790.855s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 136k, guardsize: 0k, detached.
[2790.861s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.

executor > local (53)
[0a/aa740b] check_input (1) | 1 of 1 ✔
[07/b2443c] omamer_run (TCP.fa) | 3 of 3 ✔
[9f/0cc29f] infer_roothogs (1) | 1 of 1 ✔
[39/ad775d] batch_roothogs (1) | 1 of 1 ✔
[d4/d39ff0] hog_big (1) | 0 of 13
[28/050d34] hog_rest (40) | 0 of 43
[- ] collect_subhogs -
[- ] ext…airwise_ortholog_relations -
[- ] fastoma_report -
ERROR ~ Execution aborted due to an unexpected error

-- Check '.nextflow.log' file for details


Completed at : 2025-01-07T13:14:12.143116+08:00
Duration : 46m 22s
Processes : 7 (success), 0 (failed)
Output in : Cpal_out
Nextflow report : Cpal_out/stats
Oops .. something went wrong

executor > local (53)
[0a/aa740b] check_input (1) | 1 of 1 ✔
[07/b2443c] omamer_run (TCP.fa) | 3 of 3 ✔
[9f/0cc29f] infer_roothogs (1) | 1 of 1 ✔
[39/ad775d] batch_roothogs (1) | 1 of 1 ✔
[f7/b4345a] hog_big (12) | 1 of 13
[28/050d34] hog_rest (40) | 0 of 43
[- ] collect_subhogs -
[- ] ext…airwise_ortholog_relations -
[- ] fastoma_report -
ERROR ~ Execution aborted due to an unexpected error

-- Check '.nextflow.log' file for details
WARN: Killing running tasks (46)

@alpae
Member

alpae commented Jan 8, 2025

Hi, this looks like a problem with Nextflow itself. Which operating system are you using, and which profile?
What is reported in the .nextflow.log file?
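
When the log is long, it can help to count the lines that point at an exhausted thread or process limit before reading it in full. This is a small sketch, assuming the two message shapes seen in this thread (the JVM pthread_create warning and a failed program launch with error=11); the names PATTERNS and count_resource_errors are hypothetical.

```python
import re
from collections import Counter

# Message shapes that indicate errno 11 (EAGAIN): the system refused to
# create another thread or process.
PATTERNS = {
    "pthread_create EAGAIN": re.compile(r"pthread_create failed \(EAGAIN\)"),
    "spawn error=11": re.compile(r'Cannot run program "[^"]+": error=11'),
}


def count_resource_errors(lines):
    """Count, per pattern label, how many lines match that pattern."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts
```

A possible use is to open .nextflow.log (or a saved copy of the console output) with encoding="utf-8" and errors="replace", and pass the file object to count_resource_errors. Non-zero counts suggest a limit on the machine rather than a bug in the pipeline steps.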

@Song-10-YF
Author

I'm using the command: nextflow run FastOMA.nf --input_folder Cpal --output_folder Cpal_out --report
My system is CentOS. The version of nextflow is 24.10.3.5933. I suspect the error might be due to insufficient threads.

Thread[process reaper,10,system]
java.base@…/java.lang.ProcessHandleImpl.waitForProcessExit0(Native Method)
java.base@…/java.lang.ProcessHandleImpl$1.run(ProcessHandleImpl.java:138)
java.base@…/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
java.base@…/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
java.base@…/java.lang.Thread.run(Thread.java:829)

Jan-07 13:14:12.104 [main] DEBUG nextflow.Session - Session await > all processes finished
Jan-07 13:14:12.114 [main] DEBUG nextflow.Session - Session await > all barriers passed
Jan-07 13:14:12.159 [main] INFO nextflow.script.BaseScript -
Jan-07 13:14:12.160 [main] INFO nextflow.script.BaseScript - Completed at : 2025-01-07T13:14:12.143116+08:00
Jan-07 13:14:12.161 [main] INFO nextflow.script.BaseScript - Duration : 46m 22s
Jan-07 13:14:12.162 [main] INFO nextflow.script.BaseScript - Processes : 7 (success), 0 (failed)
Jan-07 13:14:12.163 [main] INFO nextflow.script.BaseScript - Output in : Cpal_out
Nextflow report : Cpal_out/stats
Jan-07 13:14:12.163 [main] INFO nextflow.script.BaseScript - Oops .. something went wrong
Jan-07 13:14:12.184 [main] WARN n.processor.TaskPollingMonitor - Killing running tasks (46)
Jan-07 13:14:12.277 [main] DEBUG n.processor.TaskPollingMonitor - Failed to kill pending tasks: TaskHandler[id: 23; name: hog_rest (4); status: RUNNING; exit: -; error: -; workDir: /home/songyf/software/FastOMA/work/19/c9b0cb8a8f6446ef23795a43188ce4] -- cause: Cannot run program "bash": error=11, Resource temporarily unavailable
Jan-07 13:14:12.279 [main] DEBUG n.processor.TaskPollingMonitor - Failed to kill pending tasks: TaskHandler[id: 55; name: hog_rest (36); status: RUNNING; exit: -; error: -; workDir: /home/songyf/software/FastOMA/work/4b/094b791afe49b11d1aa9479f823bc6] -- cause: Cannot run program "bash": error=11, Resource temporarily unavailable
Jan-07 13:14:12.280 [main] DEBUG n.processor.TaskPollingMonitor - Failed to kill pending tasks: TaskHandler[id: 58; name: hog_rest (39); status: RUNNING; exit: -; error: -; workDir: /home/songyf/software/FastOMA/work/ee/b30551c715a9e154973b1c06f4c24e] -- cause: Cannot run program "bash": error=11, Resource temporarily unavailable
Jan-07 13:14:12.281 [main] DEBUG n.processor.TaskPollingMonitor - Failed to kill pending tasks: TaskHandler[id: 39; name: hog_rest (20); status: RUNNING; exit: -; error: -; workDir: /home/songyf/software/FastOMA/work/64/0547406d1e0a8322eecf9d5a0f50a6] -- cause: Cannot run program "bash": error=11, Resource temporarily unavailable
Jan-07 13:14:12.282 [main] DEBUG n.processor.TaskPollingMonitor - Failed to kill pending tasks: TaskHandler[id: 7; name: hog_rest (1); status: RUNNING; exit: -; error: -; workDir: /home/songyf/software/FastOMA/work/76/43ee1bcaac41291d4ad3c97a275816] -- cause: Cannot run program "bash": error=11, Resource temporarily unavailable
Jan-07 13:14:12.314 [main] DEBUG n.processor.TaskPollingMonitor - Failed to kill pending tasks: TaskHandler[id: 49; name: hog_rest (30); status: RUNNING; exit: -; error: -; workDir: /home/songyf/software/FastOMA/work/66/3b3f49cd341f6d4c5bffed22b0f90d] -- cause: Cannot run program "bash": error=11, Resource temporarily unavailable
Jan-07 13:14:13.197 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=7; failedCount=0; ignoredCount=0; cachedCount=0; pendingCount=9; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=46; succeedDuration=53m 43s; failedDuration=0ms; cachedDuration=0ms;loadCpus=0; loadMemory=20 GB; peakRunning=47; peakCpus=59; peakMemory=592 GB; ]
Jan-07 13:14:13.197 [main] DEBUG nextflow.trace.TraceFileObserver - Workflow completed -- saving trace file
Jan-07 13:14:13.200 [main] DEBUG nextflow.trace.ReportObserver - Workflow completed -- rendering execution report
Jan-07 13:14:14.563 [main] DEBUG nextflow.trace.TimelineObserver - Workflow completed -- rendering execution timeline
Jan-07 13:14:14.890 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
Jan-07 13:14:14.924 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye

@sinamajidian
Collaborator

Thanks for sharing the logs.
It seems that the system ran out of threads, which should not happen normally.

  1. Is the system you are using shared with many users, with many processes running? (You can check with top or htop.)
  2. Could you run it again? It might be a one-time issue with the system.
  3. Also, can you run cat /proc/sys/kernel/threads-max? For me, the output is 12382340.
  4. Btw, have you tried running on the test dataset provided on GitHub?
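
The limit check can also be done from Python. This is a rough sketch for Linux; apart from threads-max, the values it reads (pid_max and the per-user RLIMIT_NPROC limit, which is what ulimit -u reports) are additions that may or may not matter on this machine. The names read_int and thread_limits are hypothetical.

```python
import resource
from pathlib import Path


def read_int(path):
    """Return the integer stored in a /proc file, or None if unavailable."""
    try:
        return int(Path(path).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def thread_limits():
    """Collect system-wide and per-user limits on threads and processes."""
    # RLIMIT_NPROC caps the processes and threads one user may own;
    # a value of -1 means "unlimited".
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        "threads_max": read_int("/proc/sys/kernel/threads-max"),
        "pid_max": read_int("/proc/sys/kernel/pid_max"),
        "nproc_soft": soft,
        "nproc_hard": hard,
    }
```

Printing thread_limits() on the affected machine shows whether the per-user limit is much lower than the system-wide threads-max. If it is, that limit is a plausible reason for the EAGAIN failures even though threads-max looks generous.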

You can add -resume to nextflow run to resume your previous run. Sometimes it is better to start over from an empty folder. Note that nextflow creates a work folder and some hidden files (you can check with ls -a).

Btw, you can add --omamer_db LUCA.h5 (or Primates.h5) to the command line to tell nextflow not to download the file again (this avoids the staging step).
