Snakemake? error? regarding batch file #74

Open
karinlag opened this issue Aug 27, 2024 · 6 comments

@karinlag

Hi!

I am trying out this tool now. I installed it via conda and am running it on a Slurm cluster with srun, having requested 10 CPUs. I have a file list containing 8 plasmid FASTA sequences. The plasmids come from hybrid assemblies. The error I get is:


(pling) [[email protected] /cluster/projects/nn9305k/active/karinlag/2024-iconic]$ pling filelist.txt testout align --cores 10 --sourmash --batch_size 2
Batching...

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 10
Rules claiming more threads will be scaled down.
Job stats:
job            count
-----------  -------
all                1
get_batches        1
total              2

Select jobs to execute...

[Tue Aug 27 17:42:14 2024]
rule get_batches:
output: testout/batches
jobid: 1
reason: Missing output files: testout/batches
resources: tmpdir=/tmp, mem_mb=10000, mem_mib=9537

Traceback (most recent call last):
File "/cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/site-packages/pling/batching/get_batches.py", line 103, in <module>
main()
File "/cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/site-packages/pling/batching/get_batches.py", line 89, in main
run_smash(args.genomes_list, sig_path, matrixpath)
File "/cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/site-packages/pling/batching/get_batches.py", line 63, in run_smash
raise e
File "/cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/site-packages/pling/batching/get_batches.py", line 59, in run_smash
subprocess.run(f"sourmash sketch dna --from-file {genome_list} -o {sig_path}", shell=True, check=True, capture_output=True)
File "/cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'sourmash sketch dna --from-file filelist.txt -o testout/sourmash/all_plasmids.sig' returned non-zero exit status 1.
[Tue Aug 27 17:42:24 2024]
Error in rule get_batches:
jobid: 1
output: testout/batches
shell:

    PYTHONPATH=/cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/site-packages python /cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/site-packages/pling/batching/get_batches.py             --genomes_list filelist.txt             --batch_size 2             --outputpath testout             --sourmash             --smash_threshold 0.85             --containmentpath testout/containment/not_pairs_containment_distance.tsv
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-08-27T174211.327052.snakemake.log

Command 'snakemake --snakefile /cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/site-packages/pling/batching/Snakefile --configfile testout/tmp_files/config.yaml --cores 10 --rerun-incomplete --nolock ' returned non-zero exit status 1.
Traceback (most recent call last):
File "/cluster/projects/nn9305k/src/miniconda/envs/pling/bin/pling", line 10, in <module>
sys.exit(main())
^^^^^^
File "/cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/site-packages/pling/run_pling.py", line 183, in main
pling(args)
File "/cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/site-packages/pling/run_pling.py", line 141, in pling
raise e
File "/cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/site-packages/pling/run_pling.py", line 136, in pling
subprocess.run(f"snakemake --snakefile {get_pling_path()}/batching/Snakefile {snakemake_args}", shell=True, check=True, capture_output=True)
File "/cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'snakemake --snakefile /cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/site-packages/pling/batching/Snakefile --configfile testout/tmp_files/config.yaml --cores 10 --rerun-incomplete --nolock ' returned non-zero exit status 1.
(pling) [[email protected] /cluster/projects/nn9305k/active/karinlag/2024-iconic]$

I am running your test dataset in the same manner, and that has not failed (so far, is still running, but got past batching).

Any idea what is wrong? I am not very familiar with snakemake, so sorry if I am making obvious mistakes of one kind or another.

@karinlag
Author

I am a bit hesitant to download stuff onto the cluster from the internet. Can you tell me what this is?

Also, impressive response time!

@iqbal-lab
Contributor

Closing this issue for now until Daria and I can talk.

@babayagaofficial
Collaborator

Hi Karin!

In your file list, are the paths relative or absolute? It looks like pling is erroring out when trying to run sourmash, and I vaguely remember it being finicky about file paths at some point.
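A quick way to answer the path question is sketched below. It is only an illustration: `check_file_list` is a hypothetical helper, not part of pling, and it assumes the file list holds one FASTA path per line, as in the command shown in the report.

```python
import os
from pathlib import Path


def check_file_list(list_path):
    """Return (entry, problem) pairs for suspicious lines in a file list.

    Hypothetical helper; assumes one path per line.
    """
    problems = []
    for raw in Path(list_path).read_text().splitlines():
        entry = raw.strip()
        if not entry:
            continue  # ignore blank lines
        path = Path(entry)
        if not path.is_absolute():
            problems.append((entry, "relative path"))
        if not path.is_file():
            problems.append((entry, "not an existing file"))
        elif not os.access(path, os.R_OK):
            problems.append((entry, "not readable"))
    return problems


# Example use (the file name is taken from the report above):
# for entry, problem in check_file_list("filelist.txt"):
#     print(f"{problem}: {entry}")
```

An empty result only says the paths look sane from the node where the check ran; it does not rule out other causes.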

If it's not that, can you please either run the command sourmash sketch dna --from-file filelist.txt -o testout/sourmash/all_plasmids.sig and tell me what happens, or send me the plasmids you're testing on so I can try to debug?
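The traceback in the report shows the sourmash command being run with `check=True` and `capture_output=True`, so whatever sourmash wrote to stderr is stored on the exception instead of reaching the terminal. A minimal sketch for re-running the same command and printing that hidden message follows; `run_and_report` is a hypothetical name, and this is not how pling itself handles the error.

```python
import subprocess


def run_and_report(cmd):
    """Run a shell command and return (exit_code, stderr_text).

    Hypothetical helper for debugging only.
    """
    try:
        subprocess.run(cmd, shell=True, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as err:
        # With capture_output=True the child's stderr is kept on the
        # exception object rather than being printed.
        return err.returncode, err.stderr
    return 0, ""


# Example use, with the command copied from the error message:
# code, stderr_text = run_and_report(
#     "sourmash sketch dna --from-file filelist.txt "
#     "-o testout/sourmash/all_plasmids.sig"
# )
# print(code)
# print(stderr_text)
```

Running this inside the same Slurm allocation and working directory as the failing pling call keeps the comparison fair.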

@karinlag
Author

karinlag commented Aug 28, 2024

The files in the filelist have absolute paths, and I can ls them on the command line.

FYI I have run through your example file set and that ran without issues.

Sourmash ran well. Here is the output:

(pling) [[email protected] /cluster/projects/nn9305k/active/karinlag/2024-iconic/pling]$ less testout/sourmash/all_plasmids.sig
(pling) [[email protected] /cluster/projects/nn9305k/active/karinlag/2024-iconic/pling]$ cat testout/sourmash/all_plasmids.sig
{"class":"sourmash_signature","email":"","hash_function":"0.murmur64","filename":"/cluster/projects/nn9305k/active/rikkiff/231218_hybrid_assembly_mob_suite/2011-01-3991-4_filtered.fasta/plasmid_AB172.fasta","license":"CC0","signatures":[{"num":0,"ksize":31,"seed":42,"max_hash":18446744073709552,"mins":[28120151681885,1280411105032480,1548659820748513,1600759877581307,1809502862701755,1908644393763148,2637738332492691,2793697803061775,3338939268924438,4260766948629996,5775701240304176,6759409368061580,7250362828267870,8243783598597327,8405070370237353,8410686004908645,8913670508770807,9922754642423950,10141409509109256,10604418598382448,11297382768904819,11619965257402279,12495754348681416,12497891601490898,12775684893959947,13071799269209422,13422641720227020,14502510590605305,16119940313576173,16722650050088156,17223053851400242,17365235757707201,17795910564485931,17876852118436816,18081646073001083],"md5sum":"38340f5022f1b4b06235b3706677d0cd","molecule":"DNA"}],"version":0.4},{"class":"sourmash_signature","email":"","hash_function":"0.murmur64","filename":"/cluster/projects/nn9305k/active/rikkiff/231218_hybrid_assembly_mob_suite/2011-01-4277-6_filtered.fasta/plasmid_AB172.fasta","license":"CC0","signatures":[{"num":0,"ksize":31,"seed":42,"max_hash":18446744073709552,"mins":[28120151681885,1280411105032480,1548659820748513,1600759877581307,1809502862701755,1908644393763148,2637738332492691,2793697803061775,3338939268924438,4260766948629996,5775701240304176,6759409368061580,7250362828267870,8243783598597327,8405070370237353,8410686004908645,8913670508770807,9922754642423950,10141409509109256,10604418598382448,11297382768904819,12495754348681416,12497891601490898,12775684893959947,13071799269209422,13422641720227020,14502510590605305,16119940313576173,16722650050088156,17223053851400242,17365235757707201,17795910564485931,17876852118436816,18081646073001083],"md5sum":"8fd92e18f88a3c353e3858fb27d03f39","molecule":"DNA"}],"version":0.4},{"class":"sourmash_signature","e
mail":"","hash_function":"0.murmur64","filename":"/cluster/projects/nn9305k/active/rikkiff/231218_hybrid_assembly_mob_suite/2002-01-856_filtered.fasta/plasmid_AB172.fasta","license":"CC0","signatures":[{"num":0,"ksize":31,"seed":42,"max_hash":18446744073709552,"mins":[28120151681885,1280411105032480,1548659820748513,1600759877581307,1809502862701755,1876382433387916,1908644393763148,2637738332492691,2793697803061775,3338939268924438,4260766948629996,5775701240304176,6759409368061580,7250362828267870,8243783598597327,8405070370237353,8410686004908645,8913670508770807,9922754642423950,10141409509109256,10604418598382448,11297382768904819,11619965257402279,12495754348681416,12497891601490898,12775684893959947,13071799269209422,13422641720227020,14502510590605305,16119940313576173,16722650050088156,17223053851400242,17365235757707201,17795910564485931,17876852118436816,18081646073001083],"md5sum":"6d6e6e9efbf88bb71c6322a0331b1cab","molecule":"DNA"}],"version":0.4},{"class":"sourmash_signature","email":"","hash_function":"0.murmur64","filename":"/cluster/projects/nn9305k/active/rikkiff/231218_hybrid_assembly_mob_suite/2013-01-3776_filtered.fasta/plasmid_AB172.fasta","license":"CC0","signatures":[{"num":0,"ksize":31,"seed":42,"max_hash":18446744073709552,"mins":[28120151681885,1280411105032480,1548659820748513,1600759877581307,1809502862701755,2637738332492691,2793697803061775,3338939268924438,4260766948629996,5321635288652507,5775701240304176,6759409368061580,7250362828267870,8243783598597327,8405070370237353,8913670508770807,9922754642423950,10141409509109256,10604418598382448,10685131983240125,11297382768904819,11619965257402279,12775684893959947,13071799269209422,13422641720227020,13561309383030245,14468452007203572,16119940313576173,16722650050088156,17223053851400242,17365235757707201,17585142582113419,17795910564485931,17876852118436816,18060306135304036,18081646073001083],"md5sum":"c7cdbd12a571c07668a30de1d3c952f8","molecule":"DNA"}],"version":0.4},{"class":"sourm
ash_signature","email":"","hash_function":"0.murmur64","filename":"/cluster/projects/nn9305k/active/rikkiff/231218_hybrid_assembly_mob_suite/2004-01-295_filtered.fasta/plasmid_AB172.fasta","license":"CC0","signatures":[{"num":0,"ksize":31,"seed":42,"max_hash":18446744073709552,"mins":[28120151681885,410160685533855,1280411105032480,1548659820748513,1600759877581307,1809502862701755,3338939268924438,3625589219571383,4260766948629996,4948886978307551,5321635288652507,5775701240304176,7250362828267870,8405070370237353,10141409509109256,10685131983240125,11215808376261839,11297382768904819,11619965257402279,12495754348681416,12775684893959947,13071799269209422,13422641720227020,13561309383030245,14086504856082619,14468452007203572,14502510590605305,16722650050088156,17223053851400242,17365235757707201,17585142582113419,17876852118436816,18260973711158596],"md5sum":"e005938c091674fb429acf735c3b1df5","molecule":"DNA"}],"version":0.4},{"class":"sourmash_signature","email":"","hash_function":"0.murmur64","filename":"/cluster/projects/nn9305k/active/rikkiff/231218_hybrid_assembly_mob_suite/2002-01-750_filtered.fasta/plasmid_AB172.fasta","license":"CC0","signatures":[{"num":0,"ksize":31,"seed":42,"max_hash":18446744073709552,"mins":[28120151681885,1600759877581307,1809502862701755,1908644393763148,2637738332492691,2793697803061775,3338939268924438,4121414412210452,4260766948629996,5321635288652507,5775701240304176,6759409368061580,7250362828267870,8209140697418008,8243783598597327,8405070370237353,8410686004908645,8913670508770807,9922754642423950,10141409509109256,10604418598382448,10685131983240125,11297382768904819,11619965257402279,12495754348681416,12497891601490898,12775684893959947,13422641720227020,13561309383030245,14468452007203572,16119940313576173,17223053851400242,17365235757707201,17585142582113419,17795910564485931,17876852118436816,18081646073001083],"md5sum":"31884faa66759a5348f80e1a5d47a791","molecule":"DNA"}],"version":0.4},{"class":"sourmash_signature","em
ail":"","hash_function":"0.murmur64","filename":"/cluster/projects/nn9305k/active/rikkiff/231218_hybrid_assembly_mob_suite/2004-01-1570_filtered.fasta/plasmid_AB172.fasta","license":"CC0","signatures":[{"num":0,"ksize":31,"seed":42,"max_hash":18446744073709552,"mins":[28120151681885,1600759877581307,1809502862701755,1908644393763148,2637738332492691,3338939268924438,4260766948629996,5775701240304176,6759409368061580,7250362828267870,8209140697418008,8405070370237353,8410686004908645,8913670508770807,10141409509109256,11297382768904819,11619965257402279,12495754348681416,12497891601490898,12775684893959947,13422641720227020,17223053851400242,17365235757707201,17795910564485931,17876852118436816,18081646073001083],"md5sum":"c98ad460af789c431d457066fba694b0","molecule":"DNA"}],"version":0.4},{"class":"sourmash_signature","email":"","hash_function":"0.murmur64","filename":"/cluster/projects/nn9305k/active/rikkiff/231218_hybrid_assembly_mob_suite/2009-01-1808-4_filtered.fasta/plasmid_AB172.fasta","license":"CC0","signatures":[{"num":0,"ksize":31,"seed":42,"max_hash":18446744073709552,"mins":[28120151681885,1280411105032480,1548659820748513,1600759877581307,1809502862701755,1908644393763148,2637738332492691,2793697803061775,3338939268924438,4260766948629996,5775701240304176,6759409368061580,7250362828267870,8243783598597327,8405070370237353,8410686004908645,8913670508770807,9922754642423950,10141409509109256,10604418598382448,11297382768904819,11619965257402279,12495754348681416,12497891601490898,12775684893959947,13071799269209422,13422641720227020,14502510590605305,16119940313576173,16722650050088156,17223053851400242,17365235757707201,17795910564485931,17876852118436816,18081646073001083],"md5sum":"38340f5022f1b4b06235b3706677d0cd","molecule":"DNA"}],"version":0.4}

Hope some of it makes sense for you!

@babayagaofficial
Collaborator

That's pretty weird. Looking at your error message, it seems almost certain that pling errored out while trying to run sourmash, but then it's fine when you run it separately. I'm very sorry about the faff, but I'm afraid I'm going to need some more help/information from you to debug this. Did you run sourmash in the same environment as pling? If not, can you check which version you ran? Can you also please check which version is installed in your pling environment?
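For the version question, one option is to ask Python itself which packages the active environment provides. This is a sketch; the distribution names below are assumptions (a conda install may register them differently, in which case `conda list` is the more reliable check).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is not found."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names are assumed, not confirmed by the thread.
for name in ("sourmash", "snakemake", "pling"):
    print(name, installed_version(name) or "not found in this environment")
```

Running it once with the pling environment activated, and once in the environment where sourmash was run by hand, shows whether the two differ.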

Also, can you try running on the 8 plasmids without the sourmash flag, and let me know if those run okay?

If it's alright to share, is there any chance you can send me the fasta files for your 8 plasmids?

@ayoraind

ayoraind commented Oct 2, 2024

@karinlag,
Is it possible that the snakemake error is due to permission issues with the output directory, the working directory (particularly the hidden .snakemake directory), or the input file directory? I hit the same error today. I moved to a working directory where I was certain there would be no permission problems, copied the plasmid FASTA files into it, edited the input.txt file accordingly, pointed the output path at a directory I was equally sure about, and re-ran Pling. It ran successfully.
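A rough way to test the permission idea before copying files around is sketched here. The helper names are hypothetical, and `os.access` is only advisory: on some network filesystems a write can still fail even when it reports success.

```python
import os
from pathlib import Path


def first_existing_parent(path):
    """Walk upwards until a path that actually exists is found."""
    current = Path(path).resolve()
    while not current.exists():
        current = current.parent
    return current


def can_write_under(path):
    """True if the path, or its nearest existing parent, is a writable directory."""
    target = first_existing_parent(path)
    return target.is_dir() and os.access(target, os.W_OK | os.X_OK)


# Example use: the working directory, Snakemake's hidden directory,
# and the output directory from the report.
# for candidate in (".", ".snakemake", "testout"):
#     print(candidate, can_write_under(candidate))
```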

5 participants
@karinlag @iqbal-lab @ayoraind @babayagaofficial and others