
Instructions for local install? #8

Open
kubu4 opened this issue Jul 11, 2022 · 5 comments

@kubu4

kubu4 commented Jul 11, 2022

I'm trying to run the pipeline test via the Singularity image on our university's computing cluster, which doesn't have internet access when executing jobs.

I've downloaded all of the input files listed in test.config. I've also downloaded the Singularity image (singularity pull docker://epidiverse/wgbs:1.0) and changed the nextflow.config file to specify the Singularity image location, like so:

// -profile singularity
	singularity {
		includeConfig "${baseDir}/config/base.config"
		singularity.enabled = true
		process.container = '/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/work/singularity/wgbs_1.0.sif'
	}

That seemed like it should be all that was needed, but when I execute the test command (NXF_VER=20.07.1 /gscratch/srlab/programs/nextflow-21.10.6-all run /gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/wgbs-1.0 -profile test,singularity), it fails with this error:

executor >  local (10)
[c4/79070c] process > INDEX:erne_bs5_indexing        [100%] 1 of 1 ✔
[30/202688] process > INDEX:segemehl_indexing        [100%] 1 of 1 ✔
[07/dc2230] process > WGBS:read_trimming (sampleB)   [100%] 8 of 8, failed: 8...
[-        ] process > WGBS:read_merging              -
[-        ] process > WGBS:fastqc                    -
[-        ] process > WGBS:erne_bs5                  -
[-        ] process > WGBS:segemehl                  -
[-        ] process > WGBS:erne_bs5_processing       -
[-        ] process > WGBS:segemehl_processing       -
[-        ] process > WGBS:bam_merging               -
[-        ] process > WGBS:bam_subsetting            -
[-        ] process > WGBS:bam_filtering             -
[-        ] process > WGBS:bam_statistics            -
[-        ] process > CALL:bam_processing            -
[-        ] process > CALL:Picard_MarkDuplicates     -
[-        ] process > CALL:MethylDackel              -
[-        ] process > CALL:conversion_rate_estima... -

Pipeline execution summary
---------------------------
Name         : infallible_mccarthy
Profile      : test,singularity
Launch dir   : /gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test
Work dir     : /gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/work
Status       : failed
Error report : Error executing process > 'WGBS:read_trimming (sampleA)'

Caused by:
  Process `WGBS:read_trimming (sampleA)` terminated with an error exit status (1)

Command executed:

  mkdir fastq fastq/logs
  cutadapt -j 2 -a AGATCGGAAGAGC -A AGATCGGAAGAGC \
  -q 20 -m 36 -O 3 \
  -o fastq/merge.null \
  -p fastq/merge.g null g \
  > fastq/logs/cutadapt.sampleA.merge.log 2>&1

Command exit status:
  1

Command output:
  (empty)

Work dir:
  /gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/work/12/6ee9cc7a7372a97f34f21a4f79efb3

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`


When I look at the Cutadapt log file, this is what is shown:

cat cutadapt.sampleA.merge.log 
This is cutadapt 2.10 with Python 3.6.7
Command line parameters: -j 2 -a AGATCGGAAGAGC -A AGATCGGAAGAGC -q 20 -m 36 -O 3 -o fastq/merge.null -p fastq/merge.g null g
Processing reads on 2 cores in paired-end mode ...
ERROR: Traceback (most recent call last):
  File "/opt/conda/envs/wgbs/lib/python3.6/site-packages/cutadapt/pipeline.py", line 477, in run
    with xopen(self.file, 'rb') as f:
  File "/opt/conda/envs/wgbs/lib/python3.6/site-packages/xopen/__init__.py", line 407, in xopen
    return open(filename, mode)
IsADirectoryError: [Errno 21] Is a directory: 'null'

ERROR: Traceback (most recent call last):
  File "/opt/conda/envs/wgbs/lib/python3.6/site-packages/cutadapt/pipeline.py", line 477, in run
    with xopen(self.file, 'rb') as f:
  File "/opt/conda/envs/wgbs/lib/python3.6/site-packages/xopen/__init__.py", line 407, in xopen
    return open(filename, mode)
IsADirectoryError: [Errno 21] Is a directory: 'null'

ERROR: Traceback (most recent call last):
  File "/opt/conda/envs/wgbs/lib/python3.6/site-packages/cutadapt/pipeline.py", line 540, in run
    raise e
IsADirectoryError: [Errno 21] Is a directory: 'null'

Traceback (most recent call last):
  File "/opt/conda/envs/wgbs/bin/cutadapt", line 10, in <module>
    sys.exit(main())
  File "/opt/conda/envs/wgbs/lib/python3.6/site-packages/cutadapt/__main__.py", line 855, in main
    stats = r.run()
  File "/opt/conda/envs/wgbs/lib/python3.6/site-packages/cutadapt/pipeline.py", line 770, in run
    raise e
IsADirectoryError: [Errno 21] Is a directory: 'null'

Did I miss something that needs to be set up for a local install to run properly?

@bio15anu
Member

Thanks for opening this issue! There seems to be something going on with the "Command executed:" section in the error message. Specifically here:

  -o fastq/merge.null \
  -p fastq/merge.g null g \

where "null" should reflect the reads variable from L48-L49 in wgbs.nf

 -o fastq/${params.merge ? "${readtype}." : ""}${reads[0]} \\
 -p fastq/${params.merge ? "${readtype}." : ""}${reads[1]} ${reads} \\

I suspect the issue here is that we need to create a new test.config file for running the test profile offline. Can you provide some more information as to what you did here, exactly? Did you modify the paths in the existing test.config file?

@bio15anu
Member

As an aside to this issue, I just wanted to point out that a typical pipeline run does not require an open internet connection. If your intention is to submit to a queuing system which sends the job to another node without internet access, for example, it should be enough to have already pulled the pipeline normally from the login node. You will get a local copy of the pipeline in ~/.nextflow/assets, which is the first place Nextflow will look for the pipeline whenever you run it.
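The Singularity image can be pre-fetched in the same way. A rough sketch (the cacheDir path below is a placeholder, not a path from this thread): pull the image once from the login node, then point the profile at that local cache so compute nodes never need network access:

```groovy
// Sketch only: with wgbs_1.0.sif already pulled into cacheDir, Nextflow
// should resolve the container locally instead of contacting Docker Hub.
singularity {
	includeConfig "${baseDir}/config/base.config"
	singularity.enabled  = true
	singularity.cacheDir = '/path/to/local/singularity-cache'   // placeholder path
}
```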

Is that relevant for your use case at all?

@kubu4
Author

kubu4 commented Jul 18, 2022

Thanks for looking into this. It is much appreciated!

> Did you modify the paths in the existing test.config file?

Gah! Yes! Sorry for not including that!! Here's what the modified test.config file looks like:

/*
 * -------------------------------------------------
 *  Nextflow config file for running tests
 * -------------------------------------------------
 * Defines bundled input files and everything required
 * to run a fast and simple test. Use as follows:
 *   nextflow run epidiverse/wgbs -profile test
 */


params {

    // enable all steps
    input = "test profile"
    merge = true
    INDEX = true
    trim = true
    fastqc = true
    unique = true

    // genome reference
    reference = "/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/genome/genome.fa"

    // set readPaths parameter (only available in test profile)
    readPaths = [
    ['sampleA', 'input', '/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/sampleA_1.fastq.gz','/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/sampleA_2.fastq.gz'],
    ['sampleB', 'input', '/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/sampleB_1.fastq.gz','/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/sampleB_2.fastq.gz']
    ]

    // set mergePaths parameter (only available in test profile)
    mergePaths = [
    ['sampleA', 'merge', '/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/merge/sampleA_1.fastq.gz','/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/merge/sampleA_2.fastq.gz'],
    ['sampleB', 'merge', '/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/merge/sampleB_1.fastq.gz','/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/merge/sampleB_2.fastq.gz']
    ]
}

> As an aside to this issue, I just wanted to point out that during a typical pipeline run it is not necessary to have an open internet connection. If your intention is to submit to a queuing system, for example, which perhaps sends the job to another node where there is no internet connection, it should be enough to have already pulled the pipeline normally from the login node. You will get a local copy of the pipeline in ~/.nextflow/assets, which is the first place Nextflow will look for the pipeline whenever you run it.
>
> Is that relevant for your use case at all?

Yeah, we'd be running on a high-performance computing cluster (it uses the SLURM job manager). I was just trying to confirm that the install, and using Singularity on the compute nodes, would work properly. I figured troubleshooting would be easier if the test ran successfully.

@bio15anu
Member

In this new test.config file for running offline, it looks like you've lost the nested tuples in both readPaths and mergePaths.

So for example this:

readPaths = [
    ['sampleA', 'input', '/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/sampleA_1.fastq.gz','/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/sampleA_2.fastq.gz'],
    ['sampleB', 'input', '/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/sampleB_1.fastq.gz','/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/sampleB_2.fastq.gz']
    ]

should be changed to this:

readPaths = [
    ['sampleA', 'input', ['/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/sampleA_1.fastq.gz','/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/sampleA_2.fastq.gz']],
    ['sampleB', 'input', ['/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/sampleB_1.fastq.gz','/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/sampleB_2.fastq.gz']]
    ]
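The mergePaths parameter needs the same treatment; a sketch using the paths from your modified test.config:

```groovy
// Same fix as readPaths: wrap each read pair in its own inner list so the
// pipeline receives entries shaped as [sample, type, [read1, read2]].
mergePaths = [
    ['sampleA', 'merge', ['/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/merge/sampleA_1.fastq.gz','/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/merge/sampleA_2.fastq.gz']],
    ['sampleB', 'merge', ['/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/merge/sampleB_1.fastq.gz','/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/merge/sampleB_2.fastq.gz']]
    ]
```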

@bio15anu
Member

By the way, I am very happy to assist you in writing a configuration profile for running your Nextflow pipelines with SLURM. Nextflow integrates very nicely with such resource management software; it can automatically submit each process as a job in your queue system, for example. Please feel free to post a new issue requesting help with this and I will try to tailor it for your system as best I can!
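As a rough starting point, such a profile can be quite small. A hedged sketch (process.executor = 'slurm' is standard Nextflow, but the partition name is a placeholder you would adapt to your cluster):

```groovy
// Sketch of a minimal SLURM profile: Nextflow submits each process as an
// sbatch job on the named partition. The queue name is a placeholder.
profiles {
    slurm {
        process.executor = 'slurm'
        process.queue    = 'your-partition'   // placeholder partition name
        singularity.enabled = true
    }
}
```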
