Quick Start

Running the pipeline

This page illustrates how to run the pipeline on a small sample data file with default parameters. For real runs, the data to be analysed and the various parameters to be used are specified in the nextflow.config file. The details will be explained in another section.

The sample data to be used is in the input directory (in PLINK format as sampleA.bed, sampleA.bim, sampleA.fam). The default nextflow.config file uses this, and so you can run the workflow through with this example.

Running on your local computer

This requires that all software dependencies have been installed. This wiki assumes you are running Nextflow in the main directory of the h3agwas repo.
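If you do not yet have a local copy of the repo, one way to set it up might be the following (the repository URL is an assumption here; use whichever location you obtained h3agwas from):

git clone https://github.com/h3abionet/h3agwas.git   # assumed URL
cd h3agwas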

nextflow run plink-qc.nf

The workflow runs and its output goes to the output directory. A record of the analysis can be found in the sampleA.pdf file.

To run the workflow on another PLINK data set, say mydata.{bed,bim,fam}, run

nextflow run plink-qc.nf --input_pat mydata

If the data is in another directory, or you want the output to go elsewhere:

nextflow run plink-qc.nf --input_pat mydata --input_dir /data/project10/ --output_dir ~/results

There are many other options that can be passed on the command line. Options can also be given in the config file (explained below). We recommend putting options in the configuration file, since the file can be archived, which makes the workflow more portable.
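As a sketch of what that might look like, the same options shown on the command line above could be set in a params block in nextflow.config (the values here are only illustrative):

params {
    // illustrative values; use your own input directory, file prefix and output directory
    input_dir  = "/data/project10"
    input_pat  = "mydata"
    output_dir = "results"
}

With these set, nextflow run plink-qc.nf picks the values up as defaults, and any --option given on the command line still overrides them.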

Running with Docker on your local computer

Execute

nextflow run plink-qc.nf -profile docker

Please note that the first time you run the workflow using Docker, the Docker images will be downloaded. Warning: the images amount to roughly 1GB, so the download will consume bandwidth and take time depending on your network connection. The images are downloaded only the first time the workflow runs.
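For reference, a docker profile in a Nextflow configuration is typically defined along the lines of the sketch below. The h3agwas repo already ships its own docker profile, so this is only an illustration of the mechanism; the container name is a placeholder, not the pipeline's actual image:

profiles {
    docker {
        // run each process inside a Docker container
        docker.enabled = true
        // placeholder image name for illustration only
        process.container = 'example-org/plink-qc'
    }
}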