
Association testing pipeline

Scott.Hazelhurst edited this page May 26, 2017 · 2 revisions

#Introduction

An association study is a complex analysis, and each analysis has to consider:

  • the disease/phenotype being studied and its mode of inheritance
  • population structure
  • other covariates

For this reason, it is difficult to build a high-quality, generic pipeline to do an association study.

The purpose of this pipeline is to perform a very superficial initial analysis that can be used as one piece of information to guide a rigorous analysis. Of course, we would encourage users to build their own Nextflow script for their rigorous analysis, perhaps using our script as a start.

Our script, plink-assoc.nf, takes as input PLINK files that have been through quality control and

  • does a principal component analysis on the data, and produces plots from that;
  • performs a simple association test, giving odds ratios and raw and adjusted p-values.
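
To make the second step concrete, here is a minimal sketch of what a simple association test computes for one SNP, assuming a 2x2 table of allele counts in cases and controls. The pipeline itself delegates this work to PLINK; the function name `allelic_chi2` and the counts below are hypothetical and are not taken from the pipeline.

```python
import math


def allelic_chi2(case_a1, case_a2, ctrl_a1, ctrl_a2):
    """Return (odds_ratio, chi2, p_value) for a 2x2 allele-count table.

    Illustrative only: assumes all four counts are non-zero and
    applies no continuity correction.
    """
    n = case_a1 + case_a2 + ctrl_a1 + ctrl_a2
    # Pearson chi-square statistic for a 2x2 table (1 degree of freedom)
    num = n * (case_a1 * ctrl_a2 - case_a2 * ctrl_a1) ** 2
    den = ((case_a1 + case_a2) * (ctrl_a1 + ctrl_a2)
           * (case_a1 + ctrl_a1) * (case_a2 + ctrl_a2))
    chi2 = num / den
    # Odds of carrying allele A1 in cases relative to controls
    odds_ratio = (case_a1 * ctrl_a2) / (case_a2 * ctrl_a1)
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: 30/70 alleles in cases, 20/80 in controls
    print(allelic_chi2(30, 70, 20, 80))
```

The p-value returned here is the raw one; an adjusted value would come from a separate multiple testing step.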

#Running

The pipeline is run with: nextflow run plink-assoc.nf

The key options are:

  • input_dir, output_dir: where the input comes from and where the output goes to;
  • input_pat: the base name of a set of PLINK bed, bim and fam files (this should match only one set);
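
As a pre-flight check before launching the pipeline, the "match only one set" requirement can be sketched as follows. This is an illustration only: it assumes that input_pat may be a glob-style pattern resolved inside input_dir, which this page does not confirm, and `find_plink_fileset` is a hypothetical helper rather than part of the pipeline.

```python
from pathlib import Path


def find_plink_fileset(input_dir, input_pat):
    """Return the bed, bim and fam paths for the single set matching input_pat.

    Hypothetical helper: raises ValueError unless exactly one .bed file
    matches, and FileNotFoundError if its .bim or .fam partner is missing.
    """
    beds = sorted(Path(input_dir).glob(input_pat + ".bed"))
    if len(beds) != 1:
        raise ValueError(
            f"expected exactly one match for {input_pat!r}, found {len(beds)}"
        )
    bed = beds[0]
    base = bed.name[: -len(".bed")]  # keep any dots inside the base name
    fileset = [bed.parent / (base + ext) for ext in (".bed", ".bim", ".fam")]
    missing = [str(f) for f in fileset if not f.is_file()]
    if missing:
        raise FileNotFoundError("missing PLINK files: " + ", ".join(missing))
    return fileset
```

For instance, `find_plink_fileset("input", "raw-GWA-data")` would return the three paths only if `input/raw-GWA-data.bed`, `.bim` and `.fam` all exist.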

By default, a chi2 test for association is done, but you can run several different tests in one run by setting the appropriate parameters to 1. Note that at least one must be set to 1.

  • chi2: should a chi2 test be used? (0 or 1)
  • fisher: should Fisher's exact test be used?
  • linear: should linear regression be used?
  • logistic: should logistic regression be used?
  • gemma: should gemma be used?

Then, for all the tests except gemma, you can choose whether to adjust for multiple testing using Bonferroni correction:

  • adjust
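
For readers unfamiliar with the correction, the following is a minimal sketch of what a Bonferroni adjustment does to a list of raw p-values. It is not the pipeline's own code, and the function name `bonferroni` is chosen here for illustration.

```python
def bonferroni(p_values):
    """Multiply each raw p-value by the number of tests, capping at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Three hypothetical raw p-values, one per SNP tested
    print(bonferroni([0.01, 0.04, 0.5]))
```

In a genome-wide setting the number of tests is the number of SNPs analysed, so the adjusted values are far more conservative than the raw ones.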

For example

```nextflow run plink-assoc.nf --input_pat raw-GWA-data --chi2 1 --logistic 1 --adjust 1```

analyses the raw-GWA-data bed, bim and fam files, performs chi2 and logistic regression tests, and also applies multiple testing correction.
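
When the pipeline is launched from a wrapper script, the command line can be assembled programmatically. The sketch below uses only the options listed on this page and enforces the rule that at least one test is selected; `build_command` is a hypothetical helper, not something shipped with the pipeline.

```python
# Test switches described on this page, in the order they are listed
TESTS = ("chi2", "fisher", "linear", "logistic", "gemma")


def build_command(input_pat, tests, adjust=False, script="plink-assoc.nf"):
    """Return the nextflow command as an argument list (hypothetical helper)."""
    unknown = sorted(set(tests) - set(TESTS))
    if unknown:
        raise ValueError("unknown tests: " + ", ".join(unknown))
    if not tests:
        raise ValueError("at least one test must be set to 1")
    cmd = ["nextflow", "run", script, "--input_pat", input_pat]
    for name in TESTS:
        if name in tests:
            cmd += ["--" + name, "1"]
    if adjust:
        cmd += ["--adjust", "1"]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_command("raw-GWA-data", ["chi2", "logistic"], adjust=True)))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with unusual file names.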

#Output

The output can be found in the specified output directory. The key outputs are

  • A PDF report of the QC that was done;
  • A set of PLINK files

The PDF report will explain what was done and describe the other output files.