
Converting from Illumina Top Bottom

Scott.Hazelhurst edited this page May 26, 2017 · 3 revisions

This workflow is run by

nextflow run topbottom.nf

and converts an Illumina TOP/BOTTOM call file. Together with auxiliary input data, this file is first converted into a raw PLINK file. The PLINK file is then aligned to a single strand and finally converted into binary PLINK format. This process can take a very long time.

This workflow is also integrated as an option in the main plink-qc.nf workflow and can be invoked there with the right parameters. However, because it takes so long, we have factored it out. In our experience the QC workflow is usually run many times as you tweak parameters, but you will probably want to do the conversion only once. In principle this separation isn't necessary, because with the -resume flag Nextflow is very good at detecting which parts of the workflow need to be re-run. But it is easy to forget the flag, and then everything is recomputed.

So our recommended practice is to run the conversion separately, produce your PLINK files, and then run the plink-qc.nf workflow.

This process is expensive because:

  • the TOP/BOTTOM file is a very bulky and inefficient format
  • we first convert to PLINK using the inefficient lgen format
  • then we align to a single strand using the reference genome

Even for small data sets this can take a very long time (hours).
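To illustrate why the lgen route is bulky, here is a minimal sketch (not the code the workflow itself uses) that reshapes TOP-strand genotype calls into PLINK's long (.lgen) format, where every sample-by-SNP call becomes its own line. It assumes a tab-separated report whose header section has already been stripped, with columns named `SNP Name`, `Sample ID`, `Allele1 - Top` and `Allele2 - Top` as in a typical GenomeStudio export. It also assumes the family ID equals the sample ID. The function name `report_to_lgen` is hypothetical, and the strand-alignment step is not shown.

```python
import csv
import io
from typing import Iterable, List


def report_to_lgen(lines: Iterable[str], delimiter: str = "\t") -> List[str]:
    """Convert TOP-strand calls to PLINK .lgen lines (hypothetical helper).

    Assumes a header row with the columns 'SNP Name', 'Sample ID',
    'Allele1 - Top' and 'Allele2 - Top'.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    out = []
    for row in reader:
        sample = row["Sample ID"]
        a1 = row["Allele1 - Top"]
        a2 = row["Allele2 - Top"]
        # The report uses '-' for a no-call; PLINK uses '0' for a missing allele.
        a1 = "0" if a1 == "-" else a1
        a2 = "0" if a2 == "-" else a2
        # .lgen columns: FID IID SNP allele1 allele2 (FID assumed equal to IID).
        out.append(" ".join([sample, sample, row["SNP Name"], a1, a2]))
    return out


if __name__ == "__main__":
    report = io.StringIO(
        "SNP Name\tSample ID\tAllele1 - Top\tAllele2 - Top\n"
        "rs1\tS1\tA\tG\n"
        "rs2\tS1\t-\t-\n"
        "rs1\tS2\tA\tA\n"
    )
    for line in report_to_lgen(report):
        print(line)
```

Because the output has one line per sample per SNP, a study with N samples and M SNPs produces N × M lines before PLINK can build its binary files. This is part of why the conversion is slow.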

Input

You need the following inputs: