Converting from Illumina Top Bottom
This workflow is run by
nextflow run topbottom.nf
and converts from an Illumina TOP/BOTTOM call file. Together with auxiliary input data, this file is first converted into a raw PLINK file; the PLINK file is then aligned to a strand and converted into binary PLINK format. This process can take a very long time.
This workflow is also integrated as an option in the main plink-qc.nf
workflow and can be called with the right parameters. However, because the conversion takes so long, we have factored it out: in our experience the QC workflow is run many times as you tweak parameters, but you will probably only want to do the conversion once. In principle this separation isn't necessary, because with the -resume flag Nextflow is very good at detecting which parts of the workflow need to be re-run. But it is easy to forget the flag, and then everything is recomputed.
So our recommended behaviour is to run the conversion separately, produce your PLINK files, and then run the plink-qc.nf
workflow.
This process is expensive because:
- the top/bottom file is a very bulky and inefficient format
- we convert first to PLINK using the inefficient lgen format
- then we align to a single strand using the reference genome
Even for small data sets this can take a very long time (hours).
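To give a feel for the strand-alignment step, here is a minimal Python sketch. It is not the workflow's actual code. It assumes each SNP comes with its two alleles and the reference base at that position (looked up from the reference FASTA), and the name align_alleles is made up for this illustration. Palindromic SNPs (A/T and C/G) read the same on both strands, so the reference base alone cannot resolve them and the sketch returns None.

```python
# Hypothetical helper: decide whether a SNP's alleles need to be flipped
# (complemented) so that they are reported on the reference strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def align_alleles(alleles, ref_base):
    """Return the allele pair on the reference strand, or None if undecidable."""
    first, second = alleles
    flipped = (COMPLEMENT[first], COMPLEMENT[second])

    # A/T and C/G SNPs are their own complement: the reference base
    # matches both orientations, so it cannot tell us the strand.
    if set(flipped) == set(alleles):
        return None
    if ref_base in alleles:
        return alleles   # already on the reference strand
    if ref_base in flipped:
        return flipped   # reported on the opposite strand, so flip
    return None          # reference base matches neither orientation


print(align_alleles(("A", "G"), "G"))  # ('A', 'G')
print(align_alleles(("A", "G"), "C"))  # ('T', 'C')
print(align_alleles(("A", "T"), "A"))  # None
```

A check like this has to be made for every SNP on the array against the reference genome, which is part of why the step is slow.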
You require the following inputs:
- the actual call file from Illumina
- vcf_merge_table: Over the years SNPs have been renamed and merged. You need to provide this information. An example is: https://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/database/data/organism_data/RsMergeArch.bcp.gz
- reference: A FASTA file of the reference genome for the build of your array. For example, if you are using GRCh37 a good source might be: http://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.toplevel.fa
- dbsnp_all_vcf: a recent version of dbSNP in VCF format. You will want to download something like ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/All_20170403.vcf.gz
- manifest: the Illumina manifest file. Download from https://support.illumina.com/downloads.html. You'll want something like /data/aux/HumanOmni5-4-v1-0-D.csv
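As an illustration of how the vcf_merge_table information is used, the sketch below maps retired rs IDs to their current ones. It is a hedged example, not the workflow's own code. The function names are hypothetical. The column layout is an assumption about RsMergeArch.bcp.gz that you should check against your copy of the file: tab-separated, no header, retired ID in the first column and the ID it was merged into in the second, both without the rs prefix. The IDs in the usage lines are invented.

```python
import csv
import gzip


def load_merges(path, old_col=0, new_col=1):
    """Read a gzipped, tab-separated merge table into {retired_id: newer_id}.

    The default column positions are an assumption about the file layout.
    """
    merges = {}
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            merges["rs" + row[old_col]] = "rs" + row[new_col]
    return merges


def current_id(rs_id, merges):
    """Follow merge records until reaching an ID that has not been retired."""
    seen = set()
    # An ID may have been merged more than once, so follow the chain;
    # the 'seen' set guards against a malformed table containing a loop.
    while rs_id in merges and rs_id not in seen:
        seen.add(rs_id)
        rs_id = merges[rs_id]
    return rs_id


# Invented IDs for illustration: rs30 was merged into rs20, then rs20 into rs10.
example = {"rs30": "rs20", "rs20": "rs10"}
print(current_id("rs30", example))  # rs10
print(current_id("rs5", example))   # rs5 (never merged, returned unchanged)
```

Holding the table in a dictionary keeps each lookup cheap, at the cost of reading the whole file into memory first.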