Scripts and workflows for analyzing UK Biobank data on the DNAnexus Research Analysis Platform (RAP)
Most will be written in bash and will use the dx toolkit commands. Unless stated otherwise, these scripts are meant to be executed on your local machine.
- You will need the dx toolkit installed on your local workstation: https://documentation.dnanexus.com/downloads
- Must have a RAP project with data dispensed
- Must be logged in using dx login
- Must have your RAP project selected using dx select
- Must create a folder in your RAP project called /data/ for storing data.
- I keep the phenotype and covariate text files in a separate folder, /gwas_cohort_textfiles/, so that I can delete all my working files in /data/ and recreate them later without losing those inputs.
- Must have already created your own phenotype file and, optionally, a covariate file (more on those files below).
- I also have a /scripts folder in my UKB RAP project for storing the scripts that I choose to execute within the dx instance.
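The prerequisites above can be scripted roughly as follows. This is a minimal sketch, assuming the dx toolkit is installed and you have already run dx login; the helper names rap_setup and upload_cohort_files and the project name are hypothetical placeholders, not part of this repository.

```bash
#!/usr/bin/env bash
# Minimal sketch of the RAP project setup described above.
# Assumes the dx toolkit is installed and `dx login` has already been run.
# rap_setup and upload_cohort_files are hypothetical helper names.

# Select the project and create the folder layout used in this README.
rap_setup() {
  local project="$1"            # project name or ID (placeholder)
  dx select "$project"          # make it the current project for later dx calls
  # -p: create parents as needed and do not error if a folder already exists
  dx mkdir -p /data/ /gwas_cohort_textfiles/ /scripts/
}

# Upload locally prepared phenotype/covariate files to their own folder,
# kept apart from /data/ so working files can be deleted and rebuilt.
upload_cohort_files() {
  dx upload "$@" --path /gwas_cohort_textfiles/
}

# Example usage (uncomment and edit):
# rap_setup "my_ukb_project"
# upload_cohort_files pheno.txt covar.txt
```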
The phenotype file should be a tab- or space-delimited text file with at least 3 columns: FID, IID, and one or more phenotypes. Missing values should be coded as "-9" for plink and "NA" for regenie.
FID IID pheno1 pheno2 pheno3
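Before uploading, a quick local check of this layout can catch delimiter problems early. The sketch below encodes an assumption about what "well formed" means (header starts with FID IID, at least 3 columns, every row the same width); check_pheno_file is a hypothetical helper, and it does not validate the phenotype values themselves.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check a phenotype (or covariate) file before using it.
# Assumed checks:
#   1. the header starts with FID and IID
#   2. there are at least 3 columns
#   3. every row has the same number of columns as the header
# Returns non-zero and prints reasons to stderr if any check fails.
check_pheno_file() {
  local file="$1"
  # awk's default field splitting treats runs of spaces and/or tabs as one separator
  awk '
    NR == 1 {
      ncol = NF
      if ($1 != "FID" || $2 != "IID") { print "header must start with FID IID" > "/dev/stderr"; bad = 1 }
      if (NF < 3) { print "need at least 3 columns" > "/dev/stderr"; bad = 1 }
      next
    }
    NF != ncol { printf "line %d has %d columns, expected %d\n", NR, NF, ncol > "/dev/stderr"; bad = 1 }
    END { exit (bad ? 1 : 0) }
  ' "$file"
}

# Example usage:
# check_pheno_file pheno.txt && echo "looks OK"
```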
The covariate file looks similar, again with missing values coded as "-9" for plink and "NA" for regenie.
FID IID Sex Age BMI pca1 pca2 pca3 ... pca10
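If you prepare one file for plink and then need a regenie version, the "-9" codes can be swapped for "NA". A minimal sketch, assuming an exact string match on "-9" is what you want; plink_to_regenie_missing is a hypothetical name. It leaves the header and the FID and IID columns untouched and writes tab-delimited output.

```bash
#!/usr/bin/env bash
# Sketch: convert a plink-style file (missing = -9) to a regenie-style file (missing = NA).
# Only data rows (not the header) and only columns 3 onward are rewritten,
# so FID/IID values are never changed. Input may be tab- or space-delimited;
# output is tab-delimited.
plink_to_regenie_missing() {
  local in="$1" out="$2"
  awk 'BEGIN { OFS = "\t" }
       NR > 1 { for (i = 3; i <= NF; i++) if ($i == "-9") $i = "NA" }
       { $1 = $1; print }   # reassigning $1 rebuilds every line with tab separators
  ' "$in" > "$out"
}

# Example usage:
# plink_to_regenie_missing covar_plink.txt covar_regenie.txt
```

Because the match is on the literal string "-9", a covariate that legitimately takes the value -9 would also be converted, so check your columns first.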
In both files, FID and IID are copies of the EID column from the UK Biobank.
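Because FID and IID are just copies of the EID, they can be generated from an exported table. This sketch assumes, hypothetically, a tab-delimited file with a header whose first column holds the EID; eid_to_fid_iid is an illustrative name. Empty cells are left empty, so you would still need to code missing values as "-9" or "NA" as described above.

```bash
#!/usr/bin/env bash
# Sketch: replace the first (EID) column with two copies of it, named FID and IID.
# Assumes a tab-delimited input with a header row and the EID in column 1.
eid_to_fid_iid() {
  local in="$1" out="$2"
  awk 'BEGIN { FS = OFS = "\t" }
       NR == 1 { $1 = "FID" OFS "IID"; print; next }   # rename the header column
       { $1 = $1 OFS $1; print }                       # duplicate the EID value
  ' "$in" > "$out"
}

# Example usage:
# eid_to_fid_iid exported_table.tsv pheno.txt
```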