Configuration
- To run any analysis, you need to set up the configuration file, LoF.config. Below I will walk you through the setup of the configuration file and break it down for you.
- You can use the #-sign to comment out lines in the configuration file; commented lines are not read by the bash-based scripts.
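The #-sign behaviour follows from bash itself: if a script reads the configuration by sourcing it (an assumption; the loading code is not shown here), every line starting with # is an ordinary bash comment. A minimal sketch with a throwaway file standing in for LoF.config:

```bash
#!/usr/bin/env bash
# Sketch (assumption): a bash script reading a LoF.config-style file by sourcing it.
# Lines starting with '#' are ordinary bash comments, so they are never executed.
set -eu

config_file="$(mktemp)"   # hypothetical stand-in for LoF.config
cat > "$config_file" <<'EOF'
### CONFIGURATION FILE FOR LoF TOOLKIT ###
# ASSEMBLY="GRCh38"    <- commented out, so this line is ignored
ASSEMBLY="GRCh37"
EOF

. "$config_file"          # same as: source "$config_file"
rm -f "$config_file"
echo "ASSEMBLY=${ASSEMBLY}"   # prints: ASSEMBLY=GRCh37
```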
You will need to set the paths to all supporting tools on the server, for instance:
LOFTOOLKIT=/path/to/LOFTK
PERL=/path/to/perl
VEP=/path/to/ensembl-vep/vep
ENSEMBL=/path/to/ensembl-vep
LOFTEE=/path/to/loftee
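A wrong path usually only shows up once a job fails. As a hedged sketch, a hypothetical pre-flight function (not part of LoFTK) could check these five variables after the configuration has been loaded, and also warn about a trailing '/', which the configuration file asks you to avoid:

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of LoFTK): verify that every tool path
# set in LoF.config exists, and warn about a trailing '/'.
check_tool_paths() {
  local var path missing=0
  for var in LOFTOOLKIT PERL VEP ENSEMBL LOFTEE; do
    path="${!var:-}"                 # indirect expansion: value of the variable named by $var
    if [ -z "$path" ]; then
      echo "ERROR: $var is not set" >&2
      missing=$((missing + 1))
    elif [ ! -e "$path" ]; then
      echo "ERROR: $var=$path does not exist" >&2
      missing=$((missing + 1))
    fi
    case "$path" in
      */) echo "WARNING: $var ends with '/'" >&2 ;;
    esac
  done
  return "$missing"                  # 0 means every path was found
}
```

Call it right after sourcing your configuration, e.g. `check_tool_paths || exit 1`.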
Since we are using the SLURM workload manager, we need to provide time and memory settings for the various steps in the process. For instance, to predict LoF variants/genes we split the analyses per chromosome, which works for us with n = 2,000 samples. You will probably need to increase these values if your sample is larger. In our case, the time limit (QUEUE_*, in hh:mm:ss) and memory (VMEM_*) for the analysis of one chromosome are as follows:
## Convert allele probability files (IMPUTE2 output) to VCF.
QUEUE_PROB2VCF_CONFIG="05:00:00"
VMEM_PROB2VCF_CONFIG="40G"
## LoF annotation (VEP, LOFTEE).
QUEUE_ANNOTATION_CONFIG="10:00:00"
VMEM_ANNOTATION_CONFIG="90G"
## Calculation of LoF genes.
QUEUE_LOF_GENE_CONFIG="15:00:00"
VMEM_LOF_GENE_CONFIG="90G"
## Calculation of LoF variants.
QUEUE_LOF_SNP_CONFIG="15:00:00"
VMEM_LOF_SNP_CONFIG="90G"
## Statistical output
QUEUE_STAT_DESC_CONFIG="02:00:00"
VMEM_STAT_DESC_CONFIG="90G"
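Each QUEUE_*/VMEM_* pair reads like a SLURM wall-clock limit plus a memory request. As a sketch, assuming each pair ends up in sbatch's standard `--time` and `--mem` options (how LoFTK actually submits its jobs is not shown here), a hypothetical helper `slurm_resources` could build those options from a step name:

```bash
#!/usr/bin/env bash
# Sketch (assumption): turn a QUEUE_*/VMEM_* pair from LoF.config into sbatch options.
# slurm_resources is a hypothetical helper, not part of LoFTK.
QUEUE_ANNOTATION_CONFIG="10:00:00"
VMEM_ANNOTATION_CONFIG="90G"

slurm_resources() {
  local step="$1"                       # e.g. ANNOTATION, LOF_GENE, STAT_DESC
  local time_var="QUEUE_${step}_CONFIG"
  local mem_var="VMEM_${step}_CONFIG"
  # ${!name} expands to the value of the variable whose name is stored in $name
  printf '%s\n' "--time=${!time_var} --mem=${!mem_var}"
}

slurm_resources ANNOTATION   # prints: --time=10:00:00 --mem=90G
```

A submission could then look like `sbatch $(slurm_resources ANNOTATION) annotate_chr1.sh`, where `annotate_chr1.sh` is a hypothetical job script; the command substitution is left unquoted on purpose so it splits into two options.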
You can be notified by e-mail when (sub-)analysis steps begin or end, or, worse, are aborted. You can set this as follows:
# REQUIRED: mailing settings
# your e-mail address; you'll get an e-mail when the job has ended or when it was aborted
# 'BEGIN' Mail is sent at the beginning of the job;
# 'END' Mail is sent at the end of the job;
# 'FAIL' Mail is sent when the job fails;
# 'REQUEUE' Mail is sent when the job is requeued;
# 'ALL' equivalent to BEGIN, END, FAIL, REQUEUE, and STAGE_OUT;
# 'NONE' No mail is sent.
YOUREMAIL="your_email"
MAILSETTINGS="END,FAIL"
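These two variables correspond naturally to sbatch's `--mail-user` and `--mail-type` options. A hedged sketch, assuming that mapping, with a hypothetical `mail_args` helper that also rejects types other than the ones listed in the comments above:

```bash
#!/usr/bin/env bash
# Sketch (assumption): validate MAILSETTINGS and build sbatch mail options.
# mail_args is hypothetical; only the types listed in LoF.config are accepted.
YOUREMAIL="your_email"
MAILSETTINGS="END,FAIL"

mail_args() {
  local type
  local IFS=','                          # split MAILSETTINGS on commas
  for type in $MAILSETTINGS; do
    case "$type" in
      BEGIN|END|FAIL|REQUEUE|ALL|NONE) ;;
      *) echo "Unsupported mail type: $type" >&2; return 1 ;;
    esac
  done
  printf '%s\n' "--mail-type=${MAILSETTINGS} --mail-user=${YOUREMAIL}"
}

mail_args   # prints: --mail-type=END,FAIL --mail-user=your_email
```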
The top part of your configuration file should look like the one below.
### CONFIGURATION FILE FOR LoF TOOLKIT ###
# Precede your comments with a #-sign.
# Set the directory variables, the order doesn't matter.
# Don't end the directory variables with '/' (forward-slash)!
### --- SYSTEM SETTINGS --- ###
# REQUIRED: Path to where LoFToolKit resides on the server.
LOFTOOLKIT=/path/to/LOFTK
# REQUIRED: Paths to the support programs on the server.
PERL=/path/to/perl
VEP=/path/to/ensembl-vep/vep
ENSEMBL=/path/to/ensembl-vep
LOFTEE=/path/to/loftee
### --- SLURM SETTINGS --- ###
## Convert allele probability files (IMPUTE2 output) to VCF.
QUEUE_PROB2VCF_CONFIG="05:00:00"
VMEM_PROB2VCF_CONFIG="40G"
## LoF annotation (VEP, LOFTEE).
QUEUE_ANNOTATION_CONFIG="10:00:00"
VMEM_ANNOTATION_CONFIG="90G"
## Calculation of LoF genes.
QUEUE_LOF_GENE_CONFIG="15:00:00"
VMEM_LOF_GENE_CONFIG="90G"
## Calculation of LoF variants.
QUEUE_LOF_SNP_CONFIG="15:00:00"
VMEM_LOF_SNP_CONFIG="90G"
## Statistical output
QUEUE_STAT_DESC_CONFIG="02:00:00"
VMEM_STAT_DESC_CONFIG="90G"
# REQUIRED: mailing settings
# your e-mail address; you'll get an e-mail when the job has ended or when it was aborted
# 'BEGIN' Mail is sent at the beginning of the job;
# 'END' Mail is sent at the end of the job;
# 'FAIL' Mail is sent when the job fails;
# 'REQUEUE' Mail is sent when the job is requeued;
# 'ALL' equivalent to BEGIN, END, FAIL, REQUEUE, and STAGE_OUT;
# 'NONE' No mail is sent.
YOUREMAIL="your_email"
MAILSETTINGS="END,FAIL"
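Because the configuration file is plain bash, `bash -n` can parse it without running it. This catches mistakes such as a value whose closing quote is missing before any job is submitted. A small wrapper (the name `check_config_syntax` is hypothetical) might look like this; note it checks syntax only, not whether paths or values are correct:

```bash
#!/usr/bin/env bash
# `bash -n` reads and parses a file without executing it and exits non-zero on
# a syntax error, e.g. an unterminated "..." string.
check_config_syntax() {
  if bash -n "$1"; then
    echo "$1: syntax OK"
  else
    echo "$1: syntax error, check quotes and brackets" >&2
    return 1
  fi
}

# Usage on your own file:  check_config_syntax /path/to/LoF.config
```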
You have probably organized your work in folders; you can set these here. Set a ROOTDIR and provide a PROJECTNAME. These two variables are used to create two new folders in the ROOTDIR: [PROJECTNAME]_Files_for_VCF_LoF and [PROJECTNAME]_LoF_output.
You can add this to the configuration file:
### --- ANALYSIS SETTINGS --- ###
# REQUIRED: Path to the directory where the main analysis directory resides.
ROOTDIR=/path/to/your_input_data
PROJECTNAME="project_name"
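The folder layout described above can be sketched as follows. LoFTK creates these folders itself; `make_project_dirs`, `VCF_DIR` and `OUT_DIR` are hypothetical names used only to show how the two paths are composed from ROOTDIR and PROJECTNAME:

```bash
#!/usr/bin/env bash
# Sketch (assumption): build and create the two project folders inside ROOTDIR.
make_project_dirs() {
  local root="${ROOTDIR%/}"   # drop one accidental trailing '/'
  VCF_DIR="${root}/${PROJECTNAME}_Files_for_VCF_LoF"
  OUT_DIR="${root}/${PROJECTNAME}_LoF_output"
  mkdir -p "$VCF_DIR" "$OUT_DIR"
}

# Example: ROOTDIR=/path/to/your_input_data PROJECTNAME="project_name"
# gives /path/to/your_input_data/project_name_Files_for_VCF_LoF
#   and /path/to/your_input_data/project_name_LoF_output
```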
There are some specific settings that depend on the type of analysis you will run:
- Data type
- Input file format
  - If you set FILE_FORMAT to IMPUTE2, set the INFO score cutoff and the probability cutoff.
  - If you set FILE_FORMAT to VCF, state whether the data are phased [yes/no].
- Assembly version
- Chromosome range
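The probability cutoff PROB (0.05 in the example configuration) rounds probabilities near 0 or 1 and sets everything in between to missing. A minimal sketch of that rule for a single probability, assuming inclusive boundaries (the configuration comment does not say whether exactly 0.05 and 0.95 are included); `hard_call` is a hypothetical helper, not LoFTK code:

```bash
#!/usr/bin/env bash
# Sketch (assumption) of the PROB rule: p <= PROB -> 0, p >= 1 - PROB -> 1,
# anything in between -> missing (NA). awk handles the floating-point comparison.
PROB=0.05

hard_call() {
  awk -v p="$1" -v cut="$PROB" 'BEGIN {
    if (p <= cut)          print 0
    else if (p >= 1 - cut) print 1
    else                   print "NA"
  }'
}

hard_call 0.03   # prints: 0
hard_call 0.97   # prints: 1
hard_call 0.50   # prints: NA
```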
# REQUIRED: Set data type and input file format:
# Set data type, choose one of these options [genotype/exome]
DATA_TYPE="genotype"
# Set input file format, choose one of these options [IMPUTE2/VCF]. IMPUTE2 input must include the required files (hap|allele_probe|info|sample)
FILE_FORMAT="VCF"
# If you set FILE_FORMAT to IMPUTE2, set the INFO score cutoff and the probability cutoff
INFO=0.8
PROB=0.05 # e.g. 0.05 rounds 0.05 down to 0 and 0.95 up to 1; probabilities in between are set to missing
# If you set FILE_FORMAT to VCF, is the data phased? [yes/no]
PHASE_STATUS="yes"
# REQUIRED: Select the assembly version, choose one of these options [GRCh37/GRCh38]
ASSEMBLY="GRCh37"
# REQUIRED: Set the chromosome range, e.g. CHROMOSOMES="$(seq 1 22)"
CHROMOSOMES="$(seq 1 22)"
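A typo in one of these options may only surface late in a run. As a hedged sketch, a hypothetical `validate_analysis_settings` (not part of LoFTK) accepts only the documented values, and the loop after it shows how a per-chromosome script might iterate over CHROMOSOMES, assuming the variable is expanded unquoted so that `$(seq 1 22)` splits into one chromosome per iteration:

```bash
#!/usr/bin/env bash
# Hypothetical check (not part of LoFTK): accept only the documented options.
DATA_TYPE="genotype"
FILE_FORMAT="VCF"
PHASE_STATUS="yes"
ASSEMBLY="GRCh37"
CHROMOSOMES="$(seq 1 22)"

validate_analysis_settings() {
  case "$DATA_TYPE" in
    genotype|exome) ;;
    *) echo "DATA_TYPE must be genotype or exome" >&2; return 1 ;;
  esac
  case "$FILE_FORMAT" in
    IMPUTE2) ;;
    VCF)
      case "$PHASE_STATUS" in
        yes|no) ;;
        *) echo "PHASE_STATUS must be yes or no" >&2; return 1 ;;
      esac
      ;;
    *) echo "FILE_FORMAT must be IMPUTE2 or VCF" >&2; return 1 ;;
  esac
  case "$ASSEMBLY" in
    GRCh37|GRCh38) ;;
    *) echo "ASSEMBLY must be GRCh37 or GRCh38" >&2; return 1 ;;
  esac
}

validate_analysis_settings || exit 1

# Unquoted on purpose: the newline-separated output of seq splits into 1 ... 22.
for CHR in $CHROMOSOMES; do
  echo "would process chromosome ${CHR}"
done
```

To test on a subset first, a smaller range such as `CHROMOSOMES="$(seq 21 22)"` works the same way.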
CC-BY-SA-4.0 License