Configuration
- To run any analysis, you need to set up the configuration file, LoF.config. Below I will take you through the setup of the configuration file and break it down for you.
- You can use the #-sign to comment out lines in the configuration file; these lines will not be read by the bash-based scripts.
You will need to set the paths to all supporting tools on the server, for instance:
LOFTOOLKIT=/path/to/LOFTK
PERL=/path/to/perl
VEP=/path/to/ensembl-vep/vep
ENSEMBL=/path/to/ensembl-vep
LOFTEE=/path/to/loftee
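Before launching an analysis, it can help to confirm that every path in the configuration actually exists. The snippet below is a minimal sketch of such a check; the function name check_paths is hypothetical and not part of LoFTK, and the demonstration values are stand-ins for what LoF.config would provide.

```shell
# Hypothetical helper (not part of LoFTK): check that each named variable
# is set and points to an existing file or directory.
check_paths() {
  problems=0
  for var_name in "$@"; do
    # Look up the value of the variable whose name is stored in $var_name.
    eval "var_value=\${$var_name:-}"
    if [ -z "$var_value" ] || [ ! -e "$var_value" ]; then
      echo "Check ${var_name}: '${var_value}' is unset or does not exist" >&2
      problems=$((problems + 1))
    fi
  done
  # Succeed only when no problems were found.
  [ "$problems" -eq 0 ]
}

# Stand-in values for demonstration; in practice these come from LoF.config.
LOFTOOLKIT="$PWD"
PERL="/path/to/perl"
check_paths LOFTOOLKIT PERL || echo "Fix the paths in LoF.config before running."
```

In a real setup you would pass all five variable names (LOFTOOLKIT, PERL, VEP, ENSEMBL, LOFTEE) after sourcing the configuration file.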
Since we are using the SLURM system, we need to provide memory and time settings for the various steps in the process. For instance, we predicted LoF variants/genes by splitting the analyses per chromosome, which works for us with ~2,000 samples. You will probably have to edit these settings if your sample is larger. In our case, the time and memory for the analysis of one chromosome are as follows:
## Convert allele probes files (IMPUTE2 output) to VCF.
QUEUE_PROB2VCF_CONFIG="05:00:00"
VMEM_PROB2VCF_CONFIG="40G"
## LoF annotation (VEP, LOFTEE).
QUEUE_ANNOTATION_CONFIG="10:00:00"
VMEM_ANNOTATION_CONFIG="90G"
## Calculation of LoF genes.
QUEUE_LOF_GENE_CONFIG="15:00:00"
VMEM_LOF_GENE_CONFIG="90G"
## Calculation of LoF variants.
QUEUE_LOF_SNP_CONFIG="15:00:00"
VMEM_LOF_SNP_CONFIG="90G"
## Statistical output
QUEUE_STAT_DESC_CONFIG="02:00:00"
VMEM_STAT_DESC_CONFIG="90G"
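To make the meaning of these values concrete: SLURM's sbatch command accepts a wall-time limit via --time (here in HH:MM:SS form) and a memory request via --mem. The sketch below shows how one pair of settings could be turned into those options. The helper sbatch_resource_args and the script name annotate_chr1.sh are hypothetical, and LoFTK's own scripts may pass the values differently.

```shell
# Values as they would appear in LoF.config.
QUEUE_ANNOTATION_CONFIG="10:00:00"
VMEM_ANNOTATION_CONFIG="90G"

# Hypothetical helper: format a time limit and a memory request
# as sbatch options.
sbatch_resource_args() {
  printf '%s %s' "--time=$1" "--mem=$2"
}

# Print the command instead of submitting it, so the sketch is safe to run
# on a machine without SLURM. annotate_chr1.sh is a made-up script name.
echo "sbatch $(sbatch_resource_args "$QUEUE_ANNOTATION_CONFIG" "$VMEM_ANNOTATION_CONFIG") annotate_chr1.sh"
```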
You can be notified when (sub)-analytical steps are beginning or ending, or worse, are aborted. You can set this as follows:
# REQUIRED: mailing settings
# your e-mail address; you'll get an email when the job has ended or when it was aborted
# 'BEGIN' Mail is sent at the beginning of the job;
# 'END' Mail is sent at the end of the job;
# 'FAIL' Mail is sent when the job is aborted or rescheduled.
# 'REQUEUE' Mail is sent when the job is suspended;
# 'ALL' equivalent to BEGIN, END, FAIL, REQUEUE, and STAGE_OUT;
# 'NONE' No mail is sent.
YOUREMAIL="your_email"
MAILSETTINGS="END,FAIL"
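A typo in MAILSETTINGS is easy to miss until a job is submitted. The following sketch checks a comma-separated value against the mail types listed above. The function valid_mail_settings is a hypothetical helper, not something LoFTK provides, and it only knows the types named in this section.

```shell
# Hypothetical helper: succeed only if every comma-separated entry
# is one of the mail types described in this section.
valid_mail_settings() {
  # An empty value is treated as invalid.
  [ -n "$1" ] || return 1
  for mail_type in $(printf '%s' "$1" | tr ',' ' '); do
    case "$mail_type" in
      BEGIN|END|FAIL|REQUEUE|ALL|NONE) ;;
      *) echo "Unknown mail type: $mail_type" >&2; return 1 ;;
    esac
  done
  return 0
}

MAILSETTINGS="END,FAIL"
if valid_mail_settings "$MAILSETTINGS"; then
  echo "MAILSETTINGS looks fine: $MAILSETTINGS"
fi
```

On SLURM, values like these correspond to the sbatch options --mail-user and --mail-type.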
The top part of your configuration file should look like the one below.
### CONFIGURATION FILE FOR LoF TOOLKIT ###
# Precede your comments with a #-sign.
# Set the directory variables, the order doesn't matter.
# Don't end the directory variables with '/' (forward-slash)!
### --- SYSTEM SETTINGS --- ###
# REQUIRED: Path_to where LoFToolKit resides on the server.
LOFTOOLKIT=/path/to/LOFTK
# REQUIRED: Path_to support programs on the server.
PERL=/path/to/perl
VEP=/path/to/ensembl-vep/vep
ENSEMBL=/path/to/ensembl-vep
LOFTEE=/path/to/loftee
### --- SLURM SETTINGS --- ###
## Convert allele probes files (IMPUTE2 output) to VCF.
QUEUE_PROB2VCF_CONFIG="05:00:00"
VMEM_PROB2VCF_CONFIG="40G"
## LoF annotation (VEP, LOFTEE).
QUEUE_ANNOTATION_CONFIG="10:00:00"
VMEM_ANNOTATION_CONFIG="90G"
## Calculation of LoF genes.
QUEUE_LOF_GENE_CONFIG="15:00:00"
VMEM_LOF_GENE_CONFIG="90G"
## Calculation of LoF variants.
QUEUE_LOF_SNP_CONFIG="15:00:00"
VMEM_LOF_SNP_CONFIG="90G"
## Statistical output
QUEUE_STAT_DESC_CONFIG="02:00:00"
VMEM_STAT_DESC_CONFIG="90G"
# REQUIRED: mailing settings
# your e-mail address; you'll get an email when the job has ended or when it was aborted
# 'BEGIN' Mail is sent at the beginning of the job;
# 'END' Mail is sent at the end of the job;
# 'FAIL' Mail is sent when the job is aborted or rescheduled.
# 'REQUEUE' Mail is sent when the job is suspended;
# 'ALL' equivalent to BEGIN, END, FAIL, REQUEUE, and STAGE_OUT;
# 'NONE' No mail is sent.
YOUREMAIL="your_email"
MAILSETTINGS="END,FAIL"
You have probably organized your work in folders; here you can set these. You should set a ROOTDIR and provide a PROJECTNAME. These two variables are used to create two new folders in the ROOTDIR: [PROJECTNAME]_Files_for_VCF_LoF and [PROJECTNAME]_LoF_output.
You can add this to the configuration file:
### --- ANALYSIS SETTINGS --- ###
# REQUIRED: Path_to a directory where the main analysis directory resides.
ROOTDIR=/path/to/your_input_data
PROJECTNAME="project_name"
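The resulting folder layout can be illustrated as follows. This is only a sketch of the naming scheme described above: LoFTK creates these folders itself, and the temporary directory here is a stand-in for your real ROOTDIR. It also shows why ROOTDIR should not end with a forward slash, since the paths are joined with one.

```shell
# Stand-in for the real ROOTDIR: a throwaway temporary directory.
ROOTDIR="$(mktemp -d)"
PROJECTNAME="project_name"

# The two folder names follow the pattern described in the text.
mkdir -p "${ROOTDIR}/${PROJECTNAME}_Files_for_VCF_LoF" \
         "${ROOTDIR}/${PROJECTNAME}_LoF_output"

# List the result to show the layout.
ls "$ROOTDIR"
```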
There are some specific settings that depend on the type of analysis you will run:
- Data type
  Choose the type of your input data in DATA_TYPE: genotype, exome, or genome.
- Input file format
  Only two file formats are accepted to run LoFTK:
  - IMPUTE2 output format
    ❗ If you set FILE_FORMAT to IMPUTE2, please set the INFO score cutoff (INFO) and the probability cutoff (PROB).
  - VCF
    ❗ If you set FILE_FORMAT to VCF, indicate in PHASE_STATUS whether your input data have been phased: answer with yes or no.
- Select the assembly version
  LoFTK supports both Homo sapiens (human) genome assemblies, GRCh37 and GRCh38. You have to choose one of them.
- Set chromosomes range
  We highly recommend splitting the data per chromosome, or even into chunks per chromosome. Here, you need to set the range of chromosomes. For instance, to analyze chromosomes 1 to 22, set CHROMOSOMES="$(seq 1 22)"; to analyze specific chromosomes (not a range), such as chr 1, 4, 7 and 22, set CHROMOSOMES="1 4 7 22".
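Both forms of CHROMOSOMES end up as a whitespace-separated list that a shell loop can walk through. The sketch below illustrates this, assuming the scripts iterate over the value in a for loop. The helper count_chromosomes and the chr prefix in the output are made up for this example.

```shell
# Hypothetical helper: count the chromosomes in a CHROMOSOMES value.
count_chromosomes() {
  n=0
  # $1 is left unquoted on purpose so the shell splits it on whitespace
  # (seq separates its numbers with newlines, the manual list with spaces).
  for chr in $1; do
    n=$((n + 1))
  done
  echo "$n"
}

# A contiguous range, generated by seq.
CHROMOSOMES="$(seq 1 22)"
echo "range: $(count_chromosomes "$CHROMOSOMES") chromosomes"

# A hand-picked selection.
CHROMOSOMES="1 4 7 22"
for chr in $CHROMOSOMES; do
  echo "would process chr${chr}"
done
```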
# REQUIRED: Set data type and input file format:
# Set data type, choose one of these options [genotype/exome/genome]
DATA_TYPE="genotype"
# Set input file format, choose one of these options [IMPUTE2/VCF]
# IMPUTE2 input must include the required files (hap|allele_probe|info|sample)
FILE_FORMAT="VCF"
# If you set FILE_FORMAT to IMPUTE2, please set the INFO score cutoff (default: 0.4) and Probability cutoff (default: 0.05)
INFO=0.8
PROB=0.05
# If you set FILE_FORMAT to VCF, have your input data been phased? [yes/no]
PHASE_STATUS="yes"
# REQUIRED: Select the assembly version, choose one of these options [GRCh37/GRCh38]
ASSEMBLY="GRCh37"
# REQUIRED: Set chromosomes range, e.g. CHROMOSOMES="$(seq 1 22)"
CHROMOSOMES="$(seq 1 22)"
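Because several of these settings accept only a fixed set of values, a quick check before submitting jobs can catch typos early. The following is a minimal sketch under the assumption that the values are exactly those listed in the comments above. The function check_choice is hypothetical and not part of LoFTK.

```shell
# Example values, as in the configuration snippet above.
DATA_TYPE="genotype"
FILE_FORMAT="VCF"
PHASE_STATUS="yes"
ASSEMBLY="GRCh37"

# Hypothetical helper: check_choice NAME VALUE ALLOWED...
# Succeeds if VALUE equals one of the ALLOWED options.
check_choice() {
  setting_name=$1
  setting_value=$2
  shift 2
  for allowed in "$@"; do
    [ "$setting_value" = "$allowed" ] && return 0
  done
  echo "${setting_name}='${setting_value}' must be one of: $*" >&2
  return 1
}

check_choice DATA_TYPE "$DATA_TYPE" genotype exome genome
check_choice FILE_FORMAT "$FILE_FORMAT" IMPUTE2 VCF
check_choice ASSEMBLY "$ASSEMBLY" GRCh37 GRCh38

# Settings that only matter for one of the two file formats.
case "$FILE_FORMAT" in
  IMPUTE2)
    if [ -z "${INFO:-}" ] || [ -z "${PROB:-}" ]; then
      echo "Set INFO and PROB when FILE_FORMAT is IMPUTE2" >&2
    fi
    ;;
  VCF)
    check_choice PHASE_STATUS "${PHASE_STATUS:-}" yes no
    ;;
esac
```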
CC-BY-SA-4.0 License