IFRisk is a tool for calculating risk scores based on functional genetic variation, for example gene expression risk scores. This tool was primarily designed to calculate risk scores based on the output of FUSION, which infers functional changes based on previously derived SNP-weights.
As the FUSION authors put it, 'FUSION is a suite of tools for performing a transcriptome-wide (or any other ome-wide) association study by predicting functional/molecular phenotypes into GWAS using only summary statistics.' More information can be found here.
- R and the required packages:
install.packages(c('data.table','optparse'))
- Perform TWAS using FUSION:
  - Instructions on how to perform a TWAS are available here.
- Feature (e.g. gene expression) predictions in the target sample:
  - Instructions on how to impute gene expression levels are here.
The output of FUSION.assoc_test.R, or a file containing the following columns: FILE, P0, P1, TWAS.Z, TWAS.P. Per-chromosome files should be combined into a single file. An example is available here. The file can be whitespace or comma delimited. If the file name ends in .gz, the file will be assumed to be gzipped.
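As a quick pre-flight check, the input requirements above can be sketched in base R. This is only an illustration: `read_delim_auto` and `missing_twas_cols` are hypothetical helper names, not part of IFRisk, and the delimiter detection (a comma in the header line means comma delimited) is an assumption.

```r
# Columns the README lists as required in the --twas_results file.
required_cols <- c("FILE", "P0", "P1", "TWAS.Z", "TWAS.P")

# Hypothetical helper: read a whitespace- or comma-delimited file, gzipped or not.
read_delim_auto <- function(path) {
  # gzfile() can also read uncompressed files, so one code path covers both cases.
  con <- gzfile(path, "rt")
  first_line <- readLines(con, n = 1)
  close(con)
  # Assumption: a comma in the header line means the file is comma delimited.
  sep <- if (grepl(",", first_line, fixed = TRUE)) "," else ""
  read.table(gzfile(path), header = TRUE, sep = sep,
             stringsAsFactors = FALSE, check.names = FALSE)
}

# Hypothetical helper: names of required columns that are absent from the table.
missing_twas_cols <- function(df) {
  setdiff(required_cols, names(df))
}
```

Calling `missing_twas_cols(read_delim_auto(path))` on a candidate file should return an empty character vector when every required column is present.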
A file containing feature predictions in the target sample. This is the output of the FeaturePred script. The first two columns are FID and IID, then each column contains feature predictions for each individual. An example is available here. The gene expression column names must match the values in the FILE column in the --twas_results file. IFRisk ignores the substring before the last '/' and the '.wgt.RDat' string when matching. For example, the column name for the gene expression corresponding to the first value of the example TWAS results should be 'CMC.LOC643837'. The file can be whitespace or comma delimited. If the file name ends in .gz, the file will be assumed to be gzipped.
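The matching rule described above can be sketched in a few lines of R. `feature_id` is a hypothetical helper name and the directory in the example path is illustrative only; IFRisk's own implementation may differ in detail.

```r
# Hypothetical helper: reduce a FILE value to the expected column name.
feature_id <- function(file_col) {
  # Drop everything up to and including the last '/' (the regex is greedy).
  no_dir <- sub(".*/", "", file_col)
  # Drop the literal '.wgt.RDat' string.
  sub(".wgt.RDat", "", no_dir, fixed = TRUE)
}

# Illustrative path; the directory name here is an assumption.
feature_id("CMC.BRAIN.RNASEQ/CMC.LOC643837.wgt.RDat")
```

Comparing `feature_id(twas$FILE)` against the column names of the feature prediction file is one way to see which features would fail to match before running IFRisk.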
R-squared threshold for clumping genes. Clumping will retain the most significant feature within each region.
Default value = 0.9
Window size, in bases, for deriving pruning blocks.
Default value = 5e6
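To make the roles of the R-squared threshold and the window concrete, here is a minimal sketch of a greedy clumping procedure. It is an assumed algorithm, not code taken from IFRisk: `clump_features` is a hypothetical name, the `ID` and `CHR` columns are assumptions beyond the columns listed for --twas_results, and computing R-squared from the predicted features in the target sample is also an assumption.

```r
# Greedy clumping sketch (assumed algorithm, not taken from the IFRisk source).
# twas: data.frame with columns ID, CHR, P0, P1, TWAS.P (ID and CHR are assumed).
# pred: numeric matrix of predicted features, one column per ID.
clump_features <- function(twas, pred, r2 = 0.9, window = 5e6) {
  twas <- twas[order(twas$TWAS.P), ]  # most significant feature first
  keep <- character(0)
  for (i in seq_len(nrow(twas))) {
    id <- twas$ID[i]
    clash <- FALSE
    for (k in keep) {
      j <- match(k, twas$ID)
      # Features are "near" if on the same chromosome and within the window.
      near <- twas$CHR[j] == twas$CHR[i] &&
        twas$P0[i] <= twas$P1[j] + window &&
        twas$P1[i] >= twas$P0[j] - window
      # Drop the feature if it is too correlated with a retained neighbour.
      if (near && cor(pred[, id], pred[, k])^2 > r2) {
        clash <- TRUE
        break
      }
    }
    if (!clash) keep <- c(keep, id)
  }
  keep
}
```

Under this sketch, a lower R-squared threshold or a larger window removes more features, because each retained feature excludes a wider set of correlated neighbours.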
The p-value thresholds used to derive the risk scores. There must not be spaces between the values.
Default value = '5e-1,1e-1,5e-2,1e-2,1e-3,1e-4,1e-5,1e-6'
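A comma-separated threshold string like the default can be turned into numbers as shown below. This is a sketch of the general idea, with `parse_pts` as a hypothetical helper name; it does not claim to be how IFRisk parses the option. The no-spaces rule presumably exists because a space would make the shell split the value into separate arguments.

```r
# Hypothetical helper: convert '5e-1,1e-1,...' into a numeric vector.
parse_pts <- function(pts) {
  as.numeric(strsplit(pts, ",", fixed = TRUE)[[1]])
}

parse_pts("5e-1,1e-1,5e-2,1e-2,1e-3,1e-4,1e-5,1e-6")
```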
Option to retain only the most significant feature within the MHC region.
Default value = T
This comma-delimited file will contain the feature-based risk scores in the target sample specified. The first two columns are FID and IID, and then each column will contain scores based on the different p-value thresholds specified.
This comma-delimited file will contain information on the number of genes surpassing the different p-value thresholds specified, before and after clumping.
This is a log file containing general information on the time taken, any errors, the number of genes at different stages and more.
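To illustrate what "scores based on the different p-value thresholds" could look like, here is a minimal sketch. It assumes each score is a sum of predicted feature levels weighted by TWAS.Z, restricted to features with TWAS.P below the threshold; this README does not confirm that formula or the use of a strict inequality. `score_by_threshold`, the `ID` column and the `SCORE_` column prefix are hypothetical.

```r
# Threshold-based scoring sketch (assumed Z-weighted sum, not confirmed by the README).
# twas: data.frame with columns ID, TWAS.Z, TWAS.P (ID is an assumed column).
# pred: numeric matrix of predicted features, one column per ID.
# pts:  numeric vector of p-value thresholds.
score_by_threshold <- function(twas, pred, pts) {
  scores <- sapply(pts, function(pt) {
    sel <- twas[twas$TWAS.P < pt, ]
    # No feature passes the threshold: every individual scores zero.
    if (nrow(sel) == 0) return(rep(0, nrow(pred)))
    # Weighted sum of predicted levels across the selected features.
    as.vector(pred[, sel$ID, drop = FALSE] %*% sel$TWAS.Z)
  })
  # sapply() returns a plain vector when there is one individual; force a matrix.
  scores <- matrix(scores, nrow = nrow(pred))
  colnames(scores) <- paste0("SCORE_", pts)  # hypothetical column naming
  scores
}
```

In practice the features passed in would be those that survive clumping, and the resulting matrix would be bound to the FID and IID columns before writing.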
Rscript IFRisk.V1.0.R \
--twas_results ukbiobank-2017-1160-prePRS-fusion.tsv.GW \
--target_gene_exp CMC.BRAIN.RNASEQ_GeneX_all_MINI.csv \
--output demo
Rscript IFRisk.V1.0.R \
--twas_results ukbiobank-2017-1160-prePRS-fusion.tsv.GW \
--target_gene_exp CMC.BRAIN.RNASEQ_GeneX_all_MINI.csv \
--pTs 1e-5,0.01,0.5 \
--output demo
This script was written by Dr Oliver Pain.
If you have any questions or comments, use the Google group.