This repository has been archived by the owner on Feb 28, 2023. It is now read-only.

What does POSTGAP do?

Daniel Zerbino edited this page Jan 9, 2018 · 4 revisions

1. Extracting significant GWAS peaks

If provided with one or more disease descriptions, POSTGAP attempts to map each of them to the Experimental Factor Ontology (EFO) using EMBL-EBI's Zooma service.

It then uses whatever disease descriptions and ontology terms are available to query a number of public GWAS databases, in particular:

All SNPs with an association p-value below 1e-5 are retained as GWAS SNPs.
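The retention rule can be sketched in a few lines of Python. This is only an illustration of the 1e-5 cutoff, not POSTGAP's own code: the `GwasAssociation` record, the `retain_gwas_snps` helper and the SNP identifiers are hypothetical, and the comparison is assumed to be strict ("below").

```python
from typing import List, NamedTuple

# Cutoff stated on this page: associations must have a p-value below 1e-5.
P_VALUE_THRESHOLD = 1e-5


class GwasAssociation(NamedTuple):
    """Hypothetical record layout, used only for this illustration."""
    snp: str
    disease: str
    p_value: float


def retain_gwas_snps(associations: List[GwasAssociation],
                     threshold: float = P_VALUE_THRESHOLD) -> List[GwasAssociation]:
    """Keep the associations whose p-value is strictly below the threshold."""
    return [a for a in associations if a.p_value < threshold]


# Made-up example data.
associations = [
    GwasAssociation("rsA", "example disease", 3e-8),
    GwasAssociation("rsB", "example disease", 2e-4),
    GwasAssociation("rsC", "example disease", 9e-6),
]

retained = retain_gwas_snps(associations)
print([a.snp for a in retained])  # ['rsA', 'rsC']
```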

2. Computing SNP clusters of interest

Using the 1000 Genomes genotypes, it creates a cluster around each GWAS SNP by selecting all SNPs in linkage disequilibrium (LD) with it at r^2 > 0.7.
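As a rough sketch of the clustering step, assume the pairwise r^2 values have already been computed from the genotypes and are available as a dictionary keyed by SNP pairs. The `build_clusters` function, the input layout and the SNP identifiers below are assumptions made for illustration. Including the GWAS SNP in its own cluster is also an assumption, justified only by the fact that a SNP's r^2 with itself is 1.

```python
from typing import Dict, Iterable, List, Tuple

# Cutoff stated on this page: cluster members must have r^2 > 0.7.
LD_THRESHOLD = 0.7


def build_clusters(gwas_snps: Iterable[str],
                   r2: Dict[Tuple[str, str], float],
                   threshold: float = LD_THRESHOLD) -> Dict[str, List[str]]:
    """Map each GWAS SNP to the sorted list of SNPs linked to it above the threshold."""
    clusters = {}
    for lead in gwas_snps:
        members = {lead}  # assumption: the GWAS SNP belongs to its own cluster
        for (snp_a, snp_b), value in r2.items():
            if value > threshold:
                # The pair is unordered, so check both positions.
                if snp_a == lead:
                    members.add(snp_b)
                elif snp_b == lead:
                    members.add(snp_a)
        clusters[lead] = sorted(members)
    return clusters


# Made-up pairwise LD values.
r2_values = {
    ("rsA", "rsD"): 0.95,
    ("rsA", "rsE"): 0.40,
    ("rsF", "rsC"): 0.71,
    ("rsC", "rsG"): 0.70,  # not strictly above the threshold, so excluded
}

clusters = build_clusters(["rsA", "rsC"], r2_values)
print(clusters)  # {'rsA': ['rsA', 'rsD'], 'rsC': ['rsC', 'rsF']}
```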

3. Extracting regulatory information

It queries a few databases for any evidence of regulatory activity on all of the SNPs:

4. Extracting cis-regulatory information

It queries a few databases for any evidence of cis-regulatory interactions on all of the SNPs:

  • GTEx eQTLs
  • VEP for transcript overlaps
  • FANTOM5 for CAGE-tag activity correlation
  • ENCODE for DNase hypersensitivity correlation
  • CHiCAGO for Promoter Capture Hi-C links

5. Synthesising results

For each relevant (Gene, SNP) pair:

  • The v2g_score for that (Gene, SNP) pair is the sum of the above scores.

  • The gene_score for that (Gene, SNP) pair is the v2g_score multiplied by the LD r^2 between that SNP and the most significant nearby GWAS SNP.

  • The PICS score for that SNP is (TODO)

  • The total score for that (Gene, SNP) pair is:

((f(gene_score) + f(PICS)) / 2)^3

where f is an ad hoc weighting function:

f(X) = X * X^(1/3)
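The arithmetic above can be written out as a minimal Python sketch. The function names, the dictionary keys and the numeric values are hypothetical. Because this page does not define the PICS score, it is simply passed in as a number here. All scores are assumed to be non-negative, since the cube root of a negative float is not a real number in Python.

```python
from typing import Dict


def f(x: float) -> float:
    """Ad hoc weighting function from this page: f(X) = X * X^(1/3)."""
    return x * x ** (1.0 / 3.0)


def v2g_score(evidence_scores: Dict[str, float]) -> float:
    """Sum of the per-source scores for one (Gene, SNP) pair."""
    return sum(evidence_scores.values())


def gene_score(v2g: float, r2_to_lead_snp: float) -> float:
    """v2g_score scaled by the LD r^2 to the most significant nearby GWAS SNP."""
    return v2g * r2_to_lead_snp


def total_score(gene: float, pics: float) -> float:
    """((f(gene_score) + f(PICS)) / 2)^3, as given on this page."""
    return ((f(gene) + f(pics)) / 2.0) ** 3


# Made-up per-source scores for a single (Gene, SNP) pair.
evidence = {"GTEx": 1.0, "VEP": 0.5, "Fantom5": 0.0, "DHS": 0.5, "PCHiC": 0.0}

v2g = v2g_score(evidence)                   # 2.0
gene = gene_score(v2g, r2_to_lead_snp=0.5)  # 1.0
total = total_score(gene, pics=1.0)         # ((1.0 + 1.0) / 2)^3
print(v2g, gene, total)  # 2.0 1.0 1.0
```

Since f(X) = X^(4/3), the total score is the cube of the mean of the two weighted inputs. A pair with gene_score = PICS = x therefore scores x^4.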