Genome-wide association studies (GWAS) generate large volumes of associations between genomic variations and phenotypic traits. However, assessing strength, specificity and relevance of associations, and semantically valid aggregation of associations, for applications such as drug target prioritization, is challenging. This project addresses this challenge.
The NHGRI-EBI GWAS Catalog is itself an expertly designed and curated aggregation of GWAS results and metadata, and the primary data source for this project. Our effort builds upon the GWAS Catalog with more specific applications and use cases, focused on protein-coding genes and well defined traits semantically related to disease states relevant to discovery of drugs and druggable targets.
- "TIGA: Target illumination GWAS analytics", Jeremy Yang, Dhouha Grissa, Christophe Lambert, Cristian Bologa, Stephen Mathias, Anna Waller, David Wild, Lars Juhl Jensen, Tudor Oprea, Bioinformatics, btab427, https://doi.org/10.1093/bioinformatics/btab427, published 04 June 2021.
- Poster presented at February 2021 Annual IDG Meeting
- R 4.2+; readr, data.table, igraph, muStat, RMySQL (Webapp: shiny, DT, shinyBS, shinysky, plotly)
- Python 3.9+; pandas, BioClients
- Java 8+; Jena, IU_IDSL_JENA
- GWAS Catalog studies each have a
study_accession
. Also are associated with a publication (PubMedID), but not uniquely. OR_or_BETA
: Reported odds ratio or beta-coefficient associated with strongest SNP risk allele. Note that if an OR <1 is reported this is inverted, along with the reported allele, so that all ORs included in the Catalog are >1. Appropriate unit and increase/decrease are included for beta coefficients.MAPPED_GENE
: Gene(s) mapped to the strongest SNP. If the SNP is located within a gene, that gene is listed. If the SNP is intergenic, upstream and downstream genes are listed. May be chromosomal location or range (e.g. "LOC102723594 - LOC285043").- Documentation: methods; curation; fileheaders
- Reference: Buniello, A. et al. (2019) The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res., 47, D1005–D1012.
- Beta coefficients require units and thus are not comparable between
non-convertible units (e.g. mg vs mm). Nor are beta
coefficients comparable with OR, so it is questionable that these values
are combined in one field
OR_or_BETA
. Current TIGA workaround is to use simple count of beta values supporting gene-trait association. - GWAS Catalog developers have devoted major effort to more precisely mapping traits to EFO, and EFO has increasingly aligned with MONDO. This represents a major improvement with regard to semantic precision and scientific rigor. However, this also means that results from the Catalog and TIGA have changed from release to release, which can be confusing, and presents a challenge for aggregating studies by trait.
- Protein-coding gene to disease association focus.
- Evidence assessment based on confirmatory statistics.
- iCite annotations from iCite API, via PMIDs from GWAS Catalog.
- Visualization of associations for a given disease by scatter plot of effect size versus meanRankScore, inverse multivariate mean rank of benchmark-validated variables.
See WORKFLOW.md for details describing how to update the TIGA dataset from sources.
Latest release and archives of full dataset and utility files.