GATE: Graph Transformers for Estimating Protein Model Accuracy

Table of Contents

  1. Introduction
  2. Installation
  3. Configuration
  4. Usage
  5. Citing This Work

Introduction

GATE is a tool designed for estimating protein model accuracy using advanced graph transformers. This repository contains the code, pre-trained models, and instructions for setup and usage.

Figure: Program workflow.

Figure: Overall performance of GATE (MULTICOM_GATE) in the CASP16 EMA competition in terms of Z-scores.

Figure: Overall performance of GATE (MULTICOM_GATE) in the CASP16 EMA competition in terms of per-target averages.

Table 1. Average per-target evaluation metrics (Pearson's correlation, Spearman's correlation, ranking loss and AUC) of 23 CASP16 predictors in terms of TM-score and Oligo-GDT-TS. The best performance for each metric is shown in bold, the second-best is underlined, and the third-best is underlined and italicized. The methods are ordered by the CASP16 Assessors' score.

Predictor Name Corrᵖ (TM-score) Corrˢ (TM-score) Ranking Loss (TM-score) AUC (TM-score) Corrᵖ (Oligo-GDT-TS) Corrˢ (Oligo-GDT-TS) Ranking Loss (Oligo-GDT-TS) AUC (Oligo-GDT-TS)
MULTICOM_LLM 0.6836 0.4808 0.1230 0.6685 0.6722 0.4656 0.1252 0.6603
MULTICOM_GATE 0.7076 0.4514 0.1221 0.6680 0.7235 0.4399 0.1328 0.6461
AssemblyConsensus 0.6367 0.4661 0.1824 0.6584 0.7701 0.5163 0.1753 0.6702
ModFOLDdock2 0.6542 0.4640 0.1371 0.6859 0.6547 0.4143 0.1530 0.6588
MULTICOM 0.6156 0.4380 0.1207 0.6660 0.6413 0.4319 0.1368 0.6536
MIEnsembles-Server 0.6072 0.4498 0.1325 0.6670 0.6084 0.4091 0.1451 0.6671
GuijunLab-QA 0.6480 0.4149 0.1195 0.6328 0.6524 0.3972 0.1406 0.6377
GuijunLab-Human 0.6327 0.4148 0.1477 0.6368 0.6404 0.3976 0.1499 0.6483
MULTICOM_human 0.5897 0.4260 0.1518 0.6576 0.6149 0.4217 0.1498 0.6572
GuijunLab-PAthreader 0.5309 0.3744 0.1331 0.6237 0.6360 0.4353 0.1371 0.6382
ModFOLDdock2R 0.5724 0.3867 0.1375 0.6518 0.6339 0.3724 0.1483 0.6355
GuijunLab-Assembly 0.5439 0.3280 0.1636 0.6191 0.5809 0.3135 0.1611 0.6182
ChaePred 0.4548 0.3971 0.1580 0.6534 0.4875 0.3673 0.1563 0.6331
ModFOLDdock2S 0.5285 0.3116 0.1806 0.6084 0.5819 0.3335 0.1648 0.6129
MQA_server 0.4326 0.2913 0.1468 0.6120 0.5617 0.3708 0.1521 0.6323
MQA_base 0.4331 0.2897 0.1462 0.6085 0.5533 0.3597 0.1509 0.6281
GuijunLab-Complex 0.4889 0.3019 0.1792 0.6054 0.5693 0.3310 0.1772 0.6077
AF_unmasked 0.4015 0.2731 0.1595 0.6052 0.4354 0.2875 0.1815 0.6113
MQA 0.4410 0.2425 0.2183 0.5858 0.4911 0.2631 0.2499 0.5874
COAST 0.3840 0.2297 0.2091 0.6072 0.4484 0.2678 0.2204 0.6078
MULTICOM_AI 0.3281 0.2623 0.1913 0.6057 0.3843 0.2834 0.1963 0.6111
VifChartreuse 0.2921 0.2777 0.1440 0.6149 0.2982 0.2469 0.1641 0.5956
VifChartreuseJaune 0.3421 0.1756 0.1630 0.5951 0.3300 0.1548 0.1915 0.5811
PIEFold_human 0.1929 0.1451 0.2306 0.5497 0.2599 0.1759 0.2409 0.5541

Table 2: Performance of GATE on In-House MULTICOM4 CASP16 models

Method Corrᵖ (TM-score) Corrˢ (TM-score) Ranking Loss (TM-score) AUC (TM-score) Corrᵖ (Oligo-GDT-TS) Corrˢ (Oligo-GDT-TS) Ranking Loss (Oligo-GDT-TS) AUC (Oligo-GDT-TS)
PSS 0.3947 0.2523 0.1388 0.6384 0.3385 0.2495 0.1582 0.6282*
AlphaFold plDDT_norm 0.3806 0.2731 0.1334 0.6557 0.3663 0.2557 0.1206 0.6587
DProQA_norm -0.0507* 0.0112* 0.1942* 0.5689* 0.0319* 0.0709* 0.2225 0.5874
VoroIF-GNN-score_norm 0.0648* 0.1157* 0.1929* 0.5995 0.1143* 0.1704 0.2066 0.6222
Avg-VoroIF-GNN-res-pCAD_norm 0.0729* 0.1046* 0.1669 0.5887* 0.0744* 0.1374* 0.2044 0.6155
VoroMQA-dark global_norm 0.0385* 0.1443 0.1286 0.6094 -0.0126* 0.1456 0.1626 0.6220
GCPNet-EMA_norm 0.3597 0.2491 0.1345 0.6431 0.3555 0.2642 0.1691 0.6476
GATE-Ensemble 0.4083 0.2774 0.1327 0.6469 0.3801 0.2989 0.1626 0.6475

Table 3: Performance of GATE models and other methods on the CASP15 dataset

Method Corrᵖ (TM-score) Corrˢ (TM-score) Ranking Loss (TM-score) AUC (TM-score) Corrᵖ (DockQ) Corrˢ (DockQ) Ranking Loss (DockQ) AUC (DockQ)
CASP15 EMA Predictors
VoroMQA-select-2020 0.3944* 0.3692* 0.1735* 0.6663* 0.4322* 0.4044 0.2682 0.6741
ModFOLDdock 0.5161* 0.4356* 0.1841 0.6721* 0.5622 0.5185 0.2181 0.7022
ModFOLDdockS 0.4717* 0.3614* 0.2199* 0.6333* 0.4068* 0.4073 0.3119* 0.6632
MULTICOM_qa 0.6678* 0.5260 0.1472 0.7059 0.5256 0.4668 0.2661 0.6748
MULTICOM_egnn 0.1437* 0.1179* 0.2611* 0.5956* 0.2158* 0.2283* 0.2943* 0.6302
VoroIF 0.4645* 0.3069* 0.1568* 0.6472* 0.5039 0.3455* 0.2297 0.6447
ModFOLDdockR 0.5333* 0.4040* 0.2160* 0.6626* 0.5357 0.4673 0.2623 0.6787
Bhattacharya 0.3803* 0.3438* 0.2220* 0.6495* 0.3581* 0.3190* 0.3475* 0.6392*
MUFold2 0.5370* 0.2662* 0.2374* 0.6168* 0.3846* 0.1839* 0.3850* 0.5913*
MUFold 0.5435* 0.2714* 0.2267* 0.6252* 0.3856* 0.1356* 0.3457* 0.5865*
ChaePred 0.4706* 0.3507* 0.2311* 0.6592* 0.4381* 0.3545* 0.3565* 0.6615
Venclovas 0.4677* 0.3828* 0.1249 0.6756* 0.5288 0.4506 0.1828 0.6890
Other Methods (normalized if applicable)
PSS 0.7292 0.5755 0.1406 0.7137 0.5118 0.4469 0.2648 0.6660
AlphaFold plDDT_norm 0.2578* 0.2611* 0.1793 0.6399* 0.1710* 0.1886* 0.2615* 0.6165*
DProQA_norm 0.1598* 0.1174* 0.2555* 0.5942* 0.2109* 0.2255* 0.3162* 0.6248
VoroIF-GNN-score_norm 0.1972* 0.0966* 0.2092* 0.5695* 0.2283* 0.1335* 0.2935* 0.5704*
Avg-VoroIF-GNN-res-pCAD_norm 0.1335* -0.0027* 0.1737 0.5525* 0.1049* -0.0030* 0.2284 0.5522*
VoroMQA-dark global_norm 0.0253* 0.0037* 0.1265 0.5580* -0.0670* -0.0316* 0.2191 0.5476*
GCPNet-EMA_norm 0.3216* 0.2696* 0.2052* 0.6379* 0.1862* 0.1803* 0.2830* 0.6198*
GATE Models
GATE-Basic 0.7447 0.5722 0.1127 0.7181 0.5330 0.4345 0.2348* 0.6703
GATE-GCP 0.7453 0.5788 0.1186 0.7191 0.5358 0.4389* 0.2083 0.6715
GATE-Advanced 0.7224* 0.5416* 0.1018 0.6981* 0.5142 0.4298 0.2112 0.6618
GATE-Ensemble 0.7480 0.5754 0.1191 0.7194 0.5353 0.4477 0.2140 0.6756
GATE Ablation Variants
GATE-Basic (w/o subgraph sampling) 0.7169 0.5478* 0.1266 0.7067 0.5063* 0.4145* 0.2620* 0.6528*
GATE-GCP (w/o subgraph sampling) 0.7503 0.5771 0.1363 0.7278 0.5253 0.4394 0.2545* 0.6773
GATE-Advanced (w/o subgraph sampling) 0.7158* 0.5403* 0.1224 0.7043* 0.4975* 0.4286 0.2478* 0.6616*
GATE-Basic (w/o pairwise loss) 0.6881* 0.5534 0.1329 0.7183 0.5226 0.4498 0.2451 0.6796
GATE-GCP (w/o pairwise loss) 0.6923* 0.5392* 0.1516* 0.7051 0.4974* 0.4062* 0.2604* 0.6582*
GATE-Advanced (w/o pairwise loss) 0.6756* 0.5176* 0.1588* 0.6961* 0.4982 0.4170 0.2538* 0.6617
GATE-NoSingleEMA 0.6570* 0.4832* 0.1511* 0.6927 0.4987* 0.3967* 0.2986* 0.6681

Installation

Clone the Repository

git clone -b public https://github.com/BioinfoMachineLearning/gate
cd gate

Install Mamba

wget "https://github.com/conda-forge/miniforge/releases/download/23.1.0-3/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh 
rm Mambaforge-$(uname)-$(uname -m).sh
source ~/.bashrc  

Install tools

cd tools

# Install GCPNet-EMA
git clone https://github.com/BioinfoMachineLearning/GCPNet-EMA
mkdir GCPNet-EMA/checkpoints
wget -P GCPNet-EMA/checkpoints/ https://zenodo.org/record/10719475/files/structure_ema_finetuned_gcpnet_i2d5t9xh_best_epoch_106.ckpt

# Install EnQA
git clone https://github.com/BioinfoMachineLearning/EnQA
chmod -R 755 EnQA/utils

# Install DProQA
git clone https://github.com/jianlin-cheng/DProQA

# Install Venclovas QAs
git clone https://github.com/kliment-olechnovic/ftdmp

# Install CDPred
git clone https://github.com/BioinfoMachineLearning/CDPred

# Install openstructure
docker pull registry.scicore.unibas.ch/schwede/openstructure:latest
# or
singularity pull docker://registry.scicore.unibas.ch/schwede/openstructure:latest

Set Up Python Environments

# Install python environment for gate
mamba install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
mamba install -c dglteam dgl-cuda11.0
mamba install pandas biopython

# Install python environment for GCPNet-EMA
mamba env create -f tools/GCPNet-EMA/environment.yaml
mamba activate GCPNet-EMA
pip3 install -e tools/GCPNet-EMA
pip3 install prody==2.4.1
pip3 uninstall protobuf
mamba deactivate

# Install python environment for EnQA
mamba env create -f envs/enqa.yaml

# Install python environment for DProQA
mamba env create -f envs/dproqa.yaml

# Install python environment for VoroMQA
mamba env create -f envs/ftdmp.yaml

# Install python environment for CDPred
mamba env create -f envs/cdpred.yaml

Download databases (~2.5 TB)

mkdir databases

# Create symbolic links instead if the databases are already stored elsewhere (see the example after this block)
sh scripts/download_bfd.sh databases/
sh scripts/download_uniref90.sh databases/
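
If BFD and UniRef90 copies already exist on your machine, they can be linked into databases/ instead of being downloaded again. A hypothetical example is shown below; the subdirectory names are assumptions, so match whatever the download scripts actually create:

# Reuse existing database copies via symbolic links (paths and directory names are placeholders)
ln -s /path/to/existing/bfd databases/bfd
ln -s /path/to/existing/uniref90 databases/uniref90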

Configuration

* Set ROOTDIR in gate/feature/config.py to your installation path.

* Set use_docker to False if you are using Singularity instead of Docker. A sketch of both edits follows below.
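
A minimal, hypothetical sketch of both edits, assuming ROOTDIR and use_docker are plain top-level assignments in gate/feature/config.py; check the exact lines in your copy and adjust the sed patterns (and the example path) accordingly:

# Point ROOTDIR at your clone of the repository and switch from Docker to Singularity.
# /home/user/gate is a placeholder path; the `NAME = value` layout is an assumption.
sed -i "s|^ROOTDIR = .*|ROOTDIR = '/home/user/gate'|" gate/feature/config.py
sed -i "s|^use_docker = .*|use_docker = False|" gate/feature/config.py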

Usage

To run the GATE tool for estimating protein multimer structure accuracy, use the inference_multimer.py script with the following arguments:

Required Arguments:

  • --fasta_path FASTA_PATH

    The path to the input FASTA file containing the protein sequences.

  • --input_model_dir INPUT_MODEL_DIR

    The directory containing the input protein models.

  • --output_dir OUTPUT_DIR

    The directory where the output results will be saved.

Optional Arguments:

  • --pkldir PKLDIR

    The directory where intermediate pickle files will be stored.

  • --use_af_feature USE_AF_FEATURE

    Specify whether to use AlphaFold features. Accepts True or False. Default is False.

  • --sample_times SAMPLE_TIMES

    The number of times to sample the models. Default is 5.

Example Commands:

Here are examples of how to use the inference_multimer.py script with different settings:

  1. Not using AlphaFold Features (default)

    python inference_multimer.py --fasta_path $FASTA_PATH --input_model_dir $INPUT_MODEL_DIR --output_dir $OUTPUT_DIR
    
  2. Using AlphaFold Features

    python inference_multimer.py --fasta_path $FASTA_PATH --input_model_dir $INPUT_MODEL_DIR --output_dir $OUTPUT_DIR --pkldir $PKLDIR --use_af_feature True
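
For a concrete run, the variables in the commands above can be set to your own paths. The values below are placeholders, not files shipped with the repository:

# Placeholder inputs for a single complex target
FASTA_PATH=/data/targets/H1234.fasta          # input FASTA with the complex sequences
INPUT_MODEL_DIR=/data/targets/H1234_models    # directory containing the models to score
OUTPUT_DIR=/data/targets/H1234_gate           # results are written here

python inference_multimer.py --fasta_path $FASTA_PATH --input_model_dir $INPUT_MODEL_DIR --output_dir $OUTPUT_DIR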

Citing This Work

If you find this work useful, please cite:

Liu, J., Neupane, P., & Cheng, J. (2025). Estimating Protein Complex Model Accuracy Using Graph Transformers and Pairwise Similarity Graphs. bioRxiv, 2025-02 (https://www.biorxiv.org/content/10.1101/2025.02.04.636562v1)

@article {Liu2025.02.04.636562,
	author = {Liu, Jian and Neupane, Pawan and Cheng, Jianlin},
	title = {Estimating Protein Complex Model Accuracy Using Graph Transformers and Pairwise Similarity Graphs},
	elocation-id = {2025.02.04.636562},
	year = {2025},
	doi = {10.1101/2025.02.04.636562},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://doi.org/10.1101/2025.02.04.636562},
	journal = {bioRxiv}
}
