A4Bio/FoldGPT_open


FoldGPT: Conditional Protein Structure Generation with GPT model

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Downstream Tasks
  5. Dataset
  6. License
  7. Contact
  8. Citation

About The Project

This project generates protein structures with a GPT model that operates on FoldLanguage. FoldGPT introduces three types of tokens:

  • Condition Token: encodes the full information (sequence, structure, and function) of known residues.
  • Prompt Token: encodes partial information (sequence or function) of residues.
  • Mask Token: used to learn the features of unknown residues.

Currently, only the structural vq_id is encoded as a conditional feature. Sequence and function conditions, which would make FoldGPT a multimodal generative model, are left as future work.
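
To make the token roles concrete, here is a minimal sketch of how a per-residue input could be laid out when only structural vq_ids serve as conditions. The names MASK_ID and build_condition_tokens, the sentinel value, and the 0-based indexing are illustrative assumptions, not the repository's actual code.

```python
from typing import Iterable, List

# Hypothetical sentinel for the mask token; the real id is not stated in this README.
MASK_ID = -1


def build_condition_tokens(vq_ids: List[int], masked: Iterable[int]) -> List[int]:
    """Keep the structural vq_id at known positions and put MASK_ID at unknown ones.

    `masked` holds 0-based residue positions (an assumption made for this sketch).
    """
    masked_set = set(masked)
    return [MASK_ID if i in masked_set else vq for i, vq in enumerate(vq_ids)]


# Toy example: five residues, of which positions 1 and 2 are unknown.
tokens = build_condition_tokens([12, 7, 7, 30, 4], masked=[1, 2])
print(tokens)
```

Known residues keep their vq_id as the condition, while the masked positions carry no structural information and are left for the model to generate.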


Getting Started

conda env create -f environment.yml

Usage

Unconditional Structure Generation

export PYTHONPATH=project_path

CUDA_VISIBLE_DEVICES=0 python sampling.py --save_path results/unconditional --config model_zoom/unconditional/config.yaml --checkpoint model_zoom/unconditional/params.ckpt --temperature 0.5 --length 150 --nums 20 --mask_mode unconditional

This script generates protein structures from noise. The model saves --nums generated PDB files (20 here) under --save_path; in the command above, the sampling temperature is 0.5 and each protein contains 150 residues. The maximum generation length is PAD//2 = 256. Thanks to RoPE, the length can be extended slightly beyond this limit.
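
To sample several lengths in one go, the command can be assembled programmatically. The following is a minimal sketch that reuses only the flags shown above; the helper name build_sampling_command and the per-length output subdirectory are assumptions made for illustration. It prints the commands rather than running them.

```python
import shlex
from typing import List

MAX_TRAINED_LENGTH = 256  # PAD // 2, as stated above


def build_sampling_command(length: int, nums: int = 20, temperature: float = 0.5) -> List[str]:
    """Assemble the unconditional sampling command for one target length."""
    return [
        "python", "sampling.py",
        # Hypothetical layout: one output folder per length.
        "--save_path", f"results/unconditional/len{length}",
        "--config", "model_zoom/unconditional/config.yaml",
        "--checkpoint", "model_zoom/unconditional/params.ckpt",
        "--temperature", str(temperature),
        "--length", str(length),
        "--nums", str(nums),
        "--mask_mode", "unconditional",
    ]


for length in (50, 100, 200, 300):
    # Flag lengths above PAD // 2, which rely on the RoPE length extension.
    note = "" if length <= MAX_TRAINED_LENGTH else "  # beyond PAD//2"
    print(shlex.join(build_sampling_command(length)) + note)
```

Each printed line can be pasted into a shell after setting PYTHONPATH and CUDA_VISIBLE_DEVICES as in the command above.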

Length Fig1 Fig2 Fig3
50 ref ref ref
100 ref ref ref
200 ref ref ref
300 ref ref ref

Conditional Structure Generation

CUDA_VISIBLE_DEVICES=0 python sampling.py --save_path results/conditional --config model_zoom/conditional/config.yaml --checkpoint model_zoom/conditional/params.ckpt --temperature 0.5 --length 150 --nums 20 --mask_mode conditional --template 8vrwB.pdb --mask 39-51,85-98

This script generates protein structures conditioned on a template. The model saves --nums generated PDB files under --save_path, and the sampling temperature is 0.5. The structure template is the PDB file passed to --template (8vrwB.pdb above), in which residues 39-51 and 85-98 are masked.
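
The --mask value is a comma-separated list of residue ranges. As a rough illustration of how such a string expands into individual residues, here is a small sketch; parse_mask is a hypothetical helper, and treating both ends of each range as inclusive is an assumption, since the parsing inside sampling.py is not documented here.

```python
from typing import List


def parse_mask(spec: str) -> List[int]:
    """Expand a spec such as "39-51,85-98" into a sorted list of residue numbers.

    Assumes both ends of each range are inclusive; sampling.py may differ.
    """
    residues = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start, _, end = part.partition("-")
        first = int(start)
        last = int(end) if end else first  # allow a single residue such as "42"
        if last < first:
            raise ValueError(f"invalid range: {part!r}")
        residues.update(range(first, last + 1))
    return sorted(residues)


masked = parse_mask("39-51,85-98")
print(len(masked), masked[0], masked[-1])
```

Checking the expanded list before sampling is a cheap way to confirm that the masked region falls inside the template and covers the intended loop or motif.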

Name Fig Comment
8vrwB_ref ref reference structure
8vrwB_inpaint1 ref inpainting residues from 20 to 30
8vrwB_inpaint2 ref inpainting residues from 60 to 80
8vrwB_inpaint3 ref inpainting residues from 110 to 140
8vrwB_loop_design ref loop design
scaffolding1 ref scaffolding
scaffolding2 ref scaffolding
scaffolding3 ref scaffolding


Dataset & Model

TODO

License

Distributed under the Apache 2.0 License. See LICENSE.txt for more information.


Contact

Zhangyang Gao - [email protected]

