ProtGNN: Learned biomedical context for proteins in a knowledge graph.

This repository hosts the official implementation of ProtGNN, a model that learns protein representations encoding biomedical context from a biomedical knowledge graph.

Training

training_script.py is the main script for pretraining and finetuning ProtGNN. Be sure to change the file paths and wandb login parameters in this file before running it. Arguments:

-p/--pretrain: True or False (False by default)
-f/--finetune: True or False (False by default)
-e/--eval: True or False (False by default)
-h/--hyperparameter_tuning: True or False (False by default)
--n_inp: number of input dimensions (None by default)
--n_hid: number of hidden dimensions (None by default)
--n_out: number of output dimensions (None by default)
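As an illustration only, flags like the ones listed above could be declared with argparse roughly as follows. This is a sketch, not the code in training_script.py: it assumes the boolean options are plain switches (consistent with the `-p` and `-f` invocations shown in this README) and that the dimension options are integers. Because argparse normally reserves `-h` for help, the sketch disables the built-in help flag and registers help under `--help` only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the arguments listed in this README."""
    # add_help=False frees the -h short flag, which argparse otherwise
    # reserves for its help message.
    parser = argparse.ArgumentParser(
        description="Sketch of a ProtGNN-style training CLI (assumed layout)",
        add_help=False,
    )
    parser.add_argument("--help", action="help", help="show this message and exit")

    # Boolean switches: False unless the flag is present (assumption).
    parser.add_argument("-p", "--pretrain", action="store_true")
    parser.add_argument("-f", "--finetune", action="store_true")
    parser.add_argument("-e", "--eval", action="store_true")
    parser.add_argument("-h", "--hyperparameter_tuning", action="store_true")

    # Layer sizes: None unless given; integer type is an assumption.
    parser.add_argument("--n_inp", type=int, default=None)
    parser.add_argument("--n_hid", type=int, default=None)
    parser.add_argument("--n_out", type=int, default=None)
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-p", "--n_hid", "128"])
print(args.pretrain, args.finetune, args.n_hid)
```

With switches declared this way, omitting a flag leaves it at False and omitting a dimension leaves it at None, which matches the defaults listed above.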

The script can be run from the command line or in a bash script for pretraining like this:

python training_script.py -p

or for finetuning like this:

python training_script.py -f

Embedding Space Visualization

Functions for visualizing the embedding space are found in visualize_utils.py. Label options include 'biological_process' and 'molecular_function'.

An example of visualizing the embedding space by 'biological_process' labels:

from txgnn import TxData
from visualize_utils import GO, Embeddings, visualize_pipeline

# Path to the pickled embeddings to visualize
embed_path = '/PATH/TO/embeddings.pkl'

# Load the knowledge graph data and create a random split
TxData_inst = TxData(data_folder_path='/PATH/TO/PrimeKG/')
TxData_inst.prepare_split(split='random', seed=42, no_kg=False)

# Visualize the embedding space with 'biological_process' labels
visualize_pipeline(embed_path=embed_path, node_type='biological_process', TxData_inst=TxData_inst, kmeans=True)
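Before passing an embeddings file to visualize_pipeline, it can be useful to check what the pickle actually contains. The helper below is a hypothetical sketch, not part of visualize_utils.py; it assumes only that the file is a standard pickle and makes no claim about the structure ProtGNN writes. If the file stores objects from another library (for example tensors), that library must be installed for loading to succeed. Only unpickle files from a trusted source.

```python
import pickle
from pathlib import Path


def describe_embeddings(path):
    """Load a pickle file and return a short summary of its top-level object.

    Hypothetical helper: the layout of the embeddings file is not assumed.
    """
    with Path(path).open("rb") as handle:
        obj = pickle.load(handle)

    summary = {"type": type(obj).__name__}
    # Report a length only when the object supports len().
    if hasattr(obj, "__len__"):
        summary["length"] = len(obj)
    # For dictionaries, list a few keys to show how entries are indexed.
    if isinstance(obj, dict):
        summary["sample_keys"] = list(obj)[:5]
    return summary
```

Calling describe_embeddings('/PATH/TO/embeddings.pkl') returns a small dictionary that can be printed to confirm the file loads and to see its top-level type.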

Repository: emmatysinger/ProtGNN