Pocket representation #103
Building the pocket dataset
Assumes that we have a normal dataset built already.

1. Get and mask pockets with KLIFS
This should be done first on the login node, since it queries the KLIFS database for the sequences and caches them locally. Then we can use the

```python
# building pocket datasets:
from src.utils.pocket_alignment import pocket_dataset_full

# No trailing slash, so the f-strings below don't produce "//" in paths.
data_dir = '/cluster/home/t122995uhn/projects/data'
db_type = ['kiba', 'davis']
db_feat = ['nomsa_binary_original_binary', 'nomsa_aflow_original_binary',
           'nomsa_binary_gvp_binary', 'nomsa_aflow_gvp_binary']

for t in db_type:
    for f in db_feat:
        print(f'\n---{t}-{f}---\n')
        dataset_dir = f"{data_dir}/DavisKibaDataset/{t}/{f}/full"
        save_dir = f"{data_dir}/v131/DavisKibaDataset/{t}/{f}/full"
        # skip_download=True: use the KLIFS data already cached on the login node
        pocket_dataset_full(
            dataset_dir=dataset_dir,
            pocket_dir=f"{data_dir}/{t}/",
            save_dir=save_dir,
            skip_download=True,
        )
```
2. Resplit the database:

```python
import logging
import os  # only needed if the data_root line below is uncommented

from src import cfg
from src.data_prep.init_dataset import create_datasets

cfg.logger.setLevel(logging.DEBUG)

dbs = [cfg.DATA_OPT.davis, cfg.DATA_OPT.kiba]
splits = ['davis', 'kiba']
splits = ['/cluster/home/t122995uhn/projects/MutDTA/splits/' + s for s in splits]
print(splits)
#%%
for split, db in zip(splits, dbs):
    print('\n', split, db)
    create_datasets(db,
                    feat_opt=cfg.PRO_FEAT_OPT.nomsa,
                    edge_opt=[cfg.PRO_EDGE_OPT.binary, cfg.PRO_EDGE_OPT.aflow],
                    ligand_features=[cfg.LIG_FEAT_OPT.original, cfg.LIG_FEAT_OPT.gvp],
                    ligand_edges=cfg.LIG_EDGE_OPT.binary, overwrite=False,
                    k_folds=5,
                    test_prots_csv=f'{split}/test.csv',
                    val_prots_csv=[f'{split}/val{i}.csv' for i in range(5)],  # one per fold
                    # data_root=os.path.abspath('../data/test/'),
                    )
```

3. Test inference

```python
#%%
from src import cfg
from src.utils.loader import Loader

# db2 = Loader.load_dataset(cfg.DATA_OPT.davis,
#                           cfg.PRO_FEAT_OPT.nomsa, cfg.PRO_EDGE_OPT.aflow,
#                           path='/cluster/home/t122995uhn/projects/data/',
#                           subset="full")
db2 = Loader.load_DataLoaders(cfg.DATA_OPT.davis,
                              cfg.PRO_FEAT_OPT.nomsa, cfg.PRO_EDGE_OPT.aflow,
                              path='/cluster/home/t122995uhn/projects/data/v131',
                              training_fold=0,
                              batch_train=2)
b2 = next(iter(db2['train']))  # grab a single training batch
# %%
m = Loader.init_model(cfg.MODEL_OPT.DG, cfg.PRO_FEAT_OPT.nomsa, cfg.PRO_EDGE_OPT.aflow,
                      dropout=0.3480, output_dim=256)
#%%
# m(b['protein'], b['ligand'])
m(b2['protein'], b2['ligand'])
```
jyaacoub added a commit that referenced this issue on Aug 7, 2024:
… index renumbering #103
- Had to make some modifications, since the edge index needs to be updated after applying the mask so that it still points to the right nodes and we don't get something like an "IndexError" for being out of bounds.
- Also an error due to not removing all proteins without pocket sequences (line 216 saved the old dataset instead of the new one).
- Successfully built pocket datasets for davis and kiba #131 #103
jyaacoub added a commit that referenced this issue on Aug 16, 2024

jyaacoub added a commit that referenced this issue on Aug 20, 2024
Closed
Pocket-only representation
To make sure we don't have to build entirely separate datasets for the pocket representation, this implementation should just get the index positions of the binding pocket and then apply a mask to the original graph (similar to how it is done with the `dropout_node` function in `pytorch_geometric`).

Task list:
KLIFS Database
This is used by KDBNet to get the binding pockets for davis and kiba ("pocket is 85 residues long").
The sequence given by KLIFS is not contiguous and only contains the relevant pocket residues.
After we get our list of index positions for the binding-pocket amino acids, we can modify our existing graph by applying a mask.
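Applying the pocket mask means keeping only the pocket nodes and then renumbering the surviving edges so they point at the new node positions (the edge-index issue described in the Aug 7 commit). A framework-agnostic sketch in plain Python, assuming node features are a list indexed by residue position and edges are a pair of source/target lists shaped like PyG's `edge_index`; `apply_pocket_mask` is a hypothetical name. On tensors, `torch_geometric.utils.subgraph(..., relabel_nodes=True)` performs the same edge filtering and relabelling.

```python
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def apply_pocket_mask(
    node_feats: Sequence[T],
    edge_index: Tuple[Sequence[int], Sequence[int]],
    pocket_positions: Sequence[int],
) -> Tuple[List[T], Tuple[List[int], List[int]]]:
    """Keep only pocket residues and renumber edges to the new node order."""
    n = len(node_feats)
    # Boolean node mask built from the pocket index positions.
    mask = [False] * n
    for p in pocket_positions:
        if not 0 <= p < n:
            raise IndexError(f"pocket position {p} outside graph of {n} nodes")
        mask[p] = True

    # Old index -> new index for every kept node (original order preserved).
    new_index: Dict[int, int] = {}
    for old, keep in enumerate(mask):
        if keep:
            new_index[old] = len(new_index)

    kept_feats = [feat for feat, keep in zip(node_feats, mask) if keep]

    # Drop edges touching a removed node, then remap both endpoints; without
    # the remap, indices could exceed the new node count (IndexError).
    src: List[int] = []
    dst: List[int] = []
    for s, d in zip(*edge_index):
        if mask[s] and mask[d]:
            src.append(new_index[s])
            dst.append(new_index[d])
    return kept_feats, (src, dst)
```

For a 5-node chain with pocket positions `[1, 2, 4]`, only the edge between nodes 1 and 2 survives, and it becomes the edge `0 -> 1` in the masked graph.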
Getting pockets for Kiba
Uses the KLIFS `/kinase_ID` API.

Getting pockets for davis:
Same as for kiba, but we use the raw gene name (any mutation or phosphorylation information must be removed first):
`ABL1(F317I)p` -> `ABL1`
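The davis name clean-up and the `/kinase_ID` query can be combined with the standard library alone. A minimal sketch, assuming annotations follow the pattern in the example above (a parenthesised mutation plus a trailing lowercase `p` for phosphorylation) and using only the endpoint and the two query parameters from the ABL1 example URL; `clean_gene_name`, `kinase_id_url` and `fetch_kinase_ids` are hypothetical helpers, and other annotation styles in the dataset may need extra rules.

```python
import json
import re
import urllib.parse
import urllib.request

KLIFS_API = "https://klifs.net/api"

_MUTATION = re.compile(r"\([^)]*\)")  # e.g. "(F317I)"
_PHOSPHO = re.compile(r"p$")          # assumed: trailing lowercase "p" = phosphorylated


def clean_gene_name(name: str) -> str:
    """Strip mutation/phosphorylation annotations: 'ABL1(F317I)p' -> 'ABL1'.

    Assumes gene symbols themselves never end in a lowercase 'p'.
    """
    return _PHOSPHO.sub("", _MUTATION.sub("", name.strip()))


def kinase_id_url(kinase_name: str, species: str = "HUMAN") -> str:
    """Build a /kinase_ID query URL for an already-cleaned gene name."""
    query = urllib.parse.urlencode({"kinase_name": kinase_name, "species": species})
    return f"{KLIFS_API}/kinase_ID?{query}"


def fetch_kinase_ids(raw_name: str, species: str = "HUMAN", timeout: float = 30.0):
    """Query KLIFS for a davis protein name (needs network access)."""
    url = kinase_id_url(clean_gene_name(raw_name), species)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Only `fetch_kinase_ids` touches the network, so it belongs on the login node (step 1 of the comment). The response is returned as plain lists/dicts because its exact fields are not spelled out in this issue.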
1. For example: https://klifs.net/api/kinase_ID?kinase_name=ABL1&species=HUMAN returns:
However, for mutated genes we must be careful with the sequence alignment and follow this procedure to get the right amino acid index positions:
Then we just use these positions for our mask on the original (mutated) graph.
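The alignment procedure itself is not reproduced in this issue. As one illustration (not necessarily that procedure), pocket positions on a reference sequence can be carried over to a mutated sequence with a global alignment followed by a position map. This sketch uses plain Needleman-Wunsch with a linear gap penalty and illustrative scores (match +1, mismatch -1, gap -2); `align_positions` and `map_pocket_positions` are hypothetical names. For a point substitution such as F317I the positions map one-to-one; the alignment only matters when insertions or deletions shift the numbering.

```python
from typing import Dict, List


def align_positions(ref: str, alt: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2) -> Dict[int, int]:
    """Needleman-Wunsch global alignment; returns {ref_index: alt_index}
    for every aligned (non-gap) column, mismatches included."""
    n, m = len(ref), len(alt)
    # score[i][j] = best score aligning ref[:i] with alt[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if ref[i - 1] == alt[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, recording aligned pairs.
    mapping: Dict[int, int] = {}
    i, j = n, m
    while i > 0 and j > 0:
        s = match if ref[i - 1] == alt[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            mapping[i - 1] = j - 1
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1  # ref residue deleted in alt
        else:
            j -= 1  # extra residue inserted in alt
    return mapping


def map_pocket_positions(pocket_ref: List[int], ref: str, alt: str) -> List[int]:
    """Translate pocket indices on `ref` to indices on `alt`, dropping any
    pocket residue that aligns to a gap (i.e. was deleted)."""
    mapping = align_positions(ref, alt)
    return [mapping[p] for p in pocket_ref if p in mapping]
```

The quadratic score table is fine for single kinase sequences of roughly a thousand residues; a dedicated aligner would be the better choice for large batches.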