About bioactivity annotations #11
Comments
Hi Cesar, thanks for your questions!
The preprocessed models and datasets are located on Zenodo. Let me know if you are interested in a script that lets you apply the models to a pocket and ligand file (e.g. mol2).
The activity value for a
In principle, you need to map the PyTorch Geometric objects to the corresponding entries in the kinodata-3D dataframe. I am not quite sure how the idents relate, though. Maybe @joschka-gross can help out with this point. Best,
Hi @mbackenkoehler, Thanks a lot for the kind response. About the script: yes, please. I would like to run inference on other molecules and/or binding sites. As for the rest of my questions, I think your answers clarified them. It would be nice to have a direct way to track the native protein and ligands in the database. Best wishes, Cesar
Hi Cesar! I changed the dataset processing such that the dataset will now include the chembl_activity ID and the KLIFS structure ID. If you wish to add this information to an older processed version of the dataset, you can follow the new example in examples/patch_dataset_with_chembl_ids.ipynb. PR #12 will add this fix on the main branch. Best,
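Once each data object carries an activity ID, tracing it back to its table row is a plain key lookup. The following is only a minimal sketch with toy stand-ins: the attribute name `chembl_activity_id`, the column names `activity_id`, `structure_id` and `pIC50`, and all values are assumptions made for illustration, not the actual kinodata-3D field names or data.

```python
from types import SimpleNamespace


def build_activity_index(rows, key="activity_id"):
    """Map each activity ID to its table row; fail loudly on duplicates."""
    index = {}
    for row in rows:
        activity_id = int(row[key])
        if activity_id in index:
            raise ValueError(f"duplicate activity ID: {activity_id}")
        index[activity_id] = row
    return index


def lookup_rows(data_objects, index, attr="chembl_activity_id"):
    """Return the table row for each data object, or None if nothing matches."""
    return [index.get(int(getattr(obj, attr))) for obj in data_objects]


# Toy stand-ins for dataframe rows (column names and values are made up).
rows = [
    {"activity_id": 101, "structure_id": 7, "pIC50": 6.2},
    {"activity_id": 102, "structure_id": 9, "pIC50": 7.8},
]

# Toy stand-ins for graph data objects (attribute name is hypothetical).
data_objects = [
    SimpleNamespace(chembl_activity_id=102),
    SimpleNamespace(chembl_activity_id=555),  # has no row in the table
]

index = build_activity_index(rows)
matched = lookup_rows(data_objects, index)
```

Returning None for unmatched objects, rather than raising, makes it easy to count how many entries could not be traced. With a pandas dataframe, the same lookup could be done with `df.set_index(<id column>).loc[...]`, assuming the ID column is unique.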
Hi @joschka-gross, Thanks for this. Now it should be easier to track everything. I tried the example notebook after replacing only the dataset.py and patch_with_data_source.py scripts, so I am not sure I set it up correctly. I got the error below and I am not sure where it comes from. It occurs at the step where I pull the dataframe. Any help would be much appreciated, and thanks again for looking into my request.
df = dataset.df
Reading data frame from /kinodata-3D-affinity-prediction/data/raw/kinodata_docked_v2.sdf.gz...
100%|██████████| 3244/3244 [00:08<00:00, 387.83it/s]
ValueError Traceback (most recent call last)
File /miniconda3/envs/kinodata/lib/python3.10/functools.py#line=980, in cached_property.__get__(self, instance, owner)
File /kinodata-3D-affinity-prediction/kinodata/data/dataset.py#line=342, in KinodataDocked.df(self)
File /kinodata/data/dataset.py#line=148, in process_raw_data(raw_dir, file_name, remove_hydrogen, pocket_dir, pocket_sequence_file, activity_type_subset)
File /kinodata-3D-affinity-prediction/kinodata/data/dataset.py#line=149, in (.0)
ValueError: invalid literal for int() with base 10: '.ipynb'
Best wishes,
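One possible cause of the ValueError above, not confirmed in this thread: if the failing line parses an integer ID from the part of each file name before the first underscore, then a `.ipynb_checkpoints` folder that Jupyter created inside the scanned directory produces exactly this message, because `".ipynb_checkpoints".split("_")[0]` is `".ipynb"`. The sketch below reproduces the error and shows one way to skip such entries. The naming scheme `<id>_pocket.mol2` and the function name are hypothetical, chosen only for illustration.

```python
def parse_leading_ids(file_names):
    """Parse the integer prefix before the first underscore of each name.

    Names without an integer prefix (for example hidden entries such as
    '.ipynb_checkpoints') are collected in `skipped` instead of raising.
    """
    ids, skipped = [], []
    for name in file_names:
        prefix = name.split("_")[0]
        try:
            ids.append(int(prefix))
        except ValueError:
            skipped.append(name)
    return ids, skipped


# Reproduce the message from the traceback with a Jupyter checkpoint folder.
try:
    int(".ipynb_checkpoints".split("_")[0])
except ValueError as err:
    message = str(err)

# Hypothetical directory listing that contains a stray checkpoint folder.
listing = ["12_pocket.mol2", ".ipynb_checkpoints", "7_pocket.mol2"]
ids, skipped = parse_leading_ids(listing)
```

If this is indeed the cause, deleting the `.ipynb_checkpoints` folder from the data directory may be enough, without changing any code.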
I am following the tutorial for inference with some pretrained models, but I struggle to find the test dataset. Where is this file, and what type of information is needed for inference?
The other notebook, for data splitting, loads the KinodataDocked dataset, which I understand has already been curated by RMSD. However, when I look at the structure of the dataset, I cannot see where the bioactivity annotation is; the only thing there is the bioactivity class, which is pIC50. I am wondering whether the actual value is hidden in the nodes or edges. I am just trying to understand the whole dataset.
Finally, I was wondering if those datasets contain information about where each pocket comes from, i.e. which protein it belongs to.
Thanks a lot
Cesar