Parsing molecules #42

Open
dominiquesydow opened this issue Aug 26, 2020 · 3 comments
Labels
module-io Concerns opencadd.io module

Comments

@dominiquesydow
Contributor

dominiquesydow commented Aug 26, 2020

I am wondering if we want to set up a module that allows us to parse structural data (proteins, ligands, ...) from different file formats (mol2, pdb, ...) to different output formats (mdanalysis universes, biopandas DataFrames, rdkit molecules, ...).

What we currently have:

https://github.com/volkamerlab/opencadd/blob/master/opencadd/structure/core.py
mdanalysis universes from a plethora of input formats (super powerful).

https://github.com/volkamerlab/opencadd/blob/databases_klifs_api/opencadd/databases/klifs/parser.py
biopandas DataFrames and rdkit molecules from mol2 files or from mol2 text (mol2 file content, e.g. when queried from a database).
I use this data structure for the opencadd.databases.klifs and opencadd.structure.subpocket modules as well as in the kissim and ratar projects.
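To make the "mol2 text to table" idea concrete, here is a minimal, standard-library-only sketch that parses the `@<TRIPOS>ATOM` section of mol2 text into row records (one dict per atom). The function name `parse_mol2_atoms` and the column names are hypothetical; this is not the API of the linked `parser.py`. It assumes whitespace-separated atom lines with at least the six mandatory Tripos columns (atom id, atom name, x, y, z, atom type).

```python
from typing import Dict, List, Union

# One row per atom; values are ints, floats or strings depending on the column.
Record = Dict[str, Union[int, float, str]]


def parse_mol2_atoms(mol2_text: str) -> List[Record]:
    """Hypothetical helper: turn the @<TRIPOS>ATOM section of mol2 text into row records.

    Only the ATOM record type is read; all other sections are skipped.
    """
    records: List[Record] = []
    in_atom_section = False
    for line in mol2_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            # A new record type starts here; only the ATOM section is parsed.
            in_atom_section = stripped == "@<TRIPOS>ATOM"
            continue
        # Skip lines outside the ATOM section, blank lines and comments.
        if not in_atom_section or not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        # The first six columns are mandatory in a Tripos mol2 ATOM line.
        record: Record = {
            "atom_id": int(fields[0]),
            "atom_name": fields[1],
            "x": float(fields[2]),
            "y": float(fields[3]),
            "z": float(fields[4]),
            "atom_type": fields[5],
        }
        # Optional trailing columns: substructure id, substructure name, partial charge.
        if len(fields) > 6:
            record["subst_id"] = int(fields[6])
        if len(fields) > 7:
            record["subst_name"] = fields[7]
        if len(fields) > 8:
            record["charge"] = float(fields[8])
        records.append(record)
    return records


# Small hand-written example (a water molecule), e.g. as returned by a database query.
EXAMPLE_MOL2 = """\
@<TRIPOS>MOLECULE
water
 3 2 0 0 0
SMALL
NO_CHARGES

@<TRIPOS>ATOM
      1 O1     0.0000    0.0000    0.0000 O.3     1  HOH1    0.0000
      2 H1     0.9572    0.0000    0.0000 H       1  HOH1    0.0000
      3 H2    -0.2400    0.9266    0.0000 H       1  HOH1    0.0000
@<TRIPOS>BOND
     1     1     2    1
     2     1     3    1
"""

if __name__ == "__main__":
    for row in parse_mol2_atoms(EXAMPLE_MOL2):
        print(row["atom_id"], row["atom_name"], row["atom_type"], row["x"], row["y"], row["z"])
```

With pandas installed, `pandas.DataFrame(parse_mol2_atoms(text))` would give one row per atom; on the molecule side, RDKit's `Chem.MolFromMol2Block` reads the same mol2 text.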

Such a module could live at the top level, e.g. opencadd.parser.

@jaimergp - have you thought of something like this already?
Definitely something we could discuss with the whole group to collect everybody's needs.

@jaimergp
Contributor

Yep, we will need opencadd.io eventually. We could load explicitly with the parsers therein, or subtly delegate the calls from the main core objects (structure, compound, dataset, etc.). Ideally, we would not need many different objects and could all converge on a single object model that works across the different levels. More realistically, we will have to set up exporters to deal with the different package needs...
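The two options mentioned here (loading explicitly through parsers in opencadd.io versus delegating from the core objects) can be sketched with a small registry keyed by (input format, output format). Everything below (`register`, `parse`, `Structure`, the toy XYZ reader) is hypothetical and not existing opencadd code; the XYZ reader merely stands in for real mol2/pdb parsers.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical registry mapping (input format, output format) to a parser function.
ParserFunc = Callable[[str], Any]
_PARSERS: Dict[Tuple[str, str], ParserFunc] = {}


def register(input_format: str, output_format: str) -> Callable[[ParserFunc], ParserFunc]:
    """Decorator that registers a parser for one input/output format pair."""
    def decorator(func: ParserFunc) -> ParserFunc:
        _PARSERS[(input_format, output_format)] = func
        return func
    return decorator


def parse(text: str, input_format: str, output_format: str) -> Any:
    """Explicit entry point: look up the matching parser and call it."""
    try:
        parser = _PARSERS[(input_format, output_format)]
    except KeyError:
        raise ValueError(
            f"no parser registered for {input_format!r} -> {output_format!r}"
        ) from None
    return parser(text)


@register("xyz", "records")
def xyz_to_records(text: str) -> List[Tuple[str, float, float, float]]:
    """Toy parser for plain XYZ: atom count, comment line, then 'element x y z' lines."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    atoms = []
    for line in lines[2 : 2 + n_atoms]:
        element, x, y, z = line.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    return atoms


class Structure:
    """Hypothetical core object that delegates parsing to the registry."""

    def __init__(self, atoms: List[Tuple[str, float, float, float]]) -> None:
        self.atoms = atoms

    @classmethod
    def from_text(cls, text: str, input_format: str) -> "Structure":
        # Delegation: callers never touch the parser functions directly.
        return cls(parse(text, input_format, "records"))


WATER_XYZ = "3\nwater\nO 0.0 0.0 0.0\nH 0.9572 0.0 0.0\nH -0.24 0.9266 0.0\n"

if __name__ == "__main__":
    print(parse(WATER_XYZ, "xyz", "records"))           # explicit route
    print(Structure.from_text(WATER_XYZ, "xyz").atoms)  # delegated route
```

In this sketch, supporting another package's needs would mean registering one more function per format pair (for instance an exporter to a DataFrame), rather than introducing a new object model.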

We can schedule this as part of the discussion, possibly roping in the members of @openkinome/kinoml as well.

@dominiquesydow
Contributor Author

That sounds fantastic!

I am unsure how to proceed until we have the envisioned, quite powerful opencadd.io module.

I was planning on using the classes Mol2ToDataFrame and, later on, PdbToDataFrame from here quite a lot in the modules opencadd.databases.klifs, opencadd.structure.subpocket, kissim, and ratar.

Shall I already set up an opencadd.io module with that minimum set of classes, which will be refactored/generalized as soon as we get started with the real deal?

@jaimergp
Contributor

Yes, please add them there so at least they are in the same place. Something like opencadd/io/dataframes.py should work to contain those functions!

@dominiquesydow dominiquesydow added the module-io Concerns opencadd.io module label Sep 8, 2021

2 participants