refactor: `Query` classes and related code #492

DaniBodor · 2023-09-06T13:45:28Z

In this PR:

- `Query` class has been converted into a dataclass and all common arguments are defined in the parent class instead of the children.
- `ProteinProteinInterfaceResidueQuery` and `ProteinProteinInterfaceAtomicQuery` were merged into a single child dataclass of `Query`, as were `SingleResidueVariantResidueQuery` and `SingleResidueVariantAtomicQuery`
  - resolution is now given as an attribute instead of being a separate class
- Moved as much code as possible into parent class
  - unbound functions were unified and made internal methods of parent.
  - unused/obsolete functions were removed:
    - code related to adding hydrogens
    - `_get_atom_node_key` function
    - potentially more that I've lost track of
  - build function is largely defined in parent class, with specifics defined in child class as internal `_build_helper` methods.
    - it feels like build function could be further unified, but my tests doing this kept failing.
- Reorganized module
  - `QueryCollection` at the bottom now
- Update external modules to match this
  - tests
  - docs
  - notebooks
  - etc
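
For readers skimming the description, the structure listed above can be sketched roughly as follows. This is only an illustration under assumptions, not the actual code in query.py: the field names `pdb_path` and `variant_residue_number`, the dictionary returned by `build`, and the bodies of the `_build_helper` methods are hypothetical placeholders. Only the class names, the `resolution` attribute, `chain_ids`, and the split between a parent `build` and child `_build_helper` come from this PR.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Query:
    """Parent dataclass holding the arguments shared by all query types."""

    pdb_path: str  # hypothetical field name
    resolution: str  # "residue" or "atom", an attribute instead of a separate class
    chain_ids: list[str] | str

    def __post_init__(self) -> None:
        if self.resolution not in ("residue", "atom"):
            raise ValueError(f"Invalid resolution: {self.resolution!r}")
        if isinstance(self.chain_ids, str):
            self.chain_ids = [self.chain_ids]

    def build(self) -> dict:
        """Shared build logic; child-specific steps live in `_build_helper`."""
        graph = self._build_helper()
        graph["resolution"] = self.resolution
        return graph

    def _build_helper(self) -> dict:
        raise NotImplementedError("Must be defined in child classes.")


@dataclass
class ProteinProteinInterfaceQuery(Query):
    """One child class covers both resolutions."""

    def _build_helper(self) -> dict:
        # Placeholder for the interface-specific part of the build.
        return {"type": "ppi", "chains": list(self.chain_ids)}


@dataclass
class SingleResidueVariantQuery(Query):
    """One child class covers both resolutions."""

    variant_residue_number: int  # hypothetical field name

    def _build_helper(self) -> dict:
        # Placeholder for the variant-specific part of the build.
        return {"type": "srv", "variant": self.variant_residue_number}
```

With this layout, switching resolution is a matter of passing `resolution="atom"` or `resolution="residue"` to the same class, rather than choosing between two classes.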

Not yet done:

- Unify `build_atomic_graph` and `build_residue_graph` from buildgraph.py (Refactor buildgraph module #506)
- Separate radius from `distance_cutoff` for PPI Queries (feature: separate max edge distance from interaction radius throughout #504)

fixes: #480 and #490

Final TO DO list:

DaniBodor · 2023-11-01T13:23:15Z

I fixed the attribute issue above in 3cd2a8c. Can you take a look and see if this is what you were thinking?

Can you also take a look at the TODOs I put in lines 250 and 335 (same issue) and 462 of query.py and give me your opinion on those.

When that is done, I can finally merge this PR :)

Note:

re the TODO in line 38: I will leave as is now and remove the #TODO
re the TODO in line 525: I made a new issue (Make MapMethod an attribute of GridSettings in deeprank2.grid module #516) for this and will remove the #TODO

DaniBodor · 2023-11-01T14:13:01Z

> Also, should we remove codacy with this PR?

No, let's remove it in a separate PR. This one is already so big and unwieldy that I don't want to add anything that doesn't need to be here.

gcroci2 · 2023-11-01T15:52:29Z

> > Also, should we remove codacy with this PR?
>
> No, let's remove it in a separate PR. This one is already so big and unwieldy that I don't want to add anything that doesn't need to be here.

Done in #517

gcroci2 · 2023-11-01T16:45:29Z

> I fixed the attribute issue above in 3cd2a8c. Can you take a look and see if this is what you were thinking?
>
> Can you also take a look at the TODOs I put in lines 250 and 335 (same issue) and 462 of query.py and give me your opinion on those.
>
> When that is done, I can finally merge this PR :)
>
> Note:
>
> re the TODO in line 38: I will leave as is now and remove the #TODO
>
> re the TODO in line 525: I made a new issue (Make MapMethod an attribute of GridSettings in deeprank2.grid module #516) for this and will remove the #TODO

line 250: I think it's actually fine to throw an error since only 1 chain identifier is needed to identify the chain where the variant is. If more are given then the user hasn't understood the purpose of chain_ids.
line 335: same
line 462: I don't see any TODO.

One final thing: looking at those lines reminded me of something I commented on earlier that got lost along the way. I think it should be clearer that `chain_ids` refers to the chains involved in the definition of the query (e.g., the chain identifier of the variant), but does not exclude other chains that may be present in the pdb files. The radius or cutoff distance may well be large enough to include other chains that are not listed in `chain_ids`, because that variable serves a different purpose. So, for example, in SingleResidueVariantQuery the docstring says "the chain identifier(s) in the pdb file (generally a single capital letter)", but I think it should be something like "the chain identifier of the variant residue in the pdb file (generally a single capital letter)". For ProteinProteinInterfaceQuery, it could be something like "the two chain identifiers of the interface in the pdb file (generally single capital letters)".
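
To make the distinction concrete, here is a minimal sketch, not the code in query.py. The class `VariantQuerySketch`, its `radius` field, the `chains_in_graph` method, and the `(chain, xyz)` atom tuples are all hypothetical and exist only for illustration. It shows the two points discussed here: exactly one chain identifier is accepted for a variant (more raises an error), and other chains can still end up in the graph when they fall within the radius.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class VariantQuerySketch:
    """Hypothetical stand-in for a single-residue-variant query.

    Args:
        chain_ids: The chain identifier of the variant residue in the pdb file
            (generally a single capital letter).
        radius: Distance around the variant within which atoms are kept.
    """

    chain_ids: list[str] | str
    radius: float

    def __post_init__(self) -> None:
        if isinstance(self.chain_ids, str):
            self.chain_ids = [self.chain_ids]
        if len(self.chain_ids) != 1:
            # Only one chain is needed to locate the variant.
            raise ValueError(
                f"Expected exactly 1 chain identifier, got {len(self.chain_ids)}."
            )

    def chains_in_graph(self, atoms, center):
        """Return every chain that has an atom within `radius` of `center`."""
        return {chain for chain, xyz in atoms if math.dist(xyz, center) <= self.radius}


# Chain B is not in chain_ids, yet it is close enough to be included.
atoms = [("A", (0.0, 0.0, 0.0)), ("B", (3.0, 0.0, 0.0)), ("C", (50.0, 0.0, 0.0))]
query = VariantQuerySketch(chain_ids="A", radius=5.0)
included = query.chains_in_graph(atoms, center=(0.0, 0.0, 0.0))
```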

Also, if there are more things to discuss let's just have a meeting because I have the impression that finding stuff in this PR is becoming very complex and inefficient ;)

DaniBodor · 2023-11-03T10:40:36Z

> I fixed the attribute issue above in 3cd2a8c. Can you take a look and see if this is what you were thinking?

Have you looked at this also?

> > Can you also take a look at the TODOs I put in lines 250 and 335 (same issue) and 462 of query.py and give me your opinion on those.
>
> line 462: I don't see any TODO.

This is about error catching in `_process_one_query`:
Right now, if there's an error, it's caught and that query is skipped. My guess is that this was put in place so that it didn't crash due to a handful of faulty pdb files.
My question was whether we want to make this optional and raise an error by default. I think users should generally have clean data to work with and, if not, should proactively set this to ignore errors. What do you think?

> One final thing: looking at those lines reminded me of something I commented on earlier that got lost along the way. I think it should be clearer that `chain_ids` refers to the chains involved in the definition of the query (e.g., the chain identifier of the variant), but does not exclude other chains that may be present in the pdb files. The radius or cutoff distance may well be large enough to include other chains that are not listed in `chain_ids`, because that variable serves a different purpose. So, for example, in SingleResidueVariantQuery the docstring says "the chain identifier(s) in the pdb file (generally a single capital letter)", but I think it should be something like "the chain identifier of the variant residue in the pdb file (generally a single capital letter)". For ProteinProteinInterfaceQuery, it could be something like "the two chain identifiers of the interface in the pdb file (generally single capital letters)".

Clarified in 23357f0


DaniBodor force-pushed the 480_new branch 4 times, most recently from db19653 to fdddf7a Compare September 7, 2023 12:58

DaniBodor closed this Sep 7, 2023

DaniBodor reopened this Sep 8, 2023

DaniBodor mentioned this pull request Sep 8, 2023

refactor: Query classes and related code #489

Closed

DaniBodor linked an issue Sep 8, 2023 that may be closed by this pull request

Is _get_atom_node_key obsolete? #490

Closed

DaniBodor force-pushed the 480_new branch 18 times, most recently from a1f99f3 to 9fe94a5 Compare September 18, 2023 14:33

DaniBodor linked an issue Sep 18, 2023 that may be closed by this pull request

Refactor Query classes #480

Closed

DaniBodor force-pushed the 480_new branch from 9fe94a5 to 0ac01c5 Compare September 18, 2023 15:07

DaniBodor mentioned this pull request Sep 21, 2023

calculate atom pairwise energies (combination between LJ and Coulomb) using OpenMM #501

Closed

DaniBodor force-pushed the 480_new branch from b6b6203 to 0ac01c5 Compare September 22, 2023 11:30

DaniBodor force-pushed the 480_new branch 2 times, most recently from 47b97ee to 3cd2a8c Compare November 1, 2023 13:21

cbaakman approved these changes Nov 2, 2023


gcroci2 mentioned this pull request Nov 2, 2023

Regenerate class diagrams #519

Merged

gcroci2 approved these changes Nov 3, 2023


DaniBodor marked this pull request as draft November 3, 2023 14:02

DaniBodor added 7 commits November 3, 2023 15:38

make _load_pssm_data a method of DeepRankQuery

959f113

make _check_pssm a method of DeepRankQuery

ac42626

update type hinting in query.py to 3.10 format

8519280

move QueryCollection to end of module

99a2c24

refactor QueryCollection

519a960

define separate parent and child build methods

24af527

refactor child specific helper functions of build

3c6df95

DaniBodor force-pushed the 480_new branch from 9158dce to df6477b Compare November 3, 2023 14:38

DaniBodor added 4 commits November 3, 2023 15:51

update linter

b417b9d

prospector suddenly flagged a lot of code that wasn't touched here, probably because I updated my local versions of prospector and pylint. I have now also updated the versions in the toml to match my versions.

update docstrings and error messages

a2640b6

- also list internal `QueryCollection` attributes in init
- also change "atomic" to "atom" to be consistent with "residue" when setting `resolution`
- also remove some TODOs and made some style improvements

make distance cutoff uniform for query types

6480644

rename DeepRankQuery back to Query

c2151e7

DaniBodor force-pushed the 480_new branch from df6477b to c2151e7 Compare November 3, 2023 14:54

DaniBodor marked this pull request as ready for review November 3, 2023 16:22

gcroci2 linked an issue Nov 6, 2023 that may be closed by this pull request

Update prospector and pylint in pyproject.toml #505

Closed

DaniBodor merged commit 97db708 into dev Nov 7, 2023

DaniBodor deleted the 480_new branch November 7, 2023 10:30

DaniBodor mentioned this pull request Nov 14, 2023

feat: improve Trainer and DeeprankDataset logic for production testing #515

Merged

gcroci2 added the SS label Jan 10, 2024
