-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add functionality to remove sites and connections from a topology #761
Conversation
Okay, so my main worry in this is it may render a lot of GMSO operations extremely slow. I'm going to do some time testing in a bit here, but just as an idea of how this may affect some of the GMSO operations. We're essentially storing a lot of redundant information of the topology, so we'll have to keep our eyes peeled if this is a significant impact to other operations. For instance, in a typical polyethylene structure: My suggestion is to add a method to the topology that you can call on. def get_connections_by_site(self, site, connsList=None):
if connsList is None:
connsList = ["bonds", "angles", "dihedrals", "impropers"]
for connStr in connsList:
countsDict[connectStr] = []
for conn in getattr(self, connStr):
if site in conn.connection_members:
countsDict[connStr].append(conn] From here, if you wanted to delete a site, then you just get its connections and do self.remove_connection(connection). I'm happy to hear more feedback, but I think as a whole it is much preferred to modify the topology on the mBuild side, where we don't have a ton of other overhead attached to the topology. Then once things are in GMSO, modifying the structure should be a rare occurrence. I also want to avoid slowing down building up the topology as much as possible, so we can keep the writers as streamlined and efficient as possible. |
Time testing:
So the big issue timewise seems to be a 10X reduction in speed for the polymer case when going from mBuild to GMSO. Although that may get worse because site.connections is not being updated after identify_connections is run. |
These are good points @CalCraven! Thanks for doing some performance analysis as well. While I like the idea of having quick access to connections from a site's perspective, I'm not totally set on doing it the way the PR currently does (automatically doing the book-keeping behind the scenes). I also like your idea of using something like the
It should be getting updated after running For example in the below.
|
I agree about avoiding modifying the topology on the GMSO side, but I think one common use case of doing this is removing hydrogen atoms. If we do that on the mBuild side before converting to a GMSO topology, then I believe SMARTS matching no longer works. |
Ok, I just removed the automatic book-keeping on a site's connections and instead added the Here is the updated gist showing what the workflow might look like if someone wants to remove sites and their connections. Let me know what you think. Once we all agree to the general direction of this, I can finish unit tests, doc strings, etc.. |
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #761 +/- ##
==========================================
+ Coverage 91.85% 91.89% +0.03%
==========================================
Files 66 66
Lines 6702 6733 +31
==========================================
+ Hits 6156 6187 +31
Misses 546 546
☔ View full report in Codecov by Sentry. |
Generally speaking this looks great.
I see this as a possible workflow, but maybe not a preferred one. A better way would be to use a forcefield with no hydrogens. SMARTS should work fine if there are no hydrogens. For instance, the SMARTS string for methane may go from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe you could put some tests together now. We definitely need a check that site is in self._sites, and raise an Error if it's already not there. I also think we could make some of these methods private, so users aren't accessing them all the time. I'd like to hear some feedback on that though. I'm generally of the opinion that Topology
is too bulky in terms of methods and attributes, and probably could be cleaned up so only the most important user methods are exposed.
That's a good point about changing the SMARTS strings to not include hydrogens, but this may be unwieldy for someone using a large built in forcefield like oplsaa, or gaff via GAFF-Foyer. I just don't think the barrier to running united atom simulations with an "off-the-shelf" forcefield should be that high. Whether or not that is a good idea or not is up to the user IMO. I think the complete workflow as intended works out nicely, at least with Hoomd. In the example of the gist, if I apply a forcefield, remove hydrogens, then write out the hoomd forcefield objects, I get exactly what I would expect. i.e. Only the carbon atom types are in the LJ parameters, and the angle and dihedral potentials that would have been there in the all-atom system are not written out after removing hydrogens. |
@chrisjonesBSU That's fair, I'm not very familiar with coarse-graining. I don't think I would trust an off the shelf force field developed WITH hydrogens, and expect it to still work when removing the hydrogens and setting all of the charges to 0. But maybe I'm paranoid and this has been well tested somewhere? |
Well, I'm not trying to make the argument that using AA force fields with UA systems is generally a good practice, and tested. But, I think having a workflow of starting with an AA forcefield and modifying parameters on the fly to fit whatever UA system and/or properties you're studying is a lot more approachable than first having to create a new xml file and SMARTS strings. I've come across a few papers where they essentially do this, and it is a great example of how something like gmso can be used as opposed to editing parameters in text files. |
@CalCraven After re-reading your comment in the issue, I replaced the I'll start working on some unit tests and doc strings. |
You can disregard this (see comment below) This should be good to go. There is still possibly one thing to consider. How do we want to handle it when someone uses |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you Chris!
I added a minor comment and some questions.
I must have overlooked this (regarding my last comment about raising errors). I made it so that an error is raise by these I agree about not making the Topology class too bulky, but IMO it seems inconsistent to make the |
I have a question regarding expected behavior of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Regarding my previous question, I think since the Bond is not a pre-requisite for the creation of Angle/Dihedral/Improper, I think the current behavior should be fine (but it's open for discussion if anyone have any opinion about it).
I am rerunning the failed bleeding test, need to see if that's something we need to address before merging this PR. |
I think it's a good question. Really, the goal of these new methods isn't to encourage people to start modifying the topology in GMSO. My thought behind the Ultimately, I agree with @CalCraven that maybe these methods should be private along with the |
This is still a work in progress, but this PR does a couple things:
1. Within the
Site
class, we can access all the connections a site is directly involved in.Site.connections
property is updated anytimeTopology.add_connection
is used as well as any timeTopology.remove_connection
is used.EDIT: After some conversation; this PR instead adds a method to
Toplogy
that returns a dict of all the connections that belong to a site.2. Adds methods for removing a site and a connection from a topology.
This addresses #736 and #161 and should replace PR #370
This is still a WIP, I need to add unit tests and doc strings, and definitely need some feedback in terms of syntax, functionality and to make sure I'm following good pydantic practices. I'm interested what everyone else thinks.
Here is a gist showing an example.