Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Molecule numbers from AMBER parmtop/inpcrd start with 2 #369

Open
halx opened this issue Feb 15, 2022 · 7 comments
Open

Molecule numbers from AMBER parmtop/inpcrd start with 2 #369

halx opened this issue Feb 15, 2022 · 7 comments

Comments

@halx
Copy link
Collaborator

halx commented Feb 15, 2022

In the code below molecule numbers start from 2. OpenMMMD.py makes the assumption in centerSolute() that the solute is molnum 1. I get this with devel (commit 3371660).

import sys

import Sire.IO


amber = Sire.IO.Amber()
molecules, space = amber.readCrdTop(sys.argv[1], sys.argv[2])

for molnum in molecules.molNums():
    print(molnum.value())

inputs.tar.gz

@lohedges
Copy link
Member

lohedges commented Feb 15, 2022

Thanks @halx, I'll let @jmichel80 comment on this. One other thing that is worrying is that we should be able to specify the perturbable molecule using the perturbed residue number configuration option, so it doesn't need to be the first molecule in the system. (We have switched to using this in BioSImSpace due to some complications with some molecule re-ordering by SOMD.) If this isn't accounted for before calling centerSolute, then the wrong molecule will be centered.

I see the same thing if I let load the files using Sire.IO.MoleculeParser, which will use the more recent Sire.IO.AmberPrm7 and Sire.IO.AmberRst parsers behind the scenes, i.e.:

import Sire.IO

system = Sire.IO.MoleculeParser.read([sys.argv[1], sys.argv[2]])

for molnum in system.molNums():
    print(molnum)

which gives:

MolNum(2)
MolNum(3)
MolNum(4)
MolNum(5)
MolNum(6)
MolNum(7)
MolNum(8)
MolNum(9)
MolNum(10)
...

This means that it's not an issue with the old Sire.IO.Amber parser itself.

I'll check the 2021.1.0 release of Sire to see if the numbering offset is a present there too.

Cheers.

@halx
Copy link
Collaborator Author

halx commented Feb 15, 2022

Thanks. You will find that 2021.1.0 works just fine i.e. start at 1.

As far as OpenMMMD.py goes I understand there is a plan of rewriting it anyway. It suffers from code duplication too (see #324 ).

@chryswoods
Copy link
Member

The MolNum is an internal number that counts the order in which molecules are created in Sire. We now create a new test molecule when we first load Sire. This is MolNum(1). This means that the next molecule loaded (e.g. the first from an Amber file) will be MolNum 2.

The MolNum must be unique for each molecule in Sire. If you load a file twice, then the first molecule will have a MolNum of 1 higher than the number of molecules already loaded.

This means that you should not rely on the MolNum having any specific value (e.g. 1). Instead, if you need a number "1" for an external tool, you must create a dictionary that maps from MolNum to the number you want, and then use that number.

@chryswoods
Copy link
Member

In your case, a better approach would be to assume that the solute is the molecule with the lowest MolNum, as this would have been loaded first from the file.

@lohedges
Copy link
Member

Thanks, @chryswoods. I had assumed that it was the new test molecule that was causing the offset. I also agree that you should never rely on a molecule having a specific number. As you say, remapping is the way to go and is what I've needed to do to preserve molecule ordering in BioSimSpace.

I still think that it shouldn't be using MolNum(1) at all, rather using the value of the MolNum that matches the specified perturbed residue number, which is what is now used by BioSimSpace.

As for refactoring: Yes, there is a lot of duplication in SOMD (both the C++ and Python code) and it would be great if it is possible to consolidate some of this during you PME-FEP approach. Unfortunately, there has never been time to do this, and since I was never involved in the original development of SOMD I have been concerned about changing too many things for fear of unintended consequences. As you say, it means that you currently need to edit code in many places. This was a pain when adding triclinic box support, for example, since I couldn't just implement the code in a single base class.

@lohedges
Copy link
Member

Looking at OpenMMD.py, the centerSolute function is called after `createSystemFreeEnergy., which reorders molecules, I think placing the solute first, i.e.:

Sire/wrapper/Tools/OpenMMMD.py

Lines 1412 to 1415 in 9e035fa

system = createSystem(molecules)
if center_solute.val:
system = centerSolute(system, space)

It would be good to check if the system that is passed to centerSolute is sane, i.e. solute first with MolNum(1), rather than the system that was directly loaded from the Amber parser. (Perhaps you have changed things in your branch, so this is no longer the case.)

@jmichel80
Copy link
Contributor

tagging @annamherz as I think she has been running into this issue as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants