Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lammps Writer Changes For Updated Topology Class #701

Merged
merged 44 commits into from
Jul 7, 2023

Conversation

CalCraven
Copy link
Contributor

@CalCraven CalCraven commented Nov 29, 2022

This PR aims to refactor the current lammpswriter, which was questionably functional in its previous state and not entirely compatible with modern changes to a topology.

In terms of making sure the writer is nice and flexible. I'm pulling out some of the functions for conversions and making those available in the Topology class. That way users can make sure they know what functional forms are being used before writing, and there isn't a lot of conversions happening under the hood, which causes mistakes when people aren't sure how something is working.

See the relevant tests in test_lammpswriter.py for the unit_styles and atom_styles we aim to support. Lot's of different functional forms are technically possible, most of which are actually set in the input lammps file, not in the data file. So we don't currently have in place a lot of testing for them.

Current workflow:

1. Read in topology and forcefield.
2. Apply forcefield to topology.
3. Massage topology using unit and functional form conversions.
4. Write to relevant engine (raising errors if that engine doesn't support the given style).

PR Checklist


  • Includes appropriate unit test(s)
  • Fix lammpswriter reader
  • Add for different atom styles
  • Add in testing for different unit styles
  • Validate scaling with topology sizes
  • Address TODOs
  • Appropriate docstring(s) are added/updated
  • Code is (approximately) PEP8 compliant
  • Code raises useful errors and warnings
  • Issue(s) raised/addressed?

@CalCraven CalCraven self-assigned this Nov 29, 2022
@CalCraven CalCraven added WIP work in progress - do not merge core feature labels Nov 29, 2022
@lgtm-com
Copy link
Contributor

lgtm-com bot commented Nov 29, 2022

This pull request introduces 4 alerts when merging 086dbe4 into 4f00c99 - view on LGTM.com

new alerts:

  • 2 for Unused local variable
  • 1 for Unused import
  • 1 for Unreachable code

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog.

@CalCraven
Copy link
Contributor Author

Possible method suggested by @daico007 for iterating through atomtypes and connection types. I will test this for the speedups compared to just iterating through all of the connections during the writing step.

# Create an atom type map to do indexing

atypes = list(topology.atom_types())
atypes_map = dict()
for i, uatype in enumerate(unique_atypes):
    matched_idx = list()
    for idx, atype in enumerate(atypes):
        if atype == uatype:
            atypes_map[atype] = i
            matched_idx.append(idx)
    for idx in reversed(matched_idx):
        atypes.pop(idx)

# Make connection types map to extract index later
uc_types = topology.connection_types(PotentialFilters.UNIQUE_NAME_CLASS)
c_types = [ctype for ctype in topology.connection_types]
c_typesmap = dict()
for uc_type in uc_types:
    matched_idx = list()
    for idx, c_type in enumerate(c_types):
        if c_type == uc_type:
            c_typesmap[c_type] = uc_type
            matched_idx.append(idx)
    for idx in reversed(matched_idx):
        c_types.pop(idx)

@codecov
Copy link

codecov bot commented Nov 29, 2022

Codecov Report

Patch coverage: 91.45% and project coverage change: -0.07 ⚠️

Comparison is base (2f3c7f3) 91.98% compared to head (27df258) 91.91%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #701      +/-   ##
==========================================
- Coverage   91.98%   91.91%   -0.07%     
==========================================
  Files          67       67              
  Lines        6460     6818     +358     
==========================================
+ Hits         5942     6267     +325     
- Misses        518      551      +33     
Impacted Files Coverage Δ
gmso/formats/mol2.py 100.00% <ø> (ø)
gmso/formats/top.py 88.04% <ø> (ø)
gmso/utils/decorators.py 75.00% <10.00%> (-19.12%) ⬇️
gmso/core/views.py 89.24% <25.00%> (-6.11%) ⬇️
gmso/core/topology.py 96.07% <50.00%> (-0.94%) ⬇️
gmso/utils/misc.py 96.36% <66.66%> (-1.75%) ⬇️
gmso/external/convert_parmed.py 95.13% <88.88%> (-0.36%) ⬇️
gmso/utils/conversions.py 96.40% <94.87%> (-2.15%) ⬇️
gmso/formats/lammpsdata.py 95.94% <95.14%> (+0.51%) ⬆️
gmso/abc/abstract_site.py 94.66% <100.00%> (+0.07%) ⬆️
... and 3 more

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@lgtm-com
Copy link
Contributor

lgtm-com bot commented Nov 29, 2022

This pull request introduces 4 alerts when merging a1d8ba8 into 6bcdfcc - view on LGTM.com

new alerts:

  • 2 for Unused local variable
  • 1 for Unused import
  • 1 for Unreachable code

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog.

gmso/utils/decorators.py Fixed Show fixed Hide fixed
gmso/core/views.py Fixed Show fixed Hide fixed
gmso/formats/lammpsdata.py Fixed Show fixed Hide fixed
gmso/formats/lammpsdata.py Fixed Show fixed Hide fixed
gmso/formats/lammpsdata.py Fixed Show fixed Hide fixed
gmso/formats/lammpsdata.py Fixed Show fixed Hide fixed
@lgtm-com
Copy link
Contributor

lgtm-com bot commented Dec 8, 2022

This pull request introduces 4 alerts when merging 9413d01 into fe321e2 - view on LGTM.com

new alerts:

  • 2 for Unused local variable
  • 1 for Unused import
  • 1 for Unreachable code

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. It looks like GitHub code scanning with CodeQL is already set up for this repo, so no further action is needed 🚀. For more information, please check out our post on the GitHub blog.

@CalCraven CalCraven marked this pull request as draft February 3, 2023 17:59
Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.

gmso/formats/lammpsdata.py Fixed Show fixed Hide fixed
@CalCraven CalCraven requested a review from jaclark5 May 18, 2023 16:02
Copy link
Member

@daico007 daico007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great overall! I only have a few comments about code style for consistency (between units on comment line and actually being written out), and maybe some missing using conversion.
I pulled this PR and reviewed/tested it and things seem to be functional, so I am in favor of merging it soon. There are a few things, as we discussed last time, regarding the necessary of LAMMPS specific harmonic bonds/angles, but I think that can be addressed in a different. Another thing would be using the centralize unit conversing units (since HOOMD and GROMACS is using same/similar functionality.

gmso/utils/conversions.py Outdated Show resolved Hide resolved
Comment on lines +1463 to +1464
for potStr in exp_unitsDict:
potContainer = getattr(self, potStr + "_types")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this method only convert all the potentials. Should we also convert the atoms' position too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the initial goal for this method was just to be used on the parameters of the potential expressions, which is why it's missing things. Pairpotentials come to mind as an example of that. However, I ended up converting the expressions on writing out, instead of upfront in the lammpswriter, and mostly relying on methods that are living in lammpsdata.py because of that. I still think the best way to do the conversion would be:

  1. Grab the set of base UnitSystem. Presumably, this would be from a library, or passed directly to whatever function by the user. I can see a useful added input of also passing units specific to parameters in the topology, such as: {"angle_types":{"theta":u.degrees}}, which would override the value for that UnitSystem.
  2. Determine if the unit changes should happen in place or to a copy of the topology. This would be mostly for speedups, and prevent issues with overwriting whatever you pass to the function.
  3. Iterate through all components of the topology. Each component should be passed to a generic unit_conversion method with the base units and the conversion should be done. These components include: [box, sites, atomtypes, bondtypes, angletypes, dihedraltypes, impropertypes, pairpotentials]. Each of these may have their own specific methods, but should be accessible through a single api.
  4. Once the conversions are made, then the topology attributes can be written out directly as floats with the to_value() method. Rounding should also occur on writing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To summarize, I think this function is not used currently at all. I can either delete it, or leave it as a starting place for the next PR that looks to normalize and standardize all of the unit handling in GMSO.

Comment on lines 87 to -97
for site in top.sites:
site.atom_type = ff.atom_types[site.name]

top.update_atom_types()
site.atom_type = ff.atom_types[site.atom_type.name]

for bond in top.bonds:
bond.bond_type = bond.connection_type = ff.bond_types[
"opls_111~opls_112"
]

top.update_bond_types()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could remove this section completely at this point. It was there when the apply method was not yet available.

gmso/tests/test_conversions.py Outdated Show resolved Hide resolved
Comment on lines +42 to +88
reg = UnitRegistry()
dim = u.dimensions.current_mks * u.dimensions.time
conversion = 1 * getattr(u.physical_constants, "elementary_charge").value
reg.add(
"elementary_charge",
base_value=conversion,
dimensions=dim,
tex_repr=r"\rm{e}",
)
conversion = 1 * getattr(u.physical_constants, "boltzmann_constant_mks").value
dim = u.dimensions.energy / u.dimensions.temperature
reg.add(
"kb", base_value=conversion, dimensions=dim, tex_repr=r"\rm{kb}"
) # boltzmann temperature
conversion = (
4
* np.pi
* getattr(u.physical_constants, "reduced_planck_constant").value ** 2
* getattr(u.physical_constants, "eps_0").value
/ (
getattr(u.physical_constants, "electron_charge").value ** 2
* getattr(u.physical_constants, "electron_mass").value
)
)
dim = u.dimensions.length
reg.add(
"a0", base_value=conversion, dimensions=dim, tex_repr=r"\rm{a0}"
) # bohr radius
conversion = (
getattr(u.physical_constants, "reduced_planck_constant").value ** 2
/ u.Unit("a0", registry=reg).base_value ** 2
/ getattr(u.physical_constants, "electron_mass").value
)
dim = u.dimensions.energy
reg.add(
"Ehartree", base_value=conversion, dimensions=dim, tex_repr=r"\rm{Ehartree}"
) # Hartree energy
conversion = np.sqrt(
10**9 / (4 * np.pi * getattr(u.physical_constants, "eps_0").value)
)
dim = u.dimensions.charge
reg.add(
"Statcoulomb_charge",
base_value=conversion,
dimensions=dim,
tex_repr=r"\rm{Statcoulomb_charge}",
) # Static charge
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be cleaner to put this inside _unit_style_factory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon chatting with @daico007, I think all of this is going to get moved to the units module in an upcoming PR. So, I'll leave this here for now, as it's not worth the effort to move this right before it will all get moved again.

gmso/formats/lammpsdata.py Outdated Show resolved Hide resolved
gmso/formats/lammpsdata.py Outdated Show resolved Hide resolved
gmso/formats/lammpsdata.py Outdated Show resolved Hide resolved
gmso/formats/lammpsdata.py Outdated Show resolved Hide resolved
gmso/formats/lammpsdata.py Outdated Show resolved Hide resolved
Copy link
Member

@daico007 daico007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great overall! I only have a few comments about code style for consistency (between units on comment line and actually being written out), and maybe some missing using conversion.
I pulled this PR and reviewed/tested it and things seem to be functional, so I am in favor of merging it soon (after addressing the failing test and comments above). There are a few things, as we discussed last time, regarding the necessity of LAMMPS specific harmonic bonds/angles json, but I think that can be addressed in a different. Another thing would be using the centralize unit conversing units (since HOOMD and GROMACS is using same/similar functionality.

@daico007
Copy link
Member

Note: we still have one failing tests at

___ TestLammpsWriter.test_lammps_vs_parmed_by_mol[typed_methylnitroaniline] ____
  
      def compare_lammps_files(fn1, fn2, skip_linesList=[]):
          """Check for line by line equality between lammps files, by any values."""
          with open(fn1, "r") as f:
              line1 = f.readlines()
          with open(fn2, "r") as f:
              line2 = f.readlines()
          for lnum, (l1, l2) in enumerate(zip(line1, line2)):
              print(l1, l2)
              print("###############")
              for arg1, arg2 in zip(l1.split(), l2.split()):
                  try:
                      comp1 = float(arg1)
                      comp2 = float(arg2)
                  except ValueError:
                      comp1 = str(arg1)
                      comp2 = str(arg2)
                  if isinstance(comp1, float):
  >                   assert np.isclose(
                          comp1, comp2, 1e-3
                      ), f"The following two lines have not been found to have equality {l1} and {l2}"
  E                   AssertionError: The following two lines have not been found to have equality 18 dihedral types
  E                      and 23 dihedral types
  E                     
  E                   assert False
  E                    +  where False = <function isclose at 0x7ff1877b4d30>(18.0, 23.0, 0.001)
  E                    +    where <function isclose at 0x7ff1877b4d30> = np.isclose
  
  /Users/runner/work/gmso/gmso/gmso/tests/test_lammps.py:37: AssertionError

_parameter_converted_to_float(
improper_type.parameters["k"], base_unyts, cfactorsDict
),
improper_type.parameters["chieq"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
improper_type.parameters["chieq"]
improper_type.parameters["phi_eq"]

Currently, in the PeriodicImproperPotential json, the three variables are k (energy), n (dimensionless) and phi_eq (angle). I think the chieq could be a typo. Also, why don't the n get written out somewhere?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess, this is supposed to write out harmonic improper, in which case, we need to adjust the list in _accepted_potential

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the docs (https://docs.lammps.org/improper_style.html), looks like our PeriodicImproper is called fourier in lammps?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah so the naming is slightly weird, that's on of the reasons I was tagging things with LAMMPS as the potential template name.
As far as I can tell:
GMSO PeriodicImproperPotential = lammps modified improper_style CVFF (not sure if there's a 1-1 conversion here
GMSO HarmonicImproperPotential = 1/2* LAMMPS improper_style harmonic
GMSO FourierTorsionPotential != LAMMPS improper_style fourier (we would probably just need a separate template for it).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The equations for PeriodicImproper and and lammps improper_style fourier, are similar, but missing the phi_eq term.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I am looking back here: https://github.com/mosdef-hub/mbuild/blob/main/mbuild/formats/lammpsdata.py.
It seems like that the current mbuild version support LAMMPS improper harmonic (coming fromstructure.impropers) and cvff (coming from structure.dihedrals where dihedral.improper == True). So I think, in GMSO, that would map GMSO harmonic --> LAMMPS harmonic and GMSO periodic --> LAMMPS cvff.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add a test for this, for some reason it's not being caught at all.

Copy link

@jaclark5 jaclark5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, I only read through the code and made comments. I'll download and perform testing when it passes the tests.

gmso/core/box.py Outdated Show resolved Hide resolved
Parameters
----------
expressionMap : dict, default={}
map with keys of the potential type and the potential to change to

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little confusing and only made sense after looking at the example. Maybe:
Map where the keys represent the current potential type and the corresponding values represent the desired potential type.

"""
from gmso.utils.conversions import convert_topology_expressions

return convert_topology_expressions(self, expressionMap)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why make a method for convert_potential_styles instead of just using convert_topology_expressions. Regardless of whether this method remains, the import should be at the top of the module considering that it is an in-package import and not an optional external dependency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing with @daico007, all of this will get refactored so that the 3 engine writers are using similar methods. This method is analagous to what HOOMD and GROMACS Top writers are currently using locally, so that code will end up being extracted to this topology method.

gmso/core/topology.py Show resolved Hide resolved
gmso/core/topology.py Show resolved Hide resolved
gmso/formats/lammpsdata.py Outdated Show resolved Hide resolved
) * get_units(unit_style, "length")
type_list[i].parameters["epsilon"] = float(
pair.split()[1]
) * get_units(unit_style, "energy")
elif len(pair.split()) == 4:
warnings.warn("Currently not reading in mixing rules")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this should be added then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, there seems to be no direct way to get this from the lammpsdata file. It is usually set in the lammps simulation input file.
pair_style lj/cut rcut
pair_modify mix combining_rule
read_data data.lammps

So I think we're handcuffed here, I think this warning could be changed though to tell the user they need to manually specify the mixing rule as topology.combining_rule = "geometric" or "lorentz". The default in GMSO is lorentz.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This len(pair.split()) == 4 bit is actually referring to the pair_style lj/cut if a specific cutoff is given for the pair https://docs.lammps.org/pair_lj.html. GMSO does not keep track of rcut values as of now, as that is typically part of the input file, not the topology files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to raise this as an issue, to see if someone would want to make a site specific or pair potential with these parameters for them.

)
)
out_file.write("{:d} atoms\n".format(top.n_sites))
if atom_style in ["full", "molecular"]:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will this handle hybrid atom_styles that add bonds and angles to styles such as DPD or ellipsoid (the former from personal use and the latter is in the example directory for lammps).
Also "bond", "angle", and "amoeba" have native bond and angle capability so the lists here need to be updated.

Copy link
Contributor Author

@CalCraven CalCraven Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also "bond", "angle", and "amoeba" have native bond and angle capability so the lists here need to be updated.

The only atom styles supported at this point are the ones we supported in the mbuild lammpswriter. So for now, we just have these from the doc string:

atom_style : str, optional, default='full'
        Defines the style of atoms to be saved in a LAMMPS data file.
        The following atom styles are currently supported: 'full', 'atomic', 'charge', 'molecular'
        see http://lammps.sandia.gov/doc/atom_style.html for more information on atom styles.

How will this handle hybrid atom_styles that add bonds and angles to styles such as DPD or ellipsoid (the former from personal use and the latter is in the example directory for lammps).

Hybrid is going to be interesting, to say the least. According to the lammps read_data docs, hybrid will look like:
atom-ID atom-type x y z sub-style1 sub-style2
So the way to handle that would be for something in the full topology:

if len(atom_style) > 0 and atom_style[0] == "hybrid":
    style_formatsList = get_styleformats(atom_style[1:], top, section_type)
                sectionStr = "" # or put default info here
    for style_format in style_formatsList:
        sectionStr+=style_format(top)
    out_file.write(sectionStr)

Something that you iterate through the topology for might look like:

if len(atom_style) > 0 and atom_style[0] == "hybrid":
    style_formatsList = get_styleformats(atom_style[1:], top, section_type)
    for site in top.sites:
        sectionStr = ""
        for style_format in style_formatsList:
            sectionStr+=style_format(site)
    out_file.write(style_format(site))

where atom_style isn't a string, but a list of styles you pass to the writer function.
i.e. atom_style = ["hybrid", "charge", "full"]

I think this is something that can be added as we add some of these specific styles into the writer.

def _write_site_data(out_file, top, atom_style, base_unyts, cfactorsDict):
"""Write atomic positions and charges to LAMMPS file.."""
out_file.write("\nAtoms\n\n")
if atom_style == "atomic":

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about other styles (e.g. DPD)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be addressed in later PR's on a case by case basis.

expected_units_dim = dict(
zip(["c0", "c1", "c2", "c3", "c4", "c5"], ["energy"] * 6)
)
base_units = u.UnitSystem("atomic", "Å", "mp", "fs", "nK", "rad")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will a user ever need to type the symbol, Å? if so, using this nonstandard character could be a concern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string: "angstrom" should work just as well here and provide equivalency. So we shouldn't run into the issue of being forced to use the Å to create the unyt.Unit.

gmso/formats/lammpsdata.py Fixed Show fixed Hide fixed
gmso/formats/lammpsdata.py Fixed Show fixed Hide fixed
gmso/formats/lammpsdata.py Fixed Show fixed Hide fixed
gmso/formats/lammpsdata.py Fixed Show fixed Hide fixed
gmso/external/convert_parmed.py Fixed Show fixed Hide fixed
gmso/tests/test_lammps.py Fixed Show fixed Hide fixed
gmso/tests/test_lammps.py Fixed Show fixed Hide fixed
Copy link
Member

@daico007 daico007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR is ready for now. There several changes that would optimize/clean things up a bit, but those make more sense in follow up PRs per ongoing discussions.

@daico007 daico007 merged commit 689321b into mosdef-hub:main Jul 7, 2023
11 of 14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants