-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose properties from model other than forces and energies? #155
Comments
The model you provide to OpenMM-Torch has energy as its only output, or possibly also forces if you request that. By construction there are no other outputs. You might have implemented the model using code that can also compute other quantities, but they aren't returned. |
Thank you, good point. My question was a bit ill-phrased. If the underlying I'd love to use OpenMM and for my use case having access to other model outputs will be important. class ForceModule(torch.nn.Module):
"""A central harmonic potential that computes energy and writes my_property."""
def forward(self, positions):
self.my_property = torch.mean(positions)
return torch.sum(positions**2) |
Currently there's no feature to expose that. It's an interesting idea. Can you describe your use case and how you would want it to work? What would you imagine the API looking like? |
I'm working on materials science applications where magnetic, dielectric, and electronic properties are often essential. Incorporating these properties directly into MLFFs is still new, CHGnet is an early example, but it's increasingly common. I'd like to be able to observe and record those properties as the trajectory evolves. The core problems that I foresee are allowing for asynchronous IO and supporting custom outputs, which is necessary since we don't know what features the model will be predicting. There are a couple approaches I can imagine, these are just loose thoughts and could be mixed and matched. 1. Just leaving it to the user.I could just write files in the forward pass of the ForceModule. pros:
2. Saving additional attributes as properties of the ForceModule and exposing the ForceModule through the TorchForce with a getter method.class ForceModule(torch.nn.Module):
"""A central harmonic potential that computes energy and writes my_property."""
def forward(self, positions):
self.my_property_list = self.my_property_list.append(torch.mean(positions))
return torch.sum(positions**2) simulation.step(100)
torch_force = system.getForce(0)
module = torch_force.getForceModule()
my_property = module.my_property_list
len(my_property_list) == 100 Instead of appending we could instead overwrite the property at each step but then we'd need to do something like this my_property_list = []
for _ in range(100):
simulation.step(10)
torch_force = system.getForce(0)
module = torch_force.getForceModule()
my_property_list.append(module.my_property) Pros:
3. Exporting the properties in the forward pass and exposing them with a TorchReporterclass ForceModule(torch.nn.Module):
"""A central harmonic potential that computes energy and writes my_property."""
def forward(self, positions):
# should still support returning energy or force tensors instead of
# a dict for backwards compatibility / simplicity
return {energy: torch.sum(positions ** 2), my_property: torch.mean(positions)}
class MyReporter(TorchReporter):
def report(self, file, reportInterval):
self.forward_pass_output = # would need to somehow get access to
# the output of the forward pass
# my custom IO code
This is just a loose idea, but having some way to asynchronously write the data would be nice. pros:
|
The complication is that TorchForce is a C++ class. The Python class is just a thin wrapper around it. When you create a TorchForce, it serializes your module to a stream of bytes and reconstructs a new module on the C++ side. From that point on there's no longer any connection to the original module. It also has been compiled to TorchScript and has no access to the Python interpreter. It can only contain operations that are supported by TorchScript. So we need to think of this in terms of the C++ API. For example, we might allow the module to return extra outputs, and add a |
Got it. I admit C++ is not my expertise. In CHGnet, magnentic moments are calculated by the forward pass of the model so having a separate Returning a numpy array would make sense to me. OpenMM already returns numpy arrays elsewhere in it's API and it can be converted to a Tensor if needed. Well I personally don't see a use case for backpropagating through the additional property tensors. EDIT: on second thought, if it's possible to return the tensors, it's probably best not to throw away the derivative information. Alternatively, additional outputs could be periodically written to an H5 file with some sort of |
In support of @orionarcher's request, I think it would be very beneficial to have the ability to access extra properties through OpenMM. Various dielectric properties can be predicted from a machine learning model trained on energy, forces, polarization, polarizability, and Born charges. For instance: The inclusion of the electric-field contributions could be done directly in the OpenMM interface. Let U be the energy, E the electric field, P the polarization, α the polarizability, F the forces, e the electron charge, and Z the Born charges. The model is trained to predict quantities in the absence of the field, namely U(0), P(0), F(0), α, Z. Then, the field-dependent electronic structure is determined as follows: It would be ideal to have the flexibility to define the electric field E in OpenMM at each time step of the MD simulation with an arbitrary time-dependent expression. This, for instance, can be used to study ferroelectric hysteresis using a sinusoidal electric field. More can be found in this paper, which describes the ML model and the LAMMPS interface following this idea. Having the flexibility in OpenMM to handle extra quantities goes beyond just the scope of dielectric and ferroelectric properties of materials; it could be applied to a variety of response functions to external perturbations. It would be a fantastic addition to the code, and I hope you will consider this point relevant and urgent. |
Can you suggest what an API for that might look like? What parts of the calculation would be done by OpenMM, what parts would be internal to the PyTorch model, and what parts would be done at a higher level external to both of them (such as a Python script)? |
class ForceModule(torch.nn.Module):
"""Example of how a user would implement their model with additional properties"""
def __init__(self, model: torch.nn.Module, electric_field: torch.Tensor):
self.model = model
self.electric_field = electric_field
def forward(self, positions):
# Calculate energy (required)
forces, born_charges = self.model(positions)
electric_forces = self.model.compute_forces_based_on_field(forces, born_charges, self.electric_field)
total_forces = forces + electric_forces
return total_forces
def compute_outputs(
self, positions, calculate_polarizability: bool,calculate_polarization: bool
) -> dict[str, torch.Tensor]:
polarizability, polarization = self.model.calculate_electric_properties(
positions, calculate_polarizability=calculate_polarizability, calculate_polarization=calculate_polarization
)
return {"polarizability": polarizability, "polarization": polarization}
from openmmtorch import computeOutputs
# 3. New Reporter for Property Recording
class TorchElectricReporter:
"""Reporter for recording model properties during simulation"""
def __init__(self, file: str, reportInterval: int, calculate_polarization: bool):
self.file = file
self._reportInterval = reportInterval
self.calculate_polarization = calculate_polarization
def describeNextReport(self, simulation):
"""Returns information about the next report"""
steps = self._reportInterval - simulation.currentStep % self.reportInterval
return (steps, False, False, False, False, True)
def report(self, simulation):
"""Records the specified properties to the output file"""
outputs: dict[str, torch.Tensor] = computeOutputs(simulation.context, self.calculate_polarization)
# user defined code for writing out the outputs to a file What about something like the above? The key API modifications are:
This allows us to define a new reporter that calls Tracking the born charges and the electric field can all be handled in the forward pass without any modification to the current API. The calculation of additional outputs is handled by |
Thanks for sharing this @orionarcher , some API like the one you suggested should work! To your questions @peastman : |
@orionarcher an API like what you suggest would be very easy to implement. In that design, OpenMM doesn't really do anything with the extra outputs except return them. It's entirely up to you to write your own code to do something with them. Is that sufficient for your needs? @falletta it sounds like you're asking for something more than that. I'm not completely clear on the details.
What does "contains" mean? Are they inputs to the model? Outputs from it? Tensors that are stored inside the model but can be modified? Something else? If you mean they're tensors, how would their values be set?
What piece of code computes the contributions from them? Is it part of the PyTorch model, or the TorchForce class? If the latter, what is the reason for not doing it inside the model? How would you tell TorchForce what calculations to do?
Does that mean the electric field is an input to the model? How would the user specify it? |
That would meet my needs. I think it's fine to ask the user to perform additional operations as long as the necessary information can be exposed. The main downside is that any IO in the |
Hi @peastman. If the proposed solution is acceptable, is there any chance it'll be implemented in the next couple months? |
I was waiting for @falletta to reply. It sounds like his needs are different than yours. |
Hi @peastman, apologies for my late reply. In response to your questions:
I hope this helps clarifying the idea. The solution proposed by @orionarcher should work. More details can be found in this paper (https://arxiv.org/abs/2403.17207). Happy to catch up over a zoom call to further clarify the implementation details. |
Is that given by equation 2 in the paper, with the polarization given by equation 8? Does it require any outputs of the model to compute the energy, or are all the quantities in that equation precomputed, time-invariant values? |
Say E is the OpenMM electric field, which is set by the user (constant or sinusoidal, etc..). At each frame, the ML model gives in output energy(E=0), forces(E=0), born_charges(E=0), polarizability(E=0) in the limit of zero electric field. These quantities at E=0 are time dependent, as different atomic configurations would result in different electronic structure. Then, the equations that need to be implemented in the OpenMM interface are:
In the presence of the electric field E, the forces(E) are used to evolve the OpenMM dynamics at each MD step, and require the calculation of Born charges (Eq. 7). The calculated Polarization(E) can be used, for instance, to generate ferroelectric hysteresis plots and requires the calculation of polarizability. In the absence of the electric field, the dynamics of polarization(E=0) can be used to determine the IR activity, the dynamics of polarizability(E=0) can be used for Raman activity. Note that, in the case of E=0, even if the model has been trained on Born charges, the OpenMM interface does not need to process the model's Born charges to study IR and Raman activity. Being able to toggle the extraction of Born charges from the model's output on or off when needed can significantly enhance the efficiency of OpenMM MD simulations. |
What about computing the extra energy term in pytorch? Once you've called the main model to compute the charges and polarizations, it should be very little additional code to add it in before returning the energy. If you want the electric field to be time varying, you can make it a global parameter. |
Some MLFF models expose properties other than the forces and energies. For example, CHGnet also predicts the magnitude of the magnetic moment. Is there a way to access these values through OpenMM or OpenMM-torch while the simulation is running?
The text was updated successfully, but these errors were encountered: