SGNNForce/EANNForce

1. Theory

SGNNForce/EANNForce provides support for a message-passing GNN model (called SGNN) and a descriptor-based machine-learning model built on a density vector of Gaussian-type orbitals (called EANN).

1.1 SGNN

SGNN assumes that the remaining bonding energy can be written as a sum over local fragments of the molecule. These fragments are defined as "subgraphs" (labeled as $g$):

$$ E_{sGNN}=\sum_{g} E_{g} $$

Each subgraph defines the local environment of a central bond, and $E_g$ represents the intramolecular energy attributed to that bond. This leads to a rigorously localized representation of the molecule, which ensures that the resulting model can be extended to larger molecules.
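The sum over subgraphs can be sketched in plain Python as follows. This is an illustration only, not DMFF code: the helper names `subgraph_atoms` and `total_energy`, the toy per-subgraph energy, and the reading of a subgraph as "all atoms within `nn` bonds of either end of the central bond" are assumptions made for the example.

```python
from collections import deque


def subgraph_atoms(bonds, central, nn):
    """Atoms within `nn` bonds of either end of the central bond.

    This subgraph definition is an assumption made for the sketch.
    """
    # Build an adjacency list from the bond list.
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    # Breadth-first search starting from both atoms of the central bond.
    dist = {central[0]: 0, central[1]: 0}
    queue = deque(central)
    while queue:
        u = queue.popleft()
        if dist[u] == nn:
            continue  # do not expand beyond nn bonds
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sorted(dist)


def total_energy(bonds, nn, subgraph_energy):
    """E_sGNN = sum of E_g over all subgraphs g (one subgraph per bond)."""
    return sum(
        subgraph_energy(bond, subgraph_atoms(bonds, bond, nn)) for bond in bonds
    )


# Linear chain 0-1-2-3-4.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4)]


def toy_energy(bond, atoms):
    """Placeholder for a trained network: here just the subgraph size."""
    return float(len(atoms))


print(subgraph_atoms(bonds, (1, 2), nn=1))
print(total_energy(bonds, nn=1, subgraph_energy=toy_energy))
```

In a real model the placeholder `toy_energy` would be replaced by the message-passing network evaluated on the geometry of each subgraph.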

Further details can be found in the References.

1.2 EANN

The EANN framework grew out of the embedded atom method (EAM). This physically inspired embedded atom neural network (EANN) representation is conceptually and numerically simple, and also efficient and accurate. The EAM assumes that an impurity experiences a locally uniform electron density, so that the embedding energy can be approximated as a function of the scalar local electron density at the impurity site plus an electrostatic interaction. Considering all atoms in the system as impurities embedded in the electron gas created by the other atoms, the total energy of an $N$-atom system in the EAM framework is just the sum over all individual impurity energies:

$$ E=\sum_{i=1}^{N} E_{i}=\sum_{i=1}^{N}\left[F_{i}\left(\rho_{i}\right)+\frac{1}{2} \sum_{j \neq i} \phi_{i j}\left(r_{i j}\right)\right] $$

where $F_i$ is the embedding function, $\rho_i$ is the embedded electron density at the position of atom $i$ given by the superposition of the densities of the surrounding atoms, and $\phi_{ij}$ is the short-range repulsive potential between atoms $i$ and $j$, which depends on their distance $r_{ij}$. As the exact forms of these functions are generally unknown, they are often taken from electron gas computations or fitted to experimental properties with semiempirical expressions. Given these intrinsic approximations, the EAM, and even its modified version, has limited accuracy and is mainly suitable for metallic systems.
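The EAM energy expression above can be written out directly. The sketch below is a minimal illustration with made-up functional forms: the embedding function, the per-neighbor density contribution, and the pair potential are placeholders chosen only so the code runs, not forms used by any real EAM parameterization.

```python
import math


def eam_energy(positions, embed, density, pair):
    """E = sum_i [ F(rho_i) + 1/2 * sum_{j != i} phi(r_ij) ].

    rho_i is the superposition of the density contributions of all
    other atoms j at the position of atom i.
    """
    n = len(positions)
    total = 0.0
    for i in range(n):
        rho_i = 0.0
        pair_i = 0.0
        for j in range(n):
            if j == i:
                continue
            r_ij = math.dist(positions[i], positions[j])
            rho_i += density(r_ij)
            pair_i += pair(r_ij)
        total += embed(rho_i) + 0.5 * pair_i
    return total


# Placeholder functional forms (assumptions for illustration only).
def toy_density(r):
    return math.exp(-r)


def toy_embed(rho):
    return -math.sqrt(rho)


def toy_pair(r):
    return math.exp(-2.0 * r)


dimer = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
e_dimer = eam_energy(dimer, toy_embed, toy_density, toy_pair)
print(e_dimer)
```

The factor 1/2 compensates for each pair being visited twice, once from atom $i$ and once from atom $j$.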

To go beyond the EAM, we need to improve both the expression of the embedded density and the function $F$. To this end, EANN starts from the commonly used Gaussian-type orbitals (GTOs) centered at each atom,

$$ \phi_{l_{x} l_{y} l_{z}}^{\alpha, r_{s}}=x^{l_{x}} y^{l_{y}} z^{l_{z}} \exp \left(-\alpha\left|r-r_{s}\right|^{2}\right) $$

where each atom is taken as the origin, $\mathbf{r}=(x,y,z)$ is the coordinate vector of an electron, $r$ is the norm of that vector, $\alpha$ and $r_s$ are parameters that determine the radial distributions of the atomic orbitals, and $l_x+l_y+l_z=L$ specifies the orbital angular momentum $L$; for example, $L$ = 0, 1, and 2 correspond to the s, p, and d orbitals, respectively. In this representation, the embedded density of atom $i$ can be taken as the square of the linear combination of atomic orbitals from neighboring atoms, in a spirit similar to Hartree–Fock (HF) and density functional theory (DFT). This would generate a scalar $\rho_i$ value for the embedding atom $i$, as used in the EAM, which has been proven to offer insufficient representability for the total energy and can be improved by including the gradients of the density.
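The density construction described above can be sketched in plain Python. Two points go beyond the text and are assumptions of this sketch: each $(l_x, l_y, l_z)$ term is weighted by the multinomial factor $L!/(l_x!\,l_y!\,l_z!)$, as in the EANN paper, which is what makes the result rotationally invariant, and no cutoff function is applied. The function names and numerical values are illustrative only.

```python
import math
from itertools import product


def gto(vec, lx, ly, lz, alpha, r_s):
    """Gaussian-type orbital evaluated at the relative position `vec`."""
    x, y, z = vec
    r = math.sqrt(x * x + y * y + z * z)
    return x**lx * y**ly * z**lz * math.exp(-alpha * (r - r_s) ** 2)


def embedded_density(rel_vecs, coeffs, L, alpha, r_s):
    """Square of the linear combination of neighbor orbitals, summed over
    all (lx, ly, lz) with lx + ly + lz = L.

    The multinomial weight is an assumption taken from the EANN paper.
    """
    rho = 0.0
    for lx, ly, lz in product(range(L + 1), repeat=3):
        if lx + ly + lz != L:
            continue
        weight = math.factorial(L) / (
            math.factorial(lx) * math.factorial(ly) * math.factorial(lz)
        )
        amplitude = sum(
            c * gto(v, lx, ly, lz, alpha, r_s) for c, v in zip(coeffs, rel_vecs)
        )
        rho += weight * amplitude**2
    return rho


def rotate_z(vec, angle):
    """Rotate a vector about the z axis."""
    x, y, z = vec
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


# Neighbor positions relative to the central atom, with one coefficient each.
neighbors = [(1.0, 0.2, -0.3), (-0.5, 0.8, 0.4), (0.1, -0.9, 0.7)]
coeffs = [1.0, 0.5, -0.7]
rotated = [rotate_z(v, 0.7) for v in neighbors]

for L in (0, 1, 2):
    before = embedded_density(neighbors, coeffs, L, alpha=0.5, r_s=1.0)
    after = embedded_density(rotated, coeffs, L, alpha=0.5, r_s=1.0)
    print(L, before, after)
```

Evaluating this for several values of $L$, $\alpha$, and $r_s$ yields a vector of density components per atom rather than a single scalar, which is the kind of density vector the EANN descriptor is built from.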

Further details can be found in the References.

References

  1. SGNN in JPCL: J. Phys. Chem. Lett. 2021, 12, 7982–7987
  2. EANN in JPCL: J. Phys. Chem. Lett. 2019, 10, 17, 4962–4967

2. Frontend

A typical XML entry that defines an SGNNForce frontend is:

<SGNNForce file="model1.pickle" pdb="peg4.pdb" nn="1"/>

The important attributes in the frontend include:

  • file: the SGNN parameters file.
  • pdb: the pdb file to provide the topology information.
  • nn: the size of the subgraphs, i.e., how many neighbors to include around the central bond.

To create an SGNN potential function, one needs to do the following in Python:

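import openmm.app as app
import openmm.unit as unit
from dmff import Hamiltonian

H = Hamiltonian('peg.xml')  # force field XML containing the SGNNForce node
app.Topology.loadBondDefinitions("residues.xml")  # register the residue bond templates
pdb = app.PDBFile("peg4.pdb")
pots = H.createPotential(pdb.topology, nonbondedCutoff=0.6*unit.nanometer, ethresh=5e-4)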

Then the potential function, the parameters, and the generator can be accessed as:

efunc_sgnn = pots.dmff_potentials["SGNNForce"]
params_sgnn = H.getParameters()["SGNNForce"]
generator_sgnn = H.getGenerators()[0] # if SGNNForce is the first force node in xml

A typical XML entry that defines an EANNForce frontend is:

<EANNForce file="eann_model.pickle" pdb="peg4.pdb" ngto="16" nipsin="2" rc="4"/>

The important attributes in the frontend include:

  • file: the EANN parameters file.
  • pdb: the pdb file to provide the element and mass information.
  • ngto: the number of GTOs used in EANN.
  • nipsin: the largest L in the angular channel. Default: 2.
  • rc: the cutoff distance, used to determine the initial rs and inta.

To create an EANN potential function, one needs to do the following in Python:

H = Hamiltonian('peg.xml')
app.Topology.loadBondDefinitions("residues.xml")
pdb = app.PDBFile("peg4.pdb")
pots = H.createPotential(pdb.topology, nonbondedCutoff=0.6*unit.nanometer, ethresh=5e-4)

Then the potential function, the parameters, and the generator can be accessed as:

efunc_eann = pots.dmff_potentials["EANNForce"]
params_eann = H.getParameters()["EANNForce"]
generator_eann = H.getGenerators()[0] # if EANNForce is the first force node in xml

3. SGNNGenerator/EANNGenerator Doc

ATTRIBUTES:

The following attributes are accessible via the SGNNGenerator:

  • ffinfo: the force field parameters registered in the generator.
  • MolGNNForce: the backend SGNNForce object

METHODS

The following methods are accessible via SGNNGenerator:

  • overwrite(paramset): overwrite the parameters registered in the generator (i.e., the ffinfo object) using the values in paramset.

ATTRIBUTES:

The following attributes are accessible via the EANNGenerator:

  • ffinfo: the force field parameters registered in the generator.
  • EANNForce: the backend EANNForce object

METHODS

The following methods are accessible via EANNGenerator:

  • overwrite(paramset): overwrite the parameters registered in the generator (i.e., the ffinfo object) using the values in paramset.

4. SGNNForce/EANNForce Doc

The backend of the SGNN energy is a MolGNNForce object. It contains the following attributes and methods:

ATTRIBUTES:

  • G: TopGraph object, the topological graph object, created using dmff.sgnn.graph.TopGraph
  • n_layers: int tuple, optional, the number of hidden layers before and after message passing, default = (3, 2)
  • sizes: [tuple, tuple], optional, the sizes (numbers of hidden neurons) of the network before and after message passing, default = [(40, 20, 20), (20, 10)]
  • nn: int, optional, the size of the subgraphs, i.e., how many neighbors to include around the central bond, default = 1
  • sigma: float, optional, final scaling factor of the energy, default = 162.13039087945623
  • mu: float, optional, a constant shift; the final total energy is $(E_{NN} + \mu) \cdot \sigma$
  • seed: int, optional, the seed for random number generator, default 12345
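The role of `mu` and `sigma` can be made explicit with a one-line helper. This is only a sketch of the formula stated above: the function name `scale_energy` is hypothetical, and `mu` is left without a default because the text does not give one.

```python
def scale_energy(e_nn, mu, sigma=162.13039087945623):
    """Shift the raw network output by mu first, then scale by sigma."""
    return (e_nn + mu) * sigma


print(scale_energy(0.25, mu=-0.05))
```

Note that the shift is applied before the scaling, so `mu` is expressed in the units of the raw network output rather than in the units of the final energy.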

METHODS

  • forward:

    This function returns the SGNN energy.
    
    Input:
      positions:
          Na * 3: positions
      box:
          3 * 3: box
      pairs:
          Np * 3: interacting pair indices and topology distance
      params: dict
        The parameter dictionary, including the following keys:
        w: weights of message passing
        fc.weight: weights of NN, list of (n_elem, dim_in, dim_out) array, with a length of n_layer
        fc.bias: bias of NN, list of (n_elem, dim_out) array, with a length of n_layer
    
    Output:
        energy: total SGNN energy
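To make the `Np * 3` layout of `pairs` concrete, the sketch below builds such a list by hand: each row holds the two atom indices and the number of bonds separating them. It is not the DMFF neighbor-list implementation. The helper names, the non-periodic distance check, and the use of `-1` for atoms that are not connected through bonds are assumptions made for the illustration.

```python
import math
from collections import deque


def topological_distances(n_atoms, bonds):
    """All-pairs bond-count distances via breadth-first search."""
    adj = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    dist = [[-1] * n_atoms for _ in range(n_atoms)]  # -1: not connected
    for start in range(n_atoms):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[start][v] == -1:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist


def build_pairs(positions, bonds, cutoff):
    """Rows of [i, j, n_bonds] for all i < j closer than `cutoff`.

    Periodic boundary conditions are ignored in this sketch.
    """
    n_atoms = len(positions)
    topo = topological_distances(n_atoms, bonds)
    pairs = []
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):
            if math.dist(positions[i], positions[j]) < cutoff:
                pairs.append([i, j, topo[i][j]])
    return pairs


# Four atoms on a line, 1.0 apart, bonded as a chain 0-1-2-3.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
bonds = [(0, 1), (1, 2), (2, 3)]
pairs = build_pairs(positions, bonds, cutoff=2.5)
for row in pairs:
    print(row)
```

With positions of shape `Na * 3`, a `3 * 3` box, a pair list of this form, and the parameter dictionary, all four inputs listed above are covered.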
    

The backend of the EANN energy is an EANNForce object. It contains the following attributes and methods:

ATTRIBUTES:

  • n_elem: int, the number of elements in the model.
  • elem_indices: int array, the element type of each atom in the system.
  • n_gto: int, the number of GTOs used in EANN.
  • rc: float, the cutoff distance, used to determine the initial rs and inta.
  • nipsin: int, optional, the largest L in the angular channel. Default: 2
  • beta: float, optional, used to determine the initial $\Delta$ rs. Default: 0.2
  • sizes: tuple of ints, optional, the number of hidden neurons in each layer; the length is the number of layers. Default: (64, 64)
  • seed: int, optional, the seed for random number generator, default 12345

METHODS

  • get_energy:

    This function returns the EANN energy.
    
    Input:
      positions:
          Na * 3: positions
      box:
          3 * 3: box
      pairs:
          Np * 3: interacting pair indices and topology distance
      params: dict
        The parameter dictionary, including the following keys:
        c: ${c_{ij}}$ of all exponent prefactors, (n_elem, n_elem)
        rs: distance shifts of all radial gaussian functions, (n_gto,)
        inta: the exponents, (n_gto,)
        w: weights of NN, list of (n_elem, dim_in, dim_out) array, with a length of n_layer
        b: bias of NN, list of (n_elem, dim_out) array, with a length of n_layer
    
    Output:
        energy: total EANN energy
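The layout of `w` and `b` (one list entry per layer, with the element index as the leading dimension of each entry) can be illustrated with a small pure-Python forward pass. This sketch is not the DMFF implementation: the function name `atomic_energy`, the `tanh` activation on hidden layers, the linear single-output final layer, and all numerical values are assumptions made for the example.

```python
import math


def atomic_energy(features, elem, w, b):
    """Evaluate the network of element `elem` on one atom's descriptor.

    w[k][elem] has shape (dim_in, dim_out); b[k][elem] has shape (dim_out,).
    """
    h = list(features)
    n_layer = len(w)
    for k in range(n_layer):
        w_k, b_k = w[k][elem], b[k][elem]
        out = [
            sum(h[p] * w_k[p][q] for p in range(len(h))) + b_k[q]
            for q in range(len(b_k))
        ]
        # Assumed: tanh on hidden layers, linear final layer.
        h = out if k == n_layer - 1 else [math.tanh(v) for v in out]
    return h[0]


# Two elements, two layers: 2 inputs -> 2 hidden -> 1 output.
w = [
    [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]],  # layer 0, elements 0 and 1
    [[[1.0], [1.0]], [[2.0], [0.0]]],                      # layer 1, elements 0 and 1
]
b = [
    [[0.0, 0.0], [0.1, 0.1]],
    [[0.0], [1.0]],
]
features = [0.3, -0.2]

# The element index of each atom selects which network is applied.
elem_indices = [0, 1, 0]
total = sum(atomic_energy(features, e, w, b) for e in elem_indices)
print(total)
```

Here every atom reuses the same toy descriptor for brevity; in practice each atom has its own density vector, and `elem_indices` plays the role of the attribute of the same name listed above.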