New Species Representation Idea

Richard West edited this page Nov 14, 2020 · 2 revisions

Current Species Representation

A Species, as defined in the models, represents a chemical species with a unique set of structural isomers and resonance structures.

In theory, a species has exactly 1 isomer and 1 structure, because it represents a specific state of matter with its own unique properties. For example, n-butane and isobutane are two different species with the same chemical formula: both are composed of the same atoms, but they have different bonding structures. In the database they are represented as 2 Species objects with the same formula field, each linking to a different Isomer object with a different inchi field.

The oxygen atom is an example that goes down to the resonance structure level. An oxygen atom can have different quantum states, two of which are the singlet and triplet states. In the database these are represented as 2 Species objects with the same formula field: "O". Each Species links to its own Isomer object, and the two Isomers have the same inchi field. Each Isomer then links to a unique Structure object, and the two Structures have different adjacency_list field values.
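The two examples above can be sketched with plain Python dataclasses. This is only an illustration, not the project's actual models: the class layout is an assumption, only the field names formula, inchi, and adjacency_list come from the text, and the InChI strings and RMG-style adjacency lists are illustrative values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Structure:
    adjacency_list: str


@dataclass
class Isomer:
    inchi: str
    structures: List[Structure] = field(default_factory=list)


@dataclass
class Species:
    formula: str
    isomers: List[Isomer] = field(default_factory=list)


# Same formula, different isomers: the inchi values differ.
n_butane = Species("C4H10", [Isomer("InChI=1S/C4H10/c1-3-4-2/h3-4H2,1-2H3")])
isobutane = Species("C4H10", [Isomer("InChI=1S/C4H10/c1-4(2)3/h4H,1-3H3")])

# Same formula and same inchi, different structures: the adjacency lists differ.
# The adjacency-list syntax here is an assumed RMG-style format.
triplet_o = Species(
    "O", [Isomer("InChI=1S/O", [Structure("multiplicity 3\n1 O u2 p2 c0")])]
)
singlet_o = Species(
    "O", [Isomer("InChI=1S/O", [Structure("multiplicity 1\n1 O u0 p3 c0")])]
)
```

In this sketch the butanes are distinguished at the Isomer level, while the two oxygen atoms can only be told apart at the Structure level.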

In practice, some sources reference "lumped" species, which are ambiguous chemical species that contain many isomers and electronic structures. For instance, there may be a "butane" species that consists of every isomer of butane. This would be represented by a Species object with links to every Isomer object for butane.
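As a rough sketch of the lumped case, a "butane" Species can simply hold links to both C4H10 isomers. The name field and the is_lumped helper are hypothetical and exist only for this illustration. The InChI strings are illustrative values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Isomer:
    inchi: str


@dataclass
class Species:
    name: str  # hypothetical label, not a field named in the text
    formula: str
    isomers: List[Isomer] = field(default_factory=list)


def is_lumped(species: Species) -> bool:
    """Hypothetical helper: treat a species with more than one isomer as lumped."""
    return len(species.isomers) > 1


n_butane = Isomer("InChI=1S/C4H10/c1-3-4-2/h3-4H2,1-2H3")
isobutane = Isomer("InChI=1S/C4H10/c1-4(2)3/h4H,1-3H3")

# A lumped species links to every isomer of butane.
lumped_butane = Species("butane", "C4H10", [n_butane, isobutane])
# A specific species links to exactly one isomer.
specific_butane = Species("n-butane", "C4H10", [n_butane])
```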

Proposed New Species Representation

There should be a single source of truth for every formula. For example, there should be exactly 1 object in the database with formula C4H10. That object should have a list of all Isomer objects that have that formula, and each of those Isomers should link to its resonance structures that are in the database.

A different object should represent a chemical species, and should have a single link to a chemical formula object. This object will effectively be a "filter" over the isomers and resonance structures that it represents.

This should make importing a lot simpler, as we can more easily use uniqueness constraints to determine whether formulas, isomers, and resonance structures are already in the database. It will also separate the chemical identities from the representations that are used by kinetic models.
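One way this could look, sketched with Python's built-in sqlite3 module rather than the project's real models. The table and column names, the species_isomer link table, and the helper functions are all assumptions made for illustration. In particular, making inchi unique is one possible reading of "uniqueness constraints", and a similar link table could filter structures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE formula (
    id INTEGER PRIMARY KEY,
    formula TEXT NOT NULL UNIQUE           -- one row per formula
);
CREATE TABLE isomer (
    id INTEGER PRIMARY KEY,
    formula_id INTEGER NOT NULL REFERENCES formula(id),
    inchi TEXT NOT NULL UNIQUE             -- assumed uniqueness key
);
CREATE TABLE species (
    id INTEGER PRIMARY KEY,
    formula_id INTEGER NOT NULL REFERENCES formula(id),
    label TEXT NOT NULL
);
CREATE TABLE species_isomer (              -- the species acts as a "filter"
    species_id INTEGER NOT NULL REFERENCES species(id),
    isomer_id INTEGER NOT NULL REFERENCES isomer(id),
    UNIQUE (species_id, isomer_id)
);
""")


def get_or_create_formula(conn, formula):
    # The UNIQUE constraint makes a repeated insert a no-op.
    conn.execute("INSERT OR IGNORE INTO formula (formula) VALUES (?)", (formula,))
    row = conn.execute("SELECT id FROM formula WHERE formula = ?", (formula,))
    return row.fetchone()[0]


def get_or_create_isomer(conn, formula_id, inchi):
    conn.execute(
        "INSERT OR IGNORE INTO isomer (formula_id, inchi) VALUES (?, ?)",
        (formula_id, inchi),
    )
    row = conn.execute("SELECT id FROM isomer WHERE inchi = ?", (inchi,))
    return row.fetchone()[0]


def import_species(conn, label, formula, inchis):
    formula_id = get_or_create_formula(conn, formula)
    cur = conn.execute(
        "INSERT INTO species (formula_id, label) VALUES (?, ?)", (formula_id, label)
    )
    species_id = cur.lastrowid
    for inchi in inchis:
        isomer_id = get_or_create_isomer(conn, formula_id, inchi)
        conn.execute(
            "INSERT OR IGNORE INTO species_isomer (species_id, isomer_id) VALUES (?, ?)",
            (species_id, isomer_id),
        )
    return species_id


N_BUTANE = "InChI=1S/C4H10/c1-3-4-2/h3-4H2,1-2H3"
ISOBUTANE = "InChI=1S/C4H10/c1-4(2)3/h4H,1-3H3"

# Two imports that overlap on the formula and on one isomer.
import_species(conn, "n-butane", "C4H10", [N_BUTANE])
import_species(conn, "butane (lumped)", "C4H10", [N_BUTANE, ISOBUTANE])

formula_count = conn.execute("SELECT COUNT(*) FROM formula").fetchone()[0]
isomer_count = conn.execute("SELECT COUNT(*) FROM isomer").fetchone()[0]
species_count = conn.execute("SELECT COUNT(*) FROM species").fetchone()[0]
```

Here the two imported species share a single formula row and reuse the n-butane isomer row, so the chemical identities are stored once and each species only selects from them.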

Discussion (from Richard)

In terms of database structure (for want of a better way to describe things), are you suggesting a table for Formula, with links to Species and Isomer? Why not just a Formula field or column in the Species and Isomer tables? You can do a query WHERE Formula='C4H10' as easily as joining it to the formula table and checking the formula that way, looking up the PK of the formula you want, or whatever. I'm not seeing the benefit of a Formula model that just contains a string (the formula) and adds a bunch of more complex joins.
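To make the comparison concrete, here is a minimal sqlite3 sketch of the two options: a plain formula column versus a separate formula table reached through a join. All table and column names are hypothetical, and the sketch says nothing about which option performs better in the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Option A: the formula is a plain (indexed) column on the species table.
CREATE TABLE species_a (id INTEGER PRIMARY KEY, label TEXT, formula TEXT);
CREATE INDEX species_a_formula ON species_a (formula);

-- Option B: the formula lives in its own table and is reached via a join.
CREATE TABLE formula (id INTEGER PRIMARY KEY, formula TEXT UNIQUE);
CREATE TABLE species_b (
    id INTEGER PRIMARY KEY,
    label TEXT,
    formula_id INTEGER REFERENCES formula(id)
);
""")

data = [("n-butane", "C4H10"), ("isobutane", "C4H10"), ("O(T)", "O")]
for label, formula in data:
    conn.execute(
        "INSERT INTO species_a (label, formula) VALUES (?, ?)", (label, formula)
    )
    conn.execute("INSERT OR IGNORE INTO formula (formula) VALUES (?)", (formula,))
    formula_id = conn.execute(
        "SELECT id FROM formula WHERE formula = ?", (formula,)
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO species_b (label, formula_id) VALUES (?, ?)", (label, formula_id)
    )

# Option A: filter directly on the column.
rows_a = conn.execute(
    "SELECT label FROM species_a WHERE formula = ? ORDER BY label", ("C4H10",)
).fetchall()

# Option B: join to the formula table, then filter.
rows_b = conn.execute(
    "SELECT s.label FROM species_b AS s "
    "JOIN formula AS f ON f.id = s.formula_id "
    "WHERE f.formula = ? ORDER BY s.label",
    ("C4H10",),
).fetchall()
```

Both queries return the same species. The difference is that option B needs an extra lookup or join on every read and write, while option A repeats the formula string on each row.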