Copyright © 2007, Craig A. James
Content is available under GNU Free Documentation License 1.2
Contributors: Richard Apodaca, Noel O’Boyle, Andrew Dalke, John van Drie, Peter Ertl, Geoff Hutchison, Craig A. James, Greg Landrum, Chris Morley, Egon Willighagen, Hans De Winter, Tim Vandermeersch
"… we cannot improve the language of any science, without, at the same time improving the science itself; neither can we, on the other hand, improve a science, without improving the language or nomenclature which belongs to it …"
This document formally defines an open specification version of the SMILES language, a typographical line notation for specifying chemical structure. It is hosted under the banner of the Blue Obelisk project, with the intent to solicit contributions and comments from the entire computational chemistry community.
SMILES was originally developed as a proprietary specification by Daylight Chemical Information Systems. Since the introduction of SMILES in the late 1980s, it has become widely accepted as a de facto standard for exchange of molecular structures. Many independent SMILES software packages have been written in C, C++, Java, Python, LISP, and probably even FORTRAN.
At this point in the history of SMILES, it is appropriate for the chemistry community to develop a new, non-proprietary specification for the SMILES language. Daylight’s SMILES Theory Manual has long been the "gold standard" for the SMILES language, but as a proprietary specification, it limits the universal adoption of SMILES, and has no mechanism for contributions from the chemistry community. We salute Daylight for their past contributions, and the excellent SMILES documentation they provided free of charge for the past two decades.
This document is intended for developers designing or improving a SMILES parser or writer. Readers are expected to be acquainted with SMILES. Due to the formality of this document, it is not a good tutorial for those trying to learn SMILES. This document is written with precision as the primary goal; readability is secondary.
Before defining the SMILES language, it is important to state the physical model on which it is based: the valence model of chemistry, which uses a mathematician’s graph to represent a molecule. In a chemical graph, the nodes are atoms, and the edges are semi-rigid bonds that can be single, double, or triple according to the rules of valence bond theory.
This simple mental model has little resemblance to the underlying quantum-mechanical reality of electrons, protons and neutrons, yet it has proved to be a remarkably useful approximation of how atoms behave in close proximity to one another. However, the valence model is an imperfect representation of molecular structure, and the SMILES language inherits these imperfections. Chemical bonds are often tautomeric, aromatic or otherwise fractional rather than neat integer multiples. Delocalized bonds, bond-centered bonds, hydrogen bonds and various other inter-atom forces that are well characterized by a quantum-mechanics description simply don’t fit into the valence model.
"If you can build a molecule from a modeling kit, you can name it."
McLeod and Peters' quip captures the deficiencies of SMILES well: if you can’t build a molecule from a modeling kit, the deficiencies of SMILES and other connection-table formats become apparent.
This SMILES specification is divided into two distinct parts: a syntactic specification that specifies how the atoms, bonds, parentheses, digits and so forth are represented, and a semantic specification that describes how those symbols are interpreted as a sensible molecule. For example, the syntax specifies how ring closures are written, but the semantics require that they come in pairs. Likewise, the syntax specifies how atomic elements are written, but the semantics determine whether a particular ring system is actually aromatic.
For this specification, the syntax and semantics are explained separately; in practice, the syntax and semantics are usually mixed together in the code that implements a SMILES parser. This chapter is only concerned with syntax.
Section | Formal Grammar |
---|---|
ATOMS |
|
atom ::= bracket_atom | aliphatic_organic | aromatic_organic | |
|
ORGANIC SUBSET ATOMS |
|
aliphatic_organic ::= |
|
aromatic_organic ::= |
|
BRACKET ATOMS |
|
bracket_atom ::= |
|
symbol ::= element_symbols | aromatic_symbols | |
|
isotope ::= NUMBER |
|
element_symbols ::= |
|
aromatic_symbols ::= |
|
CHIRALITY |
|
chiral ::= |
|
HYDROGENS |
|
hcount ::= |
|
CHARGES |
|
charge ::= |
|
ATOM CLASS |
|
class ::= |
|
BONDS AND CHAINS |
|
bond ::= |
|
ringbond ::= bond? DIGIT | bond? |
|
branched_atom ::= atom ringbond* branch* |
|
branch ::= |
|
chain ::= branched_atom | chain branched_atom | chain bond branched_atom | chain dot branched_atom |
|
dot ::= |
|
SMILES STRINGS |
|
smiles ::= terminator | chain terminator |
|
terminator ::= SPACE | TAB | LINEFEED | CARRIAGE_RETURN | END_OF_STRING |
An atom is represented by its atomic symbol, enclosed in square brackets, []. The first character of the symbol is uppercase and the second (if any) is lowercase, except that for aromatic atoms (see below), the first character is lowercase. There are 114 valid atomic symbols, as defined by IUPAC (see also Web Elements). The symbol '*' is also accepted as a valid atomic symbol, and represents a "wildcard" or unknown atom.
Examples:
SMILES | Atomic Symbol |
---|---|
|
Uranium |
|
Lead |
|
Helium |
|
Unknown atom |
Hydrogens inside of brackets are specified as Hn where n is a number such as H3. If no Hn is specified, it is identical to H0. If H is specified without a number, it is identical to H1. For example, [C] and [CH0] are identical, and [CH] and [CH1] are identical.
Hydrogens that are specified in brackets with this notation have undefined isotope, no chirality, no other bound hydrogen, neutral charge, and an undefined atom class.
Examples:
SMILES | Name | Comments |
---|---|---|
|
methane |
|
|
hydrochloric acid |
|
|
hydrochloric acid |
A hydrogen atom cannot have a hydrogen count; for example, [HH1] is illegal. Hydrogens connected to other hydrogens must be represented as explicit atoms in square brackets. For example, molecular hydrogen must be written as [H][H].
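The hydrogen-count rules above can be sketched in Python. This is a minimal illustration rather than a full bracket-atom parser; the function name `parse_hcount` and its two arguments are assumptions made for this example and are not part of the specification.

```python
import re

def parse_hcount(symbol, hfield):
    """Return the hydrogen count of a bracket atom.

    symbol: the element symbol found inside the brackets, e.g. "C" or "H".
    hfield: the raw hydrogen-count text: "" (absent), "H", or "H" plus digits.
    Both parameter names are hypothetical, chosen for this sketch.
    """
    if hfield == "":
        return 0                              # no Hn is identical to H0
    match = re.fullmatch(r"H([0-9]*)", hfield)
    if match is None:
        raise ValueError(f"malformed hydrogen count: {hfield!r}")
    if symbol == "H":
        # e.g. [HH1] is illegal: a hydrogen atom cannot carry a hydrogen count
        raise ValueError("a hydrogen atom cannot have a hydrogen count")
    digits = match.group(1)
    return int(digits) if digits else 1       # bare H is identical to H1
```

With this sketch, `parse_hcount("C", "")` and `parse_hcount("C", "H0")` both give 0, and `parse_hcount("C", "H")` and `parse_hcount("C", "H1")` both give 1.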
Question: are more than 9 hydrogens possible? Should they be supported?
Charge is specified by a +n or -n where n is a number; if the number is missing, it means either +1 or -1 as appropriate. For backwards compatibility, a general-purpose SMILES parser should accept the symbols -- and ++ to mean charges of -2 and +2, but this is a deprecated form and should be avoided.
Examples:
SMILES | Name | Comments |
---|---|---|
|
chloride anion |
|
|
hydroxyl anion |
|
|
hydroxyl anion |
|
|
copper cation |
|
|
copper cation |
|
An implementation is required to accept charges in the range -15 to +15.
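As an illustration of the charge rules, the following Python sketch converts a charge field into an integer. The name `parse_charge` is an assumption of this example. The sketch does not reject magnitudes above 15, since the text only sets the range that must be accepted.

```python
import re

def parse_charge(field):
    """Parse a bracket-atom charge field such as "", "+", "-2", "++" or "--".

    Hypothetical helper for illustration only.
    """
    if field == "":
        return 0                               # no charge field: neutral
    if field == "++":                          # deprecated form, still accepted
        return 2
    if field == "--":                          # deprecated form, still accepted
        return -2
    match = re.fullmatch(r"([+-])([0-9]*)", field)
    if match is None:
        raise ValueError(f"malformed charge: {field!r}")
    sign = 1 if match.group(1) == "+" else -1
    digits = match.group(2)
    magnitude = int(digits) if digits else 1   # missing number means 1
    return sign * magnitude
```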
Isotopic specification is placed inside the square brackets for an atom preceding the atomic symbol; for example:
SMILES | Atomic Symbol |
---|---|
|
methane |
|
deuterium ion |
|
Uranium 238 atom |
An isotope is interpreted as a number, so that [2H], [02H] and [002H] all mean deuterium. If the isotope field is not specified then the atom is assumed to have the naturally-occurring isotopic ratios. The isotope value 0 also indicates an isotope of zero, that is [0S] is not the same as [S].
There is no requirement that the isotope is a genuine isotope of the element. Thus, [36Cl] is allowed even though 35Cl and 37Cl are the actual known stable isotopes of chlorine.
A general-purpose SMILES parser must accept at least three digits for the isotope and values from 0 to 999.
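One way to keep "no isotope" distinct from "isotope 0" is to represent the unspecified case with a separate value. The Python sketch below uses `None` for that purpose; the function name `parse_isotope` and the use of `None` are assumptions of this example.

```python
import re

def parse_isotope(field):
    """Return the isotope as an int, or None when the field is absent.

    Hypothetical helper: None stands for "naturally-occurring isotopic
    ratios", which is different from an explicit isotope of 0.
    """
    if field == "":
        return None
    if re.fullmatch(r"[0-9]+", field) is None:
        raise ValueError(f"malformed isotope: {field!r}")
    return int(field)          # leading zeros are ignored: "002" is 2
```

Here `parse_isotope("2")`, `parse_isotope("02")` and `parse_isotope("002")` all return 2, while `parse_isotope("0")` returns 0 and `parse_isotope("")` returns `None`.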
A special subset of elements called the "organic subset" of B, C, N, O, P, S, F, Cl, Br, I, and * (the "wildcard" atom) can be written using only the atomic symbol (that is, without the square brackets, H-count, etc.). An atom specified this way has the following properties:
-
"implicit hydrogens" are added such that the valence of the atom is in the lowest normal state for that element
-
the atom’s charge is zero
-
the atom has no isotopic specification
-
the atom has no chiral specification
The implicit hydrogen count is determined by summing the bond orders of the bonds connected to the atom. If that sum is equal to a known valence for the element or is greater than any known valence then the implicit hydrogen count is 0. Otherwise the implicit hydrogen count is the difference between that sum and the next highest known valence.
The "normal valence" for these elements is defined as:
Element | Valence |
---|---|
B | 3 |
C | 4 |
N | 3 or 5 |
O | 2 |
P | 3 or 5 |
S | 2, 4 or 6 |
halogens | 1 |
* | unspecified |
Examples:
SMILES | Name |
---|---|
|
methane |
|
ammonia |
|
hydrochloric acid |
Note: The remaining atom properties, chirality and ring-closures, are discussed in later sections.
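The implicit-hydrogen rule for organic-subset atoms can be sketched as a lookup against the normal valences listed above. The names `NORMAL_VALENCES` and `implicit_hydrogens` are assumptions of this example, and the sketch takes the bond-order sum as a plain integer, so it does not address how aromatic bonds are counted.

```python
# Normal valences of the organic subset, in ascending order.
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}

def implicit_hydrogens(symbol, bond_order_sum):
    """Implicit hydrogen count for an atom written without brackets.

    symbol: organic-subset symbol, or "*" for the wildcard atom.
    bond_order_sum: sum of the orders of the bonds written for the atom.
    """
    if symbol == "*":
        return 0                     # wildcard outside brackets: no hydrogens
    for valence in NORMAL_VALENCES[symbol]:
        if bond_order_sum <= valence:
            # equal to a known valence gives 0, otherwise fill up to the
            # next highest known valence
            return valence - bond_order_sum
    return 0                         # sum exceeds every known valence
```

For example, an isolated C, N or Cl (bond-order sum 0) gets 4, 3 and 1 implicit hydrogens respectively, matching methane, ammonia and hydrochloric acid.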
The '*' atom represents an atom whose atomic number is unknown or unspecified. If it occurs inside brackets, it can have its isotope, chirality, hydrogen count and charge specified. If it occurs outside of brackets, it has no assumed isotope, a mass of zero, unspecified chirality, a hydrogen count of zero, and a charge of zero.
SMILES | Name |
---|---|
|
ortho-substituted phenol |
The '*' atom does not have any specific electronic properties or valence. If specified outside of square brackets, it takes on the valence implied by its bonds. If it is inside square brackets, it takes on the valence implied by its bonds, hydrogens and/or charge.
A '*' atom can be part of an aromatic ring. When deducing the aromaticity of a ring system, the ring system is considered aromatic if there is an element which could replace the '*' and make the ring system meet the aromaticity rules (see Aromaticity, below).
An "atom class" is an arbitrary integer, a number that has no chemical meaning. It is used by applications to mark atoms in ways that are meaningful only to the application. Multiple atoms may be labeled with the same atom class.
The atom class is specified after all other properties in square brackets. For example:
SMILES | Name |
---|---|
|
methane, atom’s class is 2 |
If the atom class is not specified then the atom class is zero.
The atom class is interpreted as a number, so both [CH2:5] and [NH4+:005] have an atom class of 5.
Atoms that are adjacent in a SMILES string are assumed to be joined by a single or aromatic bond (see Aromaticity). For example:
SMILES | Name |
---|---|
|
ethane |
|
ethanol |
|
n-butylamine |
|
n-butylamine |
Double, triple and quadruple bonds are represented by '=', '#', and '$' respectively:
SMILES | Name |
---|---|
|
ethene |
|
hydrogen cyanide |
|
2-butyne |
|
propanol |
|
octachlorodirhenate (III) |
A single bond can be explicitly represented with '-', but it is rarely necessary.
SMILES | |
---|---|
|
same as: |
|
same as: |
|
same as: |
Note: The remaining bond symbols, ':\/', are discussed in later sections.
An atom with three or more bonds is called a branched atom, and is represented using parentheses.
Depiction | SMILES | Name |
---|---|---|
|
2-ethyl-1-butanol |
Branches can be nested or "stacked" to any depth:
Depiction | SMILES | Name |
---|---|---|
|
2,4-dimethyl-3-pentanone |
|
pic here |
|
2-propyl-3-isopropyl-1-propanol |
|
thiosulfate |
The SMILES branch/chain rules allow nested parenthetical expressions (branches) to an arbitrary depth. For example, the following SMILES, though peculiar, is legal:
SMILES | Formula |
---|---|
|
C22H46 |
In a SMILES string such as "C1CCCCC1", the first occurrence of a ring-closure number (an "rnum") creates an "open bond" to the atom that precedes it. When that same rnum is encountered later in the string, a bond is made between the two atoms, which typically forms a cyclic structure.
Depiction | SMILES | Name |
---|---|---|
|
cyclohexane |
|
|
perhydroisoquinoline |
If a bond symbol is present between the atom and rnum, it can be present on either or both bonded atoms. However, if it appears on both bonded atoms, the two bond symbols must be the same.
Depiction | SMILES | Name |
---|---|---|
|
cyclohexene |
|
|
cyclohexene (preferred form) |
|
|
cyclohexene |
|
|
invalid |
Ring closures must be matched pairs in a SMILES string; for example, C1CCC is not a valid SMILES.
It is permissible to re-use ring-closure numbers. Once a particular number has been encountered twice, that number is available again for subsequent ring closures.
Depiction | SMILES | Name | Comment |
---|---|---|---|
|
dicyclohexyl |
both SMILES are valid |
|
|
dicyclohexyl |
Note that the ring number zero is valid; for example, cyclohexane can be written C0CCCCC0.
Two-digit ring numbers are permitted, but must be preceded by the percent '%' symbol, such as C%25CCCCC%25 for cyclohexane. Three-digit numbers and larger are never permitted. However, note that three digits are not invalid; for example, C%123 is the same as C3%12, that is, an atom with two rnum specifications.
The digit(s) representing a ring-closure are interpreted as a number, not a symbol, and two rnums match if their numbers match. Thus, C1CCCCC%01 is a valid SMILES and is the same as C1CCCCC1. Likewise, C%00CCCCC%00 is a valid SMILES.
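The ring-closure rules (pairing, re-use of numbers, the '%' prefix, and numeric comparison) can be illustrated with a simplified Python scanner. The function name `ring_closure_bonds` is an assumption of this example. The sketch only numbers atoms in order of appearance and pairs up rnums: it ignores bond symbols, does not validate anything else in the string, and does not check for the illegal duplicate or self bonds described below.

```python
def ring_closure_bonds(smiles):
    """Return (opening_atom, closing_atom) index pairs for each ring closure.

    Hypothetical helper: atoms are indexed from 0 in order of appearance.
    """
    open_rnums = {}                 # rnum -> index of the atom that opened it
    bonds = []
    atom = -1                       # index of the most recent atom
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":               # bracket atom: skip its contents
            i = smiles.index("]", i)
            atom += 1
        elif ch.isalpha() or ch == "*":
            if smiles[i:i + 2] in ("Cl", "Br"):
                i += 1              # two-letter organic-subset symbol
            atom += 1
        elif ch in "0123456789" or ch == "%":
            if ch == "%":           # exactly two digits follow '%'
                digits = smiles[i + 1:i + 3]
                if len(digits) != 2 or not digits.isdigit():
                    raise ValueError("'%' must be followed by two digits")
                i += 2
            else:
                digits = ch
            rnum = int(digits)      # compared as a number: %01 matches 1
            if rnum in open_rnums:
                bonds.append((open_rnums.pop(rnum), atom))  # number is free again
            else:
                open_rnums[rnum] = atom
        i += 1
    if open_rnums:
        raise ValueError("unmatched ring closure")
    return bonds
```

Under these assumptions, `C1CCCCC1`, `C0CCCCC0`, `C%25CCCCC%25` and `C1CCCCC%01` all yield the single bond `(0, 5)`, and `C1CCC` raises an error.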
A single atom can have several ring-closure numbers, such as this spiro atom:
Depiction | SMILES | Name |
---|---|---|
|
spiro[5.5]undecane |
Two atoms cannot be joined by more than one bond, and an atom cannot be bonded to itself. For example, the following are not allowed:
SMILES | Comments |
---|---|
|
illegal, two bonds between one pair of atoms |
|
illegal, two bonds between one pair of atoms |
|
illegal, atom bonded to itself |
"Aromaticity" in SMILES is primarily for cheminformatics purposes. In a cheminformatics system, we’d like to have a single representation for each molecule. The Kekule form masks the inherent uniformity of the bonds in an aromatic ring. SMILES uses a simplified definition of aromaticity that facilitates substructure and exact-structure searches, as well as Normalization and Canonicalization of SMILES.
The definition of "aromaticity" in SMILES is not intended to imply anything about the physical or chemical properties of a substance. In many or most cases, the SMILES definition of aromaticity will match the chemist’s notion of what is aromatic, but in some cases it will not.
Aromaticity can be represented in one of two ways in a SMILES.
-
In the Kekule form, using alternating single and double bonds, with uppercase symbols for the atoms.
-
An atomic symbol that begins with a lowercase letter is an aromatic atom, such as 'c' for aromatic carbon. When aromatic symbols are used, no bond symbols are needed.
A lowercase aromatic symbol is defined as an atom in the sp2 configuration in an aromatic or anti-aromatic ring system. For example:
Depiction | SMILES | Name |
---|---|---|
|
benzene |
|
|
||
|
indane |
|
|
||
|
furan |
|
|
||
|
cyclobutadiene |
|
|
The Kekule form is always acceptable for SMILES input. For output, the aromatic form (using lowercase letters) is preferred. The lowercase symbols eliminate the arbitrary choice of how to assign the single and double bonds, and provide a normalized form that more accurately reflects the electronic configuration.
THIS SECTION IS UNDER MAJOR REVISION, AND AT THIS POINT IS ONLY FOR DISCUSSION PURPOSES.
This proposed section is an attempt to simplify the rule-based system by enumerating all atom/bond configurations that are known to participate in aromatic systems.
A single, isolated ring that meets the following criteria is aromatic:
-
All atoms must be sp2 hybridized.
-
The number of available "shared" π electrons must equal 4N+2 where N = 1, 2 or 3 (Huckel’s rule).
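The electron-count criterion of this proposed rule can be written as a one-line check. The sketch below is only an illustration of the 4N+2 test with N limited to 1, 2 or 3 as stated above; the function name `satisfies_huckel` is an assumption, and the sketch does not compute the π-electron count itself.

```python
def satisfies_huckel(pi_electrons):
    """True if the shared pi-electron count equals 4N+2 for N = 1, 2 or 3."""
    return pi_electrons in {4 * n + 2 for n in (1, 2, 3)}   # 6, 10 or 14
```

A ring with 6 shared π electrons passes, while one with 4 does not.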
Each element that can participate in an aromatic ring is defined to have the following number of π electrons:
Configuration | π Electrons | Example |
---|---|---|
1 |
|
|
1 |
|
|
1 |
|
|
1 |
|
|
1 |
|
|
0-1 |
|
|
0 |
|
|
1 |
|
|
1 |
|
|
1 |
|
|
1 |
|
|
1 |
|
|
1 |
|
|
1-2 |
|
|
1 |
|
|
2 |
|
|
2 |
|
|
2 |
|
In an aromatic system, all of the aromatic atoms must be sp2 hybridized, and the number of π electrons must meet Huckel’s 4N+2 criterion. When parsing a SMILES, a parser must note the aromatic designation of each atom on input, then when the parsing is complete, the SMILES software must verify that electrons can be assigned without violating the valence rules, consistent with the sp2 markings, the specified or implied hydrogens, external bonds, and charges on the atoms.
The aromatic-bond symbol ':' can be used between aromatic atoms, but it is never necessary; a bond between two aromatic atoms is assumed to be aromatic unless it is explicitly represented as a single bond '-'. However, a single bond (nonaromatic bond) between two aromatic atoms must be explicitly represented. For example:
Depiction | SMILES | Name |
---|---|---|
|
biphenyl |
Note: Some SMILES parsers interpret a lowercase letter as sp2 anywhere it appears; for example, CccccC would be interpreted as CC=CC=CC. The OpenSMILES specification does not allow this interpretation unless nonstandard parsing is explicitly allowed by the user.
Hydrogens in a SMILES can be represented in three different ways:
Method | SMILES | Name | Comments |
---|---|---|---|
implicit hydrogen |
|
methane |
h-count deduced from normal valence (4) |
atom property |
|
methane |
h-count specified for heavy atom |
explicit hydrogen |
|
methane |
hydrogens represented as normal atoms |
All three forms are equivalent. However, some situations require that one form must be used:
-
Implicit hydrogen count may only be used for elements of the organic subset.
-
Any atom that is specified with square brackets must have its attached hydrogens explicitly represented, either as a hydrogen count or as normal atoms.
A hydrogen that meets one of the following criteria must be represented as an explicit atom:
-
hydrogens with charge ([H+])
-
a hydrogen connected to another hydrogen (such as molecular hydrogen, [H][H])
-
hydrogens with more than one bond (bridging hydrogens)
-
deuterium [2H] and tritium [3H]
It is permissible to use a mixture of an atom h-count and explicit hydrogen. In such a case, the atom’s hydrogen count is the sum of the atomic h-count property and the number of attached hydrogens. For example:
SMILES | Name |
---|---|
|
methane |
|
methane |
|
deuteroethane |
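The summation rule for mixed notation is simple enough to state as code. In this Python sketch, `total_hydrogens` and its arguments are hypothetical names; the neighbor list is assumed to hold the element symbols of the explicitly written neighbor atoms, with isotopes already stripped.

```python
def total_hydrogens(hcount, neighbor_symbols):
    """Sum the atom's h-count property and its explicit hydrogen neighbors.

    hcount: the Hn value given on the heavy atom (0 if absent).
    neighbor_symbols: element symbols of the atoms bonded to it, e.g.
    ["H", "H"]; an isotopic hydrogen such as deuterium still has symbol "H".
    """
    return hcount + sum(1 for symbol in neighbor_symbols if symbol == "H")
```

A carbon written with an h-count of 2 and two explicit hydrogen neighbors therefore has 4 hydrogens in total.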
The dot '.' symbol (also called a "dot bond") is legal most places where a bond symbol would occur, but indicates that the atoms are not bonded. The most common use of the dot-bond symbol is to represent disconnected and ionic compounds.
Depiction | SMILES | Name |
---|---|---|
|
sodium chloride |
|
|
phenol, 2-amino ethanol |
|
|
diammonium thiosulfate |
The dot can appear most places that a bond symbol is allowed, for example, the phenol example above can also be written:
Depiction | SMILES | Name |
---|---|---|
|
phenol, 2-amino ethanol |
|
|
phenol, 2-amino ethanol |
The second example above is an odd, but legal, use of parentheses and the dot bond, since the syntax allows a dot most places a regular bond could appear (the exception is that a dot can’t appear before a ring-closure digit).
Although dot-bonds are commonly used to represent compounds with disconnected parts, a dot-bond does not in itself mean that there are disconnected parts in the compound. See the following section regarding ring digits for some examples that illustrate this.
The dot bond cannot be used in front of a ring-closure digit. For example, C.1CCCCC.1 is illegal.
A ring-number specification ("rnum") is most commonly used to specify a ring-closure bond, but when used with the '.' dot-bond symbol, it can also specify a non-ring bond. Two matching rnums in a SMILES mean that the two atoms that precede the rnums are bonded. A dot-bond '.' means that the atoms to which it is adjacent in the SMILES string are not bonded to each other. By combining these two constructs, one can "piece together" fragments of SMILES into a whole molecule. The following SMILES illustrate this:
SMILES/Depiction | Fragment SMILES | Name |
---|---|---|
|
|
ethane |
|
|
propane |
|
1-bromo-2,3-dichlorobenzene |
This feature of SMILES provides a convenient method of enumerating the molecules of a combinatorial library using string concatenation.
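A minimal Python sketch of such an enumeration follows. The function name `enumerate_library` and the example fragments are assumptions chosen for illustration: the scaffold is written with open rnums 2, 3 and 4 on three ring atoms, and each substituent fragment ends with the rnum it closes. The caller is responsible for choosing rnums that do not collide with those used inside the fragments.

```python
from itertools import product

def enumerate_library(scaffold, substituent_sets):
    """Yield one SMILES per combination of substituent fragments.

    scaffold: SMILES with one open rnum per attachment point.
    substituent_sets: one list of fragment SMILES per attachment point;
    each fragment carries the rnum that bonds it to the scaffold.
    """
    for combination in product(*substituent_sets):
        # The dot keeps adjacent atoms unbonded; the rnums make the bonds.
        yield ".".join((scaffold,) + combination)

library = list(enumerate_library(
    "c1c2c3c4cc1",                       # benzene ring with open rnums 2, 3, 4
    [["Br2"], ["Cl3"], ["Cl4", "F4"]],   # hypothetical substituent choices
))
# library == ["c1c2c3c4cc1.Br2.Cl3.Cl4", "c1c2c3c4cc1.Br2.Cl3.F4"]
```

No chemistry toolkit is needed for the enumeration itself, since every product is built by joining strings with '.'.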
A SMILES string can specify the cis/trans configuration around a double bond, and can specify the chiral configuration of specific atoms in a molecule.
SMILES strings do not represent all types of stereochemistry. Examples of stereochemistry that cannot be encoded into a SMILES include:
-
Gross conformational left or right handedness such as helices
-
Mechanical interference, such as rotatable bonds whose rotation is mechanically constrained
-
Gross conformational stereochemistry such as the shape of a protein after folding
SMILES uses an atom-centered chirality specification, in which the atom’s left-to-right order in the SMILES string itself is used as the basis for the chirality marking.
Tetrahedral Chirality | |
---|---|
look from N towards C (chiral center) |
list the neighbors anticlockwise |
|
|
…or clockwise |
|
|
For the structure above, starting with the nitrogen atom, one "looks" toward the chiral center. The remaining three neighbor atoms are written by listing them in anticlockwise order using the '@' chiral property on the atom, or in clockwise order using the '@@' chiral property, as illustrated above. The '@' symbol is a "visual mnemonic" in that the spiral around the character goes in the anticlockwise direction, and means "anticlockwise" in the SMILES string (thus, '@@' can be thought of as anti-anti-clockwise).
A chiral center can be written starting anywhere in the SMILES string, and the choice of whether to list the remaining neighbors in clockwise or anticlockwise order is also arbitrary. The following SMILES are all equivalent and all specify the exact same chiral center illustrated above:
Equivalent SMILES | |
---|---|
|
|
|
|
|
|
|
|
|
|
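The equivalence of different neighbor orderings can be checked with permutation parity: rewriting the four neighbors of a tetrahedral center in an even permutation of the original order keeps the chiral symbol, while an odd permutation swaps '@' and '@@'. The Python sketch below illustrates this; the function names and the neighbor labels are assumptions of this example, and the labels must be unique (atom indices would work equally well).

```python
def permutation_is_odd(old_order, new_order):
    """True if new_order is an odd permutation of old_order."""
    positions = [old_order.index(label) for label in new_order]
    inversions = sum(
        1
        for i in range(len(positions))
        for j in range(i + 1, len(positions))
        if positions[i] > positions[j]
    )
    return inversions % 2 == 1

def reorder_chirality(tag, old_order, new_order):
    """Return the '@' or '@@' tag that is valid for new_order.

    tag: the chiral symbol that applies when the neighbors are written in
    old_order (the preceding atom first, then the rest as they appear).
    """
    if tag not in ("@", "@@"):
        raise ValueError(f"unsupported chiral tag: {tag!r}")
    if sorted(old_order) != sorted(new_order):
        raise ValueError("both orders must list the same neighbors")
    if permutation_is_odd(old_order, new_order):
        return "@@" if tag == "@" else "@"
    return tag
```

For instance, if neighbors written in the order N, Br, O, C carry '@', then the order Br, O, N, C also carries '@' (an even permutation), while the order C, Br, O, N requires '@@' (an odd permutation).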
One exception to the atom order is when these atoms are bonded to the chiral center via a ring bond. In these cases, it is the order of the bonds to these atoms that should be considered. The two SMILES below are equivalent:
Equivalent SMILES | |
---|---|
|
|
If one of the neighbor atoms is a hydrogen and is represented as an atomic property of the chiral center (rather than explicitly as [H]), then it is considered to be the first atom in the clockwise or anticlockwise accounting. For example, if we replaced the bromine in the illustration above with a hydrogen atom, its SMILES would be:
Implicit Hydrogen |
---|
|
The configuration of atoms around double bonds is specified by the bond symbols '/' and '\'. These symbols always come in pairs, and indicate cis or trans with a visual "same side" or "opposite side" concept. That is:
Depiction | SMILES | Name |
---|---|---|
|
trans-difluoroethene (both SMILES are equivalent) |
|
|
||
|
cis-difluoroethene (both SMILES are equivalent) |
|
|
||
This notation can be confusing when parentheses follow one of the alkene carbons:
SMILES | Name |
---|---|
|
trans-difluoroethene |
|
|
|
cis-difluoroethene |
|
The "visual interpretation" of the "up-ness" or "down-ness" of each single bond is relative to the carbon atom, not the double bond, so the sense of the symbol changes when the fluorine atom is moved from the left to the right side of the alkene carbon atom.
Note: This point was not well documented in earlier SMILES specifications, and several SMILES interpreters are known to interpret the …
A SMILES with conflicting up/down specifications is invalid:
SMILES | Comment |
---|---|
|
Invalid SMILES: Both the methyl and fluorine are "down" relative to the first alkene carbon |
It is permissible, but not required, that every atom attached to a double bond be marked. As long as at least two neighbor atoms, one on each end of the double bond, are marked, the "up-ness" or "down-ness" of the unmarked neighbors can be deduced.
SMILES | Comment |
---|---|
|
trans-difluoro configuration, position of methyl is implied |
Extended cis and trans configurations can be specified for conjugated allenes with an odd number of double bonds:
SMILES | Name |
---|---|
|
trans-difluorobutatriene |
|
cis-difluorobutatriene |
Tetrahedral Allene-like Systems
Extended tetrahedral configurations can be specified for conjugated allenes with an even number of double bonds. The normal tetrahedral rules using …
Depiction | SMILES |
---|---|
|
To determine the correct clockwise or anticlockwise specification, the allene is conceptually "collapsed" into a single tetrahedral chiral center, and the resulting chirality is marked as a property of the center atom of the extended allene system.
Square Planar Centers
There are three tags to represent square planar stereochemistry:
Background: Also note that each shape starts and ends at specific positions. Both U and Z start from atoms that are successors or predecessors when arranging the atoms in the plane in anti-clockwise or clockwise order. The start and end atoms for the Z shape are never adjacent in such an ordering. For each shape there are 4 possible ways to start (and end) drawing the line. Also, for all the drawn lines, the start and end point can be exchanged. Thus 3 shapes, 4 ways to start/end and 2 ways to order the atoms for a shape result in 3 * 4 * 2 or 24 combinations. This is the same as the number of permutations that can be made with 4 numbers (i.e. P(n) = n!). This allows canonical SMILES writers to use any ordering to output the atoms.
Trigonal Bipyramidal Centers
The chiral atom’s neighbors are labeled a, …
Viewing Axis: From | Viewing Axis: Towards | TB Number | Order |
---|---|---|---|
| | TB1 | @ |
| | TB2 | @@ |
| | TB3 | @ |
| | TB4 | @@ |
| | TB5 | @ |
| | TB6 | @@ |
| | TB7 | @ |
| | TB8 | @@ |
| | TB9 | @ |
| | TB11 | @@ |
| | TB10 | @ |
| | TB12 | @@ |
| | TB13 | @ |
| | TB14 | @@ |
| | TB15 | @ |
| | TB20 | @@ |
| | TB16 | @ |
| | TB19 | @@ |
| | TB17 | @ |
| | TB18 | @@ |
The following SMILES are all equivalent:
Equivalent SMILES |
---|
|
|
|
|
|
|
|
A tool like Daylight’s depict match can help with debugging.
Background: The trigonal bipyramidal chirality is considerably more complex than any of the previous classes since the chiral atom has an extra neighbor. This increases the number of ways to order the neighbors in a SMILES string from 24 to 120. Since every order of the atoms should be representable by a SMILES string, the 20 TB primitives suffice for this.
In the trigonal bipyramidal geometry, 3 atoms lie in a plane and the remaining 2 atoms are perpendicular to this plane, on opposite sides of it, forming an axis. Anti-clockwise and clockwise refer to the order of the 3 plane atoms when viewing along the axis in the specified direction. Unlike tetrahedral geometry, reordering the 3 atoms does not require that the axis be changed. Given an order of the axis atoms, the 3 plane atoms are ordered either anti-clockwise or clockwise. Although there are P(3) = 3! or 6 possible permutations of 3 numbers, exchanging a pair inverts the parity, and the 6 permutations are therefore divided into two groups (@, @@) containing 3 permutations each.
Because there are now two atoms that determine the viewing direction along the axis, these atoms too can be in any of the 5 positions in a permutation. Given the atoms as the set {a, b, c, d, e}, there are 5 × 4 = 20 possible ordered pairs of axis atoms. However, the use of the @ and @@ symbols halves this to 10, because viewing along the axis in the opposite direction exchanges anti-clockwise and clockwise. These 10 combinations are the ordered sets (a, e), (a, d), (a, c), (a, b), (b, e), (b, d), (b, c), (c, e), (c, d) and (d, e). Each of these pairs, combined with @ or @@, corresponds to a TB primitive.
Octahedral Centers
For 6 atoms, the unit permutation is …
Shape | Viewing Axis: From | Viewing Axis: Towards | OH Number | Order |
---|---|---|---|---|