Move SMARTS strings into a dataset #112

Open
Aariq opened this issue Aug 6, 2024 · 1 comment · May be fixed by #114

@Aariq
Collaborator

Aariq commented Aug 6, 2024

By moving the SMARTS strings in get_fx_groups() (volcalc/R/get_fx_groups.R, lines 100 to 141 in 80bed5b:

# *_pattern are SMARTS strings: https://www.daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html
carbon_dbl_bonds_pattern <- "C=C" #non-aromatic carbon double bonds
CCCO_pattern <- "C(C=C[AR1])(=O)[AR1]" #C=C-C=O in a non-aromatic ring
# ether_alkyl_pattern <- "[OD2]([C!R1])[C!R1]" #currently unused--ether_alkly calculated as total - other ethers
ether_alicyclic_pattern <- "[OD2]([C!R0])[C!R0]"
ether_aromatic_pattern <- "O(c)[C,c]" #only one of the carbons has to be aromatic
nitro_pattern <- "[$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8]"
hydroxyl_aromatic_pattern <- "[OX2H]c"
nitrate_pattern <- "[$([NX3](=[OX1])(=[OX1])O),$([NX3+]([OX1-])(=[OX1])O)]"
#TODO need patterns for amines that don't pick up amides
amine_primary_pattern <- "[NX3;H2;!$(NC=[!#6]);!$(NC#[!#6])][#6X4]"
amine_secondary_pattern <- "[NX3H1!$(NC=[!#6])!$(NC#[!#6])]([#6X4])[#6X4]"
amine_tertiary_pattern <- "[NX3H0!$(NC=[!#6])!$(NC#[!#6])]([#6X4])([#6X4])[#6X4]"
amine_aromatic_pattern <- "[NX3;!$(NO)]c"
amide_primary_pattern <- "[CX3;$([R0][#6]),$([H1R0])](=[OX1])[#7X3H2]"
amide_secondary_pattern <- "[CX3;$([R0][#6]),$([H1R0])](=[OX1])[#7X3H1][#6;!$(C=[O,N,S])]"
amide_tertiary_pattern <-
"[CX3;$([R0][#6]),$([H1R0])](=[OX1])[#7X3H0]([#6;!$(C=[O,N,S])])[#6;!$(C=[O,N,S])]"
# amide_total_pattern <- "[CX3;$([R0][#6]),$([H1R0])](=[OX1])[#7X3;$([H2]),$([H1][#6;!$(C=[O,N,S])]),$([#7]([#6;!$(C=[O,N,S])])[#6;!$(C=[O,N,S])])]"
carbonylperoxynitrate_pattern <- "*C(=O)OO[N+1](=O)[O-1]"
peroxide_pattern <- "[OX2D2][OX2D2]" #this captures carbonylperoxynitrates too
hydroperoxide_pattern <- "[OX2][OX2H,OX1-]" #this captures peroxyacids too
carbonylperoxyacid_pattern <- "[CX3;$([R0][#6]),$([H1R0])](=[OX1])[OX2][$([OX2H]),$([OX1-])]"
nitroester_pattern <- "C(=O)(OC)C~[NX3](-,=[OX1])-,=[OX1]"
# This captures OH groups on a ring that also has a nitro group (para, ortho, or meta). Need to correct aromatic hydroxyl count later.
nitrophenol_pattern <-
"[OX2H][$(c1ccccc1[$([NX3](=O)=O),$([NX3+](=O)[O-])]),$(c1cccc(c1)[$([NX3](=O)=O),$([NX3+](=O)[O-])]),$(c1ccc(cc1)[$([NX3](=O)=O),$([NX3+](=O)[O-])])]"
phosphoric_acid_pattern <-
"[$(P(=[OX1])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)]),$([P+]([OX1-])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)])]"
phosphoric_ester_pattern <-
"[$(P(=[OX1])([OX2][#6])([$([OX2H]),$([OX1-]),$([OX2][#6])])[$([OX2H]),$([OX1-]),$([OX2][#6]),$([OX2]P)]),$([P+]([OX1-])([OX2][#6])([$([OX2H]),$([OX1-]),$([OX2][#6])])[$([OX2H]),$([OX1-]),$([OX2][#6]),$([OX2]P)])]"
sulfate_pattern <-
"[$([#16X4](=[OX1])(=[OX1])([OX2H,OX1H0-])[OX2][#6]),$([#16X4+2]([OX1-])([OX1-])([OX2H,OX1H0-])[OX2][#6])]"
#sulfonate groups; sulfonate ions, and conjugate acid, sulfonic acids
sulfonate_pattern <-
"[#16X4](=[OX1])(=[OX1])([#6])[*$([O-1]),*$([OH1]),*$([OX2H0])]"
thiol_pattern <- "[#16X2H]"
carbothioester_pattern <- "S([#6])[CX3](=O)[#6]"
) into a data frame included with the package, we could simplify the code, make the functional group definitions more transparent to users, and potentially make get_fx_groups() easier to extend to methods other than SIMPOL.1.

Possible column names: method (?), functional_group, smarts, and function. The function column would be useful for groups that are captured by a ChemmineOB function rather than by a SMARTS search. For every row that has a SMARTS string, function would be "ChemmineR::smartsSearchOB".
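A minimal sketch of what such a data frame might look like, using four of the patterns quoted above. The object name `fx_group_defs`, the method label `"simpol1"`, and the helpers `get_smarts()` and `resolve_fn()` are hypothetical and only illustrate the idea; nothing here is the package's actual implementation.

```r
# Hypothetical lookup table; column names follow the proposal above.
# `function` is a reserved word in R, so it needs backticks and
# check.names = FALSE to survive as a column name.
fx_group_defs <- data.frame(
  method = "simpol1", # assumed label for SIMPOL.1
  functional_group = c("carbon_dbl_bonds", "hydroxyl_aromatic",
                       "thiol", "carbothioester"),
  smarts = c("C=C", "[OX2H]c", "[#16X2H]", "S([#6])[CX3](=O)[#6]"),
  `function` = "ChemmineR::smartsSearchOB",
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up the SMARTS string for one functional group.
get_smarts <- function(defs, group) {
  hit <- defs$smarts[defs$functional_group == group]
  if (length(hit) != 1) {
    stop("Expected exactly one SMARTS string for group: ", group)
  }
  hit
}

# Hypothetical helper: turn a "pkg::fn" string into a callable function.
# Requires the named package to be installed.
resolve_fn <- function(x) {
  parts <- strsplit(x, "::", fixed = TRUE)[[1]]
  getExportedValue(parts[1], parts[2])
}

get_smarts(fx_group_defs, "thiol")
```

Because `function` is reserved, the column has to be accessed as fx_group_defs[["function"]] or with backticks, so a different name (for example fn) may be more convenient.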

@Aariq
Collaborator Author

Aariq commented Aug 6, 2024

I feel like this is a superior option to #90 because it is less work and probably more useful.

@Aariq Aariq self-assigned this Aug 6, 2024
@Aariq Aariq linked a pull request Aug 6, 2024 that will close this issue