diff --git a/.Rbuildignore b/.Rbuildignore new file mode 100644 index 0000000..74f9ad9 --- /dev/null +++ b/.Rbuildignore @@ -0,0 +1,10 @@ +^.*\.Rproj$ +^\.Rproj\.user$ +^data-raw$ +^README\.Rmd$ +^CODE_OF_CONDUCT\.md$ +^LICENSE\.md$ +^docs$ +^pkgdown$ +^\.github$ +^CRAN-RELEASE$ diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..5b6a065 --- /dev/null +++ b/.gitignore @@ -0,0 +1,4 @@ +.Rproj.user +.Rhistory +.RData +.Ruserdata diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md new file mode 100644 index 0000000..b8d5f67 --- /dev/null +++ b/CODE_OF_CONDUCT.md @@ -0,0 +1,126 @@ +# Contributor Covenant Code of Conduct + +## Our Pledge + +We as members, contributors, and leaders pledge to make participation in our +community a harassment-free experience for everyone, regardless of age, body +size, visible or invisible disability, ethnicity, sex characteristics, gender +identity and expression, level of experience, education, socio-economic status, +nationality, personal appearance, race, religion, or sexual identity and +orientation. + +We pledge to act and interact in ways that contribute to an open, welcoming, +diverse, inclusive, and healthy community. 
+ +## Our Standards + +Examples of behavior that contributes to a positive environment for our +community include: + +* Demonstrating empathy and kindness toward other people +* Being respectful of differing opinions, viewpoints, and experiences +* Giving and gracefully accepting constructive feedback +* Accepting responsibility and apologizing to those affected by our mistakes, +and learning from the experience +* Focusing on what is best not just for us as individuals, but for the overall +community + +Examples of unacceptable behavior include: + +* The use of sexualized language or imagery, and sexual attention or +advances of any kind +* Trolling, insulting or derogatory comments, and personal or political attacks +* Public or private harassment +* Publishing others' private information, such as a physical or email +address, without their explicit permission +* Other conduct which could reasonably be considered inappropriate in a +professional setting + +## Enforcement Responsibilities + +Community leaders are responsible for clarifying and enforcing our standards +of acceptable behavior and will take appropriate and fair corrective action in +response to any behavior that they deem inappropriate, threatening, offensive, +or harmful. + +Community leaders have the right and responsibility to remove, edit, or reject +comments, commits, code, wiki edits, issues, and other contributions that are +not aligned to this Code of Conduct, and will communicate reasons for moderation +decisions when appropriate. + +## Scope + +This Code of Conduct applies within all community spaces, and also applies +when an individual is officially representing the community in public spaces. +Examples of representing our community include using an official e-mail +address, posting via an official social media account, or acting as an appointed +representative at an online or offline event. 
+ +## Enforcement + +Instances of abusive, harassing, or otherwise unacceptable behavior may be +reported to the community leaders responsible for enforcement at CINTESIS. +All complaints will be reviewed and investigated promptly and fairly. + +All community leaders are obligated to respect the privacy and security of the +reporter of any incident. + +## Enforcement Guidelines + +Community leaders will follow these Community Impact Guidelines in determining +the consequences for any action they deem in violation of this Code of Conduct: + +### 1. Correction + +**Community Impact**: Use of inappropriate language or other behavior deemed +unprofessional or unwelcome in the community. + +**Consequence**: A private, written warning from community leaders, providing +clarity around the nature of the violation and an explanation of why the +behavior was inappropriate. A public apology may be requested. + +### 2. Warning + +**Community Impact**: A violation through a single incident or series of +actions. + +**Consequence**: A warning with consequences for continued behavior. No +interaction with the people involved, including unsolicited interaction with +those enforcing the Code of Conduct, for a specified period of time. This +includes avoiding interactions in community spaces as well as external channels +like social media. Violating these terms may lead to a temporary or permanent +ban. + +### 3. Temporary Ban + +**Community Impact**: A serious violation of community standards, including +sustained inappropriate behavior. + +**Consequence**: A temporary ban from any sort of interaction or public +communication with the community for a specified period of time. No public or +private interaction with the people involved, including unsolicited interaction +with those enforcing the Code of Conduct, is allowed during this period. +Violating these terms may lead to a permanent ban. + +### 4. 
Permanent Ban + +**Community Impact**: Demonstrating a pattern of violation of community +standards, including sustained inappropriate behavior, harassment of an +individual, or aggression toward or disparagement of classes of individuals. + +**Consequence**: A permanent ban from any sort of public interaction within the +community. + +## Attribution + +This Code of Conduct is adapted from the [Contributor Covenant][homepage], +version 2.0, +available at . + +Community Impact Guidelines were inspired by [Mozilla's code of conduct +enforcement ladder](https://github.com/mozilla/diversity). + +[homepage]: https://www.contributor-covenant.org + +For answers to common questions about this code of conduct, see the FAQ at +. Translations are available at . diff --git a/DESCRIPTION b/DESCRIPTION new file mode 100644 index 0000000..4c80325 --- /dev/null +++ b/DESCRIPTION @@ -0,0 +1,39 @@ +Package: grantham +Type: Package +Title: Grantham Distance +Version: 0.1.0 +Authors@R: c( + person(given = "Ramiro", family = "Magno", + email = "ramiro.magno@gmail.com", + role = c("aut", "cre"), + comment = c(ORCID = "0000-0001-5226-3441")), + person(given = "Isabel", family = "Duarte", + email = "iduarte.scientist@gmail.com", + role = "aut", + comment = c(ORCID = "0000-0003-0060-2936")), + person(given = "Ana-Teresa", family = "Maia", + email = "maia.anateresa@gmail.com", role = "aut", + comment = c(ORCID = "0000-0002-0454-9207")), + person("CINTESIS", + role = c("cph", "fnd")) + ) +Description: A minimal set of routines to calculate the Grantham distance. + The Grantham distance attempts to provide a proxy for the evolutionary + distance between two amino acids based on three key chemical + properties: composition, polarity and molecular volume. In turn, + evolutionary distance is used as a proxy for the impact of missense + mutations. The higher the distance, the more deleterious the + substitution is expected to be.
+License: MIT + file LICENSE +Encoding: UTF-8 +LazyData: true +RoxygenNote: 7.1.2 +Depends: + R (>= 2.10) +Imports: + tibble, + magrittr, + vctrs, + dplyr, + tidyr, + rlang diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000..5cb59a6 --- /dev/null +++ b/LICENSE @@ -0,0 +1,2 @@ +YEAR: 2021 +COPYRIGHT HOLDER: Ramiro Magno diff --git a/LICENSE.md b/LICENSE.md new file mode 100644 index 0000000..a242c45 --- /dev/null +++ b/LICENSE.md @@ -0,0 +1,21 @@ +# MIT License + +Copyright (c) 2021 Ramiro Magno + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. 
diff --git a/NAMESPACE b/NAMESPACE new file mode 100644 index 0000000..9488cd3 --- /dev/null +++ b/NAMESPACE @@ -0,0 +1,12 @@ +# Generated by roxygen2: do not edit by hand + +export("%>%") +export(amino_acid_pairs) +export(amino_acids) +export(grantham_distance) +export(grantham_distance_exact) +export(grantham_distance_original) +export(grantham_equation) +importFrom(magrittr,"%>%") +importFrom(rlang,.data) +importFrom(tibble,tibble) diff --git a/R/amino_acid_index.R b/R/amino_acid_index.R new file mode 100644 index 0000000..d78a11a --- /dev/null +++ b/R/amino_acid_index.R @@ -0,0 +1,6 @@ +amino_acid_index <- function(amino_acid) { + match(amino_acid, amino_acids()) +} + +# aa_idx: abbreviated form of `amino_acid_index`. +aa_idx <- amino_acid_index diff --git a/R/amino_acid_pairs.R b/R/amino_acid_pairs.R new file mode 100644 index 0000000..b6fd0ee --- /dev/null +++ b/R/amino_acid_pairs.R @@ -0,0 +1,60 @@ +#' Generate amino acid pairs +#' +#' This function generates combinations of amino acids in pairs. By default, it +#' generates all pair combinations of the 20 standard amino acids. +#' +#' @param x A character vector of amino acids (three-letter codes). +#' @param y Another character vector of amino acids (three-letter codes). +#' @param keep_self Whether to keep pairs involving the same amino acid. +#' @param keep_duplicates Whether to keep duplicated pairs. +#' @param keep_reverses Whether to keep pairs that are reversed versions of +#' others. E.g. if `keep_reverses` is `TRUE` the pairs `"Ser"`-`"Arg"` and +#' `"Arg"`-`"Ser"` will be kept in the returned tibble; however, if +#' `keep_reverses` is `FALSE`, only the first pair is preserved in the output. +#' +#' @return A [tibble][tibble::tibble-package] of amino acid pairs. 
+#' +#' @examples +#' # Generate all pairs of the 20 standard amino acids +#' amino_acid_pairs() +#' +#' # Remove the self-to-self pairs +#' amino_acid_pairs(keep_self = FALSE) +#' +#' # Generate specific combinations of Ser against Ala and Trp. +#' amino_acid_pairs(x = 'Ser', y = c('Ala', 'Trp')) +#' @md +#' @importFrom rlang .data +#' @export +amino_acid_pairs <- + function(x = amino_acids(), + y = amino_acids(), + keep_self = TRUE, + keep_duplicates = TRUE, + keep_reverses = TRUE) { + + if (!all_amino_acids(x)) + stop('`x` must be a vector of three-letter code amino acids') + + if (!all_amino_acids(y)) + stop('`y` must be a vector of three-letter code amino acids') + + tbl <- tidyr::expand_grid(x = x, y = y) + tbl <- `if`(keep_self, tbl, dplyr::filter(tbl, x != y)) + tbl <- `if`(keep_duplicates, tbl, dplyr::distinct(tbl)) + + tbl <- + if (keep_reverses) { + tbl # do nothing + } else { + tbl %>% + dplyr::rowwise() %>% + dplyr::mutate(key = paste(sort(c(x, y)), collapse = '-')) %>% + dplyr::ungroup() %>% + dplyr::distinct(.data$key, .keep_all = TRUE) %>% + dplyr::select(-'key') + } + + return(tbl) +} diff --git a/R/amino_acids.R b/R/amino_acids.R new file mode 100644 index 0000000..5ca2122 --- /dev/null +++ b/R/amino_acids.R @@ -0,0 +1,16 @@ +#' The 20 standard amino acids +#' +#' The 20 amino acids that are encoded directly by the codons of the universal +#' genetic code. +#' +#' @return Three-letter codes of the standard amino acids.
+#' +#' @examples +#' amino_acids() +#' +#' @export +amino_acids <- function() { + c("Ser", "Arg", "Leu", "Pro", "Thr", "Ala", "Val", "Gly", "Ile", + "Phe", "Tyr", "Cys", "His", "Gln", "Asn", "Lys", "Asp", "Glu", + "Met", "Trp") +} diff --git a/R/amino_acids_properties.R b/R/amino_acids_properties.R new file mode 100644 index 0000000..335e575 --- /dev/null +++ b/R/amino_acids_properties.R @@ -0,0 +1,13 @@ +#' Amino acid side chain property values +#' +#' A dataset containing the amino acid side chain property values: +#' composition, polarity and molecular volume. These values were obtained +#' from Table 1, Grantham (1974), \doi{10.1126/science.185.4154.862}. +#' +#' @examples +#' amino_acids_properties +#' +#' @source +#' Table 1, Grantham (1974), \doi{10.1126/science.185.4154.862}. +#' +"amino_acids_properties" diff --git a/R/grantham-package.R b/R/grantham-package.R new file mode 100644 index 0000000..404c70c --- /dev/null +++ b/R/grantham-package.R @@ -0,0 +1,7 @@ +#' @keywords internal +"_PACKAGE" + +## usethis namespace: start +#' @importFrom tibble tibble +## usethis namespace: end +NULL diff --git a/R/grantham_distance.R b/R/grantham_distance.R new file mode 100644 index 0000000..4c3df62 --- /dev/null +++ b/R/grantham_distance.R @@ -0,0 +1,281 @@ +#' Grantham distance +#' +#' @description +#' This function calculates Grantham's distance \eqn{d_{i,j}} between two +#' amino acids (\eqn{i} and \eqn{j}) based on their chemical properties: +#' +#' \deqn{d_{i,j} = \rho (\alpha (c_i-c_j)^2 + \beta (p_i-p_j)^2 + \gamma (v_i-v_j)^2)^{1/2}} +#' +#' This calculation is based on three amino acid side chain properties that were +#' found to be the three strongest correlators with the relative substitution +#' frequency (RSF; see the references cited in Grantham (1974)), namely: +#' +#' - composition \eqn{c}, meaning the atomic weight ratio of hetero (noncarbon) +#' elements in end groups or rings to carbons in the side chain;
+#' - polarity \eqn{p}; +#' - molecular volume \eqn{v}. +#' +#' Each property difference is weighted by dividing by the mean distance found +#' with it alone in the formula. The constants \eqn{\alpha}, \eqn{\beta} and +#' \eqn{\gamma} are squares of the inverses of mean distances of each property, +#' respectively. +#' +#' The distances reported by Grantham (1974) are further scaled by a factor +#' (here denoted \eqn{\rho}) such that the mean of all distances is 100. +#' Although this factor is not explicitly included in Grantham's distance +#' formula, it is used to calculate the amino acid pair distances reported +#' in Table 2 of Grantham's paper. For all intents and purposes, this factor +#' should therefore be regarded as part of the formula used to calculate the +#' Grantham distance, which is why we include it explicitly in the equation +#' above. +#' +#' If you want to calculate Grantham's distance directly from the identities of +#' the amino acids, instead of from their chemical properties, then use +#' [grantham_distance()]. +#' +#' @param c_i composition value for the _ith_ amino acid. +#' @param c_j composition value for the _jth_ amino acid. +#' @param p_i polarity value for the _ith_ amino acid. +#' @param p_j polarity value for the _jth_ amino acid. +#' @param v_i molecular volume value for the _ith_ amino acid. +#' @param v_j molecular volume value for the _jth_ amino acid. +#' @param alpha The constant \eqn{\alpha} in the equation of Grantham's +#' paper, on page 863. +#' @param beta The constant \eqn{\beta} in the equation of Grantham's +#' paper, on page 863. +#' @param gamma The constant \eqn{\gamma} in the equation of Grantham's +#' paper, on page 863. +#' @param rho Grantham's distances reported in Table 2, Science (1974). +#' 185(4154): 862--4 by R. Grantham, are scaled by a factor (here named +#' \eqn{\rho}) such that the mean value of all distances is 100. The `rho` +#' parameter allows this factor \eqn{\rho} to be changed.
By default +#' \eqn{\rho=50.723}, the same value used by Grantham. This value is +#' given in the caption of Table 2 of the aforementioned paper. +#' +#' @return A double vector of Grantham's distances. +#' +#' @seealso Check [amino_acids_properties] for a table of the three property +#' values that can be used with this formula. This data set is from Table 1, +#' Science (1974). 185(4154): 862--4 by R. Grantham. +#' +#' @md +#' @export +grantham_equation <- + function(c_i, + c_j, + p_i, + p_j, + v_i, + v_j, + alpha = 1.833, + beta = 0.1018, + gamma = 0.000399, + rho = 50.723) { + + d_ij <- rho * + (alpha * (c_i - c_j) ^ 2 + + beta * (p_i - p_j) ^ 2 + + gamma * (v_i - v_j) ^ 2) ^ 0.5 + + return(d_ij) + } + +#' Grantham distance +#' +#' @description +#' This function calculates the Grantham distance for pairs of amino acids. +#' Amino acid identities should be provided as three-letter codes in `x` and +#' `y`. Amino acids identified in `x` and `y` are matched element-wise, i.e. the +#' first element of `x` is paired with the first element of `y`, and so on. +#' +#' The Grantham distance attempts to provide a proxy for the evolutionary +#' distance between two amino acids based on three key chemical properties: +#' composition, polarity and molecular volume. In turn, evolutionary distance is +#' used as a proxy for the impact of missense substitutions. The higher the +#' distance, the more deleterious the substitution is expected to be. +#' +#' Two methods are available for calculating the distance. The so-called _original_ +#' method, which is the default, uses the amino acid distances provided by +#' Grantham in Table 2 of his original publication. +#' Alternatively, you may choose the _exact_ method, which uses the chemical +#' properties provided in Grantham's Table 1 to compute the amino acid +#' distances anew.
The distances calculated with the _exact_ method are not +#' rounded to the nearest integer and, for some amino acid pairs, differ by +#' about one unit from those of the _original_ method. +#' +#' If you want to calculate Grantham's distance by providing the values of the +#' amino acid properties explicitly, then use [grantham_equation()] instead. +#' +#' @param x A character vector of amino acid three-letter codes. +#' @param y A character vector of amino acid three-letter codes. +#' @param method Either `"original"` (default) or `"exact"`; see the description +#' for more details. +#' @param alpha The constant \eqn{\alpha} in the equation of Grantham's +#' paper, on page 863. +#' @param beta The constant \eqn{\beta} in the equation of Grantham's +#' paper, on page 863. +#' @param gamma The constant \eqn{\gamma} in the equation of Grantham's +#' paper, on page 863. +#' @param rho Grantham's distances reported in Table 2, Science (1974). +#' 185(4154): 862--4 by R. Grantham, are scaled by a factor (here named +#' \eqn{\rho}) such that the mean value of all distances is 100. The `rho` +#' parameter allows this factor \eqn{\rho} to be changed. By default +#' \eqn{\rho=50.723}, the same value used by Grantham. This value is +#' given in the caption of Table 2 of the aforementioned paper. +#' +#' @return A [tibble][tibble::tibble-package] of Grantham's distances for each +#' amino acid pair. +#' +#' @md +#' +#' @source \doi{10.1126/science.185.4154.862}.
+#' +#' @examples +#' # Grantham's distance between Serine (Ser) and Glutamate (Glu) +#' grantham_distance('Ser', 'Glu') +#' +#' # Grantham's distance between Serine (Ser) and Glutamate (Glu) +#' # with the "exact" method +#' grantham_distance('Ser', 'Glu', method = 'exact') +#' +#' # `grantham_distance()` is vectorised: amino acids are paired element-wise +#' # between `x` and `y` +#' grantham_distance(x = c('Pro', 'Gly'), y = c('Glu', 'Arg')) +#' +#' # Use `amino_acid_pairs()` to generate pairs (by default generates all pairs) +#' aa_pairs <- amino_acid_pairs() +#' grantham_distance(x = aa_pairs$x, y = aa_pairs$y) +#' +#' @export +grantham_distance <- + function(x, + y, + method = c('original', 'exact'), + alpha = 1.833, + beta = 0.1018, + gamma = 0.000399, + rho = 50.723) { + + if (!all_amino_acids(x)) + stop('`x` should contain only amino acid three-letter codes.') + + if (!all_amino_acids(y)) + stop('`y` should contain only amino acid three-letter codes.') + + # `rec`: recycled vectors `x` and `y`: + rec <- vctrs::vec_recycle_common(x = x, y = y) + + # Check that `method` is either 'original' or 'exact'. + method <- match.arg(method) + + if (identical(method, 'original')) + return(grantham_distance_original(x = rec$x, + y = rec$y)) + else + return( + grantham_distance_exact( + x = rec$x, + y = rec$y, + alpha = alpha, + beta = beta, + gamma = gamma, + rho = rho + ) + ) +} + +#' Grantham's distance (original) +#' +#' This function calculates Grantham's distance for pairs of amino acids. It +#' uses the pre-calculated distances for each amino acid pair as published in +#' Table 2 of Science (1974). 185(4154): 862--4 by R. Grantham. +#' +#' @param x A character vector of amino acid three-letter codes. +#' @param y A character vector of amino acid three-letter codes. +#' +#' @return A [tibble][tibble::tibble-package] of Grantham's distances for each +#' amino acid pair. +#' +#' @md +#' @source \doi{10.1126/science.185.4154.862}.
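+#' +#' @examples +#' # Look up the rounded distances tabulated in Table 2 of Grantham (1974); +#' # the same pairs are used in the `grantham_distance_exact()` example +#' grantham_distance_original(c('Ser', 'Ser'), c('Pro', 'Trp')) +#'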
+#' @keywords internal +#' @export +grantham_distance_original <- function(x, y) { + # Two-column matrix of (row, column) indices into `grantham_distances_matrix` + idx <- matrix(c(aa_idx(x), aa_idx(y)), ncol = 2) + tbl <- tibble::tibble(x = x, y = y, d = grantham_distances_matrix[idx]) + + return(tbl) +} + +#' Grantham's distance (exact) +#' +#' @md +#' +#' @description +#' This function calculates Grantham's distance for pairs of amino acids. It +#' uses the values for the amino acid properties as published in Table 1 of +#' Science (1974). 185(4154): 862--4 by R. Grantham. +#' +#' @details +#' Unlike the distances presented in Table 2 of Grantham's paper, the +#' distances returned by this function are calculated anew, starting from the +#' amino acid properties (composition, polarity and molecular volume). No +#' rounding to the nearest integer is performed. +#' +#' @param x A character vector of amino acid three-letter codes, e.g. `"Ala"` +#' (Alanine). +#' @param y A character vector of amino acid three-letter codes. +#' @param alpha The constant \eqn{\alpha} in the equation of Grantham's +#' paper, on page 863. +#' @param beta The constant \eqn{\beta} in the equation of Grantham's +#' paper, on page 863. +#' @param gamma The constant \eqn{\gamma} in the equation of Grantham's +#' paper, on page 863. +#' @param rho Grantham's distances reported in Table 2, Science (1974). +#' 185(4154): 862--4 by R. Grantham, are scaled by a factor (here named +#' \eqn{\rho}) such that the mean value of all distances is 100. The `rho` +#' parameter allows this factor \eqn{\rho} to be changed. By default +#' \eqn{\rho=50.723}, the same value used by Grantham. This value is +#' given in the caption of Table 2 of the aforementioned paper. +#' +#' @return A [tibble][tibble::tibble-package] of Grantham's distances for each +#' amino acid pair. +#' @source \doi{10.1126/science.185.4154.862}.
+#' +#' @seealso [grantham_equation()] +#' +#' @examples +#' grantham_distance_exact(c('Ser', 'Ser'), c('Pro', 'Trp')) +#' +#' @keywords internal +#' @export +grantham_distance_exact <- function(x, + y, + alpha = 1.833, + beta = 0.1018, + gamma = 0.000399, + rho = 50.723) { + + # Select the rows of the properties table for the queried amino acids + x_tbl <- amino_acids_properties[aa_idx(x), ] + y_tbl <- amino_acids_properties[aa_idx(y), ] + + # Grantham's distance computed from the amino acids' properties as provided in + # Table 1 of Grantham (1974). + d <- grantham_equation(c_i = x_tbl$c, + c_j = y_tbl$c, + p_i = x_tbl$p, + p_j = y_tbl$p, + v_i = x_tbl$v, + v_j = y_tbl$v, + alpha = alpha, + beta = beta, + gamma = gamma, + rho = rho + ) + + tbl <- tibble::tibble(x = x, y = y, d = d) + + return(tbl) +} diff --git a/R/grantham_distances_matrix.R b/R/grantham_distances_matrix.R new file mode 100644 index 0000000..873572b --- /dev/null +++ b/R/grantham_distances_matrix.R @@ -0,0 +1,13 @@ +#' Grantham distance matrix +#' +#' A dataset containing Grantham distances in the format of a matrix. These +#' values were obtained from Table 2, Grantham (1974), +#' \doi{10.1126/science.185.4154.862}. +#' +#' @examples +#' grantham_distances_matrix +#' +#' @source +#' Table 2, Grantham (1974), \doi{10.1126/science.185.4154.862}. +#' +"grantham_distances_matrix" diff --git a/R/ij2k.R b/R/ij2k.R new file mode 100644 index 0000000..e0ffe98 --- /dev/null +++ b/R/ij2k.R @@ -0,0 +1,12 @@ +#' Convert an (i, j) index to a linear index. +#' +#' Converts the double index (i, j) of a square matrix to the corresponding +#' linear index. The linear index follows column-major order, as is the +#' default in R. +#' +#' @param i i index, i.e. row position; indexing starts at 1. +#' @param j j index, i.e. column position; indexing starts at 1. +#' @param n size of the square matrix. +#' @return Linear position.
+#' @keywords internal +ij2k <- function(i, j, n) (j - 1) * n + i diff --git a/R/is_amino_acid.R b/R/is_amino_acid.R new file mode 100644 index 0000000..56ff4d8 --- /dev/null +++ b/R/is_amino_acid.R @@ -0,0 +1,9 @@ +#' @keywords internal +is_amino_acid <- function(x) { + x %in% amino_acids() +} + +#' @keywords internal +all_amino_acids <- function(x) { + all(is_amino_acid(x)) +} diff --git a/R/sltm_k.R b/R/sltm_k.R new file mode 100644 index 0000000..3917e26 --- /dev/null +++ b/R/sltm_k.R @@ -0,0 +1,20 @@ +#' Linear positions of the entries of a strictly lower triangular matrix +#' +#' Returns the linear indices of the non-zero entries of a strictly lower +#' triangular matrix. +#' +#' @param n Dimension of a `n` by `n` square matrix. +#' +#' @return An integer vector of linear positions in column-major order. +#' @md +#' +#' @examples +#' sltm_k(3) +#' +#' @noRd +#' @keywords internal +sltm_k <- function(n) { + if(!(n > 1)) stop('`n` must be greater than 1') + + utils::combn(seq_len(n), 2, function(ij) {ij2k(i = ij[2], j = ij[1], n)}) +} diff --git a/R/sysdata.rda b/R/sysdata.rda new file mode 100644 index 0000000..0584f58 Binary files /dev/null and b/R/sysdata.rda differ diff --git a/R/utils-pipe.R b/R/utils-pipe.R new file mode 100644 index 0000000..fd0b1d1 --- /dev/null +++ b/R/utils-pipe.R @@ -0,0 +1,14 @@ +#' Pipe operator +#' +#' See \code{magrittr::\link[magrittr:pipe]{\%>\%}} for details. +#' +#' @name %>% +#' @rdname pipe +#' @keywords internal +#' @export +#' @importFrom magrittr %>% +#' @usage lhs \%>\% rhs +#' @param lhs A value or the magrittr placeholder. +#' @param rhs A function call using the magrittr semantics. +#' @return The result of calling `rhs(lhs)`. 
+NULL diff --git a/README.Rmd b/README.Rmd new file mode 100644 index 0000000..7819ff0 --- /dev/null +++ b/README.Rmd @@ -0,0 +1,133 @@ +--- +output: github_document +--- + + + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>", + fig.path = "man/figures/README-", + out.width = "100%" +) +``` + +# grantham + + + + +The goal of `{grantham}` is to provide a minimal set of routines to calculate +the Grantham distance [1]. + +The Grantham distance attempts to provide a proxy for the evolutionary distance +between two amino acids based on three key chemical properties: composition, +polarity and molecular volume. In turn, evolutionary distance is used as a proxy +for the impact of missense mutations. The higher the distance, the more +deleterious the substitution is expected to be. + +## Installation + +You can install the development version of `{grantham}` like so: + +``` r +# install.packages("remotes") +remotes::install_github("maialab/grantham") +``` + +## Usage + +Grantham distance between two amino acids: + +```{r} +library(grantham) + +grantham_distance(x = 'Ser', y = 'Phe') +``` + +The function `grantham_distance()` is vectorised with amino acids being matched element-wise to form pairs for comparison: + +```{r} +grantham_distance(x = c('Ser', 'Arg'), y = c('Phe', 'Leu')) +``` + +The two vectors of amino acids must have compatible sizes in the sense of +[vec_recycle()](https://vctrs.r-lib.org/reference/vec_recycle.html) for element +recycling to be possible, i.e., either the two vectors have the same length, or +one of them is of length one, and it is recycled up to the length of the other. + +```{r} +# `'Ser'` is recycled to match the length of the second vector, i.e. 3. 
+grantham_distance(x = 'Ser', y = c('Phe', 'Leu', 'Arg')) +``` + +Use the function `amino_acid_pairs()` to generate all 20 x 20 amino acid pairs: + +```{r} +aa_pairs <- amino_acid_pairs() +aa_pairs +``` + +Now calculate the Grantham distances for all pairs in `aa_pairs`: + +```{r} +grantham_distance(x = aa_pairs$x, y = aa_pairs$y) +``` + +Because distances are symmetric, and pairs formed by the same amino acid are +trivially zero, you might want to exclude these pairs: + +```{r} +# `keep_self = FALSE`: excludes pairs such as ("Ser", "Ser") +# `keep_reverses = FALSE`: excludes reversed pairs, e.g. ("Arg", "Ser") will be +# removed because ("Ser", "Arg") already exists. +aa_pairs <- amino_acid_pairs(keep_self = FALSE, keep_reverses = FALSE) + +# These amino acid pairs are the 190 pairs shown in Table 2 of Grantham's +# original publication. +aa_pairs + +# Grantham distance for the 190 unique amino acid pairs +grantham_distance(x = aa_pairs$x, y = aa_pairs$y) +``` + +The Grantham distance $d_{i,j}$ for two amino acids $i$ and $j$ is: + +$$d_{i,j} = \rho (\alpha (c_i-c_j)^2+\beta (p_i-p_j)^2+ \gamma (v_i-v_j)^2)^{1/2}\ .$$ + +Here, $\alpha$, $\beta$ and $\gamma$ are constants that weight each property +difference, and $\rho$ is a scaling factor; see `grantham_equation()` for their +default values. + +The distance is based on three chemical properties of amino acid side chains: + +- composition ($c$) +- polarity ($p$) +- molecular volume ($v$) + +We provide a data set with these properties: + +```{r} +amino_acids_properties +``` + +To calculate the Grantham distance directly from these property values, use +the function `grantham_equation()`. + + +## Related software + +Other sources we've found in the R ecosystem that also provide code for calculating the Grantham distance: + +- A GitHub Gist by Daniel E Cook provides the function `calculate_grantham()`, see [Fetch_Grantham.R](https://gist.github.com/danielecook/501f03650bca6a3db31ff3af2d413d2a). +- The `{midasHLA}` package includes the unexported function `distGrantham()` in [utils.R](https://github.com/Genentech/midasHLA/blob/ec29296f9bfd7c4fae9e2040592b618e5f2a99a1/R/utils.R).
+- The `{HLAdivR}` package exports a data set with the Grantham distances in the format of a matrix, see [data.R](https://github.com/rbentham/HLAdivR/blob/master/R/data.R). + +## Code of Conduct + +Please note that the `{grantham}` package is released with a [Contributor Code +of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). +By contributing to this project, you agree to abide by its terms. + + +## References + +1. Grantham, R. _Amino acid difference formula to help explain protein evolution_. Science 185, 862--864 +(1974). doi: [10.1126/science.185.4154.862](https://doi.org/10.1126/science.185.4154.862). diff --git a/README.md b/README.md new file mode 100644 index 0000000..22d106d --- /dev/null +++ b/README.md @@ -0,0 +1,228 @@ + + + +# grantham + + + + +The goal of `{grantham}` is to provide a minimal set of routines to +calculate the Grantham distance \[1\]. + +The Grantham distance attempts to provide a proxy for the evolutionary +distance between two amino acids based on three key chemical properties: +composition, polarity and molecular volume. In turn, evolutionary +distance is used as a proxy for the impact of missense mutations. The +higher the distance, the more deleterious the substitution is expected +to be.
+ +## Installation + +You can install the development version of `{grantham}` like so: + +``` r +# install.packages("remotes") +remotes::install_github("maialab/grantham") +``` + +## Usage + +Grantham distance between two amino acids: + +``` r +library(grantham) + +grantham_distance(x = 'Ser', y = 'Phe') +#> # A tibble: 1 × 3 +#> x y d +#> +#> 1 Ser Phe 155 +``` + +The function `grantham_distance()` is vectorised with amino acids being +matched element-wise to form pairs for comparison: + +``` r +grantham_distance(x = c('Ser', 'Arg'), y = c('Phe', 'Leu')) +#> # A tibble: 2 × 3 +#> x y d +#> +#> 1 Ser Phe 155 +#> 2 Arg Leu 102 +``` + +The two vectors of amino acids must have compatible sizes in the sense +of [vec_recycle()](https://vctrs.r-lib.org/reference/vec_recycle.html) +for element recycling to be possible, i.e., either the two vectors have +the same length, or one of them is of length one, and it is recycled up +to the length of the other. + +``` r +# `'Ser'` is recycled to match the length of the second vector, i.e. 3. 
+grantham_distance(x = 'Ser', y = c('Phe', 'Leu', 'Arg')) +#> # A tibble: 3 × 3 +#> x y d +#> +#> 1 Ser Phe 155 +#> 2 Ser Leu 145 +#> 3 Ser Arg 110 +``` + +Use the function `amino_acid_pairs()` to generate all 20 x 20 amino acid +pairs: + +``` r +aa_pairs <- amino_acid_pairs() +aa_pairs +#> # A tibble: 400 × 2 +#> x y +#> +#> 1 Ser Ser +#> 2 Ser Arg +#> 3 Ser Leu +#> 4 Ser Pro +#> 5 Ser Thr +#> 6 Ser Ala +#> 7 Ser Val +#> 8 Ser Gly +#> 9 Ser Ile +#> 10 Ser Phe +#> # … with 390 more rows +``` + +And now calculate all Grantham distances for all pairs `aa_pairs`: + +``` r +grantham_distance(x = aa_pairs$x, y = aa_pairs$y) +#> # A tibble: 400 × 3 +#> x y d +#> +#> 1 Ser Ser 0 +#> 2 Ser Arg 110 +#> 3 Ser Leu 145 +#> 4 Ser Pro 74 +#> 5 Ser Thr 58 +#> 6 Ser Ala 99 +#> 7 Ser Val 124 +#> 8 Ser Gly 56 +#> 9 Ser Ile 142 +#> 10 Ser Phe 155 +#> # … with 390 more rows +``` + +Because distances are symmetric, and pairs formed by the same amino acid +are trivially zero, you might want to exclude these pairs: + +``` r +# `keep_self = FALSE`: excludes pairs such as ("Ser", "Ser") +# `keep_reverses = FALSE`: excludes reversed pairs, e.g. ("Arg", "Ser") will be +# removed because ("Ser", "Arg") already exists. +aa_pairs <- amino_acid_pairs(keep_self = FALSE, keep_reverses = FALSE) + +# These amino acid pairs are the 190 pairs shown in Table 2 of Grantham's +# original publication. 
+aa_pairs +#> # A tibble: 190 × 2 +#> x y +#> +#> 1 Ser Arg +#> 2 Ser Leu +#> 3 Ser Pro +#> 4 Ser Thr +#> 5 Ser Ala +#> 6 Ser Val +#> 7 Ser Gly +#> 8 Ser Ile +#> 9 Ser Phe +#> 10 Ser Tyr +#> # … with 180 more rows + +# Grantham distance for the 190 unique amino acid pairs +grantham_distance(x = aa_pairs$x, y = aa_pairs$y) +#> # A tibble: 190 × 3 +#> x y d +#> +#> 1 Ser Arg 110 +#> 2 Ser Leu 145 +#> 3 Ser Pro 74 +#> 4 Ser Thr 58 +#> 5 Ser Ala 99 +#> 6 Ser Val 124 +#> 7 Ser Gly 56 +#> 8 Ser Ile 142 +#> 9 Ser Phe 155 +#> 10 Ser Tyr 144 +#> # … with 180 more rows +``` + +The Grantham distance $d_{i,j}$ for two amino acids $i$ and +$j$ is: + +$$d_{i,j} = \rho \left(\alpha (c_i - c_j)^2 + \beta (p_i - p_j)^2 + \gamma (v_i - v_j)^2\right)^{1/2}$$ + +The distance is based on three chemical properties of amino acid side +chains: + +- composition (*c*) +- polarity (*p*) +- molecular volume (*v*) + +We provide a data set with these properties: + +``` r +amino_acids_properties +#> # A tibble: 20 × 4 +#> amino_acid c p v +#> +#> 1 Ser 1.42 9.2 32 +#> 2 Arg 0.65 10.5 124 +#> 3 Leu 0 4.9 111 +#> 4 Pro 0.39 8 32.5 +#> 5 Thr 0.71 8.6 61 +#> 6 Ala 0 8.1 31 +#> 7 Val 0 5.9 84 +#> 8 Gly 0.74 9 3 +#> 9 Ile 0 5.2 111 +#> 10 Phe 0 5.2 132 +#> 11 Tyr 0.2 6.2 136 +#> 12 Cys 2.75 5.5 55 +#> 13 His 0.58 10.4 96 +#> 14 Gln 0.89 10.5 85 +#> 15 Asn 1.33 11.6 56 +#> 16 Lys 0.33 11.3 119 +#> 17 Asp 1.38 13 54 +#> 18 Glu 0.92 12.3 83 +#> 19 Met 0 5.7 105 +#> 20 Trp 0.13 5.4 170 +``` + +If you want to calculate the Grantham distance from these property +values you may use the function `grantham_equation()`. + +## Related software + +Other sources we’ve found in the R ecosystem that also provide code for +calculation of the Grantham distance: + +- A GitHub Gist by Daniel E Cook provides the function + `calculate_grantham()`, see + [Fetch_Grantham.R](https://gist.github.com/danielecook/501f03650bca6a3db31ff3af2d413d2a).
+- The `{midasHLA}` package includes the unexported function + `distGrantham()` in + [utils.R](https://github.com/Genentech/midasHLA/blob/ec29296f9bfd7c4fae9e2040592b618e5f2a99a1/R/utils.R). +- The `{HLAdivR}` package exports a data set with the Grantham + distances in the format of a matrix, see + [data.R](https://github.com/rbentham/HLAdivR/blob/master/R/data.R). + +## Code of Conduct + +Please note that the `{grantham}` package is released with a +[Contributor Code of +Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). +By contributing to this project, you agree to abide by its terms. + +## References + +1. Grantham, R. *Amino acid difference formula to help explain protein + evolution*. Science 185, 862–864 (1974). doi: + [10.1126/science.185.4154.862](https://doi.org/10.1126/science.185.4154.862). diff --git a/data-raw/amino_acids_properties.csv b/data-raw/amino_acids_properties.csv new file mode 100644 index 0000000..92bf7df --- /dev/null +++ b/data-raw/amino_acids_properties.csv @@ -0,0 +1,21 @@ +amino_acid,c,p,v +Ser,1.42,9.2,32 +Arg,0.65,10.5,124 +Leu,0,4.9,111 +Pro,0.39,8,32.5 +Thr,0.71,8.6,61 +Ala,0,8.1,31 +Val,0,5.9,84 +Gly,0.74,9,3 +Ile,0,5.2,111 +Phe,0,5.2,132 +Tyr,0.2,6.2,136 +Cys,2.75,5.5,55 +His,0.58,10.4,96 +Gln,0.89,10.5,85 +Asn,1.33,11.6,56 +Lys,0.33,11.3,119 +Asp,1.38,13,54 +Glu,0.92,12.3,83 +Met,0,5.7,105 +Trp,0.13,5.4,170 diff --git a/data-raw/amino_acids_properties.ods b/data-raw/amino_acids_properties.ods new file mode 100644 index 0000000..b6b653f Binary files /dev/null and b/data-raw/amino_acids_properties.ods differ diff --git a/data-raw/data.R b/data-raw/data.R new file mode 100644 index 0000000..ecbd60d --- /dev/null +++ b/data-raw/data.R @@ -0,0 +1,66 @@ +library(readr) +library(here) +library(grantham) + +# Grantham distances' matrix +grantham_distances_matrix <- + readr::read_csv( + file = here::here('data-raw', 'grantham_distance_matrix.csv'), + col_types = 'ciiiiiiiiiiiiiiiiiii', + col_select = -1 + ) %>% 
+ as.matrix() %>% + `rownames<-`(., colnames(.)) + +# Sort the rows and columns by the order present in `amino_acids()`. This +# ordering should already be as in the return value of `amino_acids()`, but just +# in case... +grantham_distances_matrix <- + grantham_distances_matrix[amino_acids(), amino_acids()] + +# The values for the amino acid properties in "amino_acids_properties.csv" were +# directly obtained from Table 1 of Grantham (1974). +amino_acids_properties <- + readr::read_csv( + file = here::here('data-raw', 'amino_acids_properties.csv'), + col_types = 'cddd' + ) %>% # Next line just ensures that the order comes out the same as in `amino_acids()`. + dplyr::left_join(tibble::tibble(amino_acid = amino_acids()), ., by = 'amino_acid') + +# The 20 amino acids. +n_amino_acids <- length(amino_acids()) + +mean_chemical_distance <- + with(amino_acids_properties, + c( + 'c' = mean(outer(c, c, function(x, y) abs(x - y))[grantham:::sltm_k(n_amino_acids)]), + 'p' = mean(outer(p, p, function(x, y) abs(x - y))[grantham:::sltm_k(n_amino_acids)]), + 'v' = mean(outer(v, v, function(x, y) abs(x - y))[grantham:::sltm_k(n_amino_acids)]) + ) + ) %>% + signif(digits = 4) %>% + round(digits = 3) + +# The mean weighting factors (as they are referred to in the caption of Table 1 +# of R. Grantham (1974)) are used as indicated in that caption. If we were to +# calculate them here from `mean_chemical_distance`, one would find that the +# published alpha value (1.833) differs from the calculated one (1.831) by +# about 0.11%. +# As the difference is minor, we stick with the values reported in +# the original paper to avoid confusion.
+mean_weighting_factors <- c('alpha' = 1.833, 'beta' = 0.1018, 'gamma' = 0.000399) + +# These variables end up in R/sysdata.rda +usethis::use_data( + amino_acids_properties, + grantham_distances_matrix, + mean_chemical_distance, + mean_weighting_factors, + internal = TRUE, + overwrite = TRUE +) + +# These end up in data/*.rda +usethis::use_data(amino_acids_properties, overwrite = TRUE) +usethis::use_data(grantham_distances_matrix, overwrite = TRUE) + diff --git a/data-raw/grantham_distance_matrix.csv b/data-raw/grantham_distance_matrix.csv new file mode 100644 index 0000000..2cba7cf --- /dev/null +++ b/data-raw/grantham_distance_matrix.csv @@ -0,0 +1,21 @@ +,Ser,Arg,Leu,Pro,Thr,Ala,Val,Gly,Ile,Phe,Tyr,Cys,His,Gln,Asn,Lys,Asp,Glu,Met,Trp +Ser,0,110,145,74,58,99,124,56,142,155,144,112,89,68,46,121,65,80,135,177 +Arg,110,0,102,103,71,112,96,125,97,97,77,180,29,43,86,26,96,54,91,101 +Leu,145,102,0,98,92,96,32,138,5,22,36,198,99,113,153,107,172,138,15,61 +Pro,74,103,98,0,38,27,68,42,95,114,110,169,77,76,91,103,108,93,87,147 +Thr,58,71,92,38,0,58,69,59,89,103,92,149,47,42,65,78,85,65,81,128 +Ala,99,112,96,27,58,0,64,60,94,113,112,195,86,91,111,106,126,107,84,148 +Val,124,96,32,68,69,64,0,109,29,50,55,192,84,96,133,97,152,121,21,88 +Gly,56,125,138,42,59,60,109,0,135,153,147,159,98,87,80,127,94,98,127,184 +Ile,142,97,5,95,89,94,29,135,0,21,33,198,94,109,149,102,168,134,10,61 +Phe,155,97,22,114,103,113,50,153,21,0,22,205,100,116,158,102,177,140,28,40 +Tyr,144,77,36,110,92,112,55,147,33,22,0,194,83,99,143,85,160,122,36,37 +Cys,112,180,198,169,149,195,192,159,198,205,194,0,174,154,139,202,154,170,196,215 +His,89,29,99,77,47,86,84,98,94,100,83,174,0,24,68,32,81,40,87,115 +Gln,68,43,113,76,42,91,96,87,109,116,99,154,24,0,46,53,61,29,101,130 +Asn,46,86,153,91,65,111,133,80,149,158,143,139,68,46,0,94,23,42,142,174 +Lys,121,26,107,103,78,106,97,127,102,102,85,202,32,53,94,0,101,56,95,110 +Asp,65,96,172,108,85,126,152,94,168,177,160,154,81,61,23,101,0,45,160,181 
+Glu,80,54,138,93,65,107,121,98,134,140,122,170,40,29,42,56,45,0,126,152 +Met,135,91,15,87,81,84,21,127,10,28,36,196,87,101,142,95,160,126,0,67 +Trp,177,101,61,147,128,148,88,184,61,40,37,215,115,130,174,110,181,152,67,0 diff --git a/data-raw/grantham_distance_matrix.ods b/data-raw/grantham_distance_matrix.ods new file mode 100644 index 0000000..07acbd7 Binary files /dev/null and b/data-raw/grantham_distance_matrix.ods differ diff --git a/data/amino_acids_properties.rda b/data/amino_acids_properties.rda new file mode 100644 index 0000000..324606a Binary files /dev/null and b/data/amino_acids_properties.rda differ diff --git a/data/grantham_distances_matrix.rda b/data/grantham_distances_matrix.rda new file mode 100644 index 0000000..5552c5b Binary files /dev/null and b/data/grantham_distances_matrix.rda differ diff --git a/grantham.Rproj b/grantham.Rproj new file mode 100644 index 0000000..270314b --- /dev/null +++ b/grantham.Rproj @@ -0,0 +1,21 @@ +Version: 1.0 + +RestoreWorkspace: Default +SaveWorkspace: Default +AlwaysSaveHistory: Default + +EnableCodeIndexing: Yes +UseSpacesForTab: Yes +NumSpacesForTab: 2 +Encoding: UTF-8 + +RnwWeave: Sweave +LaTeX: pdfLaTeX + +AutoAppendNewline: Yes +StripTrailingWhitespace: Yes + +BuildType: Package +PackageUseDevtools: Yes +PackageInstallArgs: --no-multiarch --with-keep.source +PackageRoxygenize: rd,collate,namespace diff --git a/man/amino_acid_pairs.Rd b/man/amino_acid_pairs.Rd new file mode 100644 index 0000000..74aa4a2 --- /dev/null +++ b/man/amino_acid_pairs.Rd @@ -0,0 +1,45 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/amino_acid_pairs.R +\name{amino_acid_pairs} +\alias{amino_acid_pairs} +\title{Generate amino acid pairs} +\usage{ +amino_acid_pairs( + x = amino_acids(), + y = amino_acids(), + keep_self = TRUE, + keep_duplicates = TRUE, + keep_reverses = TRUE +) +} +\arguments{ +\item{x}{A character vector of amino acids (three-letter codes).} + +\item{y}{Another character vector of 
amino acids (three-letter codes).} + +\item{keep_self}{Whether to keep pairs involving the same amino acid.} + +\item{keep_duplicates}{Whether to keep duplicated pairs.} + +\item{keep_reverses}{Whether to keep pairs that are reversed versions of +others. E.g. if \code{keep_reverses} is \code{TRUE} the pairs \code{"Ser"}-\code{"Arg"} and +\code{"Arg"}-\code{"Ser"} will be kept in the returned tibble; however, if +\code{keep_reverses} is \code{FALSE}, only the first pair is preserved in the output.} +} +\value{ +A \link[tibble:tibble-package]{tibble} of amino acid pairs. +} +\description{ +This function generates combinations of amino acids in pairs. By default, it +generates all pair combinations of the 20 standard amino acids. +} +\examples{ +# Generate all pairs of the 20 standard amino acids +amino_acid_pairs() + +# Remove the self-to-self pairs +amino_acid_pairs(keep_self = FALSE) + +# Generate specific combinations of Ser against Ala and Trp. +amino_acid_pairs(x = 'Ser', y = c('Ala', 'Trp')) +} diff --git a/man/amino_acids.Rd b/man/amino_acids.Rd new file mode 100644 index 0000000..d3acd3b --- /dev/null +++ b/man/amino_acids.Rd @@ -0,0 +1,19 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/amino_acids.R +\name{amino_acids} +\alias{amino_acids} +\title{The 20 standard amino acids} +\usage{ +amino_acids() +} +\value{ +Three-letter codes of the standard amino acids. +} +\description{ +The 20 amino acids that are encoded directly by the codons of the universal +genetic code. 
+} +\examples{ +amino_acids() + +} diff --git a/man/amino_acids_properties.Rd b/man/amino_acids_properties.Rd new file mode 100644 index 0000000..02e0883 --- /dev/null +++ b/man/amino_acids_properties.Rd @@ -0,0 +1,25 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/amino_acids_properties.R +\docType{data} +\name{amino_acids_properties} +\alias{amino_acids_properties} +\title{Amino acid side chain property values} +\format{ +An object of class \code{tbl_df} (inherits from \code{tbl}, \code{data.frame}) with 20 rows and 4 columns. +} +\source{ +Table 1, Grantham (1974), \doi{10.1126/science.185.4154.862}. +} +\usage{ +amino_acids_properties +} +\description{ +A dataset containing the amino acid side chain property values +---composition, polarity and molecular volume. These values were obtained +from Table 1, Grantham (1974), \doi{10.1126/science.185.4154.862}. +} +\examples{ +amino_acids_properties + +} +\keyword{datasets} diff --git a/man/grantham-package.Rd b/man/grantham-package.Rd new file mode 100644 index 0000000..df2c77b --- /dev/null +++ b/man/grantham-package.Rd @@ -0,0 +1,26 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/grantham-package.R +\docType{package} +\name{grantham-package} +\alias{grantham} +\alias{grantham-package} +\title{grantham: Grantham distance} +\description{ +A minimal set of routines to calculate the Grantham distance. The Grantham distance attempts to provide a proxy for the evolutionary distance between two amino acids based on three key chemical properties: composition, polarity and molecular volume. In turn, evolutionary distance is used as a proxy for the impact of missense mutations. The higher the distance, the more deleterious the substitution is expected to be. 
+} +\author{ +\strong{Maintainer}: Ramiro Magno \email{ramiro.magno@gmail.com} (\href{https://orcid.org/0000-0001-5226-3441}{ORCID}) + +Authors: +\itemize{ + \item Isabel Duarte \email{iduarte.scientist@gmail.com} (\href{https://orcid.org/0000-0003-0060-2936}{ORCID}) + \item Ana-Teresa Maia \email{maia.anateresa@gmail.com} (\href{https://orcid.org/0000-0002-0454-9207}{ORCID}) +} + +Other contributors: +\itemize{ + \item CINTESIS [copyright holder, funder] +} + +} +\keyword{internal} diff --git a/man/grantham_distance.Rd b/man/grantham_distance.Rd new file mode 100644 index 0000000..8c295ab --- /dev/null +++ b/man/grantham_distance.Rd @@ -0,0 +1,88 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/grantham_distance.R +\name{grantham_distance} +\alias{grantham_distance} +\title{Grantham distance} +\source{ +\doi{10.1126/science.185.4154.862}. +} +\usage{ +grantham_distance( + x, + y, + method = c("original", "exact"), + alpha = 1.833, + beta = 0.1018, + gamma = 0.000399, + rho = 50.723 +) +} +\arguments{ +\item{x}{A character vector of amino acid three-letter codes.} + +\item{y}{A character vector of amino acid three-letter codes.} + +\item{method}{Either \code{"original"} (default) or \code{"exact"}, see description for +more details.} + +\item{alpha}{The constant \eqn{\alpha} in the equation of Grantham's +paper, on page 863.} + +\item{beta}{The constant \eqn{\beta} in the equation of Grantham's +paper, on page 863.} + +\item{gamma}{The constant \eqn{\gamma} in the equation of Grantham's +paper, on page 863.} + +\item{rho}{Grantham's distances reported in Table 2, Science (1974). +185(4154): 862--4 by R. Grantham, are scaled by a factor (here named +\eqn{\rho}) such that the mean value of all distances is 100. The \code{rho} +parameter allows this factor \eqn{\rho} to be changed. By default +\eqn{\rho=50.723}, the same value used by Grantham.
This value is +originally mentioned in the caption of Table 2 of the aforementioned paper.} +} +\value{ +A \link[tibble:tibble-package]{tibble} of Grantham's distances for each +amino acid pair. +} +\description{ +This function calculates the Grantham distance for pairs of amino acids. +Amino acid identities should be provided as three-letter codes in \code{x} and +\code{y}. Amino acids identified in \code{x} and \code{y} are matched element-wise, i.e. the +first element of \code{x} is paired with the first element of \code{y}, and so on. + +The Grantham distance attempts to provide a proxy for the evolutionary +distance between two amino acids based on three key chemical properties: +composition, polarity and molecular volume. In turn, evolutionary distance is +used as a proxy for the impact of missense substitutions. The higher the +distance, the more deleterious the substitution is expected to be. + +The distance can be calculated with one of two methods. The \emph{original} +method (the default) uses the amino acid distances provided by Grantham in +Table 2 of his original publication. Alternatively, the \emph{exact} method +uses the chemical properties provided in Grantham's Table 1 to compute the +amino acid distances anew. The distances calculated with the \emph{exact} method are not +rounded to the nearest integer and will differ from those of the \emph{original} +method by ~1 unit for some amino acid pairs. + +If you want to calculate Grantham's distance by providing the values of the +amino acid properties explicitly, then use \code{\link[=grantham_equation]{grantham_equation()}} instead.
+} +\examples{ +# Grantham's distance between Serine (Ser) and Glutamate (Glu) +grantham_distance('Ser', 'Glu') + +# Grantham's distance between Serine (Ser) and Glutamate (Glu) +# with the "exact" method +grantham_distance('Ser', 'Glu', method = 'exact') + +# `grantham_distance()` is vectorised +# amino acids are paired element-wise between `x` and `y` +grantham_distance(x = c('Pro', 'Gly'), y = c('Glu', 'Arg')) + +# Use `amino_acid_pairs()` to generate pairs (by default generates all pairs) +aa_pairs <- amino_acid_pairs() +grantham_distance(x = aa_pairs$x, y = aa_pairs$y) + +} diff --git a/man/grantham_distance_exact.Rd b/man/grantham_distance_exact.Rd new file mode 100644 index 0000000..46fcc7a --- /dev/null +++ b/man/grantham_distance_exact.Rd @@ -0,0 +1,63 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/grantham_distance.R +\name{grantham_distance_exact} +\alias{grantham_distance_exact} +\title{Grantham's distance (exact)} +\source{ +\doi{10.1126/science.185.4154.862}. +} +\usage{ +grantham_distance_exact( + x, + y, + alpha = 1.833, + beta = 0.1018, + gamma = 0.000399, + rho = 50.723 +) +} +\arguments{ +\item{x}{A character vector of amino acid three-letter codes, e.g. \code{"Ala"} +(Alanine).} + +\item{y}{A character vector of amino acid three-letter codes.} + +\item{alpha}{The constant \eqn{\alpha} in the equation of Grantham's +paper, on page 863.} + +\item{beta}{The constant \eqn{\beta} in the equation of Grantham's +paper, on page 863.} + +\item{gamma}{The constant \eqn{\gamma} in the equation of Grantham's +paper, on page 863.} + +\item{rho}{Grantham's distances reported in Table 2, Science (1974). +185(4154): 862--4 by R. Grantham, are scaled by a factor (here named +\eqn{\rho}) such that the mean value of all distances is 100. The \code{rho} +parameter allows this factor \eqn{\rho} to be changed. By default +\eqn{\rho=50.723}, the same value used by Grantham.
This value is +originally mentioned in the caption of Table 2 of the aforementioned paper.} +} +\value{ +A \link[tibble:tibble-package]{tibble} of Grantham's distances for each +amino acid pair. +} +\description{ +This function calculates Grantham's distance for pairs of amino acids. It +uses the values for the amino acid properties as published in Table 1 of +Science (1974). 185(4154): 862--4 by R. Grantham. +} +\details{ +Unlike the distances presented in Table 2 of Grantham's paper, the +distances returned by this function are calculated anew starting from the +amino acid properties (composition, polarity and molecular volume). No +rounding to the nearest integer is performed. +} +\examples{ +grantham_distance_exact(c('Ser', 'Ser'), c('Pro', 'Trp')) + +} +\seealso{ +\code{\link[=grantham_equation]{grantham_equation()}} +} +\keyword{internal} diff --git a/man/grantham_distance_original.Rd b/man/grantham_distance_original.Rd new file mode 100644 index 0000000..7fe0ee9 --- /dev/null +++ b/man/grantham_distance_original.Rd @@ -0,0 +1,26 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/grantham_distance.R +\name{grantham_distance_original} +\alias{grantham_distance_original} +\title{Grantham's distance (original)} +\source{ +\doi{10.1126/science.185.4154.862}. +} +\usage{ +grantham_distance_original(x, y) +} +\arguments{ +\item{x}{A character vector of amino acid three-letter codes.} + +\item{y}{A character vector of amino acid three-letter codes.} +} +\value{ +A \link[tibble:tibble-package]{tibble} of Grantham's distances for each +amino acid pair. +} +\description{ +This function calculates Grantham's distance for pairs of amino acids. It +uses the pre-calculated distances for each amino acid pair as published in +Table 2 of Science (1974). 185(4154): 862--4 by R. Grantham.
+} +\keyword{internal} diff --git a/man/grantham_distances_matrix.Rd b/man/grantham_distances_matrix.Rd new file mode 100644 index 0000000..a321777 --- /dev/null +++ b/man/grantham_distances_matrix.Rd @@ -0,0 +1,25 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/grantham_distances_matrix.R +\docType{data} +\name{grantham_distances_matrix} +\alias{grantham_distances_matrix} +\title{Grantham distance matrix} +\format{ +An object of class \code{matrix} (inherits from \code{array}) with 20 rows and 20 columns. +} +\source{ +Table 2, Grantham (1974), \doi{10.1126/science.185.4154.862}. +} +\usage{ +grantham_distances_matrix +} +\description{ +A dataset containing Grantham distances in the format of a matrix. These +values were obtained from Table 2, Grantham (1974), +\doi{10.1126/science.185.4154.862}. +} +\examples{ +grantham_distances_matrix + +} +\keyword{datasets} diff --git a/man/grantham_equation.Rd b/man/grantham_equation.Rd new file mode 100644 index 0000000..8cf6f8c --- /dev/null +++ b/man/grantham_equation.Rd @@ -0,0 +1,90 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/grantham_distance.R +\name{grantham_equation} +\alias{grantham_equation} +\title{Grantham distance} +\usage{ +grantham_equation( + c_i, + c_j, + p_i, + p_j, + v_i, + v_j, + alpha = 1.833, + beta = 0.1018, + gamma = 0.000399, + rho = 50.723 +) +} +\arguments{ +\item{c_i}{composition value for the \emph{ith} amino acid.} + +\item{c_j}{composition value for the \emph{jth} amino acid.} + +\item{p_i}{polarity value for the \emph{ith} amino acid.} + +\item{p_j}{polarity value for the \emph{jth} amino acid.} + +\item{v_i}{molecular volume value for the \emph{ith} amino acid.} + +\item{v_j}{molecular volume value for the \emph{jth} amino acid.} + +\item{alpha}{The constant \eqn{\alpha} in the equation of Grantham's +paper, on page 863.} + +\item{beta}{The constant \eqn{\beta} in the equation of Grantham's +paper, on page 863.} +
+\item{gamma}{The constant \eqn{\gamma} in the equation of Grantham's +paper, on page 863.} + +\item{rho}{Grantham's distances reported in Table 2, Science (1974). +185(4154): 862--4 by R. Grantham, are scaled by a factor (here named +\eqn{\rho}) such that the mean value of all distances is 100. The \code{rho} +parameter allows this factor \eqn{\rho} to be changed. By default +\eqn{\rho=50.723}, the same value used by Grantham. This value is +originally mentioned in the caption of Table 2 of the aforementioned paper.} +} +\value{ +A double vector of Grantham's distances. +} +\description{ +This function calculates Grantham's distance \eqn{d_{i,j}} between two +amino acids (\eqn{i} and \eqn{j}) based on their chemical properties: + +\deqn{d_{i,j} = \rho (\alpha (c_i-c_j)^2 + \beta (p_i-p_j)^2 + \gamma (v_i-v_j)^2)^{1/2}} + +This calculation is based on three amino acid side chain properties that were +found to be the three strongest correlators with the relative substitution +frequency (RSF) (references cited in Grantham (1974)), namely: +\itemize{ +\item composition \eqn{c}, meaning the atomic weight ratio of hetero (noncarbon) +elements in end groups or rings to carbons in the side chain; +\item polarity \eqn{p}; +\item molecular volume \eqn{v}. +} + +Each property difference is weighted by dividing by the mean distance found +with it alone in the formula. The constants \eqn{\alpha}, \eqn{\beta} and +\eqn{\gamma} are squares of the inverses of mean distances of each property, +respectively. + +The distances reported by Grantham (1974) are further scaled by a factor, +here denoted \eqn{\rho}, such that the mean of all distances is 100. +Although this factor is not explicitly included in Grantham's distance +formula, it is actually used for calculating the amino acid pair distances +reported in Table 2 of Grantham's paper.
So, for all intents and purposes, +this factor should be regarded as part of the formula used to calculate +the Grantham distance, and therefore we include it explicitly in the equation +above. + +If you want to calculate Grantham's distance directly from the identities of +the amino acids, instead of from their chemical properties, then use +\code{\link[=grantham_distance]{grantham_distance()}}. +} +\seealso{ +Check \link{amino_acids_properties} for a table of the three property +values that can be used with this formula. This data set is from Table 1, +Science (1974). 185(4154): 862--4 by R. Grantham. +} diff --git a/man/ij2k.Rd b/man/ij2k.Rd new file mode 100644 index 0000000..82a65aa --- /dev/null +++ b/man/ij2k.Rd @@ -0,0 +1,24 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/ij2k.R +\name{ij2k} +\alias{ij2k} +\title{Convert an (i, j) index to a linear index.} +\usage{ +ij2k(i, j, n) +} +\arguments{ +\item{i}{i index, i.e. row position; indexing starts at 1.} + +\item{j}{j index, i.e. column position; indexing starts at 1.} + +\item{n}{size of the square matrix.} +} +\value{ +Linear position. +} +\description{ +Converts the double (i, j) index of a square matrix to the corresponding +linear index. The conversion assumes column-major order, which is the +default in R. +} +\keyword{internal} diff --git a/man/pipe.Rd b/man/pipe.Rd new file mode 100644 index 0000000..1f8f237 --- /dev/null +++ b/man/pipe.Rd @@ -0,0 +1,20 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/utils-pipe.R +\name{\%>\%} +\alias{\%>\%} +\title{Pipe operator} +\usage{ +lhs \%>\% rhs +} +\arguments{ +\item{lhs}{A value or the magrittr placeholder.} + +\item{rhs}{A function call using the magrittr semantics.} +} +\value{ +The result of calling `rhs(lhs)`. +} +\description{ +See \code{magrittr::\link[magrittr:pipe]{\%>\%}} for details. +} +\keyword{internal}
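As a quick numerical check of the equation documented in `man/grantham_equation.Rd`, the R sketch below evaluates it for Ser and Phe. It uses the property values from `data-raw/amino_acids_properties.csv` and the default constants (`alpha = 1.833`, `beta = 0.1018`, `gamma = 0.000399`, `rho = 50.723`). The helper `grantham_d()` is a hypothetical stand-in written for illustration. It is not the package's own `grantham_equation()`, although it takes arguments in the same order.

```r
# Hypothetical helper (not the package's grantham_equation()): Grantham's
# equation with the default constants documented in grantham_equation.Rd.
grantham_d <- function(c_i, c_j, p_i, p_j, v_i, v_j,
                       alpha = 1.833, beta = 0.1018, gamma = 0.000399,
                       rho = 50.723) {
  rho * sqrt(alpha * (c_i - c_j)^2 + beta * (p_i - p_j)^2 + gamma * (v_i - v_j)^2)
}

# Side-chain properties for Ser and Phe, copied from
# data-raw/amino_acids_properties.csv (Table 1 of Grantham, 1974).
props <- data.frame(
  amino_acid = c("Ser", "Phe"),
  c = c(1.42, 0),
  p = c(9.2, 5.2),
  v = c(32, 132)
)

ser <- props[props$amino_acid == "Ser", ]
phe <- props[props$amino_acid == "Phe", ]

# Unrounded distance, roughly 154.8
d <- grantham_d(ser$c, phe$c, ser$p, phe$p, ser$v, phe$v)
d

# Rounded to the nearest integer: 155, the Ser-Phe entry in Table 2
round(d)
```

Rounding is what reconciles the computed value with Table 2 here. As the package documentation notes for the `"exact"` method, recomputing distances from the properties can still differ from the published table by about one unit for some other pairs.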
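The column-major conversion described in `man/ij2k.Rd` amounts to k = (j - 1) n + i for an n x n matrix with 1-based indices. Below is a minimal R sketch under those assumptions. `ij2k_sketch()` is a hypothetical name, and the package's actual `ij2k()` may be written differently.

```r
# Hypothetical re-implementation of the conversion described for ij2k():
# in an n x n matrix stored column-major (R's default), element (i, j)
# sits at linear position (j - 1) * n + i. Indexing starts at 1.
ij2k_sketch <- function(i, j, n) {
  stopifnot(all(i >= 1 & i <= n), all(j >= 1 & j <= n))
  (j - 1) * n + i
}

m <- matrix(1:9, nrow = 3)

# Row 2, column 3 of a 3 x 3 matrix is linear position 8
k <- ij2k_sketch(i = 2, j = 3, n = 3)
k

# Both forms address the same element
m[k] == m[2, 3]
```

Because the arithmetic is vectorised, the same call also converts several (i, j) pairs at once, e.g. `ij2k_sketch(c(1, 2), c(1, 3), 3)`.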
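The pair counts quoted in the README follow from simple counting. There are 20 x 20 = 400 ordered pairs, and dropping the 20 self pairs and then the reversed duplicates leaves 20 x 19 / 2 = 190 unique pairs. The base-R sketch below reproduces both counts with `expand.grid()` and `combn()`, assuming the amino acid order shown in `data-raw/grantham_distance_matrix.csv`. It illustrates the counting only and is not how `amino_acid_pairs()` is implemented.

```r
# Base-R sketch (not the package's amino_acid_pairs() implementation).
# Amino acid order as in data-raw/grantham_distance_matrix.csv.
aa <- c("Ser", "Arg", "Leu", "Pro", "Thr", "Ala", "Val", "Gly", "Ile", "Phe",
        "Tyr", "Cys", "His", "Gln", "Asn", "Lys", "Asp", "Glu", "Met", "Trp")

# All ordered pairs, self pairs included. expand.grid() varies its first
# argument fastest, so y cycles within each x: Ser-Ser, Ser-Arg, ...
all_pairs <- expand.grid(y = aa, x = aa, stringsAsFactors = FALSE)[, c("x", "y")]

# 400 ordered pairs
nrow(all_pairs)

# Unordered pairs of distinct amino acids: choose(20, 2) of them
unique_pairs <- as.data.frame(t(combn(aa, 2)), stringsAsFactors = FALSE)
names(unique_pairs) <- c("x", "y")

# 190 unique pairs
nrow(unique_pairs)

# First rows: Ser-Arg, Ser-Leu, Ser-Pro
head(unique_pairs, 3)
```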