diff --git a/DESCRIPTION b/DESCRIPTION
index 6d8f337..75ffc47 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -6,7 +6,7 @@ Description: A tool for exploring correlations.
     It makes it possible to easily perform routine tasks when
     exploring correlation matrices such as ignoring the diagonal,
     focusing on the correlations of certain variables against others,
-    or rearranging and visualising the matrix in terms of the
+    or rearranging and visualizing the matrix in terms of the
     strength of the correlations.
 Authors@R:
     c(person(given = "Edgar",
diff --git a/NEWS.md b/NEWS.md
index 96f9cda..5a10f03 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -8,7 +8,7 @@
 
 - Improves `correlate()` for database backed tables
 
-- Fixes compatability issues with `dplyr`
+- Fixes compatibility issues with `dplyr`
 
 # corrr 0.3.2
 
@@ -52,7 +52,7 @@ The `diagonal` argument of `as_matrix` and `as_matrix.cor_df` is now an optional
 
 ## Fixes
 
-- When `legend = TRUE` (now the default setting), `rplot` and `network_plot` generate a single, unlabelled legend referring to the size of the correlations.
+- When `legend = TRUE` (now the default setting), `rplot` and `network_plot` generate a single, unlabeled legend referring to the size of the correlations.
 
 ## Other
 
diff --git a/R/correlate.R b/R/correlate.R
index 1050db4..adafce7 100644
--- a/R/correlate.R
+++ b/R/correlate.R
@@ -10,7 +10,7 @@
 #' \item A tibble (see \code{\link[tibble]{tibble}})
 #' \item An additional class, "cor_df"
 #' \item A "rowname" column
-#' \item Standardised variances (the matrix diagonal) set to missing values by
+#' \item Standardized variances (the matrix diagonal) set to missing values by
 #' default (\code{NA}) so they can be ignored in calculations.
 #' }
 #'
diff --git a/R/output.R b/R/output.R
index e142b92..204354f 100644
--- a/R/output.R
+++ b/R/output.R
@@ -69,10 +69,10 @@ fashion.default <- function(x, decimals = 2, leading_zeros = FALSE, na_print = "
 #'
 #' @param rdf Correlation data frame (see \code{\link{correlate}}) or object
 #' that can be coerced to one (see \code{\link{as_cordf}}).
-#' @param legend Boolean indicating whether a legend mapping the colours to the correlations should be displayed.
+#' @param legend Boolean indicating whether a legend mapping the colors to the correlations should be displayed.
 #' @param shape \code{\link{geom_point}} aesthetic.
 #' @param print_cor Boolean indicating whether the correlations should be printed over the shapes.
-#' @param colours,colors Vector of colours to use for n-colour gradient.
+#' @param colours,colors Vector of colors to use for n-color gradient.
 #' @return Plots a correlation data frame
 #' @export
 #' @examples
@@ -104,13 +104,13 @@ rplot.default <- function(rdf, ...) {
 #'
 #' Output a network plot of a correlation data frame in which variables that are
 #' more highly correlated appear closer together and are joined by stronger
-#' paths. Paths are also coloured by their sign (blue for positive and red for
+#' paths. Paths are also colored by their sign (blue for positive and red for
 #' negative). The proximity of the points are determined using multidimensional
 #' clustering.
 #'
 #' @param min_cor Number from 0 to 1 indicating the minimum value of
 #' correlations (in absolute terms) to plot.
-#' @param colours,colors Vector of colours to use for n-colour gradient.
+#' @param colours,colors Vector of colors to use for n-color gradient.
 #' @param repel Should variable labels repel each other? If TRUE, text is added
 #' via \code{\link[ggrepel]{geom_text_repel}} instead of \code{\link[ggplot2]{geom_text}}
 #' @param curved Should the paths be curved? If TRUE, paths are added via
diff --git a/R/reshape.R b/R/reshape.R
index f84a58a..cbbfb70 100644
--- a/R/reshape.R
+++ b/R/reshape.R
@@ -60,7 +60,7 @@ focus_ <- function(x, ..., .dots, mirror) {
 
 #' Conditionally focus correlation data frame
 #'
-#' Apply a predicate function to each colum of correlations. Columns that
+#' Apply a predicate function to each column of correlations. Columns that
 #' evaluate to TRUE will be included in a call to \code{\link{focus}}.
 #'
 #' @param x Correlation data frame or object to be coerced to one via
@@ -104,7 +104,7 @@ focus_if.default <- function(x, .predicate, ..., mirror = FALSE) {
 #' matrix diagonal) should be dropped? Will automatically be set to TRUE if
 #' mirror is FALSE.
 #' @param remove.dups Removes duplicate entries, without removing all NAs
-#' @return tbl with three colums (x and y variables, and their correlation)
+#' @return tbl with three columns (x and y variables, and their correlation)
 #' @export
 #' @examples
 #' x <- correlate(mtcars)
diff --git a/R/retract.R b/R/retract.R
index 9f52e73..854deda 100644
--- a/R/retract.R
+++ b/R/retract.R
@@ -1,4 +1,4 @@
-#' Creates a data frame from a streched correlation table
+#' Creates a data frame from a stretched correlation table
 #'
 #' \code{retract} does the opposite of what \code{stretch} does
 #'
diff --git a/README.Rmd b/README.Rmd
index 1b6c5f8..59bf458 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -43,7 +43,7 @@ Using `corrr` typically starts with `correlate()`, which acts like the base corr
 
 - A `tbl` with an additional class, `cor_df`
 - An extra "rowname" column
-- Standardised variances (the matrix diagonal) set to missing values (`NA`) so they can be ignored.
+- Standardized variances (the matrix diagonal) set to missing values (`NA`) so they can be ignored.
 
 ### API
 
@@ -59,7 +59,7 @@ Reshape structure (`tbl` or `cor_df` out):
 - `focus()` on select columns and rows.
 - `stretch()` into a long format.
 
-Output/visualisations (console/plot out):
+Output/visualizations (console/plot out):
 
 - `fashion()` the correlations for pretty printing.
 - `rplot()` the correlations with shapes in place of the values.
diff --git a/README.md b/README.md
index ee990d3..f198056 100644
--- a/README.md
+++ b/README.md
@@ -45,7 +45,7 @@ following structure:
 
   - A `tbl` with an additional class, `cor_df`
   - An extra “rowname” column
-  - Standardised variances (the matrix diagonal) set to missing values
+  - Standardized variances (the matrix diagonal) set to missing values
     (`NA`) so they can be ignored.
 
 ### API
@@ -66,7 +66,7 @@ Reshape structure (`tbl` or `cor_df` out):
   - `focus()` on select columns and rows.
   - `stretch()` into a long format.
 
-Output/visualisations (console/plot out):
+Output/visualizations (console/plot out):
 
   - `fashion()` the correlations for pretty printing.
   - `rplot()` the correlations with shapes in place of the values.
diff --git a/cran-comments.md b/cran-comments.md
index a5febe9..3fe7571 100644
--- a/cran-comments.md
+++ b/cran-comments.md
@@ -4,7 +4,7 @@
 * Adds `dice()` function, wraps `focus(x,..., mirror = TRUE)`
 * Adds `retract()` function, opposite of `stretch()`
 * Improves `correlate()` for database backed tables
-* Fixes compatability issues with `dplyr`
+* Fixes compatibility issues with `dplyr`
 
 ## Test environments
 * Local windows 10 install, R 3.6.0
diff --git a/man/correlate.Rd b/man/correlate.Rd
index aa02b2c..97749bb 100644
--- a/man/correlate.Rd
+++ b/man/correlate.Rd
@@ -45,7 +45,7 @@ use of pairwise deletion by default.
   \item A tibble (see \code{\link[tibble]{tibble}})
   \item An additional class, "cor_df"
   \item A "rowname" column
-  \item Standardised variances (the matrix diagonal) set to missing values by
+  \item Standardized variances (the matrix diagonal) set to missing values by
   default (\code{NA}) so they can be ignored in calculations.
 }
 }
diff --git a/man/corrr-package.Rd b/man/corrr-package.Rd
index d9a7584..118a315 100644
--- a/man/corrr-package.Rd
+++ b/man/corrr-package.Rd
@@ -12,7 +12,7 @@ A tool for exploring correlations.
     It makes it possible to easily perform routine tasks when
     exploring correlation matrices such as ignoring the diagonal,
     focusing on the correlations of certain variables against others,
-    or rearranging and visualising the matrix in terms of the
+    or rearranging and visualizing the matrix in terms of the
     strength of the correlations.
 }
 \seealso{
diff --git a/man/focus_if.Rd b/man/focus_if.Rd
index 73ac9e0..4149b9b 100644
--- a/man/focus_if.Rd
+++ b/man/focus_if.Rd
@@ -23,7 +23,7 @@ not.}
 A tibble or, if mirror = TRUE, a correlation data frame.
 }
 \description{
-Apply a predicate function to each colum of correlations. Columns that
+Apply a predicate function to each column of correlations. Columns that
 evaluate to TRUE will be included in a call to \code{\link{focus}}.
 }
 \examples{
diff --git a/man/network_plot.Rd b/man/network_plot.Rd
index 6c4e071..d5132ad 100644
--- a/man/network_plot.Rd
+++ b/man/network_plot.Rd
@@ -15,9 +15,9 @@ that can be coerced to one (see \code{\link{as_cordf}}).}
 \item{min_cor}{Number from 0 to 1 indicating the minimum value of
 correlations (in absolute terms) to plot.}
 
-\item{legend}{Boolean indicating whether a legend mapping the colours to the correlations should be displayed.}
+\item{legend}{Boolean indicating whether a legend mapping the colors to the correlations should be displayed.}
 
-\item{colours, colors}{Vector of colours to use for n-colour gradient.}
+\item{colours, colors}{Vector of colors to use for n-color gradient.}
 
 \item{repel}{Should variable labels repel each other? If TRUE, text is added
 via \code{\link[ggrepel]{geom_text_repel}} instead of \code{\link[ggplot2]{geom_text}}}
@@ -29,7 +29,7 @@ via \code{\link[ggrepel]{geom_text_repel}} instead of \code{\link[ggplot2]{geom_
 \description{
 Output a network plot of a correlation data frame in which variables that are
 more highly correlated appear closer together and are joined by stronger
-paths. Paths are also coloured by their sign (blue for positive and red for
+paths. Paths are also colored by their sign (blue for positive and red for
 negative). The proximity of the points are determined using multidimensional
 clustering.
 }
diff --git a/man/retract.Rd b/man/retract.Rd
index 1c268f7..8fc8c34 100644
--- a/man/retract.Rd
+++ b/man/retract.Rd
@@ -2,7 +2,7 @@
 % Please edit documentation in R/retract.R
 \name{retract}
 \alias{retract}
-\title{Creates a data frame from a streched correlation table}
+\title{Creates a data frame from a stretched correlation table}
 \usage{
 retract(.data, x, y, val)
 }
diff --git a/man/rplot.Rd b/man/rplot.Rd
index 9a5b97f..359ae1e 100644
--- a/man/rplot.Rd
+++ b/man/rplot.Rd
@@ -11,11 +11,11 @@ rplot(rdf, legend = TRUE, shape = 16, colours = c("indianred2",
 \item{rdf}{Correlation data frame (see \code{\link{correlate}}) or object
 that can be coerced to one (see \code{\link{as_cordf}}).}
 
-\item{legend}{Boolean indicating whether a legend mapping the colours to the correlations should be displayed.}
+\item{legend}{Boolean indicating whether a legend mapping the colors to the correlations should be displayed.}
 
 \item{shape}{\code{\link{geom_point}} aesthetic.}
 
-\item{colours, colors}{Vector of colours to use for n-colour gradient.}
+\item{colours, colors}{Vector of colors to use for n-color gradient.}
 
 \item{print_cor}{Boolean indicating whether the correlations should be printed over the shapes.}
 }
diff --git a/man/stretch.Rd b/man/stretch.Rd
index b67126d..a6aaff8 100644
--- a/man/stretch.Rd
+++ b/man/stretch.Rd
@@ -16,7 +16,7 @@ mirror is FALSE.}
 \item{remove.dups}{Removes duplicate entries, without removing all NAs}
 }
 \value{
-tbl with three colums (x and y variables, and their correlation)
+tbl with three columns (x and y variables, and their correlation)
 }
 \description{
 \code{stretch} is a specified implementation of tidyr::gather() to be applied
diff --git a/tools/readme/combination-1.png b/tools/readme/combination-1.png
index 6c63b64..78dd4b8 100644
Binary files a/tools/readme/combination-1.png and b/tools/readme/combination-1.png differ
diff --git a/tools/readme/combination-2.png b/tools/readme/combination-2.png
index e458ce8..1391f46 100644
Binary files a/tools/readme/combination-2.png and b/tools/readme/combination-2.png differ
diff --git a/vignettes/using-corrr.Rmd b/vignettes/using-corrr.Rmd
index 24a9dd6..174c718 100644
--- a/vignettes/using-corrr.Rmd
+++ b/vignettes/using-corrr.Rmd
@@ -15,7 +15,7 @@ library(corrr)
 knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
 ```
 
-corrr is a package for exploring **corr**elations in **R**. It makes it possible to easily perform routine tasks when exploring correlation matrices such as ignoring the diagonal, focusing on the correlations of certain variables against others, or rearranging and visualising the matrix in terms of the strength of the correlations.
+corrr is a package for exploring **corr**elations in **R**. It makes it possible to easily perform routine tasks when exploring correlation matrices such as ignoring the diagonal, focusing on the correlations of certain variables against others, or rearranging and visualizing the matrix in terms of the strength of the correlations.
 
 ## Using corrr
 
@@ -23,7 +23,7 @@ Using `corrr` starts with `correlate()`, which acts like the base correlation fu
 
 - A `tbl` with an additional class, `cor_df`
 - An extra "rowname" column
-- Standardised variances (the matrix diagonal) set to missing values (`NA`) so they can be ignored.
+- Standardized variances (the matrix diagonal) set to missing values (`NA`) so they can be ignored.
 
 To work with further, let's create a correlation data frame using `correlate()` from the `mtcars` data that comes with R:
 
@@ -35,7 +35,7 @@ d
 
 ## Why a correlation data frame?
 
-At first, a correlation data frame might seem like an unneccessary complexity compared to the traditional matrix. However, the purpose of corrr is to help use explore these correlations, not to do mathematical or statistical operations. Thus, by having the correlations in a data frame, we can make use of packages that help us work with data frames like `dplyr`, `tidyr`, `ggplot2`, and focus on using data pipelines. Lets look at some examples:
+At first, a correlation data frame might seem like an unnecessary complexity compared to the traditional matrix. However, the purpose of corrr is to help use explore these correlations, not to do mathematical or statistical operations. Thus, by having the correlations in a data frame, we can make use of packages that help us work with data frames like `dplyr`, `tidyr`, `ggplot2`, and focus on using data pipelines. Lets look at some examples:
 
 ```{r, message=F, warning=F}
 library(dplyr)
@@ -53,7 +53,7 @@ d %>%
   select(rowname, mpg, cyl, disp)
 ```
 
-Furthermore, by having the diagonal set to missing, we don't need to put in extra effort to ignore them when summarising the correlations. For example:
+Furthermore, by having the diagonal set to missing, we don't need to put in extra effort to ignore them when summarizing the correlations. For example:
 
 ```{r, warning = FALSE, message = FALSE}
 # Compute mean of each column
@@ -77,7 +77,7 @@ Reshape structure (`tbl` or `cor_df` out):
 - `focus()` on select columns and rows.
 - `stretch()` into a long format.
 
-Output/visualisations (console/plot out):
+Output/visualizations (console/plot out):
 
 - `fashion()` the correlations for pretty printing.
 - `rplot()` a shape for each correlation.
@@ -100,7 +100,7 @@ d %>%
   fashion() # Print in nice format
 ```
 
-Alternatively, we can visualise these correlations (let's clear the lower triangle for a change):
+Alternatively, we can visualize these correlations (let's clear the lower triangle for a change):
 
 ```{r, warning = FALSE}
 d %>%
diff --git a/vignettes/using-corrr.html b/vignettes/using-corrr.html
index 03f6c9e..2f473f8 100644
--- a/vignettes/using-corrr.html
+++ b/vignettes/using-corrr.html
@@ -12,7 +12,7 @@
-
+
 Using corrr
@@ -37,7 +37,7 @@
 pre.numberSource a.sourceLine
   { position: relative; left: -4em; }
 pre.numberSource a.sourceLine::before
-  { content: attr(title);
+  { content: attr(data-line-number);
     position: relative; left: -1em; text-align: right; vertical-align: baseline;
     border: none; pointer-events: all; display: inline-block;
     -webkit-touch-callout: none; -webkit-user-select: none;
@@ -303,94 +303,94 @@

Using corrr

Simon Jackson

-2019-04-19
+2019-07-12

-corrr is a package for exploring correlations in R. It makes it possible to easily perform routine tasks when exploring correlation matrices such as ignoring the diagonal, focusing on the correlations of certain variables against others, or rearranging and visualising the matrix in terms of the strength of the correlations.
+corrr is a package for exploring correlations in R. It makes it possible to easily perform routine tasks when exploring correlation matrices such as ignoring the diagonal, focusing on the correlations of certain variables against others, or rearranging and visualizing the matrix in terms of the strength of the correlations.

Using corrr

Using corrr starts with correlate(), which acts like the base correlation function cor(). It differs by defaulting to pairwise deletion, and returning a correlation data frame (cor_df) of the following structure:

To work with further, let’s create a correlation data frame using correlate() from the mtcars data that comes with R:

-library(corrr)
-d <- correlate(mtcars, quiet = TRUE)
-d
-#> # A tibble: 11 x 12
-#>    rowname     mpg     cyl    disp      hp     drat      wt     qsec
-#>    <chr>     <dbl>   <dbl>   <dbl>   <dbl>    <dbl>   <dbl>    <dbl>
-#>  1 mpg      NA      -0.852  -0.848  -0.776   0.681   -0.868   0.419 
-#>  2 cyl      -0.852  NA       0.902   0.832  -0.700    0.782  -0.591 
-#>  3 disp     -0.848   0.902  NA       0.791  -0.710    0.888  -0.434 
-#>  4 hp       -0.776   0.832   0.791  NA      -0.449    0.659  -0.708 
-#>  5 drat      0.681  -0.700  -0.710  -0.449  NA       -0.712   0.0912
-#>  6 wt       -0.868   0.782   0.888   0.659  -0.712   NA      -0.175 
-#>  7 qsec      0.419  -0.591  -0.434  -0.708   0.0912  -0.175  NA     
-#>  8 vs        0.664  -0.811  -0.710  -0.723   0.440   -0.555   0.745 
-#>  9 am        0.600  -0.523  -0.591  -0.243   0.713   -0.692  -0.230 
-#> 10 gear      0.480  -0.493  -0.556  -0.126   0.700   -0.583  -0.213 
-#> 11 carb     -0.551   0.527   0.395   0.750  -0.0908   0.428  -0.656 
-#> # ... with 4 more variables: vs <dbl>, am <dbl>, gear <dbl>, carb <dbl>
+library(corrr)
+d <- correlate(mtcars, quiet = TRUE)
+d
+#> # A tibble: 11 x 12
+#>    rowname    mpg    cyl   disp     hp    drat     wt    qsec     vs
+#>    <chr>    <dbl>  <dbl>  <dbl>  <dbl>   <dbl>  <dbl>   <dbl>  <dbl>
+#>  1 mpg     NA     -0.852 -0.848 -0.776  0.681  -0.868  0.419   0.664
+#>  2 cyl     -0.852 NA      0.902  0.832 -0.700   0.782 -0.591  -0.811
+#>  3 disp    -0.848  0.902 NA      0.791 -0.710   0.888 -0.434  -0.710
+#>  4 hp      -0.776  0.832  0.791 NA     -0.449   0.659 -0.708  -0.723
+#>  5 drat     0.681 -0.700 -0.710 -0.449 NA      -0.712  0.0912  0.440
+#>  6 wt      -0.868  0.782  0.888  0.659 -0.712  NA     -0.175  -0.555
+#>  7 qsec     0.419 -0.591 -0.434 -0.708  0.0912 -0.175 NA       0.745
+#>  8 vs       0.664 -0.811 -0.710 -0.723  0.440  -0.555  0.745  NA    
+#>  9 am       0.600 -0.523 -0.591 -0.243  0.713  -0.692 -0.230   0.168
+#> 10 gear     0.480 -0.493 -0.556 -0.126  0.700  -0.583 -0.213   0.206
+#> 11 carb    -0.551  0.527  0.395  0.750 -0.0908  0.428 -0.656  -0.570
+#> # … with 3 more variables: am <dbl>, gear <dbl>, carb <dbl>
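
The structure described above (pairwise deletion, a "rowname" column, and a diagonal set to `NA`) can be approximated in base R. This is only a sketch under that assumption, not how `correlate()` is implemented; the names `m` and `cor_tbl` are made up for the illustration, and the result is a plain data frame rather than a `cor_df`.

```r
# Base-R approximation of the cor_df layout (not corrr's internals).
# Pairwise-complete correlations, which cor() computes on request.
m <- cor(mtcars, use = "pairwise.complete.obs")

# Blank out the diagonal so later summaries skip it.
diag(m) <- NA

# Move the row names into an explicit "rowname" column.
cor_tbl <- data.frame(rowname = rownames(m), m, row.names = NULL)

# Peek at the first few rows and columns.
head(cor_tbl[, 1:4], 3)
```

The sketch only mirrors the layout of the printed tibble; none of the corrr verbs are expected to work on it without coercion.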

Why a correlation data frame?

-At first, a correlation data frame might seem like an unneccessary complexity compared to the traditional matrix. However, the purpose of corrr is to help use explore these correlations, not to do mathematical or statistical operations. Thus, by having the correlations in a data frame, we can make use of packages that help us work with data frames like dplyr, tidyr, ggplot2, and focus on using data pipelines. Lets look at some examples:
-library(dplyr)
-
-# Filter rows to occasions in which cyl has a correlation of .7 or more with
-# another variable.
-d %>% filter(cyl > .7)
-#> # A tibble: 3 x 12
-#>   rowname    mpg   cyl   disp     hp   drat     wt   qsec     vs     am
-#>   <chr>    <dbl> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
-#> 1 disp    -0.848 0.902 NA      0.791 -0.710  0.888 -0.434 -0.710 -0.591
-#> 2 hp      -0.776 0.832  0.791 NA     -0.449  0.659 -0.708 -0.723 -0.243
-#> 3 wt      -0.868 0.782  0.888  0.659 -0.712 NA     -0.175 -0.555 -0.692
-#> # ... with 2 more variables: gear <dbl>, carb <dbl>
-
-# Select the mpg, cyl and disp columns (and rowname)
-d %>% select(rowname, mpg, cyl, disp)
-#> # A tibble: 11 x 4
-#>    rowname     mpg     cyl    disp
-#>    <chr>     <dbl>   <dbl>   <dbl>
-#>  1 mpg      NA      -0.852  -0.848
-#>  2 cyl      -0.852  NA       0.902
-#>  3 disp     -0.848   0.902  NA    
-#>  4 hp       -0.776   0.832   0.791
-#>  5 drat      0.681  -0.700  -0.710
-#>  6 wt       -0.868   0.782   0.888
-#>  7 qsec      0.419  -0.591  -0.434
-#>  8 vs        0.664  -0.811  -0.710
-#>  9 am        0.600  -0.523  -0.591
-#> 10 gear      0.480  -0.493  -0.556
-#> 11 carb     -0.551   0.527   0.395
-
-# Combine above in a single pipeline
-d %>%
-  filter(cyl > .7) %>% 
-  select(rowname, mpg, cyl, disp)
-#> # A tibble: 3 x 4
-#>   rowname    mpg   cyl   disp
-#>   <chr>    <dbl> <dbl>  <dbl>
-#> 1 disp    -0.848 0.902 NA    
-#> 2 hp      -0.776 0.832  0.791
-#> 3 wt      -0.868 0.782  0.888
-Furthermore, by having the diagonal set to missing, we don’t need to put in extra effort to ignore them when summarising the correlations. For example:
-# Compute mean of each column
-library(purrr)
-d %>% 
-  select(-rowname) %>% 
-  map_dbl(~ mean(., na.rm = TRUE))
-#>           mpg           cyl          disp            hp          drat 
-#> -0.1050454113 -0.0925483176 -0.0872737071  0.0006800268 -0.0037165212 
-#>            wt          qsec            vs            am          gear 
-#> -0.0828684293 -0.1752247305 -0.1145625942  0.0053087327  0.0484120552 
-#>          carb 
-#>  0.0563419513
+At first, a correlation data frame might seem like an unnecessary complexity compared to the traditional matrix. However, the purpose of corrr is to help use explore these correlations, not to do mathematical or statistical operations. Thus, by having the correlations in a data frame, we can make use of packages that help us work with data frames like dplyr, tidyr, ggplot2, and focus on using data pipelines. Lets look at some examples:
+library(dplyr)
+
+# Filter rows to occasions in which cyl has a correlation of .7 or more with
+# another variable.
+d %>% filter(cyl > .7)
+#> # A tibble: 3 x 12
+#>   rowname    mpg   cyl   disp     hp   drat     wt   qsec     vs     am
+#>   <chr>    <dbl> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
+#> 1 disp    -0.848 0.902 NA      0.791 -0.710  0.888 -0.434 -0.710 -0.591
+#> 2 hp      -0.776 0.832  0.791 NA     -0.449  0.659 -0.708 -0.723 -0.243
+#> 3 wt      -0.868 0.782  0.888  0.659 -0.712 NA     -0.175 -0.555 -0.692
+#> # … with 2 more variables: gear <dbl>, carb <dbl>
+
+# Select the mpg, cyl and disp columns (and rowname)
+d %>% select(rowname, mpg, cyl, disp)
+#> # A tibble: 11 x 4
+#>    rowname    mpg    cyl   disp
+#>    <chr>    <dbl>  <dbl>  <dbl>
+#>  1 mpg     NA     -0.852 -0.848
+#>  2 cyl     -0.852 NA      0.902
+#>  3 disp    -0.848  0.902 NA    
+#>  4 hp      -0.776  0.832  0.791
+#>  5 drat     0.681 -0.700 -0.710
+#>  6 wt      -0.868  0.782  0.888
+#>  7 qsec     0.419 -0.591 -0.434
+#>  8 vs       0.664 -0.811 -0.710
+#>  9 am       0.600 -0.523 -0.591
+#> 10 gear     0.480 -0.493 -0.556
+#> 11 carb    -0.551  0.527  0.395
+
+# Combine above in a single pipeline
+d %>%
+  filter(cyl > .7) %>% 
+  select(rowname, mpg, cyl, disp)
+#> # A tibble: 3 x 4
+#>   rowname    mpg   cyl   disp
+#>   <chr>    <dbl> <dbl>  <dbl>
+#> 1 disp    -0.848 0.902 NA    
+#> 2 hp      -0.776 0.832  0.791
+#> 3 wt      -0.868 0.782  0.888
+Furthermore, by having the diagonal set to missing, we don’t need to put in extra effort to ignore them when summarizing the correlations. For example:
+# Compute mean of each column
+library(purrr)
+d %>% 
+  select(-rowname) %>% 
+  map_dbl(~ mean(., na.rm = TRUE))
+#>           mpg           cyl          disp            hp          drat 
+#> -0.1050454113 -0.0925483176 -0.0872737071  0.0006800268 -0.0037165212 
+#>            wt          qsec            vs            am          gear 
+#> -0.0828684293 -0.1752247305 -0.1145625942  0.0053087327  0.0484120552 
+#>          carb 
+#>  0.0563419513
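
To see why the missing diagonal matters for the column means above, the same summary can be compared on a plain `cor()` matrix with and without its diagonal. This is a base-R sketch for illustration only; `mean_with_diag` and `mean_without_diag` are hypothetical names, and corrr itself is not used here.

```r
# Plain correlation matrix: the diagonal holds 1s (each variable with itself).
m <- cor(mtcars)

# Column means including the diagonal are pulled toward 1.
mean_with_diag <- colMeans(m)

# Setting the diagonal to NA lets na.rm = TRUE drop it from the summary.
diag(m) <- NA
mean_without_diag <- colMeans(m, na.rm = TRUE)

# Compare both versions for a few variables.
round(cbind(mean_with_diag, mean_without_diag)[c("mpg", "cyl", "hp"), ], 4)
```

With a matrix, the diagonal has to be removed by hand before each summary; in the correlation data frame it is already missing.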

API

As the above section suggests, the corrr API is designed with data pipelines in mind (e.g., to use %>% from the magrittr package). After correlate(), the primary corrr functions take a cor_df as their first argument, and return a cor_df or tbl (or output like a plot). These functions serve one of three purposes:

@@ -404,7 +404,7 @@

API

  • focus() on select columns and rows.
  • stretch() into a long format.
-Output/visualisations (console/plot out):
+Output/visualizations (console/plot out):

    By combing these functions in data pipelines, it’s possible to easily explore your correlations.

    For example, lets focus on the correlations of mpg and cyl with all the others:


    Or maybe we want to focus in on a few variables (mirrored in rows too) and print the correlations without an upper triangle and fashioned to look nice:

-Alternatively, we can visualise these correlations (let’s clear the lower triangle for a change):
+Alternatively, we can visualize these correlations (let’s clear the lower triangle for a change):

    Perhaps we’d like to rearrange the correlations so that the plot becomes easier to interpret. In this case, we can add rearrange() into our pipeline before shaving one of the triangles (we’ll take correlation sign into account with absolute = FALSE).


diff --git a/vignettes/using-corrr_files/figure-html/unnamed-chunk-6-1.png b/vignettes/using-corrr_files/figure-html/unnamed-chunk-6-1.png
deleted file mode 100644
index d485898..0000000
Binary files a/vignettes/using-corrr_files/figure-html/unnamed-chunk-6-1.png and /dev/null differ
diff --git a/vignettes/using-corrr_files/figure-html/unnamed-chunk-7-1.png b/vignettes/using-corrr_files/figure-html/unnamed-chunk-7-1.png
deleted file mode 100644
index 5b6d9f8..0000000
Binary files a/vignettes/using-corrr_files/figure-html/unnamed-chunk-7-1.png and /dev/null differ