peeler

Peeler implements the bvstep algorithm from Clarke and Warwick 1998 and uses it to ‘peel’ a dataset to find structural redundancy. It also provides a way to randomly start bvstep many times to assess consistency and avoid local optima.

Installation

You can install the development version of peeler from GitHub with:

# install.packages("devtools")
devtools::install_github("galenholt/peeler")

Example

library(peeler)
library(vegan)
#> Loading required package: permute
#> Loading required package: lattice
#> This is vegan 2.6-4
data(varespec)

You can use bvstep alone for a single realisation.

bvout <- bvstep(
  ref_mat = varespec, comp_mat = varespec,
  ref_dist = "bray", comp_dist = "bray",
  rand_start = TRUE, nrand = 5
)

bvout
#> # A tibble: 10 × 5
#>     step FB    num_vars  corr species                                           
#>    <dbl> <chr>    <int> <dbl> <chr>                                             
#>  1     1 <NA>         5 0.532 Cladarbu, Cladcerv, Cladunci, Pleuschr, Rhodtome  
#>  2     2 B            4 0.532 Cladarbu, Cladunci, Pleuschr, Rhodtome            
#>  3     3 F            5 0.677 Cladarbu, Cladstel, Cladunci, Pleuschr, Rhodtome  
#>  4     4 F            6 0.771 Cladarbu, Cladrang, Cladstel, Cladunci, Pleuschr,…
#>  5     5 F            7 0.838 Cladarbu, Cladrang, Cladstel, Cladunci, Pleuschr,…
#>  6     6 F            8 0.868 Cladarbu, Cladrang, Cladstel, Cladunci, Empenigr,…
#>  7     7 F            9 0.896 Cladarbu, Cladrang, Cladstel, Cladunci, Dicrfusc,…
#>  8     8 F           10 0.916 Cladarbu, Cladrang, Cladstel, Cladunci, Dicrfusc,…
#>  9     9 F           11 0.935 Cladarbu, Cladrang, Cladstel, Cladunci, Dicrfusc,…
#> 10    10 F           12 0.951 Callvulg, Cladarbu, Cladrang, Cladstel, Cladunci,…

While it is always possible to get a correlation of 1 when ref_mat and comp_mat are the same, it is sometimes the case that local optima will cause the algorithm to cut off before rho_threshold is reached. If you want to always get at least one result, use a negative min_delta_rho (ideally min_delta_rho = -Inf). We force it to happen here by setting rho to 1.

bvout_force <- bvstep(
  ref_mat = varespec, comp_mat = varespec,
  ref_dist = "bray", comp_dist = "bray",
  rand_start = TRUE, nrand = 5,
  rho_threshold = 1,
  min_delta_rho = -Inf
)

bvout_force
#> # A tibble: 45 × 5
#>     step FB    num_vars  corr species                                           
#>    <dbl> <chr>    <int> <dbl> <chr>                                             
#>  1     1 <NA>         5 0.415 Cladgrac, Cladphyl, Cladrang, Dicrfusc, Polycomm  
#>  2     2 F            6 0.611 Cladgrac, Cladphyl, Cladrang, Dicrfusc, Pleuschr,…
#>  3     3 B            5 0.612 Cladgrac, Cladphyl, Cladrang, Dicrfusc, Pleuschr  
#>  4     4 F            6 0.746 Cladgrac, Cladphyl, Cladrang, Cladstel, Dicrfusc,…
#>  5     5 B            5 0.746 Cladgrac, Cladrang, Cladstel, Dicrfusc, Pleuschr  
#>  6     6 F            6 0.788 Cladgrac, Cladrang, Cladstel, Dicrfusc, Pleuschr,…
#>  7     7 F            7 0.846 Cladarbu, Cladgrac, Cladrang, Cladstel, Dicrfusc,…
#>  8     8 B            6 0.847 Cladarbu, Cladrang, Cladstel, Dicrfusc, Pleuschr,…
#>  9     9 F            7 0.880 Cladarbu, Cladrang, Cladstel, Dicrfusc, Empenigr,…
#> 10    10 F            8 0.900 Cladarbu, Cladrang, Cladstel, Dicrfusc, Empenigr,…
#> # ℹ 35 more rows

If the two matrices are not identical (e.g. species and environment), this will keep adding columns until either rho_threshold is met or they are all included. Note that in this case, it is not guaranteed that this is the highest correlation possible when the two matrices differ.

Random starts to explore the space

The bv_multi function runs bvstep for a number of random starts to avoid local optima, here with 5 species to start, iterated 10 times. We can set the num_best_results to set how many results to return. Here, ‘best’ above rho_threshold is determined first by minimum species and then correlation, while below rho_threshold it is determined first by correlation and then number of species. This is in keeping with the idea of finding the fewest species to meet the threshold. To return all steps, simply set num_best_results to the same value as num_restarts.

bv_m <- bv_multi(
  ref_mat = varespec, comp_mat = varespec,
  ref_dist = "bray", comp_dist = "bray",
  rho_threshold = 0.95,
  return_type = "final",
  rand_start = TRUE, nrand = 5, num_restarts = 10
)

bv_m
#> # A tibble: 5 × 7
#>   random_start  step FB    num_vars  corr species                  num_tied_with
#>   <chr>        <dbl> <chr>    <int> <dbl> <chr>                            <int>
#> 1 10              16 B           12 0.955 Callvulg, Cladarbu, Cla…             6
#> 2 3               16 F           12 0.955 Callvulg, Cladarbu, Cla…             6
#> 3 6               18 B           12 0.955 Callvulg, Cladarbu, Cla…             6
#> 4 5               18 F           12 0.955 Callvulg, Cladarbu, Cla…             6
#> 5 9               12 F           12 0.955 Callvulg, Cladarbu, Cla…             6

The default return_type = 'final' gives the best outcome of each random start, for the best num_best_results. If we want the full steps of each of the num_random_starts, set return_type = 'steps'. This can also be returned as a list, if returndf = FALSE.

bv_steps <- bv_multi(
  ref_mat = varespec, comp_mat = varespec,
  ref_dist = "bray", comp_dist = "bray",
  rho_threshold = 0.95,
  return_type = "steps",
  rand_start = TRUE, nrand = 5, num_restarts = 10
)
bv_steps
#> # A tibble: 86 × 6
#>    random_start  step FB    num_vars  corr species                              
#>    <chr>        <dbl> <chr>    <int> <dbl> <chr>                                
#>  1 4                1 <NA>         5 0.342 Cladamau, Cladarbu, Claddefo, Empeni…
#>  2 4                2 F            6 0.583 Cladamau, Cladarbu, Claddefo, Cladst…
#>  3 4                3 F            7 0.712 Cladamau, Cladarbu, Claddefo, Cladra…
#>  4 4                4 B            6 0.712 Cladamau, Cladarbu, Cladrang, Cladst…
#>  5 4                5 B            5 0.712 Cladarbu, Cladrang, Cladstel, Empeni…
#>  6 4                6 F            6 0.809 Cladarbu, Cladrang, Cladstel, Empeni…
#>  7 4                7 F            7 0.854 Cladarbu, Cladrang, Cladstel, Empeni…
#>  8 4                8 B            6 0.855 Cladarbu, Cladrang, Cladstel, Empeni…
#>  9 4                9 F            7 0.880 Cladarbu, Cladrang, Cladstel, Dicrfu…
#> 10 4               10 F            8 0.900 Cladarbu, Cladrang, Cladstel, Dicrfu…
#> # ℹ 76 more rows

With return_type = 'unique', we return the best num_best_results from all steps in all random starts. The first line of this should match the first line of return_type = 'final', since that is the best result overall. After that, they may differ as the penultimate set from a particular random start might be better than the final of some others.

bv_unique <- bv_multi(
  ref_mat = varespec, comp_mat = varespec,
  ref_dist = "bray", comp_dist = "bray",
  rho_threshold = 0.95,
  return_type = "unique",
  rand_start = TRUE, nrand = 5, num_restarts = 10
)
bv_unique
#> # A tibble: 5 × 7
#>   random_start  step FB    num_vars  corr species                  num_tied_with
#>   <chr>        <dbl> <chr>    <int> <dbl> <chr>                            <int>
#> 1 3               16 F           12 0.955 Callvulg, Cladarbu, Cla…             6
#> 2 7               16 F           12 0.955 Callvulg, Cladarbu, Cla…             6
#> 3 2               14 F           12 0.955 Callvulg, Cladarbu, Cla…             6
#> 4 5               18 F           12 0.955 Callvulg, Cladarbu, Cla…             6
#> 5 6               18 F           12 0.955 Callvulg, Cladarbu, Cla…             6

Peels

The peel function runs bv_multi iteratively, removing the best set each time.

peels <- peel(
  ref_mat = varespec,
  comp_mat = varespec,
  nrand = 6,
  num_restarts = 10,
  corr_method = "spearman"
)
peels
#> # A tibble: 11 × 7
#>     peel  step FB    num_vars     corr species                     num_tied_with
#>    <dbl> <dbl> <chr>    <int>    <dbl> <chr>                               <int>
#>  1     1    12 F            5  0.962   Cladarbu, Cladrang, Cladst…             5
#>  2     2    11 F           12  0.687   Cladbotr, Cladchlo, Cladde…             1
#>  3     3    12 F            7  0.399   Betupube, Cetrisla, Cladam…             1
#>  4     4    16 F            7  0.320   Barbhatc, Cladcris, Descfl…             3
#>  5     5     5 F            4  0.238   Callvulg, Cladphyl, Nephar…             2
#>  6     6     4 B            3  0.169   Cetreric, Cladcocc, Cladco…             2
#>  7     7     5 B            2  0.121   Empenigr, Vacculig                     10
#>  8     8     4 B            1  0.0200  Pinusylv                               10
#>  9     9     3 B            1 -0.00753 Diphcomp                               10
#> 10    10     2 B            1 -0.00943 Cladfimb                               10
#> 11    11     1 <NA>         1 -0.0964  Polyjuni                               10

There are a number of user-defineable options in each of those functions, see their documentation.

THere are also two potentially useful helper functions, extract_final, which gets the last step in bvstep output, and extract_names. The bvstep output has names in a single string with comma-separated species names, and we often want them as a character vector. The extract_names function parses this.

best_bv <- extract_names(bvout, step = "last")
best_bv
#>  [1] "Callvulg" "Cladarbu" "Cladrang" "Cladstel" "Cladunci" "Dicrfusc"
#>  [7] "Dicrsp"   "Empenigr" "Pleuschr" "Rhodtome" "Vaccmyrt" "Vaccviti"

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
.github		.github
R		R
man		man
renv		renv
tests		tests
.Rbuildignore		.Rbuildignore
.Rprofile		.Rprofile
.gitignore		.gitignore
CITATION.cff		CITATION.cff
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
LICENSE.md		LICENSE.md
NAMESPACE		NAMESPACE
README.Rmd		README.Rmd
README.md		README.md
codecov.yml		codecov.yml
peeler.Rproj		peeler.Rproj
renv.lock		renv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Repository files navigation

peeler

Installation

Example

Random starts to explore the space

Peels

About

Licenses found

Releases

Packages

Languages

License

Licenses found

galenholt/peeler

Folders and files

Latest commit

History

Repository files navigation

peeler

Installation

Example

Random starts to explore the space

Peels

About

Resources

License

Licenses found

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages