diff --git a/.github/workflows/R-CMD-check.yaml b/.github/workflows/R-CMD-check.yaml
index 9cfbc7f2..a0ee130e 100644
--- a/.github/workflows/R-CMD-check.yaml
+++ b/.github/workflows/R-CMD-check.yaml
@@ -18,6 +18,7 @@ jobs:
fail-fast: false
matrix:
config:
+ - {os: macos-latest, r: 'release'}
- {os: windows-latest, r: 'release'}
- {os: ubuntu-latest, r: 'release'}
- {os: ubuntu-latest, r: 'oldrel-1'}
diff --git a/NEWS.md b/NEWS.md
index 624ae906..a1dce0ec 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -12,7 +12,7 @@ streamlined the package.
* Write a replacement function for `stringr::word` that is much faster.
* Additional speed up and accuracy of fuzzy_match function by
- Restricting reference list to names with the same first letter as input string.
- - Switch from using `utils::adist` to `stringdist:stringdist(method = "dl")`
+ - Switch from using `utils::adist` to `stringdist::stringdist(method = "dl")`
* Rework `standardise_names` to remove punctuation from the start of the string
* Rework `strip_names_extra` (previously `strip_names_2`) to just perform
additional functions to `strip_names`, rather than repeating those performed by `strip_names`.
diff --git a/R/align_taxa.R b/R/align_taxa.R
index 5d36616e..a261eb95 100644
--- a/R/align_taxa.R
+++ b/R/align_taxa.R
@@ -22,7 +22,7 @@
#' synonyms, orthographic variants) over fuzzy matches.
#' - It prioritises matches to taxa in the APC over names in the APNI.
#' - It identifies string patterns in input names that suggest a name can only
-#' be aligned to a genus (hybrids that are not in the APC/ANI; graded species;
+#' be aligned to a genus (hybrids that are not in the APC/APNI; graded species;
#' taxa not identified to species), and indicates these names only have a
#' genus-rank match.
#'
diff --git a/R/update_taxonomy.R b/R/update_taxonomy.R
index 80012ab3..ebc0b053 100644
--- a/R/update_taxonomy.R
+++ b/R/update_taxonomy.R
@@ -18,7 +18,7 @@
#' Notes:
#' - As the input for this function is a table with 5 columns (output by
#' align_taxa), this function will only be used when you explicitly want to
-#' separate the aligment and updating components of APCalign. This function is
+#' separate the alignment and updating components of APCalign. This function is
#' the second half of create_taxonomic_update_lookup.
#'
#' @family taxonomic alignment functions
diff --git a/README.Rmd b/README.Rmd
index 182928d5..424c2b0b 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -25,35 +25,23 @@ library(APCalign)
# APCalign
-'APCalign' uses the [Australian Plant Census (APC)](https://biodiversity.org.au/nsl/services/search/taxonomy) and [Australian Plant Name Index](https://biodiversity.org.au/nsl/services/search/names) to align and update Australian plant taxon name strings. 'APCalign' also supplies information about
-the established status (native/introduced) of plant taxa across different states/territories.
+`APCalign` uses the [Australian Plant Census (APC)](https://biodiversity.org.au/nsl/services/search/taxonomy) and [Australian Plant Name Index](https://biodiversity.org.au/nsl/services/search/names) to align and update Australian plant taxon name strings. 'APCalign' also supplies information about
+the established status (native/introduced) of plant taxa across different states/territories. It's useful for updating species lists and intersecting them with the APC consensus understanding of established status (native/introduced).
## Installation
-For Windows and Linux:
-
```{r install, eval= FALSE}
-# install.packages("remotes")
-# remotes::install_github("traitecoevo/APCalign", dependencies = TRUE, upgrade = "ask")
-
-```
-
-for MacOS there is currently an extra line needed to install a working binary of the `arrow` dependency from r-universe instead of CRAN:
-
-```{r install_mac, eval= FALSE}
-
-# install.packages("arrow", repos = c('https://apache.r-universe.dev', 'https://cloud.r-project.org'))
-# remotes::install_github("traitecoevo/APCalign", dependencies = TRUE, upgrade = "ask")
-
+ install.packages("remotes")
+ remotes::install_github("traitecoevo/APCalign")
+
```
-
## A quick demo
Generating a look-up table can be done with just one function:
-```{r}
+```{r, message=FALSE}
library(APCalign)
@@ -68,7 +56,7 @@ create_taxonomic_update_lookup(
if you're going to use APCalign more than once, it will save you time to load the taxonomic resources into memory first:
-```{r}
+```{r, message=FALSE}
tax_resources <- load_taxonomic_resources()
@@ -83,7 +71,7 @@ create_taxonomic_update_lookup(
)
```
-Checking for Australian natives:
+Checking a list of species to see whether they are classified as Australian natives:
```{r, message=FALSE}
@@ -96,12 +84,12 @@ We also developed a shiny application for non-R users to update and align their
## Learn more
-Highly recommend looking at our [Getting Started](https://traitecoevo.github.io/APCalign/articles/APCalign.html) vignette to learn about how to use 'APCalign'. You can also learn more about our [taxa matching algorithm](https://traitecoevo.github.io/APCalign/articles/updating-taxon-names.html).
+Highly recommend looking at our [Getting Started](https://traitecoevo.github.io/APCalign/articles/APCalign.html) vignette to learn about how to use `APCalign`. You can also learn more about our [taxa matching algorithm](https://traitecoevo.github.io/APCalign/articles/updating-taxon-names.html).
## Found a bug?
-Did you come across an unexpected taxon name change? Elusive error you can't debug - [submit an issue](https://github.com/traitecoevo/APCalign/issues) and we will try our best to help
+Did you come across an unexpected taxon name change? Elusive error you can't debug - [submit an issue](https://github.com/traitecoevo/APCalign/issues) and we will try our best to help.
## Comments and contributions
diff --git a/README.md b/README.md
index 90a13d13..3164276a 100644
--- a/README.md
+++ b/README.md
@@ -10,31 +10,23 @@ coverage](https://codecov.io/gh/traitecoevo/APCalign/branch/master/graph/badge.s
# APCalign
-‘APCalign’ uses the [Australian Plant Census
+`APCalign` uses the [Australian Plant Census
(APC)](https://biodiversity.org.au/nsl/services/search/taxonomy) and
[Australian Plant Name
Index](https://biodiversity.org.au/nsl/services/search/names) to align
and update Australian plant taxon name strings. ‘APCalign’ also supplies
information about the established status (native/introduced) of plant
-taxa across different states/territories.
+taxa across different states/territories. It’s useful for updating
+species lists and intersecting them with the APC consensus understanding
+of established status (native/introduced).
## Installation
-For Windows and Linux:
-
``` r
-# install.packages("remotes")
-# remotes::install_github("traitecoevo/APCalign", dependencies = TRUE, upgrade = "ask")
-```
-
-for MacOS there is currently an extra line needed to install a working
-binary of the `arrow` dependency from r-universe instead of CRAN:
-
-``` r
-
-# install.packages("arrow", repos = c('https://apache.r-universe.dev', 'https://cloud.r-project.org'))
-# remotes::install_github("traitecoevo/APCalign", dependencies = TRUE, upgrade = "ask")
+ install.packages("remotes")
+ remotes::install_github("traitecoevo/APCalign")
+
```
## A quick demo
@@ -52,58 +44,49 @@ create_taxonomic_update_lookup(
"Commersonia rosea"
)
)
-#> Checking alignments of 3 taxa
+#> ================================================================================================================================================================
+#> # A tibble: 3 × 12
+#> original_name aligned_name accepted_name suggested_name genus taxon_rank
+#>
+#> 1 Banksia integrifol… Banksia int… Banksia inte… Banksia integ… Bank… species
+#> 2 Acacia longifolia Acacia long… Acacia longi… Acacia longif… Acac… species
+#> 3 Commersonia rosea Commersonia… Androcalva r… Androcalva ro… Andr… species
+#> # ℹ 6 more variables: taxonomic_dataset , taxonomic_status ,
+#> # scientific_name , aligned_reason , update_reason ,
+#> # number_of_collapsed_taxa
```
- #> Loading resources into memory...
- #> ================================================================================================================================================================
- #> ...done
- #> -> of these 2 names have a perfect match to a scientific name in the APC. Alignments being sought for remaining names.
- #> # A tibble: 3 × 12
- #> original_name aligned_name accepted_name suggested_name genus taxon_rank
- #>
- #> 1 Banksia integrifol… Banksia int… Banksia inte… Banksia integ… Bank… species
- #> 2 Acacia longifolia Acacia long… Acacia longi… Acacia longif… Acac… species
- #> 3 Commersonia rosea Commersonia… Androcalva r… Androcalva ro… Andr… species
- #> # ℹ 6 more variables: taxonomic_dataset , taxonomic_status ,
- #> # scientific_name , aligned_reason , update_reason ,
- #> # number_of_collapsed_taxa
-
if you’re going to use APCalign more than once, it will save you time to
load the taxonomic resources into memory first:
``` r
tax_resources <- load_taxonomic_resources()
+#> ================================================================================================================================================================
+
+create_taxonomic_update_lookup(
+ taxa = c(
+ "Banksia integrifolia",
+ "Acacia longifolia",
+ "Commersonia rosea",
+ "not a species"
+ ),
+ resources = tax_resources
+)
+#> # A tibble: 4 × 12
+#> original_name aligned_name accepted_name suggested_name genus taxon_rank
+#>
+#> 1 Banksia integrifol… Banksia int… Banksia inte… Banksia integ… Bank… species
+#> 2 Acacia longifolia Acacia long… Acacia longi… Acacia longif… Acac… species
+#> 3 Commersonia rosea Commersonia… Androcalva r… Androcalva ro… Andr… species
+#> 4 not a species
+#> # ℹ 6 more variables: taxonomic_dataset , taxonomic_status ,
+#> # scientific_name , aligned_reason , update_reason ,
+#> # number_of_collapsed_taxa
```
- #> Loading resources into memory...
- #> ================================================================================================================================================================
- #> ...done
-
- create_taxonomic_update_lookup(
- taxa = c(
- "Banksia integrifolia",
- "Acacia longifolia",
- "Commersonia rosea",
- "not a species"
- ),
- resources = tax_resources
- )
- #> Checking alignments of 4 taxa
- #> -> of these 2 names have a perfect match to a scientific name in the APC. Alignments being sought for remaining names.
- #> # A tibble: 4 × 12
- #> original_name aligned_name accepted_name suggested_name genus taxon_rank
- #>
- #> 1 Banksia integrifol… Banksia int… Banksia inte… Banksia integ… Bank… species
- #> 2 Acacia longifolia Acacia long… Acacia longi… Acacia longif… Acac… species
- #> 3 Commersonia rosea Commersonia… Androcalva r… Androcalva ro… Andr… species
- #> 4 not a species
- #> # ℹ 6 more variables: taxonomic_dataset , taxonomic_status ,
- #> # scientific_name , aligned_reason , update_reason ,
- #> # number_of_collapsed_taxa
-
-Checking for Australian natives:
+Checking a list of species to see whether they are classified as
+Australian natives:
``` r
@@ -125,7 +108,7 @@ align their taxonomic names. You can find the application here:
Highly recommend looking at our [Getting
Started](https://traitecoevo.github.io/APCalign/articles/APCalign.html)
-vignette to learn about how to use ‘APCalign’. You can also learn more
+vignette to learn about how to use `APCalign`. You can also learn more
about our [taxa matching
algorithm](https://traitecoevo.github.io/APCalign/articles/updating-taxon-names.html).
@@ -134,7 +117,7 @@ algorithm](https://traitecoevo.github.io/APCalign/articles/updating-taxon-names.
Did you come across an unexpected taxon name change? Elusive error you
can’t debug - [submit an
issue](https://github.com/traitecoevo/APCalign/issues) and we will try
-our best to help
+our best to help.
## Comments and contributions
diff --git a/man/align_taxa.Rd b/man/align_taxa.Rd
index c16a3240..7ac0aa32 100644
--- a/man/align_taxa.Rd
+++ b/man/align_taxa.Rd
@@ -158,7 +158,7 @@ patterns, prioritising exact matches (to accepted names as well as
synonyms, orthographic variants) over fuzzy matches.
\item It prioritises matches to taxa in the APC over names in the APNI.
\item It identifies string patterns in input names that suggest a name can only
-be aligned to a genus (hybrids that are not in the APC/ANI; graded species;
+be aligned to a genus (hybrids that are not in the APC/APNI; graded species;
taxa not identified to species), and indicates these names only have a
genus-rank match.
}
diff --git a/man/update_taxonomy.Rd b/man/update_taxonomy.Rd
index cf9804c6..44a2f01d 100644
--- a/man/update_taxonomy.Rd
+++ b/man/update_taxonomy.Rd
@@ -107,7 +107,7 @@ Notes:
\itemize{
\item As the input for this function is a table with 5 columns (output by
align_taxa), this function will only be used when you explicitly want to
-separate the aligment and updating components of APCalign. This function is
+separate the alignment and updating components of APCalign. This function is
the second half of create_taxonomic_update_lookup.
}
}
diff --git a/vignettes/APCalign.Rmd b/vignettes/APCalign.Rmd
index e0c5e2c3..3de7c826 100644
--- a/vignettes/APCalign.Rmd
+++ b/vignettes/APCalign.Rmd
@@ -9,7 +9,7 @@ vignette: >
-When working with biodiversity data, it is important to verify taxonomic names with an authoritative list and correct any out-of-date names. The 'APCalign' package simplifies this process by:
+When working with biodiversity data, it is important to verify taxonomic names with an authoritative list and correct any out-of-date names. The `APCalign` package simplifies this process by:
- Accessing up-to-date taxonomic information from the [Australian Plant Census](https://biodiversity.org.au/nsl/services/search/taxonomy) and the [Australia Plant Name Index](https://biodiversity.org.au/nsl/services/search/names).
- Aligning authoritative names to your taxonomic names using our [fuzzy matching algorithm](https://traitecoevo.github.io/APCalign/articles/updating-taxon-names.html)
@@ -17,18 +17,13 @@ When working with biodiversity data, it is important to verify taxonomic names
## Installation
-'APCalign' is currently not on CRAN. You can install its current developmental version using
-
-
-
```r
-# install.packages("remotes")
+install.packages("remotes")
remotes::install_github("traitecoevo/APCalign")
-
library(APCalign)
```
-To demonstrate how to use 'APCalign', we will use an example dataset `gbif_lite` which is documented in `?gbif_lite`
+To demonstrate how to use `APCalign`, we will use an example dataset `gbif_lite` which is documented in `?gbif_lite`
@@ -52,14 +47,14 @@ gbif_lite |> print(n = 6)
## Retrieve taxonomic resources
-The first step is to retrieve the entire APC and APNI name databases and store them locally as taxonomic resources. We achieve this using `load_taxonomic_resources()`.
+The first step is to retrieve the entire APC and APNI name databases and store them locally as taxonomic resources. We achieve this using `load_taxonomic_resources()`. The resources are stored as compressed parquet files to speed up downloading and local loading.
There are two versions of the databases that you can retrieve with the `stable_or_current_data` argument. Calling:
- `stable` will retrieve the most recent, archived version of the databases from our [GitHub releases](https://github.com/traitecoevo/APCalign/releases). This is set as the default option.
- `current` will retrieve the up-to-date databases directly from the APC and APNI website.
-Note that the databases are quite large so the initial retrieval of `stable` versions will take a few minutes. Once the taxonomic resources have been stored locally, subsequent retrievals will take less time. Retrieving `current` resources will always take longer since it is accessing the latest information from the website. Check out our [Resource Caching](https://traitecoevo.github.io/APCalign/articles/caching.html) article to learn more about how the APC and APNIC databases are accessed, stored and retrieved.
+Note that the databases are reasonably large, so the initial retrieval of the core data will take a few minutes. Once the taxonomic resources have been stored locally, subsequent retrievals will take less time. Retrieving `current` resources will always take longer, since it accesses the latest information from the website in an uncompressed format.
```r
diff --git a/vignettes/articles/reproducibility.Rmd b/vignettes/articles/reproducibility.Rmd
index 0dbb4c37..4090266e 100644
--- a/vignettes/articles/reproducibility.Rmd
+++ b/vignettes/articles/reproducibility.Rmd
@@ -61,7 +61,7 @@ default_version()
Then copying and pasting the output into `load_taxonomic_resources()` directly. This way makes the version of taxonomic resources more explicit in your code.
-To ensure the specific version of taxonomic resources is availabe for subsequent functions make sure to assign them to an object:
+To ensure the specific version of taxonomic resources is available for subsequent functions, make sure to assign it to an object:
```{r}
resources_0.0.4.9000 <- load_taxonomic_resources(