A comprehensive and straightforward workflow for standardizing, integrating, and cleaning biodiversity data
Handling biodiversity data from several different sources is not an easy task. Here we present the Biodiversity Data Cleaning (BDC) workflow, an automated workflow to address data quality issues and improve the fitness-for-use of biodiversity data. The workflow harmonizes and integrates data from different sources following common standards and protocols, and implements various tests and tools to flag, document, clean, and correct the taxonomic, spatial, and temporal information of biodiversity data.
The workflow is composed of five core steps:
- Standardization and integration of different datasets;
- Pre-filter: flagging and removal of invalid or non-interpretable information, followed by data amendments (e.g., correcting transposed coordinates and standardizing country names);
- Taxonomy: cleaning, parsing, and standardization of scientific names against multiple taxonomic references. The workflow corrects spelling errors and converts nomenclatural synonyms to currently accepted names;
- Space: flagging of erroneous, suspicious, and low-precision geographic coordinates;
- Time: flagging and, whenever possible, correction of inconsistent collection dates.
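To make the idea of "flagging" concrete, the sketch below shows how a few pre-filter and time tests could be expressed in base R. It is not the bdc API: the function `flag_records()`, the flag columns (`.name_ok`, `.coords_ok`, `.year_ok`, `.summary`), and the `year_threshold` default are hypothetical, and only the input column names follow Darwin Core terms. Each test adds a logical column (TRUE = passed) instead of deleting rows, so records can be inspected before anything is removed.

```r
# Hypothetical sketch, NOT the bdc package API: flag records with base R.
# Input columns follow Darwin Core terms; flag column names are assumptions.
flag_records <- function(df, year_threshold = 1900) {
  current_year <- as.integer(format(Sys.Date(), "%Y"))

  # Pre-filter: scientific name must be present and non-blank
  df$.name_ok <- !is.na(df$scientificName) &
    nzchar(trimws(df$scientificName))

  # Pre-filter: coordinates must be present and within valid ranges
  df$.coords_ok <- !is.na(df$decimalLatitude) & !is.na(df$decimalLongitude) &
    abs(df$decimalLatitude) <= 90 & abs(df$decimalLongitude) <= 180

  # Time: year must be present and within [year_threshold, current year]
  df$.year_ok <- !is.na(df$year) &
    df$year >= year_threshold & df$year <= current_year

  # Summary flag: TRUE only when every test passed
  df$.summary <- df$.name_ok & df$.coords_ok & df$.year_ok
  df
}

# Toy input for illustration only
occ <- data.frame(
  scientificName   = c("Puma concolor", NA, "Panthera onca"),
  decimalLatitude  = c(-15.8, -10.2, 95.0),
  decimalLongitude = c(-47.9, -50.1, -60.0),
  year             = c(2005, 1850, 2010)
)
res <- flag_records(occ)
res$.summary  # TRUE FALSE FALSE
```

Keeping flags as columns, rather than filtering immediately, mirrors the workflow's emphasis on documenting which records failed which test.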
To facilitate the documentation, visualization, and interpretation of the data-quality test results, several files documenting the results of every step are saved automatically in a folder named “Output”. These files include i) records needing further inspection, ii) databases containing the results of each step, iii) figures, and iv) data-quality reports documenting the results.
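As an illustration of item i), the following base-R sketch writes the records that failed a given test to a CSV file inside an "Output" folder. The function name `save_for_inspection()`, the `Check` subfolder, and the file-naming scheme are assumptions made for this example and do not describe how bdc itself organizes its output.

```r
# Hypothetical sketch: save records that failed one test for manual review.
# Folder layout and file names are assumptions, not bdc's actual output.
save_for_inspection <- function(df, flag_col, step, out_dir = "Output") {
  check_dir <- file.path(out_dir, "Check")
  dir.create(check_dir, recursive = TRUE, showWarnings = FALSE)

  # which() keeps rows where the flag is FALSE and silently drops NA flags
  flagged <- df[which(!df[[flag_col]]), , drop = FALSE]

  # e.g. "prefilter_coords_ok.csv" (leading dot of the flag name removed)
  file_name <- paste0(step, "_", sub("^\\.", "", flag_col), ".csv")
  path <- file.path(check_dir, file_name)
  utils::write.csv(flagged, path, row.names = FALSE)
  invisible(path)
}

# Example run, writing into a temporary directory
demo <- data.frame(id = 1:3, .coords_ok = c(TRUE, FALSE, TRUE))
out <- save_for_inspection(demo, ".coords_ok", "prefilter", out_dir = tempdir())
nrow(utils::read.csv(out))  # 1
```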
You can install the released version of “bdc” from GitHub with:
```r
if (!require("remotes")) install.packages("remotes")
if (!require("bdc")) remotes::install_github("brunobrr/bdc")
```
See the bdc package website (https://brunobrr.github.io/bdc/) for a detailed explanation of each step of the workflow.
If you encounter a clear bug, please file an issue on GitHub. For questions or suggestions, please send us an email.