DOI License: CC BY 4.0

IberianBees database v.1.0.0 🐝

This is a repository to document the distribution and diversity of bee species of the Iberian Peninsula. You can see a summary of the data here.

How to contribute:

If you have data on Iberian bee occurrences, fill in this template and send it to [email protected]

How to use this repo

  • The IberianBees database is in Data/iberian_bees.csv.gz. This is a gzip-compressed CSV file (not a zip archive): decompress it with any gzip-capable tool, or read it directly in R as shown in the example below.

  • Metadata can be consulted here.

  • Records whose names are not accepted in the Iberian bee species masterlist have been excluded from the final dataset but can be found in Data/Processing_iberian_bees_raw/removed.csv.

  • If you spot a problem, please let @ibartomeus know by creating an issue that includes the unique identifier (uid) of the record that needs to be fixed. This avoids duplicated effort.

  • If you are curious about the process, keep reading.


To build this database, we follow a reproducible workflow to clean and assemble the data.

1- Use Scripts/1_1_Fetch_data.R to update data from the internet (i.e. GBIF, iNaturalist).

2- Add new datasets (i.e. csv files) locally to Data/Rawdata/csvs/.

3- Process and clean each individual file and assign unique identifiers, using the scripts in the folder Scripts/1_2_Processing_raw_data/.

4- Run Scripts/2_Run_all-Merge_all.R. This runs all the individual scripts in Scripts/1_2_Processing_raw_data/ and binds the data. To merge the data without re-running every script, run only the second section of the code, "2 Merge all files".

5- Conduct a final cleaning (things that weren't fixed in the individual files in step 3). This is done in Scripts/3_1_Final_cleaning.R and generates the final dataset Data/iberian_bees.csv.gz.

5.1- Records with non-accepted species names are excluded and saved in Data/Processing_iberian_bees_raw/removed.csv.

5.2- The non-accepted species names (e.g., synonyms) are checked manually from Data/Processing_iberian_bees_raw/to_check.csv and added to Data/Processing_iberian_bees_raw/manual_checks.csv once they have been reviewed, with taxonomic advice when necessary. After running Scripts/3_1_Final_cleaning.R again, the fixed species will be included in the final IberianBees dataset.
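
As a rough illustration of steps 5.1 and 5.2, the sketch below applies a manually reviewed name fix to a toy table and then splits it into kept and removed records. It is not the code in Scripts/3_1_Final_cleaning.R. The toy records, the `masterlist` vector, and the `manual_checks` columns `original_name` and `fixed_name` are assumptions made for this example, as is treating `uid` as a column name. Only `Accepted_name` appears as a column elsewhere in this README.

```r
# Toy records (hypothetical); the real dataset has many more columns.
records <- data.frame(
  uid = c("a1", "a2", "a3", "a4"),
  Accepted_name = c("Xylocopa violacea", "Xylocopa violacae",
                    "Apis mellifera", "Unknown sp."),
  stringsAsFactors = FALSE
)

# Hypothetical masterlist of accepted names.
masterlist <- c("Xylocopa violacea", "Apis mellifera")

# Hypothetical manual review table: a rejected name and its reviewed fix.
manual_checks <- data.frame(
  original_name = "Xylocopa violacae",
  fixed_name = "Xylocopa violacea",
  stringsAsFactors = FALSE
)

# Step 5.2 (sketch): replace names that have a reviewed fix.
idx <- match(records$Accepted_name, manual_checks$original_name)
has_fix <- !is.na(idx)
records$Accepted_name[has_fix] <- manual_checks$fixed_name[idx[has_fix]]

# Step 5.1 (sketch): keep accepted names and set the rest aside.
accepted <- records$Accepted_name %in% masterlist
kept <- records[accepted, ]
removed <- records[!accepted, ]
```

In this toy case the misspelled record `a2` is recovered by the manual fix, while `a4` has no accepted name and no fix, so it ends up in `removed`.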

Metadata is generated using DataSpice.
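
The scripted steps of the workflow above (1, 4 and 5) could be chained from a single R session, for instance as sketched below. The script paths are the ones listed in the steps. The helper `run_workflow()` is hypothetical and not part of the repository, and the sketch assumes the scripts can be run with the repository root as the working directory, which this README does not confirm.

```r
# Scripts named in the workflow steps, in the order they are run.
# The step 3 scripts are run by 2_Run_all-Merge_all.R, so they are not listed.
workflow_scripts <- c(
  "Scripts/1_1_Fetch_data.R",
  "Scripts/2_Run_all-Merge_all.R",
  "Scripts/3_1_Final_cleaning.R"
)

# Hypothetical helper: source each script found under `root`, in order,
# stopping with an error at the first script that is missing.
run_workflow <- function(scripts, root = ".") {
  for (script in scripts) {
    path <- file.path(root, script)
    if (!file.exists(path)) {
      stop("Script not found: ", path)
    }
    message("Running ", path)
    source(path)
  }
  invisible(scripts)
}

# Usage (assumes the working directory is the repository root):
# run_workflow(workflow_scripts)
```

Stopping at the first missing script avoids running the final cleaning on data that was never merged.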


Here, we provide an example of how to select, filter and plot the distribution of the species Xylocopa violacea, using records from after 1999.

  • First, read compressed data in gzip format:
# R reads gzip-compressed files directly, so no manual decompression is needed
data <- read.table("../Data/iberian_bees.csv.gz",
                   header = TRUE, quote = "\"", sep = ",", row.names = 1)
  • Second, select records of X. violacea after 1999
library(dplyr) #Library to filter data
xylocopa <- data %>% filter(Accepted_name == "Xylocopa violacea" & Year > 1999)
  • Finally, load map and plot records:
library(ggplot2) # Provides map_data() and the plotting functions
# Load the world map (map_data("world") needs the maps package installed)
world <- map_data("world")
#Plot records and adjust map to the Iberian Peninsula
ggplot(data = xylocopa, aes(Longitude, Latitude)) +
geom_map(data = world, map = world,
aes(long, lat, map_id = region), color = "white", fill = "grey", size = 0.1) +
coord_sf(xlim = c(-9, 4), ylim = c(36, 44)) +