-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathMapping_GBIFdata.Rmd
248 lines (192 loc) · 8.78 KB
/
Mapping_GBIFdata.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
---
title: "Accessing GBIF data"
author: "Denisse Fierro Arcos"
date: "2022-05-05"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Mapping biological data from GBIF
Now that we can search and download data from GBIF using the `rgbif` package, we can use it to create maps. This way we can see the distribution of a particular species across space. Even better, we will make this an interactive map using the `leaflet` library.
Before we continue, it is important to remember that in order to create maps, all observations must be georeferenced. That is, all observations must include coordinates (latitude and longitude) of the place where the observation was obtained.
## Load relevant libraries
As mentioned before, we will be using `leaflet` to create our interactive maps, and we will using the `tidyverse` library to manipulate our data, just as we did in the previous notebook.
```{r libaries, warning = F, message = F}
library(tidyverse)
library(leaflet)
```
## Loading data
In the previous notebook we saved the dataset we downloaded, so now we will load this file.
```{r load_data, warning = F, message = F}
slewini_aus <- read_csv("Data/Slewini_Australia.csv")
```
## Cleaning data
We will create a new column with the sum of observations per row as we did in the previous notebook, but this time we will save the result in a new variable.
```{r}
slewini_map <- slewini_aus %>%
#We will change NA values in both columns to 0
mutate(individualCount = replace_na(individualCount, 0),
organismQuantity = replace_na(organismQuantity, 0),
#Then we will create a new column with the sum of these two columns
count = rowSums(across(individualCount:organismQuantity))) %>%
#Now we will change any rows with zero values to one because we selected
#to include presence data only in our search. This means that each row
#represents one individual
mutate(count = case_when(count == 0 ~ 1,
T ~ count)) %>%
#Let's select only the columns we are interested in plotting
select(count, decimalLatitude, decimalLongitude) %>%
#We will rename the latitude and longitude columns so leaflet can
#recognise them as coordinates
rename(latitude = decimalLatitude, longitude = decimalLongitude) %>%
#Finally, if observations have the same latitude and longitude, we will
#add up their counts
group_by(latitude, longitude) %>%
summarise(n = sum(count))
```
### Checking for issues in our data
Before we continue, a word of caution. For the purposes of this workshop, we are assuming that all information in our dataset is accurate. However, there may be imprecisions in this data, so GBIF has included an `issue` column explaining any known issues with the data.
Let's look at the unique issues reported in our dataset.
```{r issues}
slewini_aus %>%
#We select the "issue" column
select(issue) %>%
#Only include unique entries
distinct()
```
Some of these issues are straightforward to understand, but others not so much. We can get a full explanation of each issue using the `gbif_issues_lookup()` function.
```{r issue_check}
rgbif::gbif_issues_lookup(issue = "GEODETIC_DATUM_ASSUMED_WGS84") %>%
#This function prints the content of the description column
pull(description)
```
We could even check how many observations have issues reported in our dataset.
```{r issues_obs}
slewini_aus %>%
#We drop any rows with no information in the issue column
drop_na(issue) %>%
#Finally we count them
count()
```
There are issues reported for most entries in our dataset, but should this be a cause of concern? That would depend on the issue reported and our tolerance for imprecision. Now you have some tools available to help you make a decision.
## Mapping data
As explained before, we will ignore any reported issues in our data and assume that they are all valid.
```{r map_test}
#Let's create a smaller sample of our data to test our map
slewini_map %>%
slice_sample(prop = .10) %>%
leaflet() %>%
#We add a base map
addTiles() %>%
#We add markers to our observations
addMarkers(~longitude, ~latitude,
#We show some information in the pop up. We will include
#the counts, but we must transform them to strings first
popup = as.character(slewini_map$n))
```
Our map is looking good, so we can good ahead and plot all observations.
```{r map}
#Let's create a smaller sample of our data to test our map
leaflet(slewini_map) %>%
#We add a base map
addTiles() %>%
#We add markers to our observations
addMarkers(~longitude, ~latitude,
#We show some information in the pop up. We will include
#the counts, but we must transform them to strings first
popup = as.character(slewini_map$n))
```
It seems like we have an error, there is an observation that clearly does not belong to Australia. We need to refer to our full dataset and identify the observations with the smallest longitude values.
```{r}
slewini_aus %>%
#We filter rows with the smallest longitude value
filter(decimalLongitude == min(decimalLongitude)) %>%
#We select only a few columns
select(decimalLatitude, decimalLongitude, issue)
```
We can then check what these issues are telling us. It seems the first issue may be relevant to us.
```{r}
rgbif::gbif_issues_lookup("COUNTRY_COORDINATE_MISMATCH") %>%
pull(description)
```
We have confirmed that there is a known issue with these observations, so we will remove them from our map data and plot them again.
```{r}
slewini_map %>%
filter(longitude != min(.$longitude)) %>%
leaflet() %>%
#We add a base map
addTiles() %>%
#We add markers to our observations
addMarkers(~longitude, ~latitude,
#We show some information in the pop up. We will include
#the counts, but we must transform them to strings first
popup = as.character(slewini_map$n))
```
### Mapping clustered data
We can also make map showing clusters where sharks are observed most often. To do this, we must first expand our map data so each row represents a single observation. First, let's find out how data observations we have in our dataset.
```{r}
sum(slewini_map$n)
```
This is not what we expected. We should have whole numbers only. Let's look into our main dataset.
```{r}
slewini_aus %>%
filter(individualCount %% 1 != 0 | organismQuantity %% 1 != 0) %>%
select(individualCount, organismQuantity, organismQuantityType)
```
They were definitely meant to be individuals, so they were recorded incorrectly. We will remove these from our dataset.
```{r}
new_slewini_map <- slewini_aus %>%
#we remove observations outside Australian waters
filter(decimalLongitude != min(decimalLongitude)) %>%
#We will change NA values in both columns to 0
mutate(individualCount = replace_na(individualCount, 0),
organismQuantity = replace_na(organismQuantity, 0),
#Then we will create a new column with the sum of these two columns
count = rowSums(across(individualCount:organismQuantity))) %>%
#Now we will change any rows with zero values to one because we selected
#to include presence data only in our search. This means that each row
#represents one individual
mutate(count = case_when(count == 0 ~ 1,
T ~ count)) %>%
#we remove the rows with decimal values included as counts
filter(count %% 1 == 0) %>%
#Let's select only the columns we are interested in plotting
select(count, decimalLatitude, decimalLongitude) %>%
#We will rename the latitude and longitude columns so leaflet can
#recognise them as coordinates
rename(latitude = decimalLatitude, longitude = decimalLongitude)
```
Now we can check how many observations we have recorded.
```{r}
sum(new_slewini_map$count)
```
Now let's expand our dataset and ensure that our number of rows matches the number of observations.
```{r}
new_slewini_map <- new_slewini_map %>%
uncount(count)
```
Great! Now we have the data ready for calculating clusters and mapping them.
```{r}
#We start up the map as before
map_slewini <- new_slewini_map %>%
mutate(num = 1) %>%
leaflet() %>%
#Add a base layer
addTiles() %>%
#We will use circles instead of markers
addCircleMarkers(~longitude, ~latitude, popup = as.character(slewini_map$n),
#We set the radius and opacity of our circles
radius = 1, fillOpacity = 0.5,
#We calculate the clusters
clusterOptions = markerClusterOptions())
map_slewini
```
Once we are happy with the result, we can save the map. Just make sure that you have saved the map to a variable.
```{r}
#Check if Map folder exists, otherwise create a new one
if(!dir.exists("Map")){
dir.create("Map")}
#Save map
htmlwidgets::saveWidget(map_slewini, file = "Map/Slewini_Australia.html")
```