---
title: "Data simulation and randomization tests"
author: "James S. Santangelo"
---
## Lesson preamble
> ### Learning objectives
>
> - Learn to simulate small-scale data in R.
> - Understand the basic logic underlying randomization tests and how they differ from
>   standard parametric statistical tests.
> - Apply randomization tests to real ecological community data.
>
> ### Lesson outline
>
> Total lesson time: 2 hours
>
> - Brief intro to randomization tests (10 min)
> - Simulating small-scale data in R (30 mins)
> - General procedure for conducting a randomization test (10 mins)
> - Simulate data and perform t-test (10 mins)
> - Perform randomization test on same simulated data (30 mins)
> - Perform randomization test on real ecological data (30 mins)
>
> ### Setup
>
> - `install.packages('dplyr')` (or `tidyverse`)
> - `install.packages('ggplot2')` (or `tidyverse`)
> - `install.packages("reshape2")`
> - `install.packages("EcoSimR")`
## Introduction to randomization tests
As we saw in previous weeks, standard parametric statistical tests (e.g. ANOVA,
standard linear regression) have numerous assumptions that should be respected
prior to drawing conclusions from their results. Violation of these assumptions
can lead to erroneous conclusions about the population(s) you are trying to
understand. Mixed-effects models provided us with a few ways of overcoming the
limitations of standard models, while still using all of our data and remaining
in a statistical framework that we generally understand and are comfortable
with. Randomization tests provide yet another alternative solution to parametric
tests.
Many ecological and evolutionary questions ask: **Is the observed pattern
different from what we would expect by random chance?** In other words, the null
hypothesis is randomness. Randomization tests (sometimes called permutation
tests) allow us to test whether the observed data are different from a random
distribution generated by reordering our observed data. If the pattern is
random, then it should be just as likely as any other pattern generated by
reordering the data. If it is not random, then it should occur more or less
frequently than we expect under this distribution.
Randomization tests do not require that we make the same assumptions about our
data as standard parametric tests. We do not require that samples be randomly
drawn from the population since the goal of randomization tests is not to
estimate parameters about the population (parameter estimation requires random
samples to estimate means and variances). Similarly, we do not require normality
of the data nor do we need equal variances among treatment groups since we are
not really thinking about the population from which the data came and are not
testing against a distribution (e.g. _t_ distribution) with known shape. This
will all become more clear when we walk through an example. But first, an aside.
## An aside on simulating data in R
R allows us to generate fake data (i.e. simulate data) rather easily. This is
useful if we just want some toy data to play around with, want to simulate data
similar to the type we will collect in an experiment to test our analyses, or
want to simulate data collection many times to estimate population parameters
(e.g. variance or standard deviation). We can sample existing (i.e. real) data
using the `sample()` function, or take random samples from known statistical
distributions (e.g. Normal distribution). We will return to `sample()` later in
this lesson. For now, let's focus on sampling from known distributions.
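
Before we do, here is a quick illustrative look at `sample()`, which reshuffles
or resamples values we already have (the tiny vector below is made up just for
this example):

```{r}
set.seed(41)
# A small made-up vector of observations
widths <- c(12.1, 14.3, 15.0, 16.2, 13.8)
# Reshuffle the existing values (sampling without replacement)
sample(widths, size = length(widths), replace = FALSE)
# Resample the same values with replacement (each value can appear more than once)
sample(widths, size = length(widths), replace = TRUE)
```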
We'll start by taking random samples from a normal distribution with mean equal
to 0 and standard deviation equal to 1.
```{r message=FALSE, warning=FALSE}
# Load all the packages we'll need
library(tidyverse)
library(reshape2)
library(EcoSimR)
```
```{r message=FALSE, warning=FALSE}
set.seed(42) # Ensure reproducibility of random results.
mean <- 0
sd <- 1
n <- 100
x <- rnorm(n, mean, sd)
# Fancy plot, if desired
ggplot(data.frame(x), aes(x=x)) +
geom_histogram(color="black", fill="white") +
annotate("text", x = -0.30, y = 9, label = round(mean(x), 4)) +
annotate("text", x = -0.30, y = 9.5, label = "Mean") +
ylab("Count") +
geom_vline(xintercept = mean(x), linetype = "dashed",
size = 1, colour = "red") +
theme_classic()
# Get mean
mean(x)
```
Note how the mean from the above distribution (0.0325) is not exactly the same
as the mean we specified (0). This should not be too surprising since we took
*random* samples from a normal distribution and there is likely going to be
variance within these samples. However, if we were to repeat this sampling many
times (e.g. 10,000), each time plotting the mean of the distribution, we would
expect the mean of this new distribution (i.e. the mean of the 10,000 means) to
be much closer to what we specified. Let's try this.
```{r message=FALSE, warning=FALSE}
set.seed(43)
mean <- 0
sd <- 1
size <- 100
x <- replicate(10000, mean(rnorm(size, mean, sd)))
# Fancy plot, if desired
ggplot(data.frame(x), aes(x=x)) +
geom_histogram(color="black", fill="white") +
annotate("text", x = -0.055, y = 1050, label = round(mean(x), 4)) +
annotate("text", x = -0.055, y = 1100, label = "Mean") +
ylab("Count") +
geom_vline(xintercept = mean(x), linetype = "dashed",
size = 1, colour = "red") +
theme_classic()
# Get mean
mean(x)
```
As expected, the above distribution is normal and its mean is much closer to 0
than the distribution generated by sampling only a single time. Interestingly,
the same would be true even if the original samples did not come from a normal
distribution. For those who took introductory stats, you might remember that
this is called the **Central Limit Theorem**, which states that the distribution
of sample means approaches a normal distribution regardless of the shape of the
original population's distribution.
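
To see this for yourself, here is a small optional check (not part of the
original example): we draw the samples from a right-skewed exponential
distribution instead of a normal one, and the distribution of 10,000 sample
means still looks approximately normal.

```{r message=FALSE, warning=FALSE}
set.seed(45)
# Means of 10,000 samples, each of 100 draws from a skewed exponential distribution
x_exp <- replicate(10000, mean(rexp(100, rate = 1)))
ggplot(data.frame(x_exp), aes(x = x_exp)) +
  geom_histogram(colour = "black", fill = "white") +
  xlab("Sample mean") + ylab("Count") +
  theme_classic()
```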
Note that we could also simulate data from a linear model (or any other model),
as the example below shows.
```{r}
set.seed(44)
y_intercept <- 5.1
beta <- 0.23
x <- rnorm(25, mean = 7.6, sd = 0.7)
error <- rnorm(length(x), mean = 0, sd = 0.5) # one error value per observation
y <- y_intercept + beta*x + error
model <- lm(y ~ x)
summary(model)
plot(y ~ x)
abline(model)
```
## Walkthrough of randomization test
Now that we know how to simulate data, let's use this to demonstrate the logic
underlying randomization tests. The basic procedure for a randomization test is
as follows:
1. Calculate the test statistic for the sample data (e.g. difference in means,
t-statistic, etc.). The test statistic is chosen by the researcher.
2. Randomly reshuffle observations among the treatment groups, each time
calculating a test statistic.
3. Repeat step 2 many times, generating a distribution of test statistics.
4. Calculate the proportion of simulated test statistics that are as extreme or
more extreme than the observed test statistic; this is the *P*-value.
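
As a compact sketch of these four steps, here is a small helper function,
`randomization_test()`, that we are introducing only for illustration (it is not
needed for the walkthrough below). It assumes exactly two groups and uses the
difference in group means as its test statistic.

```{r}
# A minimal sketch of the four steps above (two groups, two-tailed test)
randomization_test <- function(values, groups, nreps = 5000) {
  grp <- unique(groups)  # assumes exactly two groups
  # Step 1: observed test statistic (difference in group means)
  obs <- mean(values[groups == grp[1]]) - mean(values[groups == grp[2]])
  # Steps 2-3: reshuffle the values many times, recomputing the statistic
  sims <- replicate(nreps, {
    shuffled <- sample(values)
    mean(shuffled[groups == grp[1]]) - mean(shuffled[groups == grp[2]])
  })
  # Step 4: proportion of simulated statistics at least as extreme as observed
  mean(abs(sims) >= abs(obs))
}
```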
We will perform the above approach on some simulated data. Imagine we had
ventured to South America and collected 10 male and 10 female Hercules beetles.
We brought the beetles back to the lab and measured the width of their bodies at
the largest point. The question we are interested in is: **Do male and female
Hercules beetles differ in body width?** We will assume that body width is
normally distributed (i.e. we will draw from a normal distribution).
```{r message=FALSE, warning=FALSE}
set.seed(46)
# Male data
df_males <- data.frame(
width = rnorm(10, mean=16.5, sd=2.5),
sex = "Male"
)
# Female data
df_females <- data.frame(
width = rnorm(n=10, mean=15, sd=1.8),
sex = "Female"
)
# Let's combine the datasets
df_body_widths <- rbind(df_males, df_females)
# Let's look at the male and female data
ggplot(df_body_widths, aes(x = width, fill = sex)) +
ylab("Count") + xlab("Body width") +
geom_histogram(bins = 30, colour = "black") +
geom_vline(data = filter(df_body_widths, sex == "Male"), aes(xintercept = mean(width)),
size = 1, linetype = "dashed", colour = "red") +
geom_vline(data = filter(df_body_widths, sex == "Female"), aes(xintercept = mean(width)),
size = 1, linetype = "dashed", colour = "blue") +
theme_classic()
```
Males have a mean body width of `r mean(df_males$width)`, while females have a
mean body width of `r mean(df_females$width)`.
We will use a randomization test to test if males and females differ in body
width. If they do **not** differ, then each body width measurement in our
dataframe should be equally likely to belong to a male or to a female. In other
words, body width should be independent of beetle sex. We can simulate this by
randomly rearranging the body width values, effectively removing the effect of
sex on body width, and then calculating a test statistic (e.g. _t_ statistic,
difference in means). We can do this many times (e.g. 5,000) and ask how many
times our observed test statistic falls outside the randomly generated
distribution of test statistics. We will use the difference in mean body width
as our test statistic so let's go ahead and calculate the _observed_ difference
in body width between male and female beetles.
```{r}
mean_males <- mean(df_males$width)
mean_females <- mean(df_females$width)
diff_means_obs <- mean_males - mean_females
```
The observed difference in mean body width between male and female Hercules
beetles is `r diff_means_obs`.
Traditionally, if we wanted to determine if there is a significant difference in
means between two groups, we could use a **t-test**. Let's perform a t-test on
these data so we could compare it to the results of the randomization test that
we will perform. Note the `alternative = "two.sided"` argument passed to
`t.test()`. Since we are interested in whether males and females differ in mean
body width without reference to whether one sex is _specifically_ larger or
smaller than the other, we are performing a two-sided hypothesis test.
Realistically, males could either have larger or smaller body widths so a
two-tailed test is appropriate here.
```{r}
t.test(width ~ sex, data = df_body_widths, alternative = "two.sided")
```
From the t-test above, we see that our t-statistic has a value of 2.49 and that
males and females differ significantly in body width (_P_ < 0.05). From the
means reported in the t-test, it looks like males have larger body widths than
females (17.36 vs. 15.03). Let's now perform a randomization test. To get a
better sense of what we're doing, let's first perform a single reshuffling of
the data and look at the resulting distribution of the data.
```{r message=FALSE, warning=FALSE}
# Set seed for reproducible results
set.seed(47)
# Randomly reshuffle the width column
reshuffled <- df_body_widths
reshuffled$width <- sample(reshuffled$width,
size = nrow(reshuffled), replace = FALSE)
# View data to confirm that the values have been reshuffled
head(df_body_widths)
head(reshuffled)
```
Note the differences in body width values between the original and reshuffled
dataframes above.
```{r message=FALSE, warning=FALSE}
# Plot histograms of reshuffled data
ggplot(reshuffled, aes(x = width, fill = sex)) +
ylab("Count") + xlab("Body width") +
geom_histogram(bins = 30, colour = "black") +
geom_vline(data = filter(reshuffled, sex == "Male"), aes(xintercept = mean(width)),
size = 1, linetype = "dashed", colour = "red") +
geom_vline(data = filter(reshuffled, sex == "Female"),aes(xintercept = mean(width)),
size = 1, linetype = "dashed", colour = "blue") +
theme_classic()
# Let's get the mean difference from this single reshuffling
mean_males_sim1 <- mean(reshuffled[reshuffled$sex == "Male", "width"])
mean_females_sim1 <- mean(reshuffled[reshuffled$sex == "Female", "width"])
mean_diff_sim1 <- mean_males_sim1 - mean_females_sim1
```
The mean difference from a single reshuffling of the data is `r mean_diff_sim1`.
We can see that the mean difference from the simulated data is considerably
different from the observed mean difference. This makes sense since the body
width values were randomly assigned to males and females. Let's now do this
5,000 times, each time calculating the mean difference in body width.
```{r message=FALSE, warning=FALSE}
# Set seed for reproducible results
set.seed(49)
# Initialize a list to store the simulated test-statistics
simulated_means <- list()
nreps <- 5000  # Number of iterations
for (i in 1:nreps) {
  # Create a temporary dataframe to permute so we don't modify the original
  reshuffled <- df_body_widths
  # Permute the width column with the 'sample()' function
  reshuffled$width <- sample(reshuffled$width, size = nrow(reshuffled),
                             replace = FALSE)
  # Calculate the mean width for each sex
  mean_males_sim <- mean(reshuffled %>% filter(sex == "Male") %>% pull(width))
  mean_females_sim <- mean(reshuffled %>% filter(sex == "Female") %>% pull(width))
  # Calculate the difference between simulated male and female mean body widths
  mean_diff_sim <- mean_males_sim - mean_females_sim
  # Append the simulated mean difference to the list
  simulated_means[[i]] <- mean_diff_sim
}
# Unlist simulated means list into numeric vector
simulated_means <- unlist(simulated_means)
# Show first 10 simulated mean differences
simulated_means[1:10]
```
We now have a numeric vector containing 5,000 simulated differences in mean body
width between males and females. Let's plot a histogram of the simulated values
and overlay onto the histogram our observed mean difference.
```{r}
ggplot() +
ylab("Count") + xlab("Simulated mean difference") +
geom_histogram(aes(x = simulated_means), bins = 30,
fill = "grey", alpha = 0.4, colour = "black") +
geom_vline(xintercept = diff_means_obs, size = 1,
linetype = "dashed", colour = "black") +
theme_classic()
```
Finally, to get our *P*-value, we calculate the proportion of simulated mean
differences that are at least as extreme as the observed mean difference from
our data. Because we are performing a two-tailed test, "extreme" means extreme
in **either direction, greater or less than the observed difference**. We can do
this by asking how many times the absolute value of the simulated mean
difference is greater than or equal to the absolute value of the observed mean
difference, and then dividing by the number of permutations.
```{r}
abs_simulated_means <- abs(simulated_means)
abs_diff_means_obs <- abs(diff_means_obs)
exceed_count <- length(abs_simulated_means[abs_simulated_means >=
abs_diff_means_obs])
p_val <- exceed_count / nreps
```
The *P*-value from the randomization test is `r p_val`.
As we can see, the randomization test provides results that are largely
consistent with the t-test. This is not surprising since we specifically sampled
independent data from normal distributions with similar variances. In other
words, our sampled data do not violate any of the assumptions of the t-test and
in such a case the difference in means is directly analogous to the _t_
statistic used in the t-test. There are some cases where randomization tests
provide more accurate (or _exact_) *P*-values than parametric tests (e.g. small
samples from skewed distributions) but even more diverse applications of
randomization tests are found in ecology and evolutionary biology.
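
As an optional illustration of that last point (not part of the original
walkthrough), the sketch below draws two small samples from skewed log-normal
distributions and compares the t-test *P*-value with a randomization *P*-value
computed the same way as above; the exact values will depend on the seed.

```{r message=FALSE, warning=FALSE}
set.seed(48)
# Two small, skewed samples (log-normal), differing in scale
small_a <- rlnorm(8, meanlog = 0, sdlog = 1)
small_b <- rlnorm(8, meanlog = 0.8, sdlog = 1)
obs_diff <- mean(small_a) - mean(small_b)
# Parametric t-test P-value
t.test(small_a, small_b)$p.value
# Randomization test P-value on the same data
pooled <- c(small_a, small_b)
sim_diffs <- replicate(5000, {
  shuffled <- sample(pooled)
  mean(shuffled[1:8]) - mean(shuffled[9:16])
})
mean(abs(sim_diffs) >= abs(obs_diff))
```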
## Randomization tests: A real-world application
Randomization tests are applied in all corners of ecological and evolutionary
research but for our purpose we'll turn to the field of community ecology for an
example. Community ecologists have long been interested in understanding the
factors that shape the organization of ecological communities. Are the species
in an ecological community just a random assortment of species available from
the regional species pool? Conversely, do species interactions and shared
resources determine the local distribution of species, such that some species
are found together more often than we would expect by chance, whereas others
never or only rarely co-occur? The idea that competition and shared resources are
important in driving community assembly is known as _assembly rules_ and was
first proposed by Diamond (1975). An important null model based on randomization
tests was later developed to test what species assemblages would look like in
the absence of competition and shared resources (Connor and Simberloff 1979). We
will apply one such randomization test in this section.
First, let's look at the type of data we need for this analysis.
```{r eval=FALSE, message=FALSE, warning=FALSE}
# Load in example dataset
comMatrix_example <- "https://uoftcoders.github.io/rcourse/data/lec09_CommunityMatrix_Example.csv"
download.file(comMatrix_example, "lec09_CommunityMatrix_Example.csv")
comMatrix_example <- read_csv("lec09_CommunityMatrix_Example.csv")
```
```{r, echo=FALSE, message=FALSE, warning=FALSE}
comMatrix_example <- read_csv("data/lec09_CommunityMatrix_Example.csv")
```
```{r, message=FALSE, warning=FALSE}
head(comMatrix_example)
# Drop the non-numeric species column so that colSums() and rowSums()
# operate on the numeric site columns only
colSums(comMatrix_example[,2:5])
rowSums(comMatrix_example[,2:5])
```
As you can see, the data is a matrix with sites (e.g. islands, quadrats, etc.)
as columns and species as rows. Each cell represents whether a species is
present (1) or absent (0) at that site. This type of data is often referred to
as a _presence-absence matrix_. The column totals (using `colSums()`) give us a
sense of how species richness varies across the sites, while the row totals
(using `rowSums()`) tell us how rare or common a species is across all sites.
The question we are interested in addressing is: **Are the observed species
co-occurrence patterns the result of species interactions or random chance?**
Let's read in some real data before proceeding. The data are from the [National
Ecological Observatory Network](https://www.neonscience.org/) and represent the
presence and absence of plant species in eight square meter plots during the
month of July, 2017, at a single site (Harvard Forest, MA). Note this is
only a subset of the very large dataset provided by NEON, which dates back to
2013 and includes about 46 different sites.
```{r, eval=FALSE, message=FALSE, warning=FALSE}
# Load in NEON data
neon_data <- "https://uoftcoders.github.io/rcourse/data/NEON_PlantPA_HARV_201707.csv"
download.file(neon_data, "NEON_PlantPA_HARV_201707.csv")
neon_data <- read_csv("NEON_PlantPA_HARV_201707.csv")
```
```{r, echo=FALSE, message=FALSE, warning=FALSE}
neon_data <- read_csv("data/NEON_PlantPA_HARV_201707.csv")
```
```{r message=FALSE, warning=FALSE}
glimpse(neon_data)
```
Clearly these data are not in the form that we need them to be. As I'm sure you know
by now, R has many useful libraries and functions that should allow us to change
this relatively easily. We'll rely on the `reshape2` package to transform our
dataset into a presence-absence matrix.
```{r message=FALSE, warning=FALSE}
neon_data_filtered <- neon_data %>%
# We only want rows with plant species and not 'otherVariables'
dplyr::filter(divDataType == "plantSpecies") %>%
# To create a presence-absence matrix, we only need the taxonID
# (i.e. species) and the plotID (i.e. Site)
dplyr::select(plotID, taxonID)
# Create Presence-Absence matrix
# Keep only unique rows so that cells represent presence-absence
# and not abundances
neon_data_filtered <- unique(neon_data_filtered)
PA_matrix <- dcast(neon_data_filtered, formula = taxonID ~ plotID, fun.aggregate = length)
head(PA_matrix)
```
Well that was painless! We now have our data in a form that is usable in our
randomization test but we need to decide two things first: (1) _What is our test
statistic?_ This is what we'll use to determine if co-occurrence patterns are
random or not, similar to how we used the difference in means to tell if male
and female beetles differed in body width. (2) _How do we randomize such a large
matrix?_ Do we randomly permute rows and columns so that each cell is equally
likely to contain a 1 or 0? Do we constrain row permutations so that row totals
(i.e. species rarity) are preserved but site richness is free to vary? Do we
constrain column permutations so that column totals (i.e. site richness) are
preserved but species rarity can vary? It turns out there are nine standard
ways a matrix like this can be randomized, each involving a different
set of constraints and biological assumptions (Gotelli 2000).
For our purpose, we will preserve species rarity but will assume that each site
is equally likely to be colonized. In other words, we are assuming that sites do
not vary in habitat quality or anything else that may change the number of
species that can occupy it. However, we want rare species to remain rare
throughout the permutations. This model is analogous to the _SIM2_ algorithm in
Gotelli (2000) and is reasonable in our case since the sites are relatively
close together (i.e. they likely have similar habitat quality).
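
To make this concrete, here is a small sketch of what a SIM2-style permutation
does: each species' row is shuffled independently across sites, so its row total
is preserved while column totals are free to vary. This toy matrix is made up for
illustration only; the actual permutations are handled by `EcoSimR` below.

```{r}
set.seed(51)
# Toy 3-species x 4-site presence-absence matrix
toy <- matrix(c(1, 1, 0, 0,
                1, 0, 0, 0,
                1, 1, 1, 0),
              nrow = 3, byrow = TRUE)
# SIM2-style permutation: shuffle each row independently
toy_perm <- t(apply(toy, 1, sample))
rowSums(toy)       # species rarity in the original matrix
rowSums(toy_perm)  # preserved after permutation
colSums(toy)       # site richness in the original matrix
colSums(toy_perm)  # free to change
```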
For our index, we will use the **C-score** (Stone and Roberts 1990). We will not
derive this score in detail here, but suffice it to say that the higher the
C-score, the less co-occurrence (on average) between all pairs of species in the
matrix (i.e. a more segregated matrix), whereas smaller C-scores indicate an
aggregated matrix (i.e. more co-occurrence among species pairs). Thankfully,
there is an R package, `EcoSimR`, that implements the randomization algorithm
for us.
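
For intuition only, here is a small hand-rolled sketch of the pairwise
calculation, assuming the Stone and Roberts (1990) definition: for a pair of
species with row totals `r_i` and `r_j` that co-occur in `S` sites, the pair
contributes `(r_i - S) * (r_j - S)`, and the C-score is the average over all
pairs. The function name `c_score_sketch()` is ours, not part of `EcoSimR`, and
`EcoSimR`'s internal implementation may differ in details.

```{r}
# Hand-rolled C-score sketch, for intuition only
c_score_sketch <- function(pa) {
  pa <- as.matrix(pa)                      # species in rows, sites in columns (0/1)
  pairs <- combn(nrow(pa), 2)              # all species pairs
  c_ij <- apply(pairs, 2, function(p) {
    r_i <- sum(pa[p[1], ])                 # row total for species i
    r_j <- sum(pa[p[2], ])                 # row total for species j
    s   <- sum(pa[p[1], ] * pa[p[2], ])    # number of shared sites
    (r_i - s) * (r_j - s)                  # checkerboard units for this pair
  })
  mean(c_ij)                               # C-score = average over all pairs
}
# Drop the taxonID column so only the 0/1 site columns remain
c_score_sketch(PA_matrix[, -1])
```

Now let's go ahead and conduct the full analysis.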
```{r message=FALSE, warning=FALSE}
set.seed(50)
# Perform co-occurrence analysis
co_oc_analysis <- cooc_null_model(PA_matrix, algo = "sim2",
nReps=1000,
metric = "c_score",
suppressProg = TRUE,
saveSeed = TRUE)
# Summarize output from co-occurrence analysis
summary(co_oc_analysis)
# Let's plot the results of the randomization test
plot(co_oc_analysis,type="hist")
```
As we can see, our observed C-score (red line) is much lower (significantly so)
than we would expect by random chance given the species in our community. This
suggests the community is _aggregated_ such that some species are more likely to
occur with other species in the community than we would expect if the
communities were assembling themselves randomly. This can happen if there is
facilitation and some species create ecological niches or favourable habitat for
other species such that they occur more often. Importantly, the C-score is an
average across all pairwise comparisons in the matrix and doesn't tell us
anything about which specific species pairs may be more or less likely to
co-occur. Identifying which species pairs are driving this pattern would require
quite a bit more time, and identifying the ecological mechanisms behind
co-occurrence patterns of specific species pairs requires careful
experimentation.
For the sake of completeness, we can also visualize a single permutation of the
matrix compared against our observed matrix.
```{r}
plot(co_oc_analysis,type="cooc")
```
Notice how all of the rows in the two figures above always have the same number
of filled squares. This is confirmation that the rarity of species (i.e. row
totals) has been preserved during the permutation.
## Other uses of randomization tests
Testing for mean differences among groups and for putative rules governing
community assembly are only two of the many uses for randomization tests. In
this section, I briefly highlight other uses of data randomization in
statistics, ecology, and evolutionary biology.
Randomization tests are additionally used in numerous fields in ecology and
evolutionary biology such as spatial ecology (e.g. testing spatial patterns in
animal distribution and allowing for spatial autocorrelations), phylogenetics
(e.g. permutations of species phylogenies), community ecology, and environmental
monitoring. In addition to randomization tests in the way that we've
demonstrated here, randomization procedures play an important role in other
aspects of biology and statistics.
**Bootstrapping** involves sampling data _with replacement_ to estimate the
sampling distribution of a population estimator (e.g. mean, median, standard
deviation). This is often done to estimate the error (e.g. standard error,
confidence interval) around the estimated population parameter.
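
As a brief sketch, here is what bootstrapping the standard error of a mean might
look like, reusing the male beetle body widths simulated earlier; the number of
resamples and the percentile interval are just illustrative choices.

```{r}
set.seed(52)
# Bootstrap the mean male body width by resampling with replacement
boot_means <- replicate(5000, mean(sample(df_males$width, replace = TRUE)))
sd(boot_means)                         # bootstrap standard error of the mean
quantile(boot_means, c(0.025, 0.975))  # simple 95% percentile confidence interval
```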
**Cross-validation** involves randomly subsampling a dataset and holding out part
of it as a _model validation set_ on which the predictive ability of a model can
be tested. A model is fit to the remaining data (i.e. the _training set_) and
then used to predict the validation set. This is done many times to estimate the
accuracy of the model.
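
A minimal sketch of a single hold-out split is shown below, using data simulated
the same way as the linear-model example earlier in this lesson; in practice the
split would be repeated over many random partitions (or folds).

```{r}
set.seed(53)
# Simulate a small dataset, as in the linear-model example above
cv_data <- data.frame(x = rnorm(100, mean = 7.6, sd = 0.7))
cv_data$y <- 5.1 + 0.23 * cv_data$x + rnorm(100, mean = 0, sd = 0.5)
# Randomly hold out 20% of rows as the validation set
val_rows <- sample(nrow(cv_data), size = 20)
train    <- cv_data[-val_rows, ]
validate <- cv_data[val_rows, ]
# Fit on the training set, then predict the validation set
fit  <- lm(y ~ x, data = train)
pred <- predict(fit, newdata = validate)
# Root mean squared prediction error on the held-out data
sqrt(mean((validate$y - pred)^2))
```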
**Monte Carlo methods** are a large class of methods that generally involve
simulating random numbers to estimate probability distributions and to obtain
numerical solutions to otherwise intractable problems. For our purposes, these
are especially common in computational biology, genomics, and Bayesian
inference (e.g. building phylogenies).
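
As a classic toy example (not specific to biology), here is a Monte Carlo
estimate of pi, obtained by simulating random points in the unit square and
computing the fraction that fall inside the quarter circle of radius 1.

```{r}
set.seed(54)
# Monte Carlo estimate of pi: proportion of random points in the unit square
# that fall within the quarter circle of radius 1, multiplied by 4
n_points <- 100000
px <- runif(n_points)
py <- runif(n_points)
4 * mean(px^2 + py^2 <= 1)
```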
## Additional reading
1. Diamond, J.M. 1975. Assembly of species communities. p. 342-444 in: Ecology
and Evolution of Communities. M.L. Cody and J.M. Diamond (eds.). Harvard
University Press, Cambridge.
2. Connor, E.F. and D. Simberloff. 1979. The assembly of species communities:
chance or competition? Ecology 60: 1132-1140.
3. Connor, E.F., M.D. Collins, and D. Simberloff. 2013. The checkered history of
checkerboard distributions. Ecology 94: 2403-2414.
4. Stone, L. and A. Roberts. 1990. The checkerboard score and species
distributions. Oecologia 85: 74-79.
5. Manly, B. 2007. Randomization, bootstrap and Monte Carlo methods in biology.
B. Carlin, C. Chatfield, M. Tanner and J. Zidek, eds. 3rd edition. Chapman and
Hall/CRC.