---
title: "STATISTICS indices"
date: "16/05/2017"
output:
  html_document:
    number_sections: no
    toc: yes
    toc_float:
      collapsed: false
      smooth_scroll: false
---
```{r, echo=FALSE}
library(knitr)
```
<br/>
<br/>
___________________________________________________________________________________________________
# Combinations
<br/>
<div style="text-align:center;">
<embed src="pictures_tutorials/example_combinaisons.pdf" width='800' height='400' allowfullscreen webkitallowfullscreen></embed>
</div>
<br/>
___________________________________________________________________________________________________
# Correlations
<br/>
### Correlation between points
- **Pearson correlation:** tests for a linear relationship
```{r, echo=FALSE, fig.align='center', out.width='400px', fig.retina=1}
include_graphics("pictures_tutorials/example_pearson-correlation.jpg")
```
<br/>
- **Spearman correlation:** tests for a monotonic relationship (based on ranks)
```{r, echo=FALSE, fig.align='center', out.width='400px', fig.retina=1}
include_graphics("pictures_tutorials/example_spearman-correlation.gif")
```
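<br/>
A minimal R sketch contrasting the two tests on hypothetical vectors `x` and `y` (the simulated data are an assumption, for illustration only):
```R
## Hypothetical data: y is a monotonic but strongly non-linear function of x
set.seed(1)
x <- runif(50)
y <- exp(3 * x) + rnorm(50, sd = 0.5)

cor.test(x, y, method = "pearson")    # tests the linear relationship
cor.test(x, y, method = "spearman")   # tests the monotonic (rank) relationship
```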
<br/>
### Correlation between matrices
- **Mantel test**
    - Correlation between 2 distance matrices (e.g. a community distance matrix and an environmental distance matrix)
    - Significance assessed through permutations of rows and columns
    - Significance (p-value) of the observed correlation = proportion of permutations that yield a higher correlation coefficient
<br/>
- **Partial Mantel test**
    - Correlation between 2 distance matrices while controlling for the effect of a third matrix
<br/>
- **Mantel correlogram**
    - Correlation between 2 matrices, measured within each class of distances
    <br/> (i.e. a Mantel test computed on each distance class)
<br/>
```R
## Mantel's permutation test for similarity of two matrices
## 1. Permutes the rows and columns of the two matrices randomly
## 2. Calculates a Z-statistic =
## sum of the pairwise product of the lower triangles
## of the permuted matrices, for each permutation of rows and columns
## 3. Compares the permuted distribution with the Z-statistic observed
library(ape)
mantel.test(mat1, mat2)
## If only 1 variable :
## calculates the simple Mantel r
## If more than 1 variable :
## calculates partial Mantel r by permuting one of the original dissimilarity matrices
library(ecodist)
mantel(mat2 ~ mat1)
## Matrix correlation between two dissimilarity matrices
## Partial matrix correlation between three dissimilarity matrices
## (significance through permutation of the first matrix)
library(vegan)
mantel(mat1, mat2)
mantel.partial(mat1, mat2, mat3)
```
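<br/>
A hedged usage sketch showing how the two dissimilarity matrices might be built before calling the `vegan` functions above; the input tables `comm` (sites x species abundances), `env` (sites x environmental variables) and `xy` (site coordinates) are hypothetical:
```R
library(vegan)

comm_dist <- vegdist(comm, method = "bray")   # community dissimilarity matrix (Bray-Curtis)
env_dist  <- dist(scale(env))                 # environmental (Euclidean) distance matrix
geo_dist  <- dist(xy)                         # geographic distance matrix

mantel(comm_dist, env_dist)                    # simple Mantel test
mantel.partial(comm_dist, env_dist, geo_dist)  # partial Mantel, controlling for geography
```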
<br/>
___________________________________________________________________________________________________
# Model evaluation
<br/>
### $R^2$ (Cox & Snell, 1989)
- Measures the strength of the association (the effect size) and provides an index of model fit.
- Represents an **estimate of the variance explained by the model**.
- The higher the value, the closer the probability predicted by the model is to the observed value.
- BUT it never reaches the theoretical maximum of 1 and varies with sample size.
<br/>
### $R^2$ (Nagelkerke, 1991)
Modification of the Cox & Snell $R^2$ so that its theoretical maximum is closer to 1.
<br/>
- <u>Package `fmsb`, function `NagelkerkeR2`:</u>
\begin{align}
R^2_{fmsb} &=& \frac{1 - e^{\frac{dev - null}{n}}}{1 - e^{\frac{- null}{n}}}
\end{align}
- <u>[RLeca](http://rleca.pbworks.com/w/page/5469696/Scripts) functions:</u>
\begin{align}
R^2_{normal} &=& 1 - \left[ \frac{e^{\frac{- null}{2}}}{e^{\frac{- dev}{2}}} \right]^{2/n} \\ \\
R^2_{rescaled} &=& \frac{R^2_{normal}}{1 - \left( e^{\frac{- null}{n}} \right)^{2/n}} \\ \\
R^2_{adjusted} &=& 1 - \left( \frac{n-1}{n-p} \right)\cdot(1 - R^2_{normal}) \\ \\
R^2_{adjusted\text{, }rescaled} &=& 1 - \left( \frac{n-1}{n-p} \right)\cdot(1 - R^2_{rescaled})
\end{align}
<br/>
with:
- $n$ = number of observations
- $p$ = number of predictors
- $dev$ = model deviance
- $null$ = model null deviance
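<br/>
A minimal R sketch of these quantities on a toy logistic regression (the `mtcars` data and the model formula are arbitrary assumptions, used only for illustration):
```R
library(fmsb)

fit  <- glm(am ~ hp + wt, data = mtcars, family = binomial)
n    <- nrow(mtcars)
dev  <- deviance(fit)        # model deviance
null <- fit$null.deviance    # null deviance

r2_coxsnell   <- 1 - exp((dev - null) / n)           # Cox & Snell (1989)
r2_nagelkerke <- r2_coxsnell / (1 - exp(-null / n))  # rescaled so the maximum can reach 1

NagelkerkeR2(fit)$R2   # should match r2_nagelkerke
```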
<br/>
### Area Under Curve (AUC)
- <u>Package `Hmisc`, function `somers2`:</u>
\begin{align}
AUC_{Hmisc} &=& \frac{\overline{\text{rank }_1} - \frac{n_1 + 1}{2}}{n - n_1}
\end{align}
- <u>Package `PresenceAbsence`, function `auc`:</u>
\begin{align}
AUC_{PresenceAbsence} &=& \frac{n_0 \cdot n_1 + n_0 \cdot \frac{n_0 + 1}{2} - \Sigma \text{ rank }_0}{n_0 \cdot n_1}
\end{align}
<br/>
with:
- $n$ = number of observations
- $n_0$ = number of observed absences
- $n_1$ = number of observed presences
- $\text{rank }_0$ = rank of predictions whose observed values = 0
- $\overline{\text{rank }_1}$ = mean rank of predictions whose observed values = 1
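<br/>
A minimal sketch comparing the two implementations on hypothetical observations `obs` (0/1) and predicted probabilities `pred` (the simulated data are assumptions, for illustration only):
```R
library(Hmisc)
library(PresenceAbsence)

set.seed(1)
obs  <- rbinom(100, 1, 0.4)        # observed absences/presences
pred <- plogis(rnorm(100) + obs)   # loosely informative fake predictions

somers2(pred, obs)["C"]   # Hmisc: concordance index C = AUC

dat <- data.frame(id = 1:100, observed = obs, predicted = pred)
auc(dat)                  # PresenceAbsence: AUC and its standard deviation
```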
<br/>
### Boyce index (Hirzel et al., 2006)
Designed for presence-only evaluation data; based on a Spearman correlation.
- **Predicted to expected ratio:**
    - predicted = proportion of evaluation points that fall into each class
    <br/> (axis = habitat suitability, e.g. probability of presence between 0 and 1) (evaluation pixels only)
    - expected = proportion of the study area that falls into each class (area = all predicted pixels)
    - The ratio is expected to be below 1 for classes close to 0 and above 1 for classes close to 1: since the evaluation data are presences, they should be biased towards the right of the suitability axis
    <br/> -> the Spearman correlation is computed on this ratio (see the sketch after the figure below)
<br/>
<div style="text-align:center;">
<embed src="pictures_tutorials/example_Boyce-index.pdf" width='800' height='600' allowfullscreen webkitallowfullscreen></embed>
</div>
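<br/>
A minimal sketch of the continuous Boyce index with `ecospat::ecospat.boyce()`; the suitability vectors `suit_all` (all predicted pixels) and `suit_pres` (evaluation presences only) are hypothetical:
```R
library(ecospat)

set.seed(1)
suit_all  <- runif(5000)          # predicted suitability over the whole study area
suit_pres <- rbeta(200, 3, 1.5)   # predicted suitability at presence points (right-skewed)

boyce <- ecospat.boyce(fit = suit_all, obs = suit_pres, PEplot = TRUE)
boyce   # predicted/expected ratio per class and the Spearman correlation
```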
<br/>
___________________________________________________________________________________________________
# Variance
<br/>
### VIF (Variance Inflation Factor)
- Detects **multi-collinearity** (correlation between predictors) in regression analysis.
- Estimates how much the variance of a regression coefficient is inflated due to multi-collinearity in the model.
- *Quotient of the variance of a coefficient in a model with multiple terms by its variance in a model containing term $i$ alone.*
\begin{align}
VIF_i &=& \frac{1}{1 - R^2_i}
\end{align}
Ranges from 1 upwards:
- The VIF tells you by how much the variance (i.e. the squared standard error) of each coefficient is inflated.
- *For example, a VIF of 1.9 tells you that the variance of that coefficient is 90% larger than it would be if the predictor were uncorrelated with the other predictors.*
- 1 = not correlated
- Between 1 and 5 = moderately correlated
- Greater than 5 = highly correlated (see the sketch below)
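<br/>
A minimal sketch of VIF screening with `car::vif()`; the `mtcars` model is an arbitrary assumption, for illustration only:
```R
library(car)

fit <- lm(mpg ~ hp + wt + disp + drat, data = mtcars)
vif(fit)   # one VIF per predictor; values > 5 suggest problematic collinearity

## Equivalent by hand for one predictor, e.g. disp:
r2_disp <- summary(lm(disp ~ hp + wt + drat, data = mtcars))$r.squared
1 / (1 - r2_disp)
```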
<br/>
### Importance of variables
<br/>
<div style="text-align:center;">
<embed src="pictures_tutorials/example_importance-variables.pdf" width='800' height='600' allowfullscreen webkitallowfullscreen></embed>
</div>
<br/>
### Variance partitioning
- **Inclusion-exclusion (Poincaré) formula:** $V_n = V\left( \bigcup_{i=1}^{n} A_i \right)$
<br/>
<u>[RLeca](http://rleca.pbworks.com/w/page/5469696/Scripts) generalization :</u>
- **Transcription of the Poincaré formula:** <br/> *sum of the variances of all combinations of j (odd) elements among n, <br/> minus the sum of the variances of all combinations of k (even) elements among n*
\begin{equation}
V_{n} = \sum_{j = 1, 3, \ldots}^{n} \left( \sum_{c \, \in \, C_{n}^{j}} V_{(c)} \right) - \sum_{k = 2, 4, \ldots}^{n} \left( \sum_{d \, \in \, C_{n}^{k}} V_{(d)} \right)
\end{equation}
- **Recursive formula:**
\begin{equation}
V_{C_{n}^i, n} = V_{C_{n}^i, n-1} - V_{C_{n}^{i+1}, n}
\end{equation}
<br/>
with:
- $n$ = number of predictors
- $i$ = predictor index
- $C_{n}^j$ = combinations of j elements among n
- $V$ = variance (Nagelkerke $R^2_{adjusted, rescaled}$)
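<br/>
A minimal sketch of this partitioning for two predictors, applying the inclusion-exclusion principle to Nagelkerke $R^2$ values (hypothetical logistic model on `mtcars`; the RLeca scripts use the adjusted, rescaled variant and generalise this to n predictors):
```R
library(fmsb)

r2 <- function(f) NagelkerkeR2(glm(f, data = mtcars, family = binomial))$R2

r2_both <- r2(am ~ hp + wt)   # V(A u B): both predictors together
r2_hp   <- r2(am ~ hp)        # V(A)
r2_wt   <- r2(am ~ wt)        # V(B)

shared  <- r2_hp + r2_wt - r2_both   # variance shared by hp and wt
pure_hp <- r2_both - r2_wt           # variance explained by hp alone
pure_wt <- r2_both - r2_hp           # variance explained by wt alone
c(pure_hp = pure_hp, pure_wt = pure_wt, shared = shared)
```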
<br/>
___________________________________________________________________________________________________