forked from AaGillet/MorphoRegions
-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
149 lines (98 loc) · 7.17 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# *MorphoRegions*: Analysis of Regionalization Patterns in Serially Homologous Structures
<!-- badges: start -->
<!-- badges: end -->
*MorphoRegions* is an R package built to computationally identify regions (morphological, functional, etc.) in serially homologous structures such as, but not limited to, the vertebrate backbone. Regions are modeled as segmented linear regressions with each segment corresponding to a region and region boundaries (or breakpoints) corresponding to changes along the serially homologous structure. The optimal number of regions and their breakpoint positions are identified using maximum-likelihood methods without *a priori* assumptions.
This package was first presented in [Gillet et al. (2024)](https://www.biorxiv.org/content/10.1101/2024.03.15.585285v1) and is an updated version of the [*regions* R package](https://github.com/katrinajones/regions) from [Jones et al. (2018)](https://www.science.org/doi/abs/10.1126/science.aar3126) with improved computational methods and expanded fitting and plotting options.
## Installing *MorphoRegions*
You can install the released version of *MorphoRegions* from [CRAN](https://CRAN.R-project.org/package=MorphoRegions) with:
```{r eval=F}
install.packages("MorphoRegions")
```
Or the development version from [GitHub](https://github.com/AaGillet/MorphoRegions) with:
```{r eval=F}
# install.packages("remotes")
remotes::install_github("AaGillet/MorphoRegions")
```
## Example
The following example illustrates the basic steps to prepare the data, fit regionalization models, select the best model, and plot the results. See `vignette("MorphoRegions")` or the [*MorphoRegions* website](https://aagillet.github.io/MorphoRegions/) for a detailed guide of the package and its functionalities.
```{r loadpackage}
library(MorphoRegions)
```
#### Preparing the data
Data should be provided as a dataframe where each row is an element of the serially homologous structure (e.g., a vertebra). One column should contain positional information of each element (e.g., vertebral number) and other columns should contain variables that will be used to calculate regions (e.g., morphological measurements). The `dolphin` dataset contains vertebral measurements of a dolphin with the positional information (vertebral number) in the first column.
```{r loaData}
data("dolphin")
```
```{r printData, echo = F}
knitr::kable(head(dolphin))
```
Prior to analysis, data must be processed into an object usable by *MorphoRegions* using `process_measurements()`. The `pos` argument is used to specify the name or index of the column containing positional information and the `fillNA` argument allows to fill missing values in the dataset (up to two successive elements).
```{r processData}
dolphin_data <- process_measurements(dolphin, pos = 1)
class(dolphin_data)
```
Data are then ordinated using a Principal Coordinates Analysis (PCO) to reduce dimensionality and allow the combination of a variety of data types. The number of PCOs to retain for analyses can be selected using `PCOselect()` (see the vignette for different methods of PCO axes selection).
```{r PCO}
dolphin_pco <- svdPCO(dolphin_data, metric = "gower")
# Select PCOs with variance > 0.05 :
PCOs <- PCOselect(dolphin_pco, method = "variance",
cutoff = .05)
PCOs
```
#### Fitting regressions and selecting the best model
The `calcregions()` function allows fitting all possible combinations of segmented linear regressions from 1 region (no breakpoint) to the number of regions specified in the `noregions` argument. In this example, up to 5 regions (4 breakpoints) will be fitted along the backbone, however, there is no limit for this value and it is possible to fit as many regions as you would like. For this example, regions will be fitted with a minimum of 3 vertebrae per region (`minvert = 3`) and using a continuous fit (`cont = TRUE`) (see `vignette("MorphoRegions")` or [*MorphoRegions* website](https://aagillet.github.io/MorphoRegions/) for details about fitting options).
```{r calcregions}
regionresults <- calcregions(dolphin_pco, scores = PCOs, noregions = 5,
minvert = 3, cont = TRUE,
exhaus = TRUE, verbose = FALSE)
regionresults
```
For each given number of regions, the best fit is selected by minimizing the residual sum of squares (`sumRSS`):
```{r modelselect}
models <- modelselect(regionresults)
models
```
The best overall model (best number of regions) is then select by ordering models from the best fit (top row) to the worst fit (last row) using either the AICc or BIC criterion:
```{r modelsupport}
supp <- modelsupport(models)
supp
```
Here, for both criteria, the best model is the 5 regions models with breakpoints at vertebrae 23, 27, 34, and 40. *The breakpoint value corresponds to the last vertebra included in the region, so the first region here is made of vertebrae 8 to 23 included and the second region is made of vertebrae 24 to 27.* The function also returns the **region score**, a continuous value reflecting the level of regionalization while accounting for uncertainty in the best number of regions (see `vignette("MorphoRegions")` or [*MorphoRegions* website](https://aagillet.github.io/MorphoRegions/) for more details).
#### Plotting results
Results of the best model (or any other model) can be visualized either as a scatter plot or as a vertebral map.
The **scatter plot** shows the PCO score (here for PCO 1 and 2) of each vertebra along the backbone (gray dots) and the segmented linear regressions (cyan line) of the model to plot. Breakpoints are showed by dotted orange lines.
```{r scatterplot, out.width = "55%", fig.align='center'}
plotsegreg(dolphin_pco, scores = 1:2, modelsupport = supp,
criterion = "bic", model = 1)
```
In the **vertebral map** plot, each vertebra is represented by a rectangle color-coded according to the region to which it belongs. Vertebrae not included in the analysis (here vertebrae 1 to 7) are represented by gray rectangles and can be removed using `dropNA = TRUE`.
```{r vertebralmap, fig.show='hold', out.width = "80%", fig.align='center',fig.height=0.65}
plotvertmap(dolphin_pco, name = "Dolphin", modelsupport = supp,
criterion = "bic", model = 1)
plotvertmap(dolphin_pco, name = "Dolphin", modelsupport = supp,
criterion = "bic", model = 1, dropNA = TRUE)
```
The variability around breakpoint positions can be calculated using `calcBPvar()` and then displayed on the vertebral map. The weighted average position of each breakpoint is shown by the black dot and the weighted variance is illustrated by the horizontal black bar.
```{r vertebralmapBPvar, out.width = "80%", fig.align='center',fig.height=0.65}
bpvar <- calcBPvar(regionresults, noregions = 5,
pct = 0.1, criterion = "bic")
plotvertmap(dolphin_pco, name = "Dolphin",
dropNA = TRUE, bpvar = bpvar)
```
## Citation
To cite *MorphoRegions*, please use:
```{r citation, eval=F}
citation("MorphoRegions")
```