-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathLOG.Rmd
186 lines (105 loc) · 6.34 KB
/
LOG.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
---
title: "LOG & EXTENDED README"
author: "Henrik Christiansen"
date: "June 30, 2021"
output:
pdf_document: default
html_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Project: radseq pilot
RADseq optimization pilot experiment with several Antarctic species.
Contact: <[email protected]>
## Log:
30/06/2021: updated readme
29/06/2021: added coverage calculation in script 05_marker_density.R
24/06/2021: changed and rerun empirical digestion plots, set alpha to 1 and changed plotting order to avoid overlays
23/06/2021: added digestion with a standard 1000 mb genome
10-14/06/2021: added empirical loci statistics and updated marker density calculations
05-06/2021: added new functions and code to run marker density calculations, new script: 05_marker_density.R
24/04/2021: added extensive documentation
29/03/2021: rerun 01_digests.R again
24/03/2021: updated and rerun 01_digests.R for some additional size windows
22/03/2021: updated amphipod species ID
01/03/2021: updated bird plots in 02_empirical_digests.R
23/02/2021: updated double digest calculations in recto_REs_and_functions.R and 02_empirical_digests.R
22/02/2021: updated and rerun 02_empirical_digests.R, added input files for coverage calculations
21/02/2021: fixed some git issue with large .RDataTmp file
18/02/2021: updated and rerun 02_empirical_digests.R
17/02/2021: updated and consolidated code and files locally and on github,
downdgraded bioanalyzer to v0.5.1 again, the other version throws an error
22/01/2021: fixed xml encoding issues
20/01/2021: added new xml file for run 7, updated bioanalyzer to v0.6.2 and updated 02_empirical_digests.R
20/01/2021: added the project to a github repository
16/12/2020: added a modified read.bioanalyzer function to read in xml file with a missing lower marker in one sample
14/12/2020: small, mostly cosmetic updates in most scripts; updated project and input/output description;
finalized 02_empirical_digests.R
11/12/2020: added customized plot saving function in recto_REs_and_functions.R
10/12/2020: updated 02_empirical_digests.R,
added a customized ggplot plotting function in recto_REs_and_functions.R
added in silico functions for comparison in recto_REs_and_functions.R
12/11/2020: updated 02_empirical_digests.R & added a customized plotting function in recto_REs_and_functions.R
11/11/2020: updated to run with R v4.0.3
07-10/2020: added script 02_empirical_digests.R to read results from bioanalyzer and analyze them in R,
instead of cumbersome "manual" analysis in spreadsheets
20/07/2019: updated plot ratio scripts
10/07/2019: updated digest scripts
09/07/2019: updated the plots for test library 2
02/07/2019: updated digest scripts
28/06/2019: finalized coverage plot
27/06/2019: re-orderd scripts, included scripts for in silico digestion (previously stored in different R project)
23-27/06/2019: updated parameter plots (now based on correct de novo runs, with PE concatenating);
included & updated coverage plot
16-22/06/2019: updated parameter plots; started script to plot targeted and realized coverage
01-14/06/2019: included scripts to plot n_loci and n_snps_per_locus from de novo parameter optimization series
04/12/2018: rerun due to updated input data (Trematomus mainly)
29/11/2018: created script 01_plotting.R and ratio plots
28/11/2018: created R project & input data
## Input:
Various reference genomes from target species, or related species in fasta format under:
* ../refgenomes (note: these are not included in the online version on github and zenodo as they are too big; if you want to replicate these analyses you need to download the refgenomes from genbank)
Output files from the Agilent Bioanalyzer Software in XML format under:
* data/bioanalyzer_results
CSV file with all metadata related to the Bioanalyzer runs, created manually:
* data/bioanalyzer_results/run_overview.csv
CSV/TSV files in data/test_libraries with data from 5 test libraries:
* coverage_stats.csv: number of loci and coverage for each library under different M values as reported in gstacks.log.distribs files (note: the original gstacks.log.distribgs files are included as well, but actually not needed as long as you have this summary csv file)
* density_stats.csv: additional statistics for each library as reported in populations.log files (note: the original populations.log files are included as well, but actually not needed as long as you have this summary csv file)
* various n_snps_per_locus.tsv: files for each library/species and different r/M values with information about the loci and SNPs
R script that contains restriction enzyme information and various functions for analysis and plotting as used in the analyses scripts:
* scripts/recto_REs_and_functions.R
Additional R script to export plots consistently:
* scripts/printfig.R
## Analyses:
To be run sequentially, all listed under /scripts.
* 01_digests.R: script to perform in silico digestions for different target organisms
* 02_empirical_digests.R: script to read bioanalyzer results and plot output in comparison with in silico results
* 03_plot_n_loci.R: script based on Rochette & Catchen 2017 to plot the number of (polymorphic) loci for different M parameters
* 04_plot_n_snps_per_locus.R: script based on Rochette & Catchen 2017 to plot the number of SNPs per locus for different M parameters
* 05_marker_density.R: script to calculate how many snps are sequenced across the genome
* 06_plot_coverage.R: script to plot target and realized coverage
## Output:
In /data/in_silico_results:
* various tables.csv containing in silico digest results per target taxa
In /data:
* marker_density_results.csv containing the output from the 05_marker_density.R script
In /figures:
* genome digestion curves, empirical and in silico, created with 02_empirical_digests.R
* various n_loci_Mn plots per library, created with 03_plot_n_loci.R
* various n_snps per library/species, created with 04_n_snps_per_locus.R
* coverage plots, created by 06_plot_coverage.R
## Session info:
```{r libraries, include=FALSE}
packages <-c("here", "SimRAD", "seqinr", "bioanalyzeR", "tidyverse", "ggsci", "gridExtra")
lapply(packages, require, ch = T)
```
Package citations:
```{r pkgs, echo = F}
lapply(packages, citation)
```
R Session:
```{r sessioninfo, echo = F}
sessionInfo()
```