---
layout: page
title: Setup
permalink: /setup/
---
R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R, we use RStudio.
- Install the latest version of R from CRAN.
- Install the latest version of RStudio from the RStudio website. Choose the free RStudio Desktop version for Windows, Mac, or Linux.
- Start RStudio. The qtl2 package contains code for haplotype reconstruction, QTL mapping, and plotting. Install qtl2 by copying and pasting the following code into the R console.
```r
install.packages("qtl2")
```
{: .r}
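If you rerun this setup later, you may not want to reinstall qtl2 every time. The following is a minimal sketch in base R. It uses a hypothetical helper named `install_if_missing()`, which is not part of qtl2, to install a package only when `requireNamespace()` reports that it is missing:

```r
# Hypothetical helper (not part of qtl2): install a package from CRAN only if
# it is not already installed. requireNamespace() returns FALSE, rather than
# throwing an error, when the package cannot be loaded.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # report (invisibly) whether the package can now be loaded
  invisible(requireNamespace(pkg, quietly = TRUE))
}
```
{: .r}

In the console you would call `install_if_missing("qtl2")`. It returns `TRUE` invisibly once qtl2 can be loaded.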
Check that the installation succeeded by loading qtl2. You can either copy and paste the following code into the Console or check the box next to qtl2 in the RStudio Packages tab. You should not get any error messages.

```r
library(qtl2)
```
{: .r}
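You can also confirm the installation from code and record which version you have. The sketch below does this. `check_package()` is a hypothetical name; `requireNamespace()` and `utils::packageVersion()` are standard R functions.

```r
# Hypothetical helper: report whether a package can be loaded and, if so,
# which version is installed. Returns TRUE or FALSE invisibly.
check_package <- function(pkg) {
  ok <- requireNamespace(pkg, quietly = TRUE)
  if (ok) {
    message(pkg, " version ", as.character(utils::packageVersion(pkg)), " loads correctly")
  } else {
    message(pkg, " is not installed; try install.packages(\"", pkg, "\")")
  }
  invisible(ok)
}
```
{: .r}

For example, `check_package("qtl2")` prints the installed qtl2 version, which is useful to mention if you ask for help during the workshop.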
1. Create a new project on your Desktop called `mapping`.
   - Click the `File` menu button, then `New Project`.
   - Click `New Directory`.
   - Click `New Project`.
   - Type `mapping` as the directory name. Browse to your Desktop to create the project there.
   - Click the `Create Project` button.
2. Use the `Files` tab to create a `data` folder to hold the data, a `scripts` folder to house your scripts, and a `results` folder to hold results.

Alternatively, you can complete step 2 by running the following commands in the R console. You still need to create the project in step 1 first. The commands use paths relative to the working directory, which RStudio sets to the project folder, so the folders are created inside `mapping`.

```r
dir.create("./data")
dir.create("./scripts")
dir.create("./results")
```
{: .r}
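Running `dir.create()` on a folder that already exists produces a warning. The sketch below is an alternative that is safe to run more than once. It uses a hypothetical helper `make_project_dirs()`, which creates any missing folders and returns `TRUE` when all three exist:

```r
# Hypothetical helper: create the data, scripts and results folders under
# `root` (by default the current working directory, i.e. the project folder).
# showWarnings = FALSE suppresses the warning dir.create() gives when a folder
# already exists, so the function is safe to run more than once.
make_project_dirs <- function(root = ".", dirs = c("data", "scripts", "results")) {
  paths <- file.path(root, dirs)
  for (p in paths) {
    dir.create(p, showWarnings = FALSE, recursive = TRUE)
  }
  all(dir.exists(paths))
}
```
{: .r}

Run `make_project_dirs()` from inside the `mapping` project to create the three folders.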
Please download the following large files before the workshop and place them in your `data` folder. Download each file and move it into `data` the same way you would move any other downloaded file.

- SQLite database of variants in Collaborative Cross founder mouse strains (v3): SNP, indel, and structural variants in the Collaborative Cross founders (3.87 GB)
- SQLite database with mouse gene annotations from Mouse Genome Informatics (v7): full set of mouse gene annotations from build 38 (mm10) (568.98 MB)
- SQLite database with MGI mouse gene annotations from Mouse Genome Informatics (v8): like the previous file, but including only non-duplicate gene records sourced from MGI (11.36 MB)
- Diversity Outbred (DO) QTL data from the benzene study described in French, John E., et al. Environmental Health Perspectives (2015): 237-245. (240.8 MB)
Alternatively, you can copy and paste the following into the R console to download the data.
```r
# raise the download timeout from the default 60 seconds to 900 seconds to allow time for large files
options(timeout = 900)
# these four commands download files from a URL and place them in the data directory you created;
# mode = "wb" writes the files in binary mode, which Windows requires and which is harmless on Mac and Linux
download.file(url = "https://ndownloader.figshare.com/files/18533342", destfile = "./data/cc_variants.sqlite", mode = "wb")
download.file(url = "https://ndownloader.figshare.com/files/24607961", destfile = "./data/mouse_genes.sqlite", mode = "wb")
download.file(url = "https://ndownloader.figshare.com/files/24607970", destfile = "./data/mouse_genes_mgi.sqlite", mode = "wb")
download.file(url = "ftp://ftp.jax.org/dgatti/qtl2_workshop/qtl2_demo.Rdata", destfile = "./data/qtl2_demo.Rdata", mode = "wb")
# reset the download timeout to the default 60 seconds
options(timeout = 60)
```
{: .r}
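If a download is interrupted, or you run the commands again, you may want to fetch only the files that are not already present. The sketch below uses a hypothetical helper, `download_if_missing()`. It calls `on.exit()` so the previous timeout is restored even if a download fails. A partial file left behind by a failed download would be skipped, so delete it before retrying.

```r
# Hypothetical helper: download each URL to the matching destination file,
# skipping files that already exist. The timeout is raised for large files
# and restored afterwards, even if a download fails.
download_if_missing <- function(urls, destfiles, timeout = 900) {
  stopifnot(length(urls) == length(destfiles))
  old <- options(timeout = timeout)
  on.exit(options(old), add = TRUE)
  for (i in seq_along(urls)) {
    if (file.exists(destfiles[i])) {
      message("Skipping ", destfiles[i], ": file already exists")
    } else {
      # mode = "wb" writes in binary mode, which Windows requires for these files
      download.file(url = urls[i], destfile = destfiles[i], mode = "wb")
    }
  }
  invisible(file.exists(destfiles))
}
```
{: .r}

For example: `download_if_missing("https://ndownloader.figshare.com/files/18533342", "./data/cc_variants.sqlite")`.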
You will need these for the final lesson episodes on SNP association mapping and QTL analysis in Diversity Outbred mice.
Make sure that both the SNP and gene files downloaded correctly by running the following code. If you get an error, carefully check the file path (e.g. `"~/Desktop/mapping/data/cc_variants.sqlite"`) or download the files again. Use `getwd()` to see your current working directory and `setwd()` to change it; the working directory matters for relative paths such as `"./data/cc_variants.sqlite"`. If you saved the files somewhere else, change the `dbfile` path in the code to that location.
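Before querying the databases, it can help to confirm that each file exists and has roughly the size listed above. A much smaller file usually means an interrupted download. The following is a minimal sketch with a hypothetical helper, `check_files()`. The listed sizes may use decimal or binary units, so expect only rough agreement.

```r
# Hypothetical helper: list whether each file exists and its size in megabytes
# (1 MB = 1e6 bytes here). file.size() returns NA for files that do not exist.
check_files <- function(paths) {
  data.frame(
    file = basename(paths),
    exists = file.exists(paths),
    size_mb = round(file.size(paths) / 1e6, 1)
  )
}
```
{: .r}

For example: `check_files(c("./data/cc_variants.sqlite", "./data/mouse_genes_mgi.sqlite"))`.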
Check part of the SNP file. Because the file is very large, checking a small region is enough.

```r
library(qtl2) # load qtl2 (harmless if it is already loaded in this session)
# create a function to query the SNP file, then use this new function
# to select SNPs on chromosome 1 from 10 to 11 Mbp
snp_func = create_variant_query_func(dbfile = "~/Desktop/mapping/data/cc_variants.sqlite")
snps = snp_func(chr = 1, start = 10, end = 11)
# check the dimensions of this sample of the SNP file
dim(snps)
```
{: .r}
You should get a result that is 13150 rows by 16 columns.
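Instead of comparing the numbers by eye, you could stop with an error when the dimensions differ. The sketch below uses a hypothetical helper, `check_dims()`. The expected values 13150 and 16 come from the text above.

```r
# Hypothetical helper: compare the dimensions of a query result with the
# expected number of rows and columns, stopping with a clear message if
# they differ. Returns TRUE invisibly when they match.
check_dims <- function(x, expected) {
  got <- dim(x)
  if (is.null(got)) stop("x has no dimensions", call. = FALSE)
  if (!identical(as.integer(got), as.integer(expected))) {
    stop(sprintf("expected %d x %d but got %d x %d",
                 expected[1], expected[2], got[1], got[2]), call. = FALSE)
  }
  invisible(TRUE)
}
```
{: .r}

For the SNP query this would be `check_dims(snps, c(13150, 16))`.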
Check the gene file in the same way.

```r
# create a function to query the gene file, then select genes in the same region as above
gene_func = create_gene_query_func(dbfile = "~/Desktop/mapping/data/mouse_genes_mgi.sqlite")
genes = gene_func(chr = 1, start = 10, end = 11)
dim(genes) # check the dimensions
```
{: .r}
You should get a result that is 18 rows by 15 columns.
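`create_variant_query_func()` and `create_gene_query_func()` each return a new function, which you then call with a chromosome and a start and end position in Mbp. The toy example below illustrates this "function that returns a function" pattern in base R. `make_region_query()` and the data are hypothetical. It is not qtl2's implementation: qtl2's functions query the SQLite files rather than an in-memory data frame.

```r
# Hypothetical illustration of a function factory: make_region_query()
# captures a data frame and returns a new function that selects rows on one
# chromosome between start and end (positions in Mbp).
make_region_query <- function(df) {
  function(chr, start, end) {
    keep <- df$chr == as.character(chr) & df$pos >= start & df$pos <= end
    df[keep, , drop = FALSE]
  }
}

# toy data standing in for a database table (positions in Mbp)
toy <- data.frame(chr = c("1", "1", "1", "2"),
                  pos = c(9.5, 10.2, 10.8, 10.5))
query <- make_region_query(toy)
query(chr = 1, start = 10, end = 11) # the two chromosome 1 rows at 10.2 and 10.8 Mbp
```
{: .r}

Because the data are captured when the query function is created, each later call only needs the region, which mirrors how `snp_func` and `gene_func` are used above.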