-
Notifications
You must be signed in to change notification settings - Fork 7
/
Copy pathreadme.Rmd
75 lines (53 loc) · 3.7 KB
/
readme.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
---
title: "Large Data in R: README"
output: github_document
---
[](https://mybinder.org/v2/gh/mbsabath/large_data_in_R/HEAD?urlpath=rstudio)
This repository contains all course materials for the Large Data in R Workshop. You can clone it to your personal computer to be able to easily follow along and run the exercises.
You'll need to have some basic skill with git to be able to do this. Instructions for setting up git are [here](https://docs.github.com/en/github/getting-started-with-github/set-up-git).
## Environment Set Up
**PLEASE DO THIS BEFORE THE WORKSHOP**
There are multiple ways to set up your environment for this course. Our focus will be more on the
concepts underlying Large Data in R and how to work through problems, rather then executing specific
blocks of code.
The easiest way to follow along with your own R studio instance is to click on the binder tag at the top of the screen. It'll open up an Rstudio instance in your browser with everything we need aleady installed.
However if you'd like to follow along on your own machine and ensure that you have all packages you need
installed, I've provided a `conda` environment in this repo for your use.
Note: This course assumes that you are comfortable using the command line and are working on a unix based system (MacOS or Linux). The assistance I'll be able to provide for attendees with windows computers will be limited.
#### STEP 0 - Clone This Repo
If you're reading this, you've found the git repository with materials for this course. The easiest way to download all of the materials and have them properly arranged is to clone this repo. To do this run the following command
```
git clone [email protected]:mbsabath/large_data_in_R
```
This will create a local copy of this repo on your computer. `cd` to that directory for the rest of this setup.
#### STEP 1 - Install Conda
If you're working on a computer that doesn't currently have conda installed, you can install miniconda [using this link](https://docs.conda.io/en/latest/miniconda.html). I recommend the Python 3.9 version.
#### STEP 2 - Create The Environment
Included in the repo is the file `large_data_env.yml`. This file lists the packages needed for this course. Conda is great for environment management and environment sharing since it handles installing all of the dependencies needed, and can support set up on multiple operating systems. Creating conda environments for your projects is a separate subject, but is a great way to make your research projects easy for others to use and to support reproducibility. To install this environment run:
```
conda env create -f large_data_env.yml
```
You will be prompted to download and install a number of packages, please install things as prompted.
If everything worked, you should see an environment named `large_data_in_R` listed when you run
```
conda env list
```
#### STEP 3 - Run R and Install One Additional Package
To activate the environment, run the following command:
```
source activate large_data_in_R
```
If this is successful, your terminal prompt will change to look something like this:
```
(large_data_in_R) <username> large_data_in_R %
```
To run Rstudio using the environment, it's important to run it from the terminal. To start Rstudio from the terminal, enter
```
rstudio
```
from the terminal where you've activated the envionment. The rstudio window that opens will have all required pacakges already installed, with the exception of the `chunked` and `fst` packages, which have some challenges when installing through conda. To install that package, please run:
```{R, eval = F}
install.packages("chunked")
install.packages("fst")
```
Once those are is installed, your environment is good to go!