# Reproducible Environment and Quality Assurance
## Using Containers, Continuous Integration and Delivery
To ensure that scripts run in a consistent environment, we create Docker containers that pin known versions of the development environment and of the libraries used to generate the data analyses and products.
We also need to keep the containers up to date and easy to download, so we use Continuous Integration and Delivery services such as Quay.io to build images from specified Dockerfiles.
This section walks you through setting up and running the different Docker environments.
## Installation
### ... for Windows
Follow the setup instructions here: https://docs.docker.com/docker-for-windows/install/
Note: If your machine doesn't meet the requirements for "Docker for Windows", try setting up "Docker Toolbox" instead:
https://docs.docker.com/toolbox/toolbox_install_windows/
### ... for Linux
Follow the setup instructions for your flavor of Linux here: https://docs.docker.com/engine/installation/linux/
### ... for macOS
Follow the setup instructions here: https://store.docker.com/editions/community/docker-ce-desktop-mac
Or, if you use Homebrew (recent versions replaced `brew cask install` with `brew install --cask`):
```
$ brew install --cask docker
```
## Ensure that Docker is Running
Start the Docker app, then confirm from the command line that it is running:
```
$ docker info
Containers: 3
Running: 0
Paused: 0
Stopped: 3
Images: 1
Server Version: 1.13.1
...
```
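The same check can be scripted. The following is a minimal sketch, assuming the `docker` CLI is on your `PATH` and that `docker info` exits with a non-zero status when the daemon is unreachable; the helper name `docker_is_running` is hypothetical and not part of any Docker tooling.

```python
import subprocess


def docker_is_running(executable="docker", timeout=10):
    """Return True if `<executable> info` exits successfully.

    `docker_is_running` is a hypothetical helper for this guide.
    """
    try:
        result = subprocess.run(
            [executable, "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The CLI is not installed, or the daemon did not answer in time.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if docker_is_running():
        print("Docker is running")
    else:
        print("Docker is not reachable; start the Docker app first")
```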
## Using Docker for Python Notebooks
### Getting a Python Jupyter Notebook Container Image
There are at least two ways of getting an image:
- Pulling from a repository (such as quay.io)
- Loading from a file
#### Pulling from a repository
You can pull down the image with:
```
$ docker pull quay.io/dksg/python3-notebook:1.0.0
```
Once that finishes downloading, you should see something like:
```
$ docker images
REPOSITORY                      TAG     IMAGE ID       CREATED      SIZE
quay.io/dksg/python3-notebook   1.0.0   f01e49a5a922   3 days ago   2.61 GB
```
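If you want to look up the `IMAGE ID` programmatically rather than reading it off the table, a small parser is enough. This is a rough sketch, assuming the columns in the `docker images` output are separated by two or more spaces; `find_image_id` and `SAMPLE_OUTPUT` are hypothetical names used only for illustration.

```python
import re

# Sample text in the shape of the `docker images` listing shown above.
SAMPLE_OUTPUT = """\
REPOSITORY                      TAG     IMAGE ID       CREATED      SIZE
quay.io/dksg/python3-notebook   1.0.0   f01e49a5a922   3 days ago   2.61 GB
"""


def find_image_id(images_output, repository, tag):
    """Return the IMAGE ID for repository:tag, or None if it is not listed."""
    lines = images_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        # Values such as "3 days ago" contain single spaces, so split
        # only on runs of two or more whitespace characters.
        columns = re.split(r"\s{2,}", line.strip())
        if len(columns) >= 3 and columns[0] == repository and columns[1] == tag:
            return columns[2]
    return None


if __name__ == "__main__":
    print(find_image_id(SAMPLE_OUTPUT, "quay.io/dksg/python3-notebook", "1.0.0"))
```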
#### Loading from a file
This is an alternative method. Skip it if you have already pulled the image from a repository. Otherwise, follow the steps below:
1. Copy the tar file (get this from a DK corelead) to your local directory (e.g. `quay.io_SLASH_dksg_SLASH_python3-notebook_1.0.0.tar`)
2. In your local directory, run the following docker command:
```
docker load --input quay.io_SLASH_dksg_SLASH_python3-notebook_<tagged_version>.tar
```
3. This will return a loaded image id.
4. Tag the newly added image with the version from the filename by running the following:
```
docker tag <loaded image id> quay.io/dksg/python3-notebook:<tagged_version>
```
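The tar file names above appear to follow a simple convention: every `/` in the image name becomes `_SLASH_`, and the tag is appended after a final `_`. The sketch below converts in both directions under that assumption, which is inferred from the example file names rather than documented anywhere; both function names are hypothetical, and the reverse conversion assumes the tag itself contains no underscore.

```python
def image_to_tar_name(image_ref):
    """Map 'registry/org/name:tag' to the tar file name used in this guide."""
    repository, sep, tag = image_ref.rpartition(":")
    if not sep or not repository or "/" in tag:
        raise ValueError("expected an image reference with a tag: %r" % image_ref)
    return repository.replace("/", "_SLASH_") + "_" + tag + ".tar"


def tar_name_to_image(file_name):
    """Map a tar file name back to 'registry/org/name:tag'."""
    if not file_name.endswith(".tar"):
        raise ValueError("expected a .tar file name: %r" % file_name)
    stem = file_name[: -len(".tar")]
    # The tag is whatever follows the last underscore in the stem.
    repository, sep, tag = stem.rpartition("_")
    if not sep or not repository or not tag:
        raise ValueError("cannot find a tag in %r" % file_name)
    return repository.replace("_SLASH_", "/") + ":" + tag


if __name__ == "__main__":
    name = image_to_tar_name("quay.io/dksg/python3-notebook:1.0.0")
    print(name)
    print(tar_name_to_image(name))
```

The second function gives you the value to pass to `docker tag` in step 4 once you know the loaded image ID.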
### Running a Jupyter Notebook from the pulled/loaded image
Take the `IMAGE ID` from the previous step and start the container with this command:
```
docker run -p 8888:8888 -v /path/to/local/directory:/home/jovyan/work f01e49a5a922
```
**Note:** Replace `/path/to/local/directory` with **an existing local directory on your machine.**
This is where your notebooks (.ipynb) will be stored.
For example: `docker run -p 8888:8888 -v /Users/johndoe/datadive:/home/jovyan/work quay.io/dksg/python3-notebook:1.0.0`
The container prints a link to paste into your browser's address bar. If you're using Docker Toolbox, use the custom IP address instead of localhost (default http://192.168.99.100/).
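Because a wrong local path is the most common mistake at this step, it can help to assemble the command in a script that checks the directory first. The following is a minimal sketch, not an official wrapper: `build_run_command` is a hypothetical helper, and it only builds the argument list for `docker run` using the `-p`, `-v` and `-d` options already shown in this guide.

```python
import os
import shlex
import tempfile


def build_run_command(image, local_dir, container_dir="/home/jovyan/work",
                      port=8888, detached=False):
    """Build the `docker run` argument list after checking the local directory."""
    if not os.path.isdir(local_dir):
        raise FileNotFoundError("local directory does not exist: %s" % local_dir)
    command = ["docker", "run"]
    if detached:
        command.append("-d")  # run the container in the background
    command += [
        "-p", "%d:%d" % (port, port),  # publish the same port on the host
        "-v", "%s:%s" % (os.path.abspath(local_dir), container_dir),
        image,
    ]
    return command


if __name__ == "__main__":
    # A temporary directory stands in for your own notebook directory.
    with tempfile.TemporaryDirectory() as workdir:
        args = build_run_command("quay.io/dksg/python3-notebook:1.0.0", workdir)
        print(" ".join(shlex.quote(arg) for arg in args))
```

Pass the resulting list to `subprocess.run` if you want the script to start the container as well.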
Once the notebook is running, you may create a new notebook and try the samples in this tutorial:
https://plot.ly/python/ipython-notebook-tutorial/
Note: You may need to run the following Python snippet first for the tutorial samples above to work:
```
import plotly
plotly.offline.init_notebook_mode() # run at the start of every ipython notebook
```
## Using Docker for R Notebooks
### Getting an R Jupyter Notebook Container Image
There are at least two ways of getting an image:
- Pulling from a repository (such as quay.io)
- Loading from a file
#### Pulling from a repository
You can pull down the image with:
```
$ docker pull quay.io/dksg/r-notebook:<tagged_version>
```
Once that finishes downloading, you should see something like:
```
$ docker images
REPOSITORY                TAG     IMAGE ID       CREATED      SIZE
quay.io/dksg/r-notebook   1.0.1   f01e49a5a922   3 days ago   2.61 GB
```
#### Loading from a file
This is an alternative method. Skip it if you have already pulled the image from a repository. Otherwise, follow the steps below:
1. Copy the tar file (get this from a DK corelead) to your local directory (e.g. `quay.io_SLASH_dksg_SLASH_r-notebook_1.0.1.tar`)
2. In your local directory, run the following docker command:
```
docker load --input quay.io_SLASH_dksg_SLASH_r-notebook_<tagged_version>.tar
```
3. This will return a loaded image id.
4. Tag the newly added image with the version from the filename by running the following:
```
docker tag <loaded image id> quay.io/dksg/r-notebook:<tagged_version>
```
### Running a Jupyter Notebook from the pulled/loaded image
Take the `IMAGE ID` from the previous step and start the container with this command:
```
docker run -p 8888:8888 -v /path/to/local/directory:/home/jovyan/work f01e49a5a922
```
**Note:** Replace `/path/to/local/directory` with **an existing local directory on your machine.**
This is where your notebooks (.ipynb) will be stored.
For example: `docker run -p 8888:8888 -v /Users/johndoe/datadive:/home/jovyan/work quay.io/dksg/r-notebook:<tagged_version>`
The container prints a link to paste into your browser's address bar. If you're using Docker Toolbox, use the custom IP address instead of localhost (default http://192.168.99.100/).
Once the notebook is running, you may create a new notebook and try the following samples:
https://plot.ly/r/using-r-in-jupyter-notebooks/#examples
## Using Docker for RStudio
### Getting an RStudio Container Image
There are at least two ways of getting an image:
- Pulling from a repository (such as quay.io)
- Loading from a file
#### Pulling from a repository
You can pull down the image with:
```
$ docker pull quay.io/dksg/<project_name>-rstudio:<tagged_version>
```
Once that finishes downloading, you should see something like:
```
$ docker images
REPOSITORY                            TAG     IMAGE ID       CREATED        SIZE
quay.io/dksg/<project_name>-rstudio   1.0.2   1c1e06209032   13 hours ago   1.166 GB
```
#### Loading from a file
This is an alternative method. Skip it if you have already pulled the image from a repository. Otherwise, follow the steps below:
1. Copy the tar file (get this from a DK corelead) to your local directory (e.g. `quay.io_SLASH_dksg_SLASH_ojoy-rstudio_1.0.2.tar`)
2. In your local directory, run the following docker command:
```
docker load --input quay.io_SLASH_dksg_SLASH_<project_name>-rstudio_<tagged_version>.tar
```
3. Once loaded, you should be able to see the new image when you run `docker images`:
```
$ docker images
REPOSITORY                            TAG     IMAGE ID       CREATED        SIZE
quay.io/dksg/<project_name>-rstudio   1.0.2   1c1e06209032   13 hours ago   1.166 GB
```
### Running RStudio from the pulled/loaded image
Start it up with this command:
```
docker run -p 8787:8787 -v /path/to/local/directory:/home/rstudio/foobar quay.io/dksg/<project_name>-rstudio:<tagged_version>
```
**Note:** Replace `/path/to/local/directory` with **an existing local directory on your machine.**
This is where your data and scripts will be stored.
For example, to run the container in the background (`-d`): `docker run -d -p 8787:8787 -v /Users/johndoe/datadive:/home/rstudio/foobar quay.io/dksg/<project_name>-rstudio:<tagged_version>`
You should be able to access RStudio in the browser at http://localhost:8787. If you're using Docker Toolbox, use the custom IP address (default http://192.168.99.100:8787).
Log in with:
- Username: rstudio
- Password: rstudio
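The address to open depends only on the published port and on whether you are using Docker Toolbox. As a small illustration, the hypothetical helper below picks the right host; the default Toolbox IP is the one quoted in this guide and may differ on your machine.

```python
def service_url(port, docker_toolbox=False, toolbox_ip="192.168.99.100"):
    """Return the browser URL for a service published on `port`."""
    # Docker Toolbox runs containers inside a VM, so localhost does not apply.
    host = toolbox_ip if docker_toolbox else "localhost"
    return "http://%s:%d" % (host, port)


if __name__ == "__main__":
    print(service_url(8787))                       # RStudio, native Docker
    print(service_url(8787, docker_toolbox=True))  # RStudio, Docker Toolbox
    print(service_url(8888))                       # Jupyter, native Docker
```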
## Managing libraries
If there's a Python or R library that you need, you can install it in your container, but unless the library is persisted to the image, scripts that use it will not run on somebody else's machine. Each project has a person assigned as a *library curator* who can include the library in the project's Docker image. The workflow should be:
1. You're puttering along when you realise that you want to add your favourite NLP library.
2. You install it in your container and try it out. It works great!
3. Show it to your project's curator and convince them that it's a useful library. Their default mode is lazy, and they will try to point you to an existing library, so show them the hot shiny feature that the one you want has.
4. The curator changes the requirements file in our Dockerfile GitHub repo, Quay auto-magically builds a new image, and anyone who needs to run your code must use this new image.
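To see whether the image you are running already contains the libraries a script needs, you can compare a requirements list against what is installed. This is a rough sketch for Python libraries only, assuming simple `name==version` style lines; `missing_packages` is a hypothetical helper, and the format of the project's actual requirements file may differ.

```python
import re
from importlib import metadata


def missing_packages(requirement_lines):
    """Return the names from the requirement lines that are not installed."""
    missing = []
    for line in requirement_lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # Keep only the package name, dropping version specifiers and extras.
        name = re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0]
        if not name:
            continue
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # The package names below are illustrative only.
    example = ["# notebook libraries", "plotly", "no-such-package-xyz-12345==1.0"]
    print(missing_packages(example))
```

Anything the function reports is a candidate to raise with your library curator.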
## Testing
## Recommended Roles
## Tips