# R & RStudio, RMarkdown {#rstudio}
```{r setup, include = FALSE, cache = FALSE}
knitr::opts_chunk$set(error = TRUE)
library(htmltools)
```
## Overview
**Objectives**
In this lesson we will:
- get oriented to the RStudio interface
- work with R in the console
- be introduced to built-in R functions
- learn to use the help pages
- explore RMarkdown
- configure git on our computers
**Resources**
This lesson combines and modifies excellent lessons by others (thank you Jenny Bryan and Data Carpentry!) for our workshop today. I definitely recommend reading through the original lessons and using them as a reference:
[Dr. Jenny Bryan's lectures from STAT545 at UBC](https://stat545-ubc.github.io/)
- [R basics, workspace and working directory, RStudio projects](http://stat545-ubc.github.io/block002_hello-r-workspace-wd-project.html)
- [Basic care and feeding of data in R](http://stat545-ubc.github.io/block006_care-feeding-data.html)
RStudio has great resources about its IDE (IDE stands for integrated development environment):
- [webinars](https://www.rstudio.com/resources/webinars/)
- [cheatsheets](https://www.rstudio.com/resources/cheatsheets/)
## Why learn R with RStudio
You are all here today to learn how to code. Coding made me a better scientist because I was able to think more clearly about analyses, and become more efficient in doing so. Data scientists are creating tools that make coding more intuitive for new coders like us, and there is a wealth of awesome instruction and resources available to learn more and get help.
Here is an analogy to start us off. **If you were a pilot, R is an airplane.** You can use R to go places! With practice you'll gain skills and confidence; you can fly farther distances and get through tricky situations. You will become an awesome pilot and can fly your plane anywhere.
And **if R were an airplane, RStudio is the airport**. RStudio provides support: runways, communication, community, and other services that just make your overall life easier. So it's not just the infrastructure (the user interface, or IDE), although that is a great way to learn, to interact with your variables and files, and to work directly with GitHub. It's also a data science philosophy, R packages, community, and more. So although you could fly your plane without an airport, and we could learn R without RStudio, that's not what we're going to do.
> We are learning R together with RStudio and its many supporting features.
Something else to start us off is to mention that you are learning a new language here. It's an ongoing process: it takes time, you'll make mistakes, it can be frustrating, but it will be overwhelmingly awesome in the long run. We all speak at least one language; it's a similar process, really. And no matter how fluent you are, you'll always be learning, trying things in new contexts, learning words that mean the same as others, etc., just like everybody else. And just like any form of communication, there will be miscommunications that can be frustrating, but hands down we are all better off because of it.
While language is a familiar concept, programming languages are in a different context from spoken languages, but you will get to know this context with time. For example: you have a concept that there is a first meal of the day, and there is a name for that: in English it's "breakfast". So if you're learning Spanish, you could expect there is a word for this concept of a first meal. (And you'd be right: 'desayuno'.) **We will get you to expect that programming languages also have words (called functions in R) for concepts**. You'll soon expect that there is a way to order values numerically. Or alphabetically. Or search for patterns in text. Or calculate the median. Or reorganize columns to rows. Or subset exactly what you want. We will help you raise your expectations and learn to ask for and find what you're looking for.
<!---TODO:
- airport
- debugging
- packages
- community
--->
## R at the console, RStudio goodies
Launch RStudio/R.
![](img/RStudio_IDE.png)
Notice the default panes:
* Console (entire left)
* Environment/History (tabbed in upper right)
* Files/Plots/Packages/Help (tabbed in lower right)
FYI: you can change the default location of the panes, among many other things: [Customizing RStudio](https://support.rstudio.com/hc/en-us/articles/200549016-Customizing-RStudio).
An important first question: **where are we?**
If you've just opened RStudio for the first time, you'll be in your Home directory. This is noted by the `~/` at the top of the console. You can see too that the Files pane in the lower right shows what is in the Home directory where you are. You can navigate around within that Files pane and explore, but note that you won't change where you are: even as you click through you'll still be Home: `~/`.
![](img/RStudio_IDE_homedir.png)
OK let's go into the Console, where we interact with the live R process.
Make an assignment and then inspect the object you just created.
```{r}
x <- 3 * 4
x
```
In my head I hear, e.g., "x gets 12".
All R statements where you create objects -- "assignments" -- have this form: `objectName <- value`.
I'll write it in the console after a hashtag `#`, which is how R marks a comment, so it won't be evaluated.
```{r eval = FALSE}
## objectName <- value
## This is also how you write notes in your code to explain what you are doing.
```
Object names cannot start with a digit and cannot contain certain other characters such as a comma or a space. You will be wise to adopt a [convention for demarcating words](http://en.wikipedia.org/wiki/Snake_case) in names.
```{r}
# i_use_snake_case
# other.people.use.periods
# evenOthersUseCamelCase
```
Make an assignment
```{r}
this_is_a_really_long_name <- 2.5
```
To inspect this variable, instead of typing its name, we can press the up arrow key to recall our command history, most recent commands first. Let's do that, then delete the assignment part (` <- 2.5`) so only the name is left, and press return:
```{r}
this_is_a_really_long_name
```
Another way to inspect this variable is to begin typing `this_`: RStudio will automagically suggest completions, which you can select by hitting the tab key. Then press return.
One more:
```{r}
science_rocks <- 100
```
Let's try to inspect:
```{r, eval=FALSE}
sciencerocks
# Error: object 'sciencerocks' not found
```
### Error messages are your friends
Implicit contract with the computer / scripting language: the computer will do tedious computation for you. In return, you will be completely precise in your instructions. Typos matter. Case matters. Pay attention to how you type.
Remember that this is a language, not unlike English! There are times you won't be understood -- it's going to happen. There are different ways this can happen. Sometimes you'll get an error. This is like someone saying 'What?' or 'Pardon?'. Error messages can also be more useful, like when they say 'I didn't understand this specific part of what you said, I was expecting something else'. That is a great type of error message. Error messages are your friend. Google them (copy-and-paste!) to figure out what they mean.
`r htmltools::img(src='img/practicalDev_googleErrorMessage.jpg', width=400)`
And also know that there are errors that can creep in more subtly, when you are giving information that is understood, but not in the way you meant. Like if I'm telling a story about tables and you're picturing where you eat breakfast and I'm talking about data. This can leave me thinking I've gotten something across that the listener (or R) interpreted very differently. And as I continue telling my story you get more and more confused... So write clean code and check your work as you go to minimize these circumstances!
### Logical operators and expressions
A moment about **logical operators and expressions**. We can ask questions about the objects we just made.
- `==` means 'is equal to'
- `!=` means 'is not equal to'
- `<` means 'is less than'
- `>` means 'is greater than'
- `<=` means 'is less than or equal to'
- `>=` means 'is greater than or equal to'
```{r}
science_rocks == 2
science_rocks <= 30
science_rocks != 5
```
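
The chunk above uses three of the six operators. Here is a small sketch exercising the rest on the same object (it re-assigns `science_rocks <- 100`, the same value as before, so it runs on its own). Each comparison returns a single logical value, `TRUE` or `FALSE`, which `class()` confirms:

```{r}
science_rocks <- 100        # same value as assigned above

science_rocks > 50          # TRUE
science_rocks >= 100        # TRUE: "or equal to" includes 100 itself
science_rocks < 100         # FALSE
class(science_rocks == 2)   # "logical"
```
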
> Shortcuts
You will make lots of assignments, and the operator `<-` is a pain to type. Although `=` would also work, don't be tempted to use it, because it will just sow confusion later. Instead, use **RStudio's keyboard shortcut: Alt + - (the minus sign)**, or Option + - on a Mac.
Notice that RStudio automagically surrounds `<-` with spaces, which demonstrates a useful code formatting practice. Code is miserable to read on a good day. Give your eyes a break and use spaces.
RStudio offers many handy [keyboard shortcuts](https://support.rstudio.com/hc/en-us/articles/200711853-Keyboard-Shortcuts). Also, Alt+Shift+K (Option+Shift+K on a Mac) brings up a keyboard shortcut reference card.
> My most common shortcuts include command-Z (undo) and the arrow keys combined with shift/option/command (moving quickly up, down, and sideways, with or without highlighting).
When assigning a value to an object, R does not print anything. You can force R to print the value by using parentheses or by typing the object name:
```{r, purl=FALSE}
weight_kg <- 55 # doesn't print anything
(weight_kg <- 55) # but putting parentheses around the assignment prints the value of `weight_kg`
weight_kg # and so does typing the name of the object
```
Now that R has `weight_kg` in memory, we can do arithmetic with it. For
instance, we may want to convert this weight into pounds (weight in pounds is 2.2 times the weight in kg):
```{r, purl=FALSE}
2.2 * weight_kg
```
We can also change a variable's value by assigning it a new one:
```{r, purl=FALSE}
weight_kg <- 57.5
2.2 * weight_kg
```
Note that assigning a value to one variable does not change the values of
other variables. For example, let's store the weight in pounds in a new
variable, `weight_lb`:
```{r, purl=FALSE}
weight_lb <- 2.2 * weight_kg
```
and then change `weight_kg` to 100.
```{r, purl=FALSE}
weight_kg <- 100
```
What do you think is the current content of the object `weight_lb`? 126.5 or 220? Why?
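
Once you've made your guess, you can check it by printing `weight_lb`. The sketch below repeats the three steps from above so it can run on its own:

```{r}
weight_kg <- 57.5
weight_lb <- 2.2 * weight_kg   # computed once, using the value weight_kg has right now
weight_kg <- 100               # reassigns weight_kg only
weight_lb                      # unchanged: R does not recompute weight_lb
```

It prints 126.5: `weight_lb` stores the *result* of the calculation, not the formula, so it only changes if you run `weight_lb <- 2.2 * weight_kg` again.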
<!---TODO::
Add more practice
--->
## R functions, help pages
R has a mind-blowing collection of built-in functions that are all used with the same syntax, `function_name(argument1 = value1, argument2 = value2, ...)`: the function name, followed by parentheses around what the function needs in order to do what it was built to do. When you type a function like this, we say we are "calling the function".
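
For a concrete instance of this pattern, here is a call to the built-in `rep()` function, which repeats a value. `rep()` is chosen purely for illustration (it isn't otherwise part of this lesson); `x` and `times` are two of its arguments, both given in `name = value` form:

```{r}
rep(x = 5, times = 3)   # repeat the value 5 three times: 5 5 5
```
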
<!---This example is from [R for Data Science](http://r4ds.had.co.nz/pipes) using a children's poem called [Little Bunny Foo Foo](https://en.wikipedia.org/wiki/Little_Bunny_Foo_Foo).
We can call a function without passing it anything (nothing inside the closed parentheses), and assign it to a variable called `foo_foo`.
```{r eval = FALSE, tidy = FALSE}
## foo_foo <- little_bunny()
```
And since `foo_foo` is an object, you can pass it to other functions:
```{r eval = FALSE, tidy = FALSE}
## hop(foo_foo, through = forest)
## scoop(foo_foo, up = field_mice)
## bop(foo_foo, on = head)
```
What would happen if I tried to run one of those lines above? I would get an error because they aren't real functions, and R tells me so:
```{r eval = FALSE, tidy = FALSE}
foo_foo <- little_bunny()
# Error in little_bunny() : could not find function "little_bunny"
```
And that's great, this error message is helpful: R doesn't know what the `little_bunny` function is, and to be honest, neither do we. We didn't expect that it would know what to do. OK, so now let's look at a real function.
--->
Let's try using `seq()` which makes regular sequences of numbers and, while we're at it, demo more helpful features of RStudio.
Type `se` and hit TAB. A pop up shows you possible completions. Specify `seq()` by typing more to disambiguate or using the up/down arrows to select. Notice the floating tool-tip-type help that pops up, reminding you of a function's arguments. If you want even more help, press F1 as directed to get the full documentation in the help tab of the lower right pane.
Type the arguments `1, 10` and hit return.
```{r}
seq(1, 10)
```
We could probably infer that the `seq()` function makes a sequence, but let's learn for sure. Type the following (autocomplete works here too) to explore the help page:
```{r, eval=F}
?seq
help(seq) # same as ?seq
```
The help page shows the name of the package in the top left and is broken down into sections:
- Description: An extended description of what the function does.
- Usage: The arguments of the function and their default values.
- Arguments: An explanation of the data each argument is expecting.
- Details: Any important details to be aware of.
- Value: The data the function returns.
- See Also: Any related functions you might find useful.
- Examples: Some examples for how to use the function.
```{r}
seq(from = 1, to = 10) # same as seq(1, 10); R assumes by position
seq(from = 1, to = 10, by = 2)
```
The above also demonstrates something about how R resolves function arguments. You can always specify in `name = value` form. But if you do not, R attempts to resolve by position. So above, it is assumed that we want a sequence `from = 1` that goes `to = 10`. Since we didn't specify step size, the default value of `by` in the function definition is used, which ends up being 1 in this case. For functions I call often, I might use this resolve by position for the first
argument or maybe the first two. After that, I always use `name = value`.
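
To see both rules in action, the three calls below should produce the same sequence (1, 3, 5, 7, 9): the first relies on position (`from`, `to`, then `by`), the second names every argument, and the third shows that named arguments can be given in any order:

```{r}
seq(1, 10, 2)                    # matched by position: from, to, by
seq(from = 1, to = 10, by = 2)   # matched by name
seq(by = 2, to = 10, from = 1)   # named, so the order doesn't matter
```

This is also why `name = value` is the safer habit beyond the first argument or two: a positional call only works if you remember the exact argument order.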
The examples from the help pages can be copy-pasted into the console for you to understand what's going on. Remember we were talking about expecting there to be a function for something you want to do? Let's try it.
### Your turn
> Exercise: Talk to your neighbor(s) and look up the help file for a function that you know or expect to exist. Here are some ideas: `?getwd()`, `?plot()`, `?min()`, `?max()`, `?mean()`, `?log()`.
And there's also help for when you only sort of remember the function name: the double question mark.
```{r, eval=F}
??install
```
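
`??install` searches the help pages. A related tool, not covered elsewhere in this lesson, is `apropos()` from base R's utils package: it lists the names of functions and other objects currently available whose names contain a given piece of text (ignoring case by default). That can help when you remember only part of a name, like "mean":

```{r}
apropos("mean")   # e.g. includes "mean", "colMeans", "weighted.mean"
```
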
Not all functions have (or require) arguments:
```{r}
date()
```
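
A couple of other built-in functions that work with empty parentheses, shown here as extra illustrations (not from the original lesson):

```{r}
Sys.Date()   # today's date
Sys.time()   # the current date and time
```
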
## Clearing the environment
Now look at the objects in your environment (workspace) -- in the upper right pane. The workspace is where user-defined objects accumulate.
![](img/RStudio_IDE_env.png)
You can also get a listing of these objects with a few different R commands:
```{r}
objects()
ls()
```
If you want to remove the object named `weight_kg`, you can do this:
```{r}
rm(weight_kg)
```
To remove everything:
```{r}
rm(list = ls())
```
or click the broom in RStudio's Environment pane.
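
To confirm that removal worked, you can use `exists()`, which returns `TRUE` or `FALSE` depending on whether an object with that name can be found, and check that `ls()` comes back empty. The object names below (`temp_value`, `keep_value`) are hypothetical, made up for this sketch:

```{r}
temp_value <- 1
keep_value <- 2

rm(temp_value)
exists("temp_value")   # FALSE: it was removed
exists("keep_value")   # TRUE: it is still in the workspace

rm(list = ls())
length(ls())           # 0: the workspace is now empty
```
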
### Your turn
> Exercise: Clear your workspace, then create a few new variables. Create a variable that is the mean of a sequence from 1 to 20. What's a good name for your variable? Does it matter what your `by` argument is? Why?
## RMarkdown
Now we are going to also introduce RMarkdown. This is really key for collaborative research, so we're going to get started with it early and then use it for the rest of the day.
An RMarkdown file will allow us to weave markdown text with chunks of R code to be evaluated and output content like tables and plots.
File -> New File -> RMarkdown... -> Document of output format HTML, OK.
`r img(src='img/rstudio_new-rmd-doc-html.png', width=300)`
You can give it a Title like "My Project". Then click OK.
OK, first off: by opening a file, we are seeing the 4th pane of the RStudio window, which is essentially a text editor. This lets us organize our files within RStudio instead of having a bunch of different windows open.
Let's have a look at this file: it's not blank; some initial text is already provided for you. Notice a few things about it:
- There are white and grey sections. R code is in grey sections, and other text is in white.
![](img/rmarkdown.png)
<br>
Let's go ahead and "Knit HTML" by clicking the blue yarn at the top of the RMarkdown file.
<br>
![](img/rmarkdown_side_by_side.png)
What differences do you notice between the two?
Notice how the grey **R code chunks** are surrounded by 3 backticks and `{r LABEL}`. These are evaluated and return the output text in the case of `summary(cars)` and the output plot in the case of `plot(pressure)`.
Notice how the code `plot(pressure)` is not shown in the HTML output because of the R code chunk option `echo=FALSE`.
More details...
This RMarkdown file has 2 different languages within it: **R** and **Markdown**.
We don't know that much R yet, but you can see that we are taking a summary of some data called 'cars', and then plotting. There's a lot more to learn about R, and we'll get into it for the next few days.
The second language is Markdown. This is a formatting language for plain text, and there are only about 15 rules to know.
Notice the syntax for:
- **headers** get rendered at multiple levels: `#`, `##`
- **bold**: `**word**`
There are some good [cheatsheets](https://github.com/adam-p/markdown-here/wiki/Markdown-Here-Cheatsheet) to get you started, and here is one built into RStudio: Go to Help > Markdown Quick Reference
<br />
<br />
**Important**: note that the hashtag `#` is used differently in Markdown and in R:
- in R, a hashtag indicates a comment that will not be evaluated. You can use as many as you want: `#` is equivalent to `######`. It's just a matter of style. I use two `##` to indicate a comment so that it's clearer what is a comment versus what I don't want to run at the moment.
- in Markdown, a hashtag indicates a level of a header. And the number you use matters: `#` is a "level one header", meaning the biggest font and the top of the hierarchy. `###` is a level three header, and will show up nested below the `#` and `##` headers.
Learn more: http://rmarkdown.rstudio.com/
### Your Turn
1. In Markdown, write some italic text and make a numbered list. Then add a few subheaders.
Use the Markdown Quick Reference (in the menu bar: Help > Markdown Quick Reference).
1. Reknit your html file.
### Code chunks
OK. Now let's practice with some of those commands that we were working on this morning.
First, create a new chunk in your RMarkdown file in one of these ways:
- click "Insert > R" at the top of the editor pane
- type by hand
\```{r}
\```
- if you haven't deleted a chunk that came with the new file, edit that one
Now, let's write some R code.
```
x <- seq(1, 15)
```
Now, hitting return does not execute this command; remember, it's just a text file. To execute it, we need to get what we typed in the R chunk (the grey R code) down into the console. How do we do it? There are several ways (let's do each of them):
1. copy-paste this line into the console.
1. select the line (or simply put the cursor there), and click 'Run'. This is available from
    a. the bar above the file (green arrow)
    b. the menu bar: Code > Run Selected Line(s)
    c. keyboard shortcut: command-return (Ctrl+Enter on Windows or Linux)
1. click the green arrow at the right of the code chunk (this runs the whole chunk)
### Your turn
Add a few more commands to your file from this morning. Execute them by trying the three ways above. Then, save your R Markdown file.
<!--- remove this stuff and instead set up Git/GitHub.
## Working directory
Any process running on your computer has a notion of its "working directory". In R, this is where R will look, by default, for files you ask it to load. It is also where, by default, any files you write to disk will go. You have a sense of this because whenever you go to save a Word doc or download, it asks where. You often have to navigate to put it exactly where you want it. There are a few ways to check your working directory in RStudio.
You can explicitly check your working directory with:
```{r, eval=FALSE}
getwd()
```
It is also displayed at the top of the RStudio console.
As a beginning R user, it's OK to let your home directory or any other weird directory on your computer be R's working directory. _Very soon_, I urge you to evolve to the next level, where you organize your analytical projects into directories and, when working on Project A, set R's working directory to Project A's directory.
You can set R's working directory at the command line like so. You could also do this in a script.
```{r eval = FALSE}
setwd("~/myCoolProject")
```
But there's a better way. A way that also puts you on the path to managing your R work like an expert.
## RStudio projects
Keeping all the files associated with a project organized together -- input data, R scripts, analytical results, figures -- is such a wise and common practice that RStudio has built-in support for this via its _projects_. More here: [Using Projects](https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects).
Let's make one to use for the rest of today.
> Do this: File > New Project ... New Directory > Empty Project. The directory name you choose here will be the project name. Call it whatever you want (or follow me for convenience).
I created a directory and, therefore RStudio project, called `data-carpentry` in a folder called `tmp` in my home directory, FYI.
What do you notice about your RStudio pane? Look in the right corner--'data-carpentry'.
Now check that the "home" directory for your project is the working directory of our current R process:
```{r eval=FALSE}
getwd()
# "/Users/julialowndes/tmp/data-carpentry"
```
**About paths**: The above is the absolute path: it starts at the root (`/`) and is specific to my computer (`julialowndes`) and where I saved it. So if I did an analysis with this filepath, it wouldn't work on your computer unless you altered the filepath.
But with an RStudio project, your paths within this project can be relative, which means they *start* in the `data-carpentry` folder, wherever it is. So we can just use filepaths within our project from a relative place, and it can work on your computer or mine, without worrying about the absolute paths. (Analogy: I can give you directions from this building to the pub, since we're all here in this shared space already. I can't give you all directions from your home to this building and then the pub, because you all live in different places. But I can give directions relative to this building).
### You try
Ask for help and recognize useful answers
TODO
--->
## Troubleshooting
Here are some additional things we didn't have time to discuss:
### I just entered a command and nothing's happening
It may be because you didn't complete a command: is there a little `+` in your console? R is saying that it is waiting for you to finish. In the example below, I need to close that parenthesis (or press Esc to cancel the command and start over).
```{r, eval=FALSE}
> x <- seq(1, 10
+
```
### How do I update RStudio?
To see if you have the most current version of RStudio, go to the Help menu > Check for Updates. If there is an update available, you'll have the option to Quit and Download, which will take you to http://www.rstudio.com/download. When you download and install, choose to replace the previous version.