-
Notifications
You must be signed in to change notification settings - Fork 2
/
index.qmd
117 lines (90 loc) · 5.5 KB
/
index.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
---
title: "PHC 6701: R for Data Science // Advanced R"
author: "Gabriel J. Odom, PhD, ThD"
---
# About
These are the written lecture materials for the class PHC 6701 (commonly known as "Advanced R") at Florida International University's Stempel College of Public Health. The source code and data sets for this book are available here: <https://github.com/gabrielodom/PHC6701_r4ds>.
The videos for the course material are all in playlists on YouTube. The playlists are numbered to correspond with the chapters of this book. The channel is here: <https://www.youtube.com/@OdomScience/playlists>
# Getting Started
Complete all of the steps below **before** the first day of class (I'll be available via Zoom if you get stuck on any of these steps):
1. Set up your computer:
+ Windows: <https://derailment.netlify.app/2019-11-01-configure-a-windows-pc-for-data-science/> (ignore the stuff about LaTeX)
+ Mac: <https://derailment.netlify.app/2019-10-29-configure-a-mac-for-data-science/> (ignore the stuff about LaTeX)
2. Install R/Rstudio:
+ Windows: <https://derailment.netlify.app/2019-12-10-installing-r-rstudio-on-windows/>
+ Mac: <https://derailment.netlify.app/2019-11-16-installing-r-rstudio-on-a-mac/>
3. Configure RStudio: <https://derailment.netlify.app/2019-12-22-configuring-rstudio/>
4. (OPTIONAL) Join the community of R programmers on Slack: <https://rfordatasci.com/>.
5. Sign up for GitHub **using your student email address**: <https://docs.github.com/en/get-started/quickstart/creating-an-account-on-github>
# Pre-Course Learning Assessment
Here are some questions about some R basics. I don't expect you to know the answer to any of these questions now (but it's nice if you do). By the end of the semester, however, you will be able to answer all of these questions correctly.
Questions for R component of the final exam:
1. Everything in R is an __________.
2. Almost everything R does uses a __________.
3. You want to create an atomic vector of stoplight colours named by their meaning. Which option should you use?
a. `list(stop = "red", caution = "yellow", go = "green")`
b. `list(go = "red", drive_faster = "yellow", stop_and_check_Twitter = "green")`
c. `tibble(stop = "red", caution = "yellow", go = "green")`
d. `c(stop = "red", caution = "yellow", go = "green")`
4. You want to save your stoplight colours as an object named `colours_char`. What is the appropriate function (based on our style guide) to use in order to assign a value to this object?
5. Which functions can be used to extract the "red" component of the colours_char atomic vector? Select all that apply (each correct answer is worth 1/8th of the point).
a. `colours_char["stop"]`
b. `colours_char[["stop"]]`
c. `colours_char$"stop"`
d. `colours_char$stop`
e. `colours_char[1]`
f. `colours_char[[1]]`
g. `colours_char@"stop"`
h. `colours_char@stop`
6. What are the four most common types of atomic vectors in R, ordered from least to most complex?
7. We now create a non-atomic vector to store patient demographics named `patients_ls`. What expressions can be used to extract the "age" information (in the 5th position of the vector)? Select all that apply (each correct answer is worth 1/8th of the point).
a. `patients_ls["age"]`
b. `patients_ls[["age"]]`
c. `patients_ls$"age"`
d. `patients_ls$age`
e. `patients_ls[5]`
f. `patients_ls[[5]]`
g. `patients_ls@"age"`
h. `patients_ls@age`
8. What helper functions tell us what kind of object an object named `x` is?
a. `str(x); whatIs(x); typeof(x)`
b. `str(x); class(x); typeof(x)`
c. `str(x); class(x); whatIs(x)`
d. `whatIs(x); class(x); typeof(x)`
9. *True or False:* a list is a vector.
10. If you see the error `"could not find function "read_csv""` or `"could not find function "%>%""`, what code will probably fix it?
11. *True or False:* a tibble is a list.
12. *True or False:* a tibble is an atomic vector.
13. In R, what is the difference between a Matrix and a Data Frame?
14. Choose the Tidyverse functions that we have used to operate on columns of a tibble. Select all that apply.
a. `filter()`
b. `select() / rename()`
c. `mutate()`
d. `group_by()`
e. `summarise()`
f. `pull()`
g. `arrange()`
h. `pivot_wider() / pivot_longer()`
i. `left_join() / full_join()`
15. Choose the Tidyverse functions that we have used to operate on rows of a tibble. Select all that apply.
a. `filter()`
b. `select() / rename()`
c. `mutate()`
d. `group_by()`
e. `summarise()`
f. `pull()`
g. `arrange()`
h. `pivot_wider() / pivot_longer()`
i. `left_join() / full_join()`
16. Write the R code to create a function that normalizes a numeric vector. Include complete documentation and a basic example for this function.
17. Some of your data has a free text field with ZIP codes, stored in an object called `cityStateZIP_char`. What will `str_extract(cityStateZIP_char, pattern = "\\d{5}")` do?
18. You have 100 `.csv` files in a directory called `data_raw/` that contains no other files. Write a basic batch import command using the `purrr` package.
19. Label the three main groups of layers in the following basic ggplot graph.
```{r eval=FALSE}
ggplot(data = mtcars) + # A
aes(x = disp, y = mpg, color = cyl) + # B
geom_point(alpha = 0.75) # C
```
Line "A" is the ______________________________ layer.
Line "B" is the ______________________________ layer.
Line "C" is the ______________________________ layer.