-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathplots.Rmd
202 lines (146 loc) · 7.18 KB
/
plots.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
---
title: "Introduction to Plotting in R"
author: "Barboza-Salerno"
date: "`r Sys.Date()`"
output:
md_document:
variant: markdown_github
---
There are several packages in R that handle plotting. Plotting in R can be very sophisticated and can be a very useful tool to visualize data. One example to showcase the plotting capability of R is this graph, which was created with about 150 lines of R code by Paul Butler, showing the facebook friendship connections around the world.
Here we will introduce bar plots, box plots and scatter plots in the base graphics system. Then we will briefly mention the lattice graphics. Because of the complexity of R’s plotting system, we can only scratch the surface on this topic here. You can learn more about plotting in R from resources listed at the end of this page. While we will mostly focus on the base graphics system in this course, you are strongly recommended to look into the ggplot2 system. These days, more and more people are using ggplot2 instead of the base graphics system.
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Introduction
This script demonstrates basic plotting in R base
## Creating a dataset
This demonstrates how to create a dataset in R. This will be the only time we do this, from now on we will import data in the form of .csv, .sav, .dta, .xlsx, or .sasbat.
```{r makedat}
sex_at_birth <- c("male", "male", "female", "male", "female", "female")
age <- c(10, 12, 3, 0, 1, 16)
cbl_score <- c(50, 39, 48, 54, 70, 55)
dep_score <- c(40, 29, 37, 44, 60, 45)
total_score <- cbl_score + dep_score
mydat <- data.frame(sex_at_birth, age, cbl_score, dep_score, total_score)
View(mydat)
```
## Boxplots
The boxplot() function can be used to create box plots.
The following command generates a box plot for age in our data:
```{r box, echo=TRUE}
boxplot(mydat$age)
```
Now, we change the x-axis label:
```{r boxlabel, echo=TRUE}
boxplot(mydat$age, xlab = "Age")
```
Now, we change the color
```{r boxcolor, echo=TRUE}
boxplot(mydat$age, xlab = "Age", col = "steelblue")
```
We can also split the box plots by groups. For example, the command
```{r boxbysex, echo=TRUE}
boxplot(mydat$age ~ mydat$sex_at_birth, ylab="Age")
```
Now we can use the boxplot() attributes to make a nice looking boxplot:
Below we
- first remove the data$ from the labels and add the data attribute explicitly
- then we change the color
- then we change the color by **sex_at_birth**
- then we relabel the boxes by color of **sex_at_birth**
*Note* Please do NOT think you will understand all of this immediately. If you want to learn it, take it step-by-step and understand what each command is doing before moving on!
```{r boxbetter, echo=TRUE}
boxplot(age ~ sex_at_birth, data=mydat, ylab="Age")
boxplot(age ~ sex_at_birth, data=mydat, ylab="Age", col="steelblue")
boxplot(age ~ sex_at_birth, data=mydat, ylab="Age", col=c("indianred", "steelblue"))
boxplot(age ~ sex_at_birth,
data=mydat,
ylab="Age", xlab = "Sex at Birth",
col=c("indianred", "steelblue"),
names = c("Female", "Male"),
main = "Boxplot Example")
```
# Scatterplots
Another commonly-used plot is a scatter plot. It is a plot of data points of one variable versus another. To plot the cbl scores versus depressive symptoms scores, we use the command:
```{r scatter, echo=TRUE}
plot(mydat$cbl_score, mydat$dep_score)
```
### Point-Type Parameter pch
By default, each point is represented by an open circle. This can be changed by setting the pch parameter. We can also change the axis labels using the xlab and ylab commands:
```{r scatlab, echo=TRUE}
plot(cbl_score ~ dep_score, data=mydat, pch=19, ylab="Depressive Symptoms",
xlab="Child Behavior Symptoms", las=1)
```
We see that the pch=19 option sets the points to be filled circles, and the las=1 option makes the y-axis numeric labels perpendicular to the y-axis. Other pch values correspond to other point types, and you can look them up [here](https://www.ling.upenn.edu/~joseff/rstudy/week4.html#pch). Actually, there is an easy way to look it up. Just type the command
```{r}
plot(1:25,rep(1,25),pch=1:25)
```
It plots the points (1,1), (2,1), (3,1), …, (25,1), with the first point having pch=1, second point with pch=2 and so on.
Now we can add color and other options as above. Note, the options are the same for ALL Base R plots (i.e., histograms, barcharts, plots, boxplots, etc.)
```{r scatcol, echo=TRUE}
plot(cbl_score ~ dep_score,
data=mydat, pch=19, col="blue",
ylab="Depressive Symptoms",
xlab="Child Behavior Symptoms",
las=3,
main = "Scatterplot Example")
```
Now let's color the scatterplot by sex at birth. To do so, R requires the categorical data to be represented as a factor. We can transform the variable in the plot itself (as I do below).
```{r scatshape, echo=TRUE}
plot(cbl_score ~ dep_score, data=mydat, pch=19,
col=as.factor(sex_at_birth),
ylab="Depressive Symptoms",
xlab="Child Behavior Symptoms", las=1)
```
We can also change the variable in the dataset from character to factor and then plot it (this is easier).
```{r}
(mydat$sex_at_birth <- as.factor(mydat$sex_at_birth))
plot(cbl_score ~ dep_score, data=mydat, pch=19,
col=sex_at_birth,
ylab="Depressive Symptoms",
xlab="Child Behavior Symptoms", las=1)
```
## CHALLENGE
Finally, we want to create a legend that shows the color scheme for the variable sex_at_birth. We also include horizontal lines representing the averages for each score (i.e., cbl and depressive symptoms). Follow along -->
```{r scatfinal, echo=TRUE}
plot(cbl_score ~ dep_score, data=mydat, pch=19,
col = c("violet", "blue")[as.factor(mydat$sex_at_birth)],
ylab="Depressive Symptoms",
xlab="Child Behavior Symptoms", las=1,
main = "Scatterplot Example")
legend(30,70,c("Female","Male"),col=c("violet", "blue"),pch=19)
abline(h=mean(mydat$cbl_score),v=mean(mydat$dep_score))
```
```{r scatfinalbigger, echo=TRUE}
plot(cbl_score ~ dep_score, data=mydat, pch=19,
col = c("blue", "violet")[mydat$sex_at_birth],
ylab="Depressive Symptoms",
xlab="Child Behavior Symptoms", las=1, cex = 1.5,
main = "Scatterplot Example")
legend(30,70,c("Female","Male"),col=c("blue", "violet"),pch=19)
abline(h=mean(mydat$cbl_score),v=mean(mydat$dep_score))
```
```{r boxbetter1, echo=TRUE}
c("blue", "violet")[mydat$sex_at_birth]
mydat$sex_at_birth
```
```{r boxbetter2, echo=TRUE}
hist(mydat$age)
# in the R console type ?hist to bring up the help screen
# fancy the hist of age
table(mydat$sex_at_birth)
barplot(table(mydat$sex_at_birth),ylim=c(0,5), ylab="percent",main="Barplot of Gender")
```
## Writing and Reading Data
We can save the datafile to the computer for later as follows:
```{r}
write.csv(mydat, "mydat.csv")
```
The default is to save the file to the working directory. So, it is useful to set the working directory at the outset. My working directory is
```{r}
getwd()
```
Now, let's read the csv file and re-run all charts...
```{r}
mydat <- read.csv("mydat.csv")
```