---
title: "Data Tidying: Practice"
output: github_document
---
### Cleaning up
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(knitr)
library(lubridate)
```
For the remainder of our time, you may continue to work on this example data set or begin working with your own data. If you would like to continue with this example, open a copy of this Rmd file and edit it. Include all of your answers in the Rmd file, as well as any other code and relevant discussion. We will stick around for a while to answer any questions you might have.
# Hands-on exercise using Wisconsin Breast Cancer dataset
Now, we will read a slightly complicated Breast Cancer dataset. You may use the import data set drop-down option to read the data in, but be sure to save the code generated by the dialog and record it in the code chunk below.


```{r readdata, eval=FALSE}
# Modify the chunk statement such that no output is shown for this chunk
# import the data set here
...
# Add column names:
cnames <- c("ID", "Diagnosis",
"radius", "Texture", "Perimeter", "area",
"smoothness", "compactness", "concavity", "concave_points",
"symmetry","fractaldim",
"radiusSE", "TextureSE", "PerimeterSE", "areaSE",
"smoothnessSE", "compactnessSE", "concavitySE", "concave_pointsSE",
"symmetrySE","fractaldimSE",
"radiusW", "TextureW", "PerimeterW", "areaW",
"smoothnessW", "compactnessW", "concavityW", "concave_pointsW",
"symmetryW","fractaldimW")
# add column names here
...
```
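As a rough sketch of the import step, the example below reads a small header-less CSV from an inline string and then attaches column names. The values and the names `csv_text`, `toy`, and `toy_names` are made up for illustration; the real `wdbc` file has 32 columns and should be read from wherever you saved it.

```r
# Toy stand-in for a header-less data file (values are made up)
csv_text <- "101,M,17.9,10.4
102,B,13.5,14.4
103,B,12.1,18.2"

# header = FALSE stops R from treating the first record as column names
toy <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = FALSE)

# Attach descriptive column names, one per column
toy_names <- c("ID", "Diagnosis", "radius", "Texture")
colnames(toy) <- toy_names

str(toy)
```

The length of the name vector must match the number of columns, so it is worth checking `ncol()` before assigning names.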
The `wdbc` data set has `r # code calculating number of samples goes here` samples and `r # code calculating number of covariates here` covariates. There are mean, standard error and worst observations for the following measures (see the [wdbc.names](https://github.com/ravichas/TidyingData/blob/master/Data/wdbc.names) file for more details):
* radius
* Texture
* Perimeter
* area
* smoothness
* compactness
* concavity
* concave_points
* symmetry
* fractaldim
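The two counts in the sentence above can be computed with `nrow()` and `ncol()`. Here is a minimal sketch on a made-up data frame; `toy` and the `n_*` names are hypothetical, and whether `ID` and `Diagnosis` count as covariates is a choice left to you.

```r
# Made-up data frame with the same kind of layout as wdbc
toy <- data.frame(
  ID        = c(101, 102, 103),
  Diagnosis = c("M", "B", "B"),
  radius    = c(17.9, 13.5, 12.1),
  Texture   = c(10.4, 14.4, 18.2)
)

n_samples <- nrow(toy)   # one row per sample
n_columns <- ncol(toy)   # all columns, including ID and Diagnosis

# Columns other than the identifier and the outcome
n_measures <- length(setdiff(names(toy), c("ID", "Diagnosis")))
```

Any of these expressions can be placed directly inside an inline R expression in the text, so the reported numbers update whenever the data change.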
Figures showing the raw relationship between diagnosis (benign or metastatic) and some of these measures follow:
```{r raw data, eval = FALSE}
# do not echo the code from this chunk, but do show the figures
# Diagnosis by radius
ggplot(wdbc, aes(Diagnosis, radius)) +
  geom_jitter()
# Diagnosis by Texture
ggplot(wdbc, aes(Diagnosis, Texture, color = TextureSE)) +
  geom_jitter()
# Diagnosis by Perimeter
ggplot(wdbc, aes(Diagnosis, Perimeter, color = PerimeterSE)) +
  geom_jitter()
# Diagnosis by smoothness
ggplot(wdbc, aes(Diagnosis, smoothness, color = smoothnessSE)) +
  geom_jitter()
# we didn't cover plotting with ggplot(), but feel free to add more figures if you would like
```
```{r summary for report, eval = FALSE}
# run this chunk, but do not print any of its output to the document
# number of individuals with benign tumors
nBenign <- ...
# number of individuals with metastatic tumors
nMetastatic <- ...
# individuals with PerimeterSE > 5
nPerimeterSE_gt5 <- ... # number of individuals with PerimeterSE > 5
pctPerimeterSE_gt5 <- ... # % of individuals with PerimeterSE > 5 (wrt entire data set)
pctPerimeterSE_gt5_M <- ... # % of individuals diagnosed as 'M' (wrt individuals where PerimeterSE > 5)
##### creation of summaryTable #####
# Create a subset of wdbc containing only individuals with Diagnosis == 'B'
# and keep only Texture, Perimeter, area, smoothness, compactness, concavity,
# concave_points, symmetry, and fractaldim
meanBenign <- ...
# Create a subset of wdbc containing only individuals with Diagnosis == 'M'
# and keep only Texture, Perimeter, area, smoothness, compactness, concavity,
# concave_points, symmetry, and fractaldim
meanMetast <- ...
# fill table in with mean values among individuals with benign/metastatic diagnosis
summaryTable <- tibble(measure = c('Texture', 'Perimeter', 'area', 'smoothness',
                                   'compactness', 'concavity', 'concave_points',
                                   'symmetry', 'fractaldim'),
                       meanB = apply(meanBenign, 2, mean),
                       meanM = apply(meanMetast, 2, mean))
```
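The counting and subsetting steps in the chunk above can be sketched on a small made-up tibble. This illustrates the general pattern under assumed toy values and is not the solution for `wdbc`; the names `toy`, `n_b`, `pct_big_se`, `pct_m_given_big_se`, `toy_b`, and `mean_b` are hypothetical.

```r
library(dplyr)

# Made-up data: five samples, one measure, and its standard error
toy <- tibble(
  Diagnosis   = c("B", "B", "M", "M", "B"),
  Texture     = c(14, 18, 21, 25, 16),
  PerimeterSE = c(2, 6, 7, 3, 1)
)

# Counting rows that satisfy a condition: TRUE counts as 1
n_b <- sum(toy$Diagnosis == "B")

# The mean of a logical vector is the proportion of TRUE values
pct_big_se <- 100 * mean(toy$PerimeterSE > 5)

# Share of 'M' among only the rows with PerimeterSE > 5
big_se <- filter(toy, PerimeterSE > 5)
pct_m_given_big_se <- 100 * mean(big_se$Diagnosis == "M")

# Subset rows with filter(), keep columns with select(),
# then average each remaining column
toy_b <- toy %>%
  filter(Diagnosis == "B") %>%
  select(Texture)
mean_b <- apply(toy_b, 2, mean)
```

Note that the two percentages use different denominators: the first is relative to every row, the second only to the rows that passed the `PerimeterSE` filter.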
The perimeter of the cells appears to be fairly consistent within most samples, but `r # round pctPerimeterSE_gt5 to one decimal place`% of the samples had a standard error over 5. Of these, `r # round pctPerimeterSE_gt5_M to one decimal place`% were diagnosed as metastatic.
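For the rounding asked for in the sentence above, here is a small sketch with a made-up percentage (`pct` and the other names are hypothetical). It contrasts `round()`, which returns a number, with `sprintf()`, which returns text with a fixed number of decimals.

```r
pct <- 12.3456   # made-up percentage

rounded <- round(pct, 1)          # numeric, rounded to one decimal place
as_text <- sprintf("%.1f", pct)   # character string with one decimal

# round() does not keep a trailing zero when printed; sprintf() does
whole_rounded <- round(20, 1)
whole_text    <- sprintf("%.1f", 20)
```

Either form works in inline text; `sprintf()` may be preferable when every reported value should show the same number of decimals.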
```{r table, echo = FALSE}
# fall back to a placeholder table if the summary chunk has not been completed
if (!exists('summaryTable')) {
  summaryTable <- tibble(measure = c('Texture', 'Perimeter', 'area', 'smoothness',
                                     'compactness', 'concavity', 'concave_points',
                                     'symmetry', 'fractaldim'),
                         meanB = 'incomplete',
                         meanM = 'incomplete')
}
kable(summaryTable, caption = "This table shows the summary statistics for each of the measures")
```