# day1_hw_answer-key.R
## Day 1 Homework Exercises
### R syntax and data structures
# 1. Try changing the value of the variable `x` to 5. What happens to `number`?
x <- 5
# `number` keeps its old value: R does not recompute it when `x` changes.
# 2. Now try changing the value of variable `y` to contain the value 10. What do you need to do to update the variable `number`?
y <- 10
number <- x + y
# 3. Try to create a vector of numeric and character values by combining the two vectors that we just created (`glengths` and `species`). Assign this combined vector to a new variable called `combined`. Hint: you will need to use the combine `c()` function to do this. Print the `combined` vector in the console. What looks different compared to the original vectors?
combined <- c(glengths, species)
combined
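# A minimal sketch of what printing a combined vector reveals, using made-up
# stand-in vectors (`example_glengths` and `example_species` are hypothetical,
# not the lesson's actual values). A vector holds a single data type, so c()
# coerces the numbers to character strings.
example_glengths <- c(1.5, 20, 300)
example_species <- c("ecoli", "human", "corn")
example_combined <- c(example_glengths, example_species)
class(example_glengths)   # "numeric"
class(example_combined)   # "character"
example_combined          # every element is now printed in quotes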
# 4. Let's say that in our experimental analyses, we are working with three different sets of cells: normal cells, cells knocked out for geneA (a very exciting gene), and cells overexpressing geneA. We have three replicates for each cell type.
# a. Create a vector named `samplegroup` with nine elements: 3 control ("CTL") values, 3 knock-out ("KO") values, and 3 over-expressing ("OE") values.
samplegroup <- c("CTL", "CTL", "CTL", "KO", "KO", "KO", "OE", "OE", "OE")
# b. Turn `samplegroup` into a factor data structure.
samplegroup <- factor(samplegroup)
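# To check the conversion, a factor can be inspected with levels() and table().
# A short sketch on a separate, hypothetical copy (`example_groups`) so the
# homework variables are left untouched:
example_groups <- factor(c("CTL", "CTL", "CTL", "KO", "KO", "KO", "OE", "OE", "OE"))
levels(example_groups)      # "CTL" "KO" "OE" (sorted alphabetically by default)
table(example_groups)       # 3 samples per level
as.integer(example_groups)  # underlying integer codes: 1 1 1 2 2 2 3 3 3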
# 5. Create a data frame called `favorite_books` with the following vectors as columns:
titles <- c("Catch-22", "Pride and Prejudice", "Nineteen Eighty Four")
pages <- c(453, 432, 328)
favorite_books <- data.frame(titles, pages)
# 6. Create a list called `list2` containing `species`, `glengths`, and `number`.
list2 <- list(species, glengths, number)
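# A sketch of how list components can be named and accessed, using hypothetical
# stand-in values (`example_list` and its contents are not from the lesson):
example_list <- list(species = c("ecoli", "human"), glengths = c(4.6, 3000), number = 15)
example_list[[1]]      # first component, selected by position
example_list$glengths  # a component selected by name
length(example_list)   # 3 components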
### Functions and arguments
# 1. Let's use a base R function to calculate the **mean** value of the `glengths` vector. You might need to search online to find which function can perform this task.
mean(glengths)
# 2. Create a new vector `test <- c(1, NA, 2, 3, NA, 4)`. Use the same base R function from exercise 1 (with the addition of the proper argument) to calculate the mean value of the `test` vector. The output should be `2.5`.
# *NOTE:* In R, missing values are represented by the symbol `NA` (not available). It’s a way to make sure that users know they have missing data, and make a conscious decision on how to deal with it. There are ways to ignore `NA` during statistical calculations, or to remove `NA` from the vector. More information related to missing data can be found at this link -> https://www.statmethods.net/input/missingdata.html.
test <- c(1, NA, 2, 3, NA, 4)
mean(test, na.rm = TRUE)
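# For comparison, a small sketch of why the extra argument is needed
# (`example_test` is a hypothetical copy of the vector above): without
# na.rm = TRUE, any NA in the input makes the result NA.
example_test <- c(1, NA, 2, 3, NA, 4)
mean(example_test)                # NA
mean(example_test, na.rm = TRUE)  # 2.5
sum(is.na(example_test))          # 2 missing values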
# 3. Another commonly used base function is `sort()`. Use this function to sort the `glengths` vector in **descending** order.
sort(glengths, decreasing = TRUE)
# 4. Write a function called `multiply_it`, which takes two inputs: a numeric value `x`, and a numeric value `y`. The function will return the product of these two numeric values, which is `x * y`. For example, `multiply_it(x=4, y=6)` will return output `24`.
multiply_it <- function(x, y) {
  product <- x * y
  return(product)
}
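# A more compact variant, shown only as a sketch (`multiply_it_short` is a
# hypothetical name, not part of the exercise): an R function returns the value
# of its last evaluated expression, so the intermediate variable and the
# explicit return() are optional.
multiply_it_short <- function(x, y) {
  x * y
}
multiply_it_short(x = 4, y = 6)  # 24
multiply_it_short(4, 6)          # 24, arguments matched by position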
### Reading in and inspecting data
# 1. Download this tab-delimited .txt file and save it in your project’s data folder.
# i. Read it into R using read.table() and store it as the variable proj_summary, keeping in mind that:
# a. all the columns have column names
# b. you want the first column to be used as rownames (hint: look up the row.names = argument)
# ii. Display the contents of proj_summary in your console
proj_summary <- read.table(file = "data/project-summary.txt", header = TRUE, row.names = 1)
proj_summary
# 2. Use the class() function on glengths and metadata. How does the output differ between the two?
class(glengths)
class(metadata)
# 3. Use the summary() function on the proj_summary dataframe
# i. What is the median rRNA_rate?
# ii. How many samples got the “low” level of treatment?
summary(proj_summary)
# 4. How long is the samplegroup factor?
length(samplegroup)
# 5. What are the dimensions of the proj_summary dataframe?
dim(proj_summary)
# 6. When you use the rownames() function on metadata, what is the data structure of the output?
str(rownames(metadata))
# 7. How many elements are in (how long is) the output of colnames(proj_summary)? Don’t count, but use another function to determine this.
length(colnames(proj_summary))
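# The inspection functions used in this section can be tried on a small,
# made-up data frame (`example_df` is hypothetical and unrelated to
# proj_summary). For a data frame, ncol() gives the same answer as
# length(colnames()), and rownames() returns a character vector.
example_df <- data.frame(a = 1:3, b = c("x", "y", "z"))
dim(example_df)               # 3 2 (rows, then columns)
length(colnames(example_df))  # 2
ncol(example_df)              # 2
class(rownames(example_df))   # "character"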