10-refresher.Rmd

# R refresher {#sec-refresher}

The objectives of this chapter is to review some R syntax, functions
and data structures that will be needed for the following chapters.

## Administration

- Setting up an [RStudio project](https://uclouvain-cbio.github.io/WSBIM1207/sec-startr.html#getting-set-up)
- [Install](https://uclouvain-cbio.github.io/WSBIM1207/sec-startr.html#r-packages)
  packages from CRAN and
  [Bioconductor](https://uclouvain-cbio.github.io/WSBIM1207/sec-bioinfo.html#sec-bioconductor).

```{r, eval = FALSE}
BiocManager::install("UCLouvain-CBIO/rWSBIM1322")
```

- [Avoid saving and loading your
  workspace](https://uclouvain-cbio.github.io/WSBIM1207/sec-startr.html#getting-set-up)
  (the `.RData` file).
- [UTF-8 character encoding](https://uclouvain-cbio.github.io/WSBIM1207/sec-startr.html#getting-set-up).
- Starting a markdown document.

## Basic data structures and operations

- [vectors](https://uclouvain-cbio.github.io/WSBIM1207/sec-startr.html#introduction-to-r),
  generating and subsetting vectors.
- Missing values.
- [Factors](https://uclouvain-cbio.github.io/WSBIM1207/sec-startdata.html#factors)
- [Dataframes](https://uclouvain-cbio.github.io/WSBIM1207/sec-startr.html#introduction-to-r) (and tibbles)
- [Matrices](https://uclouvain-cbio.github.io/WSBIM1207/sec-startdata.html#matrices)
- Arrays
- [Lists](https://uclouvain-cbio.github.io/WSBIM1207/sec-startdata.html#lists)

**Summary**


|            | number of dimensions | number of data types |
|------------+----------------------+----------------------|
| **vector** | 1 (length)           | 1                    |
| **matrix** | 2                    | 1                    |
| **array**  | n                    | 1                    |
| **dataframe** | 2                 | n                    |
| **list**   | 1 (length)           | n                    |


## Tidyverse

- The [dplyr](https://uclouvain-cbio.github.io/WSBIM1207/sec-dplyr.html) package
- [Piping](https://uclouvain-cbio.github.io/WSBIM1207/sec-dplyr.html#pipes)
- Wide and long data, and their conversion with the `pivot_longer` and
  `pivot_wider` functions.

## Saving and exporting

- `saveRDS()` and `readRDS()` binary data.
- [Exporting
  data](https://uclouvain-cbio.github.io/WSBIM1207/sec-dplyr.html#exporting-data-1)
  with `write.csv` and `read.csv` (or `write_csv` and `read_csv`) and
  same for other types of spreadsheets.
- [Saving
  figures](https://uclouvain-cbio.github.io/WSBIM1207/sec-vis.html#exporting-plots)
  (`ggsave` and file devices such as `png()`, `pdf()`, ...).
- Package versions: `sessionInfo()`

## Programming

- [Writing
  functions](https://uclouvain-cbio.github.io/WSBIM1207/sec-prog.html#writing-new-functions)
- [Conditionals](https://uclouvain-cbio.github.io/WSBIM1207/sec-prog.html#conditionals)
  `if`/`else`
- [Iteration](https://uclouvain-cbio.github.io/WSBIM1207/sec-prog.html#iteration):
  `for` loops and `apply` functions

## Additional exercises

`r msmbstyle::question_begin()`

Complete the following function. It is supposed to take two inputs,
`x` and `y` and, depending whether the `x > y` or `x <= y`, it
generates the permutation `sample(x, y)` in the first case or draws a
sample from `rnorm(1000, x, y)` in the second case. Finally, it
returns the sum of all values.

```{r, eval = FALSE}
fun <- function(x, y) {
    res <- NA
    if () { ## 1
        res <- sample(,) ## 2
    } else {
        res <- rnorm(, , ) ## 3
    }
    return() ## 4
}
```

`r msmbstyle::question_end()`

`r msmbstyle::question_begin()`
Read the `interro2.rds` from the `rWSBIM1207` package (version 0.1.9
of later) file into R. The path to the file can be found with the
`rWSBIM1207::interro2.rds()` function.

This dataframe provides the scores for 4 tests for 10 students.

- Write a function that calculates the average score for the 3 best
  tests.
- Calculate this average score for the 10 students.

This can be done using the `apply` function or using `dplyr`
functions. For the latter, see also `rowwise()`.

Note the situation of students that have only presented 3 out of
4 tests (i.e they have a `NA` for one test). It is up to you to decide
whether you simply take the mean of the 3, or whether you prefer to
drop the worst of 3 and calculate the mean of the 2 best marks. Make
sure you are aware of what your implementation returns and, ideally,
state it explicitly in your response.

`r msmbstyle::question_end()`


```{r, include = FALSE}
stopifnot(packageVersion("rWSBIM1207") >= "0.1.9")

moy <- function(x) {
    x <- sort(x, decreasing = TRUE)[1:3] ## gets rid of NAs
    mean(x, na.rm = TRUE)
}

interro2 <- readRDS(rWSBIM1207::interro2.rds())
interro2$m <- apply(interro2[, -1], 1, moy)


interro2 <- readRDS(rWSBIM1207::interro2.rds())

library("tidyverse")

interro2 |>
    rowwise() |>
    mutate(m = moy(c(interro1, interro2, interro3, interro4)))

interro2 |>
    pivot_longer(names_to = "interro",
                 values_to = "score", -noma) |>
    group_by(noma) |>
    summarise(m = moy(score)) |>
    full_join(interro2)
```


`r msmbstyle::question_begin()`

- Create a matrix of dimenions 100 by 100 containing data from a
  normal distribution of mean 0 and standard deviation 1.

- Compute the means of each row and each column using the `apply()`
  and `rowMeans()`/`colMeans()` functions. Confirm that both provide
  the same results.

- Compute the difference between the column means and the row
  means. Does the result make sense?

`r msmbstyle::question_end()`


`r msmbstyle::question_begin()`

- Using the data `kem2_se` data from the `rWSBIM1322` package, compute
  de delta values or each gene (delta is the difference between the
  highest and lowest values). To do this, write a function `delta()`
  that takes a vector of numerics as input and returns the delta
  value, and apply it on the object's assay.

- Re-use your `delta()` function and apply it on each sample.

`r msmbstyle::question_end()`