We'll be using RStudio throughout the workshop, so we'll start with a brief orientation for those not familiar with it.
The rest of this session concerns literate programming with RMarkdown, a combination of the "markup" language Markdown with chunks of code (generally in R). When compiled (using the R packages knitr and rmarkdown, and with the tool pandoc), the code chunks are executed and replaced with their results (for example figures), to produce a final document that could be an HTML, PDF or Word document.
RStudio is a "graphical user interface" (GUI) for R, developed wholly separately from R, by the company RStudio.
When you open RStudio, you'll see four "panes":
- R console
- code editor
- environment/history
- files/plots/packages/help
You can reorganize the layout of the panes by going to Tools → Global Options → Pane Layout.
To get a list of RStudio keyboard shortcuts, go to Help → Keyboard Shortcuts Help.
You can type code directly into the R console. But mostly I'll write code in a script and send it to the console (using the RStudio keyboard shortcuts). Or really, I should be using RMarkdown.
With RStudio, the best way to encapsulate a project in its own directory is to create an "R Project". This creates a `.Rproj` file in the directory. When you open that file in RStudio, it will make that directory your working directory, and you can save various project-specific options.
Create an R project for this workshop by going to File → New Project. Then select "New Directory", and finally "Empty Project". Give it a directory name and choose where it should be placed.
As an applied statistician working with a lot of different scientific collaborators, I spend a lot of my time writing reports describing analysis results. When starting out, I'd often just send a collaborator an email with my report as the body of the email and with a bunch of attached figures. I moved to writing formal reports, in Word or LaTeX. But there was a lot of copy-paste of figures, and messing about to get page breaks just right.
I now write all such reports using RMarkdown, with the output being a single HTML file, which can be opened in any web browser. I'll show a couple of examples. The advantages are:
- The results are fully reproducible, and can be easily revised if data or analysis choices change.
- With a single HTML page, I don't have to worry about page breaks and try to get figures to fit onto a page. Figures can be as long as I want, and my collaborators generally appreciate that level of detail.
- The figures can also be interactive.
RMarkdown is a mixture of text (written in "Markdown") and chunks of code. When compiled, the code chunks are executed and replaced with their results.
In RStudio, to create a new RMarkdown document, go to File → New File → R Markdown. Then provide a title (you can change it later) and choose the output format (we'll stick with the default, HTML).
A new document will be created, with some default content that provides useful reminders of some of the options.
Immediately save the document, giving the file a name and ensuring that it's saved in the correct place.
Some things to notice in the default document:

- The YAML header at the top, defining the title, author, and date. Additional options can be included here.
- The document is a mixture of text (in Markdown) and chunks of code (in R).
- Markdown features such as `<URLs>`, `**bold**`, `## Sections`, and `` `in-line code` ``, plus three "backticks" to start and end multi-line chunks of code.
- The chunks of code starting with a line like `` ```{r pressure, echo=FALSE} `` are the key code chunks that get executed when the document is compiled. Each chunk can have a name (here, "pressure"), which is optional but strongly recommended. For code that produces a graph, the graph will be inserted into the document that's produced.
- Click the "Knit" button (with the cutest-icon-ever ball of yarn) to compile the document. Compilation involves knitr, pandoc, and (for PDFs) LaTeX.
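To make those pieces concrete, here is a minimal sketch of an RMarkdown file containing each of them. It is modeled on, but not identical to, the RStudio template (whose exact text varies between versions); the title, author, and chunk names are placeholders.

````markdown
---
title: "My first RMarkdown document"
author: "Your Name"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Section heading

Some **bold** text, a link to <https://rmarkdown.rstudio.com>, and `in-line code`.

```{r cars}
summary(cars)
```

```{r pressure, echo=FALSE}
plot(pressure)
```
````

Besides the Knit button, the same file can be compiled from the R console with `rmarkdown::render("example.Rmd")`, where `example.Rmd` is a hypothetical file name.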
I'll do some live coding to create a more extended example of an RMarkdown document.
I'm going to use these packages:
Key things to illustrate:
- Some Markdown features
- Code chunks that do some work
- Code chunks that make plots
- In-line code
- Making an interactive data table
- Making an interactive plot
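A compact sketch touching each item in that list. The package list for the live demo isn't reproduced here, so this assumes the DT package (for `datatable()`) and the plotly package (for `plot_ly()`) are installed; it uses the built-in `mtcars` data, and the chunk names are arbitrary.

````markdown
## Fuel efficiency and weight

The `mtcars` data have one row per car model; see `?mtcars` for details.

```{r fit_model}
# a chunk that does some work: regress miles per gallon on weight
fit <- lm(mpg ~ wt, data = mtcars)
```

Each additional 1000 lbs of weight is associated with a change of
`r round(coef(fit)[["wt"]], 2)` miles per gallon.

```{r mpg_vs_weight}
# a static plot, inserted into the document as an image
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit)
```

```{r interactive_table}
# searchable, sortable table (needs HTML output)
DT::datatable(mtcars)
```

```{r interactive_plot}
# scatterplot with hover and zoom (needs HTML output)
plotly::plot_ly(mtcars, x = ~wt, y = ~mpg,
                type = "scatter", mode = "markers")
```
````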
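There are a bunch of different chunk options to control the code chunks. Here are some common ones:

- `echo=FALSE`: don't show the code
- `results="hide"`: don't show the results
- `include=FALSE`: run the code, but show neither the code nor the results
- `eval=FALSE`: show the code, but don't run it
- `warning=FALSE`: don't show warnings
- `message=FALSE`: don't show messages
- `fig.width=[#]`: width of figures, in inches
- `fig.height=[#]`: height of figures, in inches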
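A few of these options in use, as a sketch. The chunk names and plots are illustrative, using the built-in `mtcars` data.

````markdown
```{r load_packages, message=FALSE, warning=FALSE}
# package start-up messages and warnings won't appear in the document
library(knitr)
```

```{r weight_histogram, echo=FALSE, fig.width=8, fig.height=4}
# only the 8 by 4 inch histogram appears; this code is hidden
hist(mtcars$wt, breaks = 15, main = "", xlab = "Weight (1000 lbs)")
```

```{r model_fit, results="hide"}
# the code is shown, but the printed output is not
lm(mpg ~ wt, data = mtcars)
```

```{r install_note, eval=FALSE}
# shown in the document but never run
install.packages("DT")
```
````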
That first chunk of code in the default RMarkdown document shows how to set global chunk options. It's the most obscure part of RMarkdown and the package knitr, which is probably why it's included in the default document.
You can override any global chunk options in specific chunks.
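A sketch of setting global options in a first chunk and overriding one of them later. The particular defaults chosen here are just an example.

````markdown
```{r setup, include=FALSE}
# defaults for every later chunk: hide code, messages, and warnings
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE,
                      fig.width = 8, fig.height = 5)
```

```{r show_summary, echo=TRUE}
# echo=TRUE overrides the global echo=FALSE for this chunk only
summary(mtcars$mpg)
```
````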
In-line code (with single backticks and an `r`) needs to be within one line; it can't span across lines.
I'll often precede a paragraph with a code chunk with `include=FALSE`, defining various variables, to simplify the in-line code.
Never hard-code a result or summary statistic again! Everything straight from the data.
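A sketch of that pattern: a hidden chunk computes the numbers, and the paragraph that follows refers to them with short pieces of in-line code, each kept on a single line. The variable names are hypothetical, and the data are the built-in `mtcars` (where `am == 0` means an automatic transmission).

````markdown
```{r mpg_summaries, include=FALSE}
n_cars    <- nrow(mtcars)
mpg_range <- range(mtcars$mpg)
n_auto    <- sum(mtcars$am == 0)
```

The data include `r n_cars` cars, of which `r n_auto` have automatic
transmissions, with fuel efficiency ranging from `r mpg_range[1]` to
`r mpg_range[2]` miles per gallon.
````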
The YAML header can include in-line code or Markdown fanciness, for example:
```yaml
---
title: "knitr/R Markdown example"
author: "[Karl Broman](http://kbroman.org)"
date: "`r Sys.Date()`"
output: html_document
---
```
- Don't use absolute paths.
- If you must use absolute paths, define the various directories with variables at the top of your document.
- For simulations, use `set.seed` in your first chunk.
- Include a final code chunk with `getwd()` and either `sessionInfo()` or `devtools::session_info()`.
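These tips, sketched as plain R. In an RMarkdown document the path and seed lines would sit in the first chunk and the last two lines in the final chunk. The folder names, file name, and seed value are hypothetical.

```r
# Directories defined once, relative to the project directory.
# "data" and "results" are hypothetical folder names.
data_dir    <- "data"
results_dir <- "results"
input_file  <- file.path(data_dir, "measurements.csv")  # hypothetical file

# Fix the random seed so simulations give the same results on every knit.
set.seed(20190611)
sim1 <- rnorm(5)

# Re-setting the same seed reproduces exactly the same draws.
set.seed(20190611)
sim2 <- rnorm(5)
print(identical(sim1, sim2))  # TRUE

# Final chunk: record where the document was compiled and the session details.
print(getwd())
print(sessionInfo())
```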
- For HTML output, RMarkdown uses the `png()` graphics device by default.
- Specify another graphics device with the chunk option `dev`.
- Pass arguments to the graphics device with the chunk option `dev.args`. For example: `` ```{r figure, dev.args=list(pointsize=18)} ``
- In addition to `fig.height` and `fig.width`, consider `out.height` and `out.width`. The `fig.` ones are for the graphics device; the `out.` ones are for the size in the document produced.
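A sketch of the device options on two chunks. `dev="svg"` assumes R was built with Cairo support (needed by `svg()`); any other device that knitr supports could be substituted.

````markdown
```{r weight_svg, dev="svg", fig.width=8, fig.height=4}
# drawn with the svg() device instead of the default png()
plot(mpg ~ wt, data = mtcars)
```

```{r weight_big_text, dev.args=list(pointsize=18)}
# pointsize is passed to the graphics device, enlarging the text
plot(mpg ~ wt, data = mtcars)
```
````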
Tables are a constant pain. I'll often just print a data frame as a crude table.
Other options:

- `kable()` in the knitr package
- `pandoc.table()` in the pander package
- `xtable()` in the xtable package
- `datatable()` in the DT package
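The difference between a printed data frame and `kable()`, as a sketch with the built-in `mtcars` data (the `digits` argument rounds the displayed numbers).

````markdown
```{r crude_table}
# printed as plain console-style text
head(mtcars[, 1:4])
```

```{r kable_table}
# rendered as a formatted table in the output document
knitr::kable(head(mtcars[, 1:4]), digits = 1)
```
````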
There are several HTML document options that I like, to be inserted into the YAML header:

```yaml
---
title: "Doc options"
output:
  html_document:
    toc: true
    toc_float: true
    code_folding: hide
---
```
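You can execute code in many languages besides R, including Python, SQL, Bash, and JavaScript.
A key issue is that, for most of these languages, variables are not remembered between chunks and can't be passed to R chunks and back except by writing to and reading from files. Python is an exception when the reticulate package is installed: Python chunks then share one session, and R code can read a Python variable `x` as `py$x` (while Python code can read R objects as `r.x`).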
```{python python_chunk}
x = 5**3
print(x)
y = 'hello, python world'
print(y.split(' '))
```
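Because Bash chunks don't share variables with R, passing data between them goes through a file, as in this sketch. The file name `mtcars_head.csv` is hypothetical, and the Bash chunk assumes a system where `bash` and `wc` are available.

````markdown
```{r write_for_bash}
# write a small data set to a CSV file in the working directory
write.csv(head(mtcars), "mtcars_head.csv", row.names = FALSE)
```

```{bash count_lines}
# count the lines in the file the R chunk wrote (header plus 6 rows)
wc -l mtcars_head.csv
```
````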
- Karl's Knitr in a Knutshell tutorial
- Dynamic Documents with R and knitr (book)
- R Markdown documentation