differences for PR #473
actions-user committed Jul 25, 2023
1 parent 83a1189 commit 8006633
Showing 6 changed files with 1,051 additions and 18 deletions.
34 changes: 17 additions & 17 deletions 04-tidyr.md
@@ -118,16 +118,16 @@ interviews %>%
# A tibble: 10 × 4
key_ID village interview_date instanceID
<dbl> <chr> <dttm> <chr>
- 1 46 Chirodzo 2016-11-17 00:00:00 uuid:35f297e0-aa5d-4149-9b7b-4965004cfc37
- 2 43 Chirodzo 2016-11-17 00:00:00 uuid:b4dff49f-ef27-40e5-a9d1-acf287b47358
- 3 67 Chirodzo 2016-11-16 00:00:00 uuid:6c15d667-2860-47e3-a5e7-7f679271e419
- 4 199 Chirodzo 2017-06-04 00:00:00 uuid:ffc83162-ff24-4a87-8709-eff17abc0b3b
- 5 9 Chirodzo 2016-11-16 00:00:00 uuid:846103d2-b1db-4055-b502-9cd510bb7b37
- 6 56 Chirodzo 2016-11-16 00:00:00 uuid:973c4ac6-f887-48e7-aeaf-4476f2cfab76
- 7 54 Chirodzo 2016-11-16 00:00:00 uuid:273ab27f-9be3-4f3b-83c9-d3e1592de919
- 8 45 Chirodzo 2016-11-17 00:00:00 uuid:e3554d22-35b1-4fb9-b386-dd5866ad5792
- 9 58 Chirodzo 2016-11-16 00:00:00 uuid:a7a3451f-cd0d-4027-82d9-8dcd1234fcca
- 10 57 Chirodzo 2016-11-16 00:00:00 uuid:a7184e55-0615-492d-9835-8f44f3b03a71
+ 1 64 Chirodzo 2016-11-16 00:00:00 uuid:28cfd718-bf62-4d90-8100-55fafbe45d06
+ 2 36 Chirodzo 2016-11-17 00:00:00 uuid:c90eade0-1148-4a12-8c0e-6387a36f45b1
+ 3 34 Chirodzo 2016-11-17 00:00:00 uuid:14c78c45-a7cc-4b2a-b765-17c82b43feb4
+ 4 21 Chirodzo 2016-11-16 00:00:00 uuid:cc7f75c5-d13e-43f3-97e5-4f4c03cb4b12
+ 5 46 Chirodzo 2016-11-17 00:00:00 uuid:35f297e0-aa5d-4149-9b7b-4965004cfc37
+ 6 54 Chirodzo 2016-11-16 00:00:00 uuid:273ab27f-9be3-4f3b-83c9-d3e1592de919
+ 7 69 Chirodzo 2016-11-16 00:00:00 uuid:f86933a5-12b8-4427-b821-43c5b039401d
+ 8 66 Chirodzo 2016-11-16 00:00:00 uuid:a457eab8-971b-4417-a971-2e55b8702816
+ 9 61 Chirodzo 2016-11-16 00:00:00 uuid:2401cf50-8859-44d9-bd14-1bf9128766f2
+ 10 200 Chirodzo 2017-06-04 00:00:00 uuid:aa77a0d7-7142-41c8-b494-483a5b68d8a7
```

We notice that the layout or format of the `interviews` data is in a format that
@@ -367,13 +367,13 @@ other with "solar panel" in the `items_owned` column.
separate_rows(items_owned, sep = ";") %>%
```

- You may notice that one of the columns is called `´NA´`. This is because some
- of the respondents did not own any of the items that was in the interviewer's
- list. We can use the `replace_na()` function to change these `NA` values to
- something more meaningful. The `replace_na()` function expects for you to give
- it a `list()` of columns that you would like to replace the `NA` values in,
- and the value that you would like to replace the `NA`s. This ends up looking
- like this:
+ You may notice that the `items_owned` column contains `NA` values.
+ This is because some of the respondents did not own any of the items that was in
+ the interviewer's list. We can use the `replace_na()` function to change these
+ `NA` values to something more meaningful. The `replace_na()` function expects
+ for you to give it a `list()` of columns that you would like to replace the `NA`
+ values in, and the value that you would like to replace the `NA`s. This ends up
+ looking like this:


```r
293 changes: 293 additions & 0 deletions data-visualisation-handout.md
@@ -0,0 +1,293 @@
---
title: Code Handout - Data Visualisation with ggplot2
output:
  html_document:
    df_print: paged
    code_download: yes
---

This document contains all of the functions that we have covered thus far in the
course. It will be updated every week, after we've added new skills. Each
function is presented alongside an example of how it is used.

All of the examples below use the Palmer Penguins data, documented on the
[palmerpenguins package website](https://allisonhorst.github.io/palmerpenguins/index.html).
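
The examples only run once the relevant packages are loaded. The handout does not say which packages it attaches, so the setup below is a sketch under that assumption: `ggplot2` for the plotting functions, `dplyr` for the `%>%` pipe and `count()`, and `palmerpenguins` for the `penguins` data.

```r
# Assumed setup (not stated in the handout). Install once with:
# install.packages(c("ggplot2", "dplyr", "palmerpenguins"))
library(ggplot2)         # ggplot(), geom_*(), facet_wrap(), labs(), theme()
library(dplyr)           # %>% pipe, count(), glimpse()
library(palmerpenguins)  # the `penguins` data

# Quick look at the variables used throughout the handout
glimpse(penguins)
```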



## Foundations of `ggplot()`

- `ggplot()` -- a function to create the shell of a visualization, where
specific variables are mapped to different aspects of the plot


```r
penguins %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species))
```

- `aes()` -- aesthetics that can be used when creating a `ggplot()`. Inside
  `aes()`, an aesthetic is associated with a variable (e.g. `color = sex`). To
  hard code an aesthetic to a constant (e.g. `color = "blue"`), set it outside
  of `aes()`, as an argument to the `geom_XXX()` function.

  - The following are the aesthetic options for *most* plots:
    - `x`
    - `y`
    - `alpha` -- changes transparency
    - `color` -- produces colored outline
    - `fill` -- fills with color
    - `group` -- used with categorical variables, similar to color
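
The difference between an aesthetic tied to a variable and one fixed to a constant can be sketched as follows. The object names `p_mapped` and `p_constant` are illustrative only, and the package setup is an assumption.

```r
library(ggplot2)
library(dplyr)
library(palmerpenguins)

# Mapped aesthetic: color varies with the `sex` variable, so it goes inside aes()
p_mapped <- penguins %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point(aes(color = sex))

# Constant aesthetic: every point is blue, so color is set outside of aes()
p_constant <- penguins %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point(color = "blue")
```

A mapped aesthetic produces a legend, because the color carries information about the data. A constant one does not.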

- **`+`** -- an important aspect of creating a `ggplot()` is to note that the
  `geom_XXX()` function is separated from the `ggplot()` function with a plus
  sign, `+`.

  - `ggplot()` plots are constructed in a series of layers, where the plus sign
    separates these layers.
  - Generally, the `+` sign can be thought of as the end of a line, so you
    should always hit enter/return after it. While it is not mandatory to move
    to the next line for each layer, doing so makes the code a lot easier to
    organize and read.


```r
penguins %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point()
```

## Geometric Objects to Visualize the Data

- `geom_histogram()` -- adds a histogram to the plot,
  where the observations are binned into ranges of values and then frequencies
  of observations are plotted on the y-axis
  - You can specify the number of bins you want with the `bins` argument


```r
penguins %>%
  ggplot(aes(x = bill_length_mm)) +
  geom_histogram(bins = 20)
```

- `geom_boxplot()` -- adds a boxplot to the plot, where observations are
  aggregated (summarized), and the min, Q1, median, Q3, and maximum are plotted
  as the box and whiskers, with "outliers" plotted as points.
  - You can plot a horizontal boxplot by specifying the `x` variable, or a
    vertical boxplot by specifying the `y` variable.
  - Note: the min and max may not be included in the whiskers, if they are
    deemed to be "outliers" based on the $1.5 \\times \\text{IQR}$ rule.


```r
## Horizontal boxplot
penguins %>%
  ggplot(aes(x = bill_length_mm)) +
  geom_boxplot()

## Vertical boxplot
penguins %>%
  ggplot(aes(y = bill_length_mm)) +
  geom_boxplot()
```

- `geom_density()` -- adds a density curve to the plot, where the probability
  density is plotted on the y-axis (so the density curve has a total area of one).
  - By default this creates a density curve without shading. By specifying a
    color in the `fill` argument, the density curve is shaded.
  - Can be thought of as the "one group" violin plot!


```r
penguins %>%
  ggplot(aes(x = bill_length_mm)) +
  geom_density(fill = "tomato")
```

- `geom_violin()` -- plots violins for each level of a categorical variable
  - Can be thought of as a hybrid of `geom_boxplot()` and `geom_density()`,
    as the density is displayed, but it is reflected to provide a plot similar in
    nature to a boxplot.
  - To obtain violins stacked vertically, declare the categorical variable as `y`.
    To obtain side-by-side violins, declare the categorical variable as `x`.


```r
## Stacked vertically
penguins %>%
  ggplot(aes(x = bill_length_mm, y = species)) +
  geom_violin()

## Side-by-side
penguins %>%
  ggplot(aes(y = bill_length_mm, x = species)) +
  geom_violin()
```

- `geom_bar()` -- creates a barchart of a categorical variable
  - Can produce stacked barcharts by specifying a variable as the `fill`
    aesthetic.
  - Can change from stacked barchart to a side-by-side barchart by specifying
    `position = "dodge"`.
  - If your data are already in counts (e.g. output from `count()`), then you
    can specify the `stat = "identity"` argument inside `geom_bar()`.


```r
## Stacked barchart
penguins %>%
  ggplot(aes(x = species)) +
  geom_bar(aes(fill = sex))

## Side-by-side barchart
penguins %>%
  ggplot(aes(x = species)) +
  geom_bar(aes(fill = sex),
           position = "dodge")

## If data are already counts (count() stores them in the column `n`)
penguins %>%
  count(species, sex) %>%
  ggplot(aes(x = species, y = n)) +
  geom_bar(aes(fill = sex),
           stat = "identity",
           position = "dodge")
```
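
For data that are already counts, ggplot2 also provides `geom_col()`, which takes the bar heights directly from the `y` aesthetic. The handout itself only covers `geom_bar(stat = "identity")`, so the sketch below is an alternative and not the course's method. The names `penguin_counts` and `p_col` are illustrative.

```r
library(ggplot2)
library(dplyr)
library(palmerpenguins)

# count() returns one row per species/sex combination, with the count in `n`
penguin_counts <- penguins %>%
  count(species, sex)

# geom_col() uses the y values as bar heights, so no `stat` argument is needed
p_col <- penguin_counts %>%
  ggplot(aes(x = species, y = n)) +
  geom_col(aes(fill = sex), position = "dodge")
```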

- `geom_point()` -- plots each observation as an (x, y) point, used to create
  scatterplots
  - Can use `alpha` to increase the transparency of the points, to reduce
    overplotting.
  - Can specify `aes`thetics inside of `geom_point()` for local aesthetics (point
    level) or inside of `ggplot()` for global aesthetics (plot level)


```r
penguins %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point(aes(color = species))
```

- `geom_jitter()` -- plots each observation as an (x, y) point and adds a small
  amount of jitter around the point
  - Useful so that we can see each point in the locations where there are
    overlapping points.
  - Can specify the `width` and `height` of the jittering using the optional
    arguments.


```r
penguins %>%
  ggplot(aes(y = body_mass_g, x = species)) +
  geom_violin() +
  geom_jitter(aes(color = sex), width = 0.25, height = 0.25)
```

- `geom_smooth()` -- plots a smoothed line over a set of points, which draws the
  reader's eye to a specific trend
  - The methods we will use are `"lm"` for a linear model (straight line), and
    `"loess"` for a wiggly line
  - By default, the smoother draws a gray standard error (SE) band around the
    line; to remove it, add `se = FALSE`


```r
penguins %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  geom_smooth(method = "lm")
```

- `facet_wrap()` -- creates subplots of your original plot, based on the levels
  of the variable you input
  - To facet by one variable, use `~variable`.
  - To facet by two variables, use `variable1 ~ variable2`.
  - If you prefer for your facets to be organized in rows or columns, use the
    `nrow` and/or `ncol` arguments.


```r
penguins %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_wrap(~island, nrow = 1)
```
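
Faceting by two variables can be sketched as below. The one-sided formula `~ island + sex` is one documented way to pass two variables to `facet_wrap()`. `facet_grid()` is not covered in this handout and is shown only as an alternative that lays the panels out in a rows-by-columns grid. The object names are illustrative.

```r
library(ggplot2)
library(dplyr)
library(palmerpenguins)

base_plot <- penguins %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point()

# One panel per island/sex combination, wrapped into two rows
p_wrap <- base_plot +
  facet_wrap(~ island + sex, nrow = 2)

# Alternative (not in the handout): sex in rows, island in columns
p_grid <- base_plot +
  facet_grid(sex ~ island)
```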

## Plot Characteristics

- `labs()` -- specifies the plot labels, possible labels are: x, y, color, fill,
title, and subtitle


```r
penguins %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Bill Length (mm)",
       y = "Bill Depth (mm)",
       color = "Penguin Species")
```

- `theme_bw()` -- changes the plotting background to the classic dark-on-light
  ggplot2 theme.
  - This theme may work better for presentations displayed with a projector.
  - Other theme options are `theme_minimal()`, `theme_light()`, and `theme_void()`.


```r
penguins %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Bill Length (mm)",
       y = "Bill Depth (mm)",
       color = "Penguin Species") +
  theme_bw()
```

- `theme()` -- customizes individual non-data components of the plot, such as
  grid lines, fonts, and titles
  - Possible options are:
    - `panel.grid` -- controls the grid lines (`panel.grid = element_blank()`
      removes grid lines)
    - `text` -- specifies font size for the entire plot (e.g.
      `text = element_text(size = 16)`)
    - `axis.text.x` -- specifies the font size for the x-axis text
    - `axis.text.y` -- specifies the font size for the y-axis text
    - `plot.title` -- specifies aspects of the plot title, can use
      `plot.title = element_text(hjust = 0.5)` to centre the title


```r
penguins %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Bill Length (mm)",
       y = "Bill Depth (mm)",
       color = "Penguin Species") +
  theme_bw() +
  theme(axis.text.x = element_text(size = 12),
        axis.text.y = element_text(size = 12))
```
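
The remaining `theme()` options listed above can be combined in the same way. The sketch below uses an illustrative title and the object name `p_theme`, neither of which comes from the handout.

```r
library(ggplot2)
library(dplyr)
library(palmerpenguins)

p_theme <- penguins %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  labs(title = "Bill Depth by Bill Length") +         # illustrative title
  theme_bw() +
  theme(panel.grid = element_blank(),                 # remove all grid lines
        text = element_text(size = 16),               # base font size for the plot
        plot.title = element_text(hjust = 0.5))       # centre the title
```

Because `theme()` is added after `theme_bw()`, its settings override the complete theme. Adding them in the opposite order would let `theme_bw()` reset them.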

## Exporting Plots

- `ggsave()` -- convenient function for saving a plot
  - Unless specified, defaults to the last plot that was made.
  - Uses the size of the current graphics device to determine the size of the
    plot, unless `width` and `height` are specified.


```r
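plot1 <- penguins %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_wrap(~island, nrow = 1)

## The file name (including its folder) goes in `filename`;
## `path` is only for the directory to save into
ggsave(filename = "images/faceted_plot.png", plot = plot1)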
```
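
To make the saved image independent of the current graphics device, the size can be given explicitly. The sketch below writes to a temporary directory; the file name, dimensions, and resolution are illustrative choices, not values from the handout.

```r
library(ggplot2)
library(dplyr)
library(palmerpenguins)

p_save <- penguins %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point()

# Hypothetical output location: a temporary folder, so nothing is overwritten
out_file <- file.path(tempdir(), "penguin_scatter.png")

# Explicit width/height (in inches) and resolution make the output reproducible
ggsave(filename = out_file, plot = p_save,
       width = 8, height = 4, units = "in", dpi = 300)
```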

