Create copy of plotting data #524

Merged: 4 commits merged on Sep 26, 2024
2 changes: 1 addition & 1 deletion episodes/04-tidyr.Rmd
@@ -58,7 +58,7 @@

This graphic visually represents the three rules that define a "tidy" dataset:

![](fig/tidy-data-wickham.png)

Check warning on line 61 in episodes/04-tidyr.Rmd (GitHub Actions / Build markdown source files if valid): [image missing alt-text]: fig/tidy-data-wickham.png
*R for Data Science*, Wickham H and Grolemund G ([https://r4ds.had.co.nz/index.html](https://r4ds.had.co.nz/index.html))
© Wickham, Grolemund 2017
This image is licensed under Attribution-NonCommercial-NoDerivs 3.0 United States (CC-BY-NC-ND 3.0 US)
@@ -128,7 +128,7 @@
think! The gif below shows how these two formats relate to each other, and
gives you an idea of how we can use R to shift from one format to the other.

![](fig/tidyr-pivot_wider_longer.gif)

Check warning on line 131 in episodes/04-tidyr.Rmd (GitHub Actions / Build markdown source files if valid): [image missing alt-text]: fig/tidyr-pivot_wider_longer.gif
Long and wide dataframe layouts mainly affect readability. You may find that
you visually prefer the "wide" format, since you can see more of the data on
the screen. However, all of the R functions we have used thus far expect for
@@ -435,7 +435,7 @@
this data frame to our `data_output` directory.

```{r, purl=FALSE, eval=FALSE}
write_csv (interviews_plotting, file = "data_output/interviews_plotting.csv")
write_csv(interviews_plotting, file = "data_output/interviews_plotting.csv")
```

```{r, purl=FALSE, eval=TRUE, echo=FALSE}
30 changes: 25 additions & 5 deletions episodes/05-ggplot2.Rmd
@@ -11,9 +11,16 @@ source("data/download_data.R")

:::: instructor

- This lesson is a broad overview of ggplot2 and focuses on (1) getting familiar
with the layering system of ggplot2, (2) using the argument `group` in the
`aes()` function, (3) basic customization of the plots.
- This episode is a broad overview of ggplot2 and focuses on (1) getting
familiar with the layering system of ggplot2, (2) using the argument `group`
in the `aes()` function, (3) basic customization of the plots.
- The episode depends on data created in the Data Wrangling with tidyr
episode. If you did not get to or through all of the tidyr episode,
you can have the learners access the data by either downloading it or
quickly creating it using the tidyr code below. You will probably want to
copy the code into the Etherpad.
- If you did skip the tidyr episode, you might want to go over the exporting
data section in that episode.

::::::::::::

@@ -50,10 +57,21 @@ interviews_plotting <- read_csv("data_output/interviews_plotting.csv")
```

If you were unable to complete the previous lesson or did not save the data,
then you can create it now.
then you can create it now. Either download it using `read_csv()` (Option 1)
or create it with the **dplyr** and **tidyr** code (Option 2).

::: tab

### Option 1: Download the data

```{r, purl=FALSE, eval=FALSE}
## Not run, but can be used to load in data from previous lesson!
interviews_plotting <- read_csv("https://raw.githubusercontent.com/datacarpentry/r-socialsci/main/episodes/data/interviews_plotting.csv")
```

### Option 2: Create the data

```{r, purl=FALSE, eval=FALSE}
## Can be used to load in data from previous lesson!
interviews_plotting <- interviews %>%
## pivot wider by items_owned
separate_rows(items_owned, sep = ";") %>%
@@ -74,6 +92,8 @@ interviews_plotting <- interviews %>%
mutate(number_items = rowSums(select(., bicycle:car)))
```

:::

## Plotting with **`ggplot2`**

**`ggplot2`** is a plotting package that makes it simple to create complex plots
34 changes: 31 additions & 3 deletions episodes/data/download_data.R
@@ -8,16 +8,44 @@ if (!dir.exists("data"))
if (! file.exists("data/SAFI_clean.csv")) {
download.file("https://ndownloader.figshare.com/files/11492171",
"data/SAFI_clean.csv", mode = "wb")

# Clean data
df <- read.csv("data/SAFI_clean.csv",
stringsAsFactors = FALSE)

# Remove white space
df$respondent_wall_type <- trimws(df$respondent_wall_type, which = "both")
# Replace duplicate ids
df[[2, 1]] <- 2
df[[53, 1]] <- 53

write.csv(df, "data/SAFI_clean.csv", row.names = FALSE)
}

# Plotting data -----------------------------------------------------------

# Create plotting data for ggplot episode
library(tidyr)
library(dplyr)

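if (! file.exists("data/interviews_plotting.csv")) {
# Read the cleaned data here as well, so `df` exists even when
# SAFI_clean.csv was already present and the block above was skipped
df <- read.csv("data/SAFI_clean.csv",
stringsAsFactors = FALSE)

# Copy code from ggplot episode to create data
interviews_plotting <- df %>%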
separate_rows(items_owned, sep = ";") %>%
replace_na(list(items_owned = "no_listed_items")) %>%
mutate(items_owned_logical = TRUE) %>%
pivot_wider(names_from = items_owned,
values_from = items_owned_logical,
values_fill = list(items_owned_logical = FALSE)) %>%
separate_rows(months_lack_food, sep = ";") %>%
mutate(months_lack_food_logical = TRUE) %>%
pivot_wider(names_from = months_lack_food,
values_from = months_lack_food_logical,
values_fill = list(months_lack_food_logical = FALSE)) %>%
mutate(number_months_lack_food = rowSums(select(., Jan:May))) %>%
mutate(number_items = rowSums(select(., bicycle:car)))

write.csv(interviews_plotting, "data/interviews_plotting.csv", row.names = FALSE)
}

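To illustrate what the plotting-data pipeline above does to a single column, here is a minimal sketch on a hypothetical two-row data frame. The name `toy` and its values are invented for illustration and do not come from the SAFI data. The sketch applies the same `separate_rows()`, `replace_na()`, `pivot_wider()` and `rowSums()` steps to an `items_owned` column.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (not from SAFI): one respondent owns two items,
# the other has a missing item list
toy <- tibble(
  key_ID = c(1, 2),
  items_owned = c("bicycle;radio", NA)
)

toy_wide <- toy %>%
  # one row per item; the NA row is kept as a single NA row
  separate_rows(items_owned, sep = ";") %>%
  # label missing item lists explicitly
  replace_na(list(items_owned = "no_listed_items")) %>%
  mutate(items_owned_logical = TRUE) %>%
  # one logical column per item, FALSE where the respondent lacks it
  pivot_wider(names_from = items_owned,
              values_from = items_owned_logical,
              values_fill = list(items_owned_logical = FALSE)) %>%
  # count TRUE values across the item columns from bicycle to radio
  mutate(number_items = rowSums(select(., bicycle:radio)))

print(toy_wide)
```

Each item becomes its own logical column, and a missing list becomes `no_listed_items`. `rowSums()` counts the `TRUE` values in the selected column range, so `number_items` is 2 for the first respondent and 0 for the second.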