Add ggplot exercises
leonjessen committed Nov 1, 2018
1 parent e179bc3 commit 46d266f
Showing 2 changed files with 83 additions and 23 deletions.
42 changes: 19 additions & 23 deletions 04_ggplot/exercises/ggplot_exercises.Rmd
@@ -11,11 +11,7 @@ Ok, so now it is your turn!

_Remember, for the following exercises, inspiration for the code is available in the slides for this session_

Since we now know how to load data and are about to work on manipulating (wrangling) data, now would be a good time to write your first script (a small program).

- In the upper right corner of RStudio, there is a small icon that looks like a piece of paper with a green plus on it. Click it and choose the first option, "R Script" (you can also use the shortcut shown there; on a Mac it is command+shift+n). This opens a new empty text file in the RStudio editor, where we can write code, save it and run it

- A recipe for a script could look like the following. Copy/paste the code below into your new empty script file (Note, the '#' means the line is ignored, so we use this for commenting our code)
- Recall that your basic script looks like this

```{r, message=FALSE}
# Clear workspace
@@ -40,37 +36,37 @@ library('tidyverse')
```

- Click the save icon and save your new script as "my_script.R"

- In the icon bar just above your script in RStudio, there is an icon that again looks like a piece of paper, this time with a blue arrow and the word "Source". Click "Source" to run every line in the script (lines beginning with '#' are ignored). For now, this simply clears the workspace of any variables and loads the Tidyverse library every time you "Source" the script
- Write the code for your plots under the 'Visualise data' line

### Question 1

1. In the slides for this dplyr session, find where we load the kommune data directly from the web and write the command in the Console (Hint: All readr functions for reading a file start with 'read_')
2. In the slides for the readr session, find where we write a data set as a tab-separated-values file, then save the kommune data to disk as 'kommune_data.tsv' by writing the command in the Console (Hint: All readr functions for writing a file start with 'write_')
3. In your script under 'Load data', write the command for loading your data file 'kommune_data.tsv' from disk and save it in a variable called `my_data` (Hint: This is completely analogous to reading a file from the web)
4. Source the script and in the Console, simply write `my_data` and hit return
1. Use the kommune data and the `count()` function to count the number of municipalities in the different regions

__Q1__ How many rows and columns are in the kommune data set?
__Q1A__ Which region has the fewest municipalities?

__Q1B__ Make a barplot of the number of municipalities in each region

### Question 2

1. Under 'Wrangle data' in your script, using the `mutate()` function write the command for calculating a new variable `inc_exp_ratio`, which is the ratio between `Indt_indkskat` and `Folkeskudg_elev`, i.e. `Indt_indkskat` divided by `Folkeskudg_elev` (Remember to save the result to your `my_data` variable)
2. In the Console, write `my_data` and hit return
1. Using the kommune data, make a scatter plot of `Pct_prv_sk_elev` as a function of `Videreg_udd`

__Q2__ What is the value of this new variable for 'Koebenhavn'?
__Q2__ Are people with high education more likely to send their kids to private school?

### Question 3

1. Under 'Wrangle data' in your script, using the `filter()` function, write the command for identifying all the municipalities, where the value of your new variable is larger than 1
2. Under 'Wrangle data' in your script, using the `filter()` function, write the command for identifying all the municipalities where more than half have a long education and fewer than 1 in 5 pupils attend private school

__Q3A__ How many municipalities have a ratio larger than 1?
1. Using the kommune data, make a boxplot of the distribution of `Folkeskudg_elev` per `Region`

__Q3B__ In how many municipalities do more than half have a long education and fewer than 1 in 5 pupils attend private school?
__Q3__ Which `Region` has the highest median expense per public school pupil?

### Question 4

1. Under 'Wrangle data' in your script, using the `group_by()`, `summarise()` and `arrange()` functions, write the command for calculating the average % of students attending private school stratified on `Region`, sorted by descending value (largest first, smallest last)
1. Make a scatter plot of `Folkeskudg_elev` as a function of `Indt_indkskat` and add a linear model

__Q4__ What is the trend? Are rich municipalities spending more or less per public school pupil, compared to municipalities with a lower average income?

### Question 5

1. `R` comes with several built-in data sets, e.g. `diamonds`, `mtcars`, `iris` and `starwars`

__Q5__ Choose a data set and make a nice visualisation. See if you can ask a question of the data and find the answer using visualisation

__Q4__ What is the order of Regions?
64 changes: 64 additions & 0 deletions 04_ggplot/exercises/ggplot_exercises.md
@@ -0,0 +1,64 @@
Exercises: Visualising data (ggplot)
================

Ok, so now it is your turn!

*Remember, for the following exercises, inspiration for the code is available in the slides for this session*

- Recall that your basic script looks like this

``` r
# Clear workspace
# ------------------------------------------------------------------------------
rm(list=ls())

# Load libraries
# ------------------------------------------------------------------------------
library('tidyverse')

# Load data (session 2 - readr)
# ------------------------------------------------------------------------------

# Wrangle data (session 3 - dplyr)
# ------------------------------------------------------------------------------

# Visualise data (session 4 - ggplot)
# ------------------------------------------------------------------------------

# Write data (session 2 - readr)
# ------------------------------------------------------------------------------
```

- Write the code for your plots under the 'Visualise data' line

### Question 1

1. Use the kommune data and the `count()` function to count the number of municipalities in the different regions

**Q1A** Which region has the fewest municipalities?

**Q1B** Make a barplot of the number of municipalities in each region
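
A minimal sketch of the count-then-plot pattern in this question. The kommune data is not loaded here, so the sketch assumes the `mpg` data set that ships with ggplot2 as a stand-in, with `class` playing the role of `Region`. The names `class_counts` and `bar_plot` are made up for illustration.

``` r
library('tidyverse')

# Stand-in for the kommune data: 'mpg' ships with ggplot2
# count() returns one row per class, with the number of rows in column 'n'
class_counts <- mpg %>%
  count(class)

# geom_col() draws one bar per class, using 'n' as the bar height
bar_plot <- class_counts %>%
  ggplot(aes(x = class, y = n)) +
  geom_col()
bar_plot
```

`geom_col()` is used because the counts were already computed with `count()`; calling `geom_bar()` on the uncounted data should give the same bars, since it counts the rows itself.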

### Question 2

1. Using the kommune data, make a scatter plot of `Pct_prv_sk_elev` as a function of `Videreg_udd`

**Q2** Are people with high education more likely to send their kids to private school?
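
As a rough illustration of "y as a function of x", here is a scatter plot on stand-in data. It assumes the `mpg` data set from ggplot2 instead of the kommune data, so `hwy` and `displ` only take the places of `Pct_prv_sk_elev` and `Videreg_udd`. The name `scatter_plot` is hypothetical.

``` r
library('tidyverse')

# "hwy as a function of displ": the explanatory variable goes on the x-axis,
# the response on the y-axis
scatter_plot <- ggplot(data = mpg,
                       mapping = aes(x = displ, y = hwy)) +
  geom_point()
scatter_plot
```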

### Question 3

1. Using the kommune data, make a boxplot of the distribution of `Folkeskudg_elev` per `Region`

**Q3** Which `Region` has the highest median expense per public school pupil?
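
A sketch of one way to read a median off a boxplot and then double-check it with dplyr. It again assumes `mpg` as stand-in data, with `class` in place of `Region` and `hwy` in place of `Folkeskudg_elev`. The names `box_plot`, `class_medians` and `median_hwy` are invented for this example.

``` r
library('tidyverse')

# One box per class; the thick line inside each box is the median
box_plot <- ggplot(data = mpg,
                   mapping = aes(x = class, y = hwy)) +
  geom_boxplot()
box_plot

# Same information as a table: median per group, largest first
class_medians <- mpg %>%
  group_by(class) %>%
  summarise(median_hwy = median(hwy)) %>%
  arrange(desc(median_hwy))
class_medians
```

The table is handy when two boxes have medians that are hard to tell apart by eye.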

### Question 4

1. Make a scatter plot of `Folkeskudg_elev` as a function of `Indt_indkskat` and add a linear model

**Q4** What is the trend? Are rich municipalities spending more or less per public school pupil, compared to municipalities with a lower average income?
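
One possible way to add a linear model to a scatter plot is `geom_smooth(method = 'lm')`. The sketch below assumes the `mpg` stand-in data rather than the kommune data, and the names `trend_plot` and `fit` are made up for illustration. The slides for this session may use a different approach.

``` r
library('tidyverse')

# Scatter plot with a fitted straight line on top
# method = 'lm' asks geom_smooth() for a linear model instead of a smoother
trend_plot <- ggplot(data = mpg,
                     mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = 'lm')
trend_plot

# The same model fitted directly; the sign of the slope gives the trend
fit <- lm(hwy ~ displ, data = mpg)
coef(fit)
```

A positive slope means the y variable tends to increase with x, and a negative slope means it tends to decrease.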

### Question 5

1. `R` has different data sets build in `diamonds`, `mtcars`, `iris`, `starwars` to name some

**Q5** Choose a data set and make a nice visualisation, see if you can ask a question from the data and find the answer using visualisation
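
As a starting point, here is a small sketch using `iris`, which is available in a plain R session (`diamonds` and `starwars` come with ggplot2 and dplyr, so they need the tidyverse loaded). The question asked here, whether the species separate on petal size, is only an example, and `iris_plot` is a made-up name.

``` r
library('tidyverse')

# Example question: can the species be told apart by petal size?
# Colouring the points by Species makes any grouping visible
iris_plot <- ggplot(data = iris,
                    mapping = aes(x = Petal.Length,
                                  y = Petal.Width,
                                  colour = Species)) +
  geom_point()
iris_plot
```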
