diff --git a/04_ggplot/exercises/ggplot_exercises.Rmd b/04_ggplot/exercises/ggplot_exercises.Rmd index 90969e1..027f3d8 100644 --- a/04_ggplot/exercises/ggplot_exercises.Rmd +++ b/04_ggplot/exercises/ggplot_exercises.Rmd @@ -11,11 +11,7 @@ Ok, so now it is your turn! _Remember, for the following exercises, inspiration for the code is available in the slides for this session_ -Since we now know how to load data and we are to work with how we manipulate (wrangle) data, now would be a good time for you to make your first script (small program). - -- In the upper right corner of RStudio, there is a small icon looking like a piece of paper with a green plus on it. Click it and choose the first option "R Script" (You can also use the short cut as described, on a mac it is command+shift+n). This will open a new empty text file in the RStudio editor, which we can put code into, save it and run it - -- A recipe for a script could look like the following. Copy/paste the code below into your new empty script file (Note, the '#' means the line is ignored, so we use this for commenting our code) +- Recall, that your basic script looks like this ```{r, message=FALSE} # Clear workspace @@ -40,37 +36,37 @@ library('tidyverse') ``` -- Click the save icon and save your new script as "my_script.R" - -- In RStudio in the icon line just above your script file, there is an icon again looking like a piece of paper, with a blue arrow and the word "Source". Click "Source", this will run each of the lines in the script (ignoring lines beginning with '#'), so for now, it will simply clear the workspace of any variables and load the Tidyverse library every time you "Source" the script +- Try to write in the code for your plots under the 'Visualise data' line ### Question 1 -1. In the slides for this dplyr session, find where we load the kommune data directly from the web and write the command in the Console (Hint: All readr read-a-file-functions, start with 'read_') -2. 
In the slides for the readr session, find where we write a data set as a tab-separated-values file and save the kommune data to disk as 'kommune_data.tsv', write the command in the Console (Hint: All readr write-a-file-functions, start with 'write_' -3. In your script under 'Load data' write the command for loading your data file 'kommune_data.tsv' from disk and save it in a variable called `my_data` (Hint: This is completely analogue to reading a file from the web) -4. Source the script and in the Console, simply write `my_data` and hit return +1. Use the kommune data and the `count()` function to count the number of municipalities in the different regions -__Q1__ How many rows and columns are in the kommune data set? +__Q1A__ Which region has the fewest number of municipalities? + +__Q1B__ Make a barplot of the number of municipalities in each region ### Question 2 -1. Under 'Wrangle data' in your script, using the `mutate()` function write the command for calculating a new variable `inc_exp_ratio`, which is the ratio between `Indt_indkskat` and `Folkeskudg_elev`, i.e. `Indt_indkskat` divided by `Folkeskudg_elev` (Remember to save the result to your `my_data` variable) -2. In the Concole, write `my_data` and hit return +1. Using the kommune data, make a scatter plot of `Pct_prv_sk_elev` as a function of `Videreg_udd` -__Q2__ What is the value of this new variable for 'Koebenhavn'? +__Q2__ Are people with high education more likely to send their kids to private school? ### Question 3 -1. Under 'Wrangle data' in your script, using the `filter()` function, write the command for identifying all the municipalities, where the value of your new variable is larger than 1 -2. Under 'Wrangle data' in your script, using the `filter()` function, write the command for identifying all the municipalities, where more than half have a long eduction and less than 1 in 5 pupils attend private scool - -__Q3A__ How many municipalities have a ratio larger than 1? +1. 
Using the kommune data, make a boxplot of the distribution of `Folkeskudg_elev` per `Region` -__Q3B__ In how many municipalities do more than half have a long eduction and less than 1 in 5 pupils attend private scool? +__Q3__ Which `Region` has the highest median expense per public school pupil? ### Question 4 -1. Under 'Wrangle data' in your script, using the `group_by()`, `summarise()` and `arrange` functions, write the command for identifying calculating the average % of students attending private school stratified on `Region` and sort them be falling values (largest first, smallest last) +1. Make a scatter plot of `Folkeskudg_elev` as a function of `Indt_indkskat` and add a linear model + +__Q4__ What is the trend? Are rich municipalities spending more or less per public school pupil, compared to municipalities with a lower average income? + +### Question 5 + +1. `R` and the Tidyverse packages have several data sets built in, e.g. `diamonds`, `mtcars`, `iris` and `starwars` + +__Q5__ Choose a data set and make a nice visualisation. See if you can ask a question of the data and find the answer using visualisation -__Q4__ What is the order of Regions? diff --git a/04_ggplot/exercises/ggplot_exercises.md b/04_ggplot/exercises/ggplot_exercises.md new file mode 100644 index 0000000..3f7bb39 --- /dev/null +++ b/04_ggplot/exercises/ggplot_exercises.md @@ -0,0 +1,64 @@ +Exercises: Visualising data (ggplot) +================ + +Ok, so now it is your turn! 
+ +*Remember, for the following exercises, inspiration for the code is available in the slides for this session* + +- Recall, that your basic script looks like this + +``` r +# Clear workspace +# ------------------------------------------------------------------------------ +rm(list=ls()) + +# Load libraries +# ------------------------------------------------------------------------------ +library('tidyverse') + +# Load data (session 2 - readr) +# ------------------------------------------------------------------------------ + +# Wrangle data (session 3 - dplyr) +# ------------------------------------------------------------------------------ + +# Visualise data (session 4 - ggplot) +# ------------------------------------------------------------------------------ + +# Write data (session 2 - readr) +# ------------------------------------------------------------------------------ +``` + +- Try to write in the code for your plots under the 'Visualise data' line + +### Question 1 + +1. Use the kommune data and the `count()` function to count the number of municipalities in the different regions + +**Q1A** Which region has the fewest number of municipalities? + +**Q1B** Make a barplot of the number of municipalities in each region + +### Question 2 + +1. Using the kommune data, make a scatter plot of `Pct_prv_sk_elev` as a function of `Videreg_udd` + +**Q2** Are people with high education more likely to send their kids to private school? + +### Question 3 + +1. Using the kommune data, make a boxplot of the distribution of `Folkeskudg_elev` per `Region` + +**Q3** Which `Region` has the highest median expense per public school pupil? + +### Question 4 + +1. Make a scatter plot of `Folkeskudg_elev` as a function of `Indt_indkskat` and add a linear model + +**Q4** What is the trend? Are rich municipalities spending more or less per public school pupil, compared to municipalities with a lower average income? + +### Question 5 + +1. 
`R` and the Tidyverse packages have several data sets built in, e.g. `diamonds`, `mtcars`, `iris` and `starwars` + +**Q5** Choose a data set and make a nice visualisation. See if you can ask a question of the data and find the answer using visualisation