From 87fb8c60ff5af76c01469667911633488678d096 Mon Sep 17 00:00:00 2001
From: "Zhian N. Kamvar"
Date: Tue, 11 Jul 2023 06:45:34 -0700
Subject: [PATCH] delete instructor handouts

---
 instructors/data-visualisation-handout.Rmd | 280 ---------------------
 instructors/data-wrangling-handout.Rmd     | 268 --------------------
 instructors/intro-R-handout.Rmd            | 196 ---------------
 instructors/starting-with-data-handout.Rmd | 220 ----------------
 4 files changed, 964 deletions(-)
 delete mode 100644 instructors/data-visualisation-handout.Rmd
 delete mode 100644 instructors/data-wrangling-handout.Rmd
 delete mode 100644 instructors/intro-R-handout.Rmd
 delete mode 100644 instructors/starting-with-data-handout.Rmd

diff --git a/instructors/data-visualisation-handout.Rmd b/instructors/data-visualisation-handout.Rmd
deleted file mode 100644
index e333fea6b..000000000
--- a/instructors/data-visualisation-handout.Rmd
+++ /dev/null
@@ -1,280 +0,0 @@
----
-title: Code Handout - Data Visualisation with ggplot2
-output:
-  html_document:
-    df_print: paged
-    code_download: yes
----
-
-This document contains all of the functions that we have covered thus far in the
-course. It will be updated every week, after we've added new skills. Each
-function is presented alongside an example of how it is used.
-
-All of the examples below are in the context of the Palmer Penguins, found
-[here (link)](https://allisonhorst.github.io/palmerpenguins/index.html).
-
-```{r, include=FALSE}
-knitr::opts_chunk$set(fig.width = 3, fig.height = 3, message = FALSE, warning = FALSE, eval = FALSE)
-```
-
-## Foundations of `ggplot()`
-
-- `ggplot()` -- a function to create the shell of a visualization, where
-  specific variables are mapped to different aspects of the plot
-
-```{r}
-penguins %>%
-  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species))
-```
-
-- `aes()` -- aesthetics that can be used when creating a `ggplot()`, where the
-  aesthetics can either be hard coded (e.g. 
`color = "blue"`) or associated with
-  a variable (e.g. `color = sex`).
-
-  - The following are the aesthetic options for *most* plots:
-    - `x`
-    - `y`
-    - `alpha` -- changes transparency
-    - `color` -- produces colored outline
-    - `fill` -- fills with color
-    - `group` -- used with categorical variables, similar to color
-
-- **`+`** -- an important aspect of creating a `ggplot()` is to note that the
-  `geom_XXX()` function is separated from the `ggplot()` function with a plus
-  sign, `+`.
-
-  - `ggplot()` plots are constructed in a series of layers, where the plus sign
-    separates these layers.
-  - Generally, the `+` sign can be thought of as the end of a line, so you
-    should always hit enter/return after it. While it is not mandatory to move
-    to the next line for each layer, doing so makes the code a lot easier to
-    organize and read.
-
-```{r, fig.width=6}
-penguins %>%
-  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
-  geom_point()
-```
-
-## Geometric Objects to Visualize the Data
-
-- `geom_histogram()` -- adds a histogram to the plot,
-  where the observations are binned into ranges of values and then frequencies
-  of observations are plotted on the y-axis
-  - You can specify the number of bins you want with the `bins` argument
-
-```{r}
-penguins %>%
-  ggplot(aes(x = bill_length_mm)) +
-  geom_histogram(bins = 20)
-```
-
-- `geom_boxplot()` -- adds a boxplot to the plot, where observations are
-  aggregated (summarized), the min, Q1, median, Q3, and maximum are plotted as the
-  box and whiskers, and "outliers" are plotted as points.
-  - You can plot a horizontal boxplot by specifying the `x` variable, or a
-    vertical boxplot by specifying the `y` variable.
-  - Note: the min and max may not be included in the whiskers, if they are
-    deemed to be "outliers" based on the $1.5 \times \text{IQR}$ rule.
-
-```{r}
-## Horizontal boxplot
-penguins %>%
-  ggplot(aes(x = bill_length_mm)) +
-  geom_boxplot()
-
-## Vertical boxplot
-penguins %>%
-  ggplot(aes(y = bill_length_mm)) +
-  geom_boxplot()
-```
-
-- `geom_density()` -- adds a density curve to the plot, where the probability
-  density is plotted on the y-axis (so the density curve has a total area of one).
-  - By default this creates a density curve without shading. By specifying a
-    color in the `fill` argument, the density curve is shaded.
-  - Can be thought of as the "one group" violin plot!
-
-```{r, warning=FALSE, message=FALSE}
-penguins %>%
-  ggplot(aes(x = bill_length_mm)) +
-  geom_density(fill = "tomato")
-```
-
-- `geom_violin()` -- plots violins for each level of a categorical variable
-  - Can be thought of as a hybrid mix of `geom_boxplot()` and `geom_density()`,
-    as the density is displayed, but it is reflected to provide a plot similar in
-    nature to a boxplot.
-  - To obtain violins stacked vertically, declare the categorical variable as `y`.
-    To obtain side-by-side violins, declare the categorical variable as `x`.
-
-```{r}
-## Stacked vertically
-penguins %>%
-  ggplot(aes(x = bill_length_mm, y = species)) +
-  geom_violin()
-
-## Side-by-side
-penguins %>%
-  ggplot(aes(y = bill_length_mm, x = species)) +
-  geom_violin()
-```
-
-- `geom_bar()` -- creates a barchart of a categorical variable
-  - Can produce stacked barcharts by specifying a variable as the `fill`
-    aesthetic.
-  - Can change from stacked barchart to a side-by-side barchart by specifying
-    `position = "dodge"`.
-  - If your data are already in counts (e.g. output from `count()`), then you
-    can specify the `stat = "identity"` argument inside `geom_bar()`.
-
-```{r}
-## Stacked barchart
-penguins %>%
-  ggplot(aes(x = species)) +
-  geom_bar(aes(fill = sex))
-
-## Side-by-side barchart
-penguins %>%
-  ggplot(aes(x = species)) +
-  geom_bar(aes(fill = sex),
-           position = "dodge")
-
-## If data are already counts
-penguins %>%
-  count(species, sex) %>%
-  ggplot(aes(x = species, y = n)) +
-  geom_bar(aes(fill = sex),
-           stat = "identity",
-           position = "dodge")
-```
-
-- `geom_point()` -- plots each observation as an (x, y) point, used to create
-  scatterplots
-  - Can use `alpha` to increase the transparency of the points, to reduce
-    overplotting.
-  - Can specify `aes`thetics inside of `geom_point()` for local aesthetics (point
-    level) or inside of `ggplot()` for global aesthetics (plot level)
-
-```{r}
-penguins %>%
-  ggplot(aes(x = bill_length_mm, y = bill_depth_mm)) +
-  geom_point(aes(color = species))
-```
-
-- `geom_jitter()` -- plots each observation as an (x, y) point and adds a small
-  amount of jitter around the point
-  - Useful so that we can see each point in the locations where there are
-    overlapping points.
-  - Can specify the `width` and `height` of the jittering using the optional
-    arguments.
-
-```{r}
-penguins %>%
-  ggplot(aes(y = body_mass_g, x = species)) +
-  geom_violin() +
-  geom_jitter(aes(color = sex), width = 0.25, height = 0.25)
-```
-
-- `geom_smooth()` -- plots a line over a set of points, draws the reader's eye
-  to a specific trend
-  - The methods we will use are "lm" for a linear model (straight line), and
-    "loess" for a wiggly line
-  - By default, the smoother gives you gray SE bars, to remove these add
-    `se = FALSE`
-
-```{r, fig.width=6}
-penguins %>%
-  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
-  geom_point() +
-  geom_smooth(method = "lm")
-```
-
-- `facet_wrap()` -- creates subplots of your original plot, based on the levels
-  of the variable you input
-  - To facet by one variable, use `~variable`.
-  - To facet by two variables, use `variable1 ~ variable2`.
-  - If you prefer for your facets to be organized in rows or columns, use the
-    `nrow` and/or `ncol` arguments.
-
-```{r, fig.width=12}
-penguins %>%
-  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
-  geom_point() +
-  geom_smooth(method = "lm") +
-  facet_wrap(~island, nrow = 1)
-```
-
-## Plot Characteristics
-
-- `labs()` -- specifies the plot labels, possible labels are: x, y, color, fill,
-  title, and subtitle
-
-```{r, fig.width=6}
-penguins %>%
-  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
-  geom_point() +
-  geom_smooth(method = "lm") +
-  labs(x = "Bill Length (mm)",
-       y = "Bill Depth (mm)",
-       color = "Penguin Species")
-```
-
-- `theme_bw()` -- changes the plotting background to the classic dark-on-light
-  ggplot2 theme.
-  - This theme may work better for presentations displayed with a projector.
-  - Other theme options are `theme_minimal()`, `theme_light()`, and `theme_void()`.
-
-```{r}
-penguins %>%
-  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
-  geom_point() +
-  geom_smooth(method = "lm") +
-  labs(x = "Bill Length (mm)",
-       y = "Bill Depth (mm)",
-       color = "Penguin Species") +
-  theme_bw()
-```
-
-- `theme()` -- customizes individual non-data elements of the plot
-  - Possible options are:
-    - `panel.grid` -- controls the grid lines (`panel.grid = element_blank()`
-      removes grid lines)
-    - `text` -- specifies font size for the entire plot (e.g. 
-      `text = element_text(size = 16)`)
-    - `axis.text.x` -- specifies the font size for the x-axis text
-    - `axis.text.y` -- specifies the font size for the y-axis text
-    - `plot.title` -- specifies aspects of the plot title, can use
-      `plot.title = element_text(hjust = 0.5)` to centre the title
-
-```{r}
-penguins %>%
-  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
-  geom_point() +
-  geom_smooth(method = "lm") +
-  labs(x = "Bill Length (mm)",
-       y = "Bill Depth (mm)",
-       color = "Penguin Species") +
-  theme_bw() +
-  theme(axis.text.x = element_text(size = 12),
-        axis.text.y = element_text(size = 12))
-```
-
-## Exporting Plots
-
-- `ggsave()` -- convenient function for saving a plot
-  - Unless specified, defaults to the last plot that was made.
-  - Uses the size of the current graphics device to determine the size of the
-    plot.
-
-```{r}
-plot1 <- penguins %>%
-  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
-  geom_point() +
-  geom_smooth(method = "lm") +
-  facet_wrap(~island, nrow = 1)
-
-ggsave(filename = "images/faceted_plot.png", plot = plot1)
-```
-
-
diff --git a/instructors/data-wrangling-handout.Rmd b/instructors/data-wrangling-handout.Rmd
deleted file mode 100644
index 2b769e5c3..000000000
--- a/instructors/data-wrangling-handout.Rmd
+++ /dev/null
@@ -1,268 +0,0 @@
----
-title: Code Handout - Data Wrangling with dplyr & tidyr
-output:
-  html_document:
-    df_print: paged
-    code_download: yes
----
-
-```{r, include=FALSE}
-knitr::opts_chunk$set(fig.width = 3, fig.height = 3, message = FALSE, warning = FALSE, eval = FALSE)
-```
-
-This document contains all of the functions that we have covered thus far in the
-course. It will be updated every week, after we've added new skills. Each
-function is presented alongside an example of how it is used.
-
-All of the examples below are in the context of the Palmer Penguins, found
-[here (link)](https://allisonhorst.github.io/palmerpenguins/index.html).
-
-## Packages
-
-- `library()` -- loads packages into your `R` session
-
-```{r, message=FALSE, warning=FALSE}
-library(tidyverse)
-library(palmerpenguins)
-```
-
-## Inspecting Data
-
-- `glimpse()` -- shows a summary of the dataset, the number of rows and columns,
-  variable names, and the first few entries of each variable
-
-```{r}
-glimpse(penguins)
-```
-
-## Working with Data
-
-- `<-` -- "assignment arrow", assigns a value (vector, dataframe, single value)
-  to the name of a variable
-
-```{r}
-penguins_2007 <- penguins %>%
-  filter(year == 2007)
-```
-
-- `c()` -- the "concatenate" function combines inputs to form a vector, the
-  values have to be the same data type.
-
-```{r}
-cat_variables <- c("Species", "Island", "Sex")
-```
-
-\newpage
-
-## Verbs of Data Wrangling
-
-- `select()` -- selects variables (columns) from a dataframe
-
-```{r}
-penguins %>%
-  select(species)
-```
-
-- `filter()` -- filters observations (rows) out of / into a dataframe, where
-  the inputs (arguments) are the conditions to be satisfied in the data that are
-  kept
-
-```{r}
-## It's nice to have a new line for each condition, so your code is easier to read!
-penguins %>%
-  filter(species == "Adelie",
-         body_mass_g > 3000,
-         year == 2008)
-
-```
-
-**Logical operators:** Filtering for certain observations (e.g. penguins from a
-particular island) is often of interest in data frames where we might want to
-examine observations with certain characteristics separately from the rest of
-the data. To do so, you can use the `filter` function and a series of **logical
-operators**. 
The most commonly used logical operators for data analysis are as
-follows:
-
-- `==` means "equal to"
-
-- `!=` means "not equal to"
-
-- `>` or `<` means "greater than" or "less than"
-
-- `>=` or `<=` means "greater than or equal to" or "less than or equal to"
-
-- `mutate()` -- creates new variables or modifies existing variables
-
-```{r}
-penguins %>%
-  filter(is.na(bill_length_mm) != TRUE,
-         is.na(bill_depth_mm) != TRUE) %>%
-  mutate(body_mass_kg = body_mass_g / 1000)
-```
-
-- `group_by()` -- groups the dataframe based on levels of a categorical variable,
-  usually used alongside `summarize()`
-
-```{r, eval=FALSE}
-penguins %>%
-  group_by(island)
-```
-
-- `summarize()` -- creates data summaries of variables in a dataframe, for grouped summaries use alongside `group_by()`
-
-```{r}
-penguins %>%
-  filter(is.na(body_mass_g) != TRUE) %>%
-  group_by(island) %>%
-  summarize(mean_mass = mean(body_mass_g))
-
-```
-
-- `ungroup()` -- removes the grouping of a dataframe, typically used after group
-  summaries when additional ungrouped operations are required
-
-```{r}
-penguins %>%
-  filter(is.na(body_mass_g) != TRUE) %>%
-  group_by(island) %>%
-  summarize(mean_mass = mean(body_mass_g)) %>%
-  ungroup()
-```
-
-- `arrange()` -- orders a dataframe based on the values of a numerical variable,
-  paired with `desc()` to order in descending order
-
-```{r}
-penguins %>%
-  filter(is.na(body_mass_g) != TRUE) %>%
-  group_by(island) %>%
-  summarize(mean_mass = mean(body_mass_g)) %>%
-  arrange(desc(mean_mass))
-```
-
-- `%>%` -- the "pipe" operator, joins sequences of data wrangling steps together,
-  works with any function that takes the data as its first argument
-
-```{r}
-penguins %>%
-  select(species, island, body_mass_g, sex, year) %>%
-  filter(island == "Torgersen",
-         is.na(body_mass_g) != TRUE) %>%
-  group_by(species, year) %>%
-  summarize(mean_mass = mean(body_mass_g),
-            median_mass = median(body_mass_g),
-            observations = n()) %>%
-  arrange(desc(mean_mass))
-```
-
-## 
Other Data Wrangling Tools
-
-- `count()` -- counts the number of observations (rows) of the different levels
-  of a categorical variable
-  - can add `sort = TRUE` to sort the table in descending order (similar to
-    using `arrange(desc())`)
-
-```{r}
-penguins %>%
-  count(species)
-```
-
-- `mean()` -- finds the mean of a numerical variable, returns `NA` if any value is missing,
-  so either filter those out first or use the `na.rm = TRUE` argument
-
-  - Other summary functions include:
-    - `var()` -- finds the variance of a numerical variable
-    - `sd()` -- finds the standard deviation of a numerical variable
-    - `IQR()` -- finds the interquartile range (Q3 - Q1) of a numerical variable
-    - `median()` -- finds the median of a numerical variable
-
-- `is.na()` -- returns a vector of `TRUE` and `FALSE` values corresponding to
-  whether a particular row of a variable was `NA` (missing)
-
-```{r}
-penguins %>%
-  mutate(missing_weight = is.na(body_mass_g))
-```
-
-- `sample_n()` -- selects $n$ rows from the dataframe, based on the value of
-  `size` specified
-
-```{r}
-penguins %>%
-  sample_n(size = 10)
-```
-
-- `replace_na()` -- replaces NA values with the value specified
-  - The values to be replaced must be passed to the function (input) as a
-    `list()` object.
-
-```{r}
-penguins %>%
-  replace_na(list(bill_length_mm = "no_measurement",
-                  bill_depth_mm = "no_measurement")) %>%
-  glimpse()
-```
-
-- `separate_rows()` -- separates a variable with multiple values based on the
-  delimiter specified.
-
-  - Variables whose entries are stored as a list with commas or semicolons are
-    great candidates for this function!
-
-- `rowSums()` -- forms row sums for numeric variables
-
-  - Note: In the lesson `rowSums()` was used on a `logical` variable, because
-    logical values can be numerically represented as 0 (FALSE) and 1 (TRUE)
-
-```{r}
-x <- tibble(x1 = 3, x2 = c(4:1, 2:5))
-rowSums(x)
-```
-
-## Pivoting Dataframes
-
-- `pivot_wider()` -- transforms a dataframe from long to wide format
-  - takes three principal arguments:
-    1. the data
-    2. the *names\_from* column variable whose values will become new column names
-    3. the *values\_from* column variable whose values will fill the new column
-       variables.
-  - Further arguments include `values_fill` which, if set, fills in missing
-    values with the value provided.
-
-```{r}
-wide <- penguins %>%
-  mutate(island_logical = TRUE) %>%
-  pivot_wider(names_from = species,
-              values_from = island_logical,
-              values_fill = list(island_logical = FALSE))
-
-glimpse(wide)
-```
-
-- `pivot_longer()` -- transforms a dataframe from wide to long format
-  - takes four principal arguments:
-    1. the data
-    2. *cols* are the names of the columns we use to fill the new values variable
-       (or to drop).
-    3. the *names\_to* column variable we wish to create from the *cols* provided.
-    4. the *values\_to* column variable we wish to create and fill with values
-       associated with the *cols* provided.
-
-```{r}
-wide %>%
-  pivot_longer(cols = Adelie:Gentoo,
-               names_to = "species",
-               values_to = "island_logical")
-```
-
-## Extracting Data
-
-- `write_csv()` -- writes a dataframe to a csv file, output into the file path
-  specified
-
-```{r}
-write_csv(wide, file = "data/penguins_wide.csv")
-```
-
-
diff --git a/instructors/intro-R-handout.Rmd b/instructors/intro-R-handout.Rmd
deleted file mode 100644
index 37fc9d57b..000000000
--- a/instructors/intro-R-handout.Rmd
+++ /dev/null
@@ -1,196 +0,0 @@
----
-title: Code Handout - Introduction to R
-output: md_document
----
-
-```{r, include=FALSE}
-knitr::opts_chunk$set(fig.width = 3, fig.height = 3, message = FALSE, warning = FALSE, eval = FALSE)
-```
-
-This document contains all of the functions that were covered in the
-*Introduction to R* workshop. Each function is presented alongside an example of
-how it can be used.
-
-## Creating Objects
-
-- `<-` -- "assignment arrow", assigns a value (vector, dataframe, single value)
-  to the name of a variable
-
-```{r}
-
-x <- 3
-y <- c(1, 2, 3)
-z <- x + y
-```
-
-- `c()` -- the "concatenate" function combines inputs to form a vector, the
-  values have to be the same data type.
-
-```{r}
-animals <- c("bird", "cat", "dog")
-numbers <- c(1, 14, 57, 89)
-logicals <- c(TRUE, FALSE, TRUE, TRUE)
-```
-
-## Inspecting Objects
-
-- `str()` -- compact display of the structure of an R object
-
-```{r}
-str(animals)
-```
-
-- `class()` -- returns the type of element of any R object
-
-```{r}
-class(logicals)
-```
-
-- `typeof()` -- returns the data type or storage mode of any R object
-
-```{r}
-typeof(numbers)
-```
-
-## Functions in R
-
-- `args()` -- returns the arguments of a function
-
-```{r}
-args(round)
-```
-
-- named arguments -- the name of the argument the function expects
-  - You can choose to not name your arguments, **if** you know the **exact**
-    order they should be in!
-  - However, we generally discourage this.
-
-```{r}
-## Either of these work, since the digits argument is named explicitly.
-round(3.14159, digits = 2)
-round(digits = 2, 3.14159)
-
-## This does not give the intended result, since the arguments are unnamed and in the wrong order.
-round(2, 3.14159)
-```
-
-## Functions to Summarize Data
-
-- `sqrt()` -- returns the square root of a numeric variable
-
-```{r}
-sqrt(numbers)
-```
-
-- `mean()` -- returns the mean of a numeric variable
-  - You can add the `na.rm` argument, to remove `NA` values before calculating
-    the mean.
-
-```{r}
-mean(numbers)
-```
-
-- `max()` -- returns the maximum of a numeric variable
-  - You can add the `na.rm` argument, to remove `NA` values before calculating
-    the max.
-
-```{r}
-max(numbers)
-```
-
-- `sum()` -- returns the sum of a numeric variable
-  - You can add the `na.rm` argument, to remove `NA` values before calculating
-    the sum.
-
-```{r}
-sum(numbers)
-```
-
-- `length()` -- returns the length of a vector (of any datatype)
-
-```{r}
-length(animals)
-```
-
-## Subsetting Data
-
-- `[]` -- used to subset elements from a vector
-
-```{r}
-animals[3]
-## selects the third element
-
-animals[2:3]
-## selects the second and third element
-
-animals[c(1, 3)]
-## selects the first and third element
-```
-
-- relational operators -- return logical values indicating where a relation is
-  satisfied. 
The most commonly used relational operators for data analysis are as follows:
-  - `==` means "equal to"
-  - `!=` means "not equal to"
-  - `>` or `<` means "greater than" or "less than"
-  - `>=` or `<=` means "greater than or equal to" or "less than or equal to"
-
-```{r}
-animals == "dog"
-
-animals != "cat"
-
-numbers > 4
-
-numbers <= 12
-```
-
-- logical operators -- join subset criteria together
-  - `&` means "and" -- where two criteria must **both** be satisfied
-  - `|` means "or" -- where at least one criterion must be satisfied
-
-```{r}
-numbers > 4 & numbers < 20
-
-animals == "dog" | animals == "cat"
-```
-
-- `%in%` -- the "inclusion operator", allows you to test whether each element
-  of a search vector (on the left hand side) is found in the target vector (on
-  the right hand side).
-  - The levels of the target vector must be included in a vector (`c()`).
-
-```{r}
-possessions <- c("car", "bicycle", "radio", "television", "mobile_phone")
-
-possessions %in% c("car", "bicycle", "motorcycle")
-```
-
-## Missing Data
-
-- `is.na()` -- returns a vector of logical values indicating which elements of
-  a vector have `NA` values
-  - Often combined with `!`, where the `!` negates the previous statement (e.g.
-    `!TRUE` is equal to `FALSE`).
-
-```{r}
-missing <- c(1, 3, NA, 7, 12, NA)
-
-is.na(missing)
-
-!is.na(missing)
-```
-
-- `na.omit()` -- removes the observations with `NA` values
-
-```{r}
-na.omit(missing)
-```
-
-- `complete.cases()` -- returns a vector of logical values indicating which
-  elements of a vector **are not** missing (`NA`) values
-
-```{r}
-complete.cases(missing)
-```
-
-
diff --git a/instructors/starting-with-data-handout.Rmd b/instructors/starting-with-data-handout.Rmd
deleted file mode 100644
index fea985324..000000000
--- a/instructors/starting-with-data-handout.Rmd
+++ /dev/null
@@ -1,220 +0,0 @@
----
-title: Code Handout - Starting with Data
-output:
-  html_document:
-    df_print: paged
-    code_download: yes
----
-
-```{r, include=FALSE}
-knitr::opts_chunk$set(fig.width = 3, fig.height = 3, message = FALSE, warning = FALSE, eval = FALSE)
-```
-
-This document contains all of the functions that were covered in the
-*Introduction to R* workshop. Each function is presented alongside an example of
-how it can be used.
-
-All of the examples below are in the context of the Palmer Penguins, found
-[here (link)](https://allisonhorst.github.io/palmerpenguins/index.html).
-
-## Packages
-
-- `library()` -- loads packages into your `R` session
-
-```{r, message=FALSE, warning=FALSE}
-library(tidyverse)
-library(lubridate)
-```
-
-## Importing Data
-
-- `read_csv()` -- function to import a csv file.
-  - First argument is the path to the data, passed as a character
-    (inside quotations).
-  - You can specify what values should be considered missing, using the `na`
-    argument.
-
-```{r}
-penguins <- read_csv("data/penguins.csv")
-```
-
-## Inspecting Data
-
-- `dim()` - returns a vector with the number of rows as the first element,
-  and the number of columns as the second element (the **dim**ensions of
-  the object)
-
-```{r}
-dim(penguins)
-```
-
-- `nrow()` - returns the number of rows
-- `ncol()` - returns the number of columns
-
-```{r}
-nrow(penguins)
-ncol(penguins)
-```
-
-- `head()` - displays the first 6 rows of the dataframe
-- `tail()` - displays the last 6 rows of the dataframe
-
-```{r}
-head(penguins)
-tail(penguins)
-```
-
-- `names()` - returns the names of an object (for a dataframe, the column names)
-- `colnames()` - returns column names for dataframes (without row names)
-
-```{r}
-names(penguins)
-colnames(penguins)
-```
-
-- `glimpse()` - provides a preview of the data, where column names are presented
-  with their associated data types, and the entries from each column are printed
-  in each row
-
-```{r}
-glimpse(penguins)
-```
-
-- `str()` - returns the structure of the object and information about the class,
-  the names and data types of each column, and a preview of the first entries of
-  each column
-
-```{r}
-str(penguins)
-```
-
-- `summary()` - provides summary statistics for each column
-  - Note: summary statistics for character variables are not meaningful, as they
-    simply state the number of observations (length) of the variable
-
-```{r}
-summary(penguins)
-```
-
-## Subsetting Data
-
-- `[]` -- selects rows and columns from a dataframe
-  - The first entry is the row number, the second entry is the column number(s),
-    and they are separated with a comma.
-
-```{r}
-## Selects the element in the first row, second column
-penguins[1, 2]
-
-## Selects every element in the fourth row
-penguins[4, ]
-
-## Selects every element in the third column
-penguins[, 3]
-```
-
-- `[[]]` -- selects a column from a dataframe
-  - Inside the brackets you can pass either the number of the column or the
-    name of the column (in quotations)
-
-```{r}
-penguins[[1]]
-
-penguins[["island"]]
-```
-
-- `$` -- selects a column from a dataframe, where the name of the dataframe is
-  on the left and the name of the column is on the right
-
-```{r}
-penguins$body_mass_g
-```
-
-## Working with Different Data Types
-
-- `factor()` -- creates a categorical variable from a character or numeric
-  variable, variable has a factor datatype
-  - the levels of the factor are specified in the `levels`
-    argument, where the levels must be specified in a vector (using `c()`)
-  - Note: the order you wish for the levels to appear is how you should list
-    them in the `levels` argument, you can also specify `ordered = TRUE` to
-    ensure the levels remain in this order
-
-```{r}
-penguins$year_fct <- factor(penguins$year,
-                            levels = c("2007", "2008", "2009"),
-                            ordered = TRUE)
-```
-
-- `as.factor()` -- creates a categorical variable from a character or numeric
-  variable, variable has a factor datatype
-  - does not allow for you to specify the order of the levels
-  - defaults to alphabetical ordering for factor levels
-
-```{r}
-penguins$year_fct <- as.factor(penguins$year)
-```
-
-- `levels()` -- returns the levels of a variable with a factor datatype, in the
-  order they were stored
-  - Note: this function will not work for character datatypes
-
-```{r}
-levels(penguins$year_fct)
-```
-
-- `nlevels()` -- returns the number of levels of a variable with a factor
-  datatype
-  - Note: this function will not work for character datatypes
-
-```{r}
-nlevels(penguins$year_fct)
-```
-
-- `as.character()` -- creates a character variable from a numeric or 
factor
-  variable
-
-```{r}
-penguins$species_chr <- as.character(penguins$species)
-```
-
-- `ymd()` -- transforms dates stored as character or numeric variables to dates
-  - Note: to use this function, dates must be stored in year-month-day format
-  - The function does well with heterogeneous formats (as seen below), but
-    formats where some of the entries are not in double digits may not be parsed
-    correctly.
-
-```{r}
-x <- c("2009-01-01", "2009-01-02", "2009-01-03")
-ymd(x)
-```
-
-- `day()` -- extracts the day (number) of a date variable
-
-```{r}
-day(x)
-```
-
-- `month()` -- extracts the month (number) of a date variable
-
-```{r}
-month(x)
-```
-
-- `year()` -- extracts the year of a date variable
-
-```{r}
-year(x)
-```
-
-## Visualizing Data
-
-- `plot()` -- a generic function for plotting R objects
-  - In this lesson `plot()` was used to create bargraphs of categorical
-    variables.
-
-```{r}
-plot(penguins$species)
-```
-
-