---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-"
)
```

# forcats <img src='man/figures/logo.png' align="right" height="139" />

<!-- badges: start -->
[![CRAN status](https://www.r-pkg.org/badges/version/forcats)](https://cran.r-project.org/package=forcats)
[![R-CMD-check](https://github.com/tidyverse/forcats/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidyverse/forcats/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/tidyverse/forcats/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidyverse/forcats?branch=main)
<!-- badges: end -->

## Overview

R uses __factors__ to handle categorical variables, variables that have a fixed and known set of possible values. Factors are also helpful for reordering character vectors to improve display. The goal of the __forcats__ package is to provide a suite of tools that solve common problems with factors, including changing the order of levels or the values. Some examples include:

* `fct_reorder()`: Reordering a factor by another variable.
* `fct_infreq()`: Reordering a factor by the frequency of values.
* `fct_relevel()`: Changing the order of a factor by hand.
* `fct_lump()`: Collapsing the least/most frequent values of a factor into "other".

You can learn more about each of these in `vignette("forcats")`. If you're new to factors, the best place to start is the [chapter on factors](https://r4ds.hadley.nz/factors.html) in R for Data Science.
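
As a rough illustration of two of the reordering helpers listed above, here is a minimal sketch of `fct_relevel()` and `fct_reorder()` on small made-up vectors. The data (`size`, `fruit`, `weight`) is hypothetical, and the sketch relies on `fct_reorder()` summarising the second variable with `median()` by default.

```r
library(forcats)

# Hypothetical example data; base R sorts factor levels alphabetically
size <- factor(c("medium", "small", "large", "small"))
levels(size)  # "large" "medium" "small"

# fct_relevel(): set the level order by hand; the values themselves are unchanged
size_by_hand <- fct_relevel(size, "small", "medium", "large")
levels(size_by_hand)  # "small" "medium" "large"

# fct_reorder(): order the levels of `fruit` by the median of `weight`
# within each level (median() is the default summary function)
fruit  <- factor(c("apple", "banana", "cherry", "apple", "banana", "cherry"))
weight <- c(150, 120, 5, 170, 130, 7)
fruit_by_weight <- fct_reorder(fruit, weight)
levels(fruit_by_weight)  # "cherry" "banana" "apple" (medians 6, 125, 160)
```

Passing `.desc = TRUE` to `fct_reorder()` reverses the order, which is often what you want when sorting the bars of a chart from largest to smallest.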

## Installation

```r
# The easiest way to get forcats is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just forcats:
install.packages("forcats")

# Or the development version from GitHub:
# install.packages("pak")
pak::pak("tidyverse/forcats")
```

## Cheatsheet

<a href="https://raw.githubusercontent.com/rstudio/cheatsheets/main/factors.pdf"><img src="https://github.com/rstudio/cheatsheets/raw/main/pngs/thumbnails/forcats-cheatsheet-thumbs.png" width="320" height="252"/></a>

## Getting started

forcats is part of the core tidyverse, so you can load it with `library(tidyverse)` or `library(forcats)`. The examples below use the `starwars` dataset that ships with dplyr.

```{r setup, message = FALSE}
library(forcats)
library(dplyr)
library(ggplot2)
```

```{r}
starwars %>%
  filter(!is.na(species)) %>%
  count(species, sort = TRUE)
```

```{r}
starwars %>%
  filter(!is.na(species)) %>%
  mutate(species = fct_lump(species, n = 3)) %>%
  count(species)
```
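
`fct_lump()` chooses its lumping rule from the arguments you supply. Recent versions of forcats also export more explicit variants such as `fct_lump_n()`, `fct_lump_prop()` and `fct_lump_min()`. A minimal sketch of `fct_lump_n()`, using a hypothetical toy factor `x` instead of `starwars`:

```r
library(forcats)

# Hypothetical toy factor: "a" appears 3 times, "b" twice, "c" and "d" once each
x <- factor(c("a", "a", "a", "b", "b", "c", "d"))

# Keep the 2 most frequent levels; everything else is collapsed into "Other"
lumped <- fct_lump_n(x, n = 2)
levels(lumped)  # "a" "b" "Other"
table(lumped)

# The name of the catch-all level can be changed with `other_level`
levels(fct_lump_n(x, n = 2, other_level = "rare"))  # "a" "b" "rare"
```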

```{r unordered-plot}
ggplot(starwars, aes(x = eye_color)) +
  geom_bar() +
  coord_flip()
```

```{r ordered-plot}
starwars %>%
  mutate(eye_color = fct_infreq(eye_color)) %>%
  ggplot(aes(x = eye_color)) +
  geom_bar() +
  coord_flip()
```
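
Because `coord_flip()` draws the first level at the bottom of the axis, the most frequent eye colour in the ordered plot above ends up at the bottom. One common adjustment, sketched here on a hypothetical vector `eyes` rather than on `starwars`, is to wrap `fct_infreq()` in `fct_rev()` so that the most frequent value is drawn at the top:

```r
library(forcats)

# Hypothetical eye colours: "brown" x3, "blue" x2, "black" x1
eyes <- factor(c("blue", "brown", "brown", "black", "brown", "blue"))

# fct_infreq(): levels ordered from most to least frequent
by_freq <- fct_infreq(eyes)
levels(by_freq)  # "brown" "blue" "black"

# fct_rev(): reverse the level order, so the most frequent level comes last
levels(fct_rev(by_freq))  # "black" "blue" "brown"
```

In the plotting pipeline this would mean using `mutate(eye_color = fct_rev(fct_infreq(eye_color)))` as the `mutate()` step.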

## More resources

For a history of factors, I recommend [_stringsAsFactors: An unauthorized biography_](https://simplystats.github.io/2015/07/24/stringsasfactors-an-unauthorized-biography/) by Roger Peng and [_stringsAsFactors = \<sigh\>_](https://notstatschat.tumblr.com/post/124987394001/stringsasfactors-sigh) by Thomas Lumley. If you want to learn more about other approaches to working with factors and categorical data, I recommend [_Wrangling categorical data in R_](https://peerj.com/preprints/3163/), by Amelia McNamara and Nicholas Horton.

## Getting help

If you encounter a clear bug, please file a minimal reproducible example on [GitHub](https://github.com/tidyverse/forcats/issues). For questions and other discussion, please use [community.rstudio.com](https://community.rstudio.com/).