-
Notifications
You must be signed in to change notification settings - Fork 0
/
arrange.Rmd
192 lines (151 loc) · 4.8 KB
/
arrange.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
---
title: ""
author: ""
date: ""
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, df_print="info_paged")
```
```{r, echo=FALSE}
info_paged_print <- function(x, options) {
tibble_info <- paste0("<div class=\"tibble-info\">A tibble: ", nrow(x), " x ", ncol(x), "</div>")
group_info <- paste0("<div class=\"group-info\">Groups: ",
paste0(group_vars(x), collapse = ", "),
" [", nrow(group_keys(x)), "]", "</div>")
if (dplyr::is_grouped_df(x)) {
tab_info <- paste0("<div class=\"info\">", tibble_info, " ", group_info, "</div>")
cat(tab_info)
} else {
cat(paste0("<div class=\"info\">", tibble_info, "</div>"))
}
knitr::asis_output(
rmarkdown:::paged_table_html(x, options = attr(x, "options")),
meta = list(
dependencies = rmarkdown:::html_dependency_pagedtable()
)
)
}
knitr::opts_hooks$set(df_print = function(options) {
if (options$df_print == "info_paged") {
options$render = info_paged_print
options$comment = ""
options$results = "asis"
}
options
})
```
```{css, echo=FALSE}
.tibble-info,
.group-info {
display: inline-block;
padding: 15px;
}
.info {
margin-top: 5px;
margin-bottom: 5px;
border: 1px solid #ccc;
border-radius: 4px;
font-weight: 600;
color: #999898;
}
```
```{r, message = FALSE, echo=FALSE}
library(dplyr)
library(readxl)
df <- read_excel(here::here("online_retail_II.xlsx"))
```
data-masking
# - *fundamentals*
`arrange()` reorders the rows of a data frame by the values of one or several columns.
In case of numerical ones, the default puts the smallest values first.
```{r}
df %>%
arrange(Quantity)
```
We need to use a minus (`-`) or to wrap the column in `desc()` if we are interested in seeing the largest ones on top.
```{r}
df %>%
arrange(-Quantity)
df %>%
arrange(desc(Quantity))
```
The documentation doesn't specify what happens in case of ties but we can assume that rows with a smaller row index come first, preserving their original row order then.
In case of character columns, the order is alphabetical as defined by the default "C" locale.
```{r}
df %>%
arrange(Description)
df %>%
arrange(desc(Description))
```
# - *.locale*
We can change it to the locale of our choice with the `.locale` argument.
```{r}
df %>%
arrange(Description, .locale = "en")
```
In case we want to arrange character columns by a custom order, it is possible by transforming them in ordered factors.
```{r}
df %>%
mutate(Description_Factor = factor(Description, levels = sample(unique(df$Description),
length(unique(df$Description))), ordered = TRUE)) %>%
arrange(Description_Factor)
```
`arrange()` is a data-masking function, so it accepts expressions as input.
```{r}
df %>%
arrange(Quantity ^ 2)
df %>%
arrange(min_rank(Quantity))
```
NAs are placed at the bottom, but we can have them on top by using this syntax (that exploits the data-masking nature of `arrange()` putting the FALSEs, that are equal to 0, first).
```{r}
df %>%
arrange(!is.na(Description))
```
If we use more columns, the subsequent ones will be used to resolve ties (notice how rows 2 and 3 swap place in the second line of code).
```{r}
df %>%
arrange(desc(Quantity))
df %>%
arrange(desc(Quantity), StockCode)
```
# - *with group_by()*
`arrange()` ignores grouping. In this way we can have rows from different groups at the top.
```{r}
df %>%
group_by(Description) %>%
arrange(Quantity)
```
In case we want to order by the grouping column as well, we need to add it, before or after the other columns as the situation requires.
```{r}
df %>%
group_by(Description) %>%
arrange(Quantity, Description)
df %>%
group_by(Description) %>%
arrange(Description, Quantity)
```
## - *.by_group*
Instead of specifying the grouping column, we can set the `.by_group` argument to TRUE.
This will use the grouping column as the first one specified, like in the previous example.
```{r}
df %>%
group_by(Description) %>%
arrange(Quantity, .by_group = TRUE)
```
## - *using expressions*
If we use an expression within `arrange()`, that will be evaluated on all the data frame and non per groups.
Therefore in the following example the verb doesn't modify the rows' order as `sum(Quantity)` returns the same value for all the rows.
```{r}
df %>%
group_by(Description) %>%
arrange(desc(sum(Quantity)))
```
So we need to be explicit with a `mutate()` call and use the resulting additional column to arrange by.
```{r}
df %>%
group_by(Description) %>%
mutate(Total_Quantity = sum(Quantity)) %>%
arrange(desc(Total_Quantity))
```