---
title:
output: github_document
---
```{r setup, echo = FALSE}
knitr::opts_chunk$set(
warning = FALSE,
echo = TRUE,
collapse = TRUE,
message = TRUE,
comment = "#>",
fig.path = "man/figures/README-"
)
```
<!-- badges: start -->
[![R-CMD-check](https://github.com/NicChr/timeplyr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/NicChr/timeplyr/actions/workflows/R-CMD-check.yaml)
[![CRAN status](https://www.r-pkg.org/badges/version/timeplyr)](https://CRAN.R-project.org/package=timeplyr)
<!-- badges: end -->
# timeplyr
# **Fast Tidy Tools for Date and Datetime Manipulation**
This package provides a set of functions to make working with date and datetime data much easier!
While most time-based packages are designed to work
with clean, pre-aggregated data, timeplyr contains a set of tidy tools to
complete, expand and summarise both raw and aggregated date/datetime data.
Significant efforts have been made to ensure that grouped calculations are
fast and efficient thanks to the excellent functionality within the
[collapse](https://sebkrantz.github.io/collapse/reference/collapse-package.html) package.
## Installation
You can install and load `timeplyr` using the code below.
```{r gh-installation, message = FALSE, eval = FALSE}
# CRAN version
install.packages("timeplyr")
# Development version
remotes::install_github("NicChr/timeplyr")
```
```{r package_load}
library(timeplyr)
```
# Basic examples
## Convert `ts`, `mts`, `xts`, `zoo` and `timeSeries` objects using `ts_as_tibble`
```{r}
library(tidyverse)
eu_stock <- EuStockMarkets %>%
ts_as_tibble()
eu_stock
```
## Easily plot time series using `time_ggplot`
```{r}
eu_stock %>%
time_ggplot(time, value, group)
```
For the next examples we use flights departing from New York City in 2013.
```{r}
library(nycflights13)
library(lubridate)
flights <- flights %>%
mutate(date = as_date(time_hour))
```
```{r, echo=FALSE}
theme_set(theme_minimal())
```
## `time_by`
### Group your time variable by any time unit
```{r}
flights_monthly <- flights %>%
select(date, arr_delay) %>%
time_by(date, "month")
flights_monthly
```
We can then use this to create a monthly summary of the number of flights
and the average arrival delay.
```{r}
flights_monthly %>%
summarise(n = n(),
mean_arr_delay = mean(arr_delay, na.rm = TRUE))
```
If the time unit is left unspecified, the `time` functions try to
find the highest time unit possible.
```{r}
flights %>%
time_by(time_hour)
```
## `time_complete()`
### Complete missing gaps in time
```{r}
flights %>%
count(time_hour) %>%
time_complete(time_hour)
```
### We can also make use of timeplyr time intervals
```{r}
quarters <- time_aggregate(flights$date, time_by = "quarter", as_interval = TRUE)
interval_count(quarters)
# Or simply
flights %>%
time_by(date, time_by = "quarter", as_interval = TRUE) %>%
count()
```
#### Ensure full weeks by setting `from` to the start of the week
```{r}
start <- dmy("17-Jan-2013")
flights %>%
time_by(date, "week",
from = floor_date(start, unit = "week")) %>%
count()
```
#### Check for gaps in time
```{r missing_dates}
missing_dates(flights$date) # No missing dates
```
```{r}
time_num_gaps(flights$time_hour, time_by = "hours") # Missing hours
```
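
As a rough illustration of what checking for gaps involves, the base R sketch below builds the full daily sequence between the first and last observation and lists the dates that never appear. The toy vector `obs_days` is made up for this example, and the approach is only an assumption about the general idea, not a description of how `missing_dates()` or `time_num_gaps()` are implemented.

```{r}
# Toy data (hypothetical): five observed days with a two-day hole
obs_days <- as.Date("2013-01-01") + c(0, 1, 2, 5, 6)

# Every day between the first and last observation
full_days <- seq(min(obs_days), max(obs_days), by = "day")

# Days in the full sequence that were never observed
missing_days <- full_days[!full_days %in% obs_days]
missing_days          # "2013-01-04" "2013-01-05"
length(missing_days)  # 2
```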
To check for regularity, use `time_is_regular()`.
```{r}
hours <- sort(flights$time_hour)
time_is_regular(hours, time_by = "hours")
time_is_regular(hours, time_by = "hours", allow_gaps = FALSE)
time_is_regular(hours, time_by = "hours", allow_dups = FALSE)
# By-group
time_num_gaps(flights$time_hour, g = flights$origin, time_by = "hours")
time_is_regular(flights$time_hour, g = flights$origin, time_by = "hours")
```
## `time_expand()`
Here we create monthly sequences for each destination
that account for the start and end dates of each destination.
```{r}
flights %>%
group_by(dest) %>%
time_expand(date, time_by = "month") %>%
summarise(n = n(), start = min(date), end = max(date))
```
To create the same grid of months for each dest, we can do the following:
```{r}
flights %>%
time_expand(date, dest, time_by = "month") %>%
summarise(n = n(), start = min(date), end = max(date), .by = dest)
```
The ability to create time sequences by group is
one of the most powerful features of timeplyr.
```{r}
flights %>%
time_by(date, "month", as_interval = TRUE) %>%
summarise(across(c(arr_time, dep_time), ~ mean(.x, na.rm = TRUE)))
```
# Grouped rolling time functions
## By-group rolling mean over the last 3 calendar months
```{r}
eu_stock <- eu_stock %>%
mutate(date = date_decimal(time))
eu_stock %>%
mutate(month_mean = time_roll_mean(value, window = months(3),
time = date,
g = group)) %>%
time_ggplot(date, month_mean, group)
```
## By-group rolling (locf) NA fill
```{r}
# Prerequisite: create a time series with missing values
x <- ts(c(NA, 3, 4, NA, 6, NA, NA, 8))
g <- cheapr::seq_id(c(3, 5)) # Two groups of size 3 + 5
.roll_na_fill(x) # Simple locf fill
roll_na_fill(x, fill_limit = 1) # Fill up to 1 NA
roll_na_fill(x, g = g) # Very efficient on large data too
```
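
For readers unfamiliar with "last observation carried forward" (locf), here is a minimal base R sketch of the idea for a single ungrouped vector. The helper `locf_sketch()` is hypothetical and written only for illustration; it has none of the grouping or `fill_limit` behaviour shown above and makes no claim about how `roll_na_fill()` works internally.

```{r}
# Hypothetical helper: carry the last non-missing value forward
locf_sketch <- function(v) {
  # Running count of non-missing values seen so far (0 before the first one)
  idx <- cumsum(!is.na(v))
  # Prepend NA so that leading missing values stay missing
  c(NA, v[!is.na(v)])[idx + 1L]
}

locf_sketch(c(NA, 3, 4, NA, 6, NA, NA, 8)) # NA 3 4 4 6 6 6 8
```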
## `year_month` and `year_quarter`
timeplyr has its own lightweight 'yearmonth' and 'yearquarter' classes
inspired by the excellent 'zoo' and 'tsibble' packages.
```{r}
today <- today()
year_month(today)
```
The underlying data for a `year_month` is the number of months
since 1 January 1970 (epoch).
```{r}
unclass(year_month("1970-01-01"))
unclass(year_month("1971-01-01"))
```
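
The "months since epoch" arithmetic can be checked by hand with base R. The helper `months_since_epoch()` below is hypothetical and exists only to show the calculation; it is a sketch of the representation described above, not timeplyr's own code.

```{r}
# Hypothetical helper: whole months between 1970-01 and the month of x
months_since_epoch <- function(x) {
  lt <- as.POSIXlt(as.Date(x))
  # POSIXlt stores years since 1900 and zero-based months
  (lt$year - 70L) * 12L + lt$mon
}

months_since_epoch("1970-01-01") # 0
months_since_epoch("1971-01-01") # 12
months_since_epoch("2013-03-15") # 43 * 12 + 2 = 518
```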
To create a sequence of 'year_months', one can use base arithmetic.
```{r}
year_month(today) + 0:12
year_quarter(today) + 0:4
```
## `time_elapsed()`
Let's look at the time between consecutive flights for a specific flight number.
```{r}
set.seed(42)
flight_201 <- flights %>%
distinct(time_hour, flight) %>%
filter(flight %in% sample(flight, size = 1)) %>%
arrange(time_hour)
tail(sort(table(time_elapsed(flight_201$time_hour, "hours"))))
```
Flight 201 seems to depart mostly consistently every 24 hours.

We can efficiently do the same for all flight numbers.
```{r}
# We use fdistinct with sort as it's much faster and simpler to write
all_flights <- flights %>%
fdistinct(flight, time_hour, sort = TRUE)
all_flights <- all_flights %>%
mutate(elapsed = time_elapsed(time_hour, g = flight, fill = 0))
# Flight numbers with largest relative deviation in time between flights
all_flights %>%
q_summarise(elapsed, .by = flight) %>%
mutate(relative_iqr = p75 / p25) %>%
arrange(desc(relative_iqr))
```
`time_seq_id()` allows us to create unique IDs for regular sequences.
A new ID is created every time there is a gap in the sequence.
```{r}
flights %>%
select(time_hour) %>%
arrange(time_hour) %>%
mutate(time_id = time_seq_id(time_hour)) %>%
filter(time_id != lag(time_id)) %>%
count(hour(time_hour))
```
We can see that the gaps typically occur at 11pm and the sequence
resumes at 5am.
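
The idea of starting a new ID at every gap can be sketched in base R with `diff()` and `cumsum()`. The toy vector `hrs_demo` and the fixed one-hour step are assumptions made for this example; this is not presented as the method `time_seq_id()` uses.

```{r}
# Toy hourly datetimes (hypothetical) with gaps after the 3rd and 5th values
hrs_demo <- as.POSIXct("2013-01-01 00:00:00", tz = "UTC") +
  3600 * c(0, 1, 2, 5, 6, 9)

# TRUE wherever the step from the previous value exceeds one hour;
# the first element always starts a new run
new_run <- c(TRUE, diff(as.numeric(hrs_demo)) > 3600)

# The cumulative sum turns the run starts into sequence IDs
gap_id <- cumsum(new_run)
gap_id # 1 1 1 2 2 3
```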
### Other convenience functions are included below
## `calendar()`
#### Easily join common date information to your data
```{r}
flights_calendar <- flights %>%
select(time_hour) %>%
reframe(calendar(time_hour))
```
Now that we have joined our date table,
it is easy to count by any time dimension we like.
```{r}
flights_calendar %>%
fcount(isoyear, isoweek)
flights_calendar %>%
fcount(isoweek = iso_week(time))
flights_calendar %>%
fcount(month_l)
```
## `.time_units`
See a list of available time units
```{r}
.time_units
```
## `age_years()`
Calculate ages (years) accurately
```{r}
age_years(dmy("28-02-2000"))
```
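
To make "accurately" concrete: age in completed years should only increase on the birthday itself, which dividing a day count by 365.25 does not guarantee. The base R helper `age_sketch()` below is hypothetical and shows one simple way to get that behaviour against a fixed reference date; it is not claimed to match the internals of `age_years()`.

```{r}
# Hypothetical helper: completed years between dob and a reference date
age_sketch <- function(dob, ref) {
  dob <- as.POSIXlt(as.Date(dob))
  ref <- as.POSIXlt(as.Date(ref))
  # Difference in calendar years, minus one if the birthday has not yet occurred
  before_birthday <- (ref$mon < dob$mon) |
    (ref$mon == dob$mon & ref$mday < dob$mday)
  (ref$year - dob$year) - as.integer(before_birthday)
}

age_sketch("2000-02-28", "2024-02-27") # 23, the day before the birthday
age_sketch("2000-02-28", "2024-02-28") # 24, on the birthday
```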
## `time_seq()`
A lubridate version of `seq()` for dates and datetimes
```{r}
start <- dmy(31012020)
end <- start + years(1)
seq(start, end, by = "month") # Base R version
time_seq(start, end, time_by = "month") # lubridate version
```
`time_seq()` doesn't mind mixing dates and datetimes
```{r}
time_seq(start, as_datetime(end), time_by = "2 weeks")
```
## `time_seq_v()`
A vectorised version of `time_seq()`.
Currently it is vectorised over `from`, `to` and `time_by`.
```{r}
# 3 sequences
time_seq_v(from = start,
to = end,
time_by = list("months" = 1:3))
# Equivalent to
c(time_seq(start, end, time_by = "month"),
time_seq(start, end, time_by = "2 months"),
time_seq(start, end, time_by = "3 months"))
```
## `time_seq_sizes()`
Vectorised function that calculates time sequence lengths
```{r}
seq_lengths <- time_seq_sizes(start, start + days(c(1, 10, 20)),
time_by = list("days" = c(1, 5, 10)))
seq_lengths
# Use time_seq_v2() if you know the sequence lengths
seqs <- time_seq_v2(seq_lengths, start, time_by = list("days" = c(1, 5, 10)))
seqs
```
Dealing with impossible dates and datetimes is very simple.
```{r}
time_seq(start, end, time_by = "month", roll_month = "postday") # roll impossible months forward
time_seq(start, end, time_by = "month", roll_month = "NA") # no roll
time_seq(start, end, time_by = dmonths(1)) # lubridate version with durations
```
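
An "impossible" date arises when adding a calendar month lands on a day that does not exist, such as 31 February. The sketch below uses only lubridate to show the three usual outcomes for a single date; the variable name `jan31` is made up for this example, and the mapping to the `roll_month` options above is left as an assumption rather than stated as fact.

```{r}
library(lubridate)

jan31 <- ymd("2020-01-31")

# Plain period arithmetic gives NA because 31 February does not exist
jan31 + months(1)

# %m+% rolls back to the last valid day of the month: "2020-02-29"
jan31 %m+% months(1)

# Rolling forward to the first day of the next month instead: "2020-03-01"
add_with_rollback(jan31, months(1), roll_to_first = TRUE)
```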
## `iso_week()`
Simple function to get formatted ISO weeks.
```{r}
iso_week(today())
iso_week(today(), day = TRUE)
iso_week(today(), year = FALSE)
```
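
For comparison, base R can build an ISO 8601 year-week label with `format()`, using `%G` for the ISO year and `%V` for the ISO week. The label layout chosen here is an assumption for illustration and may differ from what `iso_week()` returns. The example also shows why the ISO year matters: 31 December 2012 already belongs to the first ISO week of 2013.

```{r}
# Dates around a year boundary (hypothetical example values)
iso_demo <- as.Date(c("2012-12-30", "2012-12-31", "2013-01-01"))

# %G is the ISO 8601 week-based year, %V the ISO 8601 week number
iso_lab <- format(iso_demo, "%G-W%V")
iso_lab # "2012-W52" "2013-W01" "2013-W01"
```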
## `time_cut()`
Create pretty time axes using `time_breaks()`
```{r}
times <- flights$time_hour
dates <- flights$date
date_breaks <- time_breaks(dates, n = 12)
time_breaks <- time_breaks(times, n = 12, time_floor = TRUE)
weekly_data <- flights %>%
time_by(time = date, time_by = "week",
to = max(time_span(date, time_by = "week")),
.name = "date") %>%
count()
weekly_data %>%
ggplot(aes(x = interval_start(date), y = n)) +
geom_bar(stat = "identity", fill = "#0072B2") +
scale_x_date(breaks = date_breaks, labels = scales::label_date_short())
flights %>%
ggplot(aes(x = time_hour)) +
geom_bar(fill = "#0072B2") +
scale_x_datetime(breaks = time_breaks, labels = scales::label_date_short())
```