-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
382 lines (274 loc) · 14 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
---
title: "arcstats"
output:
github_document:
toc: true
toc_depth: 3
always_allow_html: true
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
echo = TRUE,
cache = FALSE,
message = FALSE
)
```
<!-- badges: start -->
<!-- badges: end -->
The `arcstats` package allows to generate summaries of the minute-level physical activity (PA) data. The default parameters are chosen for the Actigraph activity counts collected with a wrist-worn device; however, the package can be used for all minute-level PA data with the corresponding timestamps vector.
Below, we demonstrate the use of `arcstats` with the attached, exemplary minute-level Actigraph PA counts data.
## Installation
You can install the released version of arcstats from [GitHub](https://github.com/). Note you may need to install `devtools` package if not yet installed (the line commented below).
``` r
# install.packages("devtools")
devtools::install_github("martakarass/arcstats")
```
## Documentation
A PDF with detailed documentation of all methods can be accessed [here](https://www.dropbox.com/s/cfnud6zxntub7a7/arcstats_0.1.1.pdf?dl=1).
# Using `arcstats` package to compute physical activity summaries
### Reading PA data
Four CSV data sets with minute-level activity counts data are attached to the `arcstats` package. The data file names are stored in `extdata_fnames`.
```{r}
library(arcstats)
library(data.table)
library(dplyr)
library(lubridate)
library(ggplot2)
## Read one of the data sets
fpath <- system.file("extdata", extdata_fnames[1], package = "arcstats")
dat <- as.data.frame(fread(fpath))
rbind(head(dat, 3), tail(dat, 3))
```
The data columns are:
- `Axis1` - sensor's X axis minute-level counts data,
- `Axis2` - sensor's Y axis minute-level counts data,
- `Axis3` - sensor's Z axis minute-level counts data,
- `vectormagnitude` - minute-level counts data defined as `sqrt(Axis1^2 + Axis2^2 + Axis3^2)`,
- `timestamp` - time-stamps corresponding to minute-level measures.
```{r, fig.width=8, fig.height=3.5}
## Plot activity counts
ggplot(dat, aes(x = ymd_hms(timestamp), y = vectormagnitude)) +
geom_line(size = 0.3, alpha = 0.8) +
labs(x = "Time", y = "Activity counts") +
theme_gray(base_size = 10) +
scale_x_datetime(date_breaks = "1 day", date_labels = "%b %d")
```
### Computing summaries with `activity_stats` method
```{r, results='hide'}
acc <- dat$vectormagnitude
acc_ts <- ymd_hms(dat$timestamp)
activity_stats(acc, acc_ts)
```
```{r, echo = FALSE}
library(knitr)
library(kableExtra)
tbl_styl_f <- function(x) kable_styling(x, font_size = 14)
```
```{r, echo = FALSE}
out <- activity_stats(acc, acc_ts)
# tbl_1 <- kable(out[, 1:6], table.attr = 'class="myTable"') %>% tbl_styl_f(.)
tbl_1 <- kable(out[, 1:6]) %>% tbl_styl_f(.)
tbl_2 <- kable(out[, 7:10]) %>% tbl_styl_f(.)
tbl_3 <- kable(out[, 11:14]) %>% tbl_styl_f(.)
tbl_1; tbl_2; tbl_3
```
### Output explained
To explain `activity_stats` method output, we define activity count, active/non-active minute, and active/non-active bout.
- Activity count (AC) - a minute-level metric of PA volume
- Active minute - AC labeled as active (non-sedentary). For Wrist-worn Actigraph
we use AC>=1853 (defaut threshold).
- Active bout - sequence of >=1 consecutive active minute(s).
- Valid day - a day with less than 10% of the non-wear time (see `?get_wear_flag` for wear/non-wear time detection details).
Meta information:
- `n_days` - number of days (unique day dates) of data collection.
- `n_valid_days` - number of days (unique day dates) of data collection determined as valid days.
- `wear_time_on_valid_days` - average number of wear-time minutes across valid days.
Summaries of PA volumes metrics:
- `tac` - TAC, Total activity counts per day - sum of AC measured on valid days divided by the number of valid days.
- `tlac` - TLAC, Total-log activity counts per day - sum of log(1+AC) measured on valid days divided by the number of valid days. Here 'log' denotes the natural logarithm.
- `ltac` - LTAC, Log-total activity counts - natural logarithm of TAC.
- `time_spent_active` - Average number of active minutes per valid day.
- `time_spent_nonactive` - Average number of sedentary minutes per valid day.
Summaries of PA fragmentation metrics:
- `astp` - ASTP, active to sedentary transition probability on valid days.
- `satp` - SATP, sedentary to active transition probability on valid days.
- `no_of_active_bouts` - Average number of active minutes per valid day.
- `no_of_nonactive_bouts` - Average number of sedentary minutes per valid day.
- `mean_active_bout` - Average duration (in minutes) of an active bout on valid days.
- `mean_nonactive_bout` - Average duration (in minutes) of a sedentary bout on valid days.
# Additional`activity_stats` method options
### Summarizing PA only in a chosen time-of-day subset
The `subset_minutes` argument allows to specify subset of a day's minutes where activity summaries should be computed. There are 1440 minutes in a 24-hour day where `1` denotes 1st minute of the day (from 00:00 to 00:01), and `1440` denotes the last minute (from 23:59 to 00:00).
Here, we summarize PA observed between 12:00 AM and 6:00 AM.
```{r, results = 'hide'}
subset_12am_6am <- 1 : (6 * 1440/24)
activity_stats(acc, acc_ts, subset_minutes = subset_12am_6am)
```
```{r, echo = FALSE}
out <- activity_stats(acc, acc_ts, subset_minutes = subset_12am_6am)
tbl_1 <- kable(out[, 1:6]) %>% tbl_styl_f(.)
tbl_2 <- kable(out[, 7:10]) %>% tbl_styl_f(.)
tbl_3 <- kable(out[, 11:14]) %>% tbl_styl_f(.)
tbl_1; tbl_2; tbl_3
```
By default, column names have a suffix added to denote that a subset of minutes was used (here, `_0to6only`). This can be disabled by setting `adjust_out_colnames` to `FALSE`.
```{r, results = 'hide'}
subset_12am_6am = 1 : (6/24 * 1440)
subset_6am_12pm = (6/24 * 1440 + 1) : (12/24 * 1440)
subset_12pm_6pm = (12/24 * 1440 + 1) : (18/24 * 1440)
subset_6pm_12am = (18/24 * 1440 + 1) : (24/24 * 1440)
out <- rbind(
activity_stats(acc, acc_ts, subset_minutes = subset_12am_6am, adjust_out_colnames = FALSE),
activity_stats(acc, acc_ts, subset_minutes = subset_6am_12pm, adjust_out_colnames = FALSE),
activity_stats(acc, acc_ts, subset_minutes = subset_12pm_6pm, adjust_out_colnames = FALSE),
activity_stats(acc, acc_ts, subset_minutes = subset_6pm_12am, adjust_out_colnames = FALSE))
rownames(out) <- c("12am-6am", "6am-12pm", "12pm-6pm", "6pm-12am")
out
```
```{r, echo = FALSE}
tbl_1 <- kable(out[, 1:6]) %>% tbl_styl_f(.)
tbl_2 <- kable(out[, 7:10]) %>% tbl_styl_f(.)
tbl_3 <- kable(out[, 11:14]) %>% tbl_styl_f(.)
tbl_1; tbl_2; tbl_3
```
### Summarizing PA with a chosen time-of-day excluded
The `exclude_minutes` argument allows to specify subset of a day's minutes excluded for computing activity summaries.
Here, we summarize PA while excluding observations between 11:00 PM and 5:00 AM.
```{r, results = 'hide'}
subset_11pm_5am <- c(
(23 * 1440/24 + 1) : 1440, ## 11:00 PM - midnight
1 : (5 * 1440/24) ## midnight - 5:00 AM
)
activity_stats(acc, acc_ts, exclude_minutes = subset_11pm_5am)
```
```{r, echo = FALSE}
out <- activity_stats(acc, acc_ts, exclude_minutes = subset_11pm_5am)
tbl_1 <- kable(out[, 1:6]) %>% tbl_styl_f(.)
tbl_2 <- kable(out[, 7:10]) %>% tbl_styl_f(.)
tbl_3 <- kable(out[, 11:14]) %>% tbl_styl_f(.)
tbl_1; tbl_2; tbl_3
```
### Summarizing PA excluding in-bed time
The `in_bed_time` and `out_bed_time` arguments allow to provide day-specific in-bed periods to be excluded from analysis.
Here, we summarize PA excluding in-bed time estimated by ActiLife software.
##### ActiLife-estimated in-bed data
The ActiLife-estimated in-bed data file is attached to the `arcstats` package. The sleep data columns relevant are,
- `Subject Name` - subject IDs coressponding to AC data, stored in `extdata_fnames`,
- `In Bed Time` - ActiLife-estimated start of in-bed interval for each day of the measurement,
- `Out Bed Time` - ActiLife-estimated end of in-bed interval. </br>
```{r}
## Read sleep details data file
SleepDetails_fname <- "BatchSleepExportDetails_2020-05-01_14-00-46.csv"
SleepDetails_fpath <- system.file("extdata", SleepDetails_fname, package = "arcstats")
SleepDetails <- as.data.frame(fread(SleepDetails_fpath))
## Filter sleep details data to keep data correcponding to current counts data file
SleepDetails_sub <-
SleepDetails %>%
filter(`Subject Name` == "ID_1") %>%
mutate(`In Bed Time` = mdy_hms(`In Bed Time`),
`Out Bed Time` = mdy_hms(`Out Bed Time`)) %>%
select(`Subject Name`, `In Bed Time`, `Out Bed Time`)
SleepDetails_sub
```
Finally, we use in/out-bed time `POSIXct` vectors in `activity_stats` method.
```{r, results = 'hide'}
in_bed_time <- SleepDetails_sub[, "In Bed Time"]
out_bed_time <- SleepDetails_sub[, "Out Bed Time"]
activity_stats(acc, acc_ts, in_bed_time = in_bed_time, out_bed_time = out_bed_time)
```
```{r, echo = FALSE}
out <- activity_stats(acc, acc_ts, in_bed_time = in_bed_time, out_bed_time = out_bed_time)
tbl_1 <- kable(out[, 1:6]) %>% tbl_styl_f(.)
tbl_2 <- kable(out[, 7:10]) %>% tbl_styl_f(.)
tbl_3 <- kable(out[, 11:14]) %>% tbl_styl_f(.)
tbl_1; tbl_2; tbl_3
```
# Components of `activity_stats` method
The primary method `activity_stats` is composed of several steps implemented in their respective functions. Below, we demonstrate how to produce `activity_stats` results step by step with these functions.
We reuse the objects:
- `acc` - numeric vector; minute-level activity counts data,
- `acc_ts` - POSIXct vector; minute-level time of `acc` data collection.
```{r}
df <- data.frame(acc = acc, acc_ts = acc_ts)
rbind(head(df, 3), tail(df, 3))
```
### Expand the length of minute-level AC vector to integer number of full 24-hour periods by NA-padding with `midnight_to_midnight`
- In the returned vector, the first observation corresponds to minute of `00:00-00:01` on the first day of data collection, and last observation corresponds to minute of `23:50-00:00` on the last day of data collection.
- Entries corresponding to non-measured minutes are filled with `NA`.
Here, collected data cover total of `7*24*1440 = 10080` minutes (from `2018-07-13 10:00:00` to `2018-07-20 09:59:00`), but spans `8*24*1440 = 11520` minutes of full midnight-to-midnight days (from `2018-07-13 00:00:00` to `2018-07-20 23:59:00`).
```{r}
acc <- midnight_to_midnight(acc = acc, acc_ts = acc_ts)
## Vector length on non NA-obs, vector length after acc
c(length(acc[!is.na(acc)]), length(acc))
```
### Get wear/non-wear flag with `get_wear_flag`
Function `get_wear_flag` computes wear/non-wear flag (`1/0`) for each minute of activity counts data. Method implements wear/non-wear detection algorithm proposed by Choi et al. (2011).
- The returned vector has value `1` for wear and `0` for non-wear flagged minute.
- If there is an `NA` entry in a data input vector, then the returned vector will have a corresponding entry set to `NA` too.
```{r}
wear_flag <- get_wear_flag(acc)
## Proportion of wear time across the days
wear_flag_mat <- matrix(wear_flag, ncol = 1440, byrow = TRUE)
round(apply(wear_flag_mat, 1, sum, na.rm = TRUE) / 1440, 3)
```
### Get valid/non-valid day flag with `get_valid_day_flag`
Function `get_valid_day_flag` computes valid/non-valid day flag (`1/0`) for each minute of activity counts data.
Here, 4 out of 8 days have more that 10% (144 minutes) of missing data.
```{r}
valid_day_flag <- get_valid_day_flag(wear_flag)
## Compute number of valid days
valid_day_flag_mat <- matrix(valid_day_flag, ncol = 1440, byrow = TRUE)
apply(valid_day_flag_mat, 1, mean, na.rm = TRUE)
```
### Impute missing data with `impute_missing_data`
Here, all four valid days have 100% of wear time. We hence demonstrate the `impute_missing_data` method by artificially replacing 1h (4%) of a valid day with non-wear and running `impute_missing_data` function then.
Function `impute_missing_data` imputes missing data from the "average day profile". An "average day profile" is computed as a minute-wise average of AC across valid days.
```{r}
## Copies of original objects for the purpose of demonstration
acc_cpy <- acc
wear_flag_cpy <- wear_flag
## Artificially replace 1h (4%) of a valid day with non-wear
repl_idx <- seq(from = 1441, by = 1, length.out = 60)
acc_cpy[repl_idx] <- 0
wear_flag_cpy[repl_idx] <- 0
## Impute data for minutes identified as non-wear in days identified as valid
acc_cpy_imputed <- impute_missing_data(acc_cpy, wear_flag_cpy, valid_day_flag)
## Compare mean activity count on valid days before and after imputation
c(mean(acc_cpy[which(valid_day_flag == 1)]),
mean(acc_cpy_imputed[which(valid_day_flag == 1)]))
```
### Create PA characteristics with `summarize_PA`
Finally, method `summarize_PA` computes physical all PA characteristics.
- Same as `activity_stats`, it returns a data frame with physical activity summaries.
- Same as `activity_stats`, it accepts arguments to compute physical activity summaries
- within a fixed subset of minutes (arg. `subset_minutes`),
- excluding a fixed subset of minutes (arg. `exclude_minutes`),
- excluding day-specific in-bed intervals (args `in_bed_time`, `out_bed_time`).
See `?summarize_PA` for more details.
Here, we summarize PA with the default parameters, across all valid days.
```{r, results = 'hide'}
summarize_PA(acc, acc_ts, wear_flag, valid_day_flag)
```
```{r, echo = FALSE}
out <- summarize_PA(acc, acc_ts, wear_flag, valid_day_flag)
tbl_1 <- kable(out[, 1:6]) %>% tbl_styl_f(.)
tbl_2 <- kable(out[, 7:10]) %>% tbl_styl_f(.)
tbl_3 <- kable(out[, 11:14]) %>% tbl_styl_f(.)
tbl_1; tbl_2; tbl_3
```
It returns the same results as the `activity_stats` function:
```{r, results = 'hide'}
activity_stats(dat$vectormagnitude, ymd_hms(dat$timestamp))
```
```{r, echo = FALSE}
out <- activity_stats(dat$vectormagnitude, ymd_hms(dat$timestamp))
tbl_1 <- kable(out[, 1:6]) %>% tbl_styl_f(.)
tbl_2 <- kable(out[, 7:10]) %>% tbl_styl_f(.)
tbl_3 <- kable(out[, 11:14]) %>% tbl_styl_f(.)
tbl_1; tbl_2; tbl_3
```