forked from rdpeng/RepData_PeerAssessment1
PA1_template.Rmd
---
title: "Reproducible Research: Peer Assessment 1"
output:
  html_document:
    keep_md: true
---
``` {r setoptions,echo=FALSE}
library(knitr)
opts_chunk$set(echo=TRUE)
```
### Loading and preprocessing the data
``` {r}
activity <- read.csv("activity.csv")
activity$date <- as.Date(activity$date,format="%Y-%m-%d") # %m is month; %M would be minutes
```
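As a quick check of the date format string, the sketch below parses a single sample value. The sample string is an assumption about how `activity.csv` stores dates (ISO `YYYY-MM-DD`); it is not read from the file. In `strptime`-style formats, `%m` is the month and `%M` is the minute, so the two are easy to confuse.

```r
# Illustrative only: sample_date is an assumed example of a value in the date column
sample_date <- "2012-10-01"

# Parse with %m (month); %M would be read as minutes and leave the month unspecified
parsed <- as.Date(sample_date, format = "%Y-%m-%d")

# Round-trip the parsed date back to text to confirm nothing was lost
round_trip <- format(parsed, "%Y-%m-%d")
month_ok <- format(parsed, "%m") == "10"
```

If `round_trip` matches the input string, the format specification is consistent with the data.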
### What are the mean and median total number of steps taken per day?
``` {r}
daily_steps <- tapply(activity$steps,activity$date,sum,na.rm=TRUE)
mean_steps <- mean(daily_steps,na.rm=TRUE)
median_steps <- median(daily_steps,na.rm=TRUE)
```
The mean total number of steps per day is `r format(mean_steps,scientific=FALSE)` and the median is `r format(median_steps,scientific=FALSE)`.
The histogram below shows the total number of steps taken each day.
``` {r}
hist(daily_steps,xlab="Total number of steps in a day")
```
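One detail of the daily totals in this section is worth illustrating: `sum` with `na.rm=TRUE` returns 0 for a day whose readings are all missing, so such a day still counts toward the mean and median. The sketch below uses a small made-up data frame (the `toy` object and its values are hypothetical, not taken from `activity.csv`) to compare that behaviour with dropping all-missing days.

```r
# Toy data (hypothetical): day "a" has only missing values, day "b" has two readings
toy <- data.frame(
  date  = c("a", "a", "b", "b"),
  steps = c(NA, NA, 10, 20)
)

# With na.rm = TRUE, the all-missing day sums to 0 and stays in the summary
totals_with_zero <- tapply(toy$steps, toy$date, sum, na.rm = TRUE)

# Without na.rm, the all-missing day sums to NA and can be excluded afterwards
totals_with_na <- tapply(toy$steps, toy$date, sum)

mean_with_zero <- mean(totals_with_zero)              # averages over both days
mean_without   <- mean(totals_with_na, na.rm = TRUE)  # averages over day "b" only
```

Which treatment is appropriate depends on whether an all-missing day should be read as "no steps" or as "no data".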
### What is the average daily activity pattern?
The time series plot below shows the average number of steps in each 5-minute interval, averaged across all days.
``` {r}
avg_interval <- tapply(activity$steps,activity$interval,mean,na.rm=TRUE)
plot(as.numeric(names(avg_interval)),avg_interval,type="l",xlab="5-minute interval",ylab="Average number of steps")
```
``` {r}
i <- which.max(avg_interval) # index of the maximum value of average steps
max_interval <- names(avg_interval)[i] # interval with maximum number of average steps
```
The 5-minute interval with the maximum number of steps on average is the `r max_interval` interval.
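As a minimal sketch of how the peak interval can be looked up, the example below applies `which.max` to a small named vector shaped like the output of `tapply`. The vector `toy_avg` and its values are made up for illustration.

```r
# Hypothetical interval averages, named by interval label as tapply() would return them
toy_avg <- c("0" = 1.5, "5" = 7.0, "10" = 3.2)

# which.max() returns the position of the first maximum, with its name attached
peak <- which.max(toy_avg)
peak_interval <- names(peak)   # label of the interval with the highest average
peak_value <- toy_avg[[peak]]  # the average itself
```

If several intervals tie for the maximum, `which.max` reports only the first one.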
### Imputing missing values
Missing values were filled with the mean number of steps for the corresponding 5-minute interval, averaged across all days.
```{r}
na_num <- sum(is.na(activity$steps))
na_index <- which(is.na(activity$steps))
activity$new_steps <- activity$steps
for (i in na_index)
{
activity[i,"new_steps"] <- avg_interval[[as.character(activity[i,"interval"])]]
}
new_daily_steps <- tapply(activity$new_steps,activity$date,sum,na.rm=TRUE)
```
The dataset contains `r na_num` rows with missing step counts. The histogram below shows the total number of steps per day after filling them in.
```{r}
hist(new_daily_steps,xlab="Total number of steps in a day")
```
```{r}
new_mean_steps <- mean(new_daily_steps)
new_median_steps <- median(new_daily_steps)
```
The mean total number of steps per day in the new dataset (after imputing missing values) is `r format(new_mean_steps,scientific=FALSE)` and the median is `r format(new_median_steps,scientific=FALSE)`.
Both values are higher than the estimates calculated before imputation, because missing readings previously contributed nothing to the daily totals (`sum` with `na.rm=TRUE` skips them).
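The imputation loop in this section could also be written without an explicit loop. The sketch below is one possible vectorized form, shown on made-up data (the `toy` data frame and its values are hypothetical); it looks up each row's interval mean by name and uses it only where the reading is missing.

```r
# Toy data (hypothetical): two days, two intervals, one missing reading
toy <- data.frame(
  interval = c(0, 5, 0, 5),
  steps    = c(4, NA, 8, 10)
)

# Mean steps per interval, ignoring missing readings
toy_avg <- tapply(toy$steps, toy$interval, mean, na.rm = TRUE)

# Look up each row's interval mean by its label, as a plain numeric vector
interval_mean <- as.numeric(toy_avg[as.character(toy$interval)])

# Keep observed readings and substitute the interval mean where steps is NA
toy$new_steps <- ifelse(is.na(toy$steps), interval_mean, toy$steps)
```

The result matches what the row-by-row loop would produce on the same data, and it avoids repeated single-cell assignment into the data frame.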
### Are there differences in activity patterns between weekdays and weekends?
The panel plot below compares average activity patterns on weekdays and weekends, using the imputed data.
``` {r}
library(dplyr)
library(reshape2)
library(ggplot2)
# Label each date as weekend or weekday (the day names assume an English locale)
activity <- mutate(activity,day=ifelse(weekdays(date) %in% c("Saturday","Sunday"),"Weekend","Weekday"))
# Average the imputed steps for each interval within each day type
avg_weekday <- tapply(activity$new_steps,list(activity$interval,activity$day),mean)
# Reshape the interval-by-day matrix into long format for ggplot2
weekavg <- melt(avg_weekday)
colnames(weekavg) <- c("Interval","Day","Steps")
ggplot(data=weekavg,aes(x=Interval,y=Steps,color=Day)) +
  geom_line() +
  xlab("5-minute interval") +
  ylab("Number of steps") +
  facet_wrap(~Day,ncol=1,nrow=2)
```
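Because `weekdays` returns day names in the current locale, comparing against `"Saturday"` and `"Sunday"` only works where those names are in English. A locale-independent alternative, sketched below under the assumption that the dates are already of class `Date`, uses the numeric weekday from `as.POSIXlt`. The sample dates are chosen for illustration only.

```r
# Sample dates for illustration: a Friday, a Saturday and a Sunday in October 2012
sample_dates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07"))

# POSIXlt stores the weekday as a number: 0 = Sunday, 1 = Monday, ..., 6 = Saturday
wday <- as.POSIXlt(sample_dates)$wday

# Classify without relying on locale-specific day names
day_type <- ifelse(wday %in% c(0, 6), "Weekend", "Weekday")
```

The same expression could replace the `weekdays(date)` comparison inside `mutate` if the report needs to knit identically on machines with different language settings.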