---
title: "First file"
author: "3859 Group"
date: "2020/12/3"
output: html_document
---
```{r setup, include=FALSE}
data <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00381/PRSA_data_2010.1.1-2014.12.31.csv")
```
# Convert wind direction to a factor and fit a first model
```{r}
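# Inspect column types before modelling
str(data)
# Wind direction is categorical, so store it as a factor
data$cbwd <- as.factor(data$cbwd)
# First fit: regress pm2.5 on every other column (`.` also picks up any index or date columns)
first_lm <- lm(pm2.5 ~ ., data = data)
summary(first_lm)
AIC(first_lm)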
```
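As a small illustration of what the chunk above does, the sketch below shows how `lm()` expands a factor into indicator columns and drops rows with a missing response. The data frame `toy` and all of its values are simulated assumptions, not the PRSA data; only the column names are borrowed from it.

```{r}
# Toy stand-in for the real data: names mirror a few PRSA columns, values are simulated.
set.seed(1)
toy <- data.frame(
  pm2.5 = c(NA, rnorm(99, mean = 100, sd = 30)),  # one missing response on purpose
  TEMP  = rnorm(100, mean = 12, sd = 10),
  cbwd  = sample(c("NE", "NW", "SE", "cv"), 100, replace = TRUE)
)
toy$cbwd <- as.factor(toy$cbwd)

toy_lm <- lm(pm2.5 ~ ., data = toy)

# A k-level factor contributes k - 1 indicator columns (the first level is the baseline).
n_levels  <- nlevels(toy$cbwd)
n_dummies <- sum(grepl("^cbwd", names(coef(toy_lm))))

# With the default na.action (na.omit), the row with a missing response is dropped.
n_used <- nobs(toy_lm)

c(levels = n_levels, dummies = n_dummies, rows = nrow(toy), rows_used = n_used)
```

The same bookkeeping applies to `first_lm`: comparing `nobs(first_lm)` with `nrow(data)` shows how many rows were dropped because of missing values.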
Steps to be done:
## Early steps
1. Calculate the p-value for each predictor (single-parameter t-tests)
2. Form hypotheses about which predictors are not significant (nested models + ANOVA)
3. Test for interactions (details yet to come)
4. Model diagnostics
   - 4.1 Linearity
   - 4.2 Normality
   - 4.3 Equal variance
   - 4.4 Independence, using time-series methods (difficult)
5. Check for unusual observations (Cook's distance for influential points, or diagnostic plots)
6. See if a response transformation is needed (variance stabilization or Box-Cox transformation)
7. Check the predictors for multicollinearity (VIF)
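
The early steps above can be sketched in base R roughly as follows. This is a minimal sketch on simulated data, not the group's analysis. The data frame `sim`, its values, and the terms dropped or added in steps 2 and 3 are assumptions chosen for illustration. It also assumes the `MASS` package is installed for the Box-Cox step. VIFs are computed by hand from auxiliary regressions rather than with an add-on package.

```{r}
# Simulated stand-in for the PRSA data (assumed names; values are not real measurements).
set.seed(42)
n <- 500
sim <- data.frame(
  TEMP = rnorm(n, 12, 10),
  PRES = rnorm(n, 1016, 10),
  Iws  = rexp(n, 1 / 20),
  cbwd = factor(sample(c("NE", "NW", "SE", "cv"), n, replace = TRUE))
)
sim$DEWP  <- sim$TEMP - rexp(n, 1 / 8)  # correlated with TEMP on purpose
sim$pm2.5 <- exp(4 + 0.03 * sim$DEWP - 0.01 * sim$Iws + rnorm(n, sd = 0.5))

# y = TRUE stores the response in the fit, which the Box-Cox step below uses.
full <- lm(pm2.5 ~ TEMP + PRES + Iws + DEWP + cbwd, data = sim, y = TRUE)

# 1. Single-parameter t-tests: one p-value per coefficient.
p_values <- summary(full)$coefficients[, "Pr(>|t|)"]

# 2. Nested-model F-test: can PRES and cbwd be dropped together?
reduced <- update(full, . ~ . - PRES - cbwd)
nested_test <- anova(reduced, full)

# 3. Interaction test: does the DEWP effect depend on wind direction?
with_int <- update(full, . ~ . + DEWP:cbwd)
interaction_test <- anova(full, with_int)

# 4. Diagnostics: residual plots (left commented out), plus the lag-1
#    autocorrelation of the residuals as a quick independence check.
# plot(full, which = 1:2)  # residuals vs fitted, and normal Q-Q
lag1_acf <- acf(resid(full), plot = FALSE)$acf[2]

# 5. Unusual observations: flag Cook's distances above the 4 / n rule of thumb.
cooks   <- cooks.distance(full)
flagged <- which(cooks > 4 / n)

# 6. Box-Cox: profile the log-likelihood over a lambda grid and take the maximizer.
bc <- MASS::boxcox(full, plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# 7. VIF for each numeric predictor: 1 / (1 - R^2) from regressing it on the others.
num_vars <- c("TEMP", "PRES", "Iws", "DEWP")
vif <- sapply(num_vars, function(v) {
  aux <- lm(reformulate(c(setdiff(num_vars, v), "cbwd"), response = v), data = sim)
  1 / (1 - summary(aux)$r.squared)
})
```

A `lambda_hat` near 0 would point to a log transformation, and a value near 1 to no transformation. A lag-1 autocorrelation far from 0 would suggest the independence assumption needs the time-series treatment noted in step 4.4.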
## Finishing up
8. Variable selection (decide which variables to keep, based on the previous results and on AIC, BIC, or the PRESS statistic)
9. Draw conclusions and finish the report
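
One way the variable-selection step could look is sketched below, using backward elimination with `step()` and a hand-written PRESS function. This is an illustration under assumptions, not the group's chosen procedure. The data frame `sim`, the log-scale response `log_pm`, and the irrelevant predictor `noise` are all hypothetical and simulated.

```{r}
# Simulated stand-in data (assumed names; `noise` is a deliberately irrelevant predictor).
set.seed(7)
n <- 300
sim <- data.frame(
  DEWP  = rnorm(n, 2, 14),
  Iws   = rexp(n, 1 / 20),
  noise = rnorm(n)
)
sim$log_pm <- 4 + 0.03 * sim$DEWP - 0.01 * sim$Iws + rnorm(n, sd = 0.5)

full <- lm(log_pm ~ DEWP + Iws + noise, data = sim)

# Backward elimination: penalty k = 2 gives AIC, k = log(n) gives BIC.
by_aic <- step(full, direction = "backward", k = 2, trace = 0)
by_bic <- step(full, direction = "backward", k = log(n), trace = 0)

# PRESS: sum of squared leave-one-out residuals, e_i / (1 - h_ii), with no refitting.
press <- function(fit) sum((resid(fit) / (1 - hatvalues(fit)))^2)

# Side-by-side comparison of the three criteria for each candidate model.
comparison <- data.frame(
  model = c("full", "AIC choice", "BIC choice"),
  AIC   = c(AIC(full), AIC(by_aic), AIC(by_bic)),
  BIC   = c(BIC(full), BIC(by_aic), BIC(by_bic)),
  PRESS = c(press(full), press(by_aic), press(by_bic))
)
comparison
```

Lower is better for all three columns. BIC penalizes extra parameters more heavily than AIC once `n` exceeds about 7, so it tends to keep fewer predictors.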