---
title: "tidymodels"
author: "Axel R"
date: "2024-01-21"
output: html_document
---
```{r message=FALSE, warning=FALSE}
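# tidymodels attaches rsample, parsnip, workflows, and the %>% pipe used below
library(tidymodels)
# Assumes `taxi` is available, e.g. from modeldata (attached by tidymodels)
DataExplorer::create_report(taxi)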
```
# Data Budget
## The initial split
```{r}
set.seed(123)
taxi_split <- initial_split(taxi)
taxi_split
```
## Accessing the data
```{r}
taxi_train <- training(taxi_split)
taxi_test <- testing(taxi_split)
```
## Data splitting and spending
```{r}
set.seed(123)
taxi_split <- initial_split(taxi, prop = 0.8)
taxi_train <- training(taxi_split)
taxi_test <- testing(taxi_split)
nrow(taxi_train)
#> [1] 8000
nrow(taxi_test)
#> [1] 2000
```
## Validation set
```{r}
set.seed(123)
initial_validation_split(taxi, prop = c(0.6, 0.2))
```
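A three-way split can be unpacked much like a two-way one. The sketch below assumes `taxi` is the data set shipped with modeldata and that the installed rsample version provides `validation()`; the object names `taxi_val_split`, `train_3way`, `val_3way`, and `test_3way` are introduced here for illustration only.

```{r}
library(rsample)
# Assumption: `taxi` comes from modeldata if it is not already loaded
if (!exists("taxi")) data(taxi, package = "modeldata")

set.seed(123)
taxi_val_split <- initial_validation_split(taxi, prop = c(0.6, 0.2))

# Pull out the three partitions (roughly 60% / 20% / 20% of the rows)
train_3way <- training(taxi_val_split)
val_3way   <- validation(taxi_val_split)
test_3way  <- testing(taxi_val_split)

# The partitions together should account for every row of `taxi`
nrow(train_3way) + nrow(val_3way) + nrow(test_3way) == nrow(taxi)
```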
## Stratification
```{r}
set.seed(123)
taxi_split <- initial_split(taxi, prop = 0.8, strata = tip)
taxi_split
```
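One way to see what `strata = tip` does is to compare the share of each `tip` class in the two partitions. This is a minimal sketch, assuming `taxi` is the modeldata data set with a factor column `tip`; `strat_split`, `train_props`, and `test_props` are illustrative names.

```{r}
library(rsample)
# Assumption: `taxi` comes from modeldata if it is not already loaded
if (!exists("taxi")) data(taxi, package = "modeldata")

set.seed(123)
strat_split <- initial_split(taxi, prop = 0.8, strata = tip)

# Proportion of each `tip` class in the training and testing partitions
train_props <- prop.table(table(training(strat_split)$tip))
test_props  <- prop.table(table(testing(strat_split)$tip))

train_props
test_props
```

With stratification, the two sets of proportions should be close to each other.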
```{r}
# Model
logistic_reg()
# Engine
logistic_reg() %>%
  set_engine("glm")
# Mode
decision_tree() %>%
  set_mode("classification")
```
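To check how a specification maps onto its engine, parsnip's `translate()` can print the call template it would use. A small sketch; `lr_spec` is a name introduced here for illustration.

```{r}
library(parsnip)

# Logistic regression specification fitted with stats::glm
lr_spec <-
  logistic_reg() %>%
  set_engine("glm")

# Show the underlying call template parsnip would hand to the engine
translate(lr_spec)
```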
## Model Workflow
```{r}
tree_spec <-
  decision_tree(cost_complexity = 0.002) %>%
  set_mode("classification")

workflow() %>%
  add_formula(tip ~ .) %>%
  add_model(tree_spec) %>%
  fit(data = taxi_train)
```
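The fitted workflow above is printed but not stored. As a rough sketch of a next step, it can be saved and passed to `predict()`. This assumes `taxi` is the modeldata data set and that the rpart package (the default `decision_tree()` engine) is installed; `demo_split`, `tree_fit`, and `test_preds` are illustrative names.

```{r}
library(tidymodels)
# Assumption: `taxi` comes from modeldata if it is not already loaded
if (!exists("taxi")) data(taxi, package = "modeldata")

set.seed(123)
demo_split <- initial_split(taxi, prop = 0.8, strata = tip)

# Same specification as above, rebuilt so this chunk runs on its own
demo_spec <-
  decision_tree(cost_complexity = 0.002) %>%
  set_mode("classification")

# Store the fitted workflow so it can be reused
tree_fit <-
  workflow() %>%
  add_formula(tip ~ .) %>%
  add_model(demo_spec) %>%
  fit(data = training(demo_split))

# Class predictions, one row per held-out observation
test_preds <- predict(tree_fit, new_data = testing(demo_split))
test_preds
```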