title	description	free_preview
Hot-deck Imputation	Learners will be able to impute missing values in R by borrowing values from similar observations.	true

Impute by Matching

type: VideoExercise
key: a297bfdc4c
xp: 50

@projector_key 8a75340ffe779b8d3cfdd8597519ffe0

Using HotDeck in R

type: NormalExercise
key: a52f360854
xp: 100

Here is a quick exercise on using the hot-deck procedure in R. We will use the mammal sleep data that ships with the VIM package: a data frame with 62 observations on the following 10 variables:

BodyWgt a numeric vector
BrainWgt a numeric vector
NonD a numeric vector
Dream a numeric vector
Sleep a numeric vector
Span a numeric vector
Gest a numeric vector
Pred a numeric vector
Exp a numeric vector
Danger a numeric vector
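
To make the idea concrete before using the package, here is a minimal base-R sketch of a sequential hot deck on a toy data frame. It is not VIM's implementation: the data values, the column names, and the `simple_hotdeck` helper are all hypothetical, and the rule used (sort by an ordering variable, then copy the nearest observed value above each gap) is a simplified assumption about how a sorted hot deck behaves.

```r
# Toy data (made-up values): 'size' is fully observed, 'score' has gaps.
toy <- data.frame(
  size  = c(5, 1, 4, 2, 3),
  score = c(50, 10, NA, 20, NA)
)

# Hypothetical helper, for illustration only.
# Sorts the rows by the ordering variable, then fills each NA in the
# target column with the value from the row directly above it.
# A missing value in the very first sorted row is left as NA.
simple_hotdeck <- function(df, target, ord_var) {
  df <- df[order(df[[ord_var]]), , drop = FALSE]
  x <- df[[target]]
  for (i in seq_along(x)) {
    if (is.na(x[i]) && i > 1) {
      x[i] <- x[i - 1]  # donor is the previous (already filled) row
    }
  }
  df[[target]] <- x
  df
}

toy_imputed <- simple_hotdeck(toy, target = "score", ord_var = "size")
print(toy_imputed)
```

Sorting first means each donor is the row closest in `size` from below, which is why the choice of ordering variable matters for the imputed values.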

@instructions

Install the VIM package.
Load the sleep data.
Choose the 'BrainWgt' variable to sort the data before imputation.
Impute the missing values using hotdeck.

@hint Use the "ord_var" argument to choose the "BrainWgt" variable.

@pre_exercise_code

install.packages('VIM')
library('VIM')
data(sleep)

@sample_code

install.packages('VIM')
library('VIM')
data(sleep)
sleepI2 <- hotdeck(...)

@solution

install.packages('VIM')
library('VIM')
data(sleep)
sleepI2 <- hotdeck(sleep, ord_var = "BrainWgt")

@sct

success_msg('Good job on completing the exercise!', praise = FALSE)

Simple Imputation Approaches in R

type: NormalExercise
key: fd410340f7
xp: 100

Now we will practice in R the simple imputation techniques that we have learned. We will try a few of these techniques on a dataset about class grades. In this data, each recorded value is the average of its sub-components: e.g., the Tutorial variable is the average of all tutorials, and the Final variable is the average of all questions in the final written exam.
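
As a reminder of what mean imputation does, here is a small base-R sketch on a made-up vector. The `final` values below are hypothetical and are not taken from the class grades data; the sketch only illustrates the idea and does not use any imputation package.

```r
# Hypothetical grades with two missing entries (made-up values).
final <- c(60, NA, 80, 70, NA)

# Mean of the observed values only.
observed_mean <- mean(final, na.rm = TRUE)

# Copy the vector and replace each NA with that mean.
final_imputed <- final
final_imputed[is.na(final)] <- observed_mean

print(final_imputed)
```

The imputed vector has the same mean as the observed values, but every filled-in entry sits exactly at that mean, so the spread of the variable shrinks.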

@instructions First, let's visualize the missing values. Create an aggregation plot using VIM. Do there appear to be many missing values or only a few?

Next, let's try mean imputation for the missing values. Use the Hmisc package to mean-impute the two variables that have missing data.

@hint

@pre_exercise_code

library(VIM)
library(Hmisc)

data <- read.csv(file = "class-grades.csv", header = TRUE) #load the data

@sample_code

agg_data <- (, numbers = TRUE, prop = c(TRUE, FALSE))

data$ImputedFinal <- (, )

data$ImputedTakeHome <- (, )

@solution

#aggregation plots
agg_data <- aggr(data, numbers = TRUE, prop = c(TRUE, FALSE))

#mean imputation
data$ImputedFinal <- impute(data$Final, mean)

data$ImputedTakeHome <- impute(data$TakeHome, mean)

@sct

#User should use impute()

#User should use aggr()

success_msg('Great job getting the exercise done!')
