title | description | free_preview |
Hot-deck Imputation |
Learners will be able to use R to impute missing values by borrowing observed values from similar records. |
true |
type: VideoExercise
key: a297bfdc4c
xp: 50
type: NormalExercise
key: a52f360854
xp: 100
Here is a quick exercise on using the hot-deck procedure in R. We will use the mammal sleep data (sleep) that ships with the VIM package. It is a data frame with 62 observations on the following 10 variables:
- BodyWgt a numeric vector
- BrainWgt a numeric vector
- NonD a numeric vector
- Dream a numeric vector
- Sleep a numeric vector
- Span a numeric vector
- Gest a numeric vector
- Pred a numeric vector
- Exp a numeric vector
- Danger a numeric vector
- Install the VIM package.
- Load the sleep data.
- Choose the 'BrainWgt' variable to sort the data before imputation.
- Impute the missing values using hotdeck().
Use the "ord_var" argument of hotdeck() to select the "BrainWgt" variable.
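To make the idea behind sorting before imputation concrete, here is a minimal base-R sketch of a sequential hot-deck on a toy data frame. The helper seq_hotdeck() and the toy data are hypothetical and made up for illustration; this is a simplification and not how VIM's hotdeck() is implemented, which supports further options.

```r
# Hypothetical helper: a simplified sequential hot-deck for one variable.
# Rows are sorted by an ordering variable, and each missing value is
# replaced by the most recent observed value (the "donor") in that order.
seq_hotdeck <- function(df, target, ord_var) {
  ord <- order(df[[ord_var]])          # row order after sorting by ord_var
  x <- df[[target]][ord]               # target values in sorted order
  donor <- NA
  for (i in seq_along(x)) {
    if (is.na(x[i])) {
      x[i] <- donor                    # borrow from the nearest preceding donor
    } else {
      donor <- x[i]                    # remember the latest observed value
    }
  }
  df[[target]][ord] <- x               # write back in the original row order
  df
}

# Toy data (made up for illustration only)
toy <- data.frame(
  weight = c(5, 1, 3, 2, 4),
  sleep  = c(10, 8, NA, 9, NA)
)
toy_imp <- seq_hotdeck(toy, target = "sleep", ord_var = "weight")
print(toy_imp)
```

In this sketch a missing value at the very start of the sorted order has no donor and stays missing. Sorting first matters because it places records with similar values of the ordering variable next to each other, so the donor comes from a comparable record.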
sleepI2 <- hotdeck(...)
sleepI2 <- hotdeck(sleep, ord_var = "BrainWgt")
success_msg('Good job on completing the exercise!', praise = FALSE)
type: NormalExercise
key: fd410340f7
xp: 100
Now we will practice in R the simple imputation techniques that we have learned. We will try a few of these techniques on a dataset of class grades. In this data, each recorded value is the average of its sub-components: e.g., the Tutorial variable is the average of all tutorials, and the Final variable is the average of all questions in the final written exam.
First, let's visualize the missing values. Create an aggregation plot using VIM. Do there appear to be many missing values or only a few?
Next, let's try mean imputation for the missing values. Use mean imputation with the Hmisc package for the two variables that have missing data.
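As a rough illustration of what mean imputation does, the following base-R sketch counts the missing values per column and then fills each gap with the mean of the observed values. The toy grades and the helper mean_impute() are hypothetical and made up for illustration; they are not the course dataset or the Hmisc implementation.

```r
# Toy grades (made up for illustration only)
grades <- data.frame(
  Final    = c(60, NA, 80, 70),
  TakeHome = c(90, 50, NA, 70)
)

# Count the missing values in each column
n_missing <- colSums(is.na(grades))
print(n_missing)

# Hypothetical helper: replace each NA with the mean of the observed values
mean_impute <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Store the imputed versions in new columns, keeping the originals intact
grades$ImputedFinal    <- mean_impute(grades$Final)
grades$ImputedTakeHome <- mean_impute(grades$TakeHome)
print(grades)
```

Keeping the imputed values in separate columns makes it easy to compare them with the original data. Note that mean imputation leaves the column mean unchanged but reduces its variance.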
data <- read.csv(file = "class-grades.csv", header = TRUE) #load the data
agg_data <- (, numbers = TRUE, prop = c(TRUE, FALSE))
data$ImputedFinal <- (, )
data$ImputedTakeHome <- (, )
#aggregation plots
agg_data <- aggr(data, numbers = TRUE, prop = c(TRUE, FALSE))
#mean imputation
data$ImputedFinal <- impute(data$Final, mean)
data$ImputedTakeHome <- impute(data$TakeHome, mean)
#User should use impute()
#User should use aggr()
print('Great job getting the exercise done!')