diff --git a/episodes/04-tidyr.Rmd b/episodes/04-tidyr.Rmd
index 91109223..fa5ae70b 100644
--- a/episodes/04-tidyr.Rmd
+++ b/episodes/04-tidyr.Rmd
@@ -72,18 +72,25 @@ how they relate to these different types of data formats.
 ### Long and wide data formats
 
 In the `interviews` data, each row contains the values of variables associated
-with each record collected (each interview in the villages), where it is stated
+with each record collected (each interview in the villages). It is stated
 that the `key_ID` was "added to provide a unique Id for each observation"
-and the `instance_ID` "does this as well but it is not as convenient to use."
+and the `instanceID` "does this as well but it is not as convenient to use."
 
-However, with some inspection, we notice that there are more than one row in the
-dataset with the same `key_ID` (as seen below). However, the `instanceID`s
-associated with these duplicate `key_ID`s are not the same. Thus, we should
-think of `instanceID` as the unique identifier for observations!
+Once we have established that `key_ID` and `instanceID` are both unique, we can use
+either variable as an identifier for the 131 interview records.
 
 ```{r, purl=FALSE}
-interviews %>%
-  select(key_ID, village, interview_date, instanceID)
+interviews %>%
+  select(key_ID) %>%
+  distinct() %>%
+  count()
+```
+
+```{r, purl=FALSE}
+interviews %>%
+  select(instanceID) %>%
+  distinct() %>%
+  count()
 ```
 
 As seen in the code below, for each interview date in each village no