Merge pull request #505 from BourkeCaitlin/unique-id
update key_ID description in text - unique ids
juanfung authored Mar 25, 2024
2 parents 15fe94a + b3f611b commit 385e77a
Showing 1 changed file with 15 additions and 8 deletions.
23 changes: 15 additions & 8 deletions episodes/04-tidyr.Rmd
@@ -72,18 +72,25 @@ how they relate to these different types of data formats.
 ### Long and wide data formats

 In the `interviews` data, each row contains the values of variables associated
-with each record collected (each interview in the villages), where it is stated
+with each record collected (each interview in the villages). It is stated
 that the `key_ID` was "added to provide a unique Id for each observation"
-and the `instance_ID` "does this as well but it is not as convenient to use."
+and the `instanceID` "does this as well but it is not as convenient to use."

-However, with some inspection, we notice that there are more than one row in the
-dataset with the same `key_ID` (as seen below). However, the `instanceID`s
-associated with these duplicate `key_ID`s are not the same. Thus, we should
-think of `instanceID` as the unique identifier for observations!
+Once we have established that `key_ID` and `instanceID` are both unique we can use
+either variable as an identifier corresponding to the 131 interview records.

 ```{r, purl=FALSE}
-interviews %>%
-  select(key_ID, village, interview_date, instanceID)
+interviews %>%
+  select(key_ID) %>%
+  distinct() %>%
+  count()
 ```
+
+```{r, purl=FALSE}
+interviews %>%
+  select(instanceID) %>%
+  distinct() %>%
+  count()
+```

 As seen in the code below, for each interview date in each village no
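
A note on the check this change introduces: counting the distinct values of a column shows that it is a unique identifier only when that count is compared with the number of rows. The sketch below is one possible way to make the comparison explicit. It is not part of the commit; the data frame `toy`, its values, and the helper `is_unique_id()` are hypothetical and only stand in for `interviews`.

```r
library(dplyr)

# Hypothetical toy data standing in for `interviews`; the values are made up.
toy <- data.frame(
  key_ID     = c(1, 2, 3, 3),
  instanceID = c("uuid:a", "uuid:b", "uuid:c", "uuid:d")
)

# Hypothetical helper: a column identifies rows uniquely when it has
# as many distinct values as the table has rows.
is_unique_id <- function(data, column) {
  n_distinct(data[[column]]) == nrow(data)
}

is_unique_id(toy, "key_ID")      # FALSE: the value 3 appears twice
is_unique_id(toy, "instanceID")  # TRUE

# List any repeated values so the affected rows can be inspected.
repeated <- toy %>%
  count(key_ID, name = "n_rows") %>%
  filter(n_rows > 1)
print(repeated)
```

Applied to the lesson data, the same comparison would be `is_unique_id(interviews, "key_ID")`, assuming `interviews` has been loaded as in the episode.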
