This lesson uses GPS and interview date to try to correct a mislabeled village. However, the GPS locations in the 3 villages are not distinct. With a scatter plot, we can see that the GPS locations are in 3 clusters, but each of those clusters has responses from multiple villages.
Is this a bad copy of the data, or is the actual GPS data bad?
If the GPS data is actually bad, maybe we should change the last exercise of episode 3 to only rely on interview date?
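For anyone who wants to reproduce the check without a plot, here is a minimal sketch of counting how many village labels fall in each GPS cluster. The column names (`village`, `lat`, `lon`), the village labels and the coordinates are all made up for illustration and are not taken from the lesson data. Rounding to one decimal place is just a crude stand-in for real clustering.

```python
import pandas as pd

# Toy stand-in for the survey table. All names and values are hypothetical.
df = pd.DataFrame({
    "village": ["A", "B", "A", "C", "B", "C"],
    "lat": [-19.11, -19.11, -19.04, -19.04, -19.22, -19.22],
    "lon": [33.48, 33.48, 33.40, 33.40, 33.56, 33.56],
})

# Round the coordinates so nearby points share one coarse "cluster" key.
df["lat_r"] = df["lat"].round(1)
df["lon_r"] = df["lon"].round(1)

# Count the distinct village labels inside each cluster.
villages_per_cluster = df.groupby(["lat_r", "lon_r"])["village"].nunique()

# Clusters holding more than one village label are the suspicious ones.
mixed = villages_per_cluster[villages_per_cluster > 1]
print(mixed)
```

If every cluster shows up in `mixed`, as it does in this toy table, then GPS location cannot be used to tell the villages apart.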
Edit: update the link to the exercise.
Conclusion by @bencomp from discussion below: let's update the exercise to only rely on interview date.
The only explanation I can come up with is that the surveys are collected at sites, and the GPS coordinates are for the device used to collect the data, but that the survey asks questions on household and village that may not correspond to the location of data collection.
The figshare entry is unclear about this... it states that the province, district, ward, and village are all related to where the survey was conducted.
The closest reference I could find about the survey methodology was Bont et al. 2019, but this didn't have useful details either.
I agree with using interview date to resolve the mislabeled village.
I would also ask learners to find the correct name for village 49 using only interview dates. That makes the exercise in episode 3 smaller and quicker, although it removes the need to sort on multiple columns. Maybe we can add an exercise that sorts on multiple columns to find errors in the ward and district columns?
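As a rough illustration of the date-only approach, the sketch below looks up which correctly labelled villages were surveyed on the same day as the row labelled `49`. The column names, village labels and dates are hypothetical placeholders, not values from the actual dataset.

```python
import pandas as pd

# Toy stand-in for the survey table. All names and dates are hypothetical.
df = pd.DataFrame({
    "village": ["A", "A", "B", "B", "49", "C"],
    "interview_date": [
        "2016-11-17", "2016-11-17", "2016-11-18",
        "2016-11-18", "2016-11-18", "2016-11-21",
    ],
})
df["interview_date"] = pd.to_datetime(df["interview_date"])

# Separate the mislabelled row(s) from the rows with a plausible village name.
suspect = df[df["village"] == "49"]
known = df[df["village"] != "49"]

# Villages interviewed on the same date(s) as the suspect row(s).
same_day = known["interview_date"].isin(suspect["interview_date"])
candidates = known.loc[same_day, "village"].unique()
print(candidates)
```

This only works if each village was surveyed on its own set of dates. If two villages share a date, `candidates` will contain both and the date alone cannot settle it.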