Analysis-ready dataset of population, employment, and travel times in Toronto #702

Open
paezha opened this issue Jun 23, 2024 · 5 comments

@paezha

paezha commented Jun 23, 2024

Dataset name: TTS2016R
Dataset download URL: https://soukhova.github.io/TTS2016R/
Article that demonstrates the dataset: https://doi.org/10.1177/23998083241242844
Cleaning script: The data are analysis-ready.

Data dictionary: All variables are documented in the package.

@paezha paezha added the dataset label Jun 23, 2024
@jonthegeek
Collaborator

@paezha The DOI listed above is for the article from #701. From the package, it looks like it should be https://doi.org/10.1177/23998083221146781 instead. Thanks!

@lgibson7
Member

lgibson7 commented Aug 10, 2024

  • I can download the dataset from the link provided.
  • The dataset will (probably) be less than 50MB when saved as a tidy CSV.
  • There is a link to an article that has something to do with the dataset.
  • I can imagine a data visualization related to this dataset.
  • This dataset has not already been used in TidyTuesday.
  • ALT text is provided for all (both) images.
  • There is a data dictionary describing the columns of the dataset.
  • The TidyTuesday maintainers are unlikely to get sued for using the dataset.

@lgibson7
Member

Hi @paezha, thanks for submitting this issue. Would you be willing to submit the dataset through a PR? You can find instructions for doing so here.

@paezha
Author

paezha commented Aug 14, 2024

Hi @lgibson7 - Happy to submit the dataset. It is already an R package, though, so I am unsure how many, if any, of the steps outlined here are needed. For example, the data files are already clean and saved in native R format.

@jonthegeek
Collaborator

@paezha Regardless of the source, we share the datasets as one or more CSVs. When the data comes from a package, the cleaning script will likely be very short, along the lines of this:

# Clean data from pkgname (https://pkgurl)
toronto_population <- pkgname::toronto_population
toronto_employment <- pkgname::toronto_employment
toronto_travel <- pkgname::toronto_travel

It's very similar to situations where the data is cleanly available as CSVs, such as the recent American Idol dataset.
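
From there, the extracted objects just get written out as CSVs for the repository. A minimal sketch of that step, with file names that are only placeholders matching the sketch above:

# Write each object out as a CSV for the TidyTuesday repository.
# File names here are placeholders matching the sketch above.
readr::write_csv(toronto_population, "toronto_population.csv")
readr::write_csv(toronto_employment, "toronto_employment.csv")
readr::write_csv(toronto_travel, "toronto_travel.csv")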

The cleaning might also be more complicated, to take a subset of the data or otherwise make it more CSV-friendly, such as what I did to share our own data from our ttmeta package.
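
For instance, here is a hypothetical sketch of that kind of extra step, assuming one of the package's objects were an sf spatial data frame (the object and column names below are invented, not taken from TTS2016R):

# Hypothetical: an sf object would need its geometry dropped (or converted
# to a text format such as WKT) before it could be shared as a plain CSV.
toronto_zones_csv <- pkgname::toronto_zones |>   # invented object name
  sf::st_drop_geometry() |>                      # keep only the attribute table
  dplyr::select(zone_id, population, employment) # invented column names

readr::write_csv(toronto_zones_csv, "toronto_zones.csv")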

I hope that helps explain the process!
