Repo structure

Susan Paykin edited this page Jun 9, 2022 · 1 revision

This repository is organized around the following structure:

  • code: primary R and Python scripts used for data wrangling and cleaning; any code that processes or cleans datasets goes here

  • data_final: where final datasets go, labeled according to the OEPS Data Standards

    • Health Resources: geocoded health resource datasets, used primarily for calculating access metrics
    • geometryFiles: Shapefiles (spatial data files) for all US states, counties, tracts, and ZIP Code Tabulation Areas (ZCTAs) as of 2018, for GIS and mapping applications. Also contains a HUD-USPS tract-ZIP crosswalk file.
    • metadata: contains all metadata files (data documentation) for each dataset included in the OEPS
    • moud: contains master MOUD provider location datasets, scraped from US SAMHSA (2019). Files include category of MOUD: methadone, buprenorphine, or naltrexone. The .csv and .gpkg files hold the same data in tabular (non-spatial) and spatial formats, respectively. The file labeled _geocoded includes latitude and longitude variables.
    • historic: contains multiple geocoded .csv files of methadone provider locations from historic databases, dating back to 1990.
  • data_raw: raw, unprocessed data, mostly added by our team before cleaning

  • qaqc: contains scripts for automated quality checks, which admins currently run manually. Do not edit this folder.
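
As a quick sanity check after cloning, the layout described above can be verified with a few lines of Python. This is a minimal sketch, not part of the repository: the helper name `missing_dirs` is hypothetical, and it assumes only the four top-level folder names listed on this page. It reads the directory tree and does not modify anything.

```python
from pathlib import Path

# Top-level folders named on this page.
EXPECTED_DIRS = ("code", "data_final", "data_raw", "qaqc")


def missing_dirs(repo_root):
    """Return the expected top-level folders that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the root of a local clone.
    missing = missing_dirs(".")
    print("Missing folders:", missing if missing else "none")
```

An empty result means all four folders are present in the local clone.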
