Data standards
All final .csv datasets are named using the following convention:
[Theme Abbreviation][2-digit number]_[Spatial Scale].csv
For example, the Policy theme dataset on Prison Incarceration Rates (PS01) at the county level is PS01_C.csv. The same dataset at the state level is PS01_S.csv, at the tract level would be PS01_T.csv, and at the zip code level would be PS01_Z.csv.
Theme abbreviations:
- Policy: PS
- Health: Health, Access
- Demographic: DS
- Economic: EC
- Physical Environment: BE
- COVID-19: COVID

Spatial scale abbreviations:
- Tract: T
- Zip/ZCTA: Z
- County: C
- State: S
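The naming convention can be checked mechanically. The following is a minimal sketch, not part of the OEPS tooling: the function names `build_dataset_filename` and `parse_dataset_filename` are hypothetical, and it assumes that `Health` and `Access` are both valid prefixes for the Health theme.

```python
import re

# Abbreviations copied from the lists above.
# Assumption: "Health" and "Access" are both prefixes for the Health theme.
THEMES = {"PS", "Health", "Access", "DS", "EC", "BE", "COVID"}
SCALES = {"T": "Tract", "Z": "Zip/ZCTA", "C": "County", "S": "State"}

# [Theme Abbreviation][2-digit number]_[Spatial Scale].csv
FILENAME_RE = re.compile(r"^([A-Za-z]+)(\d{2})_([TZCS])\.csv$")


def build_dataset_filename(theme, number, scale):
    """Assemble a filename such as PS01_C.csv from its three parts."""
    if theme not in THEMES:
        raise ValueError(f"unknown theme abbreviation: {theme!r}")
    if scale not in SCALES:
        raise ValueError(f"unknown spatial scale: {scale!r}")
    if not 0 <= number <= 99:
        raise ValueError("dataset number must fit in two digits")
    return f"{theme}{number:02d}_{scale}.csv"


def parse_dataset_filename(filename):
    """Split a filename into theme, dataset number, and spatial scale."""
    match = FILENAME_RE.match(filename)
    if match is None or match.group(1) not in THEMES:
        raise ValueError(f"not a valid dataset filename: {filename!r}")
    theme, number, scale = match.groups()
    return {"theme": theme, "number": int(number), "scale": SCALES[scale]}


print(build_dataset_filename("PS", 1, "C"))
print(parse_dataset_filename("PS01_Z.csv"))
```

Both functions raise `ValueError` on anything outside the convention, which makes them easy to use as a pre-commit or review check.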
All datasets have geographic identifiers included as a variable. We use the following labeling convention for each spatial scale:
Spatial Scale | Variable ID | Description |
---|---|---|
State | STATEFP | 2-digit State FIPS code |
County | COUNTYFP | 5-digit County FIPS code (state + county) |
ZIP Code/ZCTA | ZCTA | 5-digit assigned ZCTA |
Census Tract | GEOID | 11-digit unique tract ID (state + county + tract) |
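Because the tract GEOID is built from the state and county codes, the coarser identifiers can be derived from it. A minimal sketch, assuming the widths in the table (2-digit state, 5-digit state + county, 11-digit tract): the function name `split_tract_geoid` and the `TRACT` key are hypothetical, and the GEOID in the usage line is a made-up value for illustration only.

```python
def split_tract_geoid(geoid):
    """Derive state and county identifiers from an 11-digit tract GEOID."""
    geoid = str(geoid)
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected an 11-digit GEOID, got {geoid!r}")
    return {
        "STATEFP": geoid[:2],   # 2-digit state FIPS code
        "COUNTYFP": geoid[:5],  # 5-digit county FIPS code (state + county)
        "TRACT": geoid[5:],     # remaining 6 digits (hypothetical key name)
    }


# Illustrative GEOID only, not taken from a real dataset.
print(split_tract_geoid("02004000100"))
```

The length check fails on identifiers that have already lost a leading zero, which is a useful early warning before a merge.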
- Watch for leading zeros. Some geographic identifiers for states, counties, zip codes, and tracts start with “0” or “00”. Software that reads .csv and other text formats often treats these identifiers as numbers and drops the leading zeros when the file is opened. A state FIPS code of “02” then becomes “2”, a county code of “02004” becomes “2004”, a zip code of “07436” becomes “7436”, and so on. If you merge a .csv with other data by geographic identifier, either restore the leading zeros or drop them in the other file so that the identifiers match. This is particularly important when merging with spatial files (.shp, .gpkg, .geojson, etc.), including the geographic boundary files.
- Keep variable names to 10 characters or fewer for ease of data wrangling with shapefiles and GIS software. Some variable names are therefore shortened or abbreviated from the source data.
- Numeric data are rounded to two decimal places (the nearest hundredth).
- Missing data are represented as “NA” or empty, depending on the language or platform you are working with. These should not be confused with the numeric value “0”.
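The leading-zero and missing-data notes above can be handled together when loading a file. The following is a minimal sketch using only the Python standard library. The helper names `clean_row` and `read_oeps_csv`, the column `ExampleVar`, and the sample rows are all hypothetical, and the sketch assumes every column other than the identifier is numeric.

```python
import csv
import io

# Identifier widths from the table of geographic identifiers.
ID_WIDTHS = {"STATEFP": 2, "COUNTYFP": 5, "ZCTA": 5, "GEOID": 11}
MISSING = {"", "NA"}


def clean_row(row, id_column):
    """Keep the identifier as zero-padded text; map NA/empty to None."""
    cleaned = {}
    for name, value in row.items():
        text = value.strip()
        if name == id_column:
            # Restore any leading zeros lost upstream.
            cleaned[name] = text.zfill(ID_WIDTHS[id_column])
        elif text in MISSING:
            # Missing is not the same as zero.
            cleaned[name] = None
        else:
            # Assumption: all non-identifier columns are numeric.
            cleaned[name] = float(text)
    return cleaned


def read_oeps_csv(handle, id_column):
    """Read an open CSV file into a list of cleaned row dictionaries."""
    return [clean_row(row, id_column) for row in csv.DictReader(handle)]


# Made-up sample data for illustration.
sample = io.StringIO("COUNTYFP,ExampleVar\n2004,12.5\n02013,NA\n36061,0\n")
rows = read_oeps_csv(sample, "COUNTYFP")
for row in rows:
    print(row)
```

Reading identifiers as text from the start is generally safer than padding afterwards; in pandas the analogous approach would be to pass a string dtype for the identifier column when reading the file.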
(tl;dr) If you are interested in contributing to the OEPS, please keep in mind the following key guidelines:
- Variable names should be no more than 10 characters
- Numeric observations should be rounded to two decimal places (the nearest hundredth)
- Remove any index columns
- Remove quotation marks, commas, and other punctuation characters
- Code missing or unavailable data as NA or empty
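Some of these guidelines can be checked automatically before submitting a dataset. The following is a rough sketch, not an official OEPS validator: `check_header` and `check_value` are hypothetical names, an empty column name is treated as a likely leftover index column, and the default of two decimal places is an assumption that should be adjusted to whatever rounding the project confirms.

```python
MAX_NAME_LENGTH = 10
MISSING = {"", "NA"}


def check_header(column_names):
    """Return a list of problems found in a CSV header row."""
    problems = []
    for name in column_names:
        if name.strip() == "":
            # Exported row indexes often appear as an unnamed first column.
            problems.append("possible index column with an empty name")
        elif len(name) > MAX_NAME_LENGTH:
            problems.append(f"{name!r} is longer than {MAX_NAME_LENGTH} characters")
    return problems


def check_value(value, max_decimals=2):
    """Return True if a numeric cell is missing or suitably rounded."""
    text = value.strip()
    if text in MISSING:
        return True
    try:
        float(text)
    except ValueError:
        return False  # not numeric, e.g. stray quotes or punctuation
    _, _, decimals = text.partition(".")
    return len(decimals) <= max_decimals


print(check_header(["COUNTYFP", "", "VeryLongVariableName"]))
print(check_value("12.345"), check_value("NA"))
```

`check_value` is meant for numeric columns only; identifier columns such as GEOID should be validated separately as text.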