Data standards

File Names

All final .csv datasets are named using the following convention:

[Theme Abbreviation][2-digit number]_[Spatial Scale].csv

For example, the Policy theme dataset on Prison Incarceration Rates (PS01) at the county level is PS01_C.csv. The same dataset at the state level is PS01_S.csv; at the tract level it would be PS01_T.csv, and at the zip code level it would be PS01_Z.csv.

Theme Abbreviations

Policy: PS
Health: Health, Access
Demographic: DS
Economic: EC
Physical Environment: BE
COVID-19: COVID

Spatial Scales

Tract: T
Zip/ZCTA: Z
County: C
State: S
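
The naming convention above can be checked mechanically. The following is a minimal sketch, not part of the OEPS tooling: `parse_dataset_name` and its regular expression are hypothetical, and the pattern assumes that theme abbreviations consist only of letters, as in the list above.

```python
import re

# Hypothetical pattern for [Theme Abbreviation][2-digit number]_[Spatial Scale].csv
# Assumes the theme abbreviation contains only letters (e.g. PS, Health, COVID).
NAME_PATTERN = re.compile(
    r"^(?P<theme>[A-Za-z]+)(?P<number>\d{2})_(?P<scale>[TZCS])\.csv$"
)

# Spatial scale codes from the list above.
SCALES = {"T": "Tract", "Z": "Zip/ZCTA", "C": "County", "S": "State"}


def parse_dataset_name(filename):
    """Split a dataset file name into theme, number, and spatial scale.

    Returns None when the name does not follow the convention.
    """
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    return {
        "theme": match.group("theme"),
        "number": match.group("number"),
        "scale": SCALES[match.group("scale")],
    }


print(parse_dataset_name("PS01_C.csv"))
```

A name with a one-digit number or an unknown scale letter returns `None`, which makes it easy to flag files that drift from the convention.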

Geographic Identifiers (GEOIDs)

All datasets have geographic identifiers included as a variable. We use the following labeling convention for each spatial scale:

Spatial Scale	Variable ID	Description
State	STATEFP	2-digit State FIPS code
County	COUNTYFP	5-digit County FIPS code (state + county)
ZIP Code/ZCTA	ZCTA	5-digit assigned ZCTA
Census Tract	GEOID	11-digit unique tract ID (state + county + tract)
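
Because these identifiers nest, a tract GEOID carries the state and county codes as prefixes. The following is a minimal sketch, assuming the standard Census layout of 2 state digits, 3 county digits, and 6 tract digits; `split_tract_geoid` is a hypothetical helper, and the sample GEOID is illustrative only.

```python
def split_tract_geoid(geoid):
    """Return STATEFP, COUNTYFP, and the full GEOID for an 11-digit tract ID."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected an 11-digit string, got {geoid!r}")
    return {
        "STATEFP": geoid[:2],   # 2-digit state FIPS code
        "COUNTYFP": geoid[:5],  # 5-digit county FIPS code (state + county)
        "GEOID": geoid,         # full 11-digit tract ID
    }


# Illustrative value only; not taken from an OEPS dataset.
print(split_tract_geoid("02004000100"))
```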

Data Formatting

Watch for leading zeros. Some geographic identifiers for states, counties, zip codes, and tracts start with “0” or “00”, that is, leading zeros. The .csv files themselves store these characters, but spreadsheet programs and many data-import tools read identifier columns as numbers and drop the leading zeros upon opening. This means that a state FIPS code of “02” becomes “2”, a county code of “02004” becomes “2004”, a zip code of “07436” becomes “7436”, etc. If you are merging .csvs with any other data by their geographic identifier, you will need to add in the leading zeros (or conversely, drop the leading zeros in the other file) so that they match. This is particularly important when you are trying to merge with spatial format files (.shp, .gpkg, .geojson, etc.), including the geographic boundary files.
Keep variable names to 10 characters or fewer for ease of data wrangling with shapefiles and GIS software. Some variable names are therefore shortened or abbreviated from the source data.
Numeric data are rounded to two decimal places (the nearest hundredth).
Missing data are represented as “NA” or empty, depending on the language or platform you are working with. These should not be confused with the numeric “0”.
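
One way to restore leading zeros before a merge is to treat identifiers as text and pad them to the widths listed under Geographic Identifiers. The sketch below uses only the Python standard library; `ID_WIDTHS`, `pad_id`, the `rate` column, and the sample rows are hypothetical and serve only as an illustration.

```python
import csv
import io

# Expected identifier widths, taken from the Geographic Identifiers table.
ID_WIDTHS = {"STATEFP": 2, "COUNTYFP": 5, "ZCTA": 5, "GEOID": 11}


def pad_id(value, column):
    """Left-pad an identifier with zeros to the width expected for its column."""
    return str(value).strip().zfill(ID_WIDTHS[column])


# Illustrative rows in which a leading zero was lost and one value is missing.
sample = io.StringIO("COUNTYFP,rate\n2004,12.5\n17031,NA\n")

rows = []
for row in csv.DictReader(sample):
    row["COUNTYFP"] = pad_id(row["COUNTYFP"], "COUNTYFP")
    # Keep missing values distinct from zero: "NA" or empty becomes None.
    row["rate"] = None if row["rate"] in ("", "NA") else float(row["rate"])
    rows.append(row)

print(rows)
```

If you work in pandas instead, a comparable approach is to pass `dtype=str` for the identifier columns to `read_csv` so that they are never parsed as numbers in the first place.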

Key Guidelines

(tl;dr) If you are interested in contributing to the OEPS, please keep in mind the following key guidelines:

Variable names should be no more than 10 characters
Numeric observations should be rounded to two decimal places (the nearest hundredth)
Remove any index columns
Remove quotation marks, commas, and other punctuation characters
Code missing or unavailable data as NA or empty
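
Some of these guidelines can be checked automatically before a dataset is submitted. The following is a rough sketch, not an official OEPS validator: `check_header` is hypothetical, and the set of index-column names it looks for (an empty name, `Unnamed: 0`, `index`, `X`) is an assumption about what common tools emit.

```python
def check_header(columns):
    """Return a list of guideline problems found in a list of column names."""
    problems = []
    # Assumed names that typically indicate a leftover index column.
    index_like = {"", "Unnamed: 0", "index", "X"}
    for name in columns:
        if name in index_like:
            problems.append(f"possible index column: {name!r}")
        elif len(name) > 10:
            problems.append(f"name longer than 10 characters: {name!r}")
    return problems


# Hypothetical header with one index column and one overlong name.
print(check_header(["COUNTYFP", "Unnamed: 0", "IncarcerationRate"]))
```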
