Pipeline for harvesting and re-routing USSC individual offender files to cloud properties for downstream computation

alexjakubow/ussc_pipeline

Latest commit

25d2652 · Mar 15, 2023

USSC Pipeline

Overview

This pipeline automates the following tasks in sequence to prepare the US Sentencing Commission Individual Offender Datafiles for analysis:

  • Download the source files from the website (results saved to data/01_source)
  • Extract the file contents (results saved to data/02_raw)
  • Parse the SAS helper files (.sas) to read the raw data files (.dat), which are in fixed-width format
  • Subset each yearly file to the features (columns) of interest, using 2_io_download.R as a guide (the variable set for each yearly file is saved as a supplementary .txt file in data/00_meta)
  • Save the data files in .csv format (data/03_csv)
  • Push the contents of data/03_csv to the shared OneDrive folder
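The parse, subset, and save steps above can be sketched in base R. This is only an illustration under stated assumptions, not the code the pipeline runs: the column names other than DATAYEAR, the column widths, and the sample records are all hypothetical, and in the real pipeline the layout would come from the .sas helper file and the variable list from a .txt file in data/00_meta.

```r
# Sketch only: the layout, the sample records, and every column name except
# DATAYEAR are hypothetical stand-ins, not values from the USSC files.

# A tiny fixed-width file standing in for a raw .dat file.
dat_path <- tempfile(fileext = ".dat")
writeLines(c("2019001A 36",
             "2019002B120"), dat_path)

# Column layout; the pipeline would derive this from the .sas helper file.
layout <- data.frame(
  name  = c("DATAYEAR", "CASEID", "GRP", "MONTHS"),
  width = c(4, 3, 1, 3),
  stringsAsFactors = FALSE
)

# Read every field as character so leading zeros (e.g. "001") survive.
raw <- utils::read.fwf(
  dat_path,
  widths      = layout$width,
  col.names   = layout$name,
  colClasses  = "character",
  strip.white = TRUE
)

# Variables of interest; the pipeline would read these from data/00_meta.
keep <- c("DATAYEAR", "CASEID", "MONTHS")
subset_df <- raw[, intersect(keep, names(raw)), drop = FALSE]
subset_df$MONTHS <- as.integer(subset_df$MONTHS)

# Save the reduced table as .csv.
csv_path <- tempfile(fileext = ".csv")
utils::write.csv(subset_df, csv_path, row.names = FALSE)
```

Reading all fields as character and converting selected columns afterwards avoids silently dropping leading zeros in identifier-like fields; whether the actual pipeline does this is not stated in this README.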

Execution notes

Using the .csv files

  • The file year is stored in the DATAYEAR column (previously opafy in the source repo)
  • The converted .csv files can be accessed directly from the shared OneDrive folder ussc_pipeline/data/03_csv
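As a rough illustration of working with the converted files, the sketch below stacks several yearly .csv files and filters on DATAYEAR. The directory, file names, and the X column are hypothetical stand-ins for a local copy of data/03_csv. The sketch also assumes the yearly files share the same columns, which may not hold for the real files.

```r
# Sketch only: a temporary directory stands in for a local copy of data/03_csv;
# the file names and the X column are hypothetical.
csv_dir <- tempfile("03_csv_")
dir.create(csv_dir)
write.csv(data.frame(DATAYEAR = 2018, X = 1:2),
          file.path(csv_dir, "year_2018.csv"), row.names = FALSE)
write.csv(data.frame(DATAYEAR = 2019, X = 3:5),
          file.path(csv_dir, "year_2019.csv"), row.names = FALSE)

# Read every yearly file and stack them; rbind() requires identical columns.
files     <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
all_years <- do.call(rbind, lapply(files, read.csv))

# DATAYEAR identifies the file year of each record.
counts <- table(all_years$DATAYEAR)
fy2019 <- all_years[all_years$DATAYEAR == 2019, ]
```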

Full reproduction

To reproduce the entire pipeline from scratch:

  • Clone this repo
  • Modify _targets.R
    • comment out or delete the last tar_target(...) command, which starts on line 53
  • Run renv::restore() from the RStudio console to install the project dependencies recorded by renv
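The steps above can be wrapped in a small helper, sketched here. The steps listed do not name the command that runs the pipeline. Because the project is configured through _targets.R, this sketch assumes the standard entry point of the targets package, targets::tar_make(). The helper name reproduce_pipeline is hypothetical.

```r
# Sketch only: reproduce_pipeline() is a hypothetical helper, and calling
# targets::tar_make() is an assumption based on the presence of _targets.R.
reproduce_pipeline <- function() {
  # Install the package versions recorded in the project's renv lockfile.
  renv::restore()
  # Build all targets defined in _targets.R, skipping any that are up to date.
  targets::tar_make()
}

# Call from the root of the cloned repo, after editing _targets.R:
# reproduce_pipeline()
```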
