Philadelphia Parking Violations

This week's data comes from Open Data Philly. The full dataset is over 1 GB, but it has been filtered down to under 100 MB to meet GitHub's file-size restrictions, mainly by keeping only 2017 records for Pennsylvania (state == "PA") that have lat/long data (see the Cleaning script below). If you would like to use the entire dataset, please see the link above.

H/t to Jess Streeter for sharing this week's data!

For inspiration, see some visualizations from Philly Open Data and a news article by NBC Philadelphia.

Get the Data

tickets <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-12-03/tickets.csv")

# Or read in with tidytuesdayR package (https://github.com/dslc-io/tidytuesdayR)
# Either ISO-8601 date or year/week works!
# Install via pak::pak("dslc-io/tidytuesdayR")

tuesdata <- tidytuesdayR::tt_load("2019-12-03")
tuesdata <- tidytuesdayR::tt_load(2019, week = 49)

tickets <- tuesdata$tickets
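
As a quick check that the data loaded, here is a minimal sketch that counts tickets per month; it assumes the issue_datetime column parses as a datetime, as it does in the cleaning script below.

# Quick sketch: tickets per month in 2017
# (assumes issue_datetime is a datetime column, as in the cleaning script)
library(dplyr)
library(lubridate)

tickets %>%
  mutate(month = month(issue_datetime, label = TRUE)) %>%
  count(month)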

Cleaning

# Load Libraries ----------------------------------------------------------

library(here)
library(tidyverse)


# Read in raw Data --------------------------------------------------------

df <- read_csv(here("2019", "2019-12-03", "parking_violations.csv"))

small_df <- df %>% 
  mutate(date = lubridate::date(issue_datetime),
         year = lubridate::year(date)) %>% 
  filter(year == 2017, state == "PA") %>% 
  
  # removing date/year as duplicative of issue_datetime
  # removing state since all remaining rows are PA
  # removing gps since we keep only rows with lat/long present
  # removing division since it is > 60% missing
  # removing location since it is a large amount of metadata with limited use
  
  select(-date, -year, -gps, -location, -state, -division) %>% 
  filter(!is.na(lat))

# Check the in-memory size of the filtered data frame
pryr::object_size(small_df)

# Write to csv ------------------------------------------------------------

write_csv(small_df, here("2019", "2019-12-03", "tickets.csv"))
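
For a quick map in the spirit of the visualizations linked above, here is a minimal ggplot2 sketch that plots ticket locations. The lat column appears in the cleaning script; the longitude column name (lon) is an assumption.

# Minimal sketch: plot ticket locations
# (lat comes from the cleaning script; lon is an assumed column name)
library(ggplot2)

ggplot(tickets, aes(x = lon, y = lat)) +
  geom_point(alpha = 0.05, size = 0.3) +
  coord_quickmap() +
  labs(title = "Philadelphia parking tickets, 2017")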