
Regression and statistical analysis of crimes in Chicago from 2015 to 2020.


leo-cavalcante/crimes-in-chicago



"Crimes in Chicago": statistically significant insights and linear regression

Leonardo Cavalcante Araújo, Vinamrata Yadav, Natalia Calderón

Data Analytics Full-Time FEB2021, Paris, March 12th 2021

Content


Project Description

Group project developed by a team of three over a weekend and two weekdays (four days in total).

Objective

The project had two distinct objectives:

  1. Derive statistically significant insights from a dataset.
  2. Build a regression model for a chosen variable (in this project, we chose linear regression to predict the probability of a crime happening on a given date under given circumstances).
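To make the regression objective concrete, here is a minimal ordinary least squares (OLS) sketch with a single predictor, using only the Python standard library. It is not the project's model: the predictor `x`, the toy numbers and the helper names `ols_fit` and `r_squared` are hypothetical, chosen only to show how OLS picks the intercept and slope that minimise the sum of squared residuals.

```python
from typing import Sequence, Tuple


def ols_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x by ordinary least squares (one predictor)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Closed form: slope = cov(x, y) / var(x); the 1/n factors cancel out.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x: Sequence[float], y: Sequence[float], intercept: float, slope: float) -> float:
    """Share of the variance of y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("y must not be constant")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical toy data: x is some numeric predictor, y a daily crime count.
    xs = [0.0, 5.0, 10.0, 15.0, 20.0, 25.0]
    ys = [600.0, 640.0, 690.0, 720.0, 770.0, 800.0]
    b0, b1 = ols_fit(xs, ys)
    print(f"intercept={b0:.1f}, slope={b1:.2f}, R^2={r_squared(xs, ys, b0, b1):.3f}")
```

In practice a library implementation such as statsmodels' `OLS` would normally be used, since it also reports standard errors, p-values and residual diagnostics; this hand-rolled version only shows the closed-form estimate.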

Workflow

  1. Database search and download: we settled on an open-source dataset from the Chicago Data Portal, Crimes from 2001 to Present. It covers 20 years of observations, about 7.5 million rows in total.
  2. Data cleaning and filtering to the past five years (2015-2020), leaving around 1.5 million observations.
  3. Data analysis and visualisations, using Python, Matplotlib and Seaborn.
  4. Hypothesis testing: checking whether the patterns observed in the data are statistically significant.
  5. Linear regression using OLS (Ordinary Least Squares): predicting crimes happening on a given date under known circumstances.
  6. Assumption testing: checking that the assumptions of the OLS model hold.
  7. Presentation: building the slides and giving an oral presentation to our Ironhack cohort.
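Step 2 comes down to filtering raw incident records by date, and a per-day count of incidents is one possible target for a regression like the one in step 5. The sketch below shows one way to do both with the standard library: it keeps rows whose year falls in an inclusive range and counts incidents per calendar day. The column names (`ID`, `Date`, `Primary Type`), the timestamp format and the function name `daily_counts` are assumptions for illustration, the sample rows are invented, and whether 2020 is included is left as a parameter.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Assumed format of the "Date" column, e.g. "01/01/2015 12:00:00 AM".
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def daily_counts(csv_text: str, first_year: int = 2015, last_year: int = 2020) -> Counter:
    """Keep rows whose year is in [first_year, last_year] and count incidents per day."""
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get("Date") or "").strip()
        if not raw:
            continue  # cleaning: drop rows with a missing timestamp
        try:
            ts = datetime.strptime(raw, DATE_FORMAT)
        except ValueError:
            continue  # cleaning: drop malformed timestamps
        if first_year <= ts.year <= last_year:
            counts[ts.date()] += 1
    return counts


if __name__ == "__main__":
    # Invented sample rows, only to exercise the filter.
    sample = (
        "ID,Date,Primary Type\n"
        "1,01/01/2014 11:00:00 PM,THEFT\n"
        "2,01/01/2015 12:30:00 AM,BATTERY\n"
        "3,01/01/2015 09:15:00 PM,THEFT\n"
        "4,,ASSAULT\n"
    )
    for day, n in sorted(daily_counts(sample).items()):
        print(day, n)
```

For a file of around 1.5 million rows, a pandas DataFrame would be the more practical tool; the string input here just keeps the example self-contained.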

Organization

Group members responsibilities

  • Leonardo: all of the data cleaning, some data visualisations, one hypothesis test, the whole linear regression (using OLS), and a large part of the Google Slides presentation.
  • Vina: some data analysis, some data visualisations, two hypothesis tests and some slides in the Google Slides presentation.
  • Natalia: dataset research and some interesting insights, some data analysis and a few slides of the final presentation.

Links

Here are the links to the main documents produced during this project:

Chicago Crimes - Google Slides Final Presentation

GitHub Repository: crimes-in-chicago

Crimes in Chicago - Cleaning

Crimes in Chicago - Geographical Analysis

Crimes in Chicago - Typology of Crimes and Arrests

Crimes in Chicago - Crimes per Communities

Crimes in Chicago - Time Analysis

PS: only the main files are listed in this section; the repository also contains other auxiliary files.