---
title: "Color Lines: Mapping Racial Divides in New York State"
description: "A group of data science students analyzes racial disparities in household composition, income, and education level across demographic groups in New York."
toc: true
draft: FALSE
---

![](images/big_picture_image.png){width="800"}

## Thesis

Our study examines racial disparities in New York State. We found that they exist in income, in education, and geographically across the state's counties.

## Analysis

### Proportions of Income Groups in Each Race

![](images/Household%20weight.png)

This graph shows that income levels vary among races. Asian and White households have the highest proportions in the income range above \$50000, and White households have the lowest proportion in the \$0 to \$10000 income group. Black and Hispanic families, on the other hand, have much lower income levels: the lowest relative proportions in the highest income range and the highest proportions in the lowest income range.
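As a rough illustration of how within-race proportions like these can be computed, here is a minimal sketch in base R. It assumes one row per surveyed household with a survey weight. The toy values, the group names, and the income labels are made up for the example, and the use of `Household_Weight` as the weighting column is an assumption, not a description of how the figure above was produced.

```r
# Toy data with made-up values; column names mirror those used in this file.
toy <- data.frame(
  Race_Ethnicity   = c("Group A", "Group A", "Group A",
                       "Group B", "Group B", "Group B"),
  Income_Groups    = c("$0 to <$10,000", "$50,000+", "$50,000+",
                       "$0 to <$10,000", "$0 to <$10,000", "$50,000+"),
  Household_Weight = c(10, 30, 10, 20, 20, 10)  # hypothetical survey weights
)

# Sum the weights in each race x income-group cell.
weighted_counts <- xtabs(Household_Weight ~ Race_Ethnicity + Income_Groups,
                         data = toy)

# Divide each row by its total so the shares sum to 1 within each race.
income_share_by_race <- prop.table(weighted_counts, margin = 1)
round(income_share_by_race, 2)
```

Normalizing within each race (`margin = 1`) rather than over the whole table is what makes groups of very different sizes comparable.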

### Impact of Racial Disparities on Education

```{r include = FALSE}
## Loading Clean Data
library(here)
# Set the path to the RDS file
cleaned_dataset_path_rds <- here("dataset", "cleaned_NYSERDA_LMI_Census_2013-2015.rds")
# Load the dataset from the RDS file
clean_data <- readRDS(cleaned_dataset_path_rds)
```

```{r echo = FALSE}
library(ggplot2)
ggplot(clean_data, aes(x = Race_Ethnicity, fill = Income_Groups)) +
  geom_bar(position = "fill", stat = "count") +
  facet_wrap(~ Education_Level) +
  labs(title = "Income Distribution by Education Level and Race",
       x = "Race/Ethnicity",
       y = "Percentage") +
  scale_y_continuous(labels = scales::percent) +
  theme_minimal() + 
  theme(text = element_text(size = 12), 
        axis.text.x = element_text(angle = 45, hjust = 1), 
        strip.text.x = element_text(size = 7, face = "bold")) 
```

We often relate educational attainment to income level and assume that it is one of the most important indicators of income: higher educational attainment means higher income. However, education may influence income differently for different races. In the chart above, each panel shows the distribution of income levels across races at a given education level. The Black population often has the smallest proportion of income above \$50000 compared with its counterparts at the same education level. This suggests that racial disparities could shape how education influences income, and that some races benefit less from education than others.
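The comparison described here can also be made numerically. The sketch below computes, for each education level and race, the share of records in the top income group. It runs on made-up toy data, and the `"$50,000+"` label is a hypothetical stand-in for whatever label the cleaned dataset actually uses. Like the chart, it counts rows without applying survey weights.

```r
library(dplyr)

# Toy data with made-up values; column names mirror those used in this file.
toy <- data.frame(
  Education_Level = rep(c("High School", "Bachelor's"), each = 4),
  Race_Ethnicity  = rep(c("Group A", "Group A", "Group B", "Group B"), times = 2),
  Income_Groups   = c("$50,000+", "Below $50,000", "Below $50,000", "Below $50,000",
                      "$50,000+", "$50,000+", "$50,000+", "Below $50,000")
)

top_label <- "$50,000+"  # hypothetical label for the highest income group

# Share of records in the top income group, per education level and race.
top_income_share <- toy |>
  group_by(Education_Level, Race_Ethnicity) |>
  summarise(share_top = mean(Income_Groups == top_label), .groups = "drop") |>
  arrange(Education_Level, desc(share_top))

top_income_share
```

Reading the result one education level at a time shows whether one group's share stays below another's even when attainment is held fixed.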

### Interactive Map

Now let's look at New York State on a map. Its 62 counties differ from one another in racial demographics, income level, education level, and much more.

```{r, echo = FALSE}
suppressMessages({
  library(sf)
  library(here)
  library(dplyr)
  library(stringr)
  library(tidyr)

  clean_data <- readRDS(here("dataset", "cleaned_NYSERDA_LMI_Census_2013-2015.rds"))

  # Reading shapefile quietly
  ny_counties <- st_read("dataset-ignore/tl_2023_us_county.shp", quiet = TRUE)

  ny_counties <- ny_counties[ny_counties$STATEFP == '36', ]
  demographic_data <- clean_data  
})

```

```{r echo = FALSE}
#install.packages("modeest")
library(modeest)

# Loading geographic data
ny_counties_sf <- st_as_sf(ny_counties)
ny_counties_sf <- st_transform(ny_counties_sf, crs = 4326)


# Load the wage dataset that will be combined with the census data
dataset_to_combine <- read.csv("combining-dataset/Quarterly_Census_of_Employment_and_Wages_Annual_Data__Beginning_2000_20240415.csv", stringsAsFactors = FALSE) |>
    mutate(across(where(is.character), str_trim)) # Trim leading and trailing whitespace from all character columns

# Compute the average wage for each county (total wages divided by total employment)
county_avg_wage <- 
  filter(dataset_to_combine, Area.Type == "County") |>
  group_by(Area) |>
  summarise(Total_Wages = sum(`Total.Wage`, na.rm = TRUE),
            Total_Employment = sum(`Average.Employment`, na.rm = TRUE)) |>
  mutate(Average_Wage = Total_Wages / Total_Employment) |>
  rename(County = Area) |>
  mutate(County = str_replace_all(County, fixed(" County"), "")) |>
  select(-c(Total_Wages, Total_Employment))

# Build a table with the proportion of each race in each county (one column per race)
race_proportion <- demographic_data |>
  group_by(Race_Ethnicity, County) |>
  summarise(count=n(), .groups = 'drop') |>
  group_by(County) |>
  mutate(total = sum(count)) |>
  ungroup() |>
  mutate(proportion = count / total) %>%
  select(-count, -total) %>%
  pivot_wider(names_from = Race_Ethnicity, values_from = proportion)

# Aggregating demographic data to find each county's least common race and most common education level,
# then joining the county average wage and racial proportions
demographic_data_aggregated <- demographic_data %>%
  group_by(County) %>%
  summarise(
    Least_Common_Race_Ethnicity = names(sort(table(Race_Ethnicity), decreasing = FALSE))[1],
    Most_Common_Education_Level = names(sort(table(Education_Level), decreasing = TRUE))[1],
    #Average_Income = weighted.mean(Income_Numeric, Household_Weight, na.rm = TRUE),
    #.groups = 'drop'
  )|>
  left_join(county_avg_wage, by = "County") |>
  left_join(race_proportion, by = "County")

# Merging
merged_data <- ny_counties_sf %>%
  left_join(demographic_data_aggregated, by = c("NAME" = "County"))
```

```{r include = FALSE}
# install.packages("leaflet")
library(RColorBrewer)
library(leaflet)
library(scales)
library(htmlwidgets)

# Color palettes
race_colors <- colorFactor(brewer.pal(9, "Pastel1"), domain = merged_data$Least_Common_Race_Ethnicity, na.color = "grey50")
education_colors <- colorFactor(brewer.pal(9, "Set3"), domain = merged_data$Most_Common_Education_Level, na.color = "grey50")
wage_colors <- colorNumeric(palette = "viridis", domain = merged_data$Average_Wage, na.color = "grey50")

interactive_map <- leaflet(merged_data) %>%
  addProviderTiles(providers$CartoDB.Positron)  # Adding base tiles

# Function to add data polygons with specific legends
add_data_polygons <- function(data_column, color_palette, group_name, format_popup) {
  interactive_map <<- interactive_map %>%
    addPolygons(
      fillColor = ~color_palette(get(data_column)),
      color = "#444444",
      weight = 1,
      opacity = 1,
      fillOpacity = 0.7,
      popup = ~paste(NAME, "<br>", format_popup(get(data_column))),
      group = group_name
    ) %>%
    addLegend(
      position = "bottomleft",
      pal = color_palette,
      values = ~get(data_column),
      title = group_name,
      opacity = 0.7,
      labFormat = labelFormat(),
      group = group_name
    )
}

# Adding one polygon layer and its legend for each variable
add_data_polygons("Least_Common_Race_Ethnicity", race_colors, "Race/Ethnicity", function(x) paste("Least Common Race/Ethnicity:", x))
add_data_polygons("Most_Common_Education_Level", education_colors, "Education Level", function(x) paste("Most Common Education Level:", x))
add_data_polygons("Average_Wage", wage_colors, "Average Wage", function(x) paste("Average Wage: $", format(x, big.mark = ",")))


# Adding population maps
population_cols <- c("Black, non-Hispanic", "Asian, non-Hispanic", "White, non-Hispanic", "Hispanic")
names(population_cols) <- c("Black/AA Population", "Asian Population", "White Population", "Hispanic Population")
for (pop_col in names(population_cols)) {
  local_color_palette <- colorNumeric(palette = "viridis", domain = merged_data[[population_cols[pop_col]]], na.color = "grey50")
  add_data_polygons(population_cols[pop_col], local_color_palette, pop_col, function(x) paste(pop_col, ": ", scales::percent(x), sep = ""))
}

interactive_map <- interactive_map %>%
  addLayersControl(
    overlayGroups = c("Race/Ethnicity", "Education Level", "Average Wage", "Black/AA Population", "Asian Population", "White Population", "Hispanic Population"),
    options = layersControlOptions(collapsed = FALSE)
  )

print(interactive_map)
```

![Team 7Up - Interactive Component](images/interactive_map.mov)

On the map, we can observe that counties with a higher proportion of certain races tend to have higher or lower income. For example, Manhattan (New York County), in the middle of NYC, has a significantly higher average wage than other counties. Bronx County, just north of Manhattan and with a larger Black and African American population, has a much lower income than other counties in the state. Additionally, the most common education level is High School. However, in counties closer to New York City, individuals, mostly of races other than Asian or Black, do hold a Bachelor's or higher degree. This map exposes the racial inequality that exists at the county level in income and education, and it also points to racial segregation in New York State.

*Note: This is not our team's 4-minute video recording. In our recorded presentation (link in the README.md file in our GitHub repository), we also introduce this map and briefly explain it. For some reason, we were not able to render the interactive map on this website. To interact with the map, please run the code in the "big_picture.qmd" file. The map has user-friendly functionality such as zooming in and out and filtering by the layer of your choice.*
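For readers who run the code locally, one possible way to keep a copy of the map is to write it to a standalone HTML file with `htmlwidgets::saveWidget()`. The sketch below uses a small stand-in widget so that it runs by itself. In this page the object to save would be `interactive_map`, and the output file name is hypothetical.

```r
library(leaflet)
library(htmlwidgets)

# Stand-in widget so the snippet is self-contained; replace `demo_map`
# with `interactive_map` after running the map chunk in this file.
demo_map <- leaflet() |>
  addTiles() |>
  setView(lng = -75.5, lat = 43.0, zoom = 6)  # roughly centered on New York State

# Write the widget to an HTML file (hypothetical name, temporary folder).
# selfcontained = FALSE keeps the JavaScript dependencies in a sibling folder
# and avoids needing pandoc to bundle everything into one file.
out_file <- file.path(tempdir(), "interactive_map.html")
saveWidget(demo_map, file = out_file, selfcontained = FALSE)
```

The resulting file can be opened in any browser. Note also that the map chunk in this file is declared with `include = FALSE`, which suppresses that chunk's output in the rendered page, so this may be one reason the widget does not appear on the site.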