---
title: "Color Lines: Mapping Racial Divides in New York State"
description: "A group of data science students analyzes racial disparities in New York, focusing on household composition, income, and education level across demographic groups."
toc: true
draft: FALSE
---
![](images/big_picture_image.png){width="800"}
## Thesis
Our study examines racial disparities in New York State. We found that disparities exist in income and education, and that they also vary geographically across the state's counties.
## Analysis
### Proportions of Income Groups in Each Race
![](images/Household%20weight.png)
This graph shows that income levels vary by race. Asian and White households have the highest proportions of income above \$50,000, and White households have the lowest proportion in the \$0 to \$10,000 group. Black and Hispanic households, on the other hand, have much lower incomes overall: they have the smallest share in the highest income range and the largest share in the lowest income range.
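The proportions in this graph are computed within each race, so each race's income shares add up to 100%, and the image name suggests they are weighted by household. A minimal base-R sketch of that calculation follows. It assumes a hypothetical `Household_Weight` column (the name appears only in a commented-out line of the project code) alongside `Race_Ethnicity` and `Income_Groups`; the toy rows and income labels are made up for illustration and are not project data.

```r
# Minimal sketch: weighted income-group shares within each race.
# Assumption: Household_Weight is a hypothetical per-household weight column.
# The rows below are toy values, NOT project data.
toy <- data.frame(
  Race_Ethnicity   = c("Asian", "Asian", "Black", "Black", "Black"),
  Income_Groups    = c("$50000+", "$0-$10000", "$50000+", "$0-$10000", "$0-$10000"),
  Household_Weight = c(3, 1, 1, 2, 1)
)

# Sum the household weights in every Race x Income Group cell.
weighted_counts <- xtabs(Household_Weight ~ Race_Ethnicity + Income_Groups, data = toy)

# Divide each row by its row total, so each race's shares sum to 1.
income_shares <- prop.table(weighted_counts, margin = 1)
print(round(income_shares, 2))
```

Normalizing by row (`margin = 1`) is what makes races of very different sizes comparable; normalizing by column would instead answer "which races make up each income group".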
### Impact of Racial Disparities on Education
```{r include = FALSE}
## Loading Clean Data
library(here)
# Set the path to the RDS file
cleaned_dataset_path_rds <- here("dataset", "cleaned_NYSERDA_LMI_Census_2013-2015.rds")
# Load the dataset from the RDS file
clean_data <- readRDS(cleaned_dataset_path_rds)
```
```{r echo = FALSE}
library(ggplot2)
# 100% stacked bars: share of each income group within each race,
# with one panel per education level
ggplot(clean_data, aes(x = Race_Ethnicity, fill = Income_Groups)) +
  geom_bar(position = "fill", stat = "count") +
  facet_wrap(~ Education_Level) +
  labs(title = "Income Distribution by Education Level and Race",
       x = "Race/Ethnicity",
       y = "Percentage",
       fill = "Income Group") +
  scale_y_continuous(labels = scales::percent) +
  theme_minimal() +
  theme(text = element_text(size = 12),
        axis.text.x = element_text(angle = 45, hjust = 1),
        strip.text.x = element_text(size = 7, face = "bold"))
```
We often link educational attainment to income, treating education as one of the most important predictors of income and assuming that higher attainment means higher income. However, education may translate into income differently for different races. In this chart, each panel shows the distribution of income levels across races at a given education level. Within the same education level, the Black population often has the smallest proportion with income above \$50,000 compared to its counterparts. This suggests that racial disparities may shape how education influences income, with some races benefiting less from education than others.
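The comparison in this chart holds education fixed and compares races within each panel. A minimal base-R sketch of that comparison: for each education level and race, compute the share of households in the top income group, then take the difference between two races at the same education level. The toy rows, the education labels, and the `"$50000+"` label are hypothetical stand-ins for the project's actual `Education_Level` and `Income_Groups` values.

```r
# Minimal sketch: share of households in the top income group,
# by race, within each education level. Toy values, NOT project data.
toy <- data.frame(
  Education_Level = c("Bachelor+", "Bachelor+", "Bachelor+", "Bachelor+",
                      "High School", "High School", "High School", "High School"),
  Race_Ethnicity  = c("Black", "Black", "White", "White",
                      "Black", "Black", "White", "White"),
  Income_Groups   = c("$50000+", "$10000-$50000", "$50000+", "$50000+",
                      "$0-$10000", "$10000-$50000", "$50000+", "$10000-$50000")
)

# TRUE when a household is in the (hypothetical) top income group.
toy$high_income <- toy$Income_Groups == "$50000+"

# The mean of a logical vector is the proportion of TRUE values.
# Rows = education level, columns = race.
high_share <- tapply(toy$high_income,
                     list(toy$Education_Level, toy$Race_Ethnicity),
                     mean)
print(high_share)

# Gap between races at the same education level (White minus Black).
gap <- high_share[, "White"] - high_share[, "Black"]
print(gap)
```

A nonzero gap within a single education level is the pattern the chart highlights: the same attainment is associated with different income outcomes by race.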
### Interactive Map
Now let's look at New York State on a map. Its 62 counties differ from one another in racial demographics, income level, education level, and much more.
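The map's "Average Wage" layer comes from the wage chunk in this page, which groups the employment file by county only and divides total wages by total employment. That is a ratio of sums pooled over every row for a county, which weights each row by its employment, rather than an unweighted average of per-row wages. A minimal sketch of the difference, using two made-up rows for one hypothetical county (the numbers are illustrative, not from the dataset):

```r
# Minimal sketch: ratio of sums vs. mean of ratios. Toy values, NOT project data.
rows <- data.frame(
  Total.Wage         = c(1000000, 300000),
  Average.Employment = c(20, 10)
)

# Ratio of sums (what the county wage chunk computes): larger rows count more.
ratio_of_sums <- sum(rows$Total.Wage) / sum(rows$Average.Employment)

# Mean of per-row wages: every row counts equally regardless of its size.
mean_of_ratios <- mean(rows$Total.Wage / rows$Average.Employment)

print(c(ratio_of_sums = ratio_of_sums, mean_of_ratios = mean_of_ratios))
```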
```{r, echo = FALSE}
suppressMessages({
library(sf)
library(here)
library(dplyr)
library(stringr)
library(tidyr)
clean_data <- readRDS(here("dataset", "cleaned_NYSERDA_LMI_Census_2013-2015.rds"))
# Read the US county shapefile quietly, then keep only New York State counties (state FIPS code "36")
ny_counties <- st_read(here("dataset-ignore", "tl_2023_us_county.shp"), quiet = TRUE)
ny_counties <- ny_counties[ny_counties$STATEFP == '36', ]
demographic_data <- clean_data
})
```
```{r echo = FALSE}
#install.packages("modeest")
library(modeest)
# Loading geographic data
ny_counties_sf <- st_as_sf(ny_counties)
ny_counties_sf <- st_transform(ny_counties_sf, crs = 4326)
#Adding the average wage from the combined dataset
dataset_to_combine <- read.csv(here("combining-dataset", "Quarterly_Census_of_Employment_and_Wages_Annual_Data__Beginning_2000_20240415.csv"), stringsAsFactors = FALSE) |>
  mutate(across(where(is.character), str_trim)) # Trim leading and trailing whitespace from all character columns
# Average wage per county: total wages divided by total employment, pooled over all rows for that county
county_avg_wage <-
filter(dataset_to_combine, Area.Type == "County") |>
group_by(Area) |>
summarise(Total_Wages = sum(`Total.Wage`, na.rm = TRUE),
Total_Employment = sum(`Average.Employment`, na.rm = TRUE)) |>
mutate(Average_Wage = Total_Wages / Total_Employment) |>
rename(County = Area) |>
mutate(County = str_replace_all(County, fixed(" County"), "")) |>
select(-c(Total_Wages, Total_Employment))
#make a table with racial proportion for each race in each county
race_proportion <- demographic_data |>
group_by(Race_Ethnicity, County) |>
summarise(count=n(), .groups = 'drop') |>
group_by(County) |>
mutate(total = sum(count)) |>
ungroup() |>
mutate(proportion = count / total) %>%
select(-count, -total) %>%
pivot_wider(names_from = Race_Ethnicity, values_from = proportion)
# Aggregate demographic data to find the least common race/ethnicity and most common education level,
# then attach each county's average wage and racial proportions
demographic_data_aggregated <- demographic_data %>%
  group_by(County) %>%
  summarise(
    Least_Common_Race_Ethnicity = names(sort(table(Race_Ethnicity), decreasing = FALSE))[1],
    Most_Common_Education_Level = names(sort(table(Education_Level), decreasing = TRUE))[1],
    # Average_Income = weighted.mean(Income_Numeric, Household_Weight, na.rm = TRUE),
    .groups = "drop"
  ) |>
left_join(county_avg_wage, by = "County") |>
left_join(race_proportion, by = "County")
# Merging
merged_data <- ny_counties_sf %>%
left_join(demographic_data_aggregated, by = c("NAME" = "County"))
```
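The "least common race" layer relies on `names(sort(table(x)))[1]`. One caveat worth checking: if `Race_Ethnicity` is stored as a factor (we have not confirmed how the cleaned data stores it), `table()` also lists levels that do not occur in a county, with a count of 0, and such a level would be reported as "least common". A minimal sketch with made-up values, where `least_common()` is a hypothetical helper wrapping the same expression:

```r
# Hypothetical helper wrapping the expression used in the aggregation chunk.
least_common <- function(x) names(sort(table(x), decreasing = FALSE))[1]

# Character input: only values that actually occur are counted.
races_chr <- c("White", "White", "Black", "Asian", "Asian", "White")
least_common(races_chr)  # "Black", the smallest nonzero count

# Factor input with a level absent from this county: table() reports it as 0.
races_fct <- factor(races_chr, levels = c("Asian", "Black", "Hispanic", "White"))
least_common(races_fct)  # "Hispanic", which has a count of 0

# Dropping unused levels restricts the answer to groups actually present.
least_common(droplevels(races_fct))  # "Black"
```

Whether a zero-count group should count as "least common" is a modeling choice; the sketch only shows that the two inputs give different answers.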
```{r include = FALSE}
# install.packages("leaflet")
library(RColorBrewer)
library(leaflet)
library(scales)
library(htmlwidgets)
# Color palettes
race_colors <- colorFactor(brewer.pal(9, "Pastel1"), domain = merged_data$Least_Common_Race_Ethnicity, na.color = "grey50")
education_colors <- colorFactor(brewer.pal(9, "Set3"), domain = merged_data$Most_Common_Education_Level, na.color = "grey50")
wage_colors <- colorNumeric(palette = "viridis", domain = merged_data$Average_Wage, na.color = "grey50")
interactive_map <- leaflet(merged_data) %>%
addProviderTiles(providers$CartoDB.Positron) # Adding base tiles
# Function to add data polygons with specific legends
add_data_polygons <- function(data_column, color_palette, group_name, format_popup) {
interactive_map <<- interactive_map %>%
addPolygons(
fillColor = ~color_palette(get(data_column)),
color = "#444444",
weight = 1,
opacity = 1,
fillOpacity = 0.7,
popup = ~paste(NAME, "<br>", format_popup(get(data_column))),
group = group_name
) %>%
addLegend(
position = "bottomleft",
pal = color_palette,
values = ~get(data_column),
title = group_name,
opacity = 0.7,
labFormat = labelFormat(),
group = group_name
)
}
# Adding legends
add_data_polygons("Least_Common_Race_Ethnicity", race_colors, "Race/Ethnicity", function(x) paste("Least Common Race/Ethnicity:", x))
add_data_polygons("Most_Common_Education_Level", education_colors, "Education Level", function(x) paste("Most Common Education Level:", x))
add_data_polygons("Average_Wage", wage_colors, "Average Wage", function(x) paste("Average Wage: $", format(x, big.mark = ",")))
# Adding population maps
population_cols <- c("Black, non-Hispanic", "Asian, non-Hispanic", "White, non-Hispanic", "Hispanic")
names(population_cols) <- c("Black/AA Population", "Asian Population", "White Population", "Hispanic Population")
for (pop_col in names(population_cols)) {
local_color_palette <- colorNumeric(palette = "viridis", domain = merged_data[[population_cols[pop_col]]], na.color = "grey50")
add_data_polygons(population_cols[pop_col], local_color_palette, pop_col, function(x) paste(pop_col, ": ", scales::percent(x), sep = ""))
}
interactive_map <- interactive_map %>%
addLayersControl(
overlayGroups = c("Race/Ethnicity", "Education Level", "Average Wage", "Black/AA Population", "Asian Population", "White Population", "Hispanic Population"),
options = layersControlOptions(collapsed = FALSE)
)
print(interactive_map)
```
![Team 7Up - Interactive Component](images/interactive_map.mov)
On the map, we can see that counties with higher proportions of certain races tend to have higher or lower incomes. For example, Manhattan (New York County), at the center of New York City, has a significantly higher average wage than the other counties. The Bronx, just north of Manhattan, has a larger Black/African American population and a much lower average wage than other counties in the state. Additionally, the most common education level is high school. However, in counties closer to New York City, more individuals, mostly of races other than Asian or Black, hold a bachelor's degree or higher. This map exposes racial inequality in income and education at the county level and also suggests racial segregation within New York State.
*Note: This video is not our team's 4-minute presentation recording. In our recorded presentation (linked in the README.md file in our GitHub repository), we also introduce this map and briefly explain it. For reasons we have not resolved, we are unable to render the interactive map on this website. To explore and interact with the map, please run the code in the "big_picture.qmd" file. The map supports zooming in and out and toggling the layers of your choice.*