This repository has been archived by the owner on Sep 11, 2023. It is now read-only.

Commit

Release 1.0.0 (#24)
As this is the final release, it does the following:
- closes #18
- closes #20
- Updates the `.git-blame-ignore-revs`
- Cleans up some formatting in `final_report_group04.md`
- Fixes the backwards title on a plot
- Marks the Project Milestone 6 submission/1.0 release

Co-authored-by: IfYouWantMoney <[email protected]>
Bluesy1 and theHDarian authored Apr 13, 2023
1 parent 3ddb249 commit 6b7d133
Showing 6 changed files with 95 additions and 17 deletions.
16 changes: 14 additions & 2 deletions .git-blame-ignore-revs
@@ -15,7 +15,19 @@

b5c91592f83de86ea0b018414910059491e467aa
9b203cc8841b133f21a7bc82dd04e88443b8e3d2
f25e464ec07366a21217986c183888a33c2e43c3
899b26124d605e9e037bf335a3809a5f6c262384
2a7ad6c963c8de9590391bb960a5e6765a57b25d
f25e464ec07366a21217986c183888a33c2e43c3
b704797d3b13e2a0b1262849e227448c6b49c681
899b26124d605e9e037bf335a3809a5f6c262384
d407a9f749175b6c8889ff0be4c5a1a0b47006bf
d3bab21ad2231afb72279de9234b2befb9ad4582
7ec2840e6021fed3f03646c7559e2c46793d6c7e
9103b146c5a372fa3493b67203fe9e34ed4edd7d
771ab67557caf3d195daf7eb870f97f285334711
afb9e8c8beffe210517aec0121a04d42987bf4a2
e6ea7763c045c6e95bc07ba811a2d2dd10cb6566
84df65bb77750483935bf2c86e0669addcbaca92
b40371f22a9a0b7027292f0bae5db37a213ec1ad
1a4eaf5c5ba2d9f2136fe27988ba308fb964e120
b41ff86a889eb5e0a8410349f958b3194752d84c
0f629b4f5bab72417c40abbc90792b39b15a061f
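Git only consults this file when it is pointed at it, for example with `git config blame.ignoreRevsFile .git-blame-ignore-revs` or `git blame --ignore-revs-file .git-blame-ignore-revs`, and it expects one unabbreviated object name per line. Because the scraped hunk above has lost its `+`/`-` markers, it is unclear whether hashes such as `f25e464ec07366a21217986c183888a33c2e43c3` really appear twice in the file or are just a removed line and its re-added copy. The following is a minimal Python sketch for checking the file for malformed or duplicate entries. The helper name `check_ignore_revs` is hypothetical, and it assumes comments start with `#`.

```python
import re

# A full SHA-1 (40 hex digits) or SHA-256 (64 hex digits) object name.
FULL_HASH = re.compile(r"(?:[0-9a-f]{40}|[0-9a-f]{64})")


def check_ignore_revs(text):
    """Return (malformed, duplicates) for the contents of a .git-blame-ignore-revs file.

    Each result is a list of (line_number, entry) pairs. Text after '#' is
    treated as a comment and blank lines are skipped.
    """
    seen = set()
    malformed, duplicates = [], []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        entry = raw.split("#", 1)[0].strip().lower()
        if not entry:
            continue  # blank or comment-only line
        if not FULL_HASH.fullmatch(entry):
            malformed.append((lineno, entry))  # e.g. an abbreviated hash
        elif entry in seen:
            duplicates.append((lineno, entry))  # already listed earlier
        else:
            seen.add(entry)
    return malformed, duplicates
```

Calling it on the file's contents, e.g. `check_ignore_revs(open(".git-blame-ignore-revs").read())`, returns line-numbered lists of entries to fix.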
69 changes: 69 additions & 0 deletions CITATION.cff
@@ -14,6 +14,75 @@ authors:
    family-names: Kendal-Freedman
  - given-names: Sky
    family-names: Huang
identifiers:
  - type: url
    value: >-
      https://github.com/ubco-W2022T2-data301/project-group-group04/releases/tag/1.0.0
    description: Release Version
repository-code: >-
  https://github.com/ubco-W2022T2-data301/project-group-group04
abstract: >-
  Brief analysis of air quality and asthma data in the US
  from the EPA and CDC.
keywords:
  - Air Quality
license: MIT
version: 1.0.0
date-released: '2023-04-13'
references:
  - title: "Air Data: Air Quality Data Collected at Outdoor Monitors Across the US"
    publisher:
      name: "US Environmental Protection Agency"
    authors:
      - name: "US Environmental Protection Agency"
    url: https://www.epa.gov/outdoor-air-quality-data
    date-accessed: "2023-01-30"
    date-downloaded: "2023-01-30"
    scope: "Data Source for all air quality data"
    data-type: CSV
    type: data
  - title: "PLACES: Local Data for Better Health, Census Tract Data"
    publisher:
      name: "Centers for Disease Control and Prevention"
    authors:
      - name: "Centers for Disease Control and Prevention"
    url: https://chronicdata.cdc.gov/500-Cities-Places/PLACES-Local-Data-for-Better-Health-Census-Tract-D/cwsq-ngmh
    year: 2022
    date-accessed: "2023-01-30"
    date-downloaded: "2023-01-30"
    data-type: CSV
    scope: "Data Source for PLACES asthma data"
    keywords:
      - "places"
      - "census tract"
      - "brfss"
      - "prevalence"
      - "risk"
      - "behaviors"
      - "outcomes"
      - "prevention"
      - "health"
      - "status"
    type: data
  - title: "Raw data files used by the geoplot examples and documentation"
    authors:
      - given-names: Aleksey
        family-names: Bilogur
    repository: https://github.com/ResidentMario/geoplot-data
    date-accessed: "2023-03-02"
    date-downloaded: "2023-03-02"
    data-type: GEOJSON
    scope: "Geographic Data Source for file `contiguous-usa.geojson`"
    type: data
  - title: "2018 Cartographic Boundary Files - Shapefile"
    publisher:
      name: "U.S. Census Bureau"
    authors:
      - name: "U.S. Census Bureau"
    url: https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html
    date-accessed: "2023-03-04"
    date-downloaded: "2023-03-04"
    year: 2018
    data-type: SHAPEFILE
    scope: "Geographic Data Source for data folders `cb_2018_us_cbsa_5m` and `cb_2018_us_cbsa_500k`"
    type: data
4 changes: 2 additions & 2 deletions README.md
@@ -15,15 +15,15 @@ This dataset is a combination of annual air quality index summaries sorted by CB
## Team Members

1. I'm Gavin Kendal-Freedman, a 3rd year at UBC Okanagan, Majoring in Chemistry, and taking a minor in Data Science, and I'm a dual US-Canadian citizen, originally from Seattle, WA, in the US, and I am a strong environmentalist.
-2. Person 2: Hello, I'm Sky and I'm a computer science major with a data science minor at UBC. I'm interested in data science and I'm excited to learn more about it in this course!
+2. Hello, I'm Sky and I'm a computer science major with a data science minor at UBC. I'm interested in data science and I'm excited to learn more about it in this course!

## Images

<!--{You should use this area to add a screenshot of an interesting plot, or of your dashboard} -->

Sample dashboard for data analysis:

-<img src ="images/Dashboard.png" width="500px" alt="Dashboard for one of the analysis questions">
+<img src ="images/Dashboard 1.png" width="500px" alt="Dashboard for one of the analysis questions">

## References

12 changes: 6 additions & 6 deletions analysis/analysis1.ipynb

Large diffs are not rendered by default.

11 changes: 4 additions & 7 deletions final_report_group04.md
@@ -10,9 +10,7 @@ As time proceeds, pollution from industries is steadily making the air quality i

For our exploratory analysis, we mostly focused on wether or not there were any correlations that we were expecting, or not expecting. To do that, one of the things we did was aggregate the different air quality parameters into heatmap, which did indicate that some of the expected correlations are there. Furthermore, a choropleth of AQI by region, and point based asthma prevalence was generated to check if there were any potential trends, or if it was truly random. These plots are shown below:

-<img src ="images/gavin-heatmap.png" width="500px" alt="Heatmap of parameters">
-
-<img src ="images/gavin-plot-1.png" width="500px" alt="Choropleth of AQI with asthma prevalence overlaid">
+<img src ="images/gavin-heatmap.png" width="350px" height="350px" alt="Heatmap of parameters"> <img src ="images/gavin-plot-1.png" width="500px" alt="Choropleth of AQI with asthma prevalence overlaid">

# Questions and Results

@@ -26,17 +24,16 @@ Looking at the [plot of Max AQI against Crude Asthma prevalence](./images/gavin-

[^1]: [2020 Western US Wildfire Season](https://en.wikipedia.org/wiki/2020_Western_United_States_wildfire_season) - While Wikipedia is not a primary source, it has aggregated information about the extent of wildfires this year, and more complete information is available on the internet if the reader wishes to go into more detail. The exacts of the wildfires are not important for the data analysis here, just that the AQI is highly sample biased on the west coast for 2020.

-[^2]: Given the wildfires and as such sampling bias issues identified, the reader may ask why 2020 was chosen as the year for air quality. The asthma data released by the CDC was only for the year 2020 and as such, the most representative year to compare it to air quality was 2020, other years may have different measurements of other populations which could skew analysis further, and since the main particulates from wildfire smoke (PM2.5, PM10, and $\text{CO}_2$) are not being analyzed here, the wildfires will not effect the analysis very much overall.
+[^2]: Given the wildfires and as such sampling bias issues identified, the reader may ask why 2020 was chosen as the year for air quality. The asthma data released by the CDC was only for the year 2020 and as such, the most representative year to compare it to air quality was 2020, other years may have different measurements of other populations which could skew analysis further, and since the main particulates from wildfire smoke (PM2.5, PM10, and CO<sub>2</sub>) are not being analyzed here, the wildfires will not effect the analysis very much overall.

Also, the [AQI time delta plot](./images/gavin-plot-3.png) and the [violin plot](./images/gavin-aqi-violinplot.png) show median AQI across time as deltas and absolutes, which shows that over time, there are fluctuations across the country across years, there are small, temporary fluctuations, but overall AQI tends to have stayed fairly steady over time at median[^3] yearly readings, indicating over time AQI has stayed the same in most regions.

[^3]: Median readings were chosen here over Max AQI to limit the effects of wildfires as discussed above.

Overall, we can find positive correlations between Ozone Levels, Carbon Monoxide, Nitro (NOx) Compounds, with a potential weak correlation with Sulfur Dioxide, an emission from some fuels like diesel and natural gas. Futhermore, There is not a direct correlation between AQI and Crude asthma prevalence in the US, however comparing individual groups of parameters shows that there is a correlation between Ozone, Nitro (NOx) Compounds, and Carbon Monoxide, at least in certain portions of the US (corn and tornado belts/alleys), which show there is not a difference between rural and urban areas necessarily. A final analysis across times showed that AQI does not vary massively across time, at least over the time period investigated, and the minor variations across years are not significant, especially once differences due to wildfire events[^1] have been taken into account.

*Note:* A more in context analysis can be seen in the jupyter notebook from which analysis was done in, which can be found [here](./analysis/analysis1.ipynb).
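The Question 1 discussion reports median AQI over time as both absolute values and deltas. The following is a minimal sketch of the delta computation. It assumes the readings arrive as hypothetical `(region, year, aqi)` tuples, since the notebook's actual schema is not visible in this diff. `median_aqi_deltas` is likewise a hypothetical name.

```python
from collections import defaultdict
from statistics import median


def median_aqi_deltas(records):
    """Year-over-year change in median AQI per region.

    `records` is an iterable of (region, year, aqi) tuples (an assumed layout).
    Returns {region: {year: median(year) - median(previous observed year)}}.
    """
    # Collect every AQI reading for each (region, year) pair.
    by_region_year = defaultdict(list)
    for region, year, aqi in records:
        by_region_year[(region, year)].append(aqi)

    # Reduce each group to its median, which damps one-off spikes such as wildfire days.
    medians = defaultdict(dict)
    for (region, year), values in by_region_year.items():
        medians[region][year] = median(values)

    deltas = {}
    for region, per_year in medians.items():
        years = sorted(per_year)
        # Compare each observed year with the previous observed year; gaps are not filled.
        deltas[region] = {
            later: per_year[later] - per_year[earlier]
            for earlier, later in zip(years, years[1:])
        }
    return deltas
```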

## Question 2

Our second research question, look at the effects of humidity on the air quality, by looking at the AQI and the level of CO, CO2, NO2, and other airborne molecules and particulates at different humidity levels. This is done by separation the data into dry and humid areas, and then looking at the distribution of the pollutants in each area. This is achieved by splitting each CBSA zone into 3 categories using average relative humidity: Dry (relative humidity below 40%), Moderate (relative humidity from 40% to 60%), and Humid (relative humidity above 60%).
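The Question 2 paragraph bins each CBSA by average relative humidity. The following is a minimal sketch of that classification and of grouping pollutant values by bin. The assignment of the exact 40% and 60% boundaries is an assumption. The text says "below 40%", "from 40% to 60%", and "above 60%", so both boundaries are treated as Moderate. `humidity_category` and `group_by_humidity` are hypothetical names.

```python
from collections import defaultdict


def humidity_category(avg_rh):
    """Classify a CBSA by its average relative humidity, in percent."""
    if avg_rh < 40:
        return "Dry"
    if avg_rh <= 60:  # assumed inclusive upper bound for "40% to 60%"
        return "Moderate"
    return "Humid"


def group_by_humidity(rows):
    """Group pollutant values by humidity bin.

    `rows` is an iterable of (avg_rh, pollutant_value) pairs (an assumed layout);
    returns {category: [values]}, ready for per-bin distribution plots.
    """
    groups = defaultdict(list)
    for avg_rh, value in rows:
        groups[humidity_category(avg_rh)].append(value)
    return dict(groups)
```

With pandas, the same rule could be applied as `df["avg_rh"].map(humidity_category)`, where `avg_rh` is a hypothetical column name.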

When looking at the [ridge line plot for pollutants](./images/skyridge1.png) at different humidity levels, no particular correlation between dry and humid areas can be observed, aside from a few outliers like Antimony being more concentrated in dry areas and Barium being less concentrated in humid areas.
@@ -49,6 +46,6 @@ Moving on from the analysis of gaseous molecules and particulate matter, the cor

<!-- A brief paragraph that highlights your key results and what you learned from doing this project. -->

-Overall, connections between certain parameters, including but not limited to carbon monoxide, ozone, sulfur dioxide, and nitro (NOx) compounds do have relatively strong effects on AQI, and a subset of those parameters appear to effect asthma prevalence. Furthermore, across the last 10 years, there have been no statistically significant changes in regional air quality[^1] [^4]. Additionally, it appears that there is not a statically significant correlation between most pollutants and relative humidity overall. However, it is worth noting that this observation may be influenced by extreme outliers. Additionally, aggregate metrics such as median AQI showed a correlation with high relative humidity during certain years, namely 2020 and 2021, but it is unclear what causes this relationship. Possible explanations include recent forest fires[^1] or differences in the surrounding air quality between dry and humid areas, but more data would be needed to support these hypotheses. Further investigation is required to confirm these findings.
+Overall, connections between certain parameters, including but not limited to carbon monoxide, ozone, sulfur dioxide, and nitro (NO<sub>x</sub>) compounds do have relatively strong effects on AQI, and a subset of those parameters appear to effect asthma prevalence. Furthermore, across the last 10 years, there have been no statistically significant changes in regional air quality[^1][^4]. Additionally, it appears that there is not a statically significant correlation between most pollutants and relative humidity overall. However, it is worth noting that this observation may be influenced by extreme outliers. Additionally, aggregate metrics such as median AQI showed a correlation with high relative humidity during certain years, namely 2020 and 2021, but it is unclear what causes this relationship. Possible explanations include recent forest fires[^1] or differences in the surrounding air quality between dry and humid areas, but more data would be needed to support these hypotheses. Further investigation is required to confirm these findings.

[^4]: While there are some small changes in median AQI across years, it is not enough to be considered significant. See [this](./images/gavin-aqi-violinplot.png) plot.
Binary file modified images/gavin-plot-1.png
