Data Analysis: Do high emissions predict reporting non-compliance? #147

colton-lapp · 2025-01-13T23:50:39Z

Description

This pull request creates:

A Jupyter notebook file with data analysis findings
A blog post describing the findings in the jupyter notebook file

The blog post investigates a question raised in #114: does poor performance correlate with non-reporting? The short answer is no; I didn't find that pattern in the data.

The data analysis in the Jupyter notebook consists of the following steps:

1. Create some basic data visualizations showing the variables of interest and compliance trends over time
2. Create lagged variables: emissions last year, and the emissions trend from 2 years ago to 1 year ago
3. Create graphs comparing mean/median GHG intensity last year, and the GHG trend from 2 years ago to 1 year ago, against reporting compliance; these show basically no difference
4. Run a regression with a single control variable (square footage) to confirm there is no statistically significant relationship
5. Run some robustness checks by dropping outliers and dropping the COVID years, then repeating steps 3-4; still no significant results
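Steps 2 through 4 can be sketched in plain Python as follows. This is a minimal illustration, not the notebook's code: the field names (building_id, year, ghg_intensity, sqft, reported), the toy rows, and the use of a linear probability model fit by ordinary least squares are all assumptions. The notebook may use a different specification (for example, a logit), and a library such as statsmodels would also report the standard errors and p-values needed to judge significance.

```python
from statistics import mean, median


def add_lags(rows):
    """Attach last year's GHG intensity and the trend from two years ago to
    last year to each (building, year) row. Field names are hypothetical."""
    by_key = {(r["building_id"], r["year"]): r for r in rows}
    out = []
    for r in rows:
        prev1 = by_key.get((r["building_id"], r["year"] - 1))
        prev2 = by_key.get((r["building_id"], r["year"] - 2))
        lag1 = prev1["ghg_intensity"] if prev1 else None
        lag2 = prev2["ghg_intensity"] if prev2 else None
        # Trend is only defined when both earlier years were reported
        trend = lag1 - lag2 if lag1 is not None and lag2 is not None else None
        out.append({**r, "ghg_last_year": lag1, "ghg_trend": trend})
    return out


def summarize_by_group(rows, field="ghg_last_year"):
    """Mean and median of `field`, split by whether the building reported."""
    groups = {True: [], False: []}
    for r in rows:
        if r[field] is not None:
            groups[bool(r["reported"])].append(r[field])
    return {k: (mean(v), median(v)) for k, v in groups.items() if v}


def ols(y, xs):
    """OLS with an intercept via the normal equations (X'X) b = X'y, solved
    by Gauss-Jordan elimination. Assumes X'X is invertible.
    Returns [intercept, b1, b2, ...]."""
    X = [[1.0, *row] for row in xs]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry into place
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        # Eliminate this column from every other row
        for row in range(k):
            if row != col:
                f = a[row][col]
                a[row] = [vr - f * vc for vr, vc in zip(a[row], a[col])]
    return [a[i][k] for i in range(k)]


# Toy rows for illustration only (not real benchmarking data).
rows = [
    {"building_id": "A", "year": 2018, "ghg_intensity": 10.0, "sqft": 50_000, "reported": 1},
    {"building_id": "A", "year": 2019, "ghg_intensity": 8.0, "sqft": 50_000, "reported": 1},
    {"building_id": "A", "year": 2020, "ghg_intensity": None, "sqft": 50_000, "reported": 0},
]
lagged = add_lags(rows)
print(summarize_by_group(lagged))  # {True: (10.0, 10.0), False: (8.0, 8.0)}

# With real data, one would regress reporting on lagged intensity plus the
# square-footage control, e.g.:
#   usable = [r for r in lagged if r["ghg_last_year"] is not None]
#   ols([r["reported"] for r in usable],
#       [(r["ghg_last_year"], r["sqft"]) for r in usable])
```

The robustness checks in step 5 would rerun the same two functions on filtered rows (outliers removed, COVID years excluded).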

These findings are then summarized in a new blog post.

A couple of other notes:

I used the graphing package plotly to make interactive HTML graphs and embedded those in my blog. Because the HTML graphs let you hover over individual data points and display info, they take up a decent amount of space (between 1 and 20 MB). This also makes the Jupyter notebook file larger, but I tried to cut down the size by making some of the plots static image files.
I don't know anything about JavaScript or HTML, so I relied on Gen AI to do a lot of the coding for embedding and rendering the interactive HTML graphs and fetching regression results from a JSON file. This could probably use some serious attention.
I've introduced some new dependencies for data visualization and am not sure how these are managed in the project.
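For the notebook side of the JSON hand-off, a rough sketch might serialize the regression output into a small, flat document that a front-end component can fetch and render as a table. The schema (n_obs, terms, coef, std_err) and the placeholder numbers are hypothetical, not the PR's actual file format.

```python
import json


def regression_to_json(terms, coefs, std_errs, n_obs, digits=4):
    """Serialize regression output into a flat JSON string.

    Rounding on the Python side keeps the file small and spares the front
    end from formatting floats. The schema is hypothetical.
    """
    if not (len(terms) == len(coefs) == len(std_errs)):
        raise ValueError("terms, coefs, and std_errs must have the same length")
    rows = [
        {"term": t, "coef": round(c, digits), "std_err": round(se, digits)}
        for t, c, se in zip(terms, coefs, std_errs)
    ]
    return json.dumps({"n_obs": n_obs, "terms": rows}, indent=2)


# Placeholder values for illustration only, not results from the analysis.
payload = regression_to_json(
    ["intercept", "ghg_last_year", "sqft"],
    [0.9, -0.0012, 0.0031],
    [0.05, 0.0009, 0.0021],
    n_obs=100,
)
print(payload)
```

Keeping the file flat means the blog component only needs to loop over terms to build table rows.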

This is my first time creating a PR for a public repo, and for this project specifically, so I'm happy to restructure any work or accept any feedback! I'm expecting some heavy feedback on the files committed (e.g., new packages in requirements.txt, the Python virtual environment, directory structure).

Fixes #114

Testing Instructions

I would recommend pulling, running docker-compose up, and looking at my blog. Additionally, check out the Jupyter notebook to verify that I'm analyzing the correct variables and haven't made data analysis mistakes. To see the interactive HTML graphs in the notebook, you have to view the notebook in NBViewer, since they won't render on GitHub.

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

netlify · 2025-01-13T23:50:56Z

✅ Deploy Preview for radiant-cucurucho-d09bae ready!

Name	Link
🔨 Latest commit	`048cd6e`
🔍 Latest deploy log	https://app.netlify.com/sites/radiant-cucurucho-d09bae/deploys/67afed985582720008e7a940
😎 Deploy Preview	https://deploy-preview-147--radiant-cucurucho-d09bae.netlify.app

To edit notification comments on pull requests, go to your Netlify site configuration.


colton-lapp · 2025-01-15T23:00:47Z

I've reduced the cumulative file size of the graphs to 2 MB by dropping some of the data displayed on hover. We could reduce the file size even more by not displaying every single observation in the scatterplots and showing only a handful of the buildings with typical emissions (they're hard to tell apart anyway). We could also just convert the graphs to static PNG files. Let me know what you think is best.
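The "show every outlier but only a sample of typical buildings" idea could look roughly like this in plain Python: keep every point above a chosen quantile plus a fixed-size random sample of the rest, with a fixed seed so each export plots the same points. The 95th-percentile cutoff, the sample size of 200, and the function name are arbitrary assumptions for illustration, not values from the notebook.

```python
import random


def downsample_for_plot(points, key=lambda p: p, keep_quantile=0.95,
                        sample_size=200, seed=0):
    """Return every point whose key(p) is above the `keep_quantile` cutoff,
    plus a random sample of at most `sample_size` points at or below it."""
    if not points:
        return []
    ordered = sorted(key(p) for p in points)
    # Nearest-rank style cutoff (no interpolation)
    cutoff = ordered[int(keep_quantile * (len(ordered) - 1))]
    outliers = [p for p in points if key(p) > cutoff]
    typical = [p for p in points if key(p) <= cutoff]
    rng = random.Random(seed)  # fixed seed: same subset on every export
    return outliers + rng.sample(typical, min(sample_size, len(typical)))


intensities = [float(i) for i in range(1000)]  # synthetic values
points = downsample_for_plot(intensities)
print(len(points))  # 250
```

The key argument lets the same function work on per-building records, for example key=lambda r: r["ghg_intensity"].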

vkoves · 2025-01-20T22:41:10Z

@colton-lapp: I meant to comment when I pushed up my fixes. I've added some date stamps to the blog posts and reordered them so yours comes first (since it's newer). I'm fine with that file size, but it looks like there are some responsiveness issues with the graphs. If you can fix those, I'm good with it; otherwise we could move to images. Here's an example:

[Screenshots: Desktop (shows scrollbar) | Mobile (cut off)]

Also, is there a way to note dependencies for your Jupyter notebook? I tried running it locally but had to manually install dependencies like plotly, which aren't in our requirements.txt. Maybe you can add some instructions at the start of your notebook, and maybe a requirements.txt file? I don't know what's typical there.
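As one option, the first cell of the notebook could check that the required packages are importable and print a hint otherwise. This is a sketch: the package list is a hypothetical example, and it uses top-level import names, which can differ from pip package names (scikit-learn, for instance, is imported as sklearn).

```python
import importlib.util


def missing_packages(import_names):
    """Return the top-level import names that cannot be found in this
    environment (find_spec returns None for those)."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


# Hypothetical list; replace with the notebook's actual imports.
REQUIRED = ["pandas", "plotly", "statsmodels"]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages: " + ", ".join(missing))
    print("Install the notebook's dependencies (e.g. pip install -r <requirements file>) first.")
else:
    print("All required packages are installed.")
```

Pairing this cell with a short markdown note on which requirements file to install keeps the setup instructions next to the code that depends on them.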


vkoves · 2025-02-03T23:21:59Z

@colton-lapp - wanted to check in on this, it looks like Pytest is failing, and if you need help using the graph images just on mobile, let me know!

I also showed the preview to one of our partners at Climate Reality Chicago (who raised this research question), who was a big fan and said:

I love, love, love the visuals and the way you methodically walk the reader through your line of thinking! And I can imagine the “Distribution of GHG Intensities” image being useful for us in other contexts in the future.

The only change I might suggest is that you have 2 graphs immediately above “Results: No noticeable difference between groups” that mention “last year” and “this year” in their titles. It might be good to replace those with numerical years so that people don’t have to check the blog post publication year / can refer to those graphs in other contexts without creating ambiguity about which year’s data are included. Just a thought.

I agree with her feedback on adding years, and I think that would be a simple improvement to each graph, just to make sure if it gets screenshotted and shared it's really clear what data we used. Even adding Chicago in there might be helpful, but 🤷🏻‍♂️

vkoves · 2025-02-07T02:08:05Z

src/pages/blog/GHGIntensityPredictCompliance.vue

+      </p>
+
+      <p class="constrained">
+        Many buildings in the publicly available data


Let's change "the publicly available data" to "Chicago's building benchmarking data" across your article and link the first time to the main data source we use on the site.

vkoves · 2025-02-07T02:09:36Z

src/pages/blog/GHGIntensityPredictCompliance.vue

+          rel="noopener noreferrer"
+          >don't report emissions data.</a
+        >
+        Is there a pattern to which buildings fail to report? Anecdotally, it's


I'd change this "Anecdotally..." sentence to "Our team has noticed that some high emissions buildings stop reporting, while more efficient buildings tend to keep reporting year after year." There unfortunately has not been real press coverage of this data 😭

vkoves · 2025-02-07T02:10:27Z

src/pages/blog/GHGIntensityPredictCompliance.vue

+
+      <p>
+        The graph below depicts the count of buildings that did and did not
+        report emissions data every year.


I'd change "every year" to "each year", unless it's a cumulative graph.


colton-lapp · 2025-02-09T02:29:15Z

@vkoves

Finally got some time to work on this and addressed everything you mentioned I think:

Tests pass with the new requirements.txt file. (Note: you wanted a separate requirements.txt file, but the tests seem to use src/data/requirements.txt, so I had to change that one. I have a separate, identical file in src/data/analysis/predict_compliance_requirements.txt, which I was hoping to use. I'll let you sort this out in the tests if you want.)
Using basic JavaScript to check for mobile, and showing static PNG files on mobile
Adopted your phrasing changes. Linked to the dataset the first time I mention it
Added authorship to the top (feel free to restyle), which links to my GitHub (I don't really have social media; happy to take the link off)
Changed the graphs as suggested to reference specific years. Note: the previous graphs actually showed data for all years, hence the vague language. For example, the "emissions last year vs. compliance this year" graph included data points with emissions in 2018 and compliance in 2019, AND emissions in 2019 and compliance in 2020, etc. I subset the DataFrame to show only the most recent year to make it clearer, and updated the titles
Some other small tweaks to graphs and stuff
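The subsetting described above amounts to keeping only the most recent compliance year, so each building appears at most once, and then putting the explicit years into the title. A minimal sketch, assuming hypothetical rows that each carry a compliance year and the prior year's intensity; the field names, toy values, and title wording are illustrative only.

```python
def latest_year_only(rows, year_key="year"):
    """Keep only rows from the most recent year present in `rows`."""
    latest = max(r[year_key] for r in rows)
    return [r for r in rows if r[year_key] == latest]


def chart_title(compliance_year):
    """Spell out both years so a screenshot of the graph stays unambiguous."""
    return f"Chicago: {compliance_year - 1} GHG intensity vs. {compliance_year} reporting"


# Toy rows for illustration only.
rows = [
    {"building_id": "A", "year": 2019, "ghg_last_year": 9.5, "reported": 1},
    {"building_id": "A", "year": 2020, "ghg_last_year": 8.0, "reported": 0},
    {"building_id": "B", "year": 2020, "ghg_last_year": 4.2, "reported": 1},
]
latest = latest_year_only(rows)
print(len(latest), chart_title(latest[0]["year"]))
# 2 Chicago: 2019 GHG intensity vs. 2020 reporting
```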

Final todo: Fix regression table at the bottom to be prettier

src/pages/blog/GHGIntensityPredictCompliance.vue

vkoves

@colton-lapp: the resize change works great, but if you test your preview on mobile, the font sizes are too small to read. Can you re-export those PNGs with larger font sizes targeted to mobile? You could also move the legend below the graphs (if that's an option) if you need more room.

I've also made a few minor tweaks to the blog post to improve the styling, particularly the regression table:

vkoves · 2025-02-11T02:49:52Z

src/pages/blog/GHGIntensityPredictCompliance.vue

+  }
+
+  checkScreenSize(): void {
+    this.isMobile = window.innerWidth <= 768;


Not blocking, but in the future I'd recommend doing this with CSS, which can handle checking screen size changes automatically. You'd do something like:

    <img class="mobile-only" ....>
    <iframe class="desktop-only"></iframe>

    .graph-cont {
      .mobile-only { display: none; }

      // Mobile styling
      @media (max-width: $mobile-max-width) {
        .desktop-only { display: none; }
        .mobile-only { display: block; }
      }
    }

colton-lapp added 5 commits December 18, 2024 22:41

Feat: First draft of emissions predicting compliance blog

b236f34

fixed plotly graphs not rendering in nbviewer

f06aeda

finished draft of blog and cleaned up notebook

12a7664

Reduced memory by turning interactive plots to static images

59cb9c5

Cleaned up code in notebook, typos in blog, saved png images

174f66d

colton-lapp self-assigned this Jan 13, 2025

colton-lapp added enhancement New feature or request data Data updates & tweaks labels Jan 13, 2025

colton-lapp and others added 2 commits January 14, 2025 20:41

Merge branch 'main' into compliance-analysis

c570860

Run Prettier after ignoring auto-generated files

9dfb65d

github-advanced-security bot found potential problems Jan 15, 2025


src/pages/blog/GHGIntensityPredictCompliance.vue: 3 alerts, all fixed

vkoves and others added 3 commits January 14, 2025 21:09

Move styles to CSS

d1730f4

reduced file size of html images by not saving javascript source code…

f25fe26

… for plotly

Reduced file size of graphs by dropping hover text and only plotting …

447a681

…outliers

colton-lapp requested a review from vkoves January 15, 2025 22:59

vkoves added 3 commits January 18, 2025 15:37

Tweak About page

6721455

Reorganized blog page and added publish dates

613d695

Run Prettier

3fe0d61

colton-lapp added 2 commits January 21, 2025 22:16

Updated requirements.txt to include dependencies for Jupyter notebook…

991002b

…s, plotly, and running regressions. Cleaned up notebook comments

Fixed ipython version that wasn't compatible with python 3.9 in requi…

4065b79

…rements.txt

vkoves reviewed Feb 7, 2025


colton-lapp added 2 commits February 8, 2025 18:32

Trying to fix dependencies with virtual env and python 3.9

b3fa757

Updated language, fixed images for mobile, added authorship, tried to…

2b38356

… fix requirements.txt

Ran prettier

8c36999

github-advanced-security bot found potential problems Feb 9, 2025


colton-lapp added 2 commits February 8, 2025 20:44

Fixed typo and linting errors

8ab5b1b

Ran prettier

1bccc87

github-advanced-security bot found potential problems Feb 9, 2025


colton-lapp and others added 4 commits February 8, 2025 21:08

updated description of graphs to reflect it's only current year

7df4715

Fix cranky ESLint

0ca6540

Tweak blog post styling

595ae08

Fix table on mobile

704d0b1

vkoves requested changes Feb 11, 2025


vkoves added 2 commits February 13, 2025 23:36

Merge branch 'main' into compliance-analysis

a60e769

Run Prettier

048cd6e

vkoves mentioned this pull request Feb 15, 2025

Implement Homepage v2 Redesign #157

Merged

5 tasks
