Data Analysis: Do high emissions predict reporting non-compliance? #147

Draft
wants to merge 26 commits into
base: main

colton-lapp
Collaborator

@colton-lapp colton-lapp commented Jan 13, 2025

Description

This pull request creates:

  1. A jupyter notebook file with data analysis findings
  2. A blog post describing the findings in the jupyter notebook file

The blog post investigates a question raised in #114 - does poor performance correlate with non-reporting? The short answer is no, I didn't find that pattern in the data.

The data analysis in the Jupyter notebook consists of the following steps:

  1. Create some basic data viz showing variables of interest and compliance trends over time
  2. Create lagged variables of emissions last year and the emission trends from 2 years ago to 1 year ago
  3. Create graphs comparing mean/median GHG intensity last year and GHG trend from 2 years ago to 1 year ago vs reporting compliance, showing basically no difference
  4. Run a regression with a single control variable (square footage) to confirm there is no statistically significant relationship
  5. Run some robustness checks by dropping outliers and dropping covid and repeating steps 3-4; still no significant results
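
Steps 2 and 3 above can be sketched in plain Python as follows. This is a minimal illustration, not the notebook's code: the record layout (`building_id`, `year`, `ghg_intensity`, with `None` marking a year without a report), the function names, and the toy values are all assumptions made for the example. The toy values are invented and are not meant to mirror the actual findings.

```python
from statistics import median

# Hypothetical records: (building_id, year, ghg_intensity); None = did not report.
records = [
    ("A", 2019, 8.0), ("A", 2020, 7.0), ("A", 2021, 6.5),
    ("B", 2019, 12.0), ("B", 2020, 13.0), ("B", 2021, None),
    ("C", 2019, 5.0), ("C", 2020, 5.5), ("C", 2021, 5.0),
]


def add_lags(rows):
    """For each building-year, attach last year's intensity, the change from
    two years ago to one year ago, and a flag for whether it reported."""
    by_key = {(b, y): v for b, y, v in rows}
    out = []
    for b, y, v in rows:
        lag1 = by_key.get((b, y - 1))
        lag2 = by_key.get((b, y - 2))
        trend = lag1 - lag2 if lag1 is not None and lag2 is not None else None
        out.append({"building_id": b, "year": y, "reported": v is not None,
                    "lag_intensity": lag1, "lag_trend": trend})
    return out


def median_by_compliance(rows, field):
    """Median of `field` for reporters (True) vs non-reporters (False),
    skipping rows where the lagged value is missing."""
    groups = {True: [], False: []}
    for r in rows:
        if r[field] is not None:
            groups[r["reported"]].append(r[field])
    return {k: (median(v) if v else None) for k, v in groups.items()}


lagged = add_lags(records)
print(median_by_compliance(lagged, "lag_intensity"))
print(median_by_compliance(lagged, "lag_trend"))
```

With real data, the same two group summaries would feed the comparison graphs in step 3, and the lagged columns would serve as regressors in step 4.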

These findings are then summarized in a new blog post.

A couple other notes:

  • I used the graphing package plotly to make interactive html graphs and embedded those in my blog post. Because the html graphs let you hover over individual data points to display info, they take up a decent amount of space (between 1 and 20 MB). This also makes the Jupyter notebook file larger, but I tried to cut down the size by making some of the plots static image files
  • I don't know anything about javascript or html so I relied on Gen AI to do a lot of the coding for embedding and rendering interactive html graphs and fetching regression results from a JSON file. This could probably use some serious attention
  • I've introduced some new dependencies for data visualization and am not sure how this is managed in the project

This is my first time creating a PR for a public repo and for this project specifically so happy to restructure any work or accept any feedback! I'm expecting some heavy feedback on files committed (i.e. new packages used in requirements.txt, python virtual environment, directory structure).

Fixes #114

Testing Instructions

I would recommend pulling, running docker-compose up, and looking at my blog post. Additionally, check out the Jupyter notebook to verify that I'm analyzing the correct variables and don't have data analysis mistakes, etc. To see the interactive html graphs in the notebook, you have to view the Jupyter file in NBViewer, as they won't render on GitHub.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@colton-lapp colton-lapp self-assigned this Jan 13, 2025

netlify bot commented Jan 13, 2025

Deploy Preview for radiant-cucurucho-d09bae ready!

Name Link
🔨 Latest commit 048cd6e
🔍 Latest deploy log https://app.netlify.com/sites/radiant-cucurucho-d09bae/deploys/67afed985582720008e7a940
😎 Deploy Preview https://deploy-preview-147--radiant-cucurucho-d09bae.netlify.app

@colton-lapp colton-lapp added enhancement New feature or request data Data updates & tweaks labels Jan 13, 2025
@colton-lapp colton-lapp requested a review from vkoves January 15, 2025 22:59
@colton-lapp
Collaborator Author

I've reduced the file size of the graphs to a cumulative 2 MB. I did this by dropping some of the data displayed on hover. We could reduce the file size even more by not displaying every single observation in the scatterplots and only displaying a handful of the buildings that have standard emissions (it's hard to tell them apart anyway). We could also just convert the images to static PNG files. Let me know what you think is best.
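
For reference, thinning the scatterplots could look roughly like the sketch below: keep every unusual building and a reproducible random sample of the typical ones. The function name, the point layout, and the cutoff for "standard" emissions are hypothetical; the real threshold would have to come from the data.

```python
import random


def thin_points(points, is_typical, keep_typical=200, seed=0):
    """Keep every atypical point and a reproducible random sample of typical ones."""
    typical = [p for p in points if is_typical(p)]
    atypical = [p for p in points if not is_typical(p)]
    rng = random.Random(seed)  # fixed seed so the exported graph is stable between runs
    if len(typical) > keep_typical:
        typical = rng.sample(typical, keep_typical)
    return atypical + typical


# Hypothetical data: 1,000 buildings with intensities 0.0 .. 99.9
points = [{"id": i, "ghg_intensity": i / 10} for i in range(1000)]
# Assumed cutoff: anything below 90 counts as "standard" for this example only.
thinned = thin_points(points, lambda p: p["ghg_intensity"] < 90, keep_typical=50)
print(len(points), "->", len(thinned))
```

The trade-off is that the thinned plot no longer shows the true density of typical buildings, so the caption would need to say that a sample is displayed.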

@vkoves
Owner

vkoves commented Jan 20, 2025

@colton-lapp - I meant to comment when I pushed up my fixes - I've added some date stamps to the blog posts and reordered them so yours comes first (since it's newer). I'm fine with that file size, but it looks like there are some responsiveness issues with the graphs - if you can fix those, I'm good with it, but otherwise we could move to images. Here's an example:

Desktop (shows scrollbar) Mobile (cut-off)
Screenshot from 2025-01-20 16-37-50 Screenshot from 2025-01-20 16-38-02

Also, is there a way to note dependencies for your Jupyter notebook? I tried running it locally but had to manually install dependencies like plotly, which aren't in our requirements.txt. Maybe you can add some instructions at the start of your notebook and maybe a requirements.txt file? I don't know what's typical there
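
One possible shape for such a check at the start of the notebook is sketched below, using only the standard library. The `REQUIRED` list is an assumption (only plotly is named in this thread), and the requirements file path is left as a placeholder.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level import names from `names` that are not installed."""
    return [n for n in names if find_spec(n) is None]


# Assumed list: only plotly is mentioned here; the author would fill in the rest.
REQUIRED = ["plotly"]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
    print("Install them with: pip install -r <path to the notebook's requirements file>")
```

Running this in the first cell gives a readable message up front instead of an import error partway through the notebook.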

@vkoves
Owner

vkoves commented Feb 3, 2025

@colton-lapp - wanted to check in on this, it looks like Pytest is failing, and if you need help using the graph images just on mobile, let me know!

I also showed the preview to one of our partners at Climate Reality Chicago (who asked for this research question), and they were a big fan, and said:

I love, love, love the visuals and the way you methodically walk the reader through your line of thinking! And I can imagine the “Distribution of GHG Intensities” image being useful for us in other contexts in the future.

The only change I might suggest is that you have 2 graphs immediately above “Results: No noticeable difference between groups” that mention “last year” and “this year” in their titles. It might be good to replace those with numerical years so that people don’t have to check the blog post publication year / can refer to those graphs in other contexts without creating ambiguity about which year’s data are included. Just a thought.

I agree with her feedback on adding years, and I think that would be a simple improvement to each graph, just to make sure if it gets screenshotted and shared it's really clear what data we used. Even adding Chicago in there might be helpful, but 🤷🏻‍♂️

</p>

<p class="constrained">
Many buildings in the publicly available data
Owner

@vkoves vkoves Feb 7, 2025

Let's change "the publicly available data" to "Chicago's building benchmarking data" across your article and link the first time to the main data source we use on the site.

rel="noopener noreferrer"
>don't report emissions data.</a
>
Is there a pattern to which buildings fail to report? Anecdotally, it's
Owner

I'd change this "Anecdotally..." sentence to "Our team has noticed that some high emissions buildings stop reporting, while more efficient buildings tend to keep reporting year after year." There unfortunately has not been real press coverage of this data 😭


<p>
The graph below depicts the count of buildings that did and did not
report emissions data every year.
Owner

I'd change "every year" to "each year", unless it's a cumulative graph

@colton-lapp
Collaborator Author

colton-lapp commented Feb 9, 2025

@vkoves

Finally got some time to work on this and addressed everything you mentioned I think:

  • Tests passing with new requirements.txt file (note: you wanted a separate requirements.txt file, but the tests seem to use src/data/requirements.txt, so I had to change that one. I have a separate, identical file at src/data/analysis/predict_compliance_requirements.txt that I was hoping to use instead. I'll let you sort this out in the tests if you want)
  • Using basic javascript to check for mobile, and showing static PNG files if it's mobile
  • Adopted your phrasing changes. Linked to the dataset when I mention it the first time
  • Added authorship to the top (feel free to restyle) which links to my Github (I don't really have social media, happy to take the link off)
  • I changed the graphs as suggested to reference specific years. Note - the previous graphs actually showed the data for all the years, hence the vague language (i.e. "emissions last year" vs. "compliance this year" covered data points with emissions in 2018 and compliance in 2019, emissions in 2019 and compliance in 2020, etc.). I subset the dataframe to only show the most recent year to make it clearer and updated the title
  • Some other small tweaks to graphs and stuff
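
The year subsetting described above amounts to something like the following sketch. It uses plain dictionaries rather than the notebook's dataframe, and the field names, function name, and title wording are hypothetical.

```python
# Hypothetical rows: emissions from year t-1 paired with compliance in year t.
rows = [
    {"building_id": "A", "compliance_year": 2019, "lag_intensity": 8.0, "reported": True},
    {"building_id": "A", "compliance_year": 2020, "lag_intensity": 7.0, "reported": True},
    {"building_id": "B", "compliance_year": 2019, "lag_intensity": 12.0, "reported": True},
    {"building_id": "B", "compliance_year": 2020, "lag_intensity": 13.0, "reported": False},
]


def latest_pair(rows):
    """Keep only the most recent compliance year and build a title naming both years."""
    latest = max(r["compliance_year"] for r in rows)
    subset = [r for r in rows if r["compliance_year"] == latest]
    title = f"Chicago GHG intensity in {latest - 1} vs. reporting compliance in {latest}"
    return subset, title


subset, title = latest_pair(rows)
print(title)
print(len(subset), "of", len(rows), "rows kept")
```

Deriving the title from the data keeps the years in the graph correct if the analysis is rerun on a newer release of the dataset.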

Final todo: Fix regression table at the bottom to be prettier

Owner

@vkoves vkoves left a comment

@colton-lapp - the resize change works great, but if you test your preview on mobile, the font sizes are too small to read. Can you re-export those PNGs with larger font sizes targeted to mobile? You could also move the legend below the graphs (if that's an option) if you need more room.

Screenshot from 2025-02-10 20-47-06

I've also made a few minor tweaks to the blog post to improve the styling, particularly the regression table:

Screenshot from 2025-02-10 21-51-02

}

checkScreenSize(): void {
this.isMobile = window.innerWidth <= 768;
Owner

Not blocking, but in the future I'd recommend doing this with CSS, which can handle checking screen size changes automatically. You'd do something like:

<!-- html -->
<img class="mobile-only" ....>

<iframe class="desktop-only">

// scss
.graph-cont {
  .mobile-only { display: none; }

  // Mobile styling
  @media (max-width: $mobile-max-width) {
    .desktop-only { display: none; }
    .mobile-only { display: block; }
  }
}

@vkoves vkoves mentioned this pull request Feb 15, 2025
Labels
data Data updates & tweaks enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

See If There's A Correlation Between Poor Performance & Not Reporting
2 participants