Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate RAStock data #20

Merged
merged 20 commits into from
Sep 30, 2024
Merged

Integrate RAStock data #20

merged 20 commits into from
Sep 30, 2024

Conversation

mikivee
Copy link
Collaborator

@mikivee mikivee commented Sep 25, 2024

Summary

Adds selected RAStock upgrades to the surrogate model training set. This does NOT retrain the model. Ticket

Detailed Changes

  1. Copies functionality from pep on pre-processing RAStock outputs, wraps it in functions in a new util.py and constants.py script in preparation for a move to dmutils. Note that some of this code will change after latest changes are merged into PEP.
  2. Adds some unit tests for (2).
  3. Modifies extract_data to use the functions in (1) to pull in all RAStock annual outputs and combine with ResStock outputs.
  4. Adds features for 13.01 and 11.05 in build_feature_store.py
  5. Some minor fixes in datagen.py to make sure we train on only subset of building ids, since we don't want the train on the new upgrades for right now.
  6. Unrelated bug fix shoved into this PR: datatype of n_building_units feature in build_feature_store.py updated from str to int.

Next

Move util and constants and tests over to dmutils and once this repo is fully migrated over, delete from here and import from there instead.

Who Reviews

  • @nweires could you look at (1), (2) and (3) and see if this aligns with what you were thinking? I held off on updating the code until we have the upgrade table and mohammad pushes his cchp changes.
  • @vengroff @tarrenpeterson: could one of you review the whole PR? You can probably mostly skim (1)-(2).

@mikivee mikivee marked this pull request as draft September 25, 2024 23:16
@mikivee mikivee marked this pull request as ready for review September 27, 2024 12:56
Copy link
Contributor

@vengroff vengroff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments.

@@ -953,6 +955,8 @@ def transform_building_features() -> DataFrame:
# COMMAND ----------

# DBTITLE 1,Apply upgrade functions
# TODO: put this in some kind of shared config that can be used across srcipts/repos
SUPPORTED_UPGRADES = [0.0, 1.0, 3.0, 4.0, 6.0, 9.0, 11.05, 13.01]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we perhaps want to have individual constants for these numbers. I guess or people who work with these every single day, they know what 3 and 7 mean, but for the rest of us, constants would be amazing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nweires where are the enums that you and @christine-george are gonna use gonna live? Can import thsoe use those here?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still working out exactly what this will look like, but yes, there should be something better to use soon! I'm hoping to move away from using these upgrade ID numbers as the key to identify upgrades.

Draft changes are here - see upgrades.json and the schema in read_upgrades.py.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks Natalie! I will integrate these enums in a subsequent PR when that is ready to roll!

@mikivee mikivee removed the request for review from tarrenpeterson September 27, 2024 14:59
Copy link

@nweires nweires left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parts related to reading RAStock data look good to me.


This function is getting revamped in a different repo so not bothering to document for now
"""
# TODO: update with new code from natalie/mohammad in pep
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changed already in https://github.com/rewiringamerica/pep/pull/187, but if you want to wait for it to finish changing, this is fine for now.

Copy link
Collaborator Author

@mikivee mikivee Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i saw that but I thought I'd wait till the upgrade id table was ready and the cchp schema changes were in and then make all the changes in one go.

@@ -953,6 +955,8 @@ def transform_building_features() -> DataFrame:
# COMMAND ----------

# DBTITLE 1,Apply upgrade functions
# TODO: put this in some kind of shared config that can be used across srcipts/repos
SUPPORTED_UPGRADES = [0.0, 1.0, 3.0, 4.0, 6.0, 9.0, 11.05, 13.01]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still working out exactly what this will look like, but yes, there should be something better to use soon! I'm hoping to move away from using these upgrade ID numbers as the key to identify upgrades.

Draft changes are here - see upgrades.json and the schema in read_upgrades.py.

mikivee and others added 3 commits September 30, 2024 19:19
@mikivee mikivee merged commit 3f019e5 into main Sep 30, 2024
1 check passed
@mikivee mikivee deleted the mev/integrate_rastock branch September 30, 2024 17:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants