Weather files aren't being found on Eagle #144

Closed
aspeake opened this issue Mar 31, 2020 · 11 comments
@aspeake
Contributor

aspeake commented Mar 31, 2020

Summary:
When running on Eagle, all simulations fail in BuildExistingModel with an error saying the weather file does not exist. The weather files seem to be copied over correctly into my scratch folder on Eagle (from /shared-projects/buildstock/weather/project_resstock_national_weather.zip).

When running with the eagle-output-refactor branch, the issue seems to be fixed.

Example error for one simulation (singularity_output.log):

Found error in state 'os_measures' with message [":/ruby/2.2.0/gems/openstudio-workflow-1.3.4/lib/openstudio/workflow/util/measure.rb failed with message Runner error :/ruby/2.2.0/gems/openstudio-workflow-1.3.4/lib/openstudio/workflow/util/measure.rb failed with Measure BuildExistingModel reported an error with [\"'/weather/USA_VA_Sterling-Washington.Dulles.Intl.AP.724030_TMY3.epw' does not exist or is not an .epw file.\"],
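
To see which weather files a batch is actually complaining about, the log text can be scanned for this message. The following is a minimal sketch, not part of buildstockbatch: the function name `missing_weather_files` is hypothetical, and the pattern assumes the message wording stays exactly as quoted above.

```python
import re

# Matches the path in "'<path>.epw' does not exist or is not an .epw file".
# The opening quote is optional so that slightly mangled copies still match.
# Assumption: the message text is exactly the one shown in the log above.
MISSING_EPW = re.compile(
    r"'?(/[^'\s\[\]]+\.epw)' does not exist or is not an \.epw file"
)


def missing_weather_files(log_text):
    """Return the sorted, de-duplicated .epw paths reported as missing."""
    return sorted(set(MISSING_EPW.findall(log_text)))
```

Running this over the concatenated `singularity_output.log` files gives the list of distinct missing file names, which makes a naming mismatch easier to spot than reading individual errors.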

Platform

@aspeake aspeake added the bug Something isn't working label Mar 31, 2020
@nmerket nmerket assigned rajeee and unassigned nmerket Apr 1, 2020
@nmerket
Member

nmerket commented Apr 2, 2020

@rajeee After we concluded our discussion this morning, I'm thinking that we probably can't blame the Lustre filesystem for this one. The bind mount for the weather all happens on a local SSD drive mounted to /tmp/scratch. We do copy the weather files from lustre to the SSD, but it should error earlier if that was the problem.

It sounds like our current approach is to:

  1. Run this with a known working sha of OpenStudio-BuildStock to see if the problem still presents itself.
  2. If that works, try a binary search using the git commit history to identify when the problem appears.
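
The binary search in step 2 is what `git bisect` automates. As a rough illustration of the idea only, here is a sketch in Python. The function `first_bad_commit` and the `is_good` callback are hypothetical; the sketch assumes the commit list is ordered oldest to newest, that the first commit is known good and the last known bad, and that a commit can be classified reliably from one test run.

```python
def first_bad_commit(commits, is_good):
    """Return the earliest commit for which is_good(commit) is False.

    Assumes commits[0] is good, commits[-1] is bad, and that every commit
    after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid  # problem was introduced after mid
        else:
            hi = mid  # problem was introduced at or before mid
    return commits[hi]
```

With n commits between the known-good and known-bad SHAs, this needs about log2(n) test runs instead of n.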

@aspeake
Contributor Author

aspeake commented Apr 2, 2020

Another datapoint for this bug: just ran 363k on this branch: https://github.com/NREL/OpenStudio-BuildStock/tree/cost_dashboard and the vast majority succeeded, but about 1,000 spit out the same error as above. This branch is way behind master right now.

Ran with the buildstock-0.16.1 environment on Eagle, and used the weather zip /shared-projects/buildstock/weather/project_resstock_national_weather.zip

@rajeee @nmerket

@rajeee
Contributor

rajeee commented Apr 3, 2020

@nmerket Based on Andrew's comment above, it looks like the issue is

  1. Not universal (only about 1000 out of 363K have the problem)
  2. Not related exclusively to OpenStudio-BuildStock. A very recent SHA from the master branch of OpenStudio-BuildStock ran successfully (using the eagle_output_refactor branch).

We now have no evidence of eagle_output_refactor branch not working. (But lack of evidence is not evidence of lack!)

Thoughts:

  1. Because the error shows up so irregularly, it again hints at some kind of file-mounting problem in Singularity. If it were a code bug, I would expect all simulations to fail.
  2. If that is the case, the eagle_output_refactor branch would fail too (probably at the same rate), since it makes no material change to how simulations are run.

@nmerket
Member

nmerket commented Apr 6, 2020

Thanks @rajeee for looking into all this. I'm inclined to close this for now given that the eagle-output-refactor branch seems to resolve the issue. If it pops back up, we can revisit.

@rajeee
Contributor

rajeee commented Apr 6, 2020

@nmerket I agree. The eagle_output_refactor branch successfully completed the 10,000 test cases that @TobiAdekanye ran. The very rare failures Andrew ran into on the master branch could be caused by the Eagle filesystem being overwhelmed (which is what eagle_output_refactor addresses, I guess). Unless we run into systematic problems with the eagle_output_refactor branch, I too think this issue can be considered resolved.

@aspeake
Contributor Author

aspeake commented Apr 10, 2020

@rajeee @nmerket

So I may see what is going on with this. I am getting the same error running the master branches of BuildstockBatch and OS-Buildstock right now, as well as with eagle-output-refactor. BuildExistingModel is looking for <weather file name>.epw, but the shared weather zip files (/shared-projects/buildstock/weather/project_resstock_national_weather.zip) that I specify in the yml follow the format <weather file name>_TMY3.epw.

For example:

* Error: `['/weather/USA_OH_Cleveland-Hopkins.Intl.AP.725240.epw' does not exist or is not an .epw file.]`

* Actual weather file: `/weather/USA_OH_Cleveland-Hopkins.Intl.AP.725240_TMY3.epw`

With the recent changes to various tsvs, this probably explains why an older version of OS-Buildstock was working for me, but I don't understand why the refactor branch fixed things. Either way, is there a different national TMY3 weather file zip that should be used? Or does the measure need to be updated?

This is relevant to this PR: NREL/resstock#432

@TobiAdekanye provided me the updated TMY3 zip files with the correct name and I uploaded them to Eagle here: /shared-projects/buildstock/weather/ResStock_TMY3.zip. Using this as the weather file path seemed to fix things.
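
A quick way to catch this kind of naming mismatch before launching a batch is to compare the file names the project expects against what the weather zip actually contains. The sketch below uses only Python's standard `zipfile` module. The function name `missing_from_zip` is hypothetical, and where the list of expected names comes from (for example, the project's tsv files) is an assumption left to the caller.

```python
import zipfile
from pathlib import PurePosixPath


def missing_from_zip(weather_zip, expected_names):
    """Return expected .epw file names that are not present in the zip.

    weather_zip: path or file-like object for the weather zip archive.
    expected_names: iterable of bare file names, e.g. "<name>.epw".
    Only base names are compared, so files nested in folders still count.
    """
    with zipfile.ZipFile(weather_zip) as zf:
        available = {
            PurePosixPath(member).name
            for member in zf.namelist()
            if member.lower().endswith(".epw")
        }
    return sorted(set(expected_names) - available)
```

An empty result means every expected weather file is in the archive. A non-empty result listing names without `_TMY3` against an archive whose members all carry that suffix would point to the mismatch described above.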

@aspeake
Contributor Author

aspeake commented Apr 15, 2020

@nmerket @rajeee The above case was corrected by updating the weather zip. However, @ekpresent was seeing this error on Eagle yesterday, and I am getting it again today.

Running https://github.com/NREL/OpenStudio-BuildStock/tree/geb-potential with buildstock-0.16.1 spit out the error, but running with buildstock-0.15 did not.

@nmerket
Member

nmerket commented Apr 15, 2020

@aspeake which above case? The case where the weather files didn't have the _TMY3 at the end of them or something else?

@aspeake
Contributor Author

aspeake commented Apr 15, 2020

> @aspeake which above case? The case where the weather files didn't have the _TMY3 at the end of them or something else?

Yes, the _TMY3 error (#144 (comment))

@nmerket
Member

nmerket commented Oct 27, 2022

Will be fixed in #316

@nmerket nmerket closed this as completed Oct 27, 2022