Add NSRDB GOES V4 #2326

Open
williamhobbs opened this issue Dec 5, 2024 · 29 comments · May be fixed by #2378

Comments

@williamhobbs
Contributor

NSRDB's latest "V4" dataset appears to be available. It would be nice to have this available in iotools.

See NREL/SAM#1920 for some additional info. Quoting @cpaulgilman:

NSRDB V4.0.0 data (1998-2023) is now available from new NREL Developer API endpoints:

https://developer.nrel.gov/docs/solar/nsrdb/nsrdb-GOES-tmy-v4-0-0-download/

https://developer.nrel.gov/docs/solar/nsrdb/nsrdb-GOES-aggregated-v4-0-0-download/

SAM's Advanced Downloads option can access these endpoints, but the direct download from the Location and Resource page still accesses the older V3.2.2 (1998-2022) API endpoint.

Note that there are "aggregated" and "conus" endpoints. The latter seems to be a subset of the former, but we need to determine which endpoint to use.

Also note that there was an issue with DNI that has been fixed (also in that SAM issue's comments).

@markcampanelli
Contributor

I was unsure what to make of the GOES V4 datasets. The descriptions of the various GOES endpoints all seem to be generic “copy pasta” at https://developer.nrel.gov/docs/solar/nsrdb/. Perhaps this should be reported?

@cpaulgilman

cpaulgilman commented Dec 5, 2024

Another source of information about these datasets is the NSRDB website, for example, https://nsrdb.nrel.gov/data-sets/us-data. There's no mention of GOES V4 there (yet), although the GOES V4 endpoints are included in the list of available data from the NSRDB Viewer, albeit with different names:

[image: NSRDB Viewer list of available datasets, including the GOES V4 data under different names]

@williamhobbs
Contributor Author

The descriptions of the various GOES endpoints all seem to be generic “copy pasta” at https://developer.nrel.gov/docs/solar/nsrdb/. Perhaps this should be reported?

Tagging @grantbuster, should there be updates to the docs on the NREL developer website for NSRDB GOES V4? I'll admit that I was expecting "PSM V4", so even the name has me a bit confused.

Also, I thought about tagging Paul Edwards, but this seems like a question for the NSRDB team rather than the developer website team, so I'll let someone else tag him if needed...

@markcampanelli
Contributor

markcampanelli commented Dec 5, 2024

If a get_psm4 or get_goes4 function is added for this, then I think we should consider updating the names parameter to take a sequence of strings instead of just a single string. I often download the entire set of years of historical data, and I'm wondering if it would be more efficient to allow an array of years to be specified all in one go? (My apologies if this discussion has already been had and settled for reasons not apparent to me. If it must be one year at a time, then keep the same functionality but use the singular name instead of plural names.) UPDATED: CSV-formatted responses do not support multi-year requests.

To better follow semver and avoid surprising users when upgrading pvlib, I would be a fan of adding the full version in the function names, e.g., get_goes_4_0_0, and using NREL's corresponding (immutable) URL to that version. Then, each of these version-specific functions could call the same "URL-flexible" data-provider function under the hood, where this latter function does not specify any default URL(s).

@xieyupku

xieyupku commented Dec 6, 2024

I recommend updating pvlib from PSM V3 to PSM V4.
Based on the structure of psm3.py, the new description for PSM V4 could be:
Get PSM4 TMY
see https://developer.nrel.gov/docs/solar/nsrdb/nsrdb-GOES-tmy-v4-0-0-download/

Enhancements in PSM V4 compared to PSM V3:

  1. Improved cloudy-sky direct normal irradiance (DNI) data using the FARMS-DNI model.
  2. Improved snow/ice surface albedo data.
  3. Implementation of a physics-guided machine learning procedure to gap-fill missing cloud properties.

@cwhanse
Member

cwhanse commented Dec 6, 2024

consider updating the names parameter to take a sequence of strings

Does the NREL API parse that? pvlib is just passing the value of names in the request body. If pvlib has to parse a list of strings or similar, that would complicate matters.

adding the full version in the function names, e.g., get_goes_4_0_0

The url parameter can accept the NREL URL for each version. It could be helpful to provide some kind of dictionary, e.g., {'v400': <url for GOES 4.0.0>} if the URLs are actually immutable, and not something we'd have to correct every year or two.
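A minimal sketch of the dictionary idea. The names `NSRDB_URLS` and `resolve_url` are hypothetical, and the API URLs assume the `https://developer.nrel.gov/api/nsrdb/v2/solar/<endpoint>.csv` pattern of the existing PSM3.2.2 endpoint; each entry would need to be checked against the NREL developer docs before use.

```python
# Hypothetical registry of NSRDB API URLs keyed by dataset version.
# The URL pattern is an assumption modeled on the PSM3.2.2 endpoint;
# verify each entry against https://developer.nrel.gov/docs/solar/nsrdb/.
NSRDB_BASE = "https://developer.nrel.gov/api/nsrdb/v2/solar/"

NSRDB_URLS = {
    "v322": NSRDB_BASE + "psm3-2-2-download.csv",
    "v400": NSRDB_BASE + "nsrdb-GOES-aggregated-v4-0-0-download.csv",
}


def resolve_url(version_or_url):
    """Return a full URL, accepting either a registry key or a raw URL."""
    if version_or_url in NSRDB_URLS:
        return NSRDB_URLS[version_or_url]
    if version_or_url.startswith("https://"):
        # Callers can still pass any URL directly, as the url parameter allows today.
        return version_or_url
    raise ValueError(
        f"unknown NSRDB version {version_or_url!r}; "
        f"expected one of {sorted(NSRDB_URLS)} or a full URL"
    )
```

Because a full URL is still accepted, a dictionary like this could sit alongside the existing url parameter without changing the function signature; the cost is that entries have to be added as NREL publishes new versions.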

@williamhobbs
Contributor Author

consider updating the names parameter to take a sequence of strings

Does the NREL API parse that?

I can't seem to get a multi-year request to work. The documentation does make it seem like it should work, though.

@markcampanelli
Contributor

I can't seem to get a multi-year request to work. The documentation does make it seem like it should work, though.

I think I was mistaken here. CSV-formatted responses apparently do not support multiple years:

CSV Output Format
Direct streaming of CSV data is supported for single location, single year only.

From https://developer.nrel.gov/docs/solar/nsrdb/psm3-2-2-download/.
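Since CSV streaming is limited to a single location and a single year, a multi-year download has to be issued as one request per year and combined on the client side. A minimal sketch, assuming a hypothetical `fetch_year` callable (for example, a thin wrapper around the existing `get_psm3`) that returns a pandas DataFrame with a DatetimeIndex for one year; a fake fetcher stands in for the API so the example runs offline.

```python
import pandas as pd


def fetch_years(fetch_year, years):
    """Download several years one request at a time and stack the results.

    fetch_year is a hypothetical callable ``fetch_year(year) -> DataFrame``.
    The NSRDB CSV endpoints accept only one year per request, so a list of
    years must be split into separate calls.
    """
    # The API's names parameter is a string, so pass each year as str.
    frames = [fetch_year(str(year)) for year in years]
    combined = pd.concat(frames)
    # Guard against overlapping rows if two responses share a timestamp.
    combined = combined[~combined.index.duplicated(keep="first")]
    return combined.sort_index()


# Offline stand-in for a real API call, used only to demonstrate the loop.
def fake_fetch_year(year):
    index = pd.date_range(f"{year}-01-01", periods=3, freq="60min")
    return pd.DataFrame({"ghi": [0.0, 10.0, 20.0]}, index=index)


data = fetch_years(fake_fetch_year, [2021, 2022, 2023])
```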

@markcampanelli
Contributor

consider updating the names parameter to take a sequence of strings

Does the NREL API parse that? pvlib is just passing the value of names in the request body. If pvlib has to parse a list of strings or similar, that would complicate matters.

adding the full version in the function names, e.g., get_goes_4_0_0

The url parameter can accept the NREL URL for each version. It could be helpful to provide some kind of dictionary, e.g., {'v400': <url for GOES 4.0.0>} if the URLs are actually immutable, and not something we'd have to correct every year or two.

@cwhanse So the idea is to grow this dictionary as the NSRDB evolves?

@cwhanse
Member

cwhanse commented Dec 7, 2024

So the idea is to grow this dictionary as the NSRDB evolves?

Yes, and keep the current function's signature.

@williamhobbs
Contributor Author

@AdamRJensen @kandersolar @wholmgren, it looks like the "horns" issue has been resolved with GOES V4. Add this to the "reasons to add GOES V4" bucket.

To my eye, it looks like some variability has increased, but it's hard to say whether it's "better". It looks worse for this one day, but there are other (more important, IMO) types of variable conditions that I haven't looked at.

[image: plot of GOES V4 data for the day discussed above]

For reference (from https://doi.org/10.1109/PVSC57443.2024.10748762):
[image: corresponding figure from the referenced paper]

@williamhobbs
Contributor Author

I also just noticed that GOES V4 now includes 2023, while PSM3 is still only available through 2022.

https://developer.nrel.gov/docs/solar/nsrdb/nsrdb-GOES-aggregated-v4-0-0-download/ vs https://developer.nrel.gov/docs/solar/nsrdb/psm3-2-2-download/.

I'm speculating, but if the NSRDB team is not planning to update PSM3, then that makes adding GOES V4 a higher priority. I've sent an email to that team and will report back if I hear anything.

@grantbuster

I have stepped back from my role in the NSRDB project so I’m no longer a good resource for what’s going on with the latest and greatest… that being said, this has not been responded to in a while now so I’ll try to help.

I’m pretty sure that new years of data will be released under NSRDB v4, and v3 will continue to be available on S3 but will not be maintained.

@williamhobbs linked a pull request on Feb 6, 2025 that will close this issue
@kandersolar
Member

In hindsight, I wish get_psm3 had been two separate functions: get_psm3 and get_psm3_tmy. I think we should learn from that mistake and have separate get_ functions for each GOES v4 endpoint we want to support.

Function naming: I think get_nsrdb_goes4 would be better than get_goes4.

@williamhobbs
Contributor Author

have separate get_ functions for each GOES v4 endpoint we want to support.

Current endpoints are:

  • nsrdb-GOES-aggregated-v4-0-0-download (30 and 60 min for 1998-2023)
  • nsrdb-GOES-conus-v4-0-0-download (5, 15, 30, 60 min for 2021-2023)
  • nsrdb-GOES-tmy-v4-0-0-download (TMYs)

@kandersolar, are you suggesting 3 functions, like get_nsrdb_goes4_aggregated, get_nsrdb_goes4_conus, and get_nsrdb_goes4_tmy?

FYI, GOES V4 endpoints not yet included in PR #2378:

  • nsrdb-GOES-full-disc-v4-0-0-download (10, 30, 60 min for 2018-2023, presumably covering the full satellite image beyond CONUS?)
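To make the differences between these endpoints concrete, here is a sketch of a lookup table holding only the intervals and years listed above (TMY is left out because it is requested differently), plus a check a wrapper function could run before sending a request. `PSM4_ENDPOINTS` and `check_request` are hypothetical names, and the hard-coded year ranges would need updating whenever NREL releases a new year.

```python
# Intervals (minutes) and year ranges as listed in the comment above.
PSM4_ENDPOINTS = {
    "nsrdb-GOES-aggregated-v4-0-0-download": {
        "intervals": (30, 60),
        "years": range(1998, 2024),
    },
    "nsrdb-GOES-conus-v4-0-0-download": {
        "intervals": (5, 15, 30, 60),
        "years": range(2021, 2024),
    },
    "nsrdb-GOES-full-disc-v4-0-0-download": {
        "intervals": (10, 30, 60),
        "years": range(2018, 2024),
    },
}


def check_request(endpoint, interval, year):
    """Raise ValueError if the endpoint does not offer this interval/year."""
    spec = PSM4_ENDPOINTS[endpoint]
    if interval not in spec["intervals"]:
        raise ValueError(
            f"{endpoint} offers intervals {spec['intervals']}, not {interval}"
        )
    if year not in spec["years"]:
        first, last = spec["years"][0], spec["years"][-1]
        raise ValueError(f"{endpoint} covers {first}-{last}, not {year}")
```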

@kandersolar
Member

@kandersolar, are you suggesting 3 functions, like get_nsrdb_goes4_aggregated, get_nsrdb_goes4_conus, and get_nsrdb_goes4_tmy?

I suppose so, although perhaps it is worth waiting for some thumbs up on the idea before putting effort towards code changes.

@williamhobbs
Contributor Author

As an aside on naming, "GOES v4" is confusing to me. Was version 3 PSM3? And GOES v4 is what would have been called PSM4, but it's now named after the satellites (GOES) instead of the "physical solar model" (PSM)?

Here's an abstract from last month that references "PSM v4": https://ams.confex.com/ams/105ANNUAL/meetingapp.cgi/Paper/455074.

I'm wondering if we are going to get comfortable with "GOES4" and then the NSRDB team will change it to "PSM4" or something...

@cwhanse
Member

cwhanse commented Feb 7, 2025

I read it as @williamhobbs does, "PSM v4" is the satellite-to-irradiance model, and "GOES" is the satellite data source (vs. e.g. Himawari). I would lean toward naming the pvlib functions get_nsrdb_psm4 and get_nsrdb_psm4_tmy, as those are meaningful to a user envisioning what they want to receive.

Maybe the endpoint can be a parameter?

@williamhobbs
Contributor Author

I just saw a copy of the slides from AMS 2025 and it does seem that "PSM v4" is the name the NREL team is using. I'm convinced we should use "psm4" instead of "goes4" in the function name(s).

@wholmgren
Member

I am also in favor of separate functions for separate endpoints. I'm neutral on whether we should also maintain a get_psm3|4 wrapper function.

@adriesse
Member

As I understand it, the NREL team also processes data from other satellites for other regions and makes the results available under the NSRDB umbrella. I haven't checked, but there are probably different end points for those. Perhaps we should look at the whole list before deciding how to wrap them?

I'm in favor of aiming for one-to-one io functions, but justified exceptions are ok.

@williamhobbs
Contributor Author

Is there a reason the utc API parameter is not available in NSRDB iotools functions? The first thing I always try to do after getting PSM3 data is to convert to UTC. I'm wondering if it should be added to get_nsrdb_psm4 functions...
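Until a utc option exists in the iotools functions, converting after the fact is a one-liner in pandas, assuming the returned index is timezone-aware in local standard time (as the existing PSM3 reader produces). A minimal sketch with made-up values for a hypothetical site at UTC-7:

```python
import pandas as pd

# Illustrative stand-in for an NSRDB response: hourly rows in local
# standard time for a site at UTC-7. "Etc/GMT+7" means UTC-7 (the sign
# convention of the Etc/ zones is inverted).
index = pd.date_range("2023-06-01 00:00", periods=4, freq="60min", tz="Etc/GMT+7")
local = pd.DataFrame({"ghi": [0.0, 0.0, 15.0, 120.0]}, index=index)

# tz_convert only relabels the timestamps; the instants they refer to,
# and therefore the irradiance values, are unchanged.
utc = local.tz_convert("UTC")
```

Passing the API's utc parameter instead would move this step to the server, but the client-side conversion works the same way for every endpoint.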

@kandersolar
Member

Quick note: with NREL/developer.nrel.gov#377, the PSM3.2.2 endpoints are now marked as deprecated: https://developer.nrel.gov/docs/solar/nsrdb/

@kandersolar added this to the v0.11.3 milestone on Feb 20, 2025
@kandersolar
Member

As I understand it, the NREL team also processes data from other satellites for other regions and makes the results available under the NSRDB umbrella. I haven't checked, but there are probably different end points for those.

I think this is correct. And what's worse, different datasets (different spatial/time extents) can come from the same satellite. A full accounting for the GOES-based PSM4 datasets would look something like this:

  • get_psm4_goes_aggregated: 30/60-minute, 1998-2023, 4-km, North America & half of South America
  • get_psm4_goes_conus: 5-minute, 2021-2023, 2-km, continental US & Mexico
  • get_psm4_goes_fulldisc: 10-minute, 2018-2023, 2-km, most of North & South America
  • get_psm4_goes_tmy: the usual, but not sure about spatial extent

I think I'd be in favor of us adding three functions at this time: aggregated, conus, and tmy. I am confident there is demand for those. We can always add fulldisc later if desired.

We might also consider adding functions for the Meteosat (Europe/Africa/Middle East) and Himawari (East Asia/Australia) endpoints, but they do not seem to be updated with the latest data, so perhaps we should wait for the NSRDB team to show more interest in maintaining those datasets before we add functions for them.
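A sketch of how the three proposed functions could stay thin, each fixing its endpoint and delegating to one shared private helper. The helper name, the base URL pattern (modeled on PSM3.2.2), and the parameter set are assumptions. To run offline, these sketch versions only build and return the request URL; a real implementation would fetch and parse it.

```python
from urllib.parse import urlencode

# Assumed URL pattern, modeled on the PSM3.2.2 endpoint; verify before use.
BASE_URL = "https://developer.nrel.gov/api/nsrdb/v2/solar/"


def _psm4_request_url(endpoint, latitude, longitude, api_key, email, **params):
    """Hypothetical shared helper: build the request URL for one endpoint."""
    query = {
        "api_key": api_key,
        "email": email,
        "wkt": f"POINT({longitude} {latitude})",  # WKT order is lon, lat
        **params,
    }
    return BASE_URL + endpoint + ".csv?" + urlencode(query)


# Each public function only pins its endpoint and its own parameters.
# They return the URL here; a real version would fetch and parse it.
def get_nsrdb_psm4_aggregated(latitude, longitude, api_key, email, year, interval=60):
    return _psm4_request_url("nsrdb-GOES-aggregated-v4-0-0-download",
                             latitude, longitude, api_key, email,
                             names=year, interval=interval)


def get_nsrdb_psm4_conus(latitude, longitude, api_key, email, year, interval=60):
    return _psm4_request_url("nsrdb-GOES-conus-v4-0-0-download",
                             latitude, longitude, api_key, email,
                             names=year, interval=interval)


def get_nsrdb_psm4_tmy(latitude, longitude, api_key, email, names):
    return _psm4_request_url("nsrdb-GOES-tmy-v4-0-0-download",
                             latitude, longitude, api_key, email, names=names)
```

Keeping the shared logic in one private helper means adding fulldisc later would be one more short wrapper rather than another copy of the request and parsing code.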

@williamhobbs
Contributor Author

@kandersolar, I created a new branch with the 4 separate get_ functions. I mentioned it in the PR, #2378 (comment), but maybe should have mentioned it here as well.

@adriesse
Member

adriesse commented Feb 20, 2025

Just to further clarify or confuse: GOES isn't just one satellite (pair), but a series of satellites over time with evolving capabilities. I don't think there's a lack of interest at NREL, but it's just a huge task to do all they do.

There is interest in fulldisc data from users at northern latitudes.

@kandersolar
Member

I should have worded that sentence differently. I only meant that there's a lot of churn in the NSRDB world and I'm wary of adding functions for datasets that (for understandable reasons) won't get updated or might disappear entirely before too long.

@adriesse
Member

I guess this churn could be handled more easily or nimbly with a separate iotools package.

@wholmgren
Member

wholmgren commented Feb 26, 2025

@kandersolar, I created a new branch with the 4 separate get_ functions. I mentioned it in the PR, #2378 (comment), but maybe should have mentioned it here as well.

I think I like this when looking at the functions. I like the opportunity to fix the awkwardness with leap_day. Certainly some opportunities for code refactoring but that's expected at this stage. If we go with the separate functions then I think we should venture into computed docstrings. Computed docstrings are now common in the pydata world but haven't yet been used in pvlib. The module is 800 lines and about half of those are minor permutations of a 100-line docstring, and this will only get worse if we mirror more of the NSRDB API.
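A minimal sketch of what computed docstrings could look like: one shared template, filled per function by a small decorator using `str.format`. The `_fill_doc` decorator and template are hypothetical (pandas uses its own internal helpers for the same purpose); the function bodies are stubs, since only the docstrings matter here.

```python
# Shared docstring template; {placeholders} are filled per function.
_PSM4_DOC = """Retrieve {description} from the NSRDB PSM4 {endpoint} endpoint.

Parameters
----------
latitude : float
    In decimal degrees, north is positive.
longitude : float
    In decimal degrees, east is positive.
interval : int
    Time interval in minutes. One of {intervals}.
"""


def _fill_doc(**fields):
    """Decorator that formats the shared template into the function's docstring."""
    def decorator(func):
        func.__doc__ = _PSM4_DOC.format(**fields)
        return func
    return decorator


@_fill_doc(description="30- or 60-minute data", endpoint="aggregated",
           intervals="30, 60")
def get_nsrdb_psm4_aggregated(latitude, longitude, interval=60):
    ...  # stub: request and parsing omitted


@_fill_doc(description="5- to 60-minute data", endpoint="CONUS",
           intervals="5, 15, 30, 60")
def get_nsrdb_psm4_conus(latitude, longitude, interval=60):
    ...  # stub: request and parsing omitted
```

Each function then carries a complete, endpoint-specific docstring for `help()` and Sphinx, while the shared parameter descriptions live in one place.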
