pvdaq io functions #664
Conversation
I will fix the lint errors. Other suggestions on what to complete before this can be merged are appreciated. Thanks!
Also, this function relies on …

Maybe take a quick look at the other …
Thanks @bmeyers! I think … Please add … Have you considered any simple methods for testing? It's not necessary to test the live connection, but it's good to assert that the format is read in the way you expect it to be on a very short file. The progress bar is interesting. Do you know how it behaves in the different environments that are often used by the pvlib community (python terminal, ipython terminal, notebooks, scripts with/without logging)?
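A minimal sketch of the kind of test suggested here, using a hypothetical two-row CSV snippet rather than a live request (the column names and expected values are placeholders, not the PR's actual fixture):

```python
import io

import pandas as pd


def test_short_csv_parses_to_datetime_index():
    # Hypothetical sample in a PVDAQ-like year-CSV layout; the real
    # column names may differ.
    csv = io.StringIO(
        "Date-Time,ac_power\n"
        "2011-01-01 00:00:00,0.0\n"
        "2011-01-01 00:15:00,1.5\n"
    )
    df = pd.read_csv(csv)
    df['Date-Time'] = pd.to_datetime(df['Date-Time'])
    df = df.set_index('Date-Time')
    # Assert the format was read the way we expect
    assert isinstance(df.index, pd.DatetimeIndex)
    assert df['ac_power'].iloc[1] == 1.5
```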
The …
@wholmgren: The progress bar has been tested in python terminal, ipython terminal, notebooks, and with scripts. My understanding is that server logs will capture the output from the … Should I add the …

@cwhanse: …
…progress bar is not overwritten by a later print statement in a user script. Also added an `if __name__ == "__main__":` block for testing purposes
The resampling part of … What's the advantage of the … For testing, we'd want a …
Hi Will, you're right, I should have made the usage of … The … 10-4 on testing.
Hopefully the discussion below is more clear. The first part of your function contains these lines to create a time-series index:

```python
# convert index to timeseries
try:
    df[datetimekey] = pd.to_datetime(df[datetimekey])
    df.set_index('Date-Time', inplace=True)
except KeyError:
    time_cols = [col for col in df.columns
                 if np.logical_or('Time' in col, 'time' in col)]
    key = time_cols[0]
    df[datetimekey] = pd.to_datetime(df[key])
    df.set_index(datetimekey, inplace=True)
```

The first part of the …
Here are links to all of the datetime parsing code in midc: … It seems to me that the answer to question 1 is "not at this point". I don't yet know the answer to 2, but I think writing generic parsing code is hard!
No problem. I think that's the right way to do it. I think the 2nd half of the function merits its own function that assumes a Series/DataFrame that already has a `DatetimeIndex`.
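A minimal sketch of the suggested split, assuming a standardization step like the one under discussion (the helper name and the default resampling rule are illustrative placeholders, not the PR's actual code):

```python
import pandas as pd


def standardize_time_axis(df, freq='15min'):
    # Assumes df already has a DatetimeIndex, as suggested above;
    # datetime parsing stays in the calling function.
    if not isinstance(df.index, pd.DatetimeIndex):
        raise TypeError('df must have a DatetimeIndex')
    # Snap the data onto a regular grid; '15min' is only an
    # illustrative default.
    return df.resample(freq).mean()
```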
```python
        years will be concatenated into a single data frame
    delim: string
        The delimiter used in the CSV file being requested
```
Does pvdaq ever use other delimiters?
```python
def get_pvdaq_data(sysid=2, api_key='DEMO_KEY', year=2011, delim=',',
                   standardize=True):
```
Suggest `year` before `api_key`
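i.e., something like this sketch of the suggested reordering, keeping the PR's other defaults:

```python
def get_pvdaq_data(sysid=2, year=2011, api_key='DEMO_KEY', delim=',',
                   standardize=True):
    ...
```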
```python
                  for item in req_params.items()]
    req_url = base_url + '&'.join(param_list)
    response = requests.get(req_url)
    if int(response.status_code) != 200:
```
How about raising an exception?
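A minimal sketch of the suggestion, using `requests`' built-in check instead of comparing status codes by hand (not the PR's actual code; `req_url` is assumed to be built as in the diff above):

```python
import requests

response = requests.get(req_url)
# Raises requests.exceptions.HTTPError for 4xx/5xx responses
response.raise_for_status()
```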
```diff
@@ -7,3 +7,4 @@
 from pvlib.iotools.midc import read_midc_raw_data_from_nrel  # noqa: F401
 from pvlib.iotools.ecmwf_macc import read_ecmwf_macc  # noqa: F401
 from pvlib.iotools.ecmwf_macc import get_ecmwf_macc  # noqa: F401
+from pvlib.iotools.pvdaq import get_pvdaq_data
```
Just append `# noqa: F401` to tell stickler to ignore -- this is solely for the API.
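i.e., the new import line becomes:

```python
from pvlib.iotools.pvdaq import get_pvdaq_data  # noqa: F401
```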
@bmeyers any update on this? There's renewed interest from @shirubana (Silvana @ NREL) 😁
@mikofski Sorry all for the lack of progress on my part. After PVSC abstracts get submitted, I'll pick this up again. I definitely want to get this completed.
I implemented a couple of PVDAQ fetch functions over in our Solar Forecast Arbiter project: … I based them on @bmeyers' code but removed most of the …

I didn't find that the date/time key control was needed to handle the publicly accessible data. Perhaps it's needed for non-public data?

I also wanted to control how I handled the resampling/reindexing based on my requirements, so I left that to my own function. For example, I wanted to resample sub-1-minute data into 1 minute data using …

Finally, while I'd like to return a localized … I still think it would be great if the functionality ended up in pvlib in the long run.
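The resampling method referenced above is truncated; a minimal sketch of one common approach, assuming sub-minute instantaneous samples (the mean aggregation is an assumption, not necessarily what Solar Forecast Arbiter does):

```python
import numpy as np
import pandas as pd

# Hypothetical 20-second power measurements
index = pd.date_range('2019-01-01 12:00', periods=9, freq='20s')
power = pd.Series(np.linspace(100.0, 108.0, 9), index=index)

# Aggregate sub-minute data onto a regular 1-minute grid by averaging
power_1min = power.resample('1min').mean()
```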
My vote would be to add the Solar Forecast Arbiter pvdaq implementation to iotools and close this PR. Usage is the best user story; premature optimization is the killer (this PR, case in point). I would rather see this in pvlib, observe usage, and have users create issues or discuss on groups/SO, and then modify as needed, than not have it at all or living in PR state indefinitely.
- Updates entries to `docs/sphinx/source/api.rst` for API changes.
- Updates the appropriate `docs/sphinx/source/whatsnew` file for all changes.

Brief description of the problem and proposed solution (if not already fully described in the issue linked to above): Functions for making data requests to NREL's PVDAQ API. I'm using the year CSV file API, which is the most efficient way of obtaining multi-year, sub-hourly data.
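A hypothetical usage sketch of the proposed function, assuming the signature shown in the diff above (whether `year` also accepts a list for the multi-year concatenation described in the docstring is an assumption; `DEMO_KEY` is NREL's public demonstration API key):

```python
from pvlib.iotools.pvdaq import get_pvdaq_data

# Fetch one year of sub-hourly data for system 2 via the year-CSV API
df = get_pvdaq_data(sysid=2, api_key='DEMO_KEY', year=2011)
print(df.head())
```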