This is a high-level issue to capture an extension to Siphon. The idea is that users of Siphon would have a way to access data without needing to supply a specific source (via class name or URL). Because I'm an unimaginative hack at the best of times, I'll call it "Default Data Sources" for now.
Consider model output. As it is now, you need to know a data source to use Siphon. It would be nice to be able to do something like:
```python
dataset = GFS("0.25", <rundate>).gimme()
```
or
```python
dataset = GFS("0.25", "latest").gimme()
```
and at that point, you'd have a netCDF4-compatible Dataset object hooked up to the OPeNDAP or cdmremote endpoint for a specific run, or the latest available run, of the 0.25 degree GFS. Depending on the requested run time (or the presence of a bounding box), Siphon may try thredds.ucar.edu, thredds-test.unidata.ucar.edu, or www.ncei.noaa.gov. Running on Jetstream? thredds-jetstream.unidata.ucar.edu bumps up in priority.
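To make the priority idea concrete, here's a minimal sketch of what the server fallback could look like, built on the existing TDSCatalog class. The server ordering, the catalog path, and the pick_gfs_catalog() helper are assumptions for illustration, not existing Siphon API.

```python
# Minimal sketch of default-server fallback; only TDSCatalog is real Siphon API.
from siphon.catalog import TDSCatalog

# Servers to try, in descending priority. A deployment detected to be running
# on Jetstream could insert thredds-jetstream.unidata.ucar.edu at the front.
DEFAULT_SERVERS = [
    'https://thredds.ucar.edu',
    'https://thredds-test.unidata.ucar.edu',
    'https://www.ncei.noaa.gov',
]


def pick_gfs_catalog(servers=DEFAULT_SERVERS):
    """Return the first reachable 0.25 degree GFS catalog."""
    # Catalog path assumed for illustration; real paths differ between servers.
    path = '/thredds/catalog/grib/NCEP/GFS/Global_0p25deg/catalog.xml'
    for server in servers:
        try:
            return TDSCatalog(server + path)
        except Exception:
            continue  # unreachable or no such catalog; fall back to the next server
    raise RuntimeError('No default GFS source is currently reachable')
```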
Now, consider a Simple Web Service, such as one of the Upper Air data sources. Currently, Siphon requires choosing a specific provider to grab Upper Air data (e.g. WyomingUpperAir or IGRAUpperAir). What if, similar to GFS above, users could make a single generic request and Siphon would pick a default source based on the user-supplied parameters and/or what data are available "locally close" (e.g. in the same cloud)?
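For the simple web service case, the "pick a default source" step could be little more than a dispatch on the request parameters. Below is a rough sketch: the get_upper_air() wrapper and its site-ID rule are invented for illustration, while WyomingUpperAir and IGRAUpperAir (and their request_data() methods) are the providers Siphon has today.

```python
# Sketch of a provider-agnostic upper-air request. Only the two provider
# classes and request_data() exist in Siphon; the wrapper and its dispatch
# rule are hypothetical.
from datetime import datetime

from siphon.simplewebservice.igra2 import IGRAUpperAir
from siphon.simplewebservice.wyoming import WyomingUpperAir


def get_upper_air(time, site_id):
    """Return a sounding as a pandas.DataFrame, picking a provider by site ID."""
    if len(site_id) == 11:  # IGRA2 identifiers look like 'USM00072520'
        df, header = IGRAUpperAir.request_data(time, site_id)
        return df
    return WyomingUpperAir.request_data(time, site_id)


# e.g. get_upper_air(datetime(2019, 5, 20, 12), 'OUN')
```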
Of course, we'd always want to have a way for the user to determine the actual source for any of these requests. For example:
```python
print(dataset.source_name)
>> "Integrated Global Radiosonde Archive version 2."
print(dataset.source_publishers)
>> "NOAA National Centers for Environmental Information."
print(dataset.source_about)
>> "https://data.nodc.noaa.gov/cgi-bin/iso?id=gov.noaa.ncdc:C00975"
```
Note: In these examples I've used a method called gimme(). I don't actually suggest that name, but we don't have a consistent "give me the data" method across different types of data or different access types. For simplewebservices, we have:

- request_data, request_all_data, and latest_observations return a pandas.DataFrame
- acis_request returns a dict
- raw_buoy_data and realtime_observations return a str

For NCSS:

- get_data returns whatever was in the query
- get_data_raw returns bytes

For RadarServer:

- get_catalog returns a TDSCatalog containing the datasets that match a query
Some work by giving you all the data for the request at once; others provide a "remote" view into the data and only pull things as the variables are sliced.
I think what we want at this level of functionality is a pandas.DataFrame for point-type data (things that live in, say, siphon.defaultdatasources.point) and an xarray.Dataset for gridded data (things that live in, say, siphon.defaultdatasources.grids), based on a single request/response loop (so no OPeNDAP or cdmremote kinds of access, for consistency).
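For the gridded side, the single request/response pattern already exists by pairing NCSS with xarray, so a default grid source could wrap something like the sketch below. The catalog URL, variable name, bounding box, and time are just one example; under this proposal they would be chosen by the default-source logic rather than spelled out by the user.

```python
# Sketch: fetch a GFS subset in a single request/response and return an
# xarray.Dataset, roughly what siphon.defaultdatasources.grids could wrap.
# The catalog URL, variable, box, and time below are assumptions for illustration.
from datetime import datetime

import xarray as xr
from xarray.backends import NetCDF4DataStore

from siphon.catalog import TDSCatalog

cat = TDSCatalog('https://thredds.ucar.edu/thredds/catalog/'
                 'grib/NCEP/GFS/Global_0p25deg/catalog.xml')
ncss = cat.datasets[0].subset()  # NCSS access to one run from the catalog

query = ncss.query()
query.lonlat_box(north=50, south=20, east=-60, west=-130)
query.time(datetime.utcnow())
query.variables('Temperature_isobaric').accept('netcdf4')

# One request/response; no OPeNDAP or cdmremote lazy access involved.
ds = xr.open_dataset(NetCDF4DataStore(ncss.get_data(query)))
```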