It is easy to install with devtools:
library(devtools)
install_github("btupper/threddscrawler")
TopCatalogRefClass is for catalogs that are containers of CatalogRefClass pointers. This is like a listing of files and subdirectories in a directory, but here the files and subdirectories are all CatalogRefClass pointers.
CatalogRefClass is a pointer to a TopCatalogRefClass.
A THREDDS dataset comes in two flavors: collections of datasets and direct datasets. I split these into DatasetsRefClass (collections) and DatasetRefClass (direct); the latter has an 'access' child node that the former does not. A collection is a listing of one or more datasets (either direct or catalogs). A direct dataset is a pointer to an actual resource like a NetCDF file.
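To make the collection versus direct distinction concrete, here is a minimal sketch that applies the 'access' child-node rule to a made-up, simplified catalog fragment. It uses the xml2 package, which is my choice for illustration and not necessarily what threddscrawler uses internally. The dataset names and paths are invented, and real THREDDS catalogs declare an XML namespace that this fragment omits.

```r
library(xml2)

# Made-up, simplified catalog fragment (hypothetical names, no namespace).
catalog_xml <- '<catalog>
  <dataset name="2010">
    <dataset name="example_a.nc">
      <access serviceName="dap" urlPath="demo/2010/example_a.nc"/>
    </dataset>
    <dataset name="example_b.nc">
      <access serviceName="dap" urlPath="demo/2010/example_b.nc"/>
    </dataset>
  </dataset>
</catalog>'

doc <- read_xml(catalog_xml)

# Every <dataset> element, at any depth, in document order.
nodes <- xml_find_all(doc, ".//dataset")

# A dataset counts as "direct" when it has an <access> element as an immediate child.
has_access <- vapply(
  seq_along(nodes),
  function(i) length(xml_find_all(nodes[[i]], "./access")) > 0,
  logical(1)
)

kinds <- data.frame(
  name = xml_attr(nodes, "name"),
  kind = ifelse(has_access, "direct", "collection"),
  stringsAsFactors = FALSE
)
print(kinds)
```

In this fragment the outer "2010" dataset is classified as a collection and the two inner datasets as direct.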
NERACOOS exposes data using a THREDDS server. This is an example that draws upon the MUR SST data subset prepared in 2015.
We start by examining the catalog. Note that programmatically we access the companion XML file.
We'll crawl these pages in succession...
library(threddscrawler)
# start by getting the TopCatalog - picture a TopCatalog as a web page that lists one or more catalogs.
Top <- get_catalog('http://www.neracoos.org/thredds/catalog/GMRI/SST/TESTS/NASA_MUR_SST/catalog.xml')
Top
# Reference Class: "TopCatalogRef"
# verbose_mode: FALSE
# url: http://www.neracoos.org/thredds/catalog/GMRI/SST/TESTS/NASA_MUR_SST/catalog.xml
# children: service dataset
# catalogs: NorthEastShelf GulfOfMaine
# now get the catalogs embedded in the page. Note that these point to other TopCatalogs.
A <- Top$get_catalogs()
A
# $NorthEastShelf
# Reference Class: "CatalogRefClass"
# verbose_mode: FALSE
# url: http://www.neracoos.org/thredds/catalog/GMRI/SST/TESTS/NASA_MUR_SST/NorthEastShelf/catalog.xml
# children:
# name:NorthEastShelf
# href:NorthEastShelf/catalog.xml
# title:NorthEastShelf
# type:
# ID:GMRI_TESTS/NASA_MUR_SST/NorthEastShelf
#
# $GulfOfMaine
# Reference Class: "CatalogRefClass"
# verbose_mode: FALSE
# url: http://www.neracoos.org/thredds/catalog/GMRI/SST/TESTS/NASA_MUR_SST/GulfOfMaine/catalog.xml
# children:
# name:GulfOfMaine
# href:GulfOfMaine/catalog.xml
# title:GulfOfMaine
# type:
# ID:GMRI_TESTS/NASA_MUR_SST/GulfOfMaine
# now we get the catalogs in the NorthEastShelf
NES <- A[['NorthEastShelf']]$get_catalog()
NES
# Reference Class: "TopCatalogRef"
# verbose_mode: FALSE
# url: http://www.neracoos.org/thredds/catalog/GMRI/SST/TESTS/NASA_MUR_SST/NorthEastShelf/catalog.xml
# children: service dataset
# catalogs: MonthlyMeans MonthlyFiles DailyFiles AggregatedMeans
# let's get the catalogs. I won't show them, but we'll get the TopCatalog for the 'DailyFiles'
B <- NES$get_catalogs()
DAYS <- B[['DailyFiles']]$get_catalog()
# now get 2010
C <- DAYS$get_catalogs()
Y2010 <- C[['2010']]$get_catalog()
Now we are at "the bottom" of the search path and we find only a collection of datasets. Instead of requesting subsequent catalogs we can now request datasets.
days <- Y2010$get_datasets()
head(days, n = 2)
# $`20101231-JPL-L4UHfnd-GLOB-v01-fv04-MUR_subset.nc`
# Reference Class: "DatasetsRefClass"
# verbose_mode: FALSE
# url: http://www.neracoos.org/thredds/catalog/GMRI/SST/TESTS/NASA_MUR_SST/NorthEastShelf/DailyFiles/2010/20101231-JPL-L4UHfnd-GLOB-v01-fv04-MUR_subset.nc
# children: dataSize date
# datasets: NA
#
# $`20101230-JPL-L4UHfnd-GLOB-v01-fv04-MUR_subset.nc`
# Reference Class: "DatasetsRefClass"
# verbose_mode: FALSE
# url: http://www.neracoos.org/thredds/catalog/GMRI/SST/TESTS/NASA_MUR_SST/NorthEastShelf/DailyFiles/2010/20101230-JPL-L4UHfnd-GLOB-v01-fv04-MUR_subset.nc
# children: dataSize date
# datasets: NA
Note that the 'datasets' attribute is NA; that tells us that we have a real data source, not a catalog of data sources.
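The walk-through repeats the same two calls at each level: get_catalogs() to list the pointers, then get_catalog() on the chosen pointer. As a rough sketch, that pattern can be folded into a loop. The helper name descend_catalogs is hypothetical and not part of threddscrawler; it only assumes that each node offers the get_catalogs() and get_catalog() methods used above.

```r
# Hypothetical helper (not part of threddscrawler): follow a sequence of
# catalog names downward from a top-level catalog.
descend_catalogs <- function(top, path) {
  node <- top
  for (name in path) {
    # named list of catalog pointers found at this level
    refs <- node$get_catalogs()
    if (!(name %in% names(refs))) {
      stop("catalog not found: ", name)
    }
    # resolve the chosen pointer to the next catalog down
    node <- refs[[name]]$get_catalog()
  }
  node
}

# Usage against the example server (needs network access, so left commented out):
# Top   <- get_catalog('http://www.neracoos.org/thredds/catalog/GMRI/SST/TESTS/NASA_MUR_SST/catalog.xml')
# Y2010 <- descend_catalogs(Top, c("NorthEastShelf", "DailyFiles", "2010"))
# days  <- Y2010$get_datasets()
```

Stopping with an error on an unknown name is a design choice for this sketch; returning NULL would also be reasonable if missing catalogs are expected.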