NCZarr Support Part I: Local Datasets #884
Closed
In response to #672, I've added logic to handle Zarr datasets by passing them through to netCDF-C's NCZarr protocol. protocols/zarr.py takes a Zarr dataset specified in any of the following formats and returns a valid NCZarr URI (specified here) that netCDF-C can recognize:
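As a rough illustration of the idea (a hypothetical sketch, not the actual protocols/zarr.py implementation), a local Zarr store path can be rewritten into an NCZarr fragment URI, where the `#mode=nczarr,file` fragment tells netCDF-C to open the path with its NCZarr file driver:

```python
from pathlib import Path


def as_zarr_uri(path_or_url: str) -> str:
    # Hypothetical helper: turn a local Zarr store path (or an existing
    # file:// URL) into an NCZarr fragment URI for netCDF-C.
    if path_or_url.startswith("file://"):
        base = path_or_url
    else:
        # as_uri() requires an absolute path, so resolve first.
        base = Path(path_or_url).absolute().as_uri()
    return base + "#mode=nczarr,file"
```

For example, `as_zarr_uri("/tmp/example.zarr")` yields a `file://` URI ending in `example.zarr#mode=nczarr,file`.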
Note that so far this only works on LOCAL datasets, with the default libnetcdf build installed when compliance-checker is set up with conda. NCZarr is also only fully supported on Linux at the moment, so I added an OS check to surface this caveat to users who try to run the checker on a Zarr dataset from another OS.
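The OS check could be as simple as comparing `platform.system()` against `"Linux"` (a minimal sketch; the function name and the injectable `system` parameter are my own, for testability):

```python
import platform
from typing import Optional


def nczarr_supported(system: Optional[str] = None) -> bool:
    # Hypothetical sketch of the OS check: NCZarr local-store support
    # is currently only complete on Linux, so everything else gets a
    # caveat passed through to the user.
    system = system or platform.system()
    return system == "Linux"
```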
Getting S3 support working in netCDF-C is an ongoing effort. Once it's solid, the S3 test that is currently commented out in test_cli.py should pass, and the checker should work on S3-hosted Zarr datasets.
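For S3 stores, the same fragment convention applies with the S3 driver selected instead; a hedged sketch (the helper and its validation are illustrative, not part of this PR):

```python
def as_s3_nczarr_uri(https_url: str) -> str:
    # Append the NCZarr fragment selecting the S3 storage driver;
    # netCDF-C expects an http(s) URL pointing at the bucket object.
    if not https_url.startswith(("http://", "https://")):
        raise ValueError("expected an http(s) S3 URL")
    return https_url + "#mode=nczarr,s3"
```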
Update: it looks like this is on the home stretch, AWSome!
While I was in test_protocols.py, I also refactored it to use Pytest, continuing the upgrade to Pytest.
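The flavor of that refactor (an illustrative example, not a test actually in this PR) is plain functions with bare asserts, which Pytest collects automatically with no TestCase class or assertEqual calls:

```python
def test_local_path_becomes_uri():
    # Pytest collects any test_* function and reports failed bare
    # asserts with rich introspection, so unittest scaffolding goes away.
    uri = "file:///tmp/example.zarr#mode=nczarr,file"
    assert uri.startswith("file://")
    assert uri.endswith("#mode=nczarr,file")
```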