Zarr group JSON file issues #134
Comments
To add some specifics, this led to a difficult-to-diagnose issue when trying to implement #120. Because the same file name was used (test_compression.nc) for multiple test cases (with and without byte-shuffled data), the Zarr metadata file would not be updated. This led to incorrect chunk sizes and offsets, and ultimately decompression failure. It's not clear why this did not also affect the test when using local storage.
Worked it out (and updated the description) - for local files we generate the JSON file in the same directory as the netCDF file, which in the test case is a temporary directory. This is not persistent, so it does not suffer from the issue.
When using multiple netCDF files with the same names, the Zarr group JSON file would previously not be overwritten after it was first written. This could lead to subsequent runs reading an invalid Zarr group metadata file. This change switches to using a temporary file to store the Zarr group metadata. This should not be a problem because the Zarr datasource is cached in the `Active` object as the `_zds` member between operations. Closes #134
PyActiveStorage writes out a Zarr group JSON metadata file when processing a variable in a dataset. The filename is `<netcdf file basename>_<variable name>.json`. If using local storage, it is written to the same directory as the netCDF file; if using S3 storage, it is written to the current directory. If a file of the same name exists, it is not updated, and the files are never removed. This leads to various issues when reusing netCDF filenames for different datasets (e.g. during testing), since the Zarr group metadata may describe a previous incarnation of the dataset.
It also leads to an undesirable build-up of JSON files.
Zarr dataset metadata is cached on an `Active` instance, so there isn't much to be gained from leaving the JSON files around to be reused. I propose we use a temporary file to write the Zarr group metadata, and immediately remove it once it has been used.
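A minimal sketch of the proposed approach using Python's `tempfile` module. The function name and metadata shape here are illustrative only, not PyActiveStorage's actual API; the point is that each write gets a unique, disposable file, so stale metadata from an earlier file with the same netCDF basename can never be picked up:

```python
import json
import os
import tempfile


def write_zarr_group_metadata(metadata: dict) -> str:
    """Write Zarr group metadata to a uniquely named temporary JSON file.

    Returns the path; the caller is responsible for removing the file
    once the metadata has been consumed.
    """
    # delete=False so the file survives the close() at the end of the
    # with-block and can be re-opened by the reader before we remove it.
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".json", delete=False
    ) as tmp:
        json.dump(metadata, tmp)
        return tmp.name


# Hypothetical usage: write, consume, then clean up immediately.
path = write_zarr_group_metadata({"zarr_format": 2})
try:
    with open(path) as f:
        metadata = json.load(f)
finally:
    os.remove(path)
```

Because `tempfile.NamedTemporaryFile` generates a fresh name each call, two test cases that reuse the same netCDF filename can no longer collide on the metadata file, and the `finally` clause prevents the build-up of leftover JSON files.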