Provide default variable encoding parameters for xarray's to_netcdf #72

Open
chpolste opened this issue Jun 5, 2023 · 0 comments


chpolste commented Jun 5, 2023

xarray.Dataset.to_netcdf accepts an encoding argument that allows data to be written as int16. This saves 75% of disk space compared to storing double (float64) values directly and 50% compared to float (float32). Compression can be applied on top for further savings.

The encoding requires an offset and scale factor to be chosen for each variable, so that the stored (packed) values can be unpacked as

value_unpacked = scale_factor * value_packed + add_offset

The (potential) loss of precision is usually no problem for atmospheric variables (most reanalysis data is delivered like this anyway), as long as the scale factor and offset are appropriately chosen. This is where some specific knowledge about sensible ranges for each field stored is required.
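To make the precision trade-off concrete, here is a minimal sketch of how a scale factor and offset could be derived from an assumed value range. The helper names (`packing_params`, `pack`, `unpack`) and the ±150 range are hypothetical and only serve as illustration. The formula reserves one int16 value (-32768) as a fill value, which is a common packing convention but an assumption here, not something decided in this issue.

```python
def packing_params(vmin, vmax, nbits=16):
    """Scale factor and offset mapping [vmin, vmax] onto signed integers.

    One integer value is left unused so it can serve as a fill value.
    (Hypothetical helper, not part of the package.)
    """
    if not vmax > vmin:
        raise ValueError("vmax must be larger than vmin")
    scale_factor = (vmax - vmin) / (2**nbits - 2)
    add_offset = (vmax + vmin) / 2
    return scale_factor, add_offset


def pack(value, scale_factor, add_offset):
    # Inverse of the unpacking formula, rounded to the nearest integer
    return round((value - add_offset) / scale_factor)


def unpack(packed, scale_factor, add_offset):
    # value_unpacked = scale_factor * value_packed + add_offset
    return scale_factor * packed + add_offset


# Assumed range for illustration only, e.g. a wind component in m/s
scale_factor, add_offset = packing_params(-150.0, 150.0)

value = 12.3
restored = unpack(pack(value, scale_factor, add_offset), scale_factor, add_offset)
# The round-trip error is bounded by half the scale factor
max_error = scale_factor / 2
```

For values inside the chosen range the round-trip error is at most half the scale factor, so a wider range directly costs precision. Values outside the range cannot be represented and would overflow, which is why the range has to cover every plausible case.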

Users could always choose their own, but we might want to provide a default set of encodings for the output variables of the QGField properties. They need to be general enough to match any season, hemisphere, region, etc. but specific enough to retain precision. I suggest we work out a set of values in this issue and ship these conveniently with the package eventually.
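As a rough sketch of what shipping defaults could look like, a table of value ranges could be turned into the dict that to_netcdf expects for its encoding argument. The variable names and ranges below are hypothetical placeholders, not values agreed on in this issue, and `int16_encoding` is an illustrative name rather than an existing function of the package.

```python
# Hypothetical placeholder ranges (min, max); real defaults still need to be
# worked out for the actual output variables.
DEFAULT_RANGES = {
    "zonal_wind": (-150.0, 150.0),
    "temperature": (150.0, 350.0),
}


def int16_encoding(names=None, ranges=DEFAULT_RANGES, zlib=True, complevel=4):
    """Build an encoding dict for the given variable names.

    Variables without a known range are skipped, so they are written
    without packing.
    """
    if names is None:
        names = ranges.keys()
    encoding = {}
    for name in names:
        if name not in ranges:
            continue
        vmin, vmax = ranges[name]
        encoding[name] = {
            "dtype": "int16",
            # 65534 = 2**16 - 2: one int16 value is reserved as fill value
            "scale_factor": (vmax - vmin) / 65534,
            "add_offset": (vmax + vmin) / 2,
            "_FillValue": -32768,
            "zlib": zlib,
            "complevel": complevel,
        }
    return encoding


encoding = int16_encoding(names=["zonal_wind", "not_in_table"])
```

With a dataset `ds` at hand, this could be used as `ds.to_netcdf("out.nc", encoding=int16_encoding(names=ds.data_vars))`. Restricting the dict to variables present in the dataset keeps the defaults usable for outputs that contain only a subset of the fields.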

See also:
