example file - L3BIN and Parquet format (just binlist and chlor_a values) #5

Open
mdsumner opened this issue Dec 23, 2024 · 1 comment


mdsumner commented Dec 23, 2024

This Parquet file contains a subset of an original L3 BIN file:

https://github.com/hypertidy/L3bin/blob/master/inst/extdata/AQUA_MODIS.20241125.L3b.DAY.CHL.NRT.parquet

The original BIN file is available here: https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/AQUA_MODIS.20241125.L3b.DAY.CHL.NRT.nc

R code (croc, using rhdf5) to convert just the bin list and the chlor_a values:

```r
library(croc)  ## gh: sosoc/croc
## the L3b NetCDF file, downloaded from the URL above
file <- "AQUA_MODIS.20241125.L3b.DAY.CHL.NRT.nc"

## bin list columns alongside the chlor_a sums, one row per stored bin
d <- cbind(read_binlist(file), read_compound(file, "chlor_a"))
#  bin_num nobs nscenes  weights    time_rec chlor_a_sum chlor_a_sum_squared
#1  276025    3       1 1.732051  3019994880   0.1000465         0.005855022
#2  276026    7       2 3.449490  7046648832   0.2427317         0.017250253
#3  276027   10       2 4.000000 10066643968   0.3045491         0.023477813
# ...

## write to Parquet with the same base name, zstd-compressed
arrow::write_parquet(d, gsub("nc$", "parquet", file), compression = "zstd")
```
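The file keeps only the per-bin sums, so a mean chlor_a per bin has to be derived when reading it back. A minimal Python sketch, assuming the OBPG L3b convention that the bin mean is the stored sum divided by the stored weights; `bin_mean` is a hypothetical helper, not part of croc or L3bin:

```python
import numpy as np


def bin_mean(var_sum, weights):
    """Weighted per-bin mean: the stored sum divided by the stored weights."""
    var_sum = np.asarray(var_sum, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return var_sum / weights


# first three rows of the R printout (chlor_a_sum and weights columns)
chlor_a_sum = [0.1000465, 0.2427317, 0.3045491]
weights = [1.732051, 3.449490, 4.000000]
chlor_a_mean = bin_mean(chlor_a_sum, weights)
```

With the real file the same call would take the `chlor_a_sum` and `weights` columns, e.g. from `pandas.read_parquet(...)`.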

The L3 grid configuration is NROW = 4320 rows of latitude; the BinIndex dataset has one record per row.

```r
(info <- rhdf5::h5ls(file))
# Datatype: binDataType
# Datatype: binIndexType
# Datatype: binListType
#                  group                name       otype   dclass     dim
# 0                    / level-3_binned_data   H5I_GROUP
# 1 /level-3_binned_data            BinIndex H5I_DATASET COMPOUND    4320
# 2 /level-3_binned_data             BinList H5I_DATASET COMPOUND 1956874
# 3 /level-3_binned_data          binDataDim H5I_DATASET    FLOAT       0
# 4 /level-3_binned_data         binIndexDim H5I_DATASET    FLOAT       0
# 5 /level-3_binned_data          binListDim H5I_DATASET    FLOAT       0
# 6 /level-3_binned_data             chlor_a H5I_DATASET COMPOUND 1956874
# 7                    /  processing_control   H5I_GROUP
# 8  /processing_control    input_parameters   H5I_GROUP
```
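With NROW known, each `bin_num` can be placed on the globe. A minimal sketch, assuming the integerized sinusoidal binning scheme described in NASA OBPG's Level-3 binning documentation (1-based bin numbers, bins per row proportional to cos(latitude)); `bin_grid` and `bin_to_latlon` are hypothetical names, not the L3bin package API:

```python
import numpy as np

NROW = 4320  # rows of latitude in this file's L3 grid (BinIndex has 4320 records)


def bin_grid(nrow: int = NROW):
    """Per-row centre latitude, bin count and first (1-based) bin number."""
    rows = np.arange(nrow)
    latbin = (rows + 0.5) * 180.0 / nrow - 90.0
    # bins per row shrink with cos(latitude) so bins cover roughly equal area
    numbin = np.floor(2 * nrow * np.cos(np.deg2rad(latbin)) + 0.5).astype(np.int64)
    # first bin number of each row: 1, then a running total of row counts
    basebin = np.concatenate(([1], 1 + np.cumsum(numbin[:-1])))
    return latbin, numbin, basebin


def bin_to_latlon(bin_num, nrow: int = NROW):
    """Centre latitude/longitude (degrees) of 1-based bin numbers."""
    latbin, numbin, basebin = bin_grid(nrow)
    bin_num = np.asarray(bin_num, dtype=np.int64)
    # row = last row whose first bin number is <= bin_num
    row = np.searchsorted(basebin, bin_num, side="right") - 1
    lat = latbin[row]
    lon = 360.0 * (bin_num - basebin[row] + 0.5) / numbin[row] - 180.0
    return lat, lon
```

For example, bin 1 is one of three bins in the southernmost row and is centred at about (-89.979, -120.0). The same function accepts a whole `bin_num` column.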

mdsumner added a commit that referenced this issue Dec 23, 2024

mdsumner commented Dec 27, 2024

Justus (keewis) provided this Python code to unpack the structured (compound) arrays with xarray:

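```python
import xarray as xr

# open every group; compound HDF5 variables arrive as numpy structured arrays
tree = xr.open_datatree("AQUA_MODIS.20241125.L3b.DAY.CHL.NRT.nc")

# the way to transform the recarray into variables is:
def extract_structured_variables(rec):
    # one plain variable per field of the compound dtype, on the same dimensions
    return xr.Dataset({name: (rec.dims, rec.data[name]) for name in rec.dtype.names})

def process_recarrays(node):
    # one child node per compound variable; non-compound variables are skipped
    return xr.DataTree.from_dict({name: extract_structured_variables(var)
                                  for name, var in node.data_vars.items()
                                  if var.dtype.names is not None})

def postprocess(tree):
    # child key is the bare group name (node names may not contain "/")
    return tree.assign({"level-3_binned_data": process_recarrays(tree["/level-3_binned_data"])})

unpacked = postprocess(tree)
```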
