-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convention for HEALPix grid parameters #433
Comments
@uweschulzweida, thanks for your question. Maybe it can be treated as grid mapping. Does it have two horizontal coordinate variables, like the existing grid mappings of Appendix F? All of those are methods for converting between the ( |
I have not found anything suitable in Appendix F. In any case, only the two parameters NSIDE/ORDER are needed to calculate the grid coordinates. For a HEALPix grid it does not make sense to store the coordinates in a NetCDF file. I think I will save the parameters as described above for the time being. If someone has a better idea I can change it. |
You're right, HEALPix is not in Appendix F at the moment. I see from wikipedia that it is a map projection, or a class of map projections. If you could provide a definition of its parameters in the form of the other entries of Appendix F, we could certainly consider adding it. You remark, "For a HEALPix grid it does not make sense to store the coordinates in a NetCDF file." Sect 5 of CF expects latitude and longitude coordinates to be provided as 2D auxiliary coordinate variables if the horizontal coordinates aren't latitude and longitude (as in this case). This is mandatory, whereas the grid mapping is an optional extra. The reason for it is to make the data self-explanatory and useful (the aim of the CF convention, in general), by enabling generic applications to geolocate the data. Most applications will not be aware of the HEALPix grid, for instance, but will understand latitude and longitude. Best wishes Jonathan |
I'm missing something here @JonathanGregory. When we use a map projection we provide coordinates which can only be turned into lat/lon by using information about the map projection itself. In Healpix, the dimension index (singular) plus the Section 5 says:
Which I take to mean that given Healpix is a well known formalism for whch a grid mapping variable can be provided, and so we don't need latitude and longitude. |
Dear Bryan @bnlawrence Actually it was me who was missing something! I didn't know that the convention had been changed, such that the 2D lat and lon coordinates are no longer mandatory if the grid mapping is provided. This change was introduced in version 1.8 by issue 179, which I had never seen before this morning! I'm not disputing it, and I understand the reasons for changing it. I suspect I didn't notice it because it was proposed during the 2019 CF meeting in Tacoma, which I didn't attend in person. There may have been several things initiated then, which were of course properly followed up in issues, but since I'm quite attentive to the CF discussions, my oversight shows there's a danger when a large number of things are done at once that they may not receive the same scrutiny as usual. That's something to keep in mind. So you're right, Healpix can be included without 2D aux lat and lon coordinates if it can be described by a grid mapping in the format of Appendix F, as discussed with @uweschulzweida. Can that be done? Best wishes Jonathan |
Hello, It's interesting to note that the 5.6 text implicitly assumes that horizontal coordinates of some type are always present. What if there are no projection coordinates? This is potentially relevant because it's not obvious to me what auxiliary coordinates, if any, would be for the data variable's Thanks, |
I think it's crucial to understand that the choice of order defines the domain filling curve to be used: (Figure from Górski et al, 2004, https://iopscience.iop.org/article/10.1086/427976/pdf, shows pixelation for |
(Incidentally there is a lot of confusion in people talking about healpix in our community because at the same time as introducing healpix, people are introducing hierarchical datasets, that is, storing lots of versions of the same variable at different pixelations, so that folks wanting low resolution data can read a lower resolution pixelation. E.g. in practice folks talking about storing multiple zoom levels where |
Thanks, @bnlawrence, I now understand the suggested So there are no auxiliary coordinates for the |
@uweschulzweida I see that the zoom level seems pretty fundamental to how DKRZ are using healpix. Do you think the zoom level should be in the attributes directly, rather than say, nside? |
... or would you store the latitudes and longitudes as auxiliary coordinates, with the grid_mapping there to show provenance (and to define the cell edges) |
The zoom level can be easily calculated from the nside parameter. I don't think this should be stored as an attribute, since the zoom level is only applied to nested ordering. |
Well, zoom level and nside are intimately related right: Why do you say it is only applied to nested ordering? My reading of the DKRZ documentation suggests it is being used as a query parameter into a dataset with a number of variables each with a different zoom level. It seems to me that the "interesting" information from the metadata point of view is the zoom level, not the nside parameter as it's not directly related to the way variables are chosen. |
Yes, the zoom level is important for us! We store the data in zarr archives. Each zoom level is a separate dataset with all variables. The name of the zarr archive contains the zoom level that is accessed via the query request. |
Given that, it feels like it would be more useful to expose the zoom level in the CF metadata, which means that tools which harvest that metadata can use it, and of course, the tools which read the data can trivially calculate (Obviously it's trivial both ways, but I figure the representation which needs least conversion should be the one that is harvested to catalogs and/or visible when lazily loading.) |
Just been discussing what we might put in Appendix F with @davidhassell; suffice to say we'd need a reasonably complete description of what is going on with HealPix. We'd also want, we think, to recommend (but not mandate) the use of two 1D auxiliary coordinate variables with the lat/lon of the pixel centres so as to simplify data usage by those that don't want to utilise healpy or equivalent. |
just discovering this discussion now. We actually did the same exercise to add the HEALPix grids into WMO GRIB2 standard (I suspect for the same project than the one reported by @uweschulzweida ): GRIB2 issue I have 2 comments:
In the GRIB2 proposal we also added extra keys to specify if the observable is valid at the center of the pixel, at edges, etc. You may want to do that as well (although I must admit I would need to read again how grids and projections are handled in CF). |
Thanks @sebvi, really glad you've jumped in here!
There is a lot of other material in your various templates, is any of that relevant here? I note you also have table entries for edges and vertices. Is that mandatory or have you added that so that the information can be provided if desired? |
ECMWF has an (indirect) interest in seeing this going forward as many of our end users might want to retrieve the data in HEALPix GRIB2 and then convert to HEALPix netCDF to then use with their preferred tools. The closer the metadata the easier the conversion. :)
The extra keys in GRIB2 were added for several reasons but mostly to be covered for future requests:
EDIT: I 've hit the button too quickly |
I'm pretty keen that we don't introduce unnecessary semantic mismatches. I don't see any reason for doing so here. I think we can build a proposal based on this. I think the important point is that we would only need one mandatory additions to the original CDL (i.e. something like a mandatory I am in two minds as to the necessity for scanning order in netCDF, what do others think? If we did it, we could do it with optional additional parameters? I think we can optionally use auxiliary coordinates for the vertices etc. Will think more about that. |
Incidentally the document produced for the GRIB appendix provides a useful intro to all this for newcomers! |
eyes=thanks! |
@bnlawrence asks
From my outside perspective I think that transferring the concept of [different] scanning orders from GRIB to netCDF is not helpful. In the GRIB "world" I do see the need and underlying reasons (as @sebvi hinted at) where GRIB is essentially a machine-to-machine format where scanning order is a well established fact. This is not the case for netCDF, which is a data format widely used across rather different communities. And scanning order would be a new concept in the netCDF "world" (where I, like @sebvi, have questions regarding the actual need/use case, e.g. with respect to efficiency). Across the different communities using netCDF there is already widespread confusion regarding X, Y, Z vs. latitude, longitude (see here for a rundown of anecdotal evidence). Hence, I think that unifying GRIB different scanning orders into what is established in netCDF is a perfect task for GRIB-to-netCDF converter tools. |
Dear Sebastien @sebvi, Bryan @bnlawrence et al. Thanks for the discussion on this issue, on a convention for HEALPix in CF. Is someone in a position to make a definite proposal, to take it forward? Best wishes Jonathan |
Hello,
I was wondering what the use cases might be for storing data on edges and vertices. It would seems to me that you'd only want to do this if there are well defined mappings between the edges/vertices and the pixels (faces) themselves - is that the case? You could of course use UGRID to to store such information, but then the structure is lost, which seems counterproductive! Thanks, |
Hello, I've been thinking about this again, as there is going to be Hackathon on trying to progress this issue at the 2024 CF workshop (currently scheduled for 08:00 UTC on Thursday 19th September). There was (I think) a consensus to use a
but we also need to identify which data dimension the grid mapping applies to. In the above example there is only one data dimension, but there could be many, and there could be more than one that is the same size as the HEALPix dimension. For instance, let's introduce a
Normally we associate the grid mapping with the appropriate dimensions via coordinate variables (section 5.6) - either implicitly via their standard names, or else with an explicit format that references their netCDF variable names. This is not possible when there are no coordinate variables! Perhaps the obvious solution is to allow the explicit
Are there any problems with this approach, I wonder? Thanks, |
Dear @davidhassell Your proposal for HEALPix looks sensible to me. You are proposing that the second syntax in Sect 5.6 for Earlier, you asked whether the HEALPix dimension could have optional auxiliary coordinates of latitude and longitude. I think Yes. Even though they're not mandatory any more, they could be recommended, because it makes the data useful to analysts and programs which don't know how to work out the coordinates. Best wishes Jonathan |
Dear Jonathan,
Indeed.
Ye, I think that's right. I don't we need to recommend it though, as we don't for any other types of grd mapping - we say instead in 5.6: "Such coordinates [lats and lons] may be provided in addition to the provision of a grid mapping variable, but that is not required.". Not being willing or able to use the grid mapping to create missing lats and lons is equally problematic for any projection, including HEALPix - possibly less to for HEALPIx, because at least you know that it has global coverage. |
I am unclear why the former requirement to produce the 2D lat and lon aux coord vars, when it was abolished, wasn't replaced with a recommendation to do so. Perhaps this should be revisited, because it can certainly help with the usability of data.
|
That's fine by me. |
This discussion has made me think that there may be an implication for the CF data model, here. The Coordinate Reference construct "relates the coordinate values of the coordinate system to locations in a planetary reference frame". What we'll have in the HEALPix case is a relationship between "a Domain Axis construct and locations in a planetary reference frame", and the ordering of the domain axis construct must be consistent with that defined by the Coordinate Conversion. |
You could argue for HEALPix that there is an implied coordinate value, which is an integer index starting from zero for the first element along the dimension. Perhaps the extended syntax you suggested above Thinking again about this, I wonder whether in the HEALPix case we ought to require latitude and longitude auxiliary coordinate variables as well, or at least strongly recommend them, because without them a non-HEALPix-aware software has no good way to make a plot of the data. In other cases of non-latitude-longitude horizontal coordinates, you can still plot the data with the supplied 1D coordinate variables. |
Dear Jonathan, Thanks for these thoughts ... I'd like to argue the other side, to see where we get to :)
I would instead say that such indices are not a coordinates, as they serve no geolocation purpose and cannot be converted to coordinates in another CRS. So I still think we need to extend the data model. Another application of this extension could be for tripolar grids. For these, we currently have no way of encoding which is the X and which the Y axis of the horizontal grid, information which is essential for operations such as regridding. Perhaps we could use the "grid mapping + dimensions" idea to make this connection, and also provide a place to describe how the particular tripolar grid was created. I.e. something like
In the traditional grid mapping case (e.g. There is of course more to it than that. In the transverse mercator case you at least know that adjacent cells in the array are also physically contiguous (assuming that there are no "gaps" in the coordinate variables). Not so in the HEALPix case, although you do know that the HEALPix grid has global coverage. |
Hello, Just to let you know that we'll be working on this issue in the 2024 CF workshop during a hackathon session on Thursday 19th September (10:00–12:30 UTC+2). Thanks, |
Hello, I am experimenting with HEALPix for earth observation data and was wondering how to store such data on NetCDF. Reading the conversation, I came across GRIB2's design choice to allow nside != 2^k, which definitely has tremendous value.
I would like to mention the xy-HEALPix plane coordinates which I have not seen mentioned here. There, each pixel is given as the triple of face index and xy-coordinate within the face. The reason I want to mention this is the limitation I see in using only the ring index for nside != 2^k. The advantage of the nested index is data proximity for continuous data blocks. However, we can also achive this in netCDF using array chunks in the xy-coordinates. Another point is the fact, that we do not loose the "zooming" feature, because we just need to lower the resolution of the face images, e.g. average 3x3 cells in a 192x192 face to get a 64x64 face. Hope this might be of interest to this discussion, |
We will soon be producing a lot of data on a HEALPix (Hierarchical Equal Area isoLatitude Pixelation) grid. The grid coordinates can be calculated easily from 2 parameters. The NSIDE parameter controls the resolution of the pixellization and the ORDER parameter sets the index ordering convention of the pixels (ring or nested).
I think it would be good to have a convention for storing these parameters in NetCDF. My idea is to define these parameters via the grid_mapping attribute, for example:
This defines a HEALPix grid with nside=64 and a nested index ordering.
Can this be taken over into the CF Convention or do you have a better suggestion?
The text was updated successfully, but these errors were encountered: