Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Algoa Bay Forecast] NetCDF structure, conventions, etc. #13

Open
zachsa opened this issue Jul 6, 2022 · 12 comments
Open

[Algoa Bay Forecast] NetCDF structure, conventions, etc. #13

zachsa opened this issue Jul 6, 2022 · 12 comments
Assignees

Comments

@zachsa
Copy link
Collaborator

zachsa commented Jul 6, 2022

Hi @mattcarr03.

I'm working with the processed model output using GDAL / raster2pgsql (actually I think this also might be GDAL, wrapped by PostGIS). This work relates to #12.

Running the raster2pgsql CLI I get a warning:

Warning 1: No UNIDATA NC_GLOBAL:Conventions attribute

Do you know what it would take to write the processed NetCDF file in a way that is compliant with CF conventions (climate and forecast). And is this worth doing?

I also ran this past @Marc-Pienaar to see if he has any insight

This is the command

raster2pgsql \
  -d \
  -q \
  -I \
  -t auto \
  -R \
  NETCDF:"/path/to/hourly-avg-20220705-normalized.nc":temperature \
  public.test_raster_input \
  | psql \
    postgres://admin:password@localhost:5432/somisana_local
@zachsa
Copy link
Collaborator Author

zachsa commented Jul 6, 2022

This is the printout of the header using xarray

<xarray.Dataset>
Dimensions:      (time: 240, depth: 20, lat: 106, lon: 152)
Coordinates:
    lon_rho      (lat, lon) float32 ...
    lat_rho      (lat, lon) float32 ...
  * depth        (depth) float32 -0.975 -0.925 -0.875 ... -0.125 -0.075 -0.025
  * time         (time) datetime64[ns] 2022-06-30T01:00:00 ... 2022-07-10
Dimensions without coordinates: lat, lon
Data variables:
    temperature  (time, depth, lat, lon) float32 ...
    salt         (time, depth, lat, lon) float32 ...
    u            (time, depth, lat, lon) float64 ...
    v            (time, depth, lat, lon) float64 ...
    m_rho        (time, depth, lat, lon) float64 ...
Attributes:
    description:  CROCO output from algoa Bay model transformed lon/lat/depth...

This is the printout using gdalinfo

zach@ZACH-SP4:~/code/saeon$ gdalinfo /home/zach/somisana-data/hourly-avg-20220705-normalized.nc
Driver: netCDF/Network Common Data Format
Files: /home/zach/somisana-data/hourly-avg-20220705-normalized.nc
Size is 512, 512
Metadata:
  NC_GLOBAL#description=CROCO output from algoa Bay model transformed lon/lat/depth/time
Subdatasets:
  SUBDATASET_1_NAME=NETCDF:"/home/zach/somisana-data/hourly-avg-20220705-normalized.nc":temperature
  SUBDATASET_1_DESC=[240x20x106x152] temperature (32-bit floating-point)
  SUBDATASET_2_NAME=NETCDF:"/home/zach/somisana-data/hourly-avg-20220705-normalized.nc":salt
  SUBDATASET_2_DESC=[240x20x106x152] salt (32-bit floating-point)
  SUBDATASET_3_NAME=NETCDF:"/home/zach/somisana-data/hourly-avg-20220705-normalized.nc":u
  SUBDATASET_3_DESC=[240x20x106x152] u (64-bit floating-point)
  SUBDATASET_4_NAME=NETCDF:"/home/zach/somisana-data/hourly-avg-20220705-normalized.nc":v
  SUBDATASET_4_DESC=[240x20x106x152] v (64-bit floating-point)
  SUBDATASET_5_NAME=NETCDF:"/home/zach/somisana-data/hourly-avg-20220705-normalized.nc":m_rho
  SUBDATASET_5_DESC=[240x20x106x152] m_rho (64-bit floating-point)
  SUBDATASET_6_NAME=NETCDF:"/home/zach/somisana-data/hourly-avg-20220705-normalized.nc":lon_rho
  SUBDATASET_6_DESC=[106x152] lon_rho (32-bit floating-point)
  SUBDATASET_7_NAME=NETCDF:"/home/zach/somisana-data/hourly-avg-20220705-normalized.nc":lat_rho
  SUBDATASET_7_DESC=[106x152] lat_rho (32-bit floating-point)
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,  512.0)
Upper Right (  512.0,    0.0)
Lower Right (  512.0,  512.0)
Center      (  256.0,  256.0)

@zachsa
Copy link
Collaborator Author

zachsa commented Jul 6, 2022

At a guess, it looks like the srid may not be correct (at least I can't see what it is).

Thanks @Marc-Pienaar - https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html

@Marc-Pienaar
Copy link

Marc-Pienaar commented Jul 6, 2022 via email

@mattcarr03
Copy link
Collaborator

Hi Zach,

I have just been having I look now, and I think the problem is that the model output in on a curvilinear grid, therefore the lon and lat coordinates are 2D rather than 1D. Therefore, the dimensions are 1D grid points 1 to 152 and the 2D coordinates are saved as variables, this sounds like it wouldn't work looking at @Marc-Pienaar comment. Sorry, I wasn't sure if gdal could handle this.

We can re-grid the data to a regular grid using python, in the post-processing step. That way we can have 1D lon and lat saved as dimensions. Which might help with the error, I imagine working with a regular grid makes things a lot easier.

I will push the changes shortly

Matt

@zachsa
Copy link
Collaborator Author

zachsa commented Jul 7, 2022

Hi @mattcarr03 - thanks.

What do you mean by a 2D vs 1D grid? I'm not sure I follow - I see longitude / latitude roughly as 'x' an 'y' coordinates on a Cartesian plane (i.e. a 4326 project). My understanding is that any map image that I see that is in the 4326 SRID has to be projected to a flat surface, but that this is done behind the scenes and not something I have to think about as far as I know.

If I'm following correctly... I understand that the CF conventions require separate dimensions for lng/lat, whereas the current lng/lat is a single 2D dimension?

I am still fumbling my way through the GDAL tool (actually, I'm using the raster2pgsql tool that uses GDAL under the hood). This tool registers rasters in a postgres table and then allows me to run GDAL commands on the raster file via PostgreSQL's SQL interface. I'm getting extremely odd information back from the GDAL commands - such as GDAL telling me there are 4 800 bands in the processed data (and 800 bands in the model output). Obviously this doesn't make much sense... (and I guess could be related to a single dimension with 2D data if I understand the above correctly). I haven't tested your changes yet - will do so soon, but I'd still like to understand.

I was imagining a single 'band' as one possible N D raster - i.e. I thought that a single variable in the NetCDF file would correspond to a single band in a raster file. Is this correct/not correct? (@Marc-Pienaar - do you have any insight here? from what I have read the word 'band' doesn't seem to have a fixed meaning across different types of rasters).

@zachsa
Copy link
Collaborator Author

zachsa commented Jul 7, 2022

... Yes @mattcarr03. It's quite challenging to provision an environment where xesmf will be available at deployment time. So far I can see that:

xesmf => depends on esmpy => depends on esmf with special requirements for a netcdf install

@zachsa
Copy link
Collaborator Author

zachsa commented Jul 7, 2022

@zachsa
Copy link
Collaborator Author

zachsa commented Jul 8, 2022

Actually, the large number of bands does make sense (i.e. when I load the raster into PostGIS via raster2pgsql I see 4800 bands)

For the temperature variable, there are:

  • 240 times steps
  • For each time step there are 20 depths

That means there 4 800 'slices'. PostGIS (i.e. GDAL) is seeing 4 800 x/y grids. Each grid is 152 x 106 pixels, and each pixel is a temperature value. So I am actually seeing the data in PostGIS.

@mattcarr03, I did look at the tool for regridding the data in the python step. Since it's difficult to install I'm still trying other things.

... it never actually occurred to me to ask why the image that you showed me didn't have straight edges. But @Marc-Pienaar showed me an illustration of GPS coordinates of curvilinear grid... and it does look a lot like the output of temperature at one depth that you previously showed me.

If we regrid, wouldn't that image then be changed?

figure

@zachsa
Copy link
Collaborator Author

zachsa commented Jul 8, 2022

@mattcarr03. also, could you send me an example of regridded file?

zachsa added a commit that referenced this issue Jul 8, 2022
…d created a new file that contains that work as a reference (#13)
@mattcarr03
Copy link
Collaborator

Hi @zachsa, sorry for the delayed replies, I'm sending one over email shortly and will respond to the other comments.

@mattcarr03
Copy link
Collaborator

Sorry if the previous comments weren't clear.

I was trying to illustrated that the data is not set on a regular/square grid. Instead the data is rotated and curved. Therefore, the data points each have a unique lon and lat points, as opposed to sharing longitude points across a vertical slice (or latitude points across a horizontal slice).

This means that the lon and lat co-ordinates need to be represented on a 2D matrix, rather than a 1D array. However, the netcdf convection requires the dimensions to be a single 1D array. Therefore, the output netcdf uses the number of grid points as the x and y instead of real world coordinates. The coordinates are saved as "coordinates" which I understand to be an extra class similar to variables.

By regridding to a regular grid (constant longitude for a vertical slice/constant latitude for a horizontal slice) the dimensions of the netcdf can be a 1D array of real world coordinates. This imagine is much easier to convert into a raster.

I checked the regridding and the image in not changed, there is lots of empty data/whitespace around the image to maintain the curve and rotation.

Additionally, I have managed to regrid using the scipy package (https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp2d.html), however this requires looping through time and depth so it is not nearly as eloquent as the xesmf solution.

@zachsa
Copy link
Collaborator Author

zachsa commented Jul 8, 2022

image

@zachsa zachsa changed the title [Algoa Bay Forecast] Processed model output should adhere to relevant NetCDF conventions [Algoa Bay Forecast] NetCDF structure, conventions, etc. Jul 11, 2022
zachsa added a commit that referenced this issue Jul 11, 2022
… misconfigured postgis (since this script doesn't use postgis). Updated the loading function to also insert coordinate datasets from netcdf into postgis (#12, #13)
zachsa added a commit that referenced this issue Jul 13, 2022
…teps, since its hours since x date. And updated the scheduled algoa bay forecast run to be at 6AM SAST to avoid any potential problems with UTC 00:00 time zones. Also, previously work included setting the boundary data job to restart on failure (#12 #13)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants