Move OSGB transformation hack out of nwp-consumer and into ocf-datapipes #26
@dfulu is working on this in datapipes, I think.
It may already be fixed without me intending to do so. But what would the coordinates look like without #27?
Does the PVNet version right now use lat/lon or OSGB?
The PVNet model doesn't use either: it just uses the NWP image. But the pvnet datapipe inside ocf_datapipes is currently using OSGB. I think there are some issues to sort out before we could drop the OSGB coordinates. At least we need to change the data loading function: https://github.com/openclimatefix/ocf_datapipes/blob/main/ocf_datapipes/load/nwp/providers/ukv.py But I was also looking at the current production NWP data.
So the latitude and longitude are each 2-dimensional? When I select
Which means the latitude varies a little bit depending on x-OSGB. We currently don't have support in ocf_datapipes for two-dimensional coordinates. We always assume the coordinates are independent and form a regular grid. So it would actually be a decent chunk of work to switch to using lat-lons.
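To make the difference concrete, here's a minimal xarray illustration (sizes and values made up):

```python
import numpy as np
import xarray as xr

# A regular grid: latitude depends only on y, longitude only on x.
regular = xr.Dataset(
    coords={
        "latitude": ("y", np.linspace(49.0, 61.0, 100)),
        "longitude": ("x", np.linspace(-12.0, 5.0, 80)),
    }
)

# The production UKV data: each coordinate is a full 2D field, so
# latitude varies slightly along x as well as y (and vice versa).
ny, nx = 100, 80
lat2d = np.linspace(49.0, 61.0, ny)[:, None] + 0.01 * np.random.rand(ny, nx)
lon2d = np.linspace(-12.0, 5.0, nx)[None, :] + 0.01 * np.random.rand(ny, nx)
curvilinear = xr.Dataset(
    coords={
        "latitude": (("y", "x"), lat2d),
        "longitude": (("y", "x"), lon2d),
    }
)

print(regular["latitude"].dims)      # ('y',)      -> independent 1D coords
print(curvilinear["latitude"].dims)  # ('y', 'x')  -> 2D curvilinear coords
```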
Although it looks like the OSGB coordinates we calculate might also each be two-dimensional, and we just ignore that?
Yeah, the lat-lons are a 2-dimensional grid. So the rough order of work would be
Or perhaps just 1. Move the OSGB calculation into
I made a plot of the OSGB coordinates we calculate in client.py#L281-L290 vs what they would be using 2D OSGB. In the figure below, the red points are our current grid points, and the blue points are the points using 2D OSGB. There are some lines showing the offset between the two. This means that in production we are accidentally offsetting our windows by 3-4 pixels. This is likely something we want to fix. NB: this figure needs a download and zoom-in to see the grid points and offsets properly.
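For context, the values in that client.py snippet come from a point-wise lat/lon to OSGB transform; with pyproj that looks roughly like this (a sketch, not the exact production code):

```python
import numpy as np
from pyproj import Transformer

# WGS84 lat/lon (EPSG:4326) -> OSGB36 / British National Grid (EPSG:27700).
transformer = Transformer.from_crs("EPSG:4326", "EPSG:27700", always_xy=True)

lons = np.array([-5.0, -2.5, 1.5])
lats = np.array([50.0, 54.9, 52.5])
x_osgb, y_osgb = transformer.transform(lons, lats)

# Applied to the full 2D lat/lon fields this yields 2D x/y fields, which is
# why collapsing them to 1D coordinates shifts the windows by a few pixels.
```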
Nice, thanks for doing this. How do we go about fixing it?
@jacobbieker had suggested we might regrid to the same grid used in our training data. That might be the easiest. It should also work if we can get datapipes working with 2D spatial grids and select directly from lat-lon, but our input images might be slightly distorted compared to the training data. So I don't know which would be best.
Can you do this re-gridding in
We'd have to write it in; it won't have any regridding functionality yet. However, could it not make sense to do it within the consumer? It's likely that every model that uses the UKV data in production will have been trained on the UKV CEDA archive data, and so trained using that grid. Otherwise each model would need to perform this regridding independently. I can't really think of any uses for the UKV data on the grid it comes in on.
Sorry, maybe I'm confused: what is the regridding from and to?
Ahh right, so @jacobbieker was suggesting, and I think I would agree, that we should regrid the data to the same grid as the training data. Currently, the production data is on a different grid than the training data. It's not just how we access the coordinates of the data; the data is centred on different spatial points. This also means that we have a slight spatial distortion between the training and production data. In the figure below, the blue points are the actual (2D) grid of the production data. In black is the grid of the training data. I'm not sure how much the distortion will affect our results now (probably quite little), but if we moved to bigger image patch sizes it could.
And what grid is the training data on? What grid is the live data on? Are they both lat-lons? Perhaps the grid doesn't have a name and it's just how the data comes in. I like the idea that the nwp-consumer is pretty clean and doesn't do that kind of stuff, or perhaps there's an option to turn this regridding on in the nwp-consumer. Would we have to save the grid of the training data in a file, load this in the nwp-consumer, and then re-grid?
So far as I'm aware, the live (Met Office) data arrives in lat/lon, and it is regridded to OSGB x/y, as the training (CEDA) data is in OSGB.
The training data is on a regularly-spaced OSGB coordinate grid, whereas the production data isn't on a regularly spaced OSGB or lat-lon grid. I suspect the production data is actually on some grid of its own.
Yeah, that's fair enough. I guess another option is just to have some utility function which could be used in any current and future model apps.

I'm not sure it makes sense to do the regridding in datapipes. For regridding, we need to actually load the underlying data, not just the coordinates, in order to do the interpolation to the new grid. To find a value on the new grid via interpolation, other surrounding spatial points must be loaded. I think this means that if we select an NWP window in datapipes, we will need to load the chunk containing the window of interest as well as all surrounding chunks. So if we try to do the regridding lazily, it's going to be a lot of repeated computation to do the regridding 317 times (once for each GSP), and it would also add to the RAM footprint of each worker. And if we did it eagerly, regridding the whole dataset in RAM before slicing windows means each worker will also have a copy of the NWP in RAM. One copy of the dataset (data-national) is currently 240MB in RAM.

So I think it makes more sense to do the regridding once and save it to disk, then pull from that. We probably could put that regridding, disk caching, and reloading into datapipes, but maybe that's messy?
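To make the "regrid once, then slice" idea concrete, a minimal sketch (function names, paths, and the upstream regridding itself are placeholders):

```python
import xarray as xr

def build_regridded_cache(regridded: xr.Dataset, cache_path: str) -> None:
    """Run once, e.g. after each consumer download: persist the eagerly
    regridded dataset so workers never regrid or hold it fully in RAM."""
    regridded.to_zarr(cache_path, mode="w")

def load_gsp_window(cache_path: str, x_centre: float, y_centre: float,
                    half_width_m: float) -> xr.Dataset:
    """Run per GSP window: open the cache lazily, so slicing only reads
    the zarr chunks that overlap the requested window."""
    ds = xr.open_zarr(cache_path)
    return ds.sel(
        x=slice(x_centre - half_width_m, x_centre + half_width_m),
        y=slice(y_centre - half_width_m, y_centre + half_width_m),
    )
```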
Yeah, I think so.
I think we're talking about different things when we say "regridding". I might be wrong, but I thought what we do currently is a translation of the coordinates from lat-lon to OSGB, so the underlying data values stay the same. "Regridding" the way I'm talking about it is an interpolation to different spatial points. Like in the figure above, we'd interpolate all the blue data points (the production grid) to the locations of the black ones (the training grid).
Yeah, I would agree with @dfulu on regridding the data once and resaving it to disk, or having it in the consumer, since one of the ideas of the consumer is that it keeps the prod and training data the same. According to this, the operational UKV is not on a regular grid at all: there is an inner 1.5km grid that then slowly becomes less dense until it is a 4km grid at the edges, so that might be the cause of the mismatch.
Ah ok, I think I agree. So perhaps the task is for the nwp-consumer to regrid the data to a regular OSGB grid; then it'll be the same as the training data.
Indeed, and with this in mind, perhaps it makes sense to interpolate in the consumer. As long as we're happy with the zarrs it produces not being the data exactly as it was from that source! However, if we are interpolating, is OSGB the best thing to choose? What with us now consuming NWP data from India, Malta etc., perhaps we should set the standard now to lat/long for forward compatibility, and do a batch regrid of the training data?
I think in general we should try to use lat/lon grids, as this is what other people do. The Met Office is probably a special case, where the training data is on an OSGB grid, and therefore I think the nwp-consumer should also make it like this. Perhaps we can have an option to turn this on/off for the Met Office data. That means we could leave it on a lat/lon grid if we decided to use that in the future.
I think the training data must have been regridded at some point, so I think that's fine. Also, like Jacob said, for model performance the biggest concern should be that the training and production data are the same. So I'd go towards making our production data look as close as possible to the training data, rather than as close as possible to the source data.
Do we save the Met Office data as we go into a rolling archive? If we're downloading it anyway for the production systems, we should push it to leonardo too; that way, going forward, we'd have an archive of data exactly as it was in live, also in lat/long. Then we could do a batch interpolation of the old CEDA training data to match, and then we'd be consistent going forward. I appreciate this would maybe mean retraining (although in live the data is lat/long anyway, so the model must be able to cope). It seems to me that hinging our systems around legacy training data, which is incompatible with both the norm (as per Peter's comment) and our ability to more easily expand into other parts of the globe, would artificially hold us back.
Having a rolling archive would be great for tracking down any blips in the forecast!
Yeah, that could be a good task for Dagster to go and do.
Could we (just discussing this with @devsjc)
I think we need to do the regridding step somewhere, not just transform the coords. Here's how we might do the regridding properly:
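A sketch of the kind of thing meant here, interpolating each field from the curvilinear production grid onto the regular OSGB training grid with scipy's griddata (argument and coordinate names are illustrative):

```python
import numpy as np
import xarray as xr
from scipy.interpolate import griddata

def regrid_to_training_grid(
    field: xr.DataArray,      # one 2D (y, x) slice of the production data
    x_osgb_2d: np.ndarray,    # 2D OSGB eastings of the production grid
    y_osgb_2d: np.ndarray,    # 2D OSGB northings of the production grid
    x_train: np.ndarray,      # 1D regular eastings of the training grid
    y_train: np.ndarray,      # 1D regular northings of the training grid
) -> xr.DataArray:
    # Flatten the curvilinear source points...
    src_points = np.column_stack([x_osgb_2d.ravel(), y_osgb_2d.ravel()])
    # ...and linearly interpolate the field values onto the regular target grid.
    xx, yy = np.meshgrid(x_train, y_train)
    values = griddata(src_points, field.values.ravel(), (xx, yy), method="linear")
    return xr.DataArray(values, coords={"y": y_train, "x": x_train}, dims=("y", "x"))
```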
It takes about 10s on my GCP instance, but it's got a lot of cores.
In the image above the white border is missing data, i.e. in production we have a smaller spatial area than we do in training. I didn't know this was the case, actually, and it is something we'll need to think about if we want to use bigger NWP window sizes, especially considering the Cornwall and East Anglia GSPs.
That's good to check. Perhaps you can run training with these bigger areas first, see the effect, and then we can think about what to do about it. It could be a case of using ECMWF a bit earlier, as we won't be restricted then.
This shows all the patches we are currently using. So thankfully, at the minute, we should be far enough from the boundaries that this doesn't matter. This is with a window size of 24 pixels. If we were to double to 48 (semi-planned at some point), I think we'll be at the cusp of needing to deal with this properly.
Should we move this to a separate GitHub issue about making the NWP windows bigger?
I think @devsjc and I have found the grid that the production data comes on. It is using a Lambert Azimuthal Equal Area projection, which I presume has been centred on the UK. Here's the link to the Met Office doc. So I think that means we will still want to reproject the production NWP data onto another grid to match the training data.
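For anyone who wants to reproduce this, the projection can be expressed with pyproj, though the centre parameters below are my reading of the doc and should be checked against it:

```python
from pyproj import CRS, Transformer

# Lambert Azimuthal Equal Area centred on the UK; lat_0/lon_0 here are
# assumed from the Met Office standard-grid documentation linked above.
laea_uk = CRS.from_proj4(
    "+proj=laea +lat_0=54.9 +lon_0=-2.5 +x_0=0 +y_0=0 +ellps=WGS84 +units=m"
)

# Reprojecting the LAEA grid points onto OSGB shows how far the production
# grid sits from the regular OSGB training grid.
to_osgb = Transformer.from_crs(laea_uk, "EPSG:27700", always_xy=True)
x_osgb, y_osgb = to_osgb.transform(0.0, 0.0)  # the projection centre in OSGB metres
```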
@dfulu do you mind adding the code that you used on GCP? The other option is that the NWP-consumer saves the data how it is, and then PVNet does the regridding. There's something quite nice about the NWP-consumer saving the data how it comes, and then if PVNet needs it in a certain way, it redoes it.
So I think we found that the grid from the live consumer isn't quite the same as the CEDA grid. The picture above from @dfulu illustrates that well. It's a few pixels different. This means the ML model will see a slightly different image (note PVNet takes an image in, not a point). So I think the options are
I think I'm tempted to put this in 2.
I think my preference would be 1 or 4.
Thanks @dfulu for this. Maybe we should try it in 4 and see how it works; otherwise we could move it to 1.
In order to quickly deploy the consumer and have it work with the soon-to-be-retired PVNet1, which expects the "x" and "y" coordinates to be (poorly labelled) OSGB coordinates, a "quick fix" was put in place in the consumer to approximate those values. (See
inputs/metoffice/client.py
) This is not an optimal end position, however, as OCF are moving away from OSGB and towards lat/long anyway, not to mention the ambiguity that bare "x" and "y" coordinates add to the dataset. Instead, this logic to convert from lat/long to OSGB should be implemented in ocf-datapipes; then the version of that library being used in PVNet1/2 can be bumped accordingly.
As such, this issue mostly requires code changes in ocf-datapipes and the PVNet1/2 repos. The net change in this repo should be exactly the inverse of what was added with the implementation of the "quick fix" in #27, i.e. removing everything that was added there and nothing else.
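For illustration, the datapipes-side replacement could look something like this (the helper and coordinate names are hypothetical, not existing ocf-datapipes API):

```python
import xarray as xr
from pyproj import Transformer

# WGS84 lat/lon -> OSGB36 / British National Grid.
_to_osgb = Transformer.from_crs("EPSG:4326", "EPSG:27700", always_xy=True)

def assign_osgb_coords(ds: xr.Dataset) -> xr.Dataset:
    """Derive proper (2D) OSGB coordinates from the dataset's own
    latitude/longitude, instead of trusting the consumer's approximate
    1D "x"/"y" values."""
    x_osgb, y_osgb = _to_osgb.transform(ds["longitude"].values, ds["latitude"].values)
    return ds.assign_coords(
        x_osgb=(ds["longitude"].dims, x_osgb),
        y_osgb=(ds["latitude"].dims, y_osgb),
    )
```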