Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix shape masking #6129

Draft
wants to merge 23 commits into
base: main
Choose a base branch
from
Draft

Conversation

hsteptoe
Copy link
Contributor

🚀 Pull Request

Description

Fixes #6126 by exposing the shape geometry to the user. Adds to docs and docstrings to make it clearer how to use this.


Consult Iris pull request check list


Add any of the below labels to trigger actions on this PR:

  • benchmark_this Request that this pull request be benchmarked to check if it introduces performance shifts

@pp-mo
Copy link
Member

pp-mo commented Aug 28, 2024

From @SciTools/peloton : discussed what we are seeing here and in #6126.
We think this is looking useful, but the two of you seem to have it covered @hsteptoe @acchamber ?
Please shout if you need help.

@hsteptoe
Copy link
Contributor Author

Discussion with @acchamber notes:

  • Don't move imports out of functions as this creates circular import errors
  • Check target_crs assignment for occasion when cube doesn't have a CRS
  • Re-write conversion of 0-360 -> -180-180 coords in case of degree based CRSs

@hsteptoe
Copy link
Contributor Author

What are @SciTools/peloton (@pp-mo) thoughts about adding rasterio as a Iris dependency?

This would facilitate better handling of shapes to rasters (which is essentially the problem we're solving) with the added bonus that we could also mask to other shape types, like lines, to extract a trajectory from a Cube, for example.

@acchamber
Copy link
Contributor

I've had a few other discussions with other users now where they can't use this function with OSGB shapefiles which this PR would fix. Where are we on being able to include it in the next release?

@hsteptoe @pp-mo

@hsteptoe
Copy link
Contributor Author

hsteptoe commented Dec 2, 2024

I've had a few other discussions with other users now where they can't use this function with OSGB shapefiles which this PR would fix. Where are we on being able to include it in the next release?

@hsteptoe @pp-mo

I was planning on going to the AVD surgery this week to discuss some changes and/or make the case for rasterio

@trexfeathers
Copy link
Contributor

Thanks for your patience everyone. It's sometimes a struggle to balance everything and we just haven't had an opportunity to discuss this. Coming to the Surgery (UK Met Office) is an ideal next step.

@stephenworsley
Copy link
Contributor

@pp-mo, @trexfeathers and @stephenworsley have discussed this at the Surgery and agree in principle that we could consider adding rasterio as an optional dependency.

@trexfeathers
Copy link
Contributor

From @SciTools/peloton: would @hsteptoe and @acchamber appreciate any more input from core devs at this point?

@hsteptoe
Copy link
Contributor Author

hsteptoe commented Feb 5, 2025

From @SciTools/peloton: would @hsteptoe and @acchamber appreciate any more input from core devs at this point?

I just need to find more time, which is in short supply as we get near the end of FY...

@hsteptoe
Copy link
Contributor Author

hsteptoe commented Feb 14, 2025

This is now ready for some initial (alpha?) testing 🎉. @acchamber are you available to try out the new features? (Updates to docs etc. will follow if this is successful) Tests are incomplete are should not be review yet.

This is a fairly substantial re-write. New/improved features include:

  • Automatic CRS transforms. Will handle situation where the cube and geometry have different CRSs by transforming the geometry.
  • Handles all shape types, including Points, Lines and MultiPolygons.

There are some breaking changes, principally moving away from having a minimum_weight keyword, replacing it with an all_touched keyword. The rational here is for consistency across shape types. Minimum weight doesn't make sense for Points and Lines. Instead, Line pixilation is based on Bresenham's line algorithm. Polygon pixilation is based on either: (a) pixel centres that intersect the polygon, or (b) all pixels touched by the polygon irrespective of the area of intersection.

My informal tests so far show working behaviour for iris test data from rotated_pole.nc, toa_brightness_stereographic.nc, E1_north_america.nc, air_temp.pp. Know limitation (not-working) for semi-structured model grids, such as orca2_votemper.nc. Some test shapefiles 👉test_shapefiles.zip

@bjlittle bjlittle added this to the v3.13 milestone Feb 17, 2025
@acchamber
Copy link
Contributor

acchamber commented Feb 17, 2025

I'll see if I can find the time to play around with this alpha in the next two weeks, to give some feedback, however I am concerned by the breaking changes here.

Being unable to mask the anti-meridian or the poles feels like a major flaw - As you've pointed out fixing it requires some guess of intention by the program of the user's desires, but a user being unable to mask shapes such as - Russia , The Pacific Ocean, The Arctic circle or Antarctica is a deteriment to many science workflows, especially when some of these do work in the current function.

I am also concerned about losing minimum_weight - while you are right it doesn't make sense for Points and Lines on the face of it, reducing functionality down from a user-adjustable spectrum to a binary choice does not fit the variety of scales shapefiles can have compared to model resolution - there's enough use cases that want an option between the two extremes of "touch the centre only or touch anywhere in the box". Especially if centred-touching is the default - it makes it quite possible for a user to mask a cube and receive a fully masked cube back for smaller shapefiles and not have a obvious answer as to why. I can see mimimum_weight becoming a optional keyword but could not support removing it completely.

@hsteptoe
Copy link
Contributor Author

I'll see if I can find the time to play around with this alpha in the next two weeks, to give some feedback, however I am concerned by the breaking changes here.

Being unable to mask the anti-meridian or the poles feels like a major flaw - As you've pointed out fixing it requires some guess of intention by the program of the user's desires, but a user being unable to mask shapes such as - Russia , The Pacific Ocean, The Arctic circle or Antarctica is a deteriment to many science workflows, especially when some of these do work in the current function.

I am also concerned about losing minimum_weight - while you are right it doesn't make sense for Points and Lines on the face of it, reducing functionality down from a user-adjustable spectrum to a binary choice does not fit the variety of scales shapefiles can have compared to model resolution - there's enough use cases that want an option between the two extremes of "touch the centre only or touch anywhere in the box". Especially if centred-touching is the default - it makes it quite possible for a user to mask a cube and receive a fully masked cube back for smaller shapefiles and not have a obvious answer as to why. I can see mimimum_weight becoming a optional keyword but could not support removing it completely.

Re. meridian crossing, I'll add some more shapefile test cases for the Arctic, Antarctic and Pacific Ocean. I think that so long as these are MultiPolygons divided across the meridian it will work fine. You say 'some' of these work in the current function. Can you be more specific about which use cases do and don't work? This isn't really a limitation with the masking function, but more a limitation of the way shapefiles are represented and the connection a shape has with its CRS. There is nothing by default to specify that a shapefile that straddles the 180th meridian should connect across the meridian because it exists in a cartesian coordinate frame which by default doesn't have a cyclic boundary condition. So really it's up to the user (not Iris) to make sure that the shapefile they use (or construct) correctly specifies how the meridian crossing should be handled.

Re. minimum_weight, in principal we could look at adding this back in, but I still question how much this is actually used. The all_touched implementation is consistent with rasterising approaches in other GIS applications (and consistent with the underlying rasterio API), but I agree that making all_touched default to True is a good call and would replicate the current default behaviour. My guess is that users either go with the default (minimum_weight=0) or minimum_weight=0.5... is anyone really ever choosing minimum_weight=0.3 or minimum_weight=0.8? It would be good to understand what these use cases might be if they are.

@acchamber
Copy link
Contributor

, I'll add some more shapefile test cases for the Arctic, Antarctic and Pacific Ocean. I think that so long as these are MultiPolygons divided across the meridian it will work fine. You say 'some' of these work in the current function. Can you be more specific about which use cases do and don't work? This isn't really a limitation with the masking function, but more a limitation of the way shapefiles are represented and the connection a shape has with its CRS. There is nothing by default to specify that a shapefile that straddles the 180th meridian should connect across the meridian because it exists in a cartesian coordinate frame which by default doesn't have a cyclic boundary condition. So really it's up to the user (not Iris) to make sure that the shapefile they use (or construct) correctly specifies how the meridian crossing should be handled.

I know that Russia currently works in the current impletmentation with the "standard" Natural Earth shapefile. I have not actually checked for the Arctic/Antarctic, but I know there's active science that would need them to work, although I accept your point it's as much as about the shapefile construction as our impletmentation.

Re. minimum_weight, in principal we could look at adding this back in, but I still question how much this is actually used. The all_touched implementation is consistent with rasterising approaches in other GIS applications (and consistent with the underlying rasterio API), but I agree that making all_touched default to True is a good call and would replicate the current default behaviour. My guess is that users either go with the default (minimum_weight=0) or minimum_weight=0.5... is anyone really ever choosing minimum_weight=0.3 or minimum_weight=0.8? It would be good to understand what these use cases might be if they are.

I know I certainly am - it matters mostly when your grid resolution and your shapefile are similar magnitiudes - a good example being country-level masking on 1 degree or courser grids. You often get situations where a country is centred between grid boxes - being present in 20% of one, 30% of about, and 2% of a third as an example. Counting all the boxes equally is bad, but the shape doesn't nessacerily reach the centre of the boxes it is significantly in. Manual experimentation is required to find a suitable weight, doesn't mean it's not useful. You can also think of land-sea maskes and river watersheds for long-but-thin shapes. There's lots of different shapes you may use and data to apply them to, and assuming that all cases are covered by "all-touched" and centred-touching is reductive.

@hsteptoe
Copy link
Contributor Author

I know I certainly am - it matters mostly when your grid resolution and your shapefile are similar magnitiudes - a good example being country-level masking on 1 degree or courser grids. You often get situations where a country is centred between grid boxes - being present in 20% of one, 30% of about, and 2% of a third as an example. Counting all the boxes equally is bad, but the shape doesn't nessacerily reach the centre of the boxes it is significantly in. Manual experimentation is required to find a suitable weight, doesn't mean it's not useful. You can also think of land-sea maskes and river watersheds for long-but-thin shapes. There's lots of different shapes you may use and data to apply them to, and assuming that all cases are covered by "all-touched" and centred-touching is reductive.

It's not reductive if a users' choice of % isn't justifiable in the first place. Manual experimentation to find what 'looks right' isn't a robust or justifiable process when, by definition, going from vector -> raster involves a loss of fidelity. This requires a clear rule-based approach to this transform. A %-based approach can lead to voids in a raster. For a complex set of vectors that roughly sub-divide a grid cell equally (say, 30%-35%-35%), if a %-based threshold is poorly specified (say >= 40%) then the grid cell will fail to be selected and assigned to any of those rasterised vectors. A clear rule system (eg. centre-overlapping) will avoid this. This precedent exists across many rasterization implementations including things like OpenGL or ArcGIS. A %-based approach does not. I can see an argument for a rule based on >= 50% overlap, but only as a fixed choice rather than something to be tweaked by the user.

Also consider the Arakawa grid system used by numerical models... there is an understanding that a continuous physical property discretised onto a grid has to be assigned to a fixed point (on the centre or edge), even though we interpret this quantity as an integral over the grid box, ie. an area average. But again, this is based on a clear rule system: centres or edges.

Your concern with rasterising vectors onto coarse grid is simply an aliasing issue. A simple approach to anti-aliasing could be to weight the underlying raster by the proportion of the vector that overlaps it, but the weight needs to be carried through and applied to the raster at the point of masking, which doesn't happen in the current implementation, which is reductive in as much as it is limited to a True or False bool. An alternative approach that would make use of capability that already exists in Iris would be to regrid your underlying raster using a 1st-order conservative approach to a higher resolution that gives you the detail you need and then mask it using your vector (which I think is equivalent to the weighted anti-aliasing approach).

@trexfeathers
Copy link
Contributor

@hsteptoe was kind enough to bring this to a 'surgery' day at UK Met Office. Summary of my position, as the developer responsible for Iris:

  • We love this detailed discussion in a publicly visible medium ❤
  • We love the ownership and communication regarding upcoming releases, again all in public ❤
    🦋 Iris 3.13.x #6307 (comment)
  • minimum_weight:
    • Can't be removed right now.
      We know users are already relying on this. Removing such a feature would only be valid in a major release, or behind a FUTURE flag.
    • Sceptical about removing in the future, too.
      mask_cube_from_shapefile() aimed to provide the same functionality as the Met Office internal ASCEND package, allowing users to switch their scripts over. minimum_weight is part of this.
    • I'm OK with 'sidelining' minimum_weight, freezing the functionality as it is now.
      It sounds like calling down to Rasterio, together with its alternative approach to intersections (any/centre/all covered) offers improvements that are incompatible with the minimum_weight approach. I can therefore picture a check: if minimum_weight is passed then call the 'legacy' code, otherwise call the proposed new code. This could also include new parameters for mask_cube_from_shapefile() to better support the new functionality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Status: 🆕 Candidate
Development

Successfully merging this pull request may close these issues.

_transform_coord_system geometry is always assume to be None
6 participants