Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shapefile masking #5470

Merged
merged 58 commits into from
Feb 13, 2024
Merged

Shapefile masking #5470

merged 58 commits into from
Feb 13, 2024

Conversation

acchamber
Copy link
Contributor

@acchamber acchamber commented Sep 5, 2023

🚀 Pull Request

Description

Adds the apply_shapefile function to iris.util that allows a user to mask a cube with a shapefile.

Thing to work on

  • Most of the code is in a seperate _shapefiles.py file - should this just be brought into utils?
  • There's a few auxillary functions there - make public in utils themselves?
  • The tests - four basic ones using some iris-test-data but not fully testing all private functions. Needs more coverage?
  • Need to figure out how to document this - a numpy docstring is in progress but also needs something added to the docs somewhere.
  • Current implementation scales quadratically - rewrite to scale linearly or better with cube size.

Consult Iris pull request check list

@trexfeathers
Copy link
Contributor

trexfeathers commented Oct 2, 2023

  • Most of the code is in a seperate _shapefiles.py file - should this just be brought into utils?
  • There's a few auxillary functions there - make public in utils themselves?

I think what you have is good. We tend to take the lead from other Scientific Python packages, and the preference there is to not have too many convenience functions when they can be achieved with a few user lines (accepting that some places in Iris have broken this rule in the past!)

  • The tests - four basic ones using some iris-test-data but not fully testing all private functions. Needs more coverage?
  • The current 'known good output' tests are best suited to the tests/integration/ directory.
  • More unit tests:
    • All warnings or exceptions in both create_shapefile_mask() and apply_shapefile()
    • Some artificial small data - where a developer could understand which cells they would expect to be masked
  • Try to make sure that every line in create_shapefile_mask() and apply_shapefile() is exercised by at least 1 test
  • Need to figure out how to document this - a numpy docstring is in progress but also needs something added to the docs somewhere.

A Masking heading within Subsetting a Cube seems appropriate. For this work we will need to start with a placeholder where we can add further detail on masking in general IN FUTURE. Then you can add the shape-specific details as a sub-heading.

  • Current implementation scales quadratically - rewrite to scale linearly or better with cube size.

This is probably 'good enough' as is - especially looking at cost vs benefit. Might be worth looking at Dask's map_blocks() for some simple parallelisation of the operation. (see #5470 (comment))

@trexfeathers trexfeathers self-assigned this Nov 2, 2023
Copy link
Contributor

@trexfeathers trexfeathers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for taking the plunge on this @acchamber 💐. I have lots of reading material for you below!

lib/iris/util.py Outdated Show resolved Hide resolved
lib/iris/util.py Outdated Show resolved Hide resolved
lib/iris/util.py Outdated Show resolved Hide resolved
lib/iris/util.py Outdated Show resolved Hide resolved
lib/iris/util.py Outdated Show resolved Hide resolved
lib/iris/_shapefiles.py Outdated Show resolved Hide resolved
lib/iris/_shapefiles.py Outdated Show resolved Hide resolved
lib/iris/_shapefiles.py Outdated Show resolved Hide resolved
lib/iris/_shapefiles.py Outdated Show resolved Hide resolved
lib/iris/_shapefiles.py Outdated Show resolved Hide resolved
@trexfeathers
Copy link
Contributor

This is being planned as part of the 🐙Iris v3.8.0 project. The next block of time for this work ends on 2023-11-24. There is another time block in early 2024.

I'm buddying on quite a few outstanding project items that are due by 2023-11-24 (this search was representative at time of writing), so balancing reviews could get quite challenging in the last week. If we haven't started ticking off some of the above comments by COP 2023-11-20 then it is unlikely this will be merged by 2023-11-24 and we will need to consider options:

  • Checking @acchamber's availability after 2023-11-24
  • Finding alternative SciTools developers to finish the review
  • Finding additional developers to address some of the review comments
  • Finding SciTools developers to take over after 2023-11-24 if @acchamber has other priorities

@acchamber
Copy link
Contributor Author

Member

Thanks for the update on the timeframe. I've made a start on some comments, and have some focus time today to work on others. How do you want the ones that have been addressed marked - I remember you prefer for the reviewer to mark resolved, should I just comment on each indervidually?

After this week none of my other work has urgent timeframes so can push on this.

@trexfeathers
Copy link
Contributor

I remember you prefer for the reviewer to mark resolved

Yep, thanks 👍

should I just comment on each indervidually?

If you've just addressed something via a commit, there is no need to say anything, I'm happy to check through. But if it helps your own method to comment on each then feel free. And if there is any discussion to be had then definitely comment within the relevant thread.

If you want to post multiple comments at once: you can do this by reviewing your own pull request, which leaves each of your comments as Pending until you post the review.

@hsteptoe
Copy link
Contributor

@acchamber @trexfeathers ⌚ing this with interest, and might have some availability to help maintain momentum on this

@trexfeathers
Copy link
Contributor

Oh! And please try to avoid squashing your current commits, so that I can tell what is new since my last review. I've kept a record of what I've reviewed so far. I can cope with rebasing because they will still have the same messages (despite different hashes).

Copy link

codecov bot commented Feb 7, 2024

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (a47fd34) 89.71% compared to head (b924795) 89.74%.
Report is 10 commits behind head on main.

❗ Current head b924795 differs from pull request most recent head bf3c720. Consider uploading reports for the commit bf3c720 to get more accurate results

Files Patch % Lines
lib/iris/_shapefiles.py 98.79% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5470      +/-   ##
==========================================
+ Coverage   89.71%   89.74%   +0.03%     
==========================================
  Files          90       91       +1     
  Lines       22815    22904      +89     
  Branches     5438     5457      +19     
==========================================
+ Hits        20468    20556      +88     
  Misses       1617     1617              
- Partials      730      731       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

lib/iris/util.py Outdated Show resolved Hide resolved
Copy link
Contributor

@trexfeathers trexfeathers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@acchamber
Copy link
Contributor Author

acchamber commented Feb 7, 2024

From the coverage report, I can see that we need unit tests to exercise the following blocks:

I left the "geometry_system" keyword in in case it's useful in the future - there's cases you'd use it, there's just not a good way of automatically detecting the projection of a shape because shapefile metadata isn't standardized enough and shapely won't read it if it does. Can rejig the code to remove it but didn't see any harm in keeping it for anyone digging deep enough. Other tests have been added

@trexfeathers
Copy link
Contributor

OK, I reckon it's just this left:

Need to figure out how to document this - a numpy docstring is in progress but also needs something added to the docs somewhere.

A Masking heading within Subsetting a Cube seems appropriate. For this work we will need to start with a placeholder where we can add further detail on masking in general IN FUTURE. Then you can add the shape-specific details as a sub-heading.

@acchamber
Copy link
Contributor Author

First attempt at user guide section in place - I've not put in a second example or use of the minimum_weight keyword to keep it simple, but it's a MVP now I think.

@acchamber acchamber marked this pull request as ready for review February 13, 2024 16:22
@trexfeathers
Copy link
Contributor

Awesome work @acchamber, well done and thanks for sticking with it 🥇

@trexfeathers trexfeathers merged commit 1b11d74 into SciTools:main Feb 13, 2024
16 checks passed
tkknight added a commit to tkknight/iris that referenced this pull request Feb 20, 2024
…umpydoc-pre-commit

* 'numpydoc-pre-commit' of github.com:tkknight/iris:
  Update CF standard names to v84. (SciTools#5761)
  Regrid docs fix (SciTools#5758)
  Improve ncdata and CF information on "iris heart xarray" page (SciTools#5752)
  Pin ASV - airspeed-velocity/asv#1385. (SciTools#5756)
  Normalise units of coordinate bounds (SciTools#5746)
  Add "Which Regridder?" Documentation (SciTools#5742)
  Document `Coord.ignore_axis` (SciTools#5744)
  Disable navidation with keys for docs HTML theme options (SciTools#5747)
  Shapefile masking (SciTools#5470)
  [pre-commit.ci] pre-commit autoupdate (SciTools#5739)
  DOCS: Add whatsnew for ruff pydocstyle compliance (SciTools#5700)
  update docstring (SciTools#5737)
  Updated environment lockfiles (SciTools#5738)
  Consider NaNs equal when comparing cubes (SciTools#5713)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: 🏁 Done
Development

Successfully merging this pull request may close these issues.

8 participants