-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reworking xarray interop #4
Conversation
Issue was raised on datacube repo that `.to_epsg` method on pyproj CRS without epsg code can be slow. The solution is to use `.to_wkt` instead of `.to_epsg` as a primary key for looking up CRS state from the cache. But there were other issues, with previous approach, like throwing away an already constructed pyproj.CRS when it is given on the input. _make_crs was refactored to accept either string or pyrpoj.CRS object, with custom key being `.to_wkt()` when given pyproj.CRS. Also adding key normalization for `epsg:<int>` style CRS strings.
Also convert GeoBox.from_geopolygon to staticmethod
This adds `.odc.` accessor to `xarray.DataArray`, from which crs,transform,geobox can be accessed.
Codecov Report
@@ Coverage Diff @@
## develop #4 +/- ##
===========================================
+ Coverage 89.72% 97.00% +7.27%
===========================================
Files 12 12
Lines 1480 1568 +88
===========================================
+ Hits 1328 1521 +193
+ Misses 152 47 -105
Continue to review full report at Codecov.
|
return crs_spec.to_wkt() | ||
|
||
|
||
@cachetools.cached(_crs_cache, key=_make_crs_key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to add a lock
in case of multi-threaded access? Globals w/o a lock feel weird to me but mabye that's just me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how cachetools deals with concurrency actually. Dictionaries are "thread-safe" in Python, and so are imports, so wrapping cache around a function should be thread-safe as it happens at import time, and then I assume cachetools does reasonable thing afterwards, which at worst will create 2 copies of the same CRS, but will only cache one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an option to supply lock
to @cached
decorator, also should probably replace plain Dict
with some cache class instead.
more details here: https://cachetools.readthedocs.io/en/stable/#cachetools.cached
given that this is not part of the API, I think we can delay this change until later. Also need to check pyproj
constraints for multi-threaded access.
Assumption is that there is one common projection across all bands
.geobox
encoding
instead ofattributes
forgrid_mapping
(CRS: CF grid_mapping attribute to encoding datacube-core#1084)TODO:
add
.odc.
extension to Dataset as well.