-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement MDArray API #433
base: master
Are you sure you want to change the base?
Conversation
I have implemented a sketch of the I'm looking for feedback regarding the types and functions I'm defining. Am I on the right track? Am I missing some functionality or some convenient helper functions? |
src/mdarray/mdarray.jl
Outdated
abstract type AbstractGroup end | ||
# needs to have a `ptr::GDAL.GDALGroupH` attribute | ||
|
||
abstract type AbstractMDArray end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably want this to be <: AbstractDiskArray so chunked reads work automatically.
It looks great, thank you for working on this! I have a soft preference for having all the type definitions in https://github.com/yeesian/ArchGDAL.jl/blob/master/src/types.jl but am open to the way you have it right now if you feel strongly about it.
It is mostly to have information relevant to https://yeesian.com/ArchGDAL.jl/stable/memory/#Interactive-versus-Scoped-Objects at the type level. |
I am looking for advice how the difference in array index ordering (row- vs. column-major) and array index base(0 vs. 1) should be handled. Is there a precedent in ArchGDAL? Specifically, the function |
Is it okay to return an |
The closest precedent I can think of is ArchGDAL.jl/src/raster/rasterio.jl Lines 124 to 169 in 4b40fe8
i.e. (i) it'll be 0-based in index offsets (when receiving integers as index arguments) but to handle julian types like
I think it is okay to return a Group that has ptr==C_NULL as indicator of an error. The precedent that I can think of is e.g. in OGR geometries ArchGDAL.jl/src/ogr/geometry.jl Lines 512 to 529 in 4b40fe8
But I'm also open to being convinced otherwise (e.g. in #179 (comment) for #192) |
This concludes the first step of implementing support for multidimensional arrays. Next I would like to receive feedback on design choices (if anyone is interested enough to care) and clean up things. This means
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a huge labor of love, thank you for this and improving the style in some places while you're at it.
I would like to receive feedback on design choices
I think the move towards tracking children in the dataset to force close datasets needs closer discussion and review, but the rest of it LGTM
|(x::$T, y::UInt8)::UInt8 = UInt8(x) | y | ||
|(x::UInt8, y::$T)::UInt8 = x | UInt8(y) | ||
|(x::$T, y::$T)::UInt8 = UInt8(x) | UInt8(y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, why were these lines removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These lines should functionally still be there with two changes:
- Now also support
&
, not just|
- Now using
UInt32
instead ofUInt8
because some open flags do not fit intoUInt8
#TODO import ArchGDAL | ||
import ArchGDAL as AG |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be reverted? If not, I think the reference to ArchGDAL
(in Aqua.test_all(
) below might need to be updated to AG
too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I need to check what this breaks.
The problem is that ArchGDAL defines some functions (e.g. isempty
, isvalid
, empty!
) that also exist in Base
, but they do not extend the ones in Base. As long as one uses the AG.
prefix all is fine. But after import ArchGDAL
, all the Base functions need to be accessed as Base.isempty
etc.
Since most packages using ArchGDAL presumably use the import ArchGDAL as AG
method, I wanted to experiment whether this works with test cases as well, or whether I have to add Base.
prefixes in my new test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as one uses the AG. prefix all is fine. But after import ArchGDAL, all the Base functions need to be accessed as Base.isempty etc.
Ohh thanks for the context!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you agree with this change I will add AG
prefixes to all tests.
return chunksize | ||
end | ||
|
||
# processperchunk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for my understanding, is this a TODO to track for https://gdal.org/doxygen/classGDALAbstractMDArray.html#a91005189ff493f595ddd4ba446cc216a ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I added such comments for all functions that I didn't wrap. This function is difficult to wrap because it expects a function pointer, but it's not impossible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is difficult to wrap because it expects a function pointer, but it's not impossible.
Yeah please feel free to leave those in comments like you did
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is. I don't know (yet) how to handle callbacks so I didn't wrap this function.
function destroy(dataset::AbstractDataset)::Nothing | ||
GDAL.gdalclose(dataset) | ||
# TODO: Wrap `GDAL.CPLErr` | ||
function close(dataset::AbstractDataset)::GDAL.CPLErr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of OGR, a dataset might have features (from e.g.
ArchGDAL.jl/src/ogr/featurelayer.jl
Lines 29 to 41 in 4b40fe8
function createlayer(; | |
name::AbstractString = "", | |
dataset::AbstractDataset = create(getdriver("Memory")), | |
geom::OGRwkbGeometryType = wkbUnknown, | |
spatialref::AbstractSpatialRef = SpatialRef(), | |
options = StringList(C_NULL), | |
)::IFeatureLayer | |
return IFeatureLayer( | |
GDAL.gdaldatasetcreatelayer(dataset, name, spatialref, geom, options), | |
ownedby = dataset, | |
spatialref = spatialref, | |
) | |
end |
.ownedby
(so that Julia's GC wouldn't finalize the dataset while there are references to it) and have an implicit convention that users shouldn't be calling destroy
themselves (and either rely on Julia's GC, or the do-block approach for managing context).
That said, if you feel strongly about it, I'm sympathetic to the argument for being able to "force close" a dataset that resulted in the .children
attribute being introduced. To avoid https://gdal.org/api/python_gotchas.html#python-crashes-or-throws-an-exception-if-you-use-an-object-after-deleting-a-related-object, should we introduce a new type for datasets that tracks their children, and restrict this function to only accept those datasets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue here is this OSGeo/gdal#10490 . When writing a dataset, there is no reliable way to ensure it has been written since one has to wait until all GDAL objects pointing into the dataset have been garbage collected and finalized.
One way out would be not provide an interactive API, and to handle all lifetimes manually or via context managers. That's quite inconvenient in many cases.
Thus I decided to implement a close
function that (by default) hard-closes a dataset when the dataset is writable. For this, the dataset needs to hold references that need to be released when the dataset is force-closed.
This has nothing to do with lifetime management. You can open datasets with a hard_close=false
option, and no such tracking happens. However, it's then unclear when a dataset will actually be written to disk.
I now think that the dataset should hold weak references (not strong ones) to the objects that should be released. I'll update the code.
If you have a different suggestion for reliably closing or flushing a dataset I'd be happy to change things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One way out would be not provide an interactive API, and to handle all lifetimes manually or via context managers. That's quite inconvenient in many cases.
Yeah, it is inconvenient. I'd still recommend it for this PR though -- to keep the PR scoped to the introduction of the MDArray API and to introduce the hard close option in a separate PR.
For this, the dataset needs to hold references that need to be released when the dataset is force-closed.
Even if that were implemented correctly, it still doesn't address the resulting complexity of making all GDAL objects suspect of potentially corresponding to resources that might have already been released.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will have to think about how to proceed without creating very unexpected behaviour.
There is already a function isnull
that checks whether a reference is valid. References become invalid when they are released.
@yeesian I'm getting back to this PR after being sidetracked for a while. Would you be open to a Zoom call to discuss how to proceed? |
Yeah sounds good, thanks for picking this up again! I've emailed you (based on what I could find at your personal website) to arrange a date/time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have chatted -- in the spirit of keeping things moving (but no time pressure):
- keep the scope to
mdarray
(including the new function for closing a dataset and its children) - not all functionality needs to be implemented
- I'm okay with introducing weak references to children in the
[I]Dataset
type (but keeping their usage scoped to themdarray
setting for now).
It's not been discussed, but I think it'll really help to have documentation for the new types and functions. I recognize that with a handwritten approach it'll be too much for this PR, so it's not a requirement or expectation (and test coverage is more important for this PR).
Feel free to point out anything I might have missed, and thank you again!
finalizer(destroy, dataset) | ||
return dataset | ||
end | ||
end | ||
|
||
function add_child!(dataset::WeakRef, obj::Any)::Nothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend renaming it to _add_mdarray_child!
moving this function to src/mdarray/types.jl
since that's where its used (and we haven't verified that it's something we'll want to be supporting for the rest of GDAL's data types)
################################################################################ | ||
|
||
function destroy(datatype::AbstractExtendedDataType)::Nothing | ||
datatype.ptr == C_NULL && return nothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add this check to every other implementation of destroy(...)
too? E.g.
ArchGDAL.jl/src/ogr/geometry.jl
Lines 77 to 96 in 4b40fe8
""" | |
Destroy geometry object. | |
Equivalent to invoking delete on a geometry, but it guaranteed to take place | |
within the context of the GDAL/OGR heap. | |
""" | |
function destroy(geom::AbstractGeometry)::Nothing | |
GDAL.ogr_g_destroygeometry(geom) | |
geom.ptr = C_NULL | |
return nothing | |
end | |
""" | |
Destroy prepared geometry object. | |
Equivalent to invoking delete on a prepared geometry, but it guaranteed to take place | |
within the context of the GDAL/OGR heap. | |
""" | |
function destroy(geom::AbstractPreparedGeometry)::Nothing | |
GDAL.ogrdestroypreparedgeometry(geom) |
To my knowledge, the list of such functions would be in
Lines 185 to 270 in 4b40fe8
:boundary, | |
:buffer, | |
:centroid, | |
:clone, | |
:convexhull, | |
:create, | |
:createcolortable, | |
:createcoordtrans, | |
:copy, | |
:createfeaturedefn, | |
:createfielddefn, | |
:creategeom, | |
:creategeomcollection, | |
:creategeomfieldcollection, | |
:creategeomdefn, | |
:createlayer, | |
:createlinearring, | |
:createlinestring, | |
:createmultilinestring, | |
:createmultipoint, | |
:createmultipolygon, | |
:createmultipolygon_noholes, | |
:createpoint, | |
:createpolygon, | |
:createRAT, | |
:createstylemanager, | |
:createstyletable, | |
:createstyletool, | |
:curvegeom, | |
:delaunaytriangulation, | |
:difference, | |
:forceto, | |
:fromGML, | |
:fromJSON, | |
:fromWKB, | |
:fromWKT, | |
:gdalbuildvrt, | |
:gdaldem, | |
:gdalgrid, | |
:gdalnearblack, | |
:gdalrasterize, | |
:gdaltranslate, | |
:gdalvectortranslate, | |
:gdalwarp, | |
:getband, | |
:getcolortable, | |
:getfeature, | |
:getgeom, | |
:getlayer, | |
:getmaskband, | |
:getoverview, | |
:getpart, | |
:getspatialref, | |
:importCRS, | |
:intersection, | |
:importEPSG, | |
:importEPSGA, | |
:importESRI, | |
:importPROJ4, | |
:importWKT, | |
:importXML, | |
:importUserInput, | |
:importURL, | |
:lineargeom, | |
:newspatialref, | |
:nextfeature, | |
:pointalongline, | |
:pointonsurface, | |
:polygonfromedges, | |
:polygonize, | |
:read, | |
:sampleoverview, | |
:simplify, | |
:simplifypreservetopology, | |
:symdifference, | |
:union, | |
:update, | |
:readraster, | |
) | |
eval(quote | |
function $(gdalfunc)(f::Function, args...; kwargs...) | |
obj = $(Symbol("unsafe_$gdalfunc"))(args...; kwargs...) | |
return try | |
f(obj) | |
finally | |
destroy(obj) |
function destroy(dataset::AbstractDataset)::Nothing | ||
close(dataset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I would prefer to not change the behavior of destroy()
(since it's out for quite a while by now), and instead for it to be a new function being introduced e.g. force_close_mdarray_dataset!(...)
(inside the src/mdarray/*
subdirectory), and to have it document that it will close all of its children (possibly recursively too)?
We can keep it scoped to mdarray
is until we have verifiable claims that it's working robustly for the other types of GDAL datasets (FeatureCollections, etc). Once there is a verifiably general implementation that's working and tested for all GDAL dataset types, we can give it a more general name.
See #432.