Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated OSGeo recipes #708

Merged
merged 29 commits into from
Nov 3, 2023
Merged

Updated OSGeo recipes #708

merged 29 commits into from
Nov 3, 2023

Conversation

cboettig
Copy link
Member

@cboettig cboettig commented Oct 4, 2023

This provides a more complete source-build of OSGeo with wider support for optional drivers. Notably, this addresses functionality that was already present in the Ubuntu LTS repo versions but missing from the current osgeo source-build recipe, such as virtual filesystem access for NetCDF.

A few other changes are made here as well:

  • this now installs only the packages that actually bind OSGeo libraries from source, rather than installing all the packages listed in install_geospatial.sh. This also does not disable binary installs by default, meaning that users can benefit from quick binary installs for most other (spatial or non-spatial) R packages. The downside risk is that installing/upgrading a binary version of sf, terra, gdalcubes, or stars will create a broken installation.

Still some open points for discussion:

  • some users suggested that having the recipe in scripts/experimental rather than scripts was discouraging for regular use. We have never outlined a policy for promoting a script from 'experimental', and changing those paths could potentially break downstream functionality.

  • We might want to tweak the recipe of the corresponding rocker/geospatial:dev-osgeo Dockerfile to include install_python.sh step first? (most of those python libraries are in fact included here).

  • sf & terra do not install when PROJ is available both from a source build (i.e. /usr/local/lib) and from the deb repos (i.e. /usr/lib). Having different versions of a library like this is not a problem for the OSGeo libraries or for most software afaik (@eddelbuettel am I right this isn't usually an issue since config should link to only one version?), but apparently a well-documented problem for sf and terra. This can create some unexpected problems for users -- if they install any deb that happens to depend on libproj-dev, suddenly these R packages won't build from source (they will simply through a segfault on the configure step, leaving users no idea for the reason). Maybe we just live with this, but I thought it was worth flagging. (This script will first try and remove those libs to avoid conflicts).

  • Perhaps this entire script should be done as a 'staged build', where the resulting binaries of both the osgeo libs, cli tools, and binary versions of the R packages are copied over, but all the dev libs needed to compile are jettisoned (as the official osgeo/gdal Dockerfile does). That would create a much smaller Docker image, but adds complexity (i.e. users couldn't upgrade sf/terra etc even with a source-install, they'd have to run this whole script again first!), and complexity in our build setup.

  • Users can run this script to add latest osgeo builds to any image in the stack (i.e. run /rocker_scripts/experimental/install_dev-osgeo.sh on any rocker-versioned image). This should install the requested version, defaulting to the latest release. Perhaps this script should be called install_osgeo-latest.sh instead?

  • Wondering if it would make more sense to create a renv "profile" and install these R packages binding OSGeo libraries under that profile? This might make it possible to have binary and non-binary builds on the same image (could even be a way to have latest OSGeo versions available on the standard rocker/geospatial image). But I think that approach needs some more testing first.

cc @mdsumner, @yuvipanda, @Robinlovelace, others -- thoughts?

@yuvipanda
Copy link
Contributor

/cc @ranchodeluxe as well, who was working with R folks :)

@eddelbuettel
Copy link
Member

@eddelbuettel am I right this isn't usually an issue since config should link to only one version

It depends on the library at hand but most of them (and for some time now) encode the major version. So then the distros generalize and let you have one version as a -dev to build against but multiple to run.

Random example from my 23.04 box:

$ ldconfig -p| grep libicuu
        libicuuc.so.72 (libc6,x86-64) => /lib/x86_64-linux-gnu/libicuuc.so.72
        libicuuc.so.72 (libc6) => /lib/i386-linux-gnu/libicuuc.so.72
        libicuuc.so.71 (libc6,x86-64) => /lib/x86_64-linux-gnu/libicuuc.so.71
        libicuuc.so.57 (libc6,x86-64) => /lib/x86_64-linux-gnu/libicuuc.so.57
        libicuuc.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libicuuc.so
$ 

@Robinlovelace
Copy link
Contributor

Looks great to me from a glance, thanks!

@cboettig
Copy link
Member Author

@eitsupi apologies again for my issues with the bash linter -- any pointers on that?

@mdsumner
Copy link
Contributor

thanks a lot @cboettig this is awesome, sadly I can't pursue trying it atm but will be back into it in a few weeks

@eitsupi
Copy link
Member

eitsupi commented Oct 11, 2023

@eitsupi apologies again for my issues with the bash linter -- any pointers on that?

I think I have fixed all the points pointed out by shellcheck.

@mdsumner
Copy link
Contributor

I've built and used this and found no problems. I'd like to see it come out of "experimental" I always found that unsettling, to me the point of docker is to be able to easily get recent builds without affecting anyone else so I don't see any danger.

I like seeing more of the python supported (because a personal goal for me is to have these deps on an equal footing, without proliferating versions on a system), but to me the system deps are more important than any of the python or R installs, I'm just more comfy with the latter so I'm not worried about R packages, but I think getting all python packages going is probably out of scope.

The biggest issue with R and Python on the same system is having reticulate support, which requires building Python from source now (and I think that's clearly out of scope of this recipe as well).

@eitsupi
Copy link
Member

eitsupi commented Oct 30, 2023

@mdsumner Thanks for testing this!

I like seeing more of the python supported

What does mean "more of the python supported"?

Unfortunately, I think it would be unrealistic to do that with this repository as Python support is very difficult to achieve.
See also #670

which requires building Python from source now

Is this true?
My understanding is that reticulate works even if binary Python is installed later by mamba or something.

@Robinlovelace
Copy link
Contributor

reticulate::install_python() worked for me previously. Getting a box with R + Python + geo in seems really hard agreed!

@Robinlovelace
Copy link
Contributor

But shouldn't be...

@mdsumner
Copy link
Contributor

mdsumner commented Oct 30, 2023

What does mean "more of the python supported"?

I meant I can now get osgeo.gdal going after GDAL is built from source. I'm very much not clear on how python package installs are supposed to work, but that very small basis of all the R packages that use it and the core Python package is extremely important to me. I used sys path to add the installed osgeo.gdal/osr/ogr family to work in this image:

https://gist.github.com/mdsumner/5fb88b0eb437311eb7d8e1159d528f39?permalink_comment_id=4740514#gistcomment-4740514

GDAL puts its python into /usr/local/lib/python3/dist-packages and I can work with that perfectly well, I don't want my inexpert musings to pause this PR going into main, because I think it's actually a not well defined problem with uncertainty in many corners about how one should work with Python packages. Using one system Python to me is perfect in a docker context.

which requires building Python from source now
Is this true?
My understanding is that reticulate works even if binary Python is installed later by mamba or something.

Yes, I'm sorry I can't find the NEWS item but it's since 3.9 or 3.10 (see note) --enable-shared was dropped from default build settings. Posit advise on compilation here which is the best authority I can find on what to do, but it's obviously non-ideal for many reasons.

https://docs.posit.co/resources/install-python-source/

note: (at any rate I have a system with 3.8 and that's my VM for reticulate atm)

Again, reticulate is less important to me in this context than the fact that this image already has common support with a latest version of the libs GDAL, GEOS, PROJ. That alone makes this PR extremely valuable. I think those broader topics need flushing out, but probably not here. Thanks!

@cboettig
Copy link
Member Author

Thanks all, sounds like this is in good shape. To summarize:

  • the core of this update provides more complete gdal/geos/proj on the already existing dev-osgeo branch, without anticipating any real breaking changes there, so this is if nothing else a step forward.

  • @eitsupi open question about moving this script out of /rocker_scripts/experimental. Thoughts on that? I think that would mean updating the corresponding build recipe for rocker/geospatial:dev-osgeo. It could also potentially be a breaking change -- maybe we could symlink the to the old /rocker_scripts/experimental to avoid a break? Let me know how you want to proceed there (and what build recipe we need to update if we move it?)

  • there are some remaining python questions which probably aren't to be addressed here. the osgeo.gdal python package can install and work fine on this branch, as can gdal core functions that depend on python. reticulate's default discovery routines may not discover python here, and have a habit of searching for packages primarily in ~/ level directories, which make it difficult to do shared python libraries on python. some of these are hard to determine in reticulate due conflicting and stale code that is still in their codebase, see Conda discovery heuristics and environmental variables rstudio/reticulate#1490, but those should be resolved there. In general, we should be able to do shared recticulate libraries in /opt, though it looks like it will not discover them in /usr/local/lib/python3/dist-packages. (It can be told where to look manually of course, but I don't want to hardwire a solution). Anyway, these should be worked out either upstream in in a separate PR which can update install_python.sh.

@eitsupi
Copy link
Member

eitsupi commented Oct 31, 2023

@eitsupi open question about moving this script out of /rocker_scripts/experimental. Thoughts on that? I think that would mean updating the corresponding build recipe for rocker/geospatial:dev-osgeo. It could also potentially be a breaking change -- maybe we could symlink the to the old /rocker_scripts/experimental to avoid a break? Let me know how you want to proceed there (and what build recipe we need to update if we move it?)

I do not have a strong opinion on moving directories. (I believe there are no strict divisions)
The testing of these scripts on CI is distinct from other scripts, but maybe just changing the workflow name is enough.

name: test for experimental scripts

@yuvipanda
Copy link
Contributor

Unfortunately, I think it would be unrealistic to do that with this repository as Python support is very difficult to achieve.

One of my personal goals is to get more involved in this project to help with better python support :)

@eitsupi
Copy link
Member

eitsupi commented Nov 3, 2023

Ok, let's merge this for now.
I would be happy to open another PR for further improvements.

@eitsupi eitsupi merged commit c16cf1b into master Nov 3, 2023
8 checks passed
@eitsupi eitsupi deleted the patch/osgeo branch November 3, 2023 02:30
@Robinlovelace
Copy link
Contributor

Great stuff! Note to self: use these updated recipes as basis for geocompx docker images.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants