t-route segfaults when loaded after netcdf-c<=4.9.1 shared library #705

Open
aaraney opened this issue Nov 28, 2023 · 2 comments
aaraney commented Nov 28, 2023

TL;DR

T-Route's netCDF4 (Python library) dependency can segfault when another netcdf-c<=4.9.1 shared library has already been loaded. You are most likely to run into this when running NextGen with routing enabled. The issue is present in netCDF4 versions 1.6.4 and 1.6.5 (the latest at the time of writing). Versions <=1.6.3 are not affected.

To solve this issue either:

  • build a netCDF4 wheel from scratch locally: pip uninstall netcdf4; pip install --no-cache-dir --no-binary :all: netcdf4
  • install netCDF4<=1.6.3: pip install netCDF4==1.6.3

Current behavior

Note: this has only been confirmed on linux.

If you compile NextGen against netcdf-c<=4.9.1 and run it with routing enabled while netCDF4==1.6.4 or netCDF4==1.6.5 (the Python dependency) is installed, a segmentation fault occurs when NextGen starts routing. This appears to have been resolved in the latest unreleased version of netCDF4.

This occurs because of shared library symbol resolution precedence. The netCDF4 1.6.4 and 1.6.5 wheels ship with a pre-compiled netcdf-c>4.9.1 shared library. netCDF4 calls a function from that bundled library, nc_rc_set. nc_rc_set in turn calls another library function, NC_rcfile_insert, which exists in both the bundled library and whatever netcdf-c shared library you have installed and loaded. When this call is made, it can resolve to the NC_rcfile_insert in your netcdf-c shared library instead of the one bundled with netCDF4. To make things worse, different versions of netcdf-c have different function signatures for NC_rcfile_insert. This means that nc_rc_set's call to NC_rcfile_insert can pass the wrong number of arguments or pass them in the wrong order (in my case it did). The segmentation fault happens when NC_rcfile_insert calls strdup on one of these (likely incorrect) arguments.
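
One way to check whether a process has ended up with two copies of netcdf-c loaded is to look at its memory map. The following is a minimal sketch, assuming Linux (it reads /proc/self/maps); the helper names netcdf_libs_in_maps and loaded_netcdf_libs are made up for illustration and are not part of netCDF4 or t-route.

```python
from pathlib import Path


def netcdf_libs_in_maps(maps_text, needle="libnetcdf"):
    """Return the distinct mapped file paths whose file name contains `needle`."""
    paths = set()
    for line in maps_text.splitlines():
        # /proc/<pid>/maps columns: address perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        if needle in Path(path).name:
            paths.add(path)
    return sorted(paths)


def loaded_netcdf_libs():
    """Inspect the current process; returns [] where /proc is unavailable."""
    maps = Path("/proc/self/maps")
    if not maps.exists():
        return []
    return netcdf_libs_in_maps(maps.read_text())


if __name__ == "__main__":
    # Call this after the libraries of interest have been loaded,
    # for example after `import netCDF4`.
    for path in loaded_netcdf_libs():
        print(path)
```

If this prints both a system libnetcdf and one under netCDF4.libs, the process is in the situation described above.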

To solve this issue either:

  • build a netCDF4 wheel from scratch locally: pip uninstall netcdf4; pip install --no-cache-dir --no-binary :all: netcdf4 (this is preferred, but takes longer)
  • install netCDF4<=1.6.3: pip install netCDF4==1.6.3
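
To decide whether either workaround is needed, the version ranges reported in this issue can be written down as a quick check. This is a rough sketch only: the function names are hypothetical, the ranges are just the ones stated above, and it assumes netCDF4 was installed from a pre-built wheel (a local source build does not bundle its own netcdf-c). The two version strings could come from, for example, `pip show netCDF4` and `nc-config --version`.

```python
def parse_version(text):
    """Turn '4.9.1' into (4, 9, 1); suffixes such as 'rc1' are not handled."""
    return tuple(int(part) for part in text.split("."))


def likely_affected(netcdf4_py_version, system_netcdf_c_version):
    """True when both versions fall in the ranges reported in this issue."""
    # Python wheels reported as affected: 1.6.4 and 1.6.5.
    wheel_affected = parse_version(netcdf4_py_version) in {(1, 6, 4), (1, 6, 5)}
    # System netcdf-c reported as affected: 4.9.1 and older.
    system_affected = parse_version(system_netcdf_c_version) <= (4, 9, 1)
    return wheel_affected and system_affected


if __name__ == "__main__":
    print(likely_affected("1.6.5", "4.9.0"))
```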

edit:

Small update: if you are a Mac user and use brew to manage your packages, this likely will not affect you. brew's netcdf formula ships with 4.9.2.

aaraney commented Nov 29, 2023

With a few small tweaks you can reproduce this issue locally:

# to run this script, first:
# linux: `export LD_LIBRARY_PATH=/usr/local/lib/python3.9/site-packages/netCDF4.libs`
#   mac: `export DYLD_LIBRARY_PATH=/usr/local/lib/python3.9/site-packages/netCDF4.libs`
# to find your site-package directory run
# `python -c 'import site; print(site.getsitepackages())'`

import ctypes

import certifi


def strencode(pystr, encoding=None):
    # encode a string into bytes.  If already bytes, do nothing.
    # uses 'utf-8' for default encoding.
    if isinstance(pystr, bytes):
        return pystr
    if encoding is None:
        encoding = 'utf-8'
    return pystr.encode(encoding)

# you likely need to change this
# use `nc-config --libs` to get the path to your `libnetcdf` shared library
og_nc = ctypes.CDLL("/usr/lib/aarch64-linux-gnu/libnetcdf.so", mode=ctypes.RTLD_GLOBAL)

# this likely will also need to change
# run `ls <the-path-to-netCDF4.libs>` and look for `libnetcdf-<xxx>.so.<xx>`
nc = ctypes.CDLL("/usr/local/lib/python3.9/site-packages/netCDF4.libs/libnetcdf-15d50133.so.19", mode=ctypes.RTLD_GLOBAL)

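# segmentation fault
# note: both arguments are passed as bytes, since ctypes would otherwise pass a str as wchar_t*
nc.nc_rc_set(strencode("HTTP.SSL.CAINFO"), strencode(certifi.where()))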
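
Because the script above takes down the interpreter that runs it, it can be handy to launch it in a child process and inspect the exit status instead. A minimal sketch, assuming a POSIX system; run_isolated is a hypothetical helper name, and the child inherits the parent's environment (including LD_LIBRARY_PATH / DYLD_LIBRARY_PATH).

```python
import signal
import subprocess
import sys


def run_isolated(code):
    """Run `code` in a fresh interpreter and describe how it exited."""
    result = subprocess.run([sys.executable, "-c", code])
    if result.returncode == 0:
        return "ok"
    if result.returncode < 0:
        # On POSIX a negative return code means the child was killed by that signal.
        return signal.Signals(-result.returncode).name
    return f"exit {result.returncode}"


if __name__ == "__main__":
    # Replace the string with the contents of the reproduction script.
    print(run_isolated("print('hello from the child')"))
```

With the reproduction script passed in as `code`, a result of "SIGSEGV" would indicate the crash described in this issue.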

aaraney commented Jan 7, 2025

Note: the above solutions did not work for me with a parallel-enabled HDF installation. Moreover, I was not able to build netCDF4 from source in a meaningful way with a parallel HDF installation on the path (the build succeeded but produced no actual bindings).
