Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove cassandra from the shiv package #96

Merged
merged 1 commit into from
Jul 7, 2024

Conversation

fruch
Copy link
Collaborator

@fruch fruch commented Jun 30, 2024

so we can use the scylla-python3 driver version
the only problem now, we depend on new dbuild version for using newer version of scylla-driver with cqlsh.

it's really slow down the ability to update new feature that depends on new driver changes (not too many of those)

Ref: scylladb/scylla-python3#40
Fixes: #95

@fruch fruch force-pushed the patch_so_files branch 2 times, most recently from f86f531 to 3dfd7f8 Compare June 30, 2024 13:24
fruch added a commit to fruch/scylladb that referenced this pull request Jun 30, 2024
@avikivity
Copy link
Member

so we can use the scylla-python3 driver version the only problem now, we depend on new dbuild version for using newer version of scylla-driver with cqlsh.

it's really slow down the ability to update new feature that depends on new driver changes (not too many of those)

Ref: scylladb/scylla-python3#40 Fixes: #95

It's a good idea to have fewer versions of the driver.

@fruch
Copy link
Collaborator Author

fruch commented Jul 1, 2024

so we can use the scylla-python3 driver version the only problem now, we depend on new dbuild version for using newer version of scylla-driver with cqlsh.
it's really slow down the ability to update new feature that depends on new driver changes (not too many of those)
Ref: scylladb/scylla-python3#40 Fixes: #95

It's a good idea to have fewer versions of the driver.

so we can use the scylla-python3 driver version the only problem now, we depend on new dbuild version for using newer version of scylla-driver with cqlsh.
it's really slow down the ability to update new feature that depends on new driver changes (not too many of those)
Ref: scylladb/scylla-python3#40 Fixes: #95

It's a good idea to have fewer versions of the driver.

I don't like the idea of every change or fix we'll want to the cqlsh, it won't go via new dbuild image...

so we can use the scylla-python3 driver version
the only problem now, we depend on new dbuild version
for using newer version of scylla-driver with cqlsh.

it's really slow down the ability to update new feature
that depends on new driver chanages (not too many of those)

Depends: scylladb/scylla-python3#40
Fixes: scylladb#95
@avikivity
Copy link
Member

so we can use the scylla-python3 driver version the only problem now, we depend on new dbuild version for using newer version of scylla-driver with cqlsh.
it's really slow down the ability to update new feature that depends on new driver changes (not too many of those)
Ref: scylladb/scylla-python3#40 Fixes: #95

It's a good idea to have fewer versions of the driver.

so we can use the scylla-python3 driver version the only problem now, we depend on new dbuild version for using newer version of scylla-driver with cqlsh.
it's really slow down the ability to update new feature that depends on new driver changes (not too many of those)
Ref: scylladb/scylla-python3#40 Fixes: #95

It's a good idea to have fewer versions of the driver.

I don't like the idea of every change or fix we'll want to the cqlsh, it won't go via new dbuild image...

The python driver is also used for the tests, so it's a good idea to keep it updated.

@avikivity
Copy link
Member

so we can use the scylla-python3 driver version the only problem now, we depend on new dbuild version for using newer version of scylla-driver with cqlsh.
it's really slow down the ability to update new feature that depends on new driver changes (not too many of those)
Ref: scylladb/scylla-python3#40 Fixes: #95

It's a good idea to have fewer versions of the driver.

so we can use the scylla-python3 driver version the only problem now, we depend on new dbuild version for using newer version of scylla-driver with cqlsh.
it's really slow down the ability to update new feature that depends on new driver changes (not too many of those)
Ref: scylladb/scylla-python3#40 Fixes: #95

It's a good idea to have fewer versions of the driver.

I don't like the idea of every change or fix we'll want to the cqlsh, it won't go via new dbuild image...

The python driver is also used for the tests, so it's a good idea to keep it updated.

The alternative is to include the python driver as a submodule and build it as part of the build process.

fruch added a commit to fruch/scylladb that referenced this pull request Jul 1, 2024
@fruch
Copy link
Collaborator Author

fruch commented Jul 1, 2024

The alternative is to include the python driver as a submodule and build it as part of the build process.

an alternative is todo exactly what scylla-python3 is doing for the downloaded wheel (without submoduling anything), i.e. patchelf the files,
the trick would be to make sure the shiv package is exactract to a known location (now it's defaults to ~/.shiv) with is relative to scylla-python3, and add to the rpath

I think it would work exactly as the driver from scylla-python3 works.

it just gonna take time to build it and try it out, and confirm it's working.
And it would take time for me to get to doing it.

I'm still trying to completely build this one in:
scylladb/scylladb#19558

@fruch
Copy link
Collaborator Author

fruch commented Jul 2, 2024

scylladb/scylladb#19558

Passed complication and gating tests

@syuu1228, can you review this change ?

@Annamikhlin
Copy link

scylladb/scylladb#19558

Passed complication and gating tests

@syuu1228, can you review this change ?

@syuu1228 - ping.. could you please review

@avikivity
Copy link
Member

The alternative is to include the python driver as a submodule and build it as part of the build process.

an alternative is todo exactly what scylla-python3 is doing for the downloaded wheel (without submoduling anything),

The submodule is a solution to having to regenerate the frozen toolchain on each driver change.

Since the driver changes often, and is part of our source base, it's reasonable to build it as part of the build process.

@syuu1228
Copy link
Contributor

syuu1228 commented Jul 6, 2024

I found that there are more shared library on shiv, it may cause similar error since these are not relocatable binaries:

$ find .shiv|grep -e "\.so"
.shiv/cqlsh_e2c7a692a44f163176c6c3c5fc35bbc486fdfecea430ed0a24707d6a759e7dfe/site-packages/scylla_driver.libs/libev-a3026d2b.so.4.0.0
.shiv/cqlsh_e2c7a692a44f163176c6c3c5fc35bbc486fdfecea430ed0a24707d6a759e7dfe/site-packages/yaml/_yaml.cpython-312-x86_64-linux-gnu.so
$ ldd .shiv/cqlsh_e2c7a692a44f163176c6c3c5fc35bbc486fdfecea430ed0a24707d6a759e7dfe/site-packages/scylla_driver.libs/libev-a3026d2b.so.4.0.0
        linux-vdso.so.1 =>  (0x00007ffcbd1fe000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f6240e94000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f6240ac6000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f62413a5000)
$ ldd .shiv/cqlsh_e2c7a692a44f163176c6c3c5fc35bbc486fdfecea430ed0a24707d6a759e7dfe/site-packages/yaml/_yaml.cpython-312-x86_64-linux-gnu.so
        linux-vdso.so.1 =>  (0x00007ffff410b000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f4aedc57000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f4aed889000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f4aede73000)

I saw previous discussion on the thread we should use scylla-driver on scylla-python3 or install specific version of scylla-driver, here are example code on both versions.

  1. If we want to use scylla-driver on scylla-python3, we also need to drop pyyaml and scylla-driver modules.
    This way the code will be simple, but we need to update frozen toolchain to update scylla-driver.
diff --git a/scripts/create-relocatable-package.py b/scripts/create-relocatable-package.py
index 2345289..d001503 100755
--- a/scripts/create-relocatable-package.py
+++ b/scripts/create-relocatable-package.py
@@ -94,7 +94,7 @@ ar.reloc_add('build/debian/debian', arcname='debian')
 # and pointing the correct lib folder)
 cqlsh_bin = pathlib.Path('bin/cqlsh').resolve()
 subprocess.check_call(["mv", cqlsh_bin, f'{cqlsh_bin}.zip'])
-subprocess.run(["zip", "--delete", cqlsh_bin, "site-packages/cassandra/*"])
+subprocess.run(["zip", "--delete", cqlsh_bin, "site-packages/cassandra/*", "site-packages/yaml/*", "site-packages/_yaml/*", "site-packages/scylla_driver-*.dist-info/*", "site-packages/scylla_driver.libs/*"])
 subprocess.check_call(["mv", f'{cqlsh_bin}.zip', cqlsh_bin])
 
 ar.reloc_add('bin/cqlsh')
\ No newline at end of file
  1. Or, if we want to install specifc version of scylla-driver, we need to implement bit complecated code something like this.
    There are reasons we cannot simply set rpath on create-relocatable-package.py.
    Because,
  • we have to use absolute path on rpath since ~/.shiv is out side of /opt/scylladb
  • and even we cannot hard code rpath to /opt/scylladb/python3/lib64, because we have nonroot mode
  • to configure rpath correctly on nonroot mode, we need to run patchelf on install.sh
    to do that, we need to bring relocatable patchelf on scylla-cqlsh.tar.gz
diff --git a/install.sh b/install.sh
index c9cb65c..1392dfa 100755
--- a/install.sh
+++ b/install.sh
@@ -129,6 +129,12 @@ fi
 install -d -m755 "$rprefix"/share/cassandra/pylib
 cp -rp pylib/cqlshlib "$rprefix"/share/cassandra/pylib
 
+shared_libs=(site-packages/cassandra/io/*.so site-packages/scylla_driver.libs/*.so.* site-packages/yaml/*.so site-packages/cassandra/*.so)
+unzip -o bin/cqlsh ${shared_libs[@]}
+for lib in ${shared_libs[@]}; do
+    patchelf/bin/patchelf --add-rpath "$prefix"/python3/lib64 "$lib"
+done
+zip bin/cqlsh ${shared_libs[@]}
 for i in bin/{cqlsh,cqlsh.py} ; do
     bn=$(basename $i)
     relocate_python3 "$rprefix"/share/cassandra/bin "$i"
diff --git a/scripts/create-relocatable-package.py b/scripts/create-relocatable-package.py
index 2345289..3627405 100755
--- a/scripts/create-relocatable-package.py
+++ b/scripts/create-relocatable-package.py
@@ -22,9 +22,14 @@
 #
 
 import argparse
+import io
+import os
 import subprocess
 import tarfile
+import shutil
+from tempfile import mkstemp
 import pathlib
+import time
 
 
 def erase_uid(tarinfo):
@@ -39,25 +44,90 @@ def reloc_add(self, name, arcname=None):
     else:
         return self.add(name, arcname="{}/{}".format(RELOC_PREFIX, name), filter=erase_uid)
 
-tarfile.TarFile.reloc_add = reloc_add
-
+def reloc_addfile(self, tarinfo, fileobj=None):
+    tarinfo.name = "{}/{}".format(RELOC_PREFIX, tarinfo.name)
+    return self.addfile(tarinfo, fileobj)
 
-def fix_binary(path, libpath):
+tarfile.TarFile.reloc_add = reloc_add
+tarfile.TarFile.reloc_addfile = reloc_addfile
+
+def ldd(executable):
+    '''Given an executable file, return a dictionary with the keys
+    containing its shared library dependencies and the values pointing
+    at the files they resolve to. A fake key ld.so points at the
+    dynamic loader.'''
+    libraries = {}
+    for ldd_line in subprocess.check_output(
+            ['ldd', executable],
+            universal_newlines=True).splitlines():
+        elements = ldd_line.split()
+        if ldd_line.endswith('not found'):
+            raise Exception('ldd {}: could not resolve {}'.format(executable, elements[0]))
+        if elements[1] != '=>':
+            if elements[0].startswith('linux-vdso.so'):
+                # provided by kernel
+                continue
+            libraries['ld.so'] = os.path.realpath(elements[0])
+        elif '//' in elements[0]:
+            # We know that the only DSO with a // in the path is the
+            # dynamic linker used by scylla, which is the same ld.so
+            # above.
+            pass
+        else:
+            libraries[elements[0]] = os.path.realpath(elements[2])
+    return libraries
+
+def fix_binary(ar, path, libpath):
     '''Makes one binary or shared library relocatable. To do that, we need to set RUNPATH to $ORIGIN/../lib64 so we get libraries
     from the relocatable directory and not from the system during runtime. We also want to copy the interpreter used so
     we can launch with it later.
     '''
     # it's a pity patchelf have to patch an actual binary.
+    patched_elf = mkstemp()[1]
+    shutil.copy2(path, patched_elf)
 
     subprocess.check_call(['patchelf',
-                           '--set-rpath',
+                           '--add-rpath',
                            libpath,
-                           path])
-
-
-def fix_sharedlib(binpath):
-    fix_binary(binpath, '$ORIGIN/lib64')
+                           patched_elf])
+    return patched_elf
 
+def fix_executable_binary(ar, binpath, prefix=''):
+    '''Makes the executable relocatable. To do that, we need to set RUNPATH to $ORIGIN/../lib64 so we get libraries
+    from the relocatable directory and not from the system during runtime. We also want to copy the interpreter used so
+    we can launch with it later.
+    '''
+    pyname = os.path.basename(binpath)
+    patched_binary = fix_binary(ar, binpath, '$ORIGIN/../lib64/')
+    interpreter = subprocess.check_output(['patchelf',
+                                           '--print-interpreter',
+                                           patched_binary], universal_newlines=True).splitlines()[0]
+    ar.reloc_add(os.path.realpath(interpreter), arcname=os.path.join(prefix, "libexec", "ld.so"))
+    ar.reloc_add(patched_binary, arcname=os.path.join(prefix, "libexec", pyname + ".bin"))
+    os.remove(patched_binary)
+
+def fix_sharedlib(ar, binpath, targetpath, prefix=''):
+    relpath =  os.path.join(os.path.relpath("lib64", targetpath), "lib64")
+    patched_binary = fix_binary(ar, binpath, f'$ORIGIN/{relpath}')
+    ar.reloc_add(patched_binary, arcname=os.path.join(prefix, targetpath))
+    os.remove(patched_binary)
+
+def gen_executable_thunk(ar, exebin, prefix=''):
+    base_thunk='''\
+#!/bin/bash
+x="$(readlink -f "$0")"
+b="$(basename "$x")"
+d="$(dirname "$x")/.."
+ldso="$d/libexec/ld.so"
+realexe="$d/libexec/$b.bin"
+exec -a "$0" "$ldso" "$realexe" "$@"
+'''
+    thunk = base_thunk.encode()
+    ti = tarfile.TarInfo(name=os.path.join(prefix, "bin", exebin))
+    ti.size = len(thunk)
+    ti.mode = 0o755
+    ti.mtime = time.time()
+    ar.reloc_addfile(ti, fileobj=io.BytesIO(thunk))
 
 ap = argparse.ArgumentParser(description='Create a relocatable scylla package.')
 ap.add_argument('--version', required=True,
@@ -76,6 +146,22 @@ with open('build/.relocatable_package_version', 'w') as f:
     f.write('2\n')
 ar.add('build/.relocatable_package_version', arcname='.relocatable_package_version', filter=erase_uid)
 
+gen_executable_thunk(ar, 'patchelf', 'patchelf')
+fix_executable_binary(ar, '/usr/bin/patchelf', 'patchelf')
+libs = {}
+libs.update(ldd('/usr/bin/patchelf'))
+for lib, f in libs.items():
+    libfile = f
+    if libfile.startswith("/usr/"):
+        libfile = libfile.replace("/usr/", "/", 1)
+    if libfile.startswith("/lib/"):
+        libfile = libfile.replace("/lib/", "lib64/", 1)
+    elif libfile.startswith("/lib64/"):
+        libfile = libfile.replace("/lib64/", "lib64/", 1)
+    else:
+        raise RuntimeError("unexpected path: don't know what to do with {}".format(f))
+    fix_sharedlib(ar, f, libfile, 'patchelf')
+
 pathlib.Path('build/SCYLLA-RELOCATABLE-FILE').touch()
 ar.reloc_add('build/SCYLLA-RELOCATABLE-FILE', arcname='SCYLLA-RELOCATABLE-FILE')
 ar.reloc_add('build/SCYLLA-RELEASE-FILE', arcname='SCYLLA-RELEASE-FILE')
@@ -87,14 +173,4 @@ ar.reloc_add('bin/cqlsh.py')
 ar.reloc_add('pylib')
 ar.reloc_add('install.sh')
 ar.reloc_add('build/debian/debian', arcname='debian')
-
-
-# clear scylla-driver out of the package
-# we assume that scylla-python3 already have it (and all it's .so are relocatable,
-# and pointing the correct lib folder)
-cqlsh_bin = pathlib.Path('bin/cqlsh').resolve()
-subprocess.check_call(["mv", cqlsh_bin, f'{cqlsh_bin}.zip'])
-subprocess.run(["zip", "--delete", cqlsh_bin, "site-packages/cassandra/*"])
-subprocess.check_call(["mv", f'{cqlsh_bin}.zip', cqlsh_bin])
-
 ar.reloc_add('bin/cqlsh')
\ No newline at end of file
-- 
2.45.2

@fruch
Copy link
Collaborator Author

fruch commented Jul 7, 2024

@syuu1228 we can put the .shiv directory into /opt/scylladb then we can do the rpath relative and it should work

fruch added a commit to fruch/scylladb that referenced this pull request Jul 7, 2024
@fruch
Copy link
Collaborator Author

fruch commented Jul 7, 2024

Meanwhile we'll deliver this change to unblock master, as it was proven to be working in:
scylladb/scylladb#19558

I'll try out the change of patching the .so in the shiv zip, without depending on the driver in scylla-python3
(something similar to what @syuu1228 suggested here, bit with setting the shiv to expend on a known path, and rpath would be know in build time, and not need patchelf during install.sh)

@fruch fruch merged commit 86a280a into scylladb:master Jul 7, 2024
8 checks passed
fruch added a commit to fruch/scylladb that referenced this pull request Jul 7, 2024
* tools/cqlsh 73bdbeb0...86a280a1 (1):
  > remove cassandra from the shiv package

Ref: scylladb/scylla-cqlsh#96
denesb pushed a commit to scylladb/scylladb that referenced this pull request Jul 8, 2024
* tools/cqlsh 73bdbeb0...86a280a1 (1):
  > remove cassandra from the shiv package

Ref: scylladb/scylla-cqlsh#96

Closes #19558
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cqlsh fails to work on old distros (undefined symbol: __libc_siglongjmp, version GLIBC_PRIVATE)
4 participants