Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

thought: automagically clean up dead packages on charm upgrade to work around a juju bug #1300

Closed
PietroPasotti opened this issue Jul 25, 2024 · 4 comments
Labels
feature New feature or request investigate Needs further investigation - do we want to do this?

Comments

@PietroPasotti
Copy link
Contributor

We recently had an issue where the otel library's mechanism to discover resource detectors raised an Exception after a charm refresh, as described in canonical/tempo-coordinator-k8s-operator#108.
That was caused by a juju/charmcraft issue (https://bugs.launchpad.net/juju/+bug/2058335) where, when upgrading packages, Juju leaves behind package metadata from the old versions (something to do with empty package dirs). This caused otel's discovery mechanism to incorrectly use the old resource detector, leading to an exception.

The underlying issue is that on charm upgrade, juju isn't as thorough as one would hope with replacing the old venv the charm runs with, with the new one. So on a dependency change between versions during a refresh, we have an issue.

I realize this is a rather specific issue due to the fact that the lib we're using (on opentelemetry sdk lib) relies on the presence of directories (instead of fiels) to determine what modules to attempt to load.
However, it is a fact that juju isn't correctly cleaning up the venv and that might cause more issues to more people.

We can patch this in each charm using the library so that, on upgrade, we do a cleanup. However, if the dependency is a pydeps-introduced one, this means we'd have to update every single charm using the charm lib (as is our case).
We can patch this in the charm lib itself (this is our current workaround), to do the cleanup on load or every time the core functionality is invoked, but that's quite hacky.
We can fix the root issue in juju (hopefully we will) but that won't help people stuck with older juju versions. We can fix that in ops, and refreshing the charms will bring the fix to charms running with older jujus too.
Hence this thought.

The implementation could be:

  • on ops.main:
    • if event is upgrade-charm:
      • for each package:
        • if package metadata shows no entry points
          • rm -rf the package root
@ca-scribner
Copy link

Happy for us to figure something out here, but we'd need to be more nuanced than checking for entry_points. Entry points are an optional thing that a package might include (advertisements to other packages about things they offer). But there's probably some other (lack of) metadata we can detect to know if something is dead

@PietroPasotti
Copy link
Contributor Author

an empty package dir is also a good clue

@dimaqq
Copy link
Contributor

dimaqq commented Aug 26, 2024

Upstream bug has a note: Fix committed for Juju 3.5.4

@dimaqq dimaqq added feature New feature or request investigate Needs further investigation - do we want to do this? labels Aug 26, 2024
@benhoyt
Copy link
Collaborator

benhoyt commented Sep 27, 2024

This is fixed in Juju 3.5, and it seems like we should fix there and push on upstream anyway, rather than work around in Ops. Closing.

@benhoyt benhoyt closed this as not planned Won't fix, can't repro, duplicate, stale Sep 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request investigate Needs further investigation - do we want to do this?
Projects
None yet
Development

No branches or pull requests

4 participants