[TODO] Find better way to manage pip dependencies in our monorepo #29

Open
alextremblay opened this issue Aug 29, 2024 · 2 comments

alextremblay commented Aug 29, 2024

The tools repository is a monorepo containing a lot of uoft projects, all of which are pip-installable packages.
Some are python libraries meant to be imported, some are python CLIs meant to be installed by end users and executed, and many are both. Many of them also form a dependency graph (ie nautobot depends on aruba+ssh+bluecat, scripts depends on bluecat+ssh+librenms, and all of them depend on core)

For the monorepo and our python packaging / dependency management tooling, we have the following goals:

  1. We should be able to create custom forks of external projects we depend on (to fix bugs or add features) and be able to reliably reference those forks until the changes we make get upstreamed. We should be able to track these custom forks in separate git repos (forks of the original) and add these forked repos as submodules of our monorepo

  2. any developer wanting to contribute should be able to pull down the repo and run a command to set up a venv and install all of our packages as well as all their dependencies, pinned to specific versions (through use of a lock file)

    • all packages in the monorepo should automatically be installed in editable mode in this context
    • all custom fork submodules should be initialized automatically and installed in editable mode in this context
  3. any end user who wants to install any of our projects should be able to pull down the repo and pipx install projects/<name> and have it just work

    • for any projects that depend on other projects in the monorepo, pip/pipx should be instructed to install those dependencies by relative path (ie if a user installs the "aruba" project, and aruba depends on uoft_core, pip/pipx should install uoft_core from path "../core" relative to the aruba project folder, instead of installing uoft_core from PyPI)
    • for any projects that depend on custom forks, pip/pipx should do one of the following (haven't yet decided which is better):
      1. automatically initialize submodules of the monorepo and install those dependencies by relative path
      2. install those dependencies from git repo URLs (ex: if we have a custom fork of nautobot at https://github.com/utsc-networking/nautobot, pip/pipx should, while installing any package that depends on nautobot, interpret that dependency as nautobot @ git+https://github.com/utsc-networking/nautobot)
  4. users should be able to install any of our projects from any branch of our git repo by calling pipx install git+https://uoft-networking/tools@branch_name#subdirectory=projects/<name> and have it behave the same way as #3

  5. we should be able to build python wheels suitable for publishing to pypi, which reference dependencies by bare name instead of by relative path, and which reference custom-fork dependencies by git URL (as described in #3.2 above)

  6. optional bonus: I would love to be able to restructure our monorepo so that all code for all projects lives in a single source tree and gets automatically broken up into PEP420-style namespaced packages (ie instead of having a package called uoft_core whose code lives in projects/core/uoft_core and a package called uoft_aruba whose code lives in projects/aruba/uoft_aruba, I'd love to have a package called uoft.core whose code lives in src/uoft/core and a package called uoft.aruba whose code lives in src/uoft/aruba)
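
To make goal 6 a bit more concrete, here is a minimal sketch of how a PEP 420 namespace package behaves at import time. It builds a throwaway `src/uoft/{core,aruba}` tree in a temp directory, so nothing here reflects the real repo layout or build tooling; the `NAME` attribute and the function name are made up for the demo.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_namespace_import():
    """Build a throwaway src/uoft/{core,aruba} tree and import from it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        for name in ("core", "aruba"):
            pkg = src / "uoft" / name
            pkg.mkdir(parents=True)
            # Only the leaf packages get an __init__.py. src/uoft/ has none,
            # which is what makes "uoft" a PEP 420 namespace package.
            (pkg / "__init__.py").write_text(f"NAME = 'uoft.{name}'\n")

        sys.path.insert(0, str(src))
        try:
            importlib.invalidate_caches()
            core = importlib.import_module("uoft.core")
            aruba = importlib.import_module("uoft.aruba")
            namespace = sys.modules["uoft"]
            # A namespace package has no file of its own.
            return core.NAME, aruba.NAME, getattr(namespace, "__file__", None)
        finally:
            sys.path.remove(str(src))


if __name__ == "__main__":
    print(demo_namespace_import())
```

The import side is the easy part. Splitting one source tree into separately installable distributions is a packaging problem that this sketch does not touch.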

each of these is easy to accomplish, but accomplishing ALL of them together is extremely hard.

To accomplish #1 and #2, we use rye, which automatically installs all projects in projects/* and custom-forks/* in editable mode. The downside to this is that all developers on our monorepo must install every project and every dependency in their venv, even if they only want to work on one small project

To accomplish #3 and #4, we've structured all of our projects as PEP517-compliant python packages, with project metadata defined in a pyproject.toml that lives alongside each project. Each of these packages uses a PEP517 build backend called hatchling to tell pip/pipx how to install the package. Each project also has a pyproject.py hooks file that gets called by hatchling, which lets us automatically convert bare dependencies on monorepo projects into relative reference dependencies. For example, the aruba project's pyproject.toml declares that uoft_aruba depends on uoft_core. When pip/pipx installs uoft_aruba, projects/aruba/pyproject.py is triggered, and it automatically rewrites that uoft_core dependency into uoft_core @ ../core
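
The rewrite step described above can be sketched roughly like this. This is not the actual pyproject.py hook: the hatchling wiring is left out, `LOCAL_PROJECTS` and `rewrite_local_deps` are hypothetical names, only bare dependency names (no version specifiers or extras) are handled, and the sketch emits an absolute `file://` URI instead of the literal `../core`, which is an assumption about the output format.

```python
from pathlib import Path

# Hypothetical mapping of distribution name -> folder name under projects/.
LOCAL_PROJECTS = {"uoft_core": "core", "uoft_aruba": "aruba"}


def rewrite_local_deps(dependencies, project_dir, local_projects=LOCAL_PROJECTS):
    """Turn bare monorepo dependencies into direct references to sibling folders.

    Anything that is not an exact match for a known monorepo project
    (for example "requests>=2") is passed through unchanged.
    """
    projects_root = Path(project_dir).resolve().parent
    rewritten = []
    for dep in dependencies:
        name = dep.strip()
        if name in local_projects:
            target = projects_root / local_projects[name]
            rewritten.append(f"{name} @ {target.as_uri()}")
        else:
            rewritten.append(dep)
    return rewritten


if __name__ == "__main__":
    for line in rewrite_local_deps(["uoft_core", "requests>=2"], "projects/aruba"):
        print(line)
```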

To accomplish #5, I've tried to add logic to the pyproject.py hook files to not rewrite dependencies when building wheels, but it does not work and is difficult to debug, so more work is needed there

#6 WOULD be possible / accomplishable, but not in a way that's compatible with #1, #2, #3, or #4. The only way I can think of to accomplish that in a compatible way would be to replace hatchling, the pyproject.py files, and the per-project pyproject.toml files with a custom in-repo PEP517 build backend. The idea is complicated and still not fully formed in my mind, but it's there

@alextremblay

You may ask yourself: how do other projects accomplish these things?
Answer: they don't. Python monorepos are quite rare simply because it's very difficult to accomplish even half of these requirements in a single repo, let alone all of them.

As far as I am aware, we are breaking new ground here. I'm not aware of any existing python monorepo that accomplishes as many of these goals as we do. We may very well be on the cutting edge here, for better or for worse 🫠

@alextremblay

I think we can get a better outcome for #1-4 by switching from rye to uv

We need to make the switch anyway, as rye is being deprecated and rye's developer is recommending users move to uv. As of the latest release, uv has support for workspaces, which fits our monorepo multi-package requirements quite well

Also, uv has a dependency override mechanism which would be a huge help to us. Given the massive size of our monorepo's dependency graph, it's not uncommon for two of our packages to pull in transitive dependencies with conflicting version constraints. When that happens, rye lock completely crashes, and we're forced to deep dive and figure out how to untangle the mess, including sometimes forking a sub-sub-sub-dependency just to update its version constraints on the transitive dependency which caused the problem. It's a mess, and uv's dependency override mechanism may be the solution we've been looking for

alextremblay added a commit that referenced this issue Sep 6, 2024
until we find an elegant solution to #29, this is probably the best i can do
alextremblay added a commit that referenced this issue Sep 18, 2024
As part of an ongoing effort to improve package management in the monorepo, we're switching from hatchling to pdm-backend. The biggest reason is that hatchling does not support passing in config at runtime via environment variables or PEP 517 config-settings.
In order to satisfy all the requirements of issue #29, we have build hooks that run whenever one of our projects gets installed or built or put into a lock file. These build hooks rewrite dependency metadata to satisfy different requirements. Without the ability to pass in a goal or target to the build hooks, the hatchling-based ones perform some implicit "magic" to rewrite dependencies based on whatever environment markers they're able to suss out of the build process. I've never really been a big fan of that; I prefer explicit behaviour to implicit behaviour. One of the biggest issues with the implicit build hooks is that they would fail to perform as intended when building wheel files, and would produce wheel files that were broken

The new pdm-based build hooks work much better and are much easier to control. The only downside is that in order for pip-installing a folder in the monorepo to "just work", you have to add `--config-settings 'dependencies=local'` to the pip command. It no longer "just works" without that cli flag. :(

On the plus side, this new approach will allow us to create "local wheels" for distribution, ie a wheel file whose monorepo dependencies are pulled from adjacent wheel files. (ex: a bundle of .whl files for all projects in the monorepo, and if you `pip install uoft_aruba.*.whl`, pip will automatically install its dependency `uoft_core` from `./uoft_core.*.whl`)
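
A rough sketch of the "explicit over implicit" idea: the hook reads a `dependencies` key from the PEP 517 config settings and picks a rewrite mode from it, failing loudly on anything it doesn't recognise. This is not the real hook. The pdm-backend wiring is omitted, the function names are hypothetical, and `pypi` as the name of the default mode is an assumption (only `local` appears in the commit message above).

```python
from pathlib import Path

KNOWN_MODES = {"pypi", "local"}  # "pypi" is an assumed name for the default


def resolve_dependency_mode(config_settings):
    """Read the mode from config settings such as {"dependencies": "local"}."""
    mode = (config_settings or {}).get("dependencies", "pypi")
    if mode not in KNOWN_MODES:
        raise ValueError(f"unknown dependencies mode: {mode!r}")
    return mode


def apply_mode(dependencies, mode, projects_root, local_projects):
    """Leave bare names alone for "pypi"; point at sibling folders for "local".

    projects_root must be an absolute path, because Path.as_uri() requires one.
    """
    if mode == "pypi":
        return list(dependencies)
    root = Path(projects_root)
    return [
        f"{dep} @ {(root / local_projects[dep]).as_uri()}"
        if dep in local_projects
        else dep
        for dep in dependencies
    ]


if __name__ == "__main__":
    mode = resolve_dependency_mode({"dependencies": "local"})
    deps = ["uoft_core", "requests"]
    print(apply_mode(deps, mode, Path.cwd() / "projects", {"uoft_core": "core"}))
```

A third mode for the "local wheels" case could slot into the same dispatch, but its exact output format isn't decided here.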