Skip to content

Reproducible Python virtual environment using uv and nix #141

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 11, 2025

Conversation

yvan-sraka
Copy link
Contributor

This PR tries to achieve the same goal as #1387, but here the system dependencies needed by Cargo are quite straightforward. I might want to improve it later with rust-overlay to make it aware of the ./rust-toolchain config, or even try to package the whole thing with something like Crane!

@yvan-sraka yvan-sraka requested a review from torymur January 22, 2025 11:13
@yvan-sraka yvan-sraka self-assigned this Jan 22, 2025
@yvan-sraka yvan-sraka force-pushed the reproducibility-with-uv-and-nix branch from 215b718 to d9a8977 Compare January 22, 2025 11:15
@yvan-sraka yvan-sraka changed the title [draft] add direnv, nix configs, and uv lock file Reproducible Python virtual environment using uv and nix Jan 22, 2025
@yvan-sraka yvan-sraka marked this pull request as draft January 22, 2025 11:17
@yvan-sraka yvan-sraka force-pushed the reproducibility-with-uv-and-nix branch 2 times, most recently from 0204e2f to 6f615c0 Compare February 4, 2025 23:40
@yvan-sraka yvan-sraka requested a review from rlouf February 4, 2025 23:41
@yvan-sraka yvan-sraka marked this pull request as ready for review February 4, 2025 23:41
@yvan-sraka yvan-sraka force-pushed the reproducibility-with-uv-and-nix branch 3 times, most recently from 7aaa432 to a53f40c Compare February 5, 2025 10:05
Copy link
Contributor

@torymur torymur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, it's difficult for me to have a reasonably good take on this, since I don't use any of that.

My random thoughts:

  1. this isn't a python project, we're generating basic python bindings, for which in python we're testing only existence of the interfaces with minimal set of dependencies
  2. do we want to support this? if so, would be nice to know when it would be expected to be updated, considering that any lock files are usually require maintenance and it's always a certain compromise between debugging why CI is passing alright, but user brings you issues with dependencies
  3. do we want to have all this in the root of the rust repo? maybe there is a way to pack it somehow?

But I wouldn't want to insist on any particular resolution here, we might want to support this specific user group in this project? Or maybe it would make sense to have it consistent with outlines as well? Maybe there are other good angles to look at it.

@rlouf

@rlouf
Copy link
Member

rlouf commented Feb 5, 2025

I agree with @torymur, everything that's related to Python bindings (including tests) should go in the bindings/python directory, like HF did with its tokenizers library for instance.

I'm usually not a huge fan of lock files, but it might make sense in this context where the library is imported by other libraries that require stability? But I'm not an expert.

@yvan-sraka
Copy link
Contributor Author

yvan-sraka commented Feb 5, 2025

Lock files are essential to ensure that contributors and CI/CD share the same environment. Otherwise, many reproducibility issues can arise, and I’ve already been struggling with such problems too often since I started contributing to Outlines... I'm very open to discussing alternative technical solutions to uv and/or nix, but I strongly believe there is an engineering challenge here that needs to be addressed!

For reference, I’m quoting below the argument I previously shared with @torymur:

nix is just used to pin system dependencies that are not described in uv.lock file! I don't use nix to build the library itself, because unless we need some complex cross-compilation workflow I guess that's a bit overkill... but what's great is that we can now easily (I can do that in a follow-up PR) make CI/CD (that's really useful in release workflows) always use the exact same venv and Rust version (and basically the same system dependencies, all the way through, pinning e.g. glibc) than the one described in the repo (which is also really nice when we've to debug regressions and git bisect the history)!

TBH, I think it's better to have potentially unmaintained things in a public repo that external contributors could update rather than having nothing at all. It's feasible to hide thoses files in a subfolder, but it's pretty common to find them in the source root. It's also simple to update programmatically. Same for uv, which is kind of becoming a standard in the Python ecosystem, AFAIU. It's not just for my personal use (though it will definitely make my life much less painful), but rather for use in CI/CD, making debugging much easier. It's really just like a lock file for system dependencies, and lock files are pretty essential for reproducing something that happened on another machine!

I also looked a bit on the internet, and most of the top Rust repositories include these types of files:

An alternative approach could be maintaining the .nix files in a separate repository, similar to https://github.com/huggingface/text-generation-inference-nix, though that would be a bit more cumbersome...

I'm happy to split this PR into separate changes for nix and uv or explore other technical alternatives. IMHO, this is an essential step in making the repository production-ready!

@rlouf
Copy link
Member

rlouf commented Feb 5, 2025

You have a much more informed opinion than I do, so your call. The remark on moving everything related to Python to bindings/python is still valid, but can be done in a separate PR.

@yvan-sraka
Copy link
Contributor Author

yvan-sraka commented Feb 5, 2025

Yes, will do! And indeed, I understand that there is always some pushback when introducing a new fancy thing into a repository’s technical stack, as it adds another brick to maintain... So, I want to really emphasize that this is entirely optional and would never be required to work with the library, just like how we also have a Conda environment.yml file!

@torymur
Copy link
Contributor

torymur commented Feb 6, 2025

I closed this newly created issue #170 since it was a duplicate of this #153

everything that's related to Python bindings (including tests) should go in the bindings/python directory, like HF did with its tokenizers library for instance.

Exactly that was done in that branch mentioned in #153, where majority of the work was implemented, but as explained there as well, there are some things which we need to do first. Moving things are relatively easy (though still require code adaptations, but these I consider are for the best, since they separate tied logic), what is challenging, however, and requires a lot of fidgeting is making sure that everything which was moved still works in tests, benchmarks and all of the workflows.


To the topic.

Rust lock files when Python artifacts being generated has its merit (meaning properly separated code in non-existent yet bindings/python), although it will require certain adaptations in the workflows. But for a project at this stage, where Rust library logic is still somewhat tied together with artifacts generation, even these Rust lock files more troubling rather than helping: #139 (comment)

As for Python lock files in a project that simply generates Python bindings, I'm not fully convinced of these, but what's certain is that #153 work has to be done first

@rlouf
Copy link
Member

rlouf commented Feb 21, 2025

I'm in favor of this change if we move the python bindings in a bindings/python directory and the files are located there.

@yvan-sraka yvan-sraka force-pushed the reproducibility-with-uv-and-nix branch from a53f40c to b0a32de Compare February 25, 2025 13:50
@yvan-sraka yvan-sraka requested a review from 414owen February 25, 2025 13:51
@yvan-sraka yvan-sraka force-pushed the reproducibility-with-uv-and-nix branch from b0a32de to 7a3beb7 Compare February 25, 2025 17:58
@rlouf rlouf merged commit f08d62a into main Mar 11, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants