Description
Sometimes you want to use a custom wheel that can't be represented by Python packaging constraints, or you simply want to use a particular whl at a particular location for your own testing, development, or other reasons.
The particular case I have in mind is PyTorch and building PyTorch from source. Doing this presents some problems:
- There are (at least) 5 different distributions of PyTorch for different accelerators (CUDA 11.8, CUDA 12.6, CUDA 12.8, ROCm 6.3, and CPU). Unfortunately, environment markers can't represent these conditions, so it's not possible to express which of these `torch` should resolve to in a requirements or pylock file.
- The above have public URLs, but using a local file is also desirable in some cases:
  - In JAX: some tests build a wheel (using Bazel) and then use that wheel in other tests (also run by Bazel).
  - In PyTorch/XLA: they want to build PyTorch manually, then use that build for the `torch` dependency.
  - I've seen various Slack posts from people building torch (or other C++-heavy ML projects) manually.
Local files would also be helpful for our own testing -- we could generate exactly what we needed without incurring the overhead of remote fetching.
To make this work, we basically need to mix additional settings into the hub's routing aliases. Ultimately, we want to generate something like this in the hub:
```starlark
# File: @pypi//torch:BUILD.bazel
alias(
    name = "torch",
    actual = select({
        "@user//:is_torch_1": "@pypi_torch_cuda_11.8//:pkg",
        "@user//:is_torch_2": "@pypi_torch_cpu//:pkg",
        "//conditions:default": ":_default",
    }),
)

alias(
    name = "_default",
    actual = <select that is generated today>,
)
```
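For illustration (the `train` target below is made up), consumers would keep depending on the hub label exactly as they do today; the select() above resolves to the concrete wheel repo at analysis time:

```starlark
load("@rules_python//python:defs.bzl", "py_binary")

py_binary(
    name = "train",
    srcs = ["train.py"],
    # Still the plain hub label; the override routing is invisible here.
    deps = ["@pypi//torch"],
)
```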
I'm not entirely sure how to end up there, though. I'm thinking of a `pip.override()` API that takes the conditions and their destinations:
```starlark
pip.parse(
    hub_name = "my_pypi",
    requirements = "requirements.txt",
)

pip.override(
    hub_name = "my_pypi",
    package = "torch",
    config_setting = ["@user//:is_torch_cuda_11.8"],
    urls = ["https://torch.com/torch-cuda-11.8.whl"],
)

pip.override(
    hub_name = "my_pypi",
    package = "torch",
    config_setting = ["@user//:is_torch_cpu"],
    wheel = "@user//:torch-cpu.whl",
)
```
Under the hood, each `wheel`/`urls` value turns into a whl_library-compatible repo (i.e., one that downloads and extracts the wheel). The config_settings are fed into whatever generates the hub's select() routing.
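Roughly, each such repo only needs to fetch the wheel and lay it out; a very rough sketch of that shape (`whl_archive` and its attributes are made up here, and the real whl_library does considerably more: metadata, entry points, dependency wiring):

```starlark
# Rough sketch only -- not an existing rule.
def _whl_archive_impl(rctx):
    # Wheels are zip archives, so an explicit type is needed for .whl URLs.
    rctx.download_and_extract(url = rctx.attr.urls, type = "zip")
    rctx.file("BUILD.bazel", """\
load("@rules_python//python:defs.bzl", "py_library")

py_library(
    name = "pkg",
    srcs = glob(["**/*.py"]),
    data = glob(["**/*"], exclude = ["**/*.py"]),
    imports = ["."],
    visibility = ["//visibility:public"],
)
""")

whl_archive = repository_rule(
    implementation = _whl_archive_impl,
    attrs = {"urls": attr.string_list()},
)
```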
Alternative: don't do the repo creation part. Just plumb through the config condition and the repo name. Forcing users to create the repo doesn't feel ideal. We'd probably want to provide some sort of helper for that (but not whl_library directly -- its API is full of internal details).
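For comparison, a hedged sketch of what that alternative surface might look like, where the user points the override at a target in a repo they manage themselves (`target` is an invented attribute name):

```starlark
# Hypothetical alternative: pip.override does no repo creation itself.
pip.override(
    hub_name = "my_pypi",
    package = "torch",
    config_setting = ["@user//:is_torch_cpu"],
    target = "@my_prebuilt_torch//:pkg",  # user-managed, whl_library-compatible repo
)
```

The helper hinted at above would then be whatever produces `@my_prebuilt_torch` in a whl_library-compatible layout without exposing whl_library's internal attributes.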
There are two other pieces of the system where I'm not sure how the interaction will work:
(1) `experimental_index_url`. IIUC, this works by traversing the simpleapi metadata to find a whl that satisfies the requirement. If we are providing our own wheels separately, how do those fit into that process?
For example, maybe the simpleapi doesn't find a compatible wheel, but that's expected because we're providing our own wheel?
Or: if we know we're going to use our own wheel, then traversing through the index (for that package) is wasted effort.
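To make the question concrete, one purely hypothetical shape this could take is an attribute on the override that tells the hub not to traverse the index for that package (`skip_index` is not an existing attribute; it only illustrates the idea):

```starlark
pip.parse(
    hub_name = "my_pypi",
    requirements = "requirements.txt",
    experimental_index_url = "https://pypi.org/simple",
)

pip.override(
    hub_name = "my_pypi",
    package = "torch",
    config_setting = ["@user//:is_torch_cpu"],
    wheel = "@user//:torch-cpu.whl",
    skip_index = True,  # hypothetical: never resolve torch via the index
)
```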
(2) To support conditions that Python packaging can't express, an idea we had was having multiple requirements.txt files with a select() layer that chooses between them, e.g.:
```starlark
pip.parse(
    hub_name = "bla",
    requirements = "requirements-cpu.txt",
    condition = "@//:is_accelerator_cpu",
)

pip.parse(
    hub_name = "bla",
    requirements = "requirements-cuda.txt",
    condition = "@//:is_accelerator_cuda",
)
```
and `@bla//somepkg` routes to `@bla_X_somepkg` or `@bla_Y_somepkg`.
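In other words, the hub would generate something like this (the per-sub-hub repo names here are assumptions):

```starlark
# Sketch of the hub-level routing for approach (2).
alias(
    name = "somepkg",
    actual = select({
        "@//:is_accelerator_cpu": "@bla_cpu_somepkg//:pkg",
        "@//:is_accelerator_cuda": "@bla_cuda_somepkg//:pkg",
    }),
)
```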
Which looks pretty similar to my proposal above, just at a different level of granularity.
cc @aignas