Package source priority (HPC cluster use case) #10156
There was additional discussion in #8606, but I'm not moving that over for my own sanity. :) |
Other relevant information in this comment: We are not trying to provide a "secure" version and exclude installing other versions from PyPI. We are trying to provide an "optimized" and "working" version which PyPI fails to do in some instances. Our users install everything from PyPI anyway. We are not trying to make it more secure than PyPI, we are trying to make it more optimal, and to repair broken binary packages provided on PyPI. In that use case, shadowing PyPI in some circumstances is precisely what we want and need to do. Yes, I acknowledge that PyPI is subject to supply chain attack. But that's not what we are trying to fix. |
IIUC, your use case is "we have wheels that we want to use preferentially over what's on PyPI for the same versions of the package". Since you're building these wheels yourself, can you add local version identifiers to those built wheels? Specifically those would look like |
I am not familiar with those, but carry on? Would these involve just renaming the wheel files? Or do we need to inject some metadata inside of them? We have over 6000 wheel files at the moment, so rebuilding them is a large endeavour. Assuming we can, how do these impact dependency and version resolution in pip? |
I guess a way to better explain why your use case broke would be -- substitute "PyPI" for "target company's package index" and hopefully you see that it models the intent of a supply chain attack.
Unless I'm missing something, you'll need to do 2 things -- rename the wheels and update the version in the METADATA file inside the wheel (that's
pip will preferentially use local versions of packages. If there's If (somehow) that doesn't happen, that's a bug in pip that we'd need to fix. :)

To be clear, it's not that anyone thinks that your use case isn't important / worth catering to somehow, but it was extremely unclear what you're trying to do and the story came together in pieces (which can all be put together now, hopefully).

That said, it is very likely that you'll need to change something to be compatible with the new way that things work -- based on ComputeCanada/software-stack#80 being filed, I'm gonna be optimistic and guess that you'd be open to changes on this front to keep things working without needing to fork things and add maintenance workload for yourself. :)

[1] The file's format is documented at https://packaging.python.org/specifications/core-metadata/ and how-to-parse example at https://github.com/pradyunsg/installer/blob/35ce9141733f1d3fcedfd5e19ea5e34d732fe822/src/installer/utils.py#L69. |
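For readers following along, here is a minimal sketch of the METADATA half of the local-version approach described above. It only rewrites the `Version` header in the text of a METADATA file. The function name `set_local_version`, the label `computecanada`, and the sample metadata are illustrative assumptions, not anything pip or the wheel tooling provides. A real wheel would also need its filename and its `.dist-info` directory renamed and its RECORD file updated, which this sketch does not attempt.

```python
def set_local_version(metadata_text: str, local_label: str) -> str:
    """Return METADATA text with '+<local_label>' appended to the Version header.

    Hypothetical helper for illustration only; it does not touch the wheel
    archive, the .dist-info directory name, or RECORD.
    """
    lines = metadata_text.split("\n")
    for i, line in enumerate(lines):
        if line == "":
            # Headers end at the first blank line; the description follows.
            break
        # "Metadata-Version:" does not match, since it does not start with "Version:".
        if line.startswith("Version:"):
            version = line[len("Version:"):].strip()
            if "+" in version:
                raise ValueError(f"{version!r} already has a local version segment")
            lines[i] = f"Version: {version}+{local_label}"
            return "\n".join(lines)
    raise ValueError("no Version header found")


# Made-up METADATA content, used only to exercise the helper.
sample = "Metadata-Version: 2.1\nName: example-pkg\nVersion: 1.0.0\n\nLong description."
updated = set_local_version(sample, "computecanada")
print(updated)
```

The line-based rewrite is deliberate: it leaves every other header and the description byte-for-byte unchanged, which a parse-and-reserialize round trip might not.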
Another option is wheel build tags -- which might be an even better answer. If a wheel has a build tag, that wheel is preferred over a wheel that doesn't. That should only require renaming the wheel files. You can read more about the filename here: https://packaging.python.org/specifications/binary-distribution-format/#file-name-convention |
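As a rough illustration of the renaming this would involve, the sketch below inserts a build tag into a wheel filename, following the `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl` convention from the linked specification. The helper name `add_build_tag` and the example filename are hypothetical, and the check on the tag (a leading digit followed by letters or digits) is a conservative assumption rather than the full rule.

```python
import re


def add_build_tag(wheel_filename: str, build_tag: str) -> str:
    """Return the wheel filename with a build tag inserted after the version.

    Hypothetical helper for illustration; it computes a new name and does
    not rename anything on disk.
    """
    # The spec requires a build tag to start with a digit; the rest of this
    # pattern is a conservative assumption for the sketch.
    if not re.fullmatch(r"[0-9][A-Za-z0-9]*", build_tag):
        raise ValueError(f"invalid build tag: {build_tag!r}")
    if not wheel_filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {wheel_filename!r}")

    parts = wheel_filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        raise ValueError("wheel already has a build tag")
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename: {wheel_filename!r}")

    name, version, python_tag, abi_tag, platform_tag = parts
    return f"{name}-{version}-{build_tag}-{python_tag}-{abi_tag}-{platform_tag}.whl"


renamed = add_build_tag("example_pkg-1.0.0-cp39-cp39-linux_x86_64.whl", "1computecanada")
print(renamed)
```

Applied over a directory of existing wheels, a helper like this would only produce new names, so no rebuild is involved.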
From #8606 (comment):
Where's this SO answer/comment? |
I am hoping we don't need to hack the wheels manually (i.e. unzip them and
Even with |
|
Thanks. I'll need to do some testing of how it actually behaves in dependency and version resolution with the latest pip. The local version is more tempting because we could tag them with a meaningful word (computecanada in our case), rather than just a number. It would be useful to have a description of how dependency resolution is expected to behave, because unfortunately, history has shown that we can run tests and have something that works, just to have the way it works changed in the next version of pip. In particular, if I have, locally:
and PyPI has
and I run |
Build tags can be 1computecanada or something like that. It's a digit + (optional) string. Wheels on PyPI can't have build tags, IIRC. Ignoring that, it's gonna be the local version, followed by decreasing build tags, followed by remote 1.0.0, followed by local 1.0.0, I think. Nonetheless, I think the local version will be preferred over build tags. The order of preference is specified at: https://github.com/pypa/pip/blob/main/src/pip/_internal/index/package_finder.py#L530 |
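To make the ordering described above concrete, here is a deliberately simplified model of it, not pip's actual sort key. It assumes plain `X.Y.Z` versions with an optional `+local` suffix, ranks a local version above the same version without one, and then uses the build tag as a tie-breaker (a missing tag sorts lowest). Platform tag priority, hashes, yanked files, and index order are all ignored, and `preference_key` and the filenames are hypothetical.

```python
import re


def preference_key(wheel_filename: str):
    """Simplified sort key: higher keys are preferred. Not pip's real logic."""
    parts = wheel_filename[: -len(".whl")].split("-")
    version = parts[1]
    build = parts[2] if len(parts) == 6 else ""

    # Split "1.0.0+computecanada" into the public part and the local label.
    public, _, local = version.partition("+")
    release = tuple(int(piece) for piece in public.split("."))

    if build:
        # A build tag is a leading number plus an optional string suffix.
        match = re.match(r"(\d+)(.*)", build)
        build_key = (int(match.group(1)), match.group(2))
    else:
        build_key = ()  # an absent build tag sorts below any present one

    return (release, local != "", local, build_key)


candidates = [
    "example_pkg-1.0.0-cp39-cp39-linux_x86_64.whl",
    "example_pkg-1.0.0-1computecanada-cp39-cp39-linux_x86_64.whl",
    "example_pkg-1.0.0+computecanada-cp39-cp39-linux_x86_64.whl",
    "example_pkg-1.0.0-2computecanada-cp39-cp39-linux_x86_64.whl",
]
ranked = sorted(candidates, key=preference_key, reverse=True)
for filename in ranked:
    print(filename)
```

Under these assumptions the `+computecanada` wheel comes first, then build tag 2, then build tag 1, then the untagged wheel. Two untagged 1.0.0 wheels from different indexes would get identical keys here, which is the tie that a local version or build tag is meant to break.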
I believe build tags are possible on PyPI, although practically nobody uses them. |
I have confirmed that either build tags or local version specifiers will work for this use case, with a slight preference for local version specifiers. Thanks for pointing out these options! I know I annoyed some people, and I apologize for that. I realize we come from a very different point of view, and I think both sides misunderstood the other side's point of view. I suggest (and I am volunteering to write the draft) to write a use case description/blog article/documentation page about this, presenting my understanding of both the security use case and the priority use case, how they differ, and which solution is valid for both, hopefully with input from the devs if I get something wrong. I saw many issues (not just from me) being opened about what users misunderstood as setting an index that has priority over another one, and I genuinely think I can contribute to making it clearer for everyone. Are the devs interested in such a contribution, and if so, what form should it take? |
Color me very interested! The whole "dependency confusion / package source priority" situation has suffered from fairly poor communication -- there seems to be no single place describing what the users can do and what mechanisms/tools they have at hand to deal with the situation (which leads to them asking for a "would work for me" approach -> "what do you mean it's unmaintainable" -- basically, generally a bad experience for everyone involved). In other words, I think we would benefit a lot from this being documented clearly in pip's documentation. :) What we do to get there though, depends on how you'd prefer to do it. I personally prefer just having you write a blog post somewhere -- it avoids the overhead of needing to go through multiple rounds of review before adding something into pip's documentation. :)
|
Ok. I posted this here: Feedback is welcome, especially if I got things wrong. Hopefully this post can be useful to others in order to untangle some misconceptions. |
After meandering into the way |
By two packages with the same label, do you mean e.g. there are two pages for package |
I meant if there are two entries for a given file. Or more generally, if there are two entries which have the same priority as far as pip goes, i.e. the behavior of which one gets installed is undefined and could change over time. That is ubiquitous if you build wheels yourself and don't use a build-tag or local version. It will definitely yield wheels that have the same name as the versions available on PyPI. |
Sounds reasonable to me. We can also try to squelch the warning when the files have identical hashes (obtained from the index page or directly hashing the file on the file system). I’d very much welcome a PR on this. |
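The "directly hashing the file on the file system" half of that idea could look roughly like the sketch below, which compares two files by SHA-256 digest. This is an assumption about how such a check might be written, not code from pip; `sha256_of` and `same_content` are hypothetical names, and the temporary files stand in for two same-named wheels from different sources.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(first: Path, second: Path) -> bool:
    """True when both files have the same SHA-256 digest."""
    return sha256_of(first) == sha256_of(second)


# Stand-in files: two identical copies and one that differs.
with tempfile.TemporaryDirectory() as tmp:
    local_copy = Path(tmp) / "local.whl"
    mirror_copy = Path(tmp) / "mirror.whl"
    other_build = Path(tmp) / "other.whl"
    local_copy.write_bytes(b"same bytes")
    mirror_copy.write_bytes(b"same bytes")
    other_build.write_bytes(b"different bytes")

    identical = same_content(local_copy, mirror_copy)
    different = same_content(local_copy, other_build)

print(identical, different)
```

When two candidates have identical digests, which one gets installed makes no practical difference, so a warning about the ambiguity could reasonably be skipped in that case.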
Closing this out, since this seems to have been resolved. |
Originally posted by @mboisson in #8606 (comment)