forked from python/peps
PEP 752: Address feedback, round 5 (python#4018)
Co-authored-by: Carol Willing <[email protected]>
1 parent 76cfa87, commit 13426bf
Showing 1 changed file with 213 additions and 4 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -329,10 +329,15 @@ Representatives from the following organizations have expressed support for | |
this PEP (with a link to the discussion): | ||
|
||
* `Apache Airflow <https://github.com/apache/airflow/discussions/41657#discussioncomment-10412999>`__ | ||
(`expanded <https://discuss.python.org/t/63191/75>`__) | ||
* `pytest <https://discuss.python.org/t/63192/68>`__ | ||
* `Typeshed <https://discuss.python.org/t/1609/37>`__ | ||
* `Project Jupyter <https://discuss.python.org/t/61227/16>`__ | ||
(`expanded <https://discuss.python.org/t/61227/48>`__) | ||
* `Microsoft <https://discuss.python.org/t/63191/40>`__ | ||
* `Sentry <https://discuss.python.org/t/63192/67>`__ | ||
(in favor of the NuGet approach over others but not negatively impacted | ||
by the current lack of capability) | ||
* `Datadog <https://discuss.python.org/t/63191/53>`__ | ||
|
||
Backwards Compatibility | ||
|
@@ -344,6 +349,8 @@ chosen to signal a shared purpose with a prefix like `typeshed has done`__. | |
|
||
__ https://github.com/python/typeshed/issues/2491#issuecomment-578456045 | ||
|
||
.. _security-implications: | ||
|
||
Security Implications | ||
===================== | ||
|
||
|
@@ -368,6 +375,19 @@ None at this time. | |
Rejected Ideas | ||
============== | ||
|
||
.. _artifact-level-association: | ||
|
||
Artifact-level Namespace Association | ||
------------------------------------ | ||
|
||
An earlier version of this PEP proposed that metadata be associated with | ||
individual artifacts at the point of release. This was rejected because it | ||
had the potential to cause confusion for users who would expect the namespace | ||
authorization guarantee to be at the project level based on current grants | ||
rather than the time at which a given release occurred. | ||
|
||
.. _organization-scoping: | ||
|
||
Organization Scoping | ||
-------------------- | ||
|
||
|
@@ -398,6 +418,8 @@ packages released with the scoping would be incompatible with older tools and | |
would cause confusion for users along with frustration from maintainers having | ||
to triage such complaints. | ||
|
||
.. _dedicated-repositories: | ||
|
||
Encourage Dedicated Package Repositories | ||
---------------------------------------- | ||
|
||
|
@@ -422,6 +444,191 @@ and ``Y``. If each repository has both packages but one is malicious on ``X`` | |
and the other is malicious on ``Y`` then the user would be unable to satisfy | ||
their requirements without encountering a malicious package. | ||
|
||
.. _provenance-assertions: | ||
|
||
Exclusive Reliance on Provenance Assertions | ||
------------------------------------------- | ||
|
||
The idea here [5]_ would be to design a general-purpose way for clients to make | ||
provenance assertions to verify certain properties of dependencies, each with | ||
custom syntax. Some examples: | ||
|
||
* The package was uploaded by a specific organization or user name e.g. | ||
``pip install "azure-loganalytics from microsoft"`` | ||
* The package was uploaded by an owner of a specific domain name e.g. | ||
``pip install "google-cloud-compute from cloud.google.com"`` | ||
* The package was uploaded by a user with a specific email address e.g. | ||
``pip install "aws-cdk-lib from [email protected]"`` | ||
* The package matching a namespace was uploaded by an authorized party (this | ||
PEP) | ||
|
||
A fundamental downside is that it doesn't play well with multiple | ||
repositories. For example, say a user wants the ``azure-loganalytics`` package | ||
and wants to ensure it comes from the organization named ``microsoft``. If | ||
Microsoft's organization name on PyPI is ``microsoft`` then a package manager | ||
that defaults to PyPI could accept ``azure-loganalytics from microsoft``. | ||
However, if multiple repositories are used for dependency resolution then the | ||
user would have to specify the repository as part of the definition which is | ||
unrealistic for reasons outlined in the dedicated section on | ||
`asserting package owner names <asserting-package-owner-names_>`_. | ||
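To make the multi-repository problem concrete, here is a minimal sketch built on entirely hypothetical data. The mirror URL, the ``OWNERS`` mapping and the ``repositories_satisfying`` helper are assumptions for illustration and do not correspond to any real index API. It shows that an owner name alone stops identifying a single source once more than one repository is configured:

```python
# Hypothetical data: owner names are scoped to each repository, not global.
# Neither the mirror URL nor this mapping reflects a real index API.
OWNERS = {
    "https://pypi.org/simple": {"azure-loganalytics": "microsoft"},
    "https://mirror.example/simple": {"azure-loganalytics": "microsoft"},
}


def repositories_satisfying(package, owner, owners_by_repo):
    """Return every repository on which `package` is owned by `owner`."""
    return sorted(
        repo
        for repo, packages in owners_by_repo.items()
        if packages.get(package) == owner
    )


# "azure-loganalytics from microsoft" matches on both repositories, so the
# assertion cannot tell the client which repository's "microsoft" is meant.
matches = repositories_satisfying("azure-loganalytics", "microsoft", OWNERS)
```

Because anyone could register the name ``microsoft`` on a repository they control, the assertion is only meaningful when paired with a specific repository.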
|
||
Another general weakness with this approach is that a user attempting to | ||
perform a simple ``pip install`` without special syntax, which is the most | ||
common scenario, would already be vulnerable to malicious packages. In order to | ||
overcome this there would have to be some default trust mechanism, which in all | ||
cases would impose certain UX or resolver logic upon every tool. | ||
|
||
For example, package managers could be changed such that the first time a | ||
package is installed the user would receive a confirmation prompt displaying | ||
the provenance details. This would be very confusing and noisy, especially for | ||
new users, and would be a breaking UX change for existing users. It also would | ||
not work for many methods of installation, such as running in CI or installing | ||
from a requirements file, where the user could receive hundreds of | ||
prompts. | ||
|
||
One solution to make this less disruptive for users would be to manually | ||
maintain a list of trustworthy details (organization/user names, domain names, | ||
email addresses, etc.). This could be discoverable by packages providing | ||
`entry points`__ which package managers could learn to detect and which | ||
corporate environments could install by default. This has the major downside of | ||
not providing automatic guarantees which would limit the usefulness for the | ||
average user who is more likely to be affected. | ||
|
||
__ https://packaging.python.org/en/latest/specifications/entry-points/ | ||
|
||
There are two ideas that could be used to provide automatic protection, which | ||
could be based on :pep:`740` attestations or a new mechanism for utilizing | ||
third-party APIs that host the metadata. | ||
|
||
First, each repository could offer a service that verifies the owner of a | ||
package using whatever criteria they deem appropriate. After verification, the | ||
repository would add the details to a dedicated package that would be installed | ||
by default. | ||
|
||
This would require dedicated maintenance which is unrealistic for most | ||
repositories, even PyPI currently. It's unclear how community projects without | ||
the resources for something like a domain name would be supported. Critically, | ||
this solution would cause extra confusion for users in the case of multiple | ||
repositories as each might have their own verification processes, attestation | ||
criteria and default package containing the verified details. It would be | ||
challenging to get community buy-in of every package manager to be aware of | ||
each repository's chosen verification package and install that by default | ||
before dependency resolution. | ||
|
||
Should digital attestations become the chosen mechanism, a downside is that | ||
implementing this in custom package repositories would require a significant | ||
amount of work. In the case of PyPI, the prerequisite work on | ||
`Trusted Publishing`__ and then the `PEP 740 implementation`__ itself took the | ||
equivalent of one year of full-time engineering work, which was paid for by a | ||
corporate sponsor. Other organizations are unlikely to implement similar work | ||
because simpler mechanisms make it possible to implement reproducible builds. | ||
When everything is internally managed, attestations are also not very useful. | ||
Community projects are unlikely to undertake this effort because they would | ||
likely lack the resources to maintain the necessary infrastructure themselves | ||
and moreover there are significant downsides to | ||
`encouraging dedicated package repositories <dedicated-repositories_>`_. | ||
|
||
__ https://blog.pypi.org/posts/2023-04-20-introducing-trusted-publishers/#acknowledgements | ||
__ https://blog.trailofbits.com/2024/10/01/securing-the-software-supply-chain-with-the-slsa-framework/ | ||
|
||
The other idea would be to host provenance assertions externally and push more | ||
logic client-side. A possible implementation might be to specify a provenance | ||
API that could be hosted at a designated relative path like | ||
``/provenance``. Projects on each repository could then be configured to point | ||
to a particular domain and this information would be passed on to clients | ||
during installation. | ||
|
||
While this distributed approach does impose less of an infrastructure burden on | ||
repositories, it has the potential to be a security risk. If an external | ||
provenance API is compromised, it could lead to malicious packages being | ||
installed. If an external API is down, it could lead to package installation | ||
failing or package managers might only emit warnings in which case there is no | ||
security benefit. | ||
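The availability trade-off above can be sketched as a small decision function. This is an illustration only: the ``Outcome`` enum, the ``decide`` helper and its parameters are hypothetical names, and nothing here describes the behavior of an existing package manager.

```python
from enum import Enum


class Outcome(Enum):
    INSTALL = "install"
    INSTALL_WITH_WARNING = "install with warning"
    FAIL = "fail"


def decide(api_reachable, assertion_holds, fail_closed):
    """Hypothetical client policy for an external provenance API."""
    if not api_reachable:
        # Fail-closed: an outage blocks installation entirely.
        # Fail-open: installation proceeds, so nothing was actually verified.
        return Outcome.FAIL if fail_closed else Outcome.INSTALL_WITH_WARNING
    # The API answered; note that a compromised API could answer falsely.
    return Outcome.INSTALL if assertion_holds else Outcome.FAIL
```

With ``fail_closed=False`` an attacker who can make the API unreachable gets the same result as a passing check plus a warning, which is the "no security benefit" case described above.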
|
||
Additionally, this disadvantages community projects that do not have the | ||
resources to maintain such an API. They could use free hosting solutions such | ||
as what many do for documentation but they do not technically own the | ||
infrastructure and they would be compromised should the generous offerings be | ||
restricted. | ||
|
||
Finally, while neither of these theoretical approaches is prescriptive yet, | ||
both imply assertions at the artifact level, which was already a | ||
`rejected idea <artifact-level-association_>`_. | ||
|
||
.. _asserting-package-owner-names: | ||
|
||
Asserting Package Owner Names | ||
----------------------------- | ||
|
||
This is about asserting that the package came from a specific organization or | ||
user name. It's quite similar to the | ||
`organization scoping <organization-scoping_>`_ idea except that a flat | ||
namespace is the base assumption. | ||
|
||
This would require modifications to the :pep:`JSON API <691>` of each supported | ||
repository and could be implemented by exposing extra metadata or as proper | ||
`provenance assertions <provenance-assertions_>`_. | ||
|
||
As with the organization scoping idea, a new `syntax`__ would be required like | ||
``microsoft::azure-loganalytics`` where ``microsoft`` is the organization and | ||
``azure-loganalytics`` is the package. Although this plays well with the | ||
existing flat namespace in comparison, it retains the critical downside of | ||
being a disruption for the community with the number of changes required. | ||
|
||
__ https://packaging.python.org/en/latest/specifications/dependency-specifiers/ | ||
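As a rough sketch of what tools would have to handle, the snippet below splits the ``owner::package`` form into its two parts. The syntax is the rejected proposal described above, and the ``OwnedRequirement`` type and ``parse_owned_requirement`` helper are hypothetical names invented for this example. It deliberately ignores extras, version specifiers and markers, which a real implementation would also need to handle.

```python
from typing import NamedTuple, Optional


class OwnedRequirement(NamedTuple):
    owner: Optional[str]  # None when no owner assertion is present
    name: str


def parse_owned_requirement(text):
    """Split a hypothetical ``owner::package`` requirement string."""
    owner, separator, name = text.partition("::")
    if not separator:
        # Plain requirement without an owner assertion.
        return OwnedRequirement(None, text)
    if not owner or not name:
        raise ValueError(f"malformed owner assertion: {text!r}")
    return OwnedRequirement(owner, name)


parsed = parse_owned_requirement("microsoft::azure-loganalytics")
```

Even this small change would need to be adopted by every tool that reads dependency specifiers, which is the disruption described above.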
|
||
A unique downside is that names are an implementation detail of repositories. | ||
On PyPI, the names of organizations are separate from user names so there is | ||
potential for conflicts. In the case of multiple repositories, users might run | ||
into cases of dependency confusion similar to the one at the end of the | ||
`Encourage Dedicated Package Repositories <dedicated-repositories_>`_ | ||
rejected idea. | ||
|
||
To ameliorate this, it was suggested that the syntax be expanded to also | ||
include the expected repository URL like | ||
``[email protected]::azure-loganalytics``. This syntax or something like it | ||
is so verbose that it could lead to user confusion, and even worse, frustration | ||
should it gain increased adoption among those able to maintain dedicated | ||
infrastructure (community projects would not benefit). | ||
|
||
The expanded syntax is an attempt to standardize resolver behavior and | ||
configuration within dependency specifiers. Not only would this be mandating | ||
the UX of tools, it lacks precedent in package managers for language ecosystems | ||
with or without the concept of package repositories. In such cases, the | ||
resolver configuration is separate from the dependency definition. | ||
|
||
======== ======== =============================================================
Language Tool     Resolution behavior
======== ======== =============================================================
Rust     Cargo    Dependency resolution can be `modified`__ within
                  ``Cargo.toml`` using the ``[patch]`` table.
JS       Yarn     Although they have the concept of `protocols`__ (which are
                  similar to the URL schemes of our `direct references`__),
                  users configure the `resolutions`__ field in the
                  ``package.json`` file.
JS       npm      Users can configure the `overrides`__ field in the
                  ``package.json`` file.
Ruby     Bundler  The ``Gemfile`` allows for specifying an
                  `explicit source`__ for a gem.
C#       NuGet    It's possible to `override package versions`__ by configuring
                  the ``Directory.Packages.props`` file.
PHP      Composer The ``composer.json`` file allows for specifying
                  `repository`__ sources for specific packages.
Go       go       The ``go.mod`` file allows for specifying a `replace`__
                  directive. Note that this is used for direct dependencies
                  as well as transitive dependencies.
======== ======== =============================================================
|
||
__ https://doc.rust-lang.org/cargo/reference/overriding-dependencies.html | ||
__ https://yarnpkg.com/protocols | ||
__ https://packaging.python.org/en/latest/specifications/version-specifiers/#direct-references | ||
__ https://yarnpkg.com/configuration/manifest#resolutions | ||
__ https://docs.npmjs.com/cli/v10/configuring-npm/package-json#overrides | ||
__ https://bundler.io/v2.5/man/gemfile.5.html#SOURCE-PRIORITY | ||
__ https://learn.microsoft.com/en-us/nuget/consume-packages/central-package-management#overriding-package-versions | ||
__ https://getcomposer.org/doc/articles/repository-priorities.md#filtering-packages | ||
__ https://go.dev/ref/mod#go-mod-file-replace | ||
|
||
Use Fixed Prefixes | ||
------------------ | ||
|
||
|
@@ -501,10 +708,9 @@ Footnotes | |
Markdown files. They also have the concept of | ||
`plugins <https://www.mkdocs.org/dev-guide/plugins/>`__ which may be | ||
developed by anyone and are usually prefixed by ``mkdocs-``. | ||
- `Datadog <https://www.datadoghq.com>`__ offers observability as a service | ||
for organizations at any scale. The | ||
`Datadog Agent <https://docs.datadoghq.com/agent/>`__ ships out-of-the-box | ||
with | ||
- `Datadog <https://www.datadoghq.com>`__ offers observability as a service. | ||
The `Datadog Agent <https://docs.datadoghq.com/agent/>`__ ships | ||
out-of-the-box with | ||
`official integrations <https://github.com/DataDog/integrations-core>`__ | ||
for many products, like various databases and web servers, which are | ||
distributed as Python packages that are prefixed by ``datadog-``. There is | ||
|
@@ -533,6 +739,9 @@ Footnotes | |
`squatted <https://zero.checkmarx.com/malicious-pypi-user-strikes-again-with-typosquatting-starjacking-and-unpacks-tailor-made-malware-b12669cefaa5>`__ | ||
and this would be useful to prevent as a `hidden grant <hidden-grants_>`__. | ||
.. [5] `Detailed write-up <https://discuss.python.org/t/64679>`__ of the | ||
potential for provenance assertions. | ||
__ https://www.sphinx-doc.org/en/master/usage/extensions/index.html | ||
__ https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/plugins.html | ||
__ https://airflow.apache.org/docs/apache-airflow-providers/index.html | ||
|