PEP 752: Address feedback, round 5 (python#4018)
Co-authored-by: Carol Willing <[email protected]>
2 people authored and gvanrossum committed Dec 10, 2024
1 parent 76cfa87 commit 13426bf
Showing 1 changed file, peps/pep-0752.rst, with 213 additions and 4 deletions.
Representatives from the following organizations have expressed support for
this PEP (with a link to the discussion):

* `Apache Airflow <https://github.com/apache/airflow/discussions/41657#discussioncomment-10412999>`__
  (`expanded <https://discuss.python.org/t/63191/75>`__)
* `pytest <https://discuss.python.org/t/63192/68>`__
* `Typeshed <https://discuss.python.org/t/1609/37>`__
* `Project Jupyter <https://discuss.python.org/t/61227/16>`__
  (`expanded <https://discuss.python.org/t/61227/48>`__)
* `Microsoft <https://discuss.python.org/t/63191/40>`__
* `Sentry <https://discuss.python.org/t/63192/67>`__
  (in favor of the NuGet approach over others but not negatively impacted
  by the current lack of capability)
* `Datadog <https://discuss.python.org/t/63191/53>`__

Backwards Compatibility
=======================

chosen to signal a shared purpose with a prefix like `typeshed has done`__.

__ https://github.com/python/typeshed/issues/2491#issuecomment-578456045

.. _security-implications:

Security Implications
=====================

Rejected Ideas
==============

.. _artifact-level-association:

Artifact-level Namespace Association
------------------------------------

An earlier version of this PEP proposed that metadata be associated with
individual artifacts at the point of release. This was rejected because it
had the potential to confuse users, who would expect the namespace
authorization guarantee to apply at the project level based on current grants
rather than on the grants in place when a given release occurred.
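
The distinction can be illustrated with a minimal sketch, assuming a
hypothetical in-memory model (``Grant``, ``Release`` and ``Repository`` are
illustrative names, not any repository's actual API). An artifact-level check
snapshots authorization when a release is uploaded, while a project-level check
consults the grants as they currently stand, so the two disagree once a grant
is revoked:

```python
from dataclasses import dataclass, field


@dataclass
class Grant:
    """A hypothetical namespace grant, e.g. for the ``azure-`` prefix."""

    prefix: str
    owner: str
    active: bool = True


@dataclass
class Release:
    project: str
    version: str
    # Artifact-level association: authorization is snapshotted at upload time.
    authorized_at_upload: bool


@dataclass
class Repository:
    grants: list[Grant] = field(default_factory=list)
    owners: dict[str, str] = field(default_factory=dict)  # project -> owner

    def currently_authorized(self, project: str) -> bool:
        """Project-level association: consult the grants as they stand now."""
        owner = self.owners.get(project)
        return any(
            grant.active and project.startswith(grant.prefix) and grant.owner == owner
            for grant in self.grants
        )

    def upload(self, project: str, version: str) -> Release:
        """Record a release along with its authorization status at this moment."""
        return Release(project, version, self.currently_authorized(project))


repo = Repository(
    grants=[Grant(prefix="azure-", owner="microsoft")],
    owners={"azure-loganalytics": "microsoft"},
)
release = repo.upload("azure-loganalytics", "1.0.0")
repo.grants[0].active = False  # the namespace grant is later revoked

print(release.authorized_at_upload)                     # True: stale snapshot
print(repo.currently_authorized("azure-loganalytics"))  # False: current grants
```

The stale ``True`` recorded on the release is the kind of guarantee users
could misread as reflecting the current state of the namespace.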

.. _organization-scoping:

Organization Scoping
--------------------

packages released with the scoping would be incompatible with older tools and
would cause confusion for users along with frustration from maintainers having
to triage such complaints.

.. _dedicated-repositories:

Encourage Dedicated Package Repositories
----------------------------------------

and ``Y``. If each repository has both packages but one is malicious on ``X``
and the other is malicious on ``Y`` then the user would be unable to satisfy
their requirements without encountering a malicious package.

.. _provenance-assertions:

Exclusive Reliance on Provenance Assertions
-------------------------------------------

The idea here [5]_ would be to design a general-purpose way for clients to make
provenance assertions to verify certain properties of dependencies, each with
custom syntax. Some examples:

* The package was uploaded by a specific organization or user name, e.g.,
  ``pip install "azure-loganalytics from microsoft"``
* The package was uploaded by an owner of a specific domain name, e.g.,
  ``pip install "google-cloud-compute from cloud.google.com"``
* The package was uploaded by a user with a specific email address, e.g.,
  ``pip install "aws-cdk-lib from [email protected]"``
* The package matching a namespace was uploaded by an authorized party (this
  PEP)
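
To show what "custom syntax" per kind of assertion might entail, here is a
rough sketch of a parser for the hypothetical ``<requirement> from <assertion>``
form used in the examples above. No installer accepts this syntax today, and
the classification rules (``@`` means an email address, a dot means a domain
name, anything else an owner name) are assumptions made only for illustration:

```python
import re
from dataclasses import dataclass

# Hypothetical syntax: "<requirement> from <assertion>". Not supported by any tool.
_FROM = re.compile(r"^\s*(?P<req>\S+)\s+from\s+(?P<assertion>\S+)\s*$")


@dataclass(frozen=True)
class ProvenanceAssertion:
    requirement: str
    kind: str  # "email", "domain" or "owner"
    value: str


def parse(spec: str) -> ProvenanceAssertion:
    """Split a hypothetical provenance assertion and guess its kind."""
    match = _FROM.match(spec)
    if match is None:
        raise ValueError(f"not a provenance assertion: {spec!r}")
    requirement, value = match["req"], match["assertion"]
    # Each kind of assertion needs its own rule to tell it apart.
    if "@" in value:
        kind = "email"
    elif "." in value:
        kind = "domain"
    else:
        kind = "owner"
    return ProvenanceAssertion(requirement, kind, value)


print(parse("azure-loganalytics from microsoft").kind)           # owner
print(parse("google-cloud-compute from cloud.google.com").kind)  # domain
```

Even this toy version shows the ambiguity: an owner name containing a dot
would be misread as a domain name by these rules, so each kind of assertion
would need its own unambiguous syntax.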

A fundamental downside is that this approach doesn't play well with multiple
repositories. For example, say a user wants the ``azure-loganalytics`` package
and wants to ensure it comes from the organization named ``microsoft``. If
Microsoft's organization name on PyPI is ``microsoft``, then a package manager
that defaults to PyPI could accept ``azure-loganalytics from microsoft``.
However, if multiple repositories are used for dependency resolution, then the
user would have to specify the repository as part of the definition, which is
unrealistic for reasons outlined in the dedicated section on
`asserting package owner names <asserting-package-owner-names_>`_.

Another general weakness of this approach is that a user performing a simple
``pip install`` without special syntax, which is the most common scenario,
would remain vulnerable to malicious packages. To overcome this, there would
have to be some default trust mechanism, which in all cases would impose
certain UX or resolver logic upon every tool.

For example, package managers could be changed such that the first time a
package is installed, the user would receive a confirmation prompt displaying
the provenance details. This would be very confusing and noisy, especially for
new users, and would be a breaking UX change for existing users. Many methods
of installation wouldn't work for this scenario, such as running in CI or
installing from a requirements file, where the user could potentially be
faced with hundreds of prompts.

One solution to make this less disruptive for users would be to manually
maintain a list of trustworthy details (organization/user names, domain names,
email addresses, etc.). Such lists could be distributed in packages that expose
them through `entry points`__; package managers could learn to detect these,
and corporate environments could install them by default. This has the major
downside of not providing automatic guarantees, which would limit its
usefulness for the average user, who is more likely to be affected.

__ https://packaging.python.org/en/latest/specifications/entry-points/

There are two ideas that could be used to provide automatic protection, which
could be based on :pep:`740` attestations or a new mechanism for utilizing
third-party APIs that host the metadata.

First, each repository could offer a service that verifies the owner of a
package using whatever criteria they deem appropriate. After verification, the
repository would add the details to a dedicated package that would be installed
by default.

This would require dedicated maintenance, which is unrealistic for most
repositories, even PyPI currently. It's unclear how community projects without
the resources for something like a domain name would be supported. Critically,
this solution would cause extra confusion for users in the case of multiple
repositories, as each might have its own verification process, attestation
criteria and default package containing the verified details. It would be
challenging to get community buy-in for every package manager to be aware of
each repository's chosen verification package and to install it by default
before dependency resolution.

Should digital attestations become the chosen mechanism, a downside is that
implementing this in custom package repositories would require a significant
amount of work. In the case of PyPI, the prerequisite work on
`Trusted Publishing`__ and then the `PEP 740 implementation`__ itself took the
equivalent of one full-time engineer working for a year, paid for by a
corporate sponsor. Other organizations are unlikely to implement similar work
because simpler mechanisms make it possible to implement reproducible builds.
When everything is internally managed, attestations are also not very useful.
Community projects are unlikely to undertake this effort because they would
likely lack the resources to maintain the necessary infrastructure themselves,
and moreover there are significant downsides to
`encouraging dedicated package repositories <dedicated-repositories_>`_.

__ https://blog.pypi.org/posts/2023-04-20-introducing-trusted-publishers/#acknowledgements
__ https://blog.trailofbits.com/2024/10/01/securing-the-software-supply-chain-with-the-slsa-framework/

The other idea would be to host provenance assertions externally and push more
logic client-side. A possible implementation might be to specify a provenance
API that could be hosted at a designated relative path like
``/provenance``. Projects on each repository could then be configured to point
to a particular domain and this information would be passed on to clients
during installation.

While this distributed approach does impose less of an infrastructure burden on
repositories, it has the potential to be a security risk. If an external
provenance API is compromised, it could lead to malicious packages being
installed. If an external API is down, either package installation would fail
or package managers would only emit warnings, in which case there is no
security benefit.
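
The trade-off between failing and warning can be sketched as follows,
assuming a hypothetical API at the ``/provenance`` relative path that returns
the set of owners a project's configured domain vouches for (the response
format and all function names are assumptions). The ``fetch`` callable stands
in for a real HTTP client so the example runs offline:

```python
import warnings
from typing import Callable
from urllib.parse import urljoin

# Relative path from the idea above; the response format is a hypothetical assumption.
PROVENANCE_PATH = "/provenance"


def provenance_url(domain: str) -> str:
    """Build the provenance API URL for a project's configured domain."""
    return urljoin(f"https://{domain}", PROVENANCE_PATH)


def check_provenance(
    domain: str,
    expected_owner: str,
    fetch: Callable[[str], set[str]],
    fail_closed: bool = True,
) -> bool:
    """Return True if the installer should proceed with the package."""
    url = provenance_url(domain)
    try:
        owners = fetch(url)
    except OSError as exc:  # network failures, including ConnectionError
        if fail_closed:
            # An outage of the external API blocks installation.
            return False
        # Only warning on an outage means installation proceeds unverified.
        warnings.warn(f"provenance API unavailable at {url}: {exc}")
        return True
    return expected_owner in owners


def unreachable(url: str) -> set[str]:
    """Stand-in fetcher simulating an external API that is down."""
    raise ConnectionError(f"cannot reach {url}")


print(check_provenance("example.com", "acme", unreachable))                     # False
print(check_provenance("example.com", "acme", unreachable, fail_closed=False))  # True
```

With ``fail_closed=True`` an outage blocks installation; with
``fail_closed=False`` the package is installed unverified, which provides no
security benefit.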

Additionally, this disadvantages community projects that do not have the
resources to maintain such an API. They could use free hosting solutions, as
many already do for documentation, but they would not technically own the
infrastructure and would be adversely affected should those generous offerings
be restricted.

Finally, although neither of these theoretical approaches is prescriptive yet,
both imply assertions at the artifact level, which was already a
`rejected idea <artifact-level-association_>`_.

.. _asserting-package-owner-names:

Asserting Package Owner Names
-----------------------------

This is about asserting that the package came from a specific organization or
user name. It's quite similar to the
`organization scoping <organization-scoping_>`_ idea except that a flat
namespace is the base assumption.

This would require modifications to the :pep:`JSON API <691>` of each supported
repository and could be implemented by exposing extra metadata or as proper
`provenance assertions <provenance-assertions_>`_.

As with the organization scoping idea, a new `syntax`__ would be required like
``microsoft::azure-loganalytics`` where ``microsoft`` is the organization and
``azure-loganalytics`` is the package. Although this plays well with the
existing flat namespace in comparison, it retains the critical downside of
being a disruption for the community with the number of changes required.

__ https://packaging.python.org/en/latest/specifications/dependency-specifiers/
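
For illustration, a minimal parser for the hypothetical ``owner::name`` form
might look like the following. The ``OwnedRequirement`` type and
``parse_owned`` function are invented for this sketch; no installer supports
the syntax:

```python
from typing import NamedTuple, Optional


class OwnedRequirement(NamedTuple):
    owner: Optional[str]  # None when no owner is asserted
    name: str


def parse_owned(spec: str) -> OwnedRequirement:
    """Split a hypothetical ``owner::name`` requirement into its parts."""
    owner, sep, name = spec.partition("::")
    if not sep:
        # Plain names keep working, so the flat namespace is unaffected.
        return OwnedRequirement(None, spec)
    if not owner or not name:
        raise ValueError(f"malformed owner-qualified requirement: {spec!r}")
    return OwnedRequirement(owner, name)


print(parse_owned("microsoft::azure-loganalytics"))
# OwnedRequirement(owner='microsoft', name='azure-loganalytics')
print(parse_owned("azure-loganalytics"))
# OwnedRequirement(owner=None, name='azure-loganalytics')
```

Every tool that reads dependency specifiers would need equivalent logic, which
is the disruption referred to above.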

A unique downside is that names are an implementation detail of repositories.
On PyPI, the names of organizations are separate from user names, so there is
potential for conflicts. In the case of multiple repositories, users might run
into cases of dependency confusion similar to the one at the end of the
`Encourage Dedicated Package Repositories <dedicated-repositories_>`_
rejected idea.

To ameliorate this, it was suggested that the syntax be expanded to also
include the expected repository URL like
``[email protected]::azure-loganalytics``. This syntax, or anything like it,
is so verbose that it could lead to user confusion and, even worse, frustration
should it gain increased adoption among those able to maintain dedicated
infrastructure (community projects would not benefit).

The expanded syntax is an attempt to standardize resolver behavior and
configuration within dependency specifiers. Not only would this mandate the UX
of tools, but it also lacks precedent in package managers for language
ecosystems, whether or not they have the concept of package repositories. In
those ecosystems, the resolver configuration is separate from the dependency
definition.

======== ======== =============================================================
Language Tool     Resolution behavior
======== ======== =============================================================
Rust     Cargo    Dependency resolution can be `modified`__ within
                  ``Cargo.toml`` using the ``[patch]`` table.
JS       Yarn     Although they have the concept of `protocols`__ (which are
                  similar to the URL schemes of our `direct references`__),
                  users configure the `resolutions`__ field in the
                  ``package.json`` file.
JS       npm      Users can configure the `overrides`__ field in the
                  ``package.json`` file.
Ruby     Bundler  The ``Gemfile`` allows for specifying an
                  `explicit source`__ for a gem.
C#       NuGet    It's possible to `override package versions`__ by configuring
                  the ``Directory.Packages.props`` file.
PHP      Composer The ``composer.json`` file allows for specifying
                  `repository`__ sources for specific packages.
Go       go       The ``go.mod`` file allows for specifying a `replace`__
                  directive. Note that this is used for direct dependencies
                  as well as transitive dependencies.
======== ======== =============================================================

__ https://doc.rust-lang.org/cargo/reference/overriding-dependencies.html
__ https://yarnpkg.com/protocols
__ https://packaging.python.org/en/latest/specifications/version-specifiers/#direct-references
__ https://yarnpkg.com/configuration/manifest#resolutions
__ https://docs.npmjs.com/cli/v10/configuring-npm/package-json#overrides
__ https://bundler.io/v2.5/man/gemfile.5.html#SOURCE-PRIORITY
__ https://learn.microsoft.com/en-us/nuget/consume-packages/central-package-management#overriding-package-versions
__ https://getcomposer.org/doc/articles/repository-priorities.md#filtering-packages
__ https://go.dev/ref/mod#go-mod-file-replace

Use Fixed Prefixes
------------------

Footnotes
=========

Markdown files. They also have the concept of
`plugins <https://www.mkdocs.org/dev-guide/plugins/>`__ which may be
developed by anyone and are usually prefixed by ``mkdocs-``.
- `Datadog <https://www.datadoghq.com>`__ offers observability as a service.
The `Datadog Agent <https://docs.datadoghq.com/agent/>`__ ships
out-of-the-box with
`official integrations <https://github.com/DataDog/integrations-core>`__
for many products, like various databases and web servers, which are
distributed as Python packages that are prefixed by ``datadog-``. There is
`squatted <https://zero.checkmarx.com/malicious-pypi-user-strikes-again-with-typosquatting-starjacking-and-unpacks-tailor-made-malware-b12669cefaa5>`__
and this would be useful to prevent as a `hidden grant <hidden-grants_>`__.
.. [5] `Detailed write-up <https://discuss.python.org/t/64679>`__ of the
   potential for provenance assertions.
__ https://www.sphinx-doc.org/en/master/usage/extensions/index.html
__ https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/plugins.html
__ https://airflow.apache.org/docs/apache-airflow-providers/index.html