Commit 5795f7b

ofek and willingc authored
PEP 752: Address feedback, round 5 (#4018)
Co-authored-by: Carol Willing <carolcode@willingconsulting.com>
1 parent dbac24d commit 5795f7b

1 file changed

+213
-4
lines changed

peps/pep-0752.rst

Lines changed: 213 additions & 4 deletions
@@ -329,10 +329,15 @@ Representatives from the following organizations have expressed support for
329329
this PEP (with a link to the discussion):
330330

331331
* `Apache Airflow <https://github.com/apache/airflow/discussions/41657#discussioncomment-10412999>`__
332+
(`expanded <https://discuss.python.org/t/63191/75>`__)
333+
* `pytest <https://discuss.python.org/t/63192/68>`__
332334
* `Typeshed <https://discuss.python.org/t/1609/37>`__
333335
* `Project Jupyter <https://discuss.python.org/t/61227/16>`__
334336
(`expanded <https://discuss.python.org/t/61227/48>`__)
335337
* `Microsoft <https://discuss.python.org/t/63191/40>`__
338+
* `Sentry <https://discuss.python.org/t/63192/67>`__
339+
(in favor of the NuGet approach over others but not negatively impacted
340+
by the current lack of capability)
336341
* `DataDog <https://discuss.python.org/t/63191/53>`__
337342

338343
Backwards Compatibility
@@ -344,6 +349,8 @@ chosen to signal a shared purpose with a prefix like `typeshed has done`__.
344349

345350
__ https://github.com/python/typeshed/issues/2491#issuecomment-578456045
346351

352+
.. _security-implications:
353+
347354
Security Implications
348355
=====================
349356

@@ -368,6 +375,19 @@ None at this time.
368375
Rejected Ideas
369376
==============
370377

378+
.. _artifact-level-association:
379+
380+
Artifact-level Namespace Association
381+
------------------------------------
382+
383+
An earlier version of this PEP proposed that metadata be associated with
384+
individual artifacts at the point of release. This was rejected because it
385+
had the potential to cause confusion for users who would expect the namespace
386+
authorization guarantee to be at the project level based on current grants
387+
rather than the time at which a given release occurred.
388+
389+
.. _organization-scoping:
390+
371391
Organization Scoping
372392
--------------------
373393

@@ -398,6 +418,8 @@ packages released with the scoping would be incompatible with older tools and
398418
would cause confusion for users along with frustration from maintainers having
399419
to triage such complaints.
400420

421+
.. _dedicated-repositories:
422+
401423
Encourage Dedicated Package Repositories
402424
----------------------------------------
403425

@@ -422,6 +444,191 @@ and ``Y``. If each repository has both packages but one is malicious on ``X``
422444
and the other is malicious on ``Y`` then the user would be unable to satisfy
423445
their requirements without encountering a malicious package.
424446

447+
.. _provenance-assertions:
448+
449+
Exclusive Reliance on Provenance Assertions
450+
-------------------------------------------
451+
452+
The idea here [5]_ would be to design a general purpose way for clients to make
453+
provenance assertions to verify certain properties of dependencies, each with
454+
custom syntax. Some examples:
455+
456+
* The package was uploaded by a specific organization or user name e.g.
457+
``pip install "azure-loganalytics from microsoft"``
458+
* The package was uploaded by an owner of a specific domain name e.g.
459+
``pip install "google-cloud-compute from cloud.google.com"``
460+
* The package was uploaded by a user with a specific email address e.g.
461+
``pip install "aws-cdk-lib from contact@amazon.com"``
462+
* The package matching a namespace was uploaded by an authorized party (this
463+
PEP)
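
As a rough illustration of the custom syntax in the examples above, the
following sketch splits a requirement string of the form
``<package> from <asserter>`` and guesses which kind of assertion was made.
The ``ProvenanceRequirement`` class, the ``parse_requirement`` function and the
classification heuristic are all hypothetical; no package manager accepts this
syntax today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRequirement:
    """Hypothetical parsed form of ``<package> from <asserter>``."""

    package: str
    asserter: str
    kind: str  # "email", "domain" or "name" (organization or user name)


def parse_requirement(text: str) -> ProvenanceRequirement:
    """Split a requirement on the first `` from `` separator.

    The classification below is a naive assumption made for illustration:
    an ``@`` means an email address, a ``.`` means a domain name, and
    anything else is treated as an organization or user name.
    """
    package, sep, asserter = text.partition(" from ")
    package, asserter = package.strip(), asserter.strip()
    if not sep or not package or not asserter:
        raise ValueError(f"no provenance assertion in {text!r}")
    if "@" in asserter:
        kind = "email"
    elif "." in asserter:
        kind = "domain"
    else:
        kind = "name"
    return ProvenanceRequirement(package, asserter, kind)
```

Note that nothing in such a string says which repository the name, domain or
email address should be checked against, which is the multiple-repository
problem described next.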
464+
465+
A fundamental downside is that it doesn't play well with multiple
466+
repositories. For example, say a user wants the ``azure-loganalytics`` package
467+
and wants to ensure it comes from the organization named ``microsoft``. If
468+
Microsoft's organization name on PyPI is ``microsoft`` then a package manager
469+
that defaults to PyPI could accept ``azure-loganalytics from microsoft``.
470+
However, if multiple repositories are used for dependency resolution then the
471+
user would have to specify the repository as part of the definition which is
472+
unrealistic for reasons outlined in the dedicated section on
473+
`asserting package owner names <asserting-package-owner-names_>`_.
474+
475+
Another general weakness with this approach is that a user attempting to
476+
perform a simple ``pip install`` without special syntax, which is the most
477+
common scenario, would already be vulnerable to malicious packages. In order to
478+
overcome this there would have to be some default trust mechanism, which in all
479+
cases would impose certain UX or resolver logic upon every tool.
480+
481+
For example, package managers could be changed such that the first time a
482+
package is installed the user would receive a confirmation prompt displaying
483+
the provenance details. This would be very confusing and noisy, especially for
484+
new users, and would be a breaking UX change for existing users. Many methods
485+
of installation wouldn't work for this scenario such as running in CI or
486+
installing from a requirements file where the user would potentially be getting
487+
hundreds of prompts.
488+
489+
One solution to make this less disruptive for users would be to manually
490+
maintain a list of trustworthy details (organization/user names, domain names,
491+
email addresses, etc.). This could be discoverable by packages providing
492+
`entry points`__ which package managers could learn to detect and which
493+
corporate environments could install by default. This has the major downside of
494+
not providing automatic guarantees which would limit the usefulness for the
495+
average user who is more likely to be affected.
496+
497+
__ https://packaging.python.org/en/latest/specifications/entry-points/
498+
499+
There are two ideas that could be used to provide automatic protection, which
500+
could be based on :pep:`740` attestations or a new mechanism for utilizing
501+
third-party APIs that host the metadata.
502+
503+
First, each repository could offer a service that verifies the owner of a
504+
package using whatever criteria they deem appropriate. After verification, the
505+
repository would add the details to a dedicated package that would be installed
506+
by default.
507+
508+
This would require dedicated maintenance which is unrealistic for most
509+
repositories, even PyPI currently. It's unclear how community projects without
510+
the resources for something like a domain name would be supported. Critically,
511+
this solution would cause extra confusion for users in the case of multiple
512+
repositories as each might have their own verification processes, attestation
513+
criteria and default package containing the verified details. It would be
514+
challenging to get community buy-in of every package manager to be aware of
515+
each repository's chosen verification package and install that by default
516+
before dependency resolution.
517+
518+
Should digital attestations become the chosen mechanism, a downside is that
519+
implementing this in custom package repositories would require a significant
520+
amount of work. In the case of PyPI, the prerequisite work on
521+
`Trusted Publishing`__ and then the `PEP 740 implementation`__ itself took the
522+
equivalent of one year of a full-time engineer, whose time was paid for by a
523+
corporate sponsor. Other organizations are unlikely to implement similar work
524+
because simpler mechanisms make it possible to implement reproducible builds.
525+
When everything is internally managed, attestations are also not very useful.
526+
Community projects are unlikely to undertake this effort because they would
527+
likely lack the resources to maintain the necessary infrastructure themselves
528+
and moreover there are significant downsides to
529+
`encouraging dedicated package repositories <dedicated-repositories_>`_.
530+
531+
__ https://blog.pypi.org/posts/2023-04-20-introducing-trusted-publishers/#acknowledgements
532+
__ https://blog.trailofbits.com/2024/10/01/securing-the-software-supply-chain-with-the-slsa-framework/
533+
534+
The other idea would be to host provenance assertions externally and push more
535+
logic client-side. A possible implementation might be to specify a provenance
536+
API that could be hosted at a designated relative path like
537+
``/provenance``. Projects on each repository could then be configured to point
538+
to a particular domain and this information would be passed on to clients
539+
during installation.
540+
541+
While this distributed approach does impose less of an infrastructure burden on
542+
repositories, it has the potential to be a security risk. If an external
543+
provenance API is compromised, it could lead to malicious packages being
544+
installed. If an external API is down, it could lead to package installation
545+
failing or package managers might only emit warnings in which case there is no
546+
security benefit.
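
The availability trade-off can be sketched as follows. This is a minimal
model, assuming a hypothetical ``lookup`` callable that stands in for a request
to an external provenance API; ``may_install``, ``ProvenanceUnavailable`` and
the ``fail_closed`` flag are illustrative names, not part of any existing tool.

```python
import warnings
from typing import Callable


class ProvenanceUnavailable(Exception):
    """Raised by a lookup when the external provenance API is unreachable."""


def may_install(
    package: str,
    expected_owner: str,
    lookup: Callable[[str], str],
    fail_closed: bool,
) -> bool:
    """Decide whether installation may proceed.

    ``lookup`` is a hypothetical stand-in for querying an external
    provenance API and returns the owner it asserts for ``package``.
    """
    try:
        owner = lookup(package)
    except ProvenanceUnavailable:
        if fail_closed:
            # An outage blocks installation entirely.
            return False
        # Warn-only mode: installation proceeds without any verification.
        warnings.warn(f"provenance for {package} could not be verified")
        return True
    # The result is only as trustworthy as the external API itself.
    return owner == expected_owner
```

With ``fail_closed=True`` an outage of the external API blocks installation,
and with ``fail_closed=False`` the check is silently skipped whenever the API
is unreachable. In both modes a compromised API that returns the expected owner
is accepted.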
547+
548+
Additionally, this disadvantages community projects that do not have the
549+
resources to maintain such an API. They could use free hosting solutions such
550+
as what many do for documentation but they do not technically own the
551+
infrastructure and they would be compromised should the generous offerings be
552+
restricted.
553+
554+
Finally, while both of these theoretical approaches are not yet prescriptive,
555+
they imply assertions at the artifact level which was already a
556+
`rejected idea <artifact-level-association_>`_.
557+
558+
.. _asserting-package-owner-names:
559+
560+
Asserting Package Owner Names
561+
-----------------------------
562+
563+
This is about asserting that the package came from a specific organization or
564+
user name. It's quite similar to the
565+
`organization scoping <organization-scoping_>`_ idea except that a flat
566+
namespace is the base assumption.
567+
568+
This would require modifications to the :pep:`JSON API <691>` of each supported
569+
repository and could be implemented by exposing extra metadata or as proper
570+
`provenance assertions <provenance-assertions_>`_.
571+
572+
As with the organization scoping idea, a new `syntax`__ would be required like
573+
``microsoft::azure-loganalytics`` where ``microsoft`` is the organization and
574+
``azure-loganalytics`` is the package. Although this plays well with the
575+
existing flat namespace in comparison, it retains the critical downside of
576+
being a disruption for the community with the number of changes required.
577+
578+
__ https://packaging.python.org/en/latest/specifications/dependency-specifiers/
579+
580+
A unique downside is that names are an implementation detail of repositories.
581+
On PyPI, the names of organizations are separate from user names so there is
582+
potential for conflicts. In the case of multiple repositories, users might run
583+
into cases of dependency confusion similar to the one at the end of the
584+
`Encourage Dedicated Package Repositories <dedicated-repositories_>`_
585+
rejected idea.
586+
587+
To ameliorate this, it was suggested that the syntax be expanded to also
588+
include the expected repository URL like
589+
``microsoft@pypi.org::azure-loganalytics``. This syntax or something like it
590+
is so verbose that it could lead to user confusion, and even worse, frustration
591+
should it gain increased adoption among those able to maintain dedicated
592+
infrastructure (community projects would not benefit).
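
To make the shape of the two suggested syntaxes concrete, here is a small
sketch that parses both ``microsoft::azure-loganalytics`` and
``microsoft@pypi.org::azure-loganalytics``. The ``OwnedRequirement`` type and
the ``parse_owned`` function are hypothetical, and the parsing rules are an
assumption since the syntax was never specified.

```python
from typing import NamedTuple, Optional


class OwnedRequirement(NamedTuple):
    """Hypothetical parsed form of ``owner[@repository]::package``."""

    owner: Optional[str]
    repository: Optional[str]
    package: str


def parse_owned(text: str) -> OwnedRequirement:
    """Parse the owner syntax, falling back to a plain package name."""
    prefix, sep, package = text.rpartition("::")
    if not sep:
        # No owner assertion: an ordinary flat-namespace requirement.
        return OwnedRequirement(None, None, text)
    owner, at, repository = prefix.partition("@")
    if not owner or not package or (at and not repository):
        raise ValueError(f"malformed owner assertion in {text!r}")
    return OwnedRequirement(owner, repository or None, package)
```

In the expanded form the repository becomes part of the requirement string
itself, so every tool that reads dependency specifiers would also have to
understand and act on repository selection.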
593+
594+
The expanded syntax is an attempt to standardize resolver behavior and
595+
configuration within dependency specifiers. Not only would this be mandating
596+
the UX of tools, it lacks precedent in package managers for language ecosystems
597+
with or without the concept of package repositories. In such cases, the
598+
resolver configuration is separate from the dependency definition.
599+
600+
======== ======== =============================================================
601+
Language Tool     Resolution behavior
602+
======== ======== =============================================================
603+
Rust     Cargo    Dependency resolution can be `modified`__ within
604+
                  ``Cargo.toml`` using the ``[patch]`` table.
605+
JS       Yarn     Although they have the concept of `protocols`__ (which are
606+
                  similar to the URL schemes of our `direct references`__),
607+
                  users configure the `resolutions`__ field in the
608+
                  ``package.json`` file.
609+
JS       npm      Users can configure the `overrides`__ field in the
610+
                  ``package.json`` file.
611+
Ruby     Bundler  The ``Gemfile`` allows for specifying an
612+
                  `explicit source`__ for a gem.
613+
C#       NuGet    It's possible to `override package versions`__ by configuring
614+
                  the ``Directory.Packages.props`` file.
615+
PHP      Composer The ``composer.json`` file allows for specifying
616+
                  `repository`__ sources for specific packages.
617+
Go       go       The ``go.mod`` file allows for specifying a `replace`__
618+
                  directive. Note that this is used for direct dependencies
619+
                  as well as transitive dependencies.
620+
======== ======== =============================================================
621+
622+
__ https://doc.rust-lang.org/cargo/reference/overriding-dependencies.html
623+
__ https://yarnpkg.com/protocols
624+
__ https://packaging.python.org/en/latest/specifications/version-specifiers/#direct-references
625+
__ https://yarnpkg.com/configuration/manifest#resolutions
626+
__ https://docs.npmjs.com/cli/v10/configuring-npm/package-json#overrides
627+
__ https://bundler.io/v2.5/man/gemfile.5.html#SOURCE-PRIORITY
628+
__ https://learn.microsoft.com/en-us/nuget/consume-packages/central-package-management#overriding-package-versions
629+
__ https://getcomposer.org/doc/articles/repository-priorities.md#filtering-packages
630+
__ https://go.dev/ref/mod#go-mod-file-replace
631+
425632
Use Fixed Prefixes
426633
------------------
427634

@@ -501,10 +708,9 @@ Footnotes
501708
Markdown files. They also have the concept of
502709
`plugins <https://www.mkdocs.org/dev-guide/plugins/>`__ which may be
503710
developed by anyone and are usually prefixed by ``mkdocs-``.
504-
- `Datadog <https://www.datadoghq.com>`__ offers observability as a service
505-
for organizations at any scale. The
506-
`Datadog Agent <https://docs.datadoghq.com/agent/>`__ ships out-of-the-box
507-
with
711+
- `Datadog <https://www.datadoghq.com>`__ offers observability as a service.
712+
The `Datadog Agent <https://docs.datadoghq.com/agent/>`__ ships
713+
out-of-the-box with
508714
`official integrations <https://github.com/DataDog/integrations-core>`__
509715
for many products, like various databases and web servers, which are
510716
distributed as Python packages that are prefixed by ``datadog-``. There is
@@ -533,6 +739,9 @@ Footnotes
533739
`squatted <https://zero.checkmarx.com/malicious-pypi-user-strikes-again-with-typosquatting-starjacking-and-unpacks-tailor-made-malware-b12669cefaa5>`__
534740
and this would be useful to prevent as a `hidden grant <hidden-grants_>`__.
535741
742+
.. [5] `Detailed write-up <https://discuss.python.org/t/64679>`__ of the
743+
potential for provenance assertions.
744+
536745
__ https://www.sphinx-doc.org/en/master/usage/extensions/index.html
537746
__ https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/plugins.html
538747
__ https://airflow.apache.org/docs/apache-airflow-providers/index.html
