@@ -329,10 +329,15 @@ Representatives from the following organizations have expressed support for
 this PEP (with a link to the discussion):
 
 * `Apache Airflow <https://github.com/apache/airflow/discussions/41657#discussioncomment-10412999>`__
+  (`expanded <https://discuss.python.org/t/63191/75>`__)
+* `pytest <https://discuss.python.org/t/63192/68>`__
 * `Typeshed <https://discuss.python.org/t/1609/37>`__
 * `Project Jupyter <https://discuss.python.org/t/61227/16>`__
   (`expanded <https://discuss.python.org/t/61227/48>`__)
 * `Microsoft <https://discuss.python.org/t/63191/40>`__
+* `Sentry <https://discuss.python.org/t/63192/67>`__
+  (in favor of the NuGet approach over others but not negatively impacted
+  by the current lack of capability)
 * `DataDog <https://discuss.python.org/t/63191/53>`__
 
 Backwards Compatibility
@@ -344,6 +349,8 @@ chosen to signal a shared purpose with a prefix like `typeshed has done`__.
 
 __ https://github.com/python/typeshed/issues/2491#issuecomment-578456045
 
+.. _security-implications:
+
 Security Implications
 =====================
 
@@ -368,6 +375,19 @@ None at this time.
 Rejected Ideas
 ==============
 
+.. _artifact-level-association:
+
+Artifact-level Namespace Association
+------------------------------------
+
+An earlier version of this PEP proposed that metadata be associated with
+individual artifacts at the point of release. This was rejected because it
+had the potential to cause confusion for users who would expect the namespace
+authorization guarantee to be at the project level based on current grants
+rather than the time at which a given release occurred.
+
+.. _organization-scoping:
+
 Organization Scoping
 --------------------
 
@@ -398,6 +418,8 @@ packages released with the scoping would be incompatible with older tools and
 would cause confusion for users along with frustration from maintainers having
 to triage such complaints.
 
+.. _dedicated-repositories:
+
 Encourage Dedicated Package Repositories
 ----------------------------------------
 
@@ -422,6 +444,191 @@ and ``Y``. If each repository has both packages but one is malicious on ``X``
 and the other is malicious on ``Y`` then the user would be unable to satisfy
 their requirements without encountering a malicious package.
 
+.. _provenance-assertions:
+
+Exclusive Reliance on Provenance Assertions
+-------------------------------------------
+
+The idea here [5]_ would be to design a general purpose way for clients to make
+provenance assertions to verify certain properties of dependencies, each with
+custom syntax. Some examples:
+
+* The package was uploaded by a specific organization or user name e.g.
+  ``pip install "azure-loganalytics from microsoft"``
+* The package was uploaded by an owner of a specific domain name e.g.
+  ``pip install "google-cloud-compute from cloud.google.com"``
+* The package was uploaded by a user with a specific email address e.g.
+  ``pip install "aws-cdk-lib from contact@amazon.com"``
+* The package matching a namespace was uploaded by an authorized party (this
+  PEP)
+
+A fundamental downside is that it doesn't play well with multiple
+repositories. For example, say a user wants the ``azure-loganalytics`` package
+and wants to ensure it comes from the organization named ``microsoft``. If
+Microsoft's organization name on PyPI is ``microsoft`` then a package manager
+that defaults to PyPI could accept ``azure-loganalytics from microsoft``.
+However, if multiple repositories are used for dependency resolution then the
+user would have to specify the repository as part of the definition which is
+unrealistic for reasons outlined in the dedicated section on
+`asserting package owner names <asserting-package-owner-names_>`_.
+
+Another general weakness with this approach is that a user attempting to
+perform a simple ``pip install`` without special syntax, which is the most
+common scenario, would already be vulnerable to malicious packages. In order to
+overcome this there would have to be some default trust mechanism, which in all
+cases would impose certain UX or resolver logic upon every tool.
+
+For example, package managers could be changed such that the first time a
+package is installed the user would receive a confirmation prompt displaying
+the provenance details. This would be very confusing and noisy, especially for
+new users, and would be a breaking UX change for existing users. Many methods
+of installation wouldn't work for this scenario such as running in CI or
+installing from a requirements file where the user would potentially be getting
+hundreds of prompts.
+
+One solution to make this less disruptive for users would be to manually
+maintain a list of trustworthy details (organization/user names, domain names,
+email addresses, etc.). This could be discoverable by packages providing
+`entry points`__ which package managers could learn to detect and which
+corporate environments could install by default. This has the major downside of
+not providing automatic guarantees which would limit the usefulness for the
+average user who is more likely to be affected.
+
+__ https://packaging.python.org/en/latest/specifications/entry-points/
+
+There are two ideas that could be used to provide automatic protection, which
+could be based on :pep:`740` attestations or a new mechanism for utilizing
+third-party APIs that host the metadata.
+
+First, each repository could offer a service that verifies the owner of a
+package using whatever criteria they deem appropriate. After verification, the
+repository would add the details to a dedicated package that would be installed
+by default.
+
+This would require dedicated maintenance which is unrealistic for most
+repositories, even PyPI currently. It's unclear how community projects without
+the resources for something like a domain name would be supported. Critically,
+this solution would cause extra confusion for users in the case of multiple
+repositories as each might have their own verification processes, attestation
+criteria and default package containing the verified details. It would be
+challenging to get community buy-in of every package manager to be aware of
+each repository's chosen verification package and install that by default
+before dependency resolution.
+
+Should digital attestations become the chosen mechanism, a downside is that
+implementing this in custom package repositories would require a significant
+amount of work. In the case of PyPI, the prerequisite work on
+`Trusted Publishing`__ and then the `PEP 740 implementation`__ itself took the
+equivalent of one year of a full-time engineer's time, which was paid for by a
+corporate sponsor. Other organizations are unlikely to implement similar work
+because simpler mechanisms make it possible to implement reproducible builds.
+When everything is internally managed, attestations are also not very useful.
+Community projects are unlikely to undertake this effort because they would
+likely lack the resources to maintain the necessary infrastructure themselves
+and moreover there are significant downsides to
+`encouraging dedicated package repositories <dedicated-repositories_>`_.
+
+__ https://blog.pypi.org/posts/2023-04-20-introducing-trusted-publishers/#acknowledgements
+__ https://blog.trailofbits.com/2024/10/01/securing-the-software-supply-chain-with-the-slsa-framework/
+
+The other idea would be to host provenance assertions externally and push more
+logic client-side. A possible implementation might be to specify a provenance
+API that could be hosted at a designated relative path like
+``/provenance``. Projects on each repository could then be configured to point
+to a particular domain and this information would be passed on to clients
+during installation.
+
+While this distributed approach does impose less of an infrastructure burden on
+repositories, it has the potential to be a security risk. If an external
+provenance API is compromised, it could lead to malicious packages being
+installed. If an external API is down, it could lead to package installation
+failing or package managers might only emit warnings in which case there is no
+security benefit.
+
+Additionally, this disadvantages community projects that do not have the
+resources to maintain such an API. They could use free hosting solutions such
+as what many do for documentation but they do not technically own the
+infrastructure and they would be compromised should the generous offerings be
+restricted.
+
+Finally, while neither of these theoretical approaches is yet prescriptive,
+they imply assertions at the artifact level which was already a
+`rejected idea <artifact-level-association_>`_.
+
+.. _asserting-package-owner-names:
+
+Asserting Package Owner Names
+-----------------------------
+
+This is about asserting that the package came from a specific organization or
+user name. It's quite similar to the
+`organization scoping <organization-scoping_>`_ idea except that a flat
+namespace is the base assumption.
+
+This would require modifications to the :pep:`JSON API <691>` of each supported
+repository and could be implemented by exposing extra metadata or as proper
+`provenance assertions <provenance-assertions_>`_.
+
+As with the organization scoping idea, a new `syntax`__ would be required like
+``microsoft::azure-loganalytics`` where ``microsoft`` is the organization and
+``azure-loganalytics`` is the package. Although this plays well with the
+existing flat namespace in comparison, it retains the critical downside of
+being a disruption for the community with the number of changes required.
+
+__ https://packaging.python.org/en/latest/specifications/dependency-specifiers/
+
+A unique downside is that names are an implementation detail of repositories.
+On PyPI, the names of organizations are separate from user names so there is
+potential for conflicts. In the case of multiple repositories, users might run
+into cases of dependency confusion similar to the one at the end of the
+`Encourage Dedicated Package Repositories <dedicated-repositories_>`_
+rejected idea.
+
+To ameliorate this, it was suggested that the syntax be expanded to also
+include the expected repository URL like
+``microsoft@pypi.org::azure-loganalytics``. This syntax or something like it
+is so verbose that it could lead to user confusion, and even worse, frustration
+should it gain increased adoption among those able to maintain dedicated
+infrastructure (community projects would not benefit).
+
+The expanded syntax is an attempt to standardize resolver behavior and
+configuration within dependency specifiers. Not only would this be mandating
+the UX of tools, it lacks precedent in package managers for language ecosystems
+with or without the concept of package repositories. In such cases, the
+resolver configuration is separate from the dependency definition.
+
+======== ======== =============================================================
+Language Tool     Resolution behavior
+======== ======== =============================================================
+Rust     Cargo    Dependency resolution can be `modified`__ within
+                  ``Cargo.toml`` using the ``[patch]`` table.
+JS       Yarn     Although they have the concept of `protocols`__ (which are
+                  similar to the URL schemes of our `direct references`__),
+                  users configure the `resolutions`__ field in the
+                  ``package.json`` file.
+JS       npm      Users can configure the `overrides`__ field in the
+                  ``package.json`` file.
+Ruby     Bundler  The ``Gemfile`` allows for specifying an
+                  `explicit source`__ for a gem.
+C#       NuGet    It's possible to `override package versions`__ by configuring
+                  the ``Directory.Packages.props`` file.
+PHP      Composer The ``composer.json`` file allows for specifying
+                  `repository`__ sources for specific packages.
+Go       go       The ``go.mod`` file allows for specifying a `replace`__
+                  directive. Note that this is used for direct dependencies
+                  as well as transitive dependencies.
+======== ======== =============================================================
+
+__ https://doc.rust-lang.org/cargo/reference/overriding-dependencies.html
+__ https://yarnpkg.com/protocols
+__ https://packaging.python.org/en/latest/specifications/version-specifiers/#direct-references
+__ https://yarnpkg.com/configuration/manifest#resolutions
+__ https://docs.npmjs.com/cli/v10/configuring-npm/package-json#overrides
+__ https://bundler.io/v2.5/man/gemfile.5.html#SOURCE-PRIORITY
+__ https://learn.microsoft.com/en-us/nuget/consume-packages/central-package-management#overriding-package-versions
+__ https://getcomposer.org/doc/articles/repository-priorities.md#filtering-packages
+__ https://go.dev/ref/mod#go-mod-file-replace
+
 Use Fixed Prefixes
 ------------------
 
@@ -501,10 +708,9 @@ Footnotes
      Markdown files. They also have the concept of
      `plugins <https://www.mkdocs.org/dev-guide/plugins/>`__ which may be
      developed by anyone and are usually prefixed by ``mkdocs-``.
-   - `Datadog <https://www.datadoghq.com>`__ offers observability as a service
-     for organizations at any scale. The
-     `Datadog Agent <https://docs.datadoghq.com/agent/>`__ ships out-of-the-box
-     with
+   - `Datadog <https://www.datadoghq.com>`__ offers observability as a service.
+     The `Datadog Agent <https://docs.datadoghq.com/agent/>`__ ships
+     out-of-the-box with
      `official integrations <https://github.com/DataDog/integrations-core>`__
      for many products, like various databases and web servers, which are
      distributed as Python packages that are prefixed by ``datadog-``. There is
@@ -533,6 +739,9 @@ Footnotes
    `squatted <https://zero.checkmarx.com/malicious-pypi-user-strikes-again-with-typosquatting-starjacking-and-unpacks-tailor-made-malware-b12669cefaa5>`__
    and this would be useful to prevent as a `hidden grant <hidden-grants_>`__.
 
+.. [5] `Detailed write-up <https://discuss.python.org/t/64679>`__ of the
+   potential for provenance assertions.
+
 __ https://www.sphinx-doc.org/en/master/usage/extensions/index.html
 __ https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/plugins.html
 __ https://airflow.apache.org/docs/apache-airflow-providers/index.html