From 38a7fb8ea1fc5ed02461f2f050b91262418977b6 Mon Sep 17 00:00:00 2001 From: jaimergp Date: Wed, 3 Sep 2025 11:52:17 +0200 Subject: [PATCH 1/3] 804: An external dependency registry and name mapping mechanism Co-authored-by: Pradyun Gedam Co-authored-by: rgommers Co-authored-by: mgorny Co-authored-by: Mike Sarahan --- peps/pep-0804.rst | 1348 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1348 insertions(+) create mode 100644 peps/pep-0804.rst diff --git a/peps/pep-0804.rst b/peps/pep-0804.rst new file mode 100644 index 00000000000..e0115af1824 --- /dev/null +++ b/peps/pep-0804.rst @@ -0,0 +1,1348 @@ +PEP: 804 +Title: An external dependency registry and name mapping mechanism +Author: Pradyun Gedam , + Ralf Gommers , + Michał Górny , + Jaime Rodríguez-Guerra , + Michael Sarahan +Discussions-To: +Status: Draft +Type: Standards Track +Topic: Packaging +Requires: 725 +Created: 03-Sep-2025 +Post-History: 03-Sep-2025, +Resolution: + + +Abstract +======== + +This PEP specifies a name mapping mechanism that allows packaging tools to map +external dependency identifiers (as introduced in :pep:`725`) to their +counterparts in other package repositories. + +Motivation +========== + +Packages on PyPI often require build-time and runtime dependencies that are not +present on PyPI. :pep:`725` introduced metadata to express +such dependencies. Using concrete external dependency metadata for +a Python package requires mapping the given dependency identifiers to the specifiers +used in other ecosystems, which would: + +- Enable tools to automatically map external dependencies to packages in other + packaging repositories/ecosystems, +- Make it possible to include the needed external dependencies *with the package + names used by the relevant system package manager on the user's system* in + error messages emitted by Python package installers and build frontends, + and to let the user query for those names directly to obtain install + instructions. 
+ +Packaging ecosystems like Linux distros, conda, Homebrew, Spack, and Nix need +full sets of dependencies for Python packages, and have tools like pyp2rpm_ +(Fedora), Grayskull_ (conda), and dh_python_ (Debian) which attempt to +automatically generate dependency information from the metadata available in +upstream Python packages. Before PEP 725, external dependencies were handled manually, +because there was no metadata for this in ``pyproject.toml`` or any other +standard metadata file. Enabling its automatic conversion is a key benefit of +this PEP, making Python packaging easier and more reliable. In addition, the +authors envision other types of tools making use of this information; e.g., +dependency analysis tools like Repology_, Dependabot_ and libraries.io_. + + +Rationale +========= + +Prior art +--------- + +The R language has a `System Requirements for R packages +`__ with a central +registry that knows how to translate external dependency metadata to install +commands for package managers like ``apt-get``. This registry centralises the +mappings for a series of Linux distributions, and also Windows. macOS is not +present. The `"Rule Coverage" of its README +`__ +used to show that this system improves the chance of success of building packages +from CRAN from source. Across all CRAN packages, +Ubuntu 18 improved from 78.1% to 95.8%, CentOS 7 from 77.8% to 93.7% and openSUSE +15.0 from 78.2% to 89.7%. The chance of success depends on how well the registry +is maintained, but the gain is significant: ~4x fewer packages fail to build on +Ubuntu and CentOS in a Docker container. + +RPM-based distributions, like Fedora, can use a `rule-based implementation +`__ +(``NameConvertor``) in pyp2rpm_. The main rule is that the RPM name for a PyPI package is +``f"python-{pypi_package_name}"``. This seems to work quite well; there are a +few variants like Python version specific names, where the prefix contains the +Python major and minor version numbers (e.g. 
``python311-`` instead of +``python-``). + +Gentoo follows a similar approach to naming Python packages, using the ``dev-python/`` +category and some `well-specified rules `__. + +Conda-forge has a more explicit name mapping, because the base names are the +same in conda-forge as on PyPI (e.g., ``numpy`` maps to ``numpy``), but there +are many exceptions because of both name collisions and renames (e.g., the PyPI +name for PyTorch is ``torch`` while in conda-forge it's ``pytorch``). There are +several name mapping efforts maintained by different teams. Conda-forge's infrastructure +generates one in `regro/cf-graph-countyfair `__. +Grayskull maintains `its own curated mapping `__. +Prefix.dev created the `parselmouth mappings `__ +to support conda and PyPI integrations in their tooling. A more complete overview of +their approaches, strengths and weaknesses can be found in +`conda/grayskull#564 `__. + +The `OpenStack `__ ecosystem also needs to deal with +some mapping efforts. All of them focus on Linux distributions, exclusively. +`pkg-map `__ +accompanies ``diskimage-builder`` and provides a file format where the user defines +arbitrary variable names and their corresponding names in the target distro +(Red Hat, Debian, OpenSUSE, etc). See `example for PyYAML `__. +`bindep `__ defines a file ``bindep.txt`` +(see `example `__) +where users can write down dependencies that are not installable from PyPI. The format is +line-based, with each line containing a dependency as found in the Debian ecosystem. +For other distributions, it offers a "filters" syntax between square brackets where users +can indicate other target platforms, optional dependencies and extras. + +The need for mappings is found not only in other ecosystems like `SageMath `__, +but also among end users who want to install PyPI packages with their system +package manager of choice (`example StackOverflow question `__). 
+ +Governance and maintenance costs of name mappings +------------------------------------------------- + +The maintenance cost of external dependency mappings to a large number of packaging +ecosystems is potentially high. We choose to define the registry in such +a way that: + +- A central authority maintains the list of recognized DepURLs and the + known ecosystem mappings. +- The mappings themselves are maintained by the target packaging ecosystems. + +Hence this system is opt-in for a given ecosystem, and the associated +maintenance costs are distributed. + +Generating package manager-specific install commands +---------------------------------------------------- + +Python package authors with external dependencies usually have installation +instructions for those external dependencies in their documentation. These +instructions are difficult to write and keep up-to-date, and usually cover +only one or at most a handful of platforms. As an example, here are SciPy's +instructions for its external build dependencies (C/C++/Fortran compilers, +OpenBLAS, pkg-config): + +- Debian/Ubuntu: ``sudo apt install -y gcc g++ gfortran libopenblas-dev liblapack-dev pkg-config python3-pip python3-dev`` +- Fedora: ``sudo dnf install gcc-gfortran python3-devel openblas-devel lapack-devel pkgconfig`` +- CentOS/RHEL: ``sudo yum install gcc-gfortran python3-devel openblas-devel lapack-devel pkgconfig`` +- Arch Linux: ``sudo pacman -S gcc-fortran openblas pkgconf`` +- Homebrew on macOS: ``brew install gfortran openblas pkg-config`` + +The package names vary a lot, and there are differences like some distros +splitting off headers and other build-time dependencies in a separate +``-dev``/``-devel`` package while others do not. 
With the registry in this PEP, +this could be made both more comprehensive and easier to maintain through a tool +command with semantics of *"show this ecosystem's preferred package manager +install command for all external dependencies"*. 
This may be done as a +standalone tool, or as a new subcommand in any Python development workflow tool +(e.g., Pip, Poetry, Hatch, PDM, uv). + +To this end, each ecosystem mapping can provide a list of package managers +known to be compatible, with templated instructions on how to install and query +packages. The provided install command templates are paired with query command templates +so those tools can check whether the needed packages are already present without +having to attempt an install operation (which might be expensive and have unintended +side effects like version upgrades). + +Registry design +--------------- + +The mapping infrastructure has been designed to present the following components and properties: + +- A central registry of PEP 725 identifiers (DepURLs), including at least the + well-known generic and virtual identifiers considered canonical. +- A list of known ecosystems, where ecosystem maintainers can register their name mapping(s). +- A standardized schema that defines how mappings should be structured. Each mapping can + also provide programmatic details about how their supported package manager(s) work. + +The above documents are provided as JSON files validated by accompanying JSON schemas. +A Python library and CLI are provided to query and utilize these resources. The user can +configure which system package manager they prefer to use for the default package mappings +and command generation (e.g. a user on Ubuntu may prefer ``conda``, ``brew`` or ``spack`` +instead of ``apt`` as their package manager of choice to provide external dependencies). + + +Specification +============= + +Three schemas are proposed: + +1. A central registry of known DepURLs, as introduced in PEP 725. +2. A list of known ecosystems and the canonical URL for their mappings. +3. The ecosystem-specific mappings of DepURLs to their + corresponding ecosystem specifiers, plus details of their package manager(s). 
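As an illustration of the paired install and query command templates described under "Generating package manager-specific install commands", the following is a minimal sketch of how a tool might turn a ``package_managers`` entry into argument lists. The function names (``expand_command``, ``install_commands``, ``query_commands``) are hypothetical and are not part of ``pyproject-external``. The sketch assumes identifiers are bare package names, so ``name-only`` is treated like ``always``, and it ignores ``requires_elevation``.

```python
def expand_command(template, identifiers):
    """Replace the single ``{}`` item of a command template with identifiers."""
    command = template["command"]
    slots = [i for i, item in enumerate(command) if "{}" in item]
    if len(slots) != 1:
        raise ValueError("exactly one command item must contain '{}'")
    slot = slots[0]
    # Each identifier becomes its own argv item, substituted into the placeholder.
    filled = [command[slot].replace("{}", spec) for spec in identifiers]
    return command[:slot] + filled + command[slot + 1:]


def install_commands(manager, identifiers):
    """Return the install command(s); 'always' is the default for install."""
    template = manager["commands"]["install"]
    mode = template.get("multiple_specifiers", "always")
    if mode == "never":
        return [expand_command(template, [spec]) for spec in identifiers]
    # Assumption: bare names only, so 'name-only' behaves like 'always' here.
    return [expand_command(template, identifiers)]


def query_commands(manager, identifiers):
    """Return one query command per identifier (query takes a single one)."""
    template = manager["commands"]["query"]
    return [expand_command(template, [spec]) for spec in identifiers]


# Hypothetical package manager entry, following the mapping schema in this PEP.
conda = {
    "name": "conda",
    "commands": {
        "install": {"command": ["conda", "install", "{}"]},
        "query": {"command": ["conda", "list", "-f", "{}"]},
    },
}

print(install_commands(conda, ["zlib-ng", "libwebp-base"]))
print(query_commands(conda, ["zlib-ng", "libwebp-base"]))
```

Running the query commands first lets a tool skip the install step when every package is already present, which is the reason the two templates are paired.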
+ +Central registry +---------------- + +The central registry defines which identifiers are recognized as canonical, +plus known aliases. Each entry MUST provide a valid DepURL in the +field ``id``, with an optional free form ``description`` text. Additionally +an entry MAY refer to another entry via its ``provides`` field, which takes +a string or a list of strings already defined as ``id`` in the registry. This is useful +for both aliases (e.g. ``dep:generic/arrow`` and ``dep:github/apache/arrow``) and +concrete implementations of a ``dep:virtual/`` entry (e.g. ``dep:generic/gcc`` +would provide ``dep:virtual/compiler/c``). Entries without ``provides`` content +or, if populated, only with ``dep:virtual/`` identifiers, are considered +canonical. The ``provides`` field MUST NOT be present in ``dep:virtual/`` definitions. + +Having a central registry enables the validation of the ``[external]`` table. +All involved tools MUST check that the provided identifiers are well formed. +Additionally, some tools MAY check whether the identifiers in use are recognized as +canonical. More specifically: + +- Build backends, build frontends, and installers SHOULD NOT do any validation + of identifiers being canonical by default. +- Uploaders like ``twine`` SHOULD validate if the identifiers are canonical + and warn or report an error to the user, with opt-out mechanisms. They + SHOULD suggest a canonical replacement, if available. +- Index servers like PyPI MAY perform the same validation as the uploaders and + reject the artifact if necessary. + +This registry SHOULD also centralize authoritative decisions about its +contents, such as which entry of a collection of aliases is preferred as +canonical, or which versioning scheme applies to virtual DepURLs (see Appendix +B). The corresponding answers are not given in this PEP; instead we delegate +that responsibility to the central registry maintainers. 
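The canonicality rule above can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names (``as_list``, ``is_canonical``, ``registry_problems``) are hypothetical, and the choice of ``dep:generic/arrow`` as the canonical entry of the Arrow alias pair is an assumption made for the example, since this PEP leaves that decision to the registry maintainers.

```python
def as_list(value):
    """Normalize a missing value, a string, or a list of strings to a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def is_canonical(entry):
    """Canonical: no ``provides``, or ``provides`` lists only dep:virtual/ ids."""
    return all(p.startswith("dep:virtual/") for p in as_list(entry.get("provides")))


def registry_problems(definitions):
    """Collect violations of the ``provides`` rules stated above."""
    known = {entry["id"] for entry in definitions}
    problems = []
    for entry in definitions:
        if entry["id"].startswith("dep:virtual/") and "provides" in entry:
            problems.append(f"{entry['id']}: virtual entries must not use 'provides'")
        for target in as_list(entry.get("provides")):
            if target not in known:
                problems.append(f"{entry['id']}: unknown 'provides' target {target}")
    return problems


# Hypothetical registry content built from the identifiers mentioned above.
definitions = [
    {"id": "dep:generic/arrow"},
    {"id": "dep:github/apache/arrow", "provides": "dep:generic/arrow"},
    {"id": "dep:virtual/compiler/c"},
    {"id": "dep:generic/gcc", "provides": ["dep:virtual/compiler/c"]},
]

for entry in definitions:
    print(entry["id"], "canonical" if is_canonical(entry) else "alias")
print(registry_problems(definitions))
```

An uploader that implements the SHOULD above could use a check like ``is_canonical`` to warn about aliases and suggest the entry named in ``provides`` as the replacement.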
+ +Mappings +-------- + +The mappings specify which ecosystem-specific identifiers provide the canonical +entries available in the central registry. A mapping mainly consists of a list +of dictionaries, in which each entry consists of: + +- an ``id`` field with the canonical DepURL. + +- an optional free form ``description`` text. + +- a ``specs`` field whose value MUST be one of: + + - a dictionary with three keys (``build``, ``host``, ``run``). The values + MUST be a string or list of strings representing the ecosystem-specific package + identifiers needed as build-, host- and runtime dependencies (see PEP 725 for + details on these definitions). + + - for convenience, a string or a list of strings is also accepted as a + shorthand form. In this case, the identifier(s) will be used to populate + the three categories mentioned in the item above. + + - an empty list, which is understood as the ecosystem not having packages that + provide such a dependency. + +- a ``specs_from`` field whose value is a DepURL from which the ``specs`` + field will be imported. Either ``specs`` or ``specs_from`` MUST be present. + +- an optional ``urls`` field whose value MUST be a URL, a list of URLs, or a + dictionary that maps a string to a URL. This is useful to link to external + resources that provide more information about the mapped packages. + +The mappings SHOULD also specify another section ``package_managers``, reporting +which package managers are available in the ecosystem and how to use them. This field MUST +take a list of dictionaries, with each of them reporting the following fields: + +- ``name`` (string), unique identifier for this package manager. Usually, the executable name. +- ``commands`` (list of dictionaries), the commands to run to install the mapped package(s) and + check whether they are already installed. +- ``specifier_syntax``: instructions on how to map a subset of PEP 440 specifiers to + the target package manager. 
Three levels of support are offered: name-only, exact-version-only, + and version-range compatibility (with per-operator translations). + +Each mapping MUST have a canonical URL for online retrieval. These mappings +MAY also be packaged for offline distribution in each platform. The authors +recommend placing them in the standard location for data artifacts in each operating +system; e.g. ``$XDG_DATA_DIRS`` on Linux and others, ``~/Library/Application Support`` on +macOS, and ``%LOCALAPPDATA%`` for Windows. The subdirectory identifier MUST +be ``external-packaging-metadata-mappings``. This data directory SHOULD only +contain mapping documents named ``{ecosystem-identifier}.mapping.json``. The central +registry and known ecosystem documents MAY also be distributed in this directory, +as ``registry.json`` and ``known-ecosystems.json``, respectively. + +Known ecosystems +---------------- + +The list of known ecosystems has two roles: + +1. Reporting the canonical URL for its mapping. +2. Assigning a short identifier to each ecosystem. This is the identifier + that MUST be used in the mapping filenames mentioned above so they can be + found in the local filesystem. + +For ecosystems corresponding to Linux distributions, the identifier MUST be the +one reported by their `os-release `__ +``ID`` parameter. For other ecosystems, it MUST be decided during the submission to +the list of known ecosystems document. It MUST only use the characters allowed in +``os-release``'s ``ID`` field, as per this regex ``[a-z0-9\-_.]+``. + +Schema details +-------------- + +Three JSON Schema documents are provided to fully standardize the registries and mappings. + +Central registry schema +^^^^^^^^^^^^^^^^^^^^^^^ + +The central registry is specified by the following +`JSON schema `__: + +``$schema`` +~~~~~~~~~~~ + +.. list-table:: + :widths: 25 75 + + * - Type + - ``string`` + * - Description + - URL of the definition list schema in use for the document. 
+ * - Required + - False + +``schema_version`` +~~~~~~~~~~~~~~~~~~ + +.. list-table:: + :widths: 25 75 + + * - Type + - ``integer`` + * - Required + - False + +``definitions`` +~~~~~~~~~~~~~~~ + +.. list-table:: + :widths: 25 75 + + * - Type + - ``array`` + * - Description + - List of DepURLs currently recognized. + * - Required + - True + +Each entry in this list is defined as: + +.. list-table:: + :header-rows: 1 + :widths: 20 25 40 15 + + * - Field + - Type + - Description + - Required + * - ``id`` + - ``DepURLField`` (``string`` matching regex ``^dep:.+$``) + - DepURL + - True + * - ``description`` + - ``string`` + - Free-form field to add some details about the package. Allows Markdown. + - False + * - ``provides`` + - ``DepURLField | list[DepURLField]`` + - List of identifiers this entry connects to. + Useful to annotate aliases or virtual package implementations. + - False + * - ``urls`` + - ``AnyUrl | list[AnyUrl] | dict[NonEmptyString, AnyUrl]`` + - Hyperlinks to web locations that provide more information about the definition. + - False + +Known ecosystems schema +^^^^^^^^^^^^^^^^^^^^^^^ + +The known ecosystems list is specified by the following +`JSON Schema `__: + +``$schema`` +~~~~~~~~~~~ + +.. list-table:: + :widths: 25 75 + + * - Type + - ``string`` + * - Description + - URL of the mappings schema in use for the document. + * - Required + - False + +``schema_version`` +~~~~~~~~~~~~~~~~~~ + +.. list-table:: + :widths: 25 75 + + * - Type + - ``integer`` + * - Required + - False + +``ecosystems`` +~~~~~~~~~~~~~~ + +.. list-table:: + :widths: 25 75 + + * - Type + - ``dict`` + * - Description + - Ecosystem names and their corresponding details. + * - Required + - True + +This dictionary maps non-empty string keys referring to the ecosystem identifiers +to a sub-dictionary defined as: + +.. 
list-table:: + :header-rows: 1 + :widths: 20 25 40 15 + + * - Key + - Value type + - Value description + - Required + * - ``Literal['mapping']`` + - ``AnyUrl`` + - URL to the mapping for this ecosystem + - True + +Mappings schema +^^^^^^^^^^^^^^^ + +The mappings are specified by the following +`JSON Schema `__: + +``$schema`` +~~~~~~~~~~~ + +.. list-table:: + :widths: 25 75 + + * - Type + - ``string`` + * - Description + - URL of the mappings schema in use for the document. + * - Required + - False + +``schema_version`` +~~~~~~~~~~~~~~~~~~ + +.. list-table:: + :widths: 25 75 + + * - Type + - ``integer`` + * - Required + - False + +``name`` +~~~~~~~~ + +.. list-table:: + :widths: 25 75 + + * - Type + - ``string`` + * - Description + - Name of the schema + * - Required + - True + +``description`` +~~~~~~~~~~~~~~~ + +.. list-table:: + :widths: 25 75 + + * - Type + - ``string | None`` + * - Description + - Free-form field to add information about this mapping. Allows + Markdown. + * - Required + - False + +``mappings`` +~~~~~~~~~~~~ + +.. list-table:: + :widths: 25 75 + + * - Type + - ``array`` + * - Description + - List of DepURL-to-specs mappings. + * - Required + - True + +Each entry in this list is defined as: + +.. list-table:: + :header-rows: 1 + :widths: 20 25 40 15 + + * - Field + - Type + - Description + - Required + * - ``id`` + - ``DepURLField`` (``string`` matching regex ``^dep:.+$``) + - DepURL, as provided in the central registry + - True + * - ``description`` + - ``string`` + - Free-form field to add some details about the package. Allows Markdown. + - False + * - ``urls`` + - ``AnyUrl | list[AnyUrl] | dict[NonEmptyString, AnyUrl]`` + - Hyperlinks to web locations that provide more information about the definition. + - False + * - ``specs`` + - ``string | list[string] | dict[Literal['build', 'host', 'run'], string | list[string]]`` + - Ecosystem-specific identifiers for this package. 
The full form is a dictionary + that maps the categories ``build``, ``host`` and ``run`` to their corresponding + package identifiers. As a shorthand, a single string or a list of strings can be + provided, in which case it will be used to populate the three categories identically. + - Either ``specs`` or ``specs_from`` MUST be present. + * - ``specs_from`` + - ``DepURLField`` (``string`` matching regex ``^dep:.+$``) + - Take specs from another mapping entry. + - Either ``specs`` or ``specs_from`` MUST be present. + * - ``extra_metadata`` + - ``dict[NonEmptyString, Any]`` + - Free-form key-value store for arbitrary metadata. + - False + +``package_managers`` +~~~~~~~~~~~~~~~~~~~~ + +.. list-table:: + :widths: 25 75 + + * - Type + - ``array`` + * - Description + - List of tools that can be used to install packages in this + ecosystem. + * - Required + - True + +Each entry in this list is defined as a dictionary with these fields: + +.. list-table:: + :header-rows: 1 + :widths: 20 25 40 15 + + * - Field + - Type + - Description + - Required + * - ``name`` + - ``string`` + - Short identifier for this package manager (usually the command name) + - True + * - ``commands`` + - ``dict[Literal['install', 'query'], dict[Literal['command', 'requires_elevation', 'multiple_specifiers'], list[str] | bool | Literal['always', 'name-only', 'never']]]`` + - Commands used to install or query the given package(s). Only two keys + are allowed: ``install`` and ``query``. Their value is a dictionary + with: + + - a required key ``command`` that takes a list of strings + (as expected by ``subprocess.run``). + + - an optional ``requires_elevation`` boolean (``False`` by default) + to indicate whether the command must run with elevated permissions + (e.g. administrator on Windows, superuser on Linux and macOS). 
+ + - an enum ``multiple_specifiers`` that determines whether the command + accepts multiple package specifiers at the same time, accepting one of: + + - ``always``, default in ``install``. + + - ``name-only``, the command only accepts multiple specifiers if they do + not contain version constraints. + + - ``never``, default in ``query``. + + Exactly one of the ``command`` items MUST include a ``{}`` placeholder, + which will be replaced by the mapped package identifier(s). The + ``install`` command SHOULD support the placeholder being replaced by + multiple identifiers, ``query`` MUST only receive a single identifier + per command. + - True + * - ``specifier_syntax`` + - ``dict[Literal['name_only', 'exact_version', 'version_ranges'], None | list[str] | dict[Literal['and', 'equal', 'greater_than', 'greater_than_equal', 'less_than', 'less_than_equal', 'not_equal', 'syntax'], None | str | list[str]]]`` + - Mapping of allowed PEP 440 version specifiers to the syntax used in this + package manager. Three top-level keys are expected and required: + + - ``name_only`` MUST take a list of strings as the syntax used for specifiers + that do not contain any version information; it MUST include the placeholder + ``{name}``. + + - ``exact_version`` MUST be ``None`` or a list of strings that describe + the syntax used for specifiers that only express exact version + constraints; in the latter case, the placeholders ``{name}`` + and ``{version}`` MUST be present in at least one of the strings + (although not necessarily the same string for both). + + - ``version_ranges`` MUST be ``None`` or a dictionary with the + following required keys: + + - the key ``syntax`` takes a list of strings where at least one MUST + include the ``{ranges}`` placeholder (to be replaced by the + maybe-joined version constraints, as determined by the value of + ``and``). They MAY also include the ``{name}`` placeholder. 
+ + - the keys ``equal``, ``greater_than``, ``greater_than_equal``, + ``less_than``, ``less_than_equal``, and ``not_equal`` take a string + if the operator is supported, ``None`` otherwise. In the former case, + the value MUST include the ``{version}`` placeholder, and MAY include + ``{name}``. + + - the key ``and`` takes a string used to join multiple version + constraints in a single token, or ``None`` if only a single + constraint can be used per token. In the latter case, the different + constraints will be "exploded" into several tokens using the + ``syntax`` template. + + When ``exact_version`` or ``version_ranges`` are set to ``None``, it + indicates that the respective types of specifiers are not supported + by the package manager. + + - True + + +Examples +-------- + +Registry, known ecosystems and mappings +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +A simplified registry would look like this: + +.. code-block:: js + + { + "$schema": "https://raw.githubusercontent.com/jaimergp/external-metadata-mappings/main/schemas/central-registry.schema.json", + "schema_version": 1, + "definitions": [ + { + "id": "dep:generic/zlib", + "description": "A Massively Spiffy Yet Delicately Unobtrusive Compression Library" + }, + { + "id": "dep:generic/libwebp", + "description": "WebP codec is a library to encode and decode images in WebP format. This package contains the library that can be used in other programs to add WebP support" + }, + { + "id": "dep:generic/clang", + "description": "Language front-end and tooling infrastructure for languages in the C language family for the LLVM project." + } + ] + } + +A minimal list of known ecosystems with a single entry would look like this: + +.. 
code-block:: js + + { + "$schema": "https://raw.githubusercontent.com/jaimergp/external-metadata-mappings/main/schemas/known-ecosystems.schema.json", + "schema_version": 1, + "ecosystems": { + "conda-forge": { + "mapping": "https://raw.githubusercontent.com/jaimergp/external-metadata-mappings/refs/heads/main/data/conda-forge.mapping.json" + } + } + } + +That hypothetical conda-forge mapping (``conda-forge.mapping.json``), with only a couple of entries +for brevity, could look like: + +.. code-block:: js + + { + "schema_version": 1, + "name": "conda-forge", + "description": "Mapping for the conda-forge ecosystem", + "mappings": [ + { + "id": "dep:generic/zlib", + "description": "zlib data compression library for the next generation systems. From zlib-ng/zlib-ng.", + "specs": "zlib-ng", // Simplest form + "urls": { + "feedstock": "https://github.com/conda-forge/zlib-ng-feedstock" + } + }, + { + "id": "dep:generic/libwebp", + "description": "WebP image library. libwebp-base ships libraries; libwebp ships the binaries.", + "specs": { // expanded form with single spec per category + "build": "libwebp", + "host": "libwebp-base", + "run": "libwebp" + }, + "urls": { + "feedstock": "https://github.com/conda-forge/libwebp-feedstock" + } + }, + { + "id": "dep:generic/clang", + "description": "Development headers and libraries for Clang", + "specs": { // expanded form with specs list + "build": [ + "clang", + "clangxx" + ], + "host": [ + "clangdev" + ], + "run": [ + "clang", + "clangxx", + "clang-format", + "clang-tools" + ] + }, + "urls": { + "feedstock": "https://github.com/conda-forge/clangdev-feedstock" + } + }, + ], + "package_managers": [ + { + "name": "conda", + "commands": { + "install": { + "command": [ + "conda", + "install", + "{}" + ], + "multiple_specifiers": "always", + "requires_elevation": false, + }, + "query": { + "command": [ + "conda", + "list", + "-f", + "{}" + ], + "multiple_specifiers": "never", + "requires_elevation": false, + } + }, + "specifier_syntax": { 
"exact_version": [ + "{name}=={version}" + ], + "name_only": [ + "{name}" + ], + "version_ranges": { + "and": ",", + "equal": "={version}", + "greater_than": ">{version}", + "greater_than_equal": ">={version}", + "less_than": "<{version}", + "less_than_equal": "<={version}", + "not_equal": "!={version}", + "syntax": [ + "{name}{ranges}" + ] + } + } + } + ] + } + +The following repository provides examples of what these schemas *could* look like in real cases. +They are not meant to be prescriptive, but just illustrative of how to apply these schemas: + +- `Central registry `__. + +- `Known ecosystems `__. + +- Mappings: + + - `Arch-linux `__. + + - `Chocolatey `__. + + - `Conan `__. + + - `Conda-forge `__. + + - `Fedora `__. + + - `Gentoo `__. + + - `Homebrew `__. + + - `Nix `__. + + - `PyPI `__. + + - `Scoop `__. + + - `Spack `__. + + - `Ubuntu `__. + + - `Vcpkg `__. + + - `Winget `__. + + +pyproject-external CLI +^^^^^^^^^^^^^^^^^^^^^^ + +The following examples illustrate how the name mapping mechanism may be used. +They use the CLI implemented as part of the ``pyproject-external`` package. + +Say we have cloned the source of a Python package named ``my-cxx-pkg`` with a +single extension module, implemented in C++, linking to ``zlib``, using ``pybind11``, +plus ``meson-python`` as the build backend: + +.. code:: toml + + [build-system] + build-backend = 'mesonpy' + requires = [ + "meson-python>=0.13.1", + "pybind11>=2.10.4", + ] + + [external] + build-requires = [ + "dep:virtual/compiler/cxx", + ] + host-requires = [ + "dep:generic/zlib", + ] + +With complete name mappings for ``apt`` on Ubuntu, this may then show the +following: + +.. code:: bash + + # show all external dependencies as DepURLs + $ python -m pyproject_external show . 
+ [external] + build-requires = [ + "dep:virtual/compiler/cxx", + ] + host-requires = [ + "dep:generic/zlib", + ] + + # show all external dependencies, but mapped to the autodetected ecosystem + $ python -m pyproject_external show --output=mapped . + [external] + build_requires = [ + "g++", + "python3", + ] + host_requires = [ + "zlib1g", + "zlib1g-dev", + ] + + # show how to install external dependencies + $ python -m pyproject_external show --output=command . + sudo apt install --yes g++ zlib1g zlib1g-dev python3 + +We have not yet run those install commands, so the external dependency may be +missing. If we get a build failure, the output may look like: + +.. code:: + + $ pip install . + ... + × Encountered error while generating package metadata. + ╰─> See above for output. + + note: This is an issue with the package mentioned above, not pip. + + This package has the following external dependencies, if those are missing + on your system they are likely to be the cause of this build failure: + + dep:virtual/compiler/cxx + dep:generic/zlib + +If Pip has implemented support for querying the name mapping registry, the end +of that message could improve to: + +.. code:: bash + + The following external dependencies are needed to install the package + mentioned above. You may need to install them with `apt`: + + g++ + zlib1g + zlib1g-dev + +If the user wants to use conda packages and the ``mamba`` package manager to +install external dependencies, they may specify that in their +``~/.config/pyproject-external/config.toml`` (or equivalent) file: + +.. code:: toml + + preferred_package_manager = "mamba" + +This will then change the output of ``pyproject-external``: + +.. code:: bash + + $ python -m pyproject_external show --output command . 
+ mamba install --yes --channel=conda-forge --channel-priority=strict cxx-compiler zlib python + + +The ``pyproject-external`` CLI also provides a simple way to perform +``[external]`` table validation against the central registry to check +whether the identifiers are considered canonical or not: + +.. code-block:: bash + + $ python -m pyproject_external show --validate grpcio-1.71.0.tar.gz + WARNING Dep URL 'dep:virtual/compiler/cpp' is not recognized in the + central registry. Did you mean any of ['dep:virtual/compiler/c', + 'dep:virtual/compiler/cxx', 'dep:virtual/compiler/cuda', + 'dep:virtual/compiler/go', 'dep:virtual/compiler/c-sharp']? + [external] + build-requires = [ + "dep:virtual/compiler/c", + "dep:virtual/compiler/cpp", + ] + + +pyproject-external API +^^^^^^^^^^^^^^^^^^^^^^ + +The ``pyproject-external`` Python API also allows users to do these operations programmatically: + +.. code-block:: python + + >>> from pyproject_external import External + >>> external = External.from_pyproject_data( + { + "external": { + "build-requires": [ + "dep:virtual/compiler/c", + "dep:virtual/compiler/cpp", + ] + } + } + ) + >>> external.validate() + Dep URL 'dep:virtual/compiler/cpp' is not recognized in the central registry. Did you + mean any of ['dep:virtual/compiler/c', 'dep:virtual/compiler/cxx', + 'dep:virtual/compiler/cuda', 'dep:virtual/compiler/go', 'dep:virtual/compiler/c-sharp']? 
+ >>> external = External.from_pyproject_data( + { + "external": { + "build-requires": [ + "dep:virtual/compiler/c", + "dep:virtual/compiler/cxx", # fixed + ] + } + } + ) + >>> external.validate() + >>> external.to_dict() + {'external': {'build_requires': ['dep:virtual/compiler/c', 'dep:virtual/compiler/cxx']}} + >>> from pyproject_external import detect_ecosystem_and_package_manager + >>> ecosystem, package_manager = detect_ecosystem_and_package_manager() + >>> ecosystem + 'conda-forge' + >>> package_manager + 'pixi' + >>> external.to_dict(mapped_for=ecosystem, package_manager=package_manager) + {'external': {'build_requires': ['c-compiler', 'cxx-compiler', 'python']}} + >>> external.install_command(ecosystem, package_manager=package_manager) + # {"command": ["pixi", "add", "{}"]} + ['pixi', 'add', 'c-compiler', 'cxx-compiler', 'python'] + >>> external.query_commands(ecosystem, package_manager=package_manager) + # {"command": ["pixi", "list", "{}"]} + [ + ['pixi', 'list', 'c-compiler'], + ['pixi', 'list', 'cxx-compiler'], + ['pixi', 'list', 'python'], + ] + +Grayskull +^^^^^^^^^ + +A prototype proof of concept implementation was contributed to Grayskull, a conda recipe generator for +Python packages, via `conda/grayskull#518 `__. + +In order to use the name mappings for the recipe generator of our package, we +can now run Grayskull_: + +.. code:: + + $ grayskull pypi my-cxx-pkg + #### Initializing recipe for my-cxx-pkg (pypi) #### + + Recovering metadata from pypi... + Starting the download of the sdist package my-cxx-pkg + my-cxx-pkg 100% Time: 0:00:10 5.3 MiB/s|###########| + Checking for pyproject.toml + ... 
+
+    Build requirements:
+      - python                                # [build_platform != target_platform]
+      - cross-python_{{ target_platform }}    # [build_platform != target_platform]
+      - meson-python >= 0.13.1                # [build_platform != target_platform]
+      - pybind11 >= 2.10.4                    # [build_platform != target_platform]
+      - ninja                                 # [build_platform != target_platform]
+      - libboost-devel                        # [build_platform != target_platform]
+      - {{ compiler('cxx') }}
+    Host requirements:
+      - python
+      - meson-python >=0.13.1
+      - pybind11 >=2.10.4
+      - ninja
+      - libboost-devel
+    Run requirements:
+      - python
+
+    #### Recipe generated on /path/to/recipe/dir for my-cxx-pkg ####
+
+
+
+Backwards Compatibility
+=======================
+
+There is no impact on backwards compatibility.
+
+
+Security Implications
+=====================
+
+This proposal does not introduce any security implications for existing projects.
+The proposed schemas, registries and mappings are resources that downstream
+tooling may use at will, in whatever way it finds suitable.
+
+We do have some recommendations for future implementors. The mapping schema
+proposes fields to encode instructions for command execution
+(``package_managers[].commands``). A tampered mapping may replace these
+instructions with arbitrary commands. Hence, tools should not rely on internet
+connectivity to fetch the mappings from their online sources. Instead:
+
+- they should vendor the relevant documents in the distributed packages,
+- or depend on prepackaged, offline distributions of these documents,
+- or implement best practices for authenticity verification of the fetched documents.
+
+The install commands have the potential to modify the system configuration of the user.
+When available, tools should prefer creating ephemeral, isolated environments for the
+installation of external dependencies. If the ecosystem lacks that feature natively,
+other solutions like containerization may be used. At the very least, informative messaging
+about the impact of the operation should be provided.
+
+How to Teach This
+=================
+
+There are at least four audiences that may need to get familiar with the contents of this PEP:
+
+1. Central registry maintainers, who are responsible for curating the list of
+   well-known DepURLs and mapped ecosystems.
+2. Packaging ecosystem maintainers, who are responsible for keeping the
+   mapping for their ecosystem up-to-date.
+3. Maintainers of Python projects that require external dependencies.
+4. End users of packages that have external dependency metadata.
+
+Central DepURL registry maintainers
+-----------------------------------
+
+Central DepURL registry maintainers curate the collection of DepURLs and the
+known ecosystems. These contributors need to be able to refer to clearly
+defined rules for when a new DepURL can be defined. It is undesirable to be
+loose with canonical DepURL definitions, because each definition added increases
+the maintenance effort for the mappings in the target ecosystems.
+
+The central registry maintainers should agree on the ground rules and write them
+down as part of the repository documentation, perhaps supported by additional
+affordances like issue and pull request templates, or linting tools.
+
+Package ecosystem maintainers usage
+-----------------------------------
+
+Missing mapping entries will result in the absence of tailored error messages and
+other UX affordances for end users of the impacted ecosystems. It is thus
+recommended that each package ecosystem keeps its mappings up-to-date with
+the central registry. The key to this will be automation, like linting scripts
+(see example at `external-metadata-mappings `__),
+or periodic notifications via issues or draft submissions.
+
+Establishing the initial mapping is likely to involve a lot of work, but ongoing maintenance should ideally require much less effort.
+
+As best practices are discovered and agreed on, they should get documented
+in the central registry repository as learning materials for the mapping
+maintainers.
+
+Maintainers of Python projects
+------------------------------
+
+A package maintainer's responsibility is to decide which DepURL best
+represents the external dependency that their package needs. This is covered
+in :pep:`725`; the interactive mappings browser demo located at
+`external-metadata-mappings.streamlit.app `__
+may come in handy. The central registry documentation may include examples and
+frequently asked questions to guide newcomers with their decisions.
+
+If no suitable DepURL is available for a given dependency, maintainers may
+consider submitting a request in the central registry. Instructions on how to do
+this should be provided as part of the central registry documentation.
+
+End user package consumers
+--------------------------
+
+There will be no change in the user experience by default. This is particularly
+true if the user only relies on wheels, since the only impact will be driven by
+external runtime dependencies (expected to be rare), and even in those cases
+they need to opt in by installing a compatible tool.
+
+Users that do opt in may find missing entries for their target ecosystems, for
+which they should obtain informative error messages that point to the relevant
+documentation sections. This will allow them to get acquainted with the nature
+of the issue and its potential solutions.
+
+We hope that this results in a subset of them reporting the missing entries,
+submitting a fix to the affected mapping or, if totally absent, even deciding
+to maintain a new one on their own. To that end, they should get familiar with
+the responsibilities of mapping maintainers (discussed above).
+
+Reference Implementation
+========================
+
+A reference implementation should include three components:
+
+1. A central registry that captures at a minimum a DepURL and its description. This registry MUST
+   NOT contain specifics of package ecosystem mappings.
+2. A standard specification for a collection of mappings. JSON Schema is widely supported
+   by text editors, and would be a natural choice for expressing the standard specification.
+3. An implementation of (2), providing mappings from the contents of the central
+   registry to the ecosystem-specific package names.
+
+For (1), the JSON Schema is defined at `central-registry.schema.json `__.
+An example registry can be found at `registry.json `__.
+For (2), the JSON Schema is defined at `external-mapping.schema.json `__.
+A collection of example mappings for a sample of packages can be found at `external-metadata-mappings `__.
+For (3), the JSON Schema is defined at `known-ecosystems.schema.json `__.
+An example list can be found at `known-ecosystems.json `__.
+The JSON Schemas are created with `these Pydantic models `__.
+
+The reference CLI and Python API to consume the different JSON documents and ``[external]`` tables
+can be found in `pyproject-external `__.
+
+Rejected Ideas
+==============
+
+Centralized mappings governed by the same body
+----------------------------------------------
+
+While a central authority for the registry is useful, the maintenance burden
+of handling the mappings for multiple ecosystems is unfeasible at the scale of PyPI.
+Hence, we propose that the central authority only governs the central registry and
+the list of known ecosystems, while the maintenance of the mappings themselves is handled
+by the target ecosystems.
+
+Allowing ecosystem-specific variants of packages
+------------------------------------------------
+
+Some ecosystems have their own variants of known packages; e.g. Debian's
+``libsymspg2-dev``. While an identifier such as ``dep:debian/libsymspg2-dev``
+is syntactically valid, the central registry should not recognize it as a
+well-known identifier, preferring its ``generic`` counterpart instead. Users
+may still choose to use it, but tools may warn about it and suggest using the
+generic one. This is meant to encourage ecosystem-agnostic metadata whenever
+possible to facilitate adoption across platforms and operating systems.
+
+Adding more package metadata to the central registry
+----------------------------------------------------
+
+A central registry should only contain a list of DepURLs and a
+minimal set of metadata fields to facilitate their identification (a free-form
+text description, and one or more URLs to relevant locations).
+
+We have chosen to leave additional details out of the central registry, and instead
+suggest that external contributors maintain their own mappings where they can
+annotate the identifiers with extra metadata via the free-form ``extra_metadata`` field.
+
+The reasons include:
+
+- The existing fields should be sufficient to identify the project home,
+  where that extra metadata can be obtained (e.g. the repository at the URL will likely
+  include details about authorship and licensing).
+- These details can also be obtained from the actual target ecosystems. In some
+  cases this might even be preferable; e.g., for licenses, where downstream packaging
+  can actually affect it by unvendoring dependencies or adjusting optional bits.
+- Those details may change over the lifetime of the project, and keeping them
+  up-to-date would increase the maintenance burden on the governance body.
+- Centralizing additional metadata would hence introduce ambiguities and
+  discrepancies across target ecosystems, where different versions may be
+  available or required.
+
+
+Mapping PyPI projects to repackaged counterparts in target ecosystems
+---------------------------------------------------------------------
+
+It is common that other ecosystems redistribute Python projects with their own
+packaging system. While this is required for packages with compiled extensions, it
+is theoretically unnecessary for pure Python wheels; the only need for this seems to
+be metadata translation. See `Wanting a singular packaging tool/vision #68 `__,
+`Wanting a singular packaging tool/vision #103 `__,
+and `spack/spack#28282 `__
+for examples of discussions in this direction.
+
+The proposals in this PEP do not consider PyPI -> *ecosystem* mappings, but
+the same schemas can be repurposed to that end. After all, it is trivial to build a PURL or
+DepURL from a PyPI name (e.g. ``numpy`` becomes ``pkg:pypi/numpy``). A hypothetical
+mapping maintainer could annotate their repackaging efforts with the source PURL identifier,
+and then use that metadata to generate compatible mappings, such as:
+
+.. code:: json
+
+    {
+      "$schema": "https://raw.githubusercontent.com/jaimergp/external-metadata-mappings/main/schemas/external-mapping.schema.json",
+      "schema_version": 1,
+      "name": "PyPI packages in Ubuntu 24.04",
+      "description": "PyPI mapping for the Ubuntu 24.04 LTS (Noble) distro",
+      "mappings": [
+        {
+          "id": "dep:pypi/numpy",
+          "description": "The fundamental package for scientific computing with Python",
+          "specs": ["python3-numpy"],
+          "urls": {
+            "home": "https://numpy.org/"
+          }
+        }
+      ]
+    }
+
+Such a mapping would allow downstream redistribution efforts to focus on the
+compiled packages and instead delegate pure wheels to Python packaging
+solutions directly.
+
+Strict validation of identifiers
+--------------------------------
+
+The central registry provides a list of canonical identifiers, which may tempt
+implementors into ensuring that all supplied identifiers are indeed canonical. We
+have decided to only *recommend* this practice for some tool categories, but in no
+case *require* such checks.
+
+It is expected that, as the ``[external]`` metadata tables are adopted by the
+packaging community, the *canonical* identifier list will grow to accommodate the
+requirements found in different projects; for example, when a new C++ library or a
+new language compiler is introduced.
+
+If validation is made too strict and rejects unknown identifiers, this would
+introduce unnecessary friction in the external metadata adoption, and require
+human interaction to review and accept the newly requested identifiers in
+a time-critical manner, potentially blocking publication of the package
+that needs a new identifier added to the central registry.
+
+We suggest simply checking that the provided identifiers are well-formed. Future
+work may choose to also enforce that the identifiers are recognized as canonical,
+once the central registry has matured with significant adoption.
+
+Open Issues
+===========
+
+None at this time.
+
+References
+==========
+
+- https://github.com/jaimergp/pyproject-external
+- https://github.com/rgommers/external-deps-build
+- https://github.com/jaimergp/external-metadata-mappings
+- https://github.com/conda/grayskull/pull/518
+
+Appendix A: Operational suggestions
+===================================
+
+In contrast with the ecosystem mappings, the central registry and the list of known
+ecosystems need to be maintained by a central authority. The authors propose to:
+
+- Host the ``external-metadata-mappings`` and ``pyproject-external`` repositories under the PyPA_
+  GitHub organization (or equivalent as per :pep:`772`).
+- Create a maintainers team for these two repositories, seeded with the authors of this PEP and
+  regulated as per :pep:`772`.
+
+Appendix B: Virtual versioning proposal
+=======================================
+
+While virtual dependencies can be versioned with the same syntax as non-virtual
+dependencies, their meaning can be ambiguous (e.g. there can be multiple
+implementations, and virtual interfaces may not be unambiguously versioned).
+Below we provide some suggestions for the central registry maintainers to
+consider when standardizing such meaning:
+
+- OpenMP: has regular ``MAJOR.MINOR`` versions of its standard, so would look
+  like ``>=4.5``.
+- BLAS/LAPACK: should use the versioning used by `Reference LAPACK`_, which
+  defines what the standard APIs are. Uses ``MAJOR.MINOR.MICRO``, so would look
+  like ``>=3.10.0``.
+- Compilers: these implement language standards. For C, C++ and Fortran these
+  are versioned by year. In order for versions to sort correctly, we recommend
+  using the full year (four digits). So "at least C99" would be ``>=1999``, and
+  selecting C++14 or Fortran 77 would be ``==2014`` or ``==1977`` respectively.
+  Other languages may use different versioning schemes. These should be
+  described somewhere before they are used in ``pyproject.toml``.
+
+Copyright
+=========
+
+This document is placed in the public domain or under the
+CC0-1.0-Universal license, whichever is more permissive.
+
+
+.. _PyPI: https://pypi.org
+.. _core metadata: https://packaging.python.org/specifications/core-metadata/
+.. _setuptools: https://setuptools.readthedocs.io/
+.. _setuptools metadata: https://setuptools.readthedocs.io/en/latest/setuptools.html#metadata
+.. _SPDX: https://spdx.dev/
+.. _PURL: https://github.com/package-url/purl-spec/
+.. _vers: https://github.com/package-url/purl-spec/blob/version-range-spec/VERSION-RANGE-SPEC.rst
+.. _vers implementation for PURL: https://github.com/package-url/purl-spec/pull/139
+.. _pyp2rpm: https://github.com/fedora-python/pyp2rpm
+.. _Grayskull: https://github.com/conda/grayskull
+.. _dh_python: https://www.debian.org/doc/packaging-manuals/python-policy/index.html#dh-python
+.. _Repology: https://repology.org/
+.. _Dependabot: https://github.com/dependabot
+.. _libraries.io: https://libraries.io/
+.. _crossenv: https://github.com/benfogle/crossenv
+.. _Python Packaging User Guide: https://packaging.python.org
+.. _pyOpenSci Python Open Source Package Development Guide: https://www.pyopensci.org/python-package-guide/
+.. _Scikit-HEP packaging guide: https://scikit-hep.org/developer/packaging
+.. _PyPA: https://github.com/pypa
+.. _Reference LAPACK: https://github.com/Reference-LAPACK/lapack
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End:

From 20ae8dbff701a433f49c9b4ebb57e374d1c4be6b Mon Sep 17 00:00:00 2001
From: jaimergp
Date: Wed, 3 Sep 2025 11:52:28 +0200
Subject: [PATCH 2/3] Update CODEOWNERS

Co-authored-by: Pradyun Gedam
---
 .github/CODEOWNERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
index 1c58bc21620..ea9934afba2 100644
--- a/.github/CODEOWNERS
+++ b/.github/CODEOWNERS
@@ -678,6 +678,7 @@ peps/pep-0799.rst @pablogsal
 peps/pep-0800.rst @JelleZijlstra
 peps/pep-0801.rst @warsaw
 peps/pep-0802.rst @AA-Turner
+peps/pep-0804.rst @pradyunsg
 # ...
 peps/pep-2026.rst @hugovk
 # ...

From 634ce5cc8e267d2351d6c8e03c281a98c5fd2dd0 Mon Sep 17 00:00:00 2001
From: jaimergp
Date: Wed, 3 Sep 2025 12:51:47 +0200
Subject: [PATCH 3/3] Apply suggestions from code review

Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
---
 peps/pep-0804.rst | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/peps/pep-0804.rst b/peps/pep-0804.rst
index e0115af1824..cb4a28be5fe 100644
--- a/peps/pep-0804.rst
+++ b/peps/pep-0804.rst
@@ -5,14 +5,13 @@ Author: Pradyun Gedam ,
          Michał Górny ,
          Jaime Rodríguez-Guerra ,
          Michael Sarahan
-Discussions-To:
+Discussions-To: Pending
 Status: Draft
 Type: Standards Track
 Topic: Packaging
 Requires: 725
 Created: 03-Sep-2025
 Post-History: 03-Sep-2025,
-Resolution:
 
 
 Abstract
@@ -1337,12 +1336,3 @@ CC0-1.0-Universal license, whichever is more permissive.
 .. _Scikit-HEP packaging guide: https://scikit-hep.org/developer/packaging
 .. _PyPA: https://github.com/pypa
 .. _Reference LAPACK: https://github.com/Reference-LAPACK/lapack
-
-..
-   Local Variables:
-   mode: indented-text
-   indent-tabs-mode: nil
-   sentence-end-double-space: t
-   fill-column: 70
-   coding: utf-8
-   End: