From e09ad02b5763da8c226464e9acbb7ec69be77bd7 Mon Sep 17 00:00:00 2001 From: Brett Cannon Date: Thu, 5 Jun 2025 09:36:06 -0700 Subject: [PATCH 1/4] PEP 794: import name metadata --- peps/pep-0794.rst | 256 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 256 insertions(+) create mode 100644 peps/pep-0794.rst diff --git a/peps/pep-0794.rst b/peps/pep-0794.rst new file mode 100644 index 00000000000..bb3a7cd7ee0 --- /dev/null +++ b/peps/pep-0794.rst @@ -0,0 +1,256 @@ +PEP: 794 +Title: Import Name Metadata +Author: Brett Cannon +Discussions-To: Pending +Status: Draft +Type: Standards Track +Topic: Packaging +Created: 05-Jun-2025 +Post-History: `02-May-2025 `__ + + +Abstract +======== + +This PEP proposes extending the core metadata specification for Python +packaging to include a new, repeatable field named ``Import-Name`` to record +the import names that a project owns once installed. A new key named +``import-names`` will be added to the ``[project]`` table in +``pyproject.toml``. This also leads to the introduction of core metadata +version 2.5. + + +Motivation +========== + +In Python packaging there is no requirement that a project name match the +name(s) that you can import for that project. As such, there is no clean, +easy, accurate way to go from import name to project name and vice-versa. +This can make it difficult for tools that try to help people in discovering +the right project to install when they know the import name or knowing what +import names a project will provide once installed. + +As an example, a code editor may detect a user has an unsatisfied import in a +selected virtual environment. But with no way to reliably gather the import +names that various projects provide, the code editor cannot accurately +provide a user with a list of potential projects to install to satisfy that +import requirement (e.g. it is not obvious that ``import PIL`` very likely +implies the user wants the `Pillow project +`__ installed). This also applies to when a +user vaguely remembers the project name but does not remember the import +name(s) and would have their memory jogged when seeing a list of import names +a package provides. Finally, tools would be able to notify users what import +names will become available once they install a project. + +It may also help with spam detection. If a project specifies the same import +names as a very popular project it can act as a signal to take a closer look +at the validity of the less popular project. A project found to be lying +about what import names it provides would be another signal. + + +Rationale +========= + +This PEP proposes extending the packaging :ref:`packaging:core-metadata` so +that project owners can specify the highest-level import names that a project +provides and owns if installed. + +By keeping the information to the import names a project would own (i.e. not +implicit namespace packages but modules, regular packages, submodules, and +subpackages in an explicit namespace package), it makes it clear which +project maps directly to what import name once the project is installed. + +By keeping it to the highest-level name that's owned, it keeps the data small +and allows for inferring implicit namespace packages that a project +contributes to. This will hopefully encourage use when appropriate by not +being a burden to provide appropriate information. + +Putting this metadata in the core metadata means the data is (potentially) +served independently of any sdist or wheel by an index server. That negates +needing to come up with another way to expose the metadata to tools to avoid +having to download an entire e.g. wheel. + +Various other attempts have been made to solve this, but they all have to +make various trade-offs. For instance, one could download every wheel for +every project release and look at what files are provided via the +:ref:`packaging:binary-distribution-format`, but that's a lot of CPU and +bandwidth for something that is static information (although tricks can be +used to lessen the data requests such as using HTTP range requests to only +read the table of contents of the zip file). This sort of calculation is also currently repeated by everyone +independently instead of having the metadata hosted by a central index server +like PyPI. It also doesn't work for sdists as the structure of the wheel +isn't known yet, and so inferring the structure of the code installed isn't +known yet. As well, these solutions are not necessarily accurate as it is +based on inference instead of being explicitly provided by the project +owners. + + +Specification +============= + +Because this PEP introduces a new field to the core metadata, it bumps the +latest core metadata version to 2.5. + +The ``Import-Name`` field is a "multiple uses" field. Each entry of +``Import-Name`` represents an importable name that the project provides. The +names provided MUST be importable via *some* artifact the project provides +for that version, i.e. the metadata MUST be consistent across all sdists and +wheels for a project release to avoid having to read every file to find +variances. It also avoids having to declare this field as dynamic in an +sdist due to the import names varying across wheels. This does imply that the +information isn't specific to the distribution artifact it is found in, but +for the release version the distribution artifact belongs to. + +The names provided MUST be one of the following: + +- Highest-level, regular packages +- Top-level modules +- The submodules and regular subpackages within implicit namespace packages + +provided by the project. This makes the vast majority of projects only +needing a single ``Import-Name`` entry which represents the top-level, +regular package the project provides. But it also allows for implicit +namespace packages to be able to differentiate among themselves (e.g., it +avoids having all projects contributing to the ``azure`` namespace via an +implicit namespace package all having ``azure`` as their entry for +``Import-Name``, but instead a more accurate entry like +``azure.mgmt.search``) + +If a project chooses not to provide any ``Import-Name`` entries, tools MAY +assume the import name matches the project name. + +Project owners MUST specify accurate information when provided and SHOULD be +exhaustive in what they provide. Project owners SHOULD NOT filter out names +that they consider private. This is because even "private" names can be +imported by anyone and can "take up space" in the namespace of the +environment. Tools consuming the metadata SHOULD consider the information +provided in ``Import-Name`` as accurate, but not exhaustive. + +The :ref:`declaring-project-metadata` will gain an ``import-names`` key. It will +be an array of strings that stores what will be written out to +``Import-Name``. Build back-ends MAY support dynamically calculating the +value on the user's behalf if desired, if the user declares the key to be +dynamic. + + +Examples +-------- + +`In httpx 0.28.1 +`__ +there would be only a single entry for the ``httpx`` package as it's a +regular package and there are no other regular packages or modules at the top +of the project. + +`In pytest 8.3.5 +`__ +there would be 3 entries: + +1. ``_pytest`` (a top-level, regular package) +2. ``py`` (a top-level module) +3. ``pytest`` (a top-level, regular package) + +In `azure-mgmt-search 9.1.0 +`__, +there would be a single entry for ``azure.mgmt.search`` as ``azure`` and +``azure.mgmt`` are implicit namespace packages. + + +Backwards Compatibility +======================= + +As this is a new field for the core metadata and a new core metadata version, +there should be no backwards compatibility concerns. + + +Security Implications +===================== + +Tools should treat the metadata as potentially inaccurate. As such, any +decisions made based on the provided metadata should be assumed to be +malicious in some way. + + +How to Teach This +================= + +Project owners should be taught that they can now record what namespaces +their project provides. They should be told that if their project has a +non-obvious namespace from the file structure of the project that they should +specify the appropriate information. They should have it explained to them +that they should use the shortest name possible that appropriately explains +what the project provides (i.e. what the specification requires to be +recorded). + +Users of projects don't necessarily need to know about this new metadata. +While they may be exposed to it via tooling, the details of where that data +came from isn't critical. It's possible they may come across it if an index +server exposed it (e.g., listed the values from ``Import-Name`` and marked +whether the file structure backed up the claims the metadata makes), but that +still wouldn't require users to know the technical details of this PEP. + + +Reference Implementation +======================== + +XXX + + +Rejected Ideas +============== + +Re-purpose the ``Provides`` field +---------------------------------- + +Introduced in metadata version 1.1 and deprecated in 1.2, the ``Provides`` +field was meant to provide similar information, except for **all** names +provided by a project instead of the distinguishing namespaces as this PEP +proposes. Based on that difference and the fact that ``Provides`` is +deprecated and thus could be ignored by preexisting code, the decision was +made to go with a new field. + +Name the field ``Namespace`` +---------------------------- + +While the term "namespace" name is technically accurate from an import +perspective, it could be confused with implicit namespace packages. + +Serving the ``RECORD`` file +--------------------------- + +During `discussions about a pre-PEP version +`__ of this +PEP, it was suggested that the ``RECORD`` file from wheels be served from +index servers instead of this new metadata. That would have the benefit of +being implementable immediately. But in order to provide the equivalent +information there would be necessary inference based on the file structure of +what would be installed by the wheel. That could lead to inaccurate +information. It also doesn't support sdists. + +In the end a `poll +`__ was +held and the approach this PEP takes won out. + + +Open Issues +=========== + +N/A + + +Acknowledgments +=============== + +Thanks to Heejae Chang for ~~complaining about~~ bringing up regularly the +usefulness that this metadata would provide. Thanks to Josh Cannon (no +relation) for reviewing drafts of this PEP and providing feedback. Also, +thanks to everyone who participated in a `previous discussion +`__ +on this topic. + + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive. From e016fc88258b78d76db48814cf20fb6700ceaf29 Mon Sep 17 00:00:00 2001 From: Brett Cannon Date: Thu, 5 Jun 2025 10:39:31 -0700 Subject: [PATCH 2/4] Reference PoC for 'packaging' --- peps/pep-0794.rst | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/peps/pep-0794.rst b/peps/pep-0794.rst index bb3a7cd7ee0..8abea967354 100644 --- a/peps/pep-0794.rst +++ b/peps/pep-0794.rst @@ -193,7 +193,8 @@ still wouldn't require users to know the technical details of this PEP. Reference Implementation ======================== -XXX +https://github.com/brettcannon/packaging/tree/pep-794 is a branch to update +'packaging' to support this PEP. Rejected Ideas @@ -209,12 +210,14 @@ proposes. Based on that difference and the fact that ``Provides`` is deprecated and thus could be ignored by preexisting code, the decision was made to go with a new field. + Name the field ``Namespace`` ---------------------------- While the term "namespace" name is technically accurate from an import perspective, it could be confused with implicit namespace packages. + Serving the ``RECORD`` file --------------------------- From fcab83310775236223c26f57e864cc0b1e30b949 Mon Sep 17 00:00:00 2001 From: Brett Cannon Date: Thu, 5 Jun 2025 10:42:30 -0700 Subject: [PATCH 3/4] Reflow --- peps/pep-0794.rst | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/peps/pep-0794.rst b/peps/pep-0794.rst index 8abea967354..130b7e6995c 100644 --- a/peps/pep-0794.rst +++ b/peps/pep-0794.rst @@ -76,13 +76,13 @@ every project release and look at what files are provided via the :ref:`packaging:binary-distribution-format`, but that's a lot of CPU and bandwidth for something that is static information (although tricks can be used to lessen the data requests such as using HTTP range requests to only -read the table of contents of the zip file). This sort of calculation is also currently repeated by everyone -independently instead of having the metadata hosted by a central index server -like PyPI. It also doesn't work for sdists as the structure of the wheel -isn't known yet, and so inferring the structure of the code installed isn't -known yet. As well, these solutions are not necessarily accurate as it is -based on inference instead of being explicitly provided by the project -owners. +read the table of contents of the zip file). This sort of calculation is also +currently repeated by everyone independently instead of having the metadata +hosted by a central index server like PyPI. It also doesn't work for sdists as +the structure of the wheel isn't known yet, and so inferring the structure of +the code installed isn't known yet. As well, these solutions are not +necessarily accurate as it is based on inference instead of being explicitly +provided by the project owners. Specification @@ -126,8 +126,8 @@ imported by anyone and can "take up space" in the namespace of the environment. Tools consuming the metadata SHOULD consider the information provided in ``Import-Name`` as accurate, but not exhaustive. -The :ref:`declaring-project-metadata` will gain an ``import-names`` key. It will -be an array of strings that stores what will be written out to +The :ref:`declaring-project-metadata` will gain an ``import-names`` key. It +will be an array of strings that stores what will be written out to ``Import-Name``. Build back-ends MAY support dynamically calculating the value on the user's behalf if desired, if the user declares the key to be dynamic. From 957163f247659e550205a17deaf2eb9a2d0274f1 Mon Sep 17 00:00:00 2001 From: Brett Cannon Date: Thu, 5 Jun 2025 11:19:46 -0700 Subject: [PATCH 4/4] Fix typo --- peps/pep-0794.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/peps/pep-0794.rst b/peps/pep-0794.rst index 130b7e6995c..ff63eadaa1c 100644 --- a/peps/pep-0794.rst +++ b/peps/pep-0794.rst @@ -244,7 +244,7 @@ N/A Acknowledgments =============== -Thanks to Heejae Chang for ~~complaining about~~ bringing up regularly the +Thanks to HeeJae Chang for ~~complaining about~~ bringing up regularly the usefulness that this metadata would provide. Thanks to Josh Cannon (no relation) for reviewing drafts of this PEP and providing feedback. Also, thanks to everyone who participated in a `previous discussion