diff --git a/peps/pep-0794.rst b/peps/pep-0794.rst new file mode 100644 index 00000000000..ff63eadaa1c --- /dev/null +++ b/peps/pep-0794.rst @@ -0,0 +1,259 @@ +PEP: 794 +Title: Import Name Metadata +Author: Brett Cannon +Discussions-To: Pending +Status: Draft +Type: Standards Track +Topic: Packaging +Created: 05-Jun-2025 +Post-History: `02-May-2025 `__ + + +Abstract +======== + +This PEP proposes extending the core metadata specification for Python +packaging to include a new, repeatable field named ``Import-Name`` to record +the import names that a project owns once installed. A new key named +``import-names`` will be added to the ``[project]`` table in +``pyproject.toml``. This also leads to the introduction of core metadata +version 2.5. + + +Motivation +========== + +In Python packaging there is no requirement that a project name match the +name(s) that you can import for that project. As such, there is no clean, +easy, accurate way to go from import name to project name and vice-versa. +This can make it difficult for tools that try to help people in discovering +the right project to install when they know the import name or knowing what +import names a project will provide once installed. + +As an example, a code editor may detect a user has an unsatisfied import in a +selected virtual environment. But with no way to reliably gather the import +names that various projects provide, the code editor cannot accurately +provide a user with a list of potential projects to install to satisfy that +import requirement (e.g. it is not obvious that ``import PIL`` very likely +implies the user wants the `Pillow project +`__ installed). This also applies to when a +user vaguely remembers the project name but does not remember the import +name(s) and would have their memory jogged when seeing a list of import names +a package provides. Finally, tools would be able to notify users what import +names will become available once they install a project. + +It may also help with spam detection. If a project specifies the same import +names as a very popular project it can act as a signal to take a closer look +at the validity of the less popular project. A project found to be lying +about what import names it provides would be another signal. + + +Rationale +========= + +This PEP proposes extending the packaging :ref:`packaging:core-metadata` so +that project owners can specify the highest-level import names that a project +provides and owns if installed. + +By keeping the information to the import names a project would own (i.e. not +implicit namespace packages but modules, regular packages, submodules, and +subpackages in an explicit namespace package), it makes it clear which +project maps directly to what import name once the project is installed. + +By keeping it to the highest-level name that's owned, it keeps the data small +and allows for inferring implicit namespace packages that a project +contributes to. This will hopefully encourage use when appropriate by not +being a burden to provide appropriate information. + +Putting this metadata in the core metadata means the data is (potentially) +served independently of any sdist or wheel by an index server. That negates +needing to come up with another way to expose the metadata to tools to avoid +having to download an entire e.g. wheel. + +Various other attempts have been made to solve this, but they all have to +make various trade-offs. For instance, one could download every wheel for +every project release and look at what files are provided via the +:ref:`packaging:binary-distribution-format`, but that's a lot of CPU and +bandwidth for something that is static information (although tricks can be +used to lessen the data requests such as using HTTP range requests to only +read the table of contents of the zip file). This sort of calculation is also +currently repeated by everyone independently instead of having the metadata +hosted by a central index server like PyPI. It also doesn't work for sdists as +the structure of the wheel isn't known yet, and so inferring the structure of +the code installed isn't known yet. As well, these solutions are not +necessarily accurate as it is based on inference instead of being explicitly +provided by the project owners. + + +Specification +============= + +Because this PEP introduces a new field to the core metadata, it bumps the +latest core metadata version to 2.5. + +The ``Import-Name`` field is a "multiple uses" field. Each entry of +``Import-Name`` represents an importable name that the project provides. The +names provided MUST be importable via *some* artifact the project provides +for that version, i.e. the metadata MUST be consistent across all sdists and +wheels for a project release to avoid having to read every file to find +variances. It also avoids having to declare this field as dynamic in an +sdist due to the import names varying across wheels. This does imply that the +information isn't specific to the distribution artifact it is found in, but +for the release version the distribution artifact belongs to. + +The names provided MUST be one of the following: + +- Highest-level, regular packages +- Top-level modules +- The submodules and regular subpackages within implicit namespace packages + +provided by the project. This makes the vast majority of projects only +needing a single ``Import-Name`` entry which represents the top-level, +regular package the project provides. But it also allows for implicit +namespace packages to be able to differentiate among themselves (e.g., it +avoids having all projects contributing to the ``azure`` namespace via an +implicit namespace package all having ``azure`` as their entry for +``Import-Name``, but instead a more accurate entry like +``azure.mgmt.search``) + +If a project chooses not to provide any ``Import-Name`` entries, tools MAY +assume the import name matches the project name. + +Project owners MUST specify accurate information when provided and SHOULD be +exhaustive in what they provide. Project owners SHOULD NOT filter out names +that they consider private. This is because even "private" names can be +imported by anyone and can "take up space" in the namespace of the +environment. Tools consuming the metadata SHOULD consider the information +provided in ``Import-Name`` as accurate, but not exhaustive. + +The :ref:`declaring-project-metadata` will gain an ``import-names`` key. It +will be an array of strings that stores what will be written out to +``Import-Name``. Build back-ends MAY support dynamically calculating the +value on the user's behalf if desired, if the user declares the key to be +dynamic. + + +Examples +-------- + +`In httpx 0.28.1 +`__ +there would be only a single entry for the ``httpx`` package as it's a +regular package and there are no other regular packages or modules at the top +of the project. + +`In pytest 8.3.5 +`__ +there would be 3 entries: + +1. ``_pytest`` (a top-level, regular package) +2. ``py`` (a top-level module) +3. ``pytest`` (a top-level, regular package) + +In `azure-mgmt-search 9.1.0 +`__, +there would be a single entry for ``azure.mgmt.search`` as ``azure`` and +``azure.mgmt`` are implicit namespace packages. + + +Backwards Compatibility +======================= + +As this is a new field for the core metadata and a new core metadata version, +there should be no backwards compatibility concerns. + + +Security Implications +===================== + +Tools should treat the metadata as potentially inaccurate. As such, any +decisions made based on the provided metadata should be assumed to be +malicious in some way. + + +How to Teach This +================= + +Project owners should be taught that they can now record what namespaces +their project provides. They should be told that if their project has a +non-obvious namespace from the file structure of the project that they should +specify the appropriate information. They should have it explained to them +that they should use the shortest name possible that appropriately explains +what the project provides (i.e. what the specification requires to be +recorded). + +Users of projects don't necessarily need to know about this new metadata. +While they may be exposed to it via tooling, the details of where that data +came from isn't critical. It's possible they may come across it if an index +server exposed it (e.g., listed the values from ``Import-Name`` and marked +whether the file structure backed up the claims the metadata makes), but that +still wouldn't require users to know the technical details of this PEP. + + +Reference Implementation +======================== + +https://github.com/brettcannon/packaging/tree/pep-794 is a branch to update +'packaging' to support this PEP. + + +Rejected Ideas +============== + +Re-purpose the ``Provides`` field +---------------------------------- + +Introduced in metadata version 1.1 and deprecated in 1.2, the ``Provides`` +field was meant to provide similar information, except for **all** names +provided by a project instead of the distinguishing namespaces as this PEP +proposes. Based on that difference and the fact that ``Provides`` is +deprecated and thus could be ignored by preexisting code, the decision was +made to go with a new field. + + +Name the field ``Namespace`` +---------------------------- + +While the term "namespace" name is technically accurate from an import +perspective, it could be confused with implicit namespace packages. + + +Serving the ``RECORD`` file +--------------------------- + +During `discussions about a pre-PEP version +`__ of this +PEP, it was suggested that the ``RECORD`` file from wheels be served from +index servers instead of this new metadata. That would have the benefit of +being implementable immediately. But in order to provide the equivalent +information there would be necessary inference based on the file structure of +what would be installed by the wheel. That could lead to inaccurate +information. It also doesn't support sdists. + +In the end a `poll +`__ was +held and the approach this PEP takes won out. + + +Open Issues +=========== + +N/A + + +Acknowledgments +=============== + +Thanks to HeeJae Chang for ~~complaining about~~ bringing up regularly the +usefulness that this metadata would provide. Thanks to Josh Cannon (no +relation) for reviewing drafts of this PEP and providing feedback. Also, +thanks to everyone who participated in a `previous discussion +`__ +on this topic. + + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive.