Commit 296e606

PEP 9999: JSON Package Metadata
This is the first revision of a PEP to change package metadata to JSON format.
1 parent 02f5423 commit 296e606

3 files changed

+548
-0
lines changed

peps/pep-9999.rst

Lines changed: 297 additions & 0 deletions
@@ -0,0 +1,297 @@
1+
PEP: 9999
2+
Title: JSON Package Metadata
3+
Author: Emma Harper Smith <emma@python.org>
4+
PEP-Delegate: Paul Moore
5+
Discussions-To: Pending
6+
Status: Draft
7+
Type: Standards Track
8+
Topic: Packaging
9+
Created: 2025-12-09
10+
Post-History: Pending
11+
12+
13+
Abstract
14+
========
15+
16+
Python package metadata ("core metadata") was first defined in :pep:`241` to
17+
use :rfc:`822` email headers to encode information about packages. This was
18+
reasonable at the time; email messages were the only widely used, standardized
19+
text format that had a parser in the standard library. However,
20+
issues with handling different encodings, differing handling of line breaks,
21+
and other differences between implementations have caused numerous packaging
22+
bugs. To resolve these issues, this PEP proposes introducing a
23+
`JavaScript Object Notation (JSON) <https://www.json.org/json-en.html>`_
24+
encoded file containing core metadata in Python packages.
25+
26+
27+
Motivation
28+
==========
29+
30+
The email message format has a number of complexities and limitations which
31+
reduce its utility as a portable textual interchange format for packaging
32+
metadata. Due to the :mod:`email` parser requiring configuration changes to
33+
properly generate valid core metadata, many projects do not use the
34+
:mod:`!email` module and instead generate core metadata in a custom manner.
35+
There are many pitfalls with generating email headers that these custom
36+
generators can hit. First, core metadata fields may contain newlines in the
37+
value of fields. These newlines must be handled properly to "unfold" multiple
38+
lines per :rfc:`822`. Improperly escaped newlines can lead to generating
39+
invalid core metadata. Second, as discussed in the core metadata
40+
specifications:
41+
42+
.. epigraph::
43+
The standard file format for metadata (including in wheels and installed
44+
projects) is based on the format of email headers. However, email formats
45+
have been revised several times, and exactly which email RFC applies to
46+
packaging metadata is not specified. In the absence of a precise
47+
definition, the practical standard is set by what the standard library
48+
:mod:`email.parser` module can parse using the
49+
:attr:`email.policy.compat32` policy.
50+
51+
Since no specific email RFC is selected, the current core metadata
52+
specification is ambiguous about whether a given core metadata document is valid.
53+
:rfc:`822` is the only email standard to be explicitly listed in a PEP.
54+
However, the core metadata specification also requires that core metadata is
55+
encoded using UTF-8 when written to a file. This de facto makes the core
56+
metadata follow :rfc:`6532`, which specifies internationalization of email
57+
headers. This has practical interoperability concerns. Until a few years ago,
58+
it was unspecified how to handle non-ASCII encoded content in core metadata,
59+
causing confusion about how to properly encode non-ASCII emails in core
60+
metadata. Third, the current format is difficult to properly validate and
61+
parse. Many tools do not check for issues with the output of the :mod:`!email`
62+
parser. If a document is malformed, it may still be parsed without error by the
63+
:mod:`!email` module as a valid email message. Furthermore, due to limitations
64+
in the email format, fields like ``Project-Url`` must create custom encodings
65+
of nested key-value items, further complicating parsing. Finally, the lack of
66+
a schema makes it difficult to validate the contents of email message encoded
67+
metadata. While introducing a specification for the current format has been
68+
`discussed previously <https://discuss.python.org/t/python-metadata-format-specification-and-implementation/7550>`_,
69+
no progress has been made, and converting to JSON was a suggested resolution
70+
to the issues raised.
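
The point about malformed documents parsing without error can be illustrated
with a small sketch. The input string below is made up for illustration and is
not taken from any real package; it contains a stray line that is not a valid
header.

```python
from email.parser import HeaderParser

# Hypothetical malformed metadata: the second line is not a header and
# there is no blank line separating headers from a body.
malformed = "Name: foo\nthis line is not a header\nVersion: 1.0\n"

# No exception is raised. Header parsing simply stops at the stray line
# and everything from that line onward is treated as the message body.
msg = HeaderParser().parsestr(malformed)

print(msg["Name"])        # the header before the stray line is available
print(msg["Version"])     # None: "Version" ended up in the body instead
print(msg.get_payload())  # the stray line and the lost header
print(msg.defects)        # problems are recorded here rather than raised
```

A tool that never inspects ``msg.defects`` or the unexpected body would
silently treat this document as metadata with no ``Version`` field.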
71+
72+
73+
Rationale
74+
=========
75+
76+
Introducing a new core metadata file with a well-specified format will greatly
77+
ease generating, parsing, and validating metadata. JSON is a natural choice for
78+
storing package core metadata. It is easily machine readable and writable, is
79+
understandable to humans, and is well supported across many languages.
80+
Furthermore, :pep:`566` already specifies a canonicalization of email formatted
81+
core metadata to JSON. JSON is also a frequently used format for data
82+
interchange on the web. For discussion of other formats considered, please
83+
refer to the rejected ideas section.
84+
85+
To maintain backwards compatibility, the JSON metadata file MUST be generated
86+
alongside the existing email formatted metadata file. This ensures that tools
87+
that do not support the new format can still read package metadata for new
88+
packages.
89+
90+
The JSON formatted metadata file must be semantically equivalent to the email
91+
encoded file. This ensures that the metadata is unambiguous between the two
92+
formats, and tools may read either when both are present. To maintain
93+
performance, this equivalence is not required to be verified by installers,
94+
though other tools may do so. Some tools may choose to make the check dependent
95+
on a configuration flag.
96+
97+
Package indexes SHOULD check that the metadata files are semantically
98+
equivalent when the package is added to the index. This is a low-cost, one-time
99+
check that ensures users of the index are served valid packages.
100+
101+
102+
Specification
103+
=============
104+
105+
JSON Format Core Metadata File
106+
------------------------------
107+
108+
A new optional file ``METADATA.json`` shall be introduced as a metadata file
109+
for Python packages. If generated, the ``METADATA.json`` file MUST be placed in
110+
the same directory as the current email formatted ``METADATA`` or ``PKG-INFO``
111+
file.
112+
113+
For wheels, this means that ``METADATA.json`` MUST be located in the
114+
``.dist-info`` directory. The wheel format minor version will be incremented to
115+
indicate the change in the format.
116+
117+
For source distribution packages, the ``METADATA.json`` file MUST be located
118+
in the root directory of the project sources. Tools that prefer the JSON
119+
formatted metadata file MUST check for the existence of a ``METADATA.json``
120+
in the source distribution before reading the file.
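
As a rough sketch of the check described above, a tool that prefers the JSON
file might look for it in the archive and otherwise fall back to ``PKG-INFO``.
The function name ``read_sdist_metadata`` and the ``root`` parameter (the
top-level ``{name}-{version}`` directory of the source distribution) are
illustrative assumptions, not part of this specification.

```python
import json
import tarfile
from email.parser import HeaderParser


def read_sdist_metadata(sdist: tarfile.TarFile, root: str):
    """Return ("json", dict) if METADATA.json exists, else ("email", Message).

    ``root`` is assumed to be the archive's top-level project directory.
    """
    json_name = f"{root}/METADATA.json"
    # Check for existence first; older sdists will only contain PKG-INFO.
    if json_name in sdist.getnames():
        with sdist.extractfile(json_name) as fh:
            return "json", json.load(fh)
    with sdist.extractfile(f"{root}/PKG-INFO") as fh:
        text = fh.read().decode("utf-8")
    return "email", HeaderParser().parsestr(text)
```

Returning the format alongside the parsed data lets the caller decide whether
any further conversion is needed.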
121+
122+
The semantic contents of the ``METADATA`` and ``METADATA.json`` files MUST be
123+
equivalent if ``METADATA.json`` is present. Installers MAY verify this
124+
information. Public package indexes SHOULD verify the files are semantically
125+
equivalent.
126+
127+
Conversion to JSON Encoding
128+
---------------------------
129+
130+
Conversion from the current email format for core metadata to JSON should
131+
follow the process described in :pep:`566`, with the following modification:
132+
the ``Project-URL`` entries should be converted into an object with keys
133+
containing the labels and values containing the URLs from the original email
134+
value. The overall process thus becomes:
135+
136+
#. The original key-value format should be read with
137+
``email.parser.HeaderParser``;
138+
#. All transformed keys should be reduced to lower case. Hyphens should be
139+
replaced with underscores, but otherwise should retain all other characters;
140+
#. The transformed value for any field marked with "(Multiple use)" should be a
141+
single list containing all the original values for the given key;
142+
#. The ``Keywords`` field should be converted to a list by splitting the
143+
original value on commas;
144+
#. The ``Project-URL`` field should be converted into a JSON object with keys
145+
containing the labels and values containing the URLs from the original email
146+
value.
147+
#. The message body, if present, should be set to the value of the
148+
``description`` key.
149+
#. The result should be stored as a string-keyed dictionary.
150+
151+
One edge case in the above conversion is that the ``Project-URL`` label is
152+
"free text, with a maximum length of 32 characters." This presents a problem
153+
when trying to decode the label. Therefore this PEP sets the requirement that
154+
the ``Project-URL`` label be any text *except* the comma (``,``) character.
155+
This allows for unambiguous parsing of the ``Project-URL`` entries by splitting
156+
the text on the left-most comma (``,``) character.
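
The conversion steps and the left-most comma rule can be sketched as follows.
This is a minimal illustration, not the reference implementation: the function
name ``email_to_json_dict`` is hypothetical, and ``MULTIPLE_USE`` is an
assumed, incomplete subset of the fields the core metadata specification marks
"(Multiple use)".

```python
from email.parser import HeaderParser

# Assumption: an illustrative subset of the multiple-use fields, already
# in transformed (lower case, underscore) form. A real tool needs them all.
MULTIPLE_USE = {"classifier", "requires_dist", "provides_extra", "platform"}


def email_to_json_dict(text: str) -> dict:
    """Convert email formatted core metadata into a JSON-compatible dict."""
    msg = HeaderParser().parsestr(text)
    result = {}
    for raw_key, value in msg.items():
        # Lower-case the key and replace hyphens with underscores.
        key = raw_key.lower().replace("-", "_")
        if key == "project_url":
            # Split on the left-most comma: label first, URL second.
            label, _, url = value.partition(",")
            result.setdefault(key, {})[label.strip()] = url.strip()
        elif key in MULTIPLE_USE:
            # Collect every occurrence into a single list.
            result.setdefault(key, []).append(value)
        elif key == "keywords":
            result[key] = value.split(",")
        else:
            result[key] = value
    # A message body, if present, becomes the description.
    body = msg.get_payload()
    if body:
        result["description"] = body
    return result
```

Passing the resulting dictionary to ``json.dumps`` would then produce the
JSON encoded document.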
157+
158+
JSON Schema for Core Metadata
159+
-----------------------------
160+
161+
To enable verification of JSON encoded core metadata, a
162+
`JSON schema <https://json-schema.org/>`_ for core metadata has been produced.
163+
This schema will be updated with each revision to the core metadata
164+
specification. The schema is available in
165+
:ref:`9999-core-metadata-json-schema`.
166+
167+
TODO: where should the schema be served/what should the $id be?
168+
169+
Serving METADATA.json in the Simple Repository API
170+
--------------------------------------------------
171+
172+
:pep:`658` introduced a means of serving package metadata in the Simple
173+
Repository API. The JSON encoded version of the package metadata may also be
174+
served, via the following modifications to the Simple Repository API:
175+
176+
A new attribute ``data-dist-info-metadata-json`` may be added to anchor tags
177+
in the Simple API. This attribute should have a value containing the hash
178+
information for the ``METADATA.json`` file in the same format as
179+
``data-dist-info-metadata``. If ``data-dist-info-metadata-json`` is present,
180+
the repository MUST serve the JSON encoded metadata file at the
181+
distribution's path with ``.metadata.json`` appended to it. For example, if a
182+
distribution is served at ``/simple/foo-1.0-py3-none-any.whl``, the JSON
183+
encoded core metadata file MUST be served at
184+
``/simple/foo-1.0-py3-none-any.whl.metadata.json``.
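
A client could derive the metadata location from a distribution URL roughly as
sketched below. The helper name ``metadata_json_url`` is hypothetical, and
dropping the URL fragment (where Simple API links commonly carry the
distribution's hash) is an assumption of this sketch rather than a requirement
stated here.

```python
from urllib.parse import urlsplit, urlunsplit


def metadata_json_url(dist_url: str) -> str:
    """Append ``.metadata.json`` to the path of a distribution URL."""
    parts = urlsplit(dist_url)
    # Assumption: any fragment belongs to the distribution file itself,
    # so it is not carried over to the metadata URL.
    return urlunsplit(
        parts._replace(path=parts.path + ".metadata.json", fragment="")
    )


print(metadata_json_url("/simple/foo-1.0-py3-none-any.whl"))
```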
185+
186+
Deprecation of the ``METADATA`` and ``PKG-INFO`` Files
187+
------------------------------------------------------
188+
189+
The ``METADATA`` and ``PKG-INFO`` files are now deprecated. This means that a
190+
future PEP may make the ``METADATA`` and ``PKG-INFO`` files optional and
191+
require ``METADATA.json`` to be present. Please see the next section for
192+
caveats to that change.
193+
194+
Despite the ``METADATA`` and ``PKG-INFO`` files being deprecated, new core
195+
metadata revisions should be implemented for both JSON and email to ensure that
196+
they may remain semantically equivalent.
197+
198+
Backwards Compatibility
199+
=======================
200+
201+
The specification for ``METADATA.json`` is designed such that the new format is
202+
completely backwards compatible. Existing tools may read metadata from the
203+
existing email formatted files, and new tools may take advantage of the new
204+
format.
205+
206+
A future major revision of the wheel specification may make the ``METADATA``
207+
and ``PKG-INFO`` files optional and make the ``METADATA.json`` file required.
208+
Note that tools will need to maintain parsing of email metadata indefinitely to
209+
support parsing metadata for old packages which only have the ``METADATA`` or
210+
``PKG-INFO`` files.
211+
212+
213+
Security Implications
214+
=====================
215+
216+
One attack vector with JSON encoded core metadata is if the JSON payload is
217+
designed to consume excessive memory or CPU resources in a denial of service
218+
attack. While this attack is unlikely to affect users, who can cancel
219+
resource-intensive operations, it may be an issue for package indexes.
220+
221+
There are several mitigations that can be made to prevent this:
222+
223+
#. The length of the JSON payload can be restricted to a reasonable size.
224+
#. The reader may use a :class:`~json.JSONDecoder` to omit parsing :class:`int`
225+
and :class:`float` values to avoid quadratic number parsing time complexity
226+
attacks.
227+
#. I plan to contribute a change to the :class:`~json.JSONDecoder` in Python
228+
3.15+ that will allow it to be configured to restrict the nesting of JSON
229+
payloads to a reasonable depth.
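
The mitigations above might be combined along the following lines. This is a
sketch under stated assumptions: the limits ``MAX_BYTES`` and ``MAX_DEPTH``
and both function names are hypothetical, and the depth check is a simple
pre-scan of the text standing in for the decoder option that does not yet
exist.

```python
import json

MAX_BYTES = 1_000_000  # hypothetical size limit
MAX_DEPTH = 16         # hypothetical nesting limit


def max_nesting_depth(text: str) -> int:
    """Return the deepest bracket nesting, ignoring brackets in strings."""
    depth = deepest = 0
    in_string = escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "[{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch in "]}":
            depth -= 1
    return deepest


def load_metadata_defensively(payload: bytes) -> dict:
    """Parse a JSON metadata payload with size, depth, and number guards."""
    if len(payload) > MAX_BYTES:
        raise ValueError("metadata payload too large")
    text = payload.decode("utf-8")
    if max_nesting_depth(text) > MAX_DEPTH:
        raise ValueError("metadata payload nested too deeply")
    # Keep numeric literals as strings instead of converting them.
    data = json.loads(text, parse_int=str, parse_float=str)
    if not isinstance(data, dict):
        raise ValueError("metadata payload must be a JSON object")
    return data
```

Rejecting oversized or deeply nested input before calling the decoder keeps
the expensive work bounded regardless of what the payload contains.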
230+
231+
With these mitigations in place, concerns about denial of service attacks with
232+
JSON encoded core metadata are minimal.
233+
234+
235+
Reference Implementation
236+
========================
237+
238+
A reference implementation of the JSON schema for JSON core metadata is
239+
available in :ref:`9999-core-metadata-json-schema`.
240+
241+
Furthermore, a reference implementation in the ``packaging`` library `is
242+
available
243+
<https://github.com/wheelnext/packaging/tree/PEP-9999-JSON-metadata>`__.
244+
245+
246+
Rejected Ideas
247+
==============
248+
249+
Using Another File Format (TOML, YAML, etc.)
250+
--------------------------------------------
251+
252+
While TOML or another format could be used for the new core metadata file
253+
format, JSON has been chosen for a few reasons:
254+
255+
#. Core metadata is mostly meant as a machine interchange format to be used by
256+
tools and services which wish to interoperate. Therefore the
257+
human-readability of TOML is not an important consideration in this
258+
selection.
259+
#. JSON parsers are implemented in many languages' standard libraries and the
260+
:mod:`json` module has been part of Python's standard library for a very
261+
long time.
262+
#. JSON is fast to parse and emit.
263+
#. JSON schemas are JSON native and commonly used.
264+
265+
266+
Open Issues
267+
===========
268+
269+
Where Should the JSON Schema be Served?
270+
---------------------------------------
271+
272+
Where should the standard JSON Schema be served? Some options would be
273+
packaging.python.org, pypi.org, python.org, or pypa.org.
274+
275+
My first choice would be packaging.python.org, but I am open to other options.
276+
277+
Should we also update the ``WHEEL`` metadata file format to be JSON encoded?
278+
----------------------------------------------------------------------------
279+
280+
The ``WHEEL`` metadata file format is also an email formatted file. This means
281+
that it is subject to the same parsing and validation issues as the
282+
``METADATA`` and ``PKG-INFO`` files. However, the ``WHEEL`` file is part of the
283+
initial wheel format version check done by installers. Changing the file format
284+
might harm backwards compatibility by making old installers unable to read new
285+
metadata.
286+
287+
I think it could make sense to introduce a ``WHEEL.json`` file. Then a future
288+
wheel major version could remove the ``WHEEL`` file and require the
289+
``WHEEL.json`` file instead.
290+
291+
292+
Copyright
293+
=========
294+
295+
This document is placed in the public domain or under the
296+
CC0-1.0-Universal license, whichever is more permissive.
297+
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
1+
:orphan:
2+
3+
.. _9999-core-metadata-json-schema:
4+
5+
Appendix: JSON Schema for Core Metadata
6+
=======================================
7+
8+
.. literalinclude:: core-metadata.schema.json
9+
:language: json
10+
:linenos:
11+
:name: core-metadata-schema
