From cff1956c31e4ca741f2073dc7591da6564e81cfa Mon Sep 17 00:00:00 2001 From: Seth Michael Larson Date: Wed, 22 Jan 2025 13:04:15 -0600 Subject: [PATCH 1/6] Add sections for Users, Projects, and SCA tools in 'How to Teach' --- peps/pep-0770.rst | 64 ++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 63 insertions(+), 1 deletion(-) diff --git a/peps/pep-0770.rst b/peps/pep-0770.rst index 2880368fefc..3f67dde24cd 100644 --- a/peps/pep-0770.rst +++ b/peps/pep-0770.rst @@ -371,6 +371,9 @@ of this standard. The details of this standard are most important to either maintainers of Python packages and developers of SCA tools such as SBOM generation tools and vulnerability scanners. +What do Python package maintainers need to know? +------------------------------------------------ + Most Python packages don't contain code from other software components and thus are already measurable by SCA tools without the need of this standard or additional SBOM documents. Pure-Python packages are about `~90% `__ @@ -380,9 +383,68 @@ For projects that do contain other software components, documentation will be added to the Python Packaging User Guide for how to specify and maintain SBOM documents for Python packages in source code. +There are two "camps" of projects that contain other software, those from +a "packaging ecosystem" (PyPI, Linux distros, Rust, NPM, etc) and those from +outside a packaging ecosystem (vendored C, C++, Fortran). Software that is +a part of a packaging ecosystem is much easier to identify meaning +that package maintainers may have their package SBOM data annotated +automatically by common build tools (auditwheel, cibuildwheel, multibuild, etc). + +For projects that cannot be automatically annotated, the approach will be to +generate SBOM files by some means and then include those files manually using +``pyproject.toml``: + + .. code-block:: toml + + [project] + ... 
+ sbom-files = [ + "sboms/bom.cdx.json" + ] + +For projects manually specifying an SBOM document the challenge will be +keeping the document up-to-date. The CPython project has some +`customized tooling `__ +for this task, but it can likely be generalized into a tool reusable by other +projects. + +What do SBOM tool authors need to know? +--------------------------------------- + +Developers of SBOM generation tooling will need to know about the existence +of this PEP and that Python packages may begin publishing SBOM documents +within package archives. This information needs to be included as a part of +generating an SBOM document for a particular Python package or Python +environment. + A follow-up informational PEP will be authored to describe how to transform Python packaging metadata, including the mechanism described in this PEP, -into an SBOM document describing Python packages. +into an SBOM document describing Python packages. Once the informational PEP is +complete, tracking issues will be opened specifically linking to the +informational PEP to spur the adoption of PEP 770 by SBOM tools. + +A `benchmark is being created `__ +to compare the outputs of different SBOM tools when run with various Python +packaging inputs (package archive, installed package, environment, container +image) and to track the progress of different SBOM generation +tools. This benchmark will inform where tools have gaps in support +of this PEP and Python packages. + +What do users of SBOM documents need to know? +--------------------------------------------- + +Many users of this PEP won't know of its existence; instead, their software +composition analysis tools, SBOM tools, or vulnerability scanners will simply +begin giving more comprehensive information after an upgrade. For users that are +interested in the sources of this new information, the "tool" field of SBOM +metadata already provides linkages to the projects generating their SBOMs. 
+ +For users who need SBOM documents describing their open source dependencies the +first step should always be "create them yourself". Using the benchmarks above +a list of tools that are known to be accurate for Python packages can be +documented and recommended to users. For projects which require +additional manual SBOM annotation: tips for contributing this data and tools for +maintaining the data can be recommended. Reference Implementation ======================== From cd7a24a76e1df38bce25179bde0847e9983e87e4 Mon Sep 17 00:00:00 2001 From: Seth Michael Larson Date: Thu, 23 Jan 2025 13:36:24 -0600 Subject: [PATCH 2/6] Address review comments --- peps/pep-0770.rst | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/peps/pep-0770.rst b/peps/pep-0770.rst index 3f67dde24cd..c5d40b8fbae 100644 --- a/peps/pep-0770.rst +++ b/peps/pep-0770.rst @@ -383,12 +383,15 @@ For projects that do contain other software components, documentation will be added to the Python Packaging User Guide for how to specify and maintain SBOM documents for Python packages in source code. -There are two "camps" of projects that contain other software, those from -a "packaging ecosystem" (PyPI, Linux distros, Rust, NPM, etc) and those from +There are two groups of projects that contain other software, those from +a "packaging ecosystem" (PyPI, Linux distros, Crates.io, NPM, etc) and those from outside a packaging ecosystem (vendored C, C++, Fortran). Software that is a part of a packaging ecosystem is much easier to identify meaning that package maintainers may have their package SBOM data annotated -automatically by common build tools (auditwheel, cibuildwheel, multibuild, etc). +automatically by common build tools ( +`auditwheel `__, +`cibuildwheel `__, +`multibuild `__, etc). 
For projects that cannot be automatically annotated, the approach will be to generate SBOM files by some means and then include those files manually using From 56a280b1028dd4b22e96e4d24fd9c984d9ee4e86 Mon Sep 17 00:00:00 2001 From: Seth Michael Larson Date: Thu, 23 Jan 2025 18:06:06 -0600 Subject: [PATCH 3/6] Add root SBOM directory terminology, more tweaks --- peps/pep-0770.rst | 56 +++++++++++++++++++++++++++++++++-------------- 1 file changed, 40 insertions(+), 16 deletions(-) diff --git a/peps/pep-0770.rst b/peps/pep-0770.rst index c5d40b8fbae..d9d9bc8ec50 100644 --- a/peps/pep-0770.rst +++ b/peps/pep-0770.rst @@ -15,14 +15,20 @@ Post-History: Abstract ======== +Almost all Python packages today are accurately measurable by software +composition analysis (SCA) tools and therefore do not need additional metadata +to improve measurability. For projects that are not accurately measurable, there +is no existing mechanism to annotate a Python package with composition data to +improve measurability. + Software Bill-of-Materials (SBOM) is a technology-and-ecosystem-agnostic method for describing software composition, provenance, heritage, and more. -SBOMs are used as inputs for software composition analysis (SCA) tools, -such as scanners for vulnerabilities and licenses, and have been gaining -traction in global software regulations and frameworks. +SBOMs are used as inputs for SCA tools, such as scanners for vulnerabilities and +licenses, and have been gaining traction in global software regulations and +frameworks. This PEP proposes using SBOM documents included in Python packages as a -means to improve software measurability for Python packages. +means to improve automated software measurability for Python packages. The changes will update the `Core Metadata specification `__ to version 2.5. 
@@ -141,6 +147,24 @@ In addition to the above, an informational PEP will be created for tools consuming included SBOM documents and other Python package metadata to generate complete SBOM documents for Python packages. +Terminology +----------- + +This section describes terminology used later in the document: + +* **Root SBOM directory**: This is the directory within a Python project source + tree or package archive that SBOM documents are stored in. For + :term:`Project source trees ` and + :term:`Source Distributions ` the root SBOM + directory is the same directory containing ``pyproject.toml`` or other "root" + metadata file like ``PKG-INFO``/``setup.py``. + + For :term:`Built Distribution`s and + :term:`Installed projects ` the root SBOM directory is + defined as ``.dist-info/sboms``. The new ``Sbom-File`` Core Metadata + field defined below always specifies SBOM documents relative to the root SBOM + directory for the specific project format. + .. _770-spec-core-metadata: Core Metadata @@ -149,9 +173,9 @@ Core Metadata Add ``Sbom-File`` field ~~~~~~~~~~~~~~~~~~~~~~~ -The ``Sbom-File`` is an optional Core Metadata field. Each instance contains a -string representation of the path of an SBOM document. The path is located -within the project source tree, relative to the project root directory. It is a +The ``Sbom-File`` is a new optional Core Metadata field. Each instance contains a +string representation of the path to an SBOM document. The path is specified +relative to the root SBOM directory for all project types. It is a multi-use field that MAY appear zero or more times and each instance lists the path to one such file. Files specified under this field are SBOM documents that are distributed with the package. @@ -170,8 +194,7 @@ If an ``Sbom-File`` is listed in a relative path. * Inside the root SBOM directory, packaging tools MUST reproduce the directory structure under which the source files are located relative to the project - root. 
The root SBOM directory is - `specified in a later section <#770-spec-project-formats>`__. + root. * Path delimiters MUST be the forward slash character (``/``), and parent directory indicators (``..``) MUST NOT be used. @@ -191,10 +214,10 @@ This PEP specifies changes to the project's source metadata under a Add ``sbom-files`` key ~~~~~~~~~~~~~~~~~~~~~~ -A new ``sbom-files`` key is added to the ``[project]`` table for specifying -paths in the project source tree relative to ``pyproject.toml`` to file(s) -containing SBOMs to be distributed with the package. This key corresponds to the -``Sbom-File`` fields in the Core Metadata. +A new optional ``sbom-files`` key is added to the ``[project]`` table for +specifying paths in the project source tree relative to ``pyproject.toml`` to +file(s) containing SBOMs to be distributed with the package. This key +corresponds to the ``Sbom-File`` fields in the Core Metadata. Its value is an array of strings which MUST contain valid glob patterns, as specified below: @@ -384,8 +407,8 @@ added to the Python Packaging User Guide for how to specify and maintain SBOM documents for Python packages in source code. There are two groups of projects that contain other software, those from -a "packaging ecosystem" (PyPI, Linux distros, Crates.io, NPM, etc) and those from -outside a packaging ecosystem (vendored C, C++, Fortran). Software that is +a "packaging ecosystem" (PyPI, Linux distros, Crates.io, NPM, etc) and those +from outside a packaging ecosystem (vendored C, C++, Fortran). Software that is a part of a packaging ecosystem is much easier to identify meaning that package maintainers may have their package SBOM data annotated automatically by common build tools ( @@ -480,7 +503,8 @@ Open Issues Conditional project source SBOM files ------------------------------------- -How can a project specify an SBOM file that is conditional? Under what circumstances would an SBOM document be conditional? 
+How can a project specify an SBOM file that is conditional? Under what +circumstances would an SBOM document be conditional? Selecting a single SBOM standard -------------------------------- From ad5e14da90879431754b15c5c3060fa037f15ced Mon Sep 17 00:00:00 2001 From: Seth Michael Larson Date: Fri, 24 Jan 2025 09:12:03 -0600 Subject: [PATCH 4/6] Fix lint, clarify recording rather than storing --- peps/pep-0770.rst | 18 +++++++----------- 1 file changed, 7 insertions(+), 11 deletions(-) diff --git a/peps/pep-0770.rst b/peps/pep-0770.rst index d9d9bc8ec50..315ed30fb03 100644 --- a/peps/pep-0770.rst +++ b/peps/pep-0770.rst @@ -16,10 +16,9 @@ Abstract ======== Almost all Python packages today are accurately measurable by software -composition analysis (SCA) tools and therefore do not need additional metadata -to improve measurability. For projects that are not accurately measurable, there -is no existing mechanism to annotate a Python package with composition data to -improve measurability. +composition analysis (SCA) tools. For projects that are not accurately +measurable, there is no existing mechanism to annotate a Python package +with composition data to improve measurability. Software Bill-of-Materials (SBOM) is a technology-and-ecosystem-agnostic method for describing software composition, provenance, heritage, and more. @@ -153,17 +152,14 @@ Terminology This section describes terminology used later in the document: * **Root SBOM directory**: This is the directory within a Python project source - tree or package archive that SBOM documents are stored in. For + tree or package archive that SBOM document paths are recorded relative to. For :term:`Project source trees ` and :term:`Source Distributions ` the root SBOM directory is the same directory containing ``pyproject.toml`` or other "root" - metadata file like ``PKG-INFO``/``setup.py``. - - For :term:`Built Distribution`s and + metadata file like ``PKG-INFO``/``setup.py``. 
For + :term:`Built Distributions ` and :term:`Installed projects ` the root SBOM directory is - defined as ``.dist-info/sboms``. The new ``Sbom-File`` Core Metadata - field defined below always specifies SBOM documents relative to the root SBOM - directory for the specific project format. + defined as ``.dist-info/sboms``. .. _770-spec-core-metadata: From 5a2bea568148a6d4bc8d8821024b6f0ad1400c00 Mon Sep 17 00:00:00 2001 From: Seth Michael Larson Date: Fri, 24 Jan 2025 12:21:35 -0600 Subject: [PATCH 5/6] Adopt PEP 639-esque wording for 'root SBOM directory' --- peps/pep-0770.rst | 41 +++++++++++++++++++++++++---------------- 1 file changed, 25 insertions(+), 16 deletions(-) diff --git a/peps/pep-0770.rst b/peps/pep-0770.rst index 315ed30fb03..f8d0a7d0f83 100644 --- a/peps/pep-0770.rst +++ b/peps/pep-0770.rst @@ -151,46 +151,55 @@ Terminology This section describes terminology used later in the document: -* **Root SBOM directory**: This is the directory within a Python project source - tree or package archive that SBOM document paths are recorded relative to. For - :term:`Project source trees ` and - :term:`Source Distributions ` the root SBOM - directory is the same directory containing ``pyproject.toml`` or other "root" - metadata file like ``PKG-INFO``/``setup.py``. For - :term:`Built Distributions ` and - :term:`Installed projects ` the root SBOM directory is - defined as ``.dist-info/sboms``. +.. glossary:: + + root SBOM directory + The directory under which SBOM files are stored in a + :term:`project source tree`, :term:`distribution archive` + or :term:`installed project`. + Also, the root directory that their paths + recorded in the :ref:`Sbom-File <770-spec-sbom-file-field>` + :term:`Core Metadata field` are relative to. 
+ Defined to be the :term:`project root directory` + for a :term:`project source tree` or + :term:`source distribution `; + and a subdirectory named ``sboms`` of + the directory containing the :term:`built metadata`— + i.e., the ``.dist-info/sboms`` directory— + for a :term:`Built Distribution` or :term:`installed project`. .. _770-spec-core-metadata: Core Metadata ------------- +.. _770-spec-sbom-file-field: + Add ``Sbom-File`` field ~~~~~~~~~~~~~~~~~~~~~~~ The ``Sbom-File`` is a new optional Core Metadata field. Each instance contains a string representation of the path to an SBOM document. The path is specified -relative to the root SBOM directory for all project types. It is a +relative to the :term:`root SBOM directory` for all project types. It is a multi-use field that MAY appear zero or more times and each instance lists the path to one such file. Files specified under this field are SBOM documents that are distributed with the package. As `specified by this PEP <#770-spec-project-formats>`__, its value is also -that file's path relative to the root SBOM directory in both installed projects -and the standardized Distribution Package types. +that file's path relative to the :term:`root SBOM directory` in both installed +projects and the standardized Distribution Package types. If an ``Sbom-File`` is listed in a :term:`Source Distribution ` or :term:`Built Distribution`'s Core Metadata: * That file MUST be included in the :term:`distribution archive` at the - specified path relative to the root SBOM directory. + specified path relative to the :term:`root SBOM directory`. * Installers MUST install the file with the :term:`project` at that same relative path. -* Inside the root SBOM directory, packaging tools MUST reproduce the directory - structure under which the source files are located relative to the project - root. 
+* Inside the :term:`root SBOM directory`, packaging tools MUST reproduce the + directory structure under which the source files are located relative to the + project root. * Path delimiters MUST be the forward slash character (``/``), and parent directory indicators (``..``) MUST NOT be used. From 33d8bec9a188812010fbad7053d966222163bd55 Mon Sep 17 00:00:00 2001 From: Seth Michael Larson Date: Thu, 30 Jan 2025 15:03:01 -0600 Subject: [PATCH 6/6] Clarify package maintainer How-To-Teach section --- peps/pep-0770.rst | 37 +++++++++++++++---------------------- 1 file changed, 15 insertions(+), 22 deletions(-) diff --git a/peps/pep-0770.rst b/peps/pep-0770.rst index f8d0a7d0f83..60422420bcb 100644 --- a/peps/pep-0770.rst +++ b/peps/pep-0770.rst @@ -402,28 +402,21 @@ SBOM generation tools and vulnerability scanners. What do Python package maintainers need to know? ------------------------------------------------ -Most Python packages don't contain code from other software components and thus -are already measurable by SCA tools without the need of this standard or -additional SBOM documents. Pure-Python packages are about `~90% `__ -of popular packages on PyPI. - -For projects that do contain other software components, documentation will be -added to the Python Packaging User Guide for how to specify and maintain -SBOM documents for Python packages in source code. - -There are two groups of projects that contain other software, those from -a "packaging ecosystem" (PyPI, Linux distros, Crates.io, NPM, etc) and those -from outside a packaging ecosystem (vendored C, C++, Fortran). Software that is -a part of a packaging ecosystem is much easier to identify meaning -that package maintainers may have their package SBOM data annotated -automatically by common build tools ( -`auditwheel `__, -`cibuildwheel `__, -`multibuild `__, etc). 
- -For projects that cannot be automatically annotated, the approach will be to -generate SBOM files by some means and then include those files manually using -``pyproject.toml``: +Python package metadata can already describe the top-level software included in +a package archive, but what if a package archive contains other software +components beyond the top-level software? For example, the Python wheel for +"Pillow" contains a handful of other software libraries bundled inside, like +``libjpeg``, ``libpng``, ``libwebp``, and so on. This scenario is where this PEP +is most useful, for adding metadata about bundled software to a Python package. + +Some build tools may be able to automatically annotate bundled dependencies. +Typically tools can automatically annotate bundled dependencies when those +dependencies come from a "packaging ecosystem" (such as PyPI, Linux distros, +Crates.io, NPM, etc). + +For packages which cannot be automatically annotated and if the package author +wishes to provide an SBOM the approach will be to generate or author SBOM files +and then include those files using ``pyproject.toml``: .. code-block:: toml