From a003b96c51a117e2ab64e8e74c139691b9afced0 Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Wed, 26 Mar 2025 01:13:31 -0700 Subject: [PATCH 01/16] Add initial PEP 781 text --- peps/pep-0781.rst | 241 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 241 insertions(+) create mode 100644 peps/pep-0781.rst diff --git a/peps/pep-0781.rst b/peps/pep-0781.rst new file mode 100644 index 00000000000..3aa114bfdd0 --- /dev/null +++ b/peps/pep-0781.rst @@ -0,0 +1,241 @@ +PEP: 0781 +Title: Adding Zstandard to the standard library +Author: Emma Harper Smith +Sponsor: Gregory P. Smith +Status: Draft +Type: Standards Track +Created: TBD +Python-Version: 3.14 + +Abstract +======== + +`Zstandard `_ is a widely adopted, mature, +and highly efficient compression standard. This PEP proposes adding a new +module to the Python standard library containing a Python wrapper around Meta's +``libzstd`` library, the default implementation. Additionally, to avoid name +collisions and present a unified interface, compression modules in the standard +library will be moved under a ``compression.*`` namespace. + +Motivation +========== + +CPython has modules for several different compression formats, such as `zlib +(DEFLATE) `_, +`bzip2 `_, +and `lzma `_, each widely used. +Including popular compression algorithms matches Python's "batteries included" +philosophy. The last compression algorithm added to the language was the +``lzma`` module, added in Python 3.3. Since that time, several new compression +formats have become very popular, including Zstandard. + +Zstandard was released over a decade ago. In the intervening time, it has seen +`widespread adoption in many different areas of computing `_, +including databases, filesystems, archive formats (including ``tar``), package +formats (including conda packages), and other file formats (including several +Apache formats like Arrow). + +There are multiple bindings to Zstandard for Python available on PyPI. 
One +goal with introducing an official module in the standard library is to reduce +confusion for Python users who want simple compression/decompression APIs. +These packages can continue providing extended APIs and bindings for other +Python implementations such as PyPy. + +Additionally, a long standing `open issue `_ +requesting Zstandard support in the ``tarfile`` module requires Zstandard +support in the standard library. This issue has the 5th most "thumbs up" of +open issues on the CPython tracker, and has garnered a significant amount of +discussion. + +Another use case of Zstandard compression could be to both reduce the size of +packages and increase installation speed for Python wheels. Anaconda found a +significant speedup when adopting Zstandard for the conda package format + +.. epigraph:: + + We see approximately a 2.5x overall speedup, almost all thanks to the dramatically faster extraction speed of the zstd compression used in the new file format. + + -- `Anaconda blog on Zstandard adoption `_ + +`According to lzbench `_, +a comprehensive benchmark of many different compression libraries and formats, +Zstandard has a significantly higher compression ratio compared to wheel's +existing zlib-based compression. While this PEP does *not* prescribe any +changes to the wheel format, having Zstandard bindings in the standard library +would enable a future PEP to improve Python wheel packages. + +Rationale +========= + +Implementation based on ``pyzstd`` +---------------------------------- + +The implementation for this PEP is based on the `pyzstd project `_. +This project was chosen as the code was `originally written to be upstreamed `_ +to CPython by Ma Lin, who also wrote the `output buffer implementation used in +the standard library today `_. +The project has since been taken over by Rogdham, but the APIs are similar to +the APIs for other compression modules such as ``bz2`` and ``lzma``. 
+ +Changes and removals to ``pyzstd``'s APIs +----------------------------------------- + +n.b. maybe this should be an appendix? + +To keep the initial implementation simple and make it easier to review, several +APIs were modified or removed completely. The "RichMem" API is removed as +CPython's output buffer does not use MREMAP. This could be integrated upstream +in a future change benefitting all compression libraries. The ``ZstdFile`` +implementation was re-written in Python to match ``lzma`` and other modules, +and reduce the amount of C code in need of review. + +The ``compress_stream`` / ``decompress_stream`` functions were removed, as they +were performance optimizations and can be replaced with using the ``open`` +function from the Zstandard module. + +The other major change is the ``level_or_options`` argument was split into two +independent arguments to keep the argument parsing clearer and improve clarity +of usage. + +Finally, features requiring newer versions of Zstandard were removed, which +is mostly the support for the ``ZSTD_c_targetCBlockSize`` compression +parameter. + +Minimum supported Zstandard version +----------------------------------- + +The minimum supported Zstandard was chosen as v1.4.5, released in May of 2020. +This version was chosen as a minimum based on reviewing the versions of +Zstandard available in a number of Linux distribution package repositories, +including LTS releases. + +Introduction of a ``compression`` namespace +------------------------------------------- + +Both the ``zstd`` and ``zstandard`` import names are claimed by projects on +PyPI. To avoid breaking users of one of the existing bindings, this PEP +proposes introducing a new namespace for compression libraries, +``compression.*``. This name is already pre-claimed by PyPI for use in the +standard library, so no one may take the package on PyPI. The new Zstandard +module will be ``compression.zstd``. 
Other compression modules will be +re-exported to the ``compression`` namespace and their current names will be +deprecated. This is both to avoid user confusion about where to find +compression modules and to future-proof adding new compression modules. + +Specification +============= + +The ``compression`` namespace +----------------------------- + +A new namespace package for compression modules will be introduced named +``compression``. The top-level module for this package will not contain code +to begin with, but a standard API for interacting with compression routines +may be added in the future. + +The ``compression.zstd`` module +------------------------------- + +A new module, ``compression.zstd`` will be introduced with Zstandard +compression APIs that match other compression modules in the standard library, +namely + +* ``compress`` / ``decompress`` - APIs for one-shot compression/decompression +* ``ZstdFile`` / ``open`` - APIs for interacting with streams and file-like + objects +* ``ZstdCompressor`` / ``ZstdDecompressor`` - APIs for incremental compression/ + decompression +* ``ZstdDict`` / ``train_dict`` / ``finalize_dict`` - APIs for interacting with + Zstandard dictionaries, which are useful for compressing many small chunks of + similar data + +Other compression modules +------------------------- + +Existing compression modules, namely ``lzma``, ``bz2``, and ``zlib``, will each +correspond to new sub-modules, ``compression.lzma``, ``compression.bz2``, and +``compression.zlib`` respectively. The ``compression`` sub-modules will be +alternate import names for the existing modules. The existing modules will emit +deprecation warnings (targeting Python 3.24 for removal) directing users to the +new ``compression`` namespace variants. + +Backwards Compatibility +======================= + +The main compatibility concern is usage of existing standard library +compression APIs. 
These will be deprecated, and may be removed in a future +version of Python (see open questions). This change should not be taken +lightly. However, given a long enough deprecation period, most users will +likely migrate to the new import names. Additionally, a libCST codemod could be +provided to automatically rewrite imports. + +Security Implications +===================== + +As with any new C code, especially code operating on potentially untrusted user +input, there are risks of memory safety issues. The authors plan on +contributing integration with libfuzzer to enable fuzzing the ``_zstd`` code +and ensure it is robust. Furthermore, there are a number of tests that exercise +the compression and decompression routines. + +Taking on a new dependency also always has security risks, but the ``libzstd`` +library participates in Meta's bug bounty program. Furthermore, the project +is widely used and fuzzed on each commit. + +How to Teach This +================= + +Documentation for the new module is in the reference implementation branch. If +existing compression modules are going to be moved to a ``compression`` +namespace, then the documentation for those modules will be updated as well. + +Reference Implementation +======================== + +The `reference implementation `_ +contains the ``_zstd`` C code, the ``compression.zstd`` code, modifications to +tarfile, shutil, and zipfile, and tests for each new API and integration added. +It also contains the re-exports of other compression modules. Deprecations for +the existing import names will be added once a decision is reached regarding +the open issues. + +Rejected Ideas +============== + +Name the module ``libzstd`` and do not make a new ``compression`` namespace +--------------------------------------------------------------------------- + +One option instead of making a new ``compression`` namespace would be to find +a different name, such as ``libzstd``, as the import name. 
However, the issue +of existing import names is likely to persist for future compression formats +added to the standard library. LZ4, a common high speed compression format, +has `a package on PyPI `_ ``lz4`` with the +import name ``lz4``. Instead of solving this issue for each compression format, +it is better to solve it once and for all by using the already-claimed +``compression`` namespace. + +Open Issues +=========== + +Should we remove old compression imports? +----------------------------------------- + +It would be confusing to indefinitely have ``lzma`` and ``compression.lzma`` +simultaneously. Ideally, ``import lzma`` should emit a deprecation for a future +Python version (maybe 3.24?). But should that deprecation exist indefinitely? +Should the old import names (e.g. ``import lzma``) eventually be removed? If +so, at which version? + +Could we keep the existing compression module imports as-is? +------------------------------------------------------------ + +The minimally disruptive change would be to add ``compression.zstd``, but not +deprecate and remove ``lzma``, ``bz2``, and ``zlib``, and not create +``compression.lzma`` etc. This has the potential to cause significant +confusion for users however. + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive. From cf0d3a5966a5ee077c2b7206c1601f61909acfbb Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Wed, 26 Mar 2025 09:27:30 -0700 Subject: [PATCH 02/16] Fix lint --- peps/pep-0781.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/peps/pep-0781.rst b/peps/pep-0781.rst index 3aa114bfdd0..6b816f1570e 100644 --- a/peps/pep-0781.rst +++ b/peps/pep-0781.rst @@ -1,10 +1,10 @@ -PEP: 0781 +PEP: 781 Title: Adding Zstandard to the standard library Author: Emma Harper Smith Sponsor: Gregory P. 
Smith Status: Draft Type: Standards Track -Created: TBD +Created: 2025-03-26 Python-Version: 3.14 Abstract @@ -53,7 +53,7 @@ significant speedup when adopting Zstandard for the conda package format .. epigraph:: - We see approximately a 2.5x overall speedup, almost all thanks to the dramatically faster extraction speed of the zstd compression used in the new file format. + We see approximately a 2.5x overall speedup, almost all thanks to the dramatically faster extraction speed of the zstd compression used in the new file format. -- `Anaconda blog on Zstandard adoption `_ From e7f8ab556d30e78ff6a3255d5b5cd60a94094c82 Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Wed, 26 Mar 2025 16:03:38 -0700 Subject: [PATCH 03/16] Make the deprecation stronger, add a bunch more details --- peps/pep-0781.rst | 167 +++++++++++++++++++++++++++------------------- 1 file changed, 97 insertions(+), 70 deletions(-) diff --git a/peps/pep-0781.rst b/peps/pep-0781.rst index 6b816f1570e..444ad57f374 100644 --- a/peps/pep-0781.rst +++ b/peps/pep-0781.rst @@ -4,7 +4,7 @@ Author: Emma Harper Smith Sponsor: Gregory P. Smith Status: Draft Type: Standards Track -Created: 2025-03-26 +Created: 26-Mar-2025 Python-Version: 3.14 Abstract @@ -13,9 +13,10 @@ Abstract `Zstandard `_ is a widely adopted, mature, and highly efficient compression standard. This PEP proposes adding a new module to the Python standard library containing a Python wrapper around Meta's -``libzstd`` library, the default implementation. Additionally, to avoid name -collisions and present a unified interface, compression modules in the standard -library will be moved under a ``compression.*`` namespace. +``zstd`` library, the default implementation. Additionally, to avoid name +collisions with packages on PyPI and to present a unified interface to Python +users, compression modules in the standard library will be moved under a +``compression.*`` namespace package. 
Motivation ========== @@ -25,34 +26,43 @@ CPython has modules for several different compression formats, such as `zlib `bzip2 `_, and `lzma `_, each widely used. Including popular compression algorithms matches Python's "batteries included" -philosophy. The last compression algorithm added to the language was the -``lzma`` module, added in Python 3.3. Since that time, several new compression -formats have become very popular, including Zstandard. +philosophy of including widely useful standards and utilities. The last +compression module added to the language was ``lzma``, added in Python 3.3. +Since that time, several new compression formats have become very popular, +including Zstandard. Zstandard uses highly optimized implementations of modern +compression techniques such as `Asymmetric Numeral Systems (ANS) +`_ +to achieve high compression ratios, while still attaining high performance. Zstandard was released over a decade ago. In the intervening time, it has seen `widespread adoption in many different areas of computing `_, -including databases, filesystems, archive formats (including ``tar``), package -formats (including conda packages), and other file formats (including several -Apache formats like Arrow). - -There are multiple bindings to Zstandard for Python available on PyPI. One -goal with introducing an official module in the standard library is to reduce -confusion for Python users who want simple compression/decompression APIs. -These packages can continue providing extended APIs and bindings for other -Python implementations such as PyPy. - -Additionally, a long standing `open issue `_ -requesting Zstandard support in the ``tarfile`` module requires Zstandard -support in the standard library. This issue has the 5th most "thumbs up" of -open issues on the CPython tracker, and has garnered a significant amount of -discussion. 
- -Another use case of Zstandard compression could be to both reduce the size of -packages and increase installation speed for Python wheels. Anaconda found a -significant speedup when adopting Zstandard for the conda package format +including databases, filesystems, archive formats (including ``tar`` and +``zip``), package formats (including conda packages), and other file formats +(including several Apache formats like Arrow). + +There are several bindings to Zstandard for Python available on PyPI, each with +different APIs and choices of how to bind the ``zstd`` library. One goal with +introducing an official module in the standard library is to reduce confusion +for Python users who want simple compression/decompression APIs for Zstandard. +The existing packages can continue providing extended APIs and bindings for +other Python implementations such as PyPy or integrate features from newer +Zstandard versions. + +Another reason to add Zstandard support to the standard library is to resolve +a long standing `open issue `_ +requesting Zstandard support in the ``tarfile`` module. This issue has the 5th +most "thumbs up" of open issues on the CPython tracker, and has garnered a +significant amount of discussion and interest. The reference implementation for +this PEP contains integration with ``tarfile`` and would address this issue. + +Zstandard compression could also be used to make Python wheel packages smaller +and significantly faster to install. Anaconda found a sizeable speedup when +adopting Zstandard for the conda package format .. epigraph:: + Conda's download sizes are reduced ~30-40%, and extraction is dramatically faster. + [...] We see approximately a 2.5x overall speedup, almost all thanks to the dramatically faster extraction speed of the zstd compression used in the new file format. 
-- `Anaconda blog on Zstandard adoption `_ @@ -61,8 +71,9 @@ significant speedup when adopting Zstandard for the conda package format a comprehensive benchmark of many different compression libraries and formats, Zstandard has a significantly higher compression ratio compared to wheel's existing zlib-based compression. While this PEP does *not* prescribe any -changes to the wheel format, having Zstandard bindings in the standard library -would enable a future PEP to improve Python wheel packages. +changes to the wheel format or other packaging standards, having Zstandard +bindings in the standard library would enable a future PEP to improve the user +experience for Python wheel packages. Rationale ========= @@ -74,8 +85,9 @@ The implementation for this PEP is based on the `pyzstd project `_ to CPython by Ma Lin, who also wrote the `output buffer implementation used in the standard library today `_. -The project has since been taken over by Rogdham, but the APIs are similar to -the APIs for other compression modules such as ``bz2`` and ``lzma``. +The project has since been taken over by Rogdham and is published to PyPI. The +APIs in ``pyzstd`` are similar to the APIs for other compression modules in the +standard library such as ``bz2`` and ``lzma``. Changes and removals to ``pyzstd``'s APIs ----------------------------------------- @@ -84,10 +96,12 @@ n.b. maybe this should be an appendix? To keep the initial implementation simple and make it easier to review, several APIs were modified or removed completely. The "RichMem" API is removed as -CPython's output buffer does not use MREMAP. This could be integrated upstream -in a future change benefitting all compression libraries. The ``ZstdFile`` -implementation was re-written in Python to match ``lzma`` and other modules, -and reduce the amount of C code in need of review. +CPython's output buffer does not use `mremap(2) `_. 
+This could be integrated into CPython in a future change benefitting all +compression libraries, but is not necessary for the initial introduction of +Zstandard. The ``ZstdFile`` implementation was re-written in Python to match +``lzma`` and other compression modules, and reduce the amount of C code in need +of review. The ``compress_stream`` / ``decompress_stream`` functions were removed, as they were performance optimizations and can be replaced with using the ``open`` @@ -107,7 +121,10 @@ Minimum supported Zstandard version The minimum supported Zstandard was chosen as v1.4.5, released in May of 2020. This version was chosen as a minimum based on reviewing the versions of Zstandard available in a number of Linux distribution package repositories, -including LTS releases. +including LTS releases. This version choice is rather conservative to maximize +compatibility with existing LTS Linux distributions, but a newer Zstandard +version could likely be chosen given that newer Python releases are generally +packaged as part of newer distribution releases. Introduction of a ``compression`` namespace ------------------------------------------- @@ -115,12 +132,12 @@ Introduction of a ``compression`` namespace Both the ``zstd`` and ``zstandard`` import names are claimed by projects on PyPI. To avoid breaking users of one of the existing bindings, this PEP proposes introducing a new namespace for compression libraries, -``compression.*``. This name is already pre-claimed by PyPI for use in the -standard library, so no one may take the package on PyPI. The new Zstandard -module will be ``compression.zstd``. Other compression modules will be -re-exported to the ``compression`` namespace and their current names will be -deprecated. This is both to avoid user confusion about where to find -compression modules and to future-proof adding new compression modules. +``compression``. This name is already reserved on PyPI for use in the +standard library. 
The new Zstandard module will be ``compression.zstd``. +Other compression modules will be re-exported to the ``compression`` namespace +and their current import names will be deprecated. This is both to avoid user +confusion about where to find compression modules and to avoid the same issue +occuring when adding other new compression modules in the future. Specification ============= @@ -129,9 +146,9 @@ The ``compression`` namespace ----------------------------- A new namespace package for compression modules will be introduced named -``compression``. The top-level module for this package will not contain code -to begin with, but a standard API for interacting with compression routines -may be added in the future. +``compression``. The top-level module for this package will be empty to begin +with, but a standard API for interacting with compression routines may be +added in the future to the toplevel. The ``compression.zstd`` module ------------------------------- @@ -145,6 +162,9 @@ namely objects * ``ZstdCompressor`` / ``ZstdDecompressor`` - APIs for incremental compression/ decompression + +It will also contain some Zstandard-specific functionality + * ``ZstdDict`` / ``train_dict`` / ``finalize_dict`` - APIs for interacting with Zstandard dictionaries, which are useful for compressing many small chunks of similar data @@ -152,52 +172,59 @@ namely Other compression modules ------------------------- -Existing compression modules, namely ``lzma``, ``bz2``, and ``zlib``, will each -correspond to new sub-modules, ``compression.lzma``, ``compression.bz2``, and -``compression.zlib`` respectively. The ``compression`` sub-modules will be -alternate import names for the existing modules. The existing modules will emit -deprecation warnings (targeting Python 3.24 for removal) directing users to the -new ``compression`` namespace variants. 
+New import names ``compression.lzma``, ``compression.bz2``, and +``compression.zlib`` will be introduced for the existing standard library +compression modules ``lzma``, ``bz2``, and ``zlib`` respectively. The new +modules will simply re-export the contents of the existing modules. Importing +the existing module import names will emit a deprecation warning, with a +planned removal in 3.24. The documentation for these modules will be updated +to discuss the planned deprecation and removal. + +The ``_compression`` module, given that it is marked private, will be +immediately renamed to ``compression._common.streams``. The new name was +selected due to the current contents of the module being I/O related helpers +for stream APIs (e.g. ``LZMAFile``) in standard library compression modules. Backwards Compatibility ======================= The main compatibility concern is usage of existing standard library -compression APIs. These will be deprecated, and may be removed in a future -version of Python (see open questions). This change should not be taken -lightly. However, given a long enough deprecation period, most users will -likely migrate to the new import names. Additionally, a libCST codemod could be -provided to automatically rewrite imports. +compression APIs with the existing import names. These names will be +deprecated, and will be removed in 3.24. Given the long deprecation period, +most users will likely migrate to the new import names well before then. +Additionally, a libCST codemod can be provided to automatically rewrite +imports, easing the migration. Security Implications ===================== As with any new C code, especially code operating on potentially untrusted user -input, there are risks of memory safety issues. The authors plan on +input, there are risks of memory safety issues. The author plans on contributing integration with libfuzzer to enable fuzzing the ``_zstd`` code and ensure it is robust. 
Furthermore, there are a number of tests that exercise -the compression and decompression routines. +the compression and decompression routines. These tests pass without error when +compiled with AddressSanitizer. -Taking on a new dependency also always has security risks, but the ``libzstd`` -library participates in Meta's bug bounty program. Furthermore, the project -is widely used and fuzzed on each commit. +Taking on a new dependency also always has security risks, but the ``zstd`` +library is mature, fuzzed on each commit, and `participates in Meta's bug bounty +program `_. How to Teach This ================= -Documentation for the new module is in the reference implementation branch. If -existing compression modules are going to be moved to a ``compression`` -namespace, then the documentation for those modules will be updated as well. +Documentation for the new module is in the reference implementation branch. The +documentation for other modules will be updated to discuss the deprecation of +their existing import names, and how to migrate. Reference Implementation ======================== The `reference implementation `_ contains the ``_zstd`` C code, the ``compression.zstd`` code, modifications to -tarfile, shutil, and zipfile, and tests for each new API and integration added. -It also contains the re-exports of other compression modules. Deprecations for -the existing import names will be added once a decision is reached regarding -the open issues. +``tarfile``, ``shutil``, and ``zipfile``, and tests for each new API and +integration added. It also contains the re-exports of other compression +modules. Deprecations for the existing import names will be added once a +decision is reached regarding the open issues. Rejected Ideas ============== @@ -209,7 +236,7 @@ One option instead of making a new ``compression`` namespace would be to find a different name, such as ``libzstd``, as the import name. 
However, the issue of existing import names is likely to persist for future compression formats added to the standard library. LZ4, a common high speed compression format, -has `a package on PyPI `_ ``lz4`` with the +has `a package on PyPI `_, ``lz4``, with the import name ``lz4``. Instead of solving this issue for each compression format, it is better to solve it once and for all by using the already-claimed ``compression`` namespace. @@ -217,8 +244,8 @@ it is better to solve it once and for all by using the already-claimed Open Issues =========== -Should we remove old compression imports? ------------------------------------------ +Should we keep old compression imports? +--------------------------------------- It would be confusing to indefinitely have ``lzma`` and ``compression.lzma`` simultaneously. Ideally, ``import lzma`` should emit a deprecation for a future From a8c7e2d32dcb09d8d519e8500b92f62626429e29 Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Wed, 26 Mar 2025 16:32:34 -0700 Subject: [PATCH 04/16] Add more about the compression namespace --- peps/pep-0781.rst | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/peps/pep-0781.rst b/peps/pep-0781.rst index 444ad57f374..da7551d583f 100644 --- a/peps/pep-0781.rst +++ b/peps/pep-0781.rst @@ -135,9 +135,21 @@ proposes introducing a new namespace for compression libraries, ``compression``. This name is already reserved on PyPI for use in the standard library. The new Zstandard module will be ``compression.zstd``. Other compression modules will be re-exported to the ``compression`` namespace -and their current import names will be deprecated. This is both to avoid user -confusion about where to find compression modules and to avoid the same issue -occuring when adding other new compression modules in the future. +and their current import names will be deprecated. + +Providing a common namespace for compression modules has several advantages. 
+First, it reduces user confusion about where to find compression modules. +Second, the top level ``compression`` module could provide information on which +compression formats are available, similar to ``hashlib``'s +``algorithms_available``. If :pep:`775` is accepted, a +``compression.algorithms_guaranteed`` could be provided as well, listing +``zlib``. Finally, a ``compression`` namespace prevents future issues with +merging other compression formats into the standard library. New compression +formats will likely be published to PyPI prior to integration into +CPython. Therefore, any new compression format import name will likely already +be claimed by the time a module would be considered for inclusion in CPython. +Putting compression modules under a package prefix prevents issues with +potential future name clashes. Specification ============= From 1dc5e32019a094c00fbe4d74b460792bc0a9d147 Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Sat, 29 Mar 2025 10:18:15 -0700 Subject: [PATCH 05/16] Move compression section earlier --- peps/pep-0781.rst | 50 +++++++++++++++++++++++------------------------ 1 file changed, 25 insertions(+), 25 deletions(-) diff --git a/peps/pep-0781.rst b/peps/pep-0781.rst index da7551d583f..ae4cc80db6b 100644 --- a/peps/pep-0781.rst +++ b/peps/pep-0781.rst @@ -78,6 +78,31 @@ experience for Python wheel packages. Rationale ========= +Introduction of a ``compression`` namespace +------------------------------------------- + +Both the ``zstd`` and ``zstandard`` import names are claimed by projects on +PyPI. To avoid breaking users of one of the existing bindings, this PEP +proposes introducing a new namespace for compression libraries, +``compression``. This name is already reserved on PyPI for use in the +standard library. The new Zstandard module will be ``compression.zstd``. +Other compression modules will be re-exported to the ``compression`` namespace +and their current import names will be deprecated. 
+ +Providing a common namespace for compression modules has several advantages. +First, it reduces user confusion about where to find compression modules. +Second, the top level ``compression`` module could provide information on which +compression formats are available, similar to ``hashlib``'s +``algorithms_available``. If :pep:`775` is accepted, a +``compression.algorithms_guaranteed`` could be provided as well, listing +``zlib``. Finally, a ``compression`` namespace prevents future issues with +merging other compression formats into the standard library. New compression +formats will likely be published to PyPI prior to integration into +CPython. Therefore, any new compression format import name will likely already +be claimed by the time a module would be considered for inclusion in CPython. +Putting compression modules under a package prefix prevents issues with +potential future name clashes. + Implementation based on ``pyzstd`` ---------------------------------- @@ -126,31 +151,6 @@ compatibility with existing LTS Linux distributions, but a newer Zstandard version could likely be chosen given that newer Python releases are generally packaged as part of newer distribution releases. -Introduction of a ``compression`` namespace -------------------------------------------- - -Both the ``zstd`` and ``zstandard`` import names are claimed by projects on -PyPI. To avoid breaking users of one of the existing bindings, this PEP -proposes introducing a new namespace for compression libraries, -``compression``. This name is already reserved on PyPI for use in the -standard library. The new Zstandard module will be ``compression.zstd``. -Other compression modules will be re-exported to the ``compression`` namespace -and their current import names will be deprecated. - -Providing a common namespace for compression modules has several advantages. -First, it reduces user confusion about where to find compression modules. 
-Second, the top level ``compression`` module could provide information on which -compression formats are available, similar to ``hashlib``'s -``algorithms_available``. If :pep:`775` is accepted, a -``compression.algorithms_guaranteed`` could be provided as well, listing -``zlib``. Finally, a ``compression`` namespace prevents future issues with -merging other compression formats into the standard library. New compression -formats will likely be published to PyPI prior to integration into -CPython. Therefore, any new compression format import name will likely already -be claimed by the time a module would be considered for inclusion in CPython. -Putting compression modules under a package prefix prevents issues with -potential future name clashes. - Specification ============= From 160810ee2adcf453e66153a3ecafa7eadfa80aec Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Sat, 29 Mar 2025 12:25:57 -0700 Subject: [PATCH 06/16] Respond to review by Rogdham --- peps/pep-0781.rst | 61 ++++++++++++++++++++++------------------------- 1 file changed, 29 insertions(+), 32 deletions(-) diff --git a/peps/pep-0781.rst b/peps/pep-0781.rst index ae4cc80db6b..a8a24c94040 100644 --- a/peps/pep-0781.rst +++ b/peps/pep-0781.rst @@ -103,6 +103,17 @@ be claimed by the time a module would be considered for inclusion in CPython. Putting compression modules under a package prefix prevents issues with potential future name clashes. +Code that would like to remain compatible across Python versions may use the +following pattern to ensure compatibility:: + + try: + from compression.lzma import LZMAFile + except ImportError: + from lzma import LZMAFile + +This will use the newer import name when available and fall back to the old +name otherwise. + Implementation based on ``pyzstd`` ---------------------------------- @@ -114,32 +125,6 @@ The project has since been taken over by Rogdham and is published to PyPI. 
The APIs in ``pyzstd`` are similar to the APIs for other compression modules in the standard library such as ``bz2`` and ``lzma``. -Changes and removals to ``pyzstd``'s APIs ------------------------------------------ - -n.b. maybe this should be an appendix? - -To keep the initial implementation simple and make it easier to review, several -APIs were modified or removed completely. The "RichMem" API is removed as -CPython's output buffer does not use `mremap(2) `_. -This could be integrated into CPython in a future change benefitting all -compression libraries, but is not necessary for the initial introduction of -Zstandard. The ``ZstdFile`` implementation was re-written in Python to match -``lzma`` and other compression modules, and reduce the amount of C code in need -of review. - -The ``compress_stream`` / ``decompress_stream`` functions were removed, as they -were performance optimizations and can be replaced with using the ``open`` -function from the Zstandard module. - -The other major change is the ``level_or_options`` argument was split into two -independent arguments to keep the argument parsing clearer and improve clarity -of usage. - -Finally, features requiring newer versions of Zstandard were removed, which -is mostly the support for the ``ZSTD_c_targetCBlockSize`` compression -parameter. - Minimum supported Zstandard version ----------------------------------- @@ -181,16 +166,28 @@ It will also contain some Zstandard-specific functionality Zstandard dictionaries, which are useful for compressing many small chunks of similar data +``libzstd`` optional dependency +------------------------------- + +The ``libzstd`` library will become an optional dependency of CPython. If the +library is not available, the ``compression.zstd`` module will be unavailable. +This is handled automatically on Unix platforms as part of the normal build +environment detection. 
+ +On Windows, ``libzstd`` will be added to +`the source dependencies `_ +used to build libraries CPython depends on for Windows. + Other compression modules ------------------------- New import names ``compression.lzma``, ``compression.bz2``, and -``compression.zlib`` will be introduced for the existing standard library -compression modules ``lzma``, ``bz2``, and ``zlib`` respectively. The new -modules will simply re-export the contents of the existing modules. Importing -the existing module import names will emit a deprecation warning, with a -planned removal in 3.24. The documentation for these modules will be updated -to discuss the planned deprecation and removal. +``compression.zlib`` will be introduced re-exporting the contents of the +existing ``lzma``, ``bz2``, and ``zlib`` modules respectively. Starting with +Python 3.14, the existing modules will emit a deprecation warning on import. +In Python 3.24, the existing modules will be removed and code must use the +``compression`` sub-modules. The documentation for these modules will be +updated to discuss the planned deprecation and removal. The ``_compression`` module, given that it is marked private, will be immediately renamed to ``compression._common.streams``. The new name was From 464e1ccbfcab851e8031bdbbaa0f12ce7c4d8c2c Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Tue, 1 Apr 2025 13:45:02 -0700 Subject: [PATCH 07/16] Update PEP number to 784 --- peps/{pep-0781.rst => pep-0784.rst} | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) rename peps/{pep-0781.rst => pep-0784.rst} (99%) diff --git a/peps/pep-0781.rst b/peps/pep-0784.rst similarity index 99% rename from peps/pep-0781.rst rename to peps/pep-0784.rst index a8a24c94040..46c0c21e80a 100644 --- a/peps/pep-0781.rst +++ b/peps/pep-0784.rst @@ -1,10 +1,10 @@ -PEP: 781 +PEP: 784 Title: Adding Zstandard to the standard library Author: Emma Harper Smith Sponsor: Gregory P. 
Smith Status: Draft Type: Standards Track -Created: 26-Mar-2025 +Created: 1-Apr-2025 Python-Version: 3.14 Abstract From fffc181e943535188087c602c7fe6351cc3fc137 Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Tue, 1 Apr 2025 19:40:28 -0700 Subject: [PATCH 08/16] Fix lint and date --- peps/pep-0784.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/peps/pep-0784.rst b/peps/pep-0784.rst index 46c0c21e80a..33837db6ec7 100644 --- a/peps/pep-0784.rst +++ b/peps/pep-0784.rst @@ -4,7 +4,7 @@ Author: Emma Harper Smith Sponsor: Gregory P. Smith Status: Draft Type: Standards Track -Created: 1-Apr-2025 +Created: 02-Apr-2025 Python-Version: 3.14 Abstract From fb4e9d8af0e5cce432ee91960bada77e6eafd117 Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Fri, 4 Apr 2025 12:54:27 -0700 Subject: [PATCH 09/16] Re-target to Python 3.15 --- peps/pep-0784.rst | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/peps/pep-0784.rst b/peps/pep-0784.rst index 33837db6ec7..708580996f1 100644 --- a/peps/pep-0784.rst +++ b/peps/pep-0784.rst @@ -4,8 +4,8 @@ Author: Emma Harper Smith Sponsor: Gregory P. Smith Status: Draft Type: Standards Track -Created: 02-Apr-2025 -Python-Version: 3.14 +Created: 04-Apr-2025 +Python-Version: 3.15 Abstract ======== @@ -184,8 +184,8 @@ Other compression modules New import names ``compression.lzma``, ``compression.bz2``, and ``compression.zlib`` will be introduced re-exporting the contents of the existing ``lzma``, ``bz2``, and ``zlib`` modules respectively. Starting with -Python 3.14, the existing modules will emit a deprecation warning on import. -In Python 3.24, the existing modules will be removed and code must use the +Python 3.15, the existing modules will emit a deprecation warning on import. +In Python 3.25, the existing modules will be removed and code must use the ``compression`` sub-modules. The documentation for these modules will be updated to discuss the planned deprecation and removal. 
@@ -199,7 +199,7 @@ Backwards Compatibility The main compatibility concern is usage of existing standard library compression APIs with the existing import names. These names will be -deprecated, and will be removed in 3.24. Given the long deprecation period, +deprecated, and will be removed in 3.25. Given the long deprecation period, most users will likely migrate to the new import names well before then. Additionally, a libCST codemod can be provided to automatically rewrite imports, easing the migration. @@ -258,7 +258,7 @@ Should we keep old compression imports? It would be confusing to indefinitely have ``lzma`` and ``compression.lzma`` simultaneously. Ideally, ``import lzma`` should emit a deprecation for a future -Python version (maybe 3.24?). But should that deprecation exist indefinitely? +Python version (maybe 3.25?). But should that deprecation exist indefinitely? Should the old import names (e.g. ``import lzma``) eventually be removed? If so, at which version? From c4544a1d349cba0c93583120dfd04bb9b3083425 Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Sun, 6 Apr 2025 17:51:17 -0700 Subject: [PATCH 10/16] Add note about zipfile integration --- peps/pep-0784.rst | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/peps/pep-0784.rst b/peps/pep-0784.rst index 708580996f1..f9a1e6b52fb 100644 --- a/peps/pep-0784.rst +++ b/peps/pep-0784.rst @@ -52,8 +52,12 @@ Another reason to add Zstandard support to the standard library is to resolve a long standing `open issue `_ requesting Zstandard support in the ``tarfile`` module. This issue has the 5th most "thumbs up" of open issues on the CPython tracker, and has garnered a -significant amount of discussion and interest. The reference implementation for -this PEP contains integration with ``tarfile`` and would address this issue. +significant amount of discussion and interest. 
Additionally, the `ZIP format +standardizes a Zstandard compression format ID +`_, +and integration with ``zipfile`` would allow opening ZIP archives using +Zstandard compression. The reference implementation for this PEP contains +integration with the ``zipfile``, ``tarfile``, and ``shutil`` modules. Zstandard compression could also be used to make Python wheel packages smaller and significantly faster to install. Anaconda found a sizeable speedup when From 117338d3efd1c8b72eac3e1e9e960dc411ef4b4a Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Sun, 6 Apr 2025 19:54:06 -0700 Subject: [PATCH 11/16] Rewrite the early motivation section based on Greg's advise --- peps/pep-0784.rst | 43 ++++++++++++++++++++++++++----------------- 1 file changed, 26 insertions(+), 17 deletions(-) diff --git a/peps/pep-0784.rst b/peps/pep-0784.rst index f9a1e6b52fb..b6cf250a868 100644 --- a/peps/pep-0784.rst +++ b/peps/pep-0784.rst @@ -26,19 +26,28 @@ CPython has modules for several different compression formats, such as `zlib `bzip2 `_, and `lzma `_, each widely used. Including popular compression algorithms matches Python's "batteries included" -philosophy of including widely useful standards and utilities. The last +philosophy of incorporating widely useful standards and utilities. The last compression module added to the language was ``lzma``, added in Python 3.3. -Since that time, several new compression formats have become very popular, -including Zstandard. Zstandard uses highly optimized implementations of modern -compression techniques such as `Asymmetric Numerical Systems (ANS) -`_ -to acheive high compression ratios, while still attaining high performance. - -Zstandard was released over a decade ago. 
In the intervening time, it has seen -`widespread adoption in many different areas of computing `_, -including databases, filesystems, archive formats (including ``tar`` and -``zip``), package formats (including conda packages), and other file formats -(including several Apache formats like Arrow). + +Since then, Zstandard has become the modern de facto preferred compression +library for both high performance compression and decompression attaining high +compression ratios at reasonable CPU and memory cost. Zstandard achieves a much +higher compression ratio than bzip2 or zlib (DEFLATE) while decompressing +significantly faster than LZMA. + +Zstandard has seen `widespread adoption in many different areas of computing +`_. The numerous hardware +implementations demonstrate long-term commitment to Zstandard and an +expectation that Zstandard will stay the de facto choice for compression for +years to come. Zstandard compression is also implemented in both the ZFS and +Btrfs filesystems. + +Zstandard's highly efficient compression has supplanted other modern +compression formats, such as brotli, lzo, and ucl due to it's highly efficient +compression. While `LZ4 `_ is still used in very high +throughput scenarios, Zstandard can also be used in some of these contexts. +While inclusion of LZ4 is out of scope, it would be a compelling future +addition to the ``compression`` namespace introduced by this PEP. There are several bindings to Zstandard for Python available on PyPI, each with different APIs and choices of how to bind the ``zstd`` library. One goal with @@ -186,11 +195,11 @@ Other compression modules ------------------------- New import names ``compression.lzma``, ``compression.bz2``, and -``compression.zlib`` will be introduced re-exporting the contents of the -existing ``lzma``, ``bz2``, and ``zlib`` modules respectively. Starting with -Python 3.15, the existing modules will emit a deprecation warning on import. 
-In Python 3.25, the existing modules will be removed and code must use the -``compression`` sub-modules. The documentation for these modules will be +``compression.zlib`` will be introduced in Python 3.14 re-exporting the +contents of the existing ``lzma``, ``bz2``, and ``zlib`` modules respectively. +Starting with Python 3.19, the existing modules will emit a deprecation warning +on import. In Python 3.24, the existing modules will be removed and code must +use the ``compression`` sub-modules. The documentation for these modules will be updated to discuss the planned deprecation and removal. The ``_compression`` module, given that it is marked private, will be From 3893bfa414591bb25e07efe1ae7486a7c02a3691 Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Sun, 6 Apr 2025 19:56:13 -0700 Subject: [PATCH 12/16] Re-target to Python 3.14, optimistically --- peps/pep-0784.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/peps/pep-0784.rst b/peps/pep-0784.rst index b6cf250a868..f477b3f3273 100644 --- a/peps/pep-0784.rst +++ b/peps/pep-0784.rst @@ -4,8 +4,8 @@ Author: Emma Harper Smith Sponsor: Gregory P. Smith Status: Draft Type: Standards Track -Created: 04-Apr-2025 -Python-Version: 3.15 +Created: 06-Apr-2025 +Python-Version: 3.14 Abstract ======== @@ -212,7 +212,7 @@ Backwards Compatibility The main compatibility concern is usage of existing standard library compression APIs with the existing import names. These names will be -deprecated, and will be removed in 3.25. Given the long deprecation period, +deprecated, and will be removed in 3.24. Given the long deprecation period, most users will likely migrate to the new import names well before then. Additionally, a libCST codemod can be provided to automatically rewrite imports, easing the migration. @@ -271,7 +271,7 @@ Should we keep old compression imports? It would be confusing to indefinitely have ``lzma`` and ``compression.lzma`` simultaneously. 
Ideally, ``import lzma`` should emit a deprecation for a future -Python version (maybe 3.25?). But should that deprecation exist indefinitely? +Python version (maybe 3.24?). But should that deprecation exist indefinitely? Should the old import names (e.g. ``import lzma``) eventually be removed? If so, at which version? From c4500d6fbb3f4878f375284bf66f8d342da8dc00 Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Sun, 6 Apr 2025 20:19:38 -0700 Subject: [PATCH 13/16] Rewrite the deprecation/removal timeline --- peps/pep-0784.rst | 26 ++++++++++++++++++-------- 1 file changed, 18 insertions(+), 8 deletions(-) diff --git a/peps/pep-0784.rst b/peps/pep-0784.rst index f477b3f3273..3cea0641984 100644 --- a/peps/pep-0784.rst +++ b/peps/pep-0784.rst @@ -197,25 +197,35 @@ Other compression modules New import names ``compression.lzma``, ``compression.bz2``, and ``compression.zlib`` will be introduced in Python 3.14 re-exporting the contents of the existing ``lzma``, ``bz2``, and ``zlib`` modules respectively. -Starting with Python 3.19, the existing modules will emit a deprecation warning -on import. In Python 3.24, the existing modules will be removed and code must -use the ``compression`` sub-modules. The documentation for these modules will be -updated to discuss the planned deprecation and removal. The ``_compression`` module, given that it is marked private, will be immediately renamed to ``compression._common.streams``. The new name was selected due to the current contents of the module being I/O related helpers for stream APIs (e.g. ``LZMAFile``) in standard library compression modules. +Compression module migration timeline +------------------------------------- + +Existing modules will emit a ``DeprecationWarning`` in the Python +release following the last Python without the ``compression`` module leaving +support. 
For example, if the ``compression`` namespace is introduced in 3.14, +then the ``DeprecationWarnings`` would be emitted in 3.19, the next release +after 3.13 reaches end of life. Following the standard deprecation timeline +specified in :pep:`387`, in Python 3.24 the existing modules will be removed +and code must use the ``compression`` sub-modules. The documentation for these +modules will be updated to discuss the planned deprecation and removal +timelines. + + Backwards Compatibility ======================= The main compatibility concern is usage of existing standard library compression APIs with the existing import names. These names will be -deprecated, and will be removed in 3.24. Given the long deprecation period, -most users will likely migrate to the new import names well before then. -Additionally, a libCST codemod can be provided to automatically rewrite -imports, easing the migration. +deprecated in 3.19 and will be removed in 3.24. Given the long coexistence of +the modules and a 5-year deprecation period, most users will likely migrate to +the new import names well before then. Additionally, a libCST codemod can be +provided to automatically rewrite imports, easing the migration. Security Implications ===================== From 2d9aab4cf6d3b261fd64c9f112c326134c29d555 Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Sun, 6 Apr 2025 20:35:00 -0700 Subject: [PATCH 14/16] Add Greg to CODEOWNERS for the PEP --- .github/CODEOWNERS | 1 + 1 file changed, 1 insertion(+) diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index 0c90dfc9aa7..f041ebaef93 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -661,6 +661,7 @@ peps/pep-0779.rst @Yhg1s @colesbury @mpage peps/pep-0780.rst @lysnikolaou peps/pep-0781.rst @methane peps/pep-0782.rst @vstinner +peps/pep-0784.rst @gpshead # ... peps/pep-0789.rst @njsmith # ...
From 224dd470163058dfc96f5d200798eadeef6f2218 Mon Sep 17 00:00:00 2001 From: Emma Smith Date: Sun, 6 Apr 2025 21:14:49 -0700 Subject: [PATCH 15/16] Remove extraneous apostrophe Co-authored-by: Jelle Zijlstra --- peps/pep-0784.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/peps/pep-0784.rst b/peps/pep-0784.rst index 3cea0641984..2e3355c7d20 100644 --- a/peps/pep-0784.rst +++ b/peps/pep-0784.rst @@ -43,7 +43,7 @@ years to come. Zstandard compression is also implemented in both the ZFS and Btrfs filesystems. Zstandard's highly efficient compression has supplanted other modern -compression formats, such as brotli, lzo, and ucl due to it's highly efficient +compression formats, such as brotli, lzo, and ucl due to its highly efficient compression. While `LZ4 `_ is still used in very high throughput scenarios, Zstandard can also be used in some of these contexts. While inclusion of LZ4 is out of scope, it would be a compelling future From a1db1e35744933781ae713bf756dc61942cf3c79 Mon Sep 17 00:00:00 2001 From: Emma Harper Smith Date: Sun, 6 Apr 2025 21:35:56 -0700 Subject: [PATCH 16/16] Remove open issues section --- peps/pep-0784.rst | 20 -------------------- 1 file changed, 20 deletions(-) diff --git a/peps/pep-0784.rst b/peps/pep-0784.rst index 2e3355c7d20..fef83d8fce7 100644 --- a/peps/pep-0784.rst +++ b/peps/pep-0784.rst @@ -273,26 +273,6 @@ import name ``lz4``. Instead of solving this issue for each compression format, it is better to solve it once and for all by using the already-claimed ``compression`` namespace. -Open Issues -=========== - -Should we keep old compression imports? ---------------------------------------- - -It would be confusing to indefinitely have ``lzma`` and ``compression.lzma`` -simultaneously. Ideally, ``import lzma`` should emit a deprecation for a future -Python version (maybe 3.24?). But should that deprecation exist indefinitely? -Should the old import names (e.g. ``import lzma``) eventually be removed? 
If -so, at which version? - -Could we keep the existing compression module imports as-is? ------------------------------------------------------------- - -The minimally disruptive change would be to add ``compression.zstd``, but not -deprecate and remove ``lzma``, ``bz2``, and ``zlib``, and not create -``compression.lzma`` etc. This has the potential to cause significant -confusion for users however. - Copyright =========