150 changes: 88 additions & 62 deletions peps/pep-0784.rst
@@ -2,97 +2,112 @@ PEP: 784
Title: Adding Zstandard to the standard library
Author: Emma Harper Smith <emma@python.org>
Sponsor: Gregory P. Smith <greg@krypto.org>
Discussions-To: https://discuss.python.org/t/87377
Status: Draft
Type: Standards Track
Created: 06-Apr-2025
Python-Version: 3.14
Post-History:
`07-Apr-2025 <https://discuss.python.org/t/87377>`__,


Abstract
========

`Zstandard <https://facebook.github.io/zstd/>`_ is a widely adopted, mature,
and highly efficient compression standard. This PEP proposes adding a new
module to the Python standard library containing a Python wrapper around Meta's
``zstd`` library, the default implementation. Additionally, to avoid name
collisions with packages on PyPI and to present a unified interface to Python
users, compression modules in the standard library will be moved under a
``compression.*`` namespace package.
`Zstandard`_ is a widely adopted, mature, and highly efficient compression
standard. This PEP proposes adding a new module to the Python standard library
containing a Python wrapper around Meta's |zstd| library, the default
implementation. Additionally, to avoid name collisions with packages on PyPI
and to present a unified interface to Python users, compression modules in the
standard library will be moved under a ``compression.*`` package.

.. |zstd| replace:: ``zstd``
.. _zstd: https://facebook.github.io/zstd/
.. _Zstandard: https://facebook.github.io/zstd/


Motivation
==========

CPython has modules for several different compression formats, such as `zlib
(DEFLATE) <https://docs.python.org/3/library/zlib.html>`_,
`bzip2 <https://docs.python.org/3/library/bz2.html>`_,
and `lzma <https://docs.python.org/3/library/lzma.html>`_, each widely used.
Including popular compression algorithms matches Python's "batteries included"
philosophy of incorporating widely useful standards and utilities. The last
compression module added to the language was ``lzma``, added in Python 3.3.
CPython has modules for several different compression formats, such as
:mod:`zlib (DEFLATE) <zlib>`, :mod:`bzip2 <bz2>`, and :mod:`lzma <lzma>`,
each widely used. Including popular compression algorithms matches Python's
"batteries included" philosophy of incorporating widely useful standards and
utilities. :mod:`!lzma` is the most recent such module, added in Python 3.3.

Since then, Zstandard has become the modern de facto preferred compression
Since then, Zstandard has become the modern *de facto* preferred compression
library for both high performance compression and decompression attaining high
compression ratios at reasonable CPU and memory cost. Zstandard achieves a much
higher compression ratio than bzip2 or zlib (DEFLATE) while decompressing
significantly faster than LZMA.

Zstandard has seen `widespread adoption in many different areas of computing
<https://facebook.github.io/zstd/#references>`_. The numerous hardware
implementations demonstrate long-term commitment to Zstandard and an
expectation that Zstandard will stay the de facto choice for compression for
years to come. Zstandard compression is also implemented in both the ZFS and
Btrfs filesystems.
Zstandard has seen `widespread adoption`_ in many different areas of computing.
The numerous hardware implementations demonstrate long-term commitment to
Zstandard and an expectation that Zstandard will stay the *de facto* choice for
compression for years to come. Zstandard compression is also implemented in
both the ZFS_ and Btrfs_ filesystems.

Zstandard's highly efficient compression has supplanted other modern
compression formats, such as brotli, lzo, and ucl due to its highly efficient
compression. While `LZ4 <https://lz4.org/>`_ is still used in very high
throughput scenarios, Zstandard can also be used in some of these contexts.
compression formats, such as brotli_, lzo_, and ucl_ due to its highly
efficient compression. While `LZ4`_ is still used in very high throughput
scenarios, Zstandard can also be used in some of these contexts.
While inclusion of LZ4 is out of scope, it would be a compelling future
addition to the ``compression`` namespace introduced by this PEP.

There are several bindings to Zstandard for Python available on PyPI, each with
different APIs and choices of how to bind the ``zstd`` library. One goal with
introducing an official module in the standard library is to reduce confusion
for Python users who want simple compression/decompression APIs for Zstandard.
The existing packages can continue providing extended APIs and bindings for
other Python implementations such as PyPy or integrate features from newer
Zstandard versions.
The existing packages can continue providing extended APIs or integrate
features from newer Zstandard versions.

Another reason to add Zstandard support to the standard library is to resolve
a long standing `open issue <https://github.com/python/cpython/issues/81276>`_
requesting Zstandard support in the ``tarfile`` module. This issue has the 5th
most "thumbs up" of open issues on the CPython tracker, and has garnered a
significant amount of discussion and interest. Additionally, the `ZIP format
standardizes a Zstandard compression format ID
<https://pkwaredownloads.blob.core.windows.net/pkware-general/Documentation/APPNOTE-6.3.8.TXT>`_,
and integration with ``zipfile`` would allow opening ZIP archives using
Zstandard compression. The reference implementation for this PEP contains
integration with the ``zipfile``, ``tarfile``, and ``shutil`` modules.
a long standing open issue (`python/cpython#81276`_) requesting Zstandard
support in the :mod:`tarfile` module. This issue has the 5th most "thumbs up"
of open issues on the CPython tracker, and has garnered a significant amount of
discussion and interest. Additionally, the ZIP format standardizes a
`Zstandard compression format ID`_, and integration with the :mod:`zipfile`
module would allow opening ZIP archives using Zstandard compression. The
reference implementation for this PEP contains integration with the
:mod:`!zipfile`, :mod:`!tarfile`, and :mod:`shutil` modules.

Zstandard compression could also be used to make Python wheel packages smaller
and significantly faster to install. Anaconda found a sizeable speedup when
adopting Zstandard for the conda package format
adopting Zstandard for the conda package format:

.. epigraph::

Conda's download sizes are reduced ~30-40%, and extraction is dramatically faster.
[...]
We see approximately a 2.5x overall speedup, almost all thanks to the dramatically faster extraction speed of the zstd compression used in the new file format.

-- `Anaconda blog on Zstandard adoption <https://www.anaconda.com/blog/how-we-made-conda-faster-4-7>`_
-- `Anaconda blog on Zstandard adoption`_

`According to lzbench <https://github.com/inikep/lzbench?tab=readme-ov-file#benchmarks>`_,
a comprehensive benchmark of many different compression libraries and formats,
Zstandard has a significantly higher compression ratio compared to wheel's
existing zlib-based compression. While this PEP does *not* prescribe any
changes to the wheel format or other packaging standards, having Zstandard
bindings in the standard library would enable a future PEP to improve the user
experience for Python wheel packages.
existing zlib-based compression, `according to lzbench`_, a comprehensive
benchmark of many different compression libraries and formats.
While this PEP does *not* prescribe any changes to the wheel format or other
packaging standards, having Zstandard bindings in the standard library would
enable a future PEP to improve the user experience for Python wheel packages.

.. _widespread adoption: https://facebook.github.io/zstd/#references
.. _ZFS: https://en.wikipedia.org/wiki/ZFS
.. _Btrfs: https://btrfs.readthedocs.io/
.. _brotli: https://brotli.org/
.. _lzo: https://www.oberhumer.com/opensource/lzo/
.. _ucl: https://www.oberhumer.com/opensource/ucl/
.. _LZ4: https://lz4.org/
.. _python/cpython#81276: https://github.com/python/cpython/issues/81276
.. _Zstandard compression format ID: https://pkwaredownloads.blob.core.windows.net/pkware-general/Documentation/APPNOTE-6.3.8.TXT
.. _according to lzbench: https://github.com/inikep/lzbench#benchmarks
.. _Anaconda blog on Zstandard adoption: https://www.anaconda.com/blog/how-we-made-conda-faster-4-7


Rationale
=========

Introduction of a ``compression`` namespace
-------------------------------------------
Introduction of a ``compression`` package
-----------------------------------------

Both the ``zstd`` and ``zstandard`` import names are claimed by projects on
PyPI. To avoid breaking users of one of the existing bindings, this PEP
@@ -130,13 +145,17 @@ name otherwise.
Implementation based on ``pyzstd``
----------------------------------

The implementation for this PEP is based on the `pyzstd project <https://github.com/Rogdham/pyzstd>`_.
This project was chosen as the code was `originally written to be upstreamed <https://github.com/python/cpython/issues/81276#issuecomment-1093824963>`_
to CPython by Ma Lin, who also wrote the `output buffer implementation used in
the standard library today <https://github.com/python/cpython/commit/f9bedb630e8a0b7d94e1c7e609b20dfaa2b22231>`_.
The implementation for this PEP is based on the `pyzstd project`_.
This project was chosen as the code was `originally written to be upstreamed`_
to CPython by Ma Lin, who also wrote the `output buffer implementation`_ used in
the standard library today.
The project has since been taken over by Rogdham and is published to PyPI. The
APIs in ``pyzstd`` are similar to the APIs for other compression modules in the
standard library such as ``bz2`` and ``lzma``.
standard library such as :mod:`!bz2` and :mod:`!lzma`.

.. _pyzstd project: https://github.com/Rogdham/pyzstd
.. _originally written to be upstreamed: https://github.com/python/cpython/issues/81276#issuecomment-1093824963
.. _output buffer implementation: https://github.com/python/cpython/commit/f9bedb630e8a0b7d94e1c7e609b20dfaa2b22231

Minimum supported Zstandard version
-----------------------------------
@@ -149,13 +168,14 @@ compatibility with existing LTS Linux distributions, but a newer Zstandard
version could likely be chosen given that newer Python releases are generally
packaged as part of newer distribution releases.


Specification
=============

The ``compression`` namespace
-----------------------------

A new namespace package for compression modules will be introduced named
A new namespace for compression modules will be introduced named
``compression``. The top-level module for this package will be empty to begin
with, but a standard API for interacting with compression routines may be
added in the future to the toplevel.
@@ -167,17 +187,18 @@ A new module, ``compression.zstd`` will be introduced with Zstandard
compression APIs that match other compression modules in the standard library,
namely

* ``compress`` / ``decompress`` - APIs for one-shot compression/decompression
* ``ZstdFile`` / ``open`` - APIs for interacting with streams and file-like
objects
* ``ZstdCompressor`` / ``ZstdDecompressor`` - APIs for incremental compression/
decompression
* :func:`!compress` / :func:`!decompress` - APIs for one-shot compression
or decompression
* :class:`!ZstdFile` / :func:`!open` - APIs for interacting with streams
and file-like objects
* :class:`!ZstdCompressor` / :class:`!ZstdDecompressor` - APIs for incremental
compression or decompression

It will also contain some Zstandard-specific functionality
It will also contain some Zstandard-specific functionality:

* ``ZstdDict`` / ``train_dict`` / ``finalize_dict`` - APIs for interacting with
Zstandard dictionaries, which are useful for compressing many small chunks of
similar data
* :class:`!ZstdDict` / :func:`!train_dict` / :func:`!finalize_dict` - APIs for
interacting with Zstandard dictionaries, which are useful for compressing
many small chunks of similar data
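
As a rough sketch of how the one-shot and file-like APIs listed above might be used: the code below assumes ``compression.zstd`` exposes ``compress``, ``decompress``, and ``open`` with the same call shapes as ``lzma``. On interpreters where the proposed module is not available it falls back to ``lzma``, purely so the sketch stays runnable. The helper name ``roundtrip`` is hypothetical and not part of the proposal.

```python
import io

try:
    # Import name proposed by this PEP; needs a build with libzstd available.
    from compression import zstd as codec
except ImportError:
    # Stand-in with the same API shape, used only to keep the sketch runnable.
    import lzma as codec


def roundtrip(data: bytes) -> bytes:
    """Hypothetical helper: push data through both API styles and return it."""
    # One-shot API: compress() / decompress()
    blob = codec.compress(data)
    assert codec.decompress(blob) == data

    # File-like API: open() wraps an existing binary file object
    buffer = io.BytesIO()
    with codec.open(buffer, "wb") as f:
        f.write(data)
    buffer.seek(0)
    with codec.open(buffer, "rb") as f:
        return f.read()
```

Because the call shapes are meant to match the existing compression modules, code written against ``lzma`` or ``bz2`` should need little more than a changed import to try Zstandard.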

``libzstd`` optional dependency
-------------------------------
@@ -222,11 +243,12 @@ Backwards Compatibility

The main compatibility concern is usage of existing standard library
compression APIs with the existing import names. These names will be
deprecated in 3.19 and will be removed in 3.24. Given the long coexistance of
deprecated in 3.19 and will be removed in 3.24. Given the long coexistence of
the modules and a 5 year deprecation period, most users will likely migrate to
the new import names well before then. Additionally, a libCST codemod can be
provided to automatically rewrite imports, easing the migration.
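
For code that has to run both before and after the new import names exist, one possible migration pattern is a guarded import. This is only a sketch: it assumes the re-exported module is reachable as ``compression.lzma``, as the reference implementation's re-exports suggest, and that it behaves identically to the existing ``lzma`` module.

```python
try:
    # New import name under the ``compression`` package (assumed re-export).
    from compression import lzma
except ImportError:
    # Existing top-level import name on older interpreters.
    import lzma

# Either way, the module object offers the same API.
restored = lzma.decompress(lzma.compress(b"payload"))
```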


Security Implications
=====================

@@ -241,13 +263,15 @@ Taking on a new dependency also always has security risks, but the ``zstd``
library is mature, fuzzed on each commit, and `participates in Meta's bug bounty
program <https://github.com/facebook/zstd/blob/dev/SECURITY.md>`_.


How to Teach This
=================

Documentation for the new module is in the reference implementation branch. The
documentation for other modules will be updated to discuss the deprecation of
their existing import names, and how to migrate.


Reference Implementation
========================

@@ -258,6 +282,7 @@ integration added. It also contains the re-exports of other compression
modules. Deprecations for the existing import names will be added once a
decision is reached regarding the open issues.


Rejected Ideas
==============

@@ -273,6 +298,7 @@ import name ``lz4``. Instead of solving this issue for each compression format,
it is better to solve it once and for all by using the already-claimed
``compression`` namespace.


Copyright
=========
