Merged
95 changes: 80 additions & 15 deletions peps/pep-0784.rst
@@ -30,10 +30,11 @@ Motivation
==========

CPython has modules for several different compression formats, such as
:mod:`zlib (DEFLATE) <zlib>`, :mod:`bzip2 <bz2>`, and :mod:`lzma <lzma>`,
each widely used. Including popular compression algorithms matches Python's
"batteries included" philosophy of incorporating widely useful standards and
utilities. :mod:`!lzma` is the most recent such module, added in Python 3.3.
:mod:`zlib (DEFLATE) <zlib>`, :mod:`gzip <gzip>`, :mod:`bzip2 <bz2>`, and
:mod:`lzma <lzma>`, each widely used. Including popular compression algorithms
matches Python's "batteries included" philosophy of incorporating widely useful
standards and utilities. :mod:`!lzma` is the most recent such module, added in
Python 3.3.

Since then, Zstandard has become the modern *de facto* preferred compression
library for both high performance compression and decompression attaining high
@@ -216,9 +217,10 @@ used to build libraries CPython depends on for Windows.
Other compression modules
-------------------------

New import names ``compression.lzma``, ``compression.bz2``, and
``compression.zlib`` will be introduced in Python 3.14 re-exporting the
contents of the existing ``lzma``, ``bz2``, and ``zlib`` modules respectively.
New import names ``compression.lzma``, ``compression.bz2``,
``compression.gzip`` and ``compression.zlib`` will be introduced in Python 3.14
re-exporting the contents of the existing ``lzma``, ``bz2``, ``gzip`` and
``zlib`` modules respectively.
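
As a minimal sketch of how calling code might adopt the new import names while still running on older interpreters, the snippet below tries ``compression.gzip`` first and falls back to the top-level ``gzip`` module. The fallback pattern and the ``payload`` variable are illustrative assumptions, not something this PEP prescribes.

```python
# Prefer the new namespaced import (assumed to exist on Python 3.14+),
# falling back to the long-standing top-level module on older versions.
try:
    from compression import gzip
except ImportError:
    import gzip

# Illustrative data only; any bytes object works here.
payload = b"batteries included " * 50

# Both import paths are expected to expose the same API, since the new
# name re-exports the contents of the existing module.
compressed = gzip.compress(payload)
roundtrip = gzip.decompress(compressed)
```

Because the new names only re-export the existing modules, code written this way should behave the same regardless of which branch of the ``try`` is taken.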

The ``_compression`` module, given that it is marked private, will be
immediately renamed to ``compression._common.streams``. The new name was
@@ -289,17 +291,80 @@ decision is reached regarding the open issues.
Rejected Ideas
==============

Name the module ``libzstd`` and do not make a new ``compression`` namespace
Name the module ``zstdlib`` and do not make a new ``compression`` namespace
---------------------------------------------------------------------------

One option instead of making a new ``compression`` namespace would be to find
a different name, such as ``libzstd``, as the import name. However, the issue
of existing import names is likely to persist for future compression formats
added to the standard library. LZ4, a common high speed compression format,
has `a package on PyPI <https://pypi.org/project/lz4/>`_, ``lz4``, with the
import name ``lz4``. Instead of solving this issue for each compression format,
it is better to solve it once and for all by using the already-claimed
``compression`` namespace.
a different name, such as ``zstdlib``, as the import name. Several other names,
such as ``zst``, ``libzstd``, and ``zstdcomp``, were proposed as well. In
discussion, these names were found to be either too easy to mistype or unintuitive.
Furthermore, the issue of existing import names is likely to persist for future
compression formats added to the standard library. LZ4, a common high speed
compression format, has `a package on PyPI <https://pypi.org/project/lz4/>`_,
``lz4``, with the import name ``lz4``. Instead of solving this issue for each
compression format, it is better to solve it once and for all by using the
already-claimed ``compression`` namespace.

Introduce an experimental ``_zstd`` package in Python 3.14
----------------------------------------------------------

Since this PEP was published close to the beta cutoff for new features for
Python 3.14, one proposal was to add the package as a private module,
``_zstd``, so that packaging tools could use it sooner without committing to a
final name. This would allow more time for discussion of the final module name
during the 3.15 development window. However, introducing a private module was
not popular, because the expectations and contract for external usage of a
private module in the standard library are unclear.

Introduce a standard library namespace instead of ``compression``
-----------------------------------------------------------------

One alternative to a ``compression`` namespace would be to introduce a
``std`` namespace for the entire standard library. However, this was seen as
too significant a change for 3.14, with no agreed-upon semantics, migration
path, or name for the package. Furthermore, a future PEP introducing a ``std``
namespace could always define that the ``compression`` sub-modules be flattened
into the ``std`` namespace.

Include ``zipfile`` and ``tarfile`` in ``compression``
------------------------------------------------------

Compression is often used with archiving tools, so putting both :mod:`zipfile`
and :mod:`tarfile` under the ``compression`` namespace is appealing. However,
compression can be used beyond just archiving tools. For example, network
requests can be gzip compressed. Furthermore, formats like tar do not include
compression themselves, instead relying on external compression. Therefore,
this PEP does not propose moving :mod:`!zipfile` or :mod:`!tarfile` under
``compression``.
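
The separation between archiving and compression described above can be sketched as follows: an uncompressed tar archive is built in memory and then compressed with ``gzip`` as an independent step. The helper ``make_tar`` and the file name ``hello.txt`` are hypothetical, chosen only for illustration.

```python
import gzip
import io
import tarfile


def make_tar(files: dict[str, bytes]) -> bytes:
    """Build an uncompressed tar archive in memory (hypothetical helper)."""
    buf = io.BytesIO()
    # mode="w" writes a plain tar stream with no compression applied.
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


original = b"hello world\n" * 100
plain_tar = make_tar({"hello.txt": original})

# Compression is a separate layer applied to the archive bytes.
compressed_tar = gzip.compress(plain_tar)

# Reversing the layers: decompress first, then read the plain archive.
# mode="r:" asks tarfile to treat the stream as uncompressed.
with tarfile.open(fileobj=io.BytesIO(gzip.decompress(compressed_tar)), mode="r:") as tar:
    restored = tar.extractfile("hello.txt").read()
```

The same ``gzip.compress`` call could be applied to any other byte payload, such as an HTTP response body, which is why compression is treated here as distinct from archiving.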

Do not include ``gzip`` under ``compression``
---------------------------------------------

The :rfc:`GZip format RFC <1952>` defines a format which can include multiple
blocks and metadata about its contents. In this way GZip is rather similar to
archive formats like ZIP and tar. Despite that, in usage GZip is often treated
as a compression format rather than an archive format. Looking at how different
languages classify GZip, the prevailing trend is to classify it as a
compression format and not an archiving format.

========== ======================== ==============================================================================
Language   Compression or Archive   Documentation Link
========== ======================== ==============================================================================
Golang     Compression              https://pkg.go.dev/compress/gzip
Ruby       Compression              https://docs.ruby-lang.org/en/master/Zlib/GzipFile.html
Rust       Compression              https://github.com/rust-lang/flate2-rs
Haskell    Compression              https://hackage.haskell.org/package/zlib
C#         Compression              https://learn.microsoft.com/en-us/dotnet/api/system.io.compression.gzipstream
Java       Archive                  https://docs.oracle.com/javase/8/docs/api/java/util/zip/package-summary.html
NodeJS     Compression              https://nodejs.org/api/zlib.html
Web APIs   Compression              https://developer.mozilla.org/en-US/docs/Web/API/Compression_Streams_API
PHP        Compression              https://www.php.net/manual/en/function.gzcompress.php
Perl       Compression              https://perldoc.perl.org/IO::Compress::Gzip
========== ======================== ==============================================================================

In addition, the :mod:`!gzip` module in Python mostly focuses on single block
content and has an API similar to other compression modules, making it a good
fit for the ``compression`` namespace.
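
To make the API similarity concrete, here is a small sketch using the existing top-level modules. It checks that ``gzip``, ``bz2``, ``lzma``, and ``zlib`` all offer the same one-shot ``compress``/``decompress`` pair, and that a gzip stream made of several concatenated members decompresses to the concatenated contents. The variable names are illustrative assumptions.

```python
import bz2
import gzip
import lzma
import zlib

data = b"compression namespace example " * 20

# Each module exposes the same one-shot compress()/decompress() functions,
# so they can be driven through a common loop.
results = {}
for module in (gzip, bz2, lzma, zlib):
    results[module.__name__] = module.decompress(module.compress(data))

# A gzip stream may contain multiple members; gzip.decompress() returns
# the contents of all members joined together.
multi_member = gzip.compress(b"first ") + gzip.compress(b"second")
joined = gzip.decompress(multi_member)
```

In everyday use the multi-member case is uncommon, which is consistent with treating ``gzip`` like the other single-stream compression modules.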


Copyright