Skip to content
1 change: 1 addition & 0 deletions .github/CODEOWNERS
Original file line number Diff line number Diff line change
Expand Up @@ -660,6 +660,7 @@ peps/pep-0777.rst @warsaw
peps/pep-0779.rst @Yhg1s @colesbury @mpage
peps/pep-0780.rst @lysnikolaou
peps/pep-0781.rst @methane
peps/pep-0782.rst @vstinner
# ...
peps/pep-0789.rst @njsmith
# ...
Expand Down
372 changes: 372 additions & 0 deletions peps/pep-0782.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,372 @@
PEP: 782
Title: Add PyBytesWriter C API
Author: Victor Stinner <vstinner@python.org>
Status: Draft
Type: Standards Track
Created: 27-Mar-2025
Python-Version: 3.14
Post-History:
`18-Feb-2025 <https://discuss.python.org/t/81182>`__


.. highlight:: c


Abstract
========

Add a new ``PyBytesWriter`` C API to create ``bytes`` objects.

Soft deprecate ``PyBytes_FromStringAndSize(NULL, size)`` and
``_PyBytes_Resize()`` APIs. These APIs treat an immutable ``bytes``
object as a mutable object. They remain available and maintained, don't
emit deprecation warning, but are no longer recommended when writing new
code.


Rationale
=========

Disallow creation of incomplete/inconsistent objects
----------------------------------------------------

Creating a Python :class:`bytes` object using
``PyBytes_FromStringAndSize(NULL, size)`` and ``_PyBytes_Resize()``
treats an immutable :class:`bytes` object as mutable. It goes against
the principle that :class:`bytes` objects are immutable. It also creates
an incomplete or "invalid" object since bytes are not initialized. In
Python, a :class:`bytes` object should always have its bytes fully
initialized.

* `Avoid creating incomplete/invalid objects api-evolution#36
<https://github.com/capi-workgroup/api-evolution/issues/36>`_
* `Disallow mutating immutable objects api-evolution#20
<https://github.com/capi-workgroup/api-evolution/issues/20>`_
* `Disallow creation of incomplete/inconsistent objects problems#56
<https://github.com/capi-workgroup/problems/issues/56>`_

Inefficient allocation strategy
-------------------------------

When creating a bytes string and the output size is unknown, one
strategy is to allocate a short buffer and extend it (to the exact size)
each time a larger write is needed.

This strategy is inefficient because it requires to enlarge the buffer
multiple timess. It's more efficient to overallocate the buffer the
first time that a larger write is needed. It reduces the number of
expensive ``realloc()`` operations which can imply a memory copy.


Specification
=============

API
---

.. c:type:: PyBytesWriter

A Python :class:`bytes` writer instance created by
:c:func:`PyBytesWriter_Create`.

The instance must be destroyed by :c:func:`PyBytesWriter_Finish` or
:c:func:`PyBytesWriter_Discard`.

Create, Finish, Discard
^^^^^^^^^^^^^^^^^^^^^^^

.. c:function:: PyBytesWriter* PyBytesWriter_Create(Py_ssize_t size)

Create a :c:type:`PyBytesWriter` to write *size* bytes.

If *size* is greater than zero, allocate *size* bytes for the
returned buffer.

On error, set an exception and return NULL.

*size* must be positive or zero.

.. c:function:: PyObject* PyBytesWriter_Finish(PyBytesWriter *writer)

Finish a :c:type:`PyBytesWriter` created by
:c:func:`PyBytesWriter_Create`.

On success, return a Python :class:`bytes` object.
On error, set an exception and return ``NULL``.

The writer instance is invalid after the call in any case.

.. c:function:: PyObject* PyBytesWriter_FinishWithSize(PyBytesWriter *writer, Py_ssize_t size)

Similar to :c:func:`PyBytesWriter_Finish`, but resize the writer
to *size* bytes before creating the :class:`bytes` object.

.. c:function:: PyObject* PyBytesWriter_FinishWithPointer(PyBytesWriter *writer, void *buf)

Similar to :c:func:`PyBytesWriter_Finish`, but resize the writer
using *buf* pointer before creating the :class:`bytes` object.

Pseudo-code::

Py_ssize_t size = (char*)buf - (char*)PyBytesWriter_GetData(writer);
return PyBytesWriter_FinishWithSize(writer, size);

Set an exception and return ``NULL`` if *buf* pointer is outside the
internal buffer bounds.

.. c:function:: void PyBytesWriter_Discard(PyBytesWriter *writer)

Discard a :c:type:`PyBytesWriter` created by :c:func:`PyBytesWriter_Create`.

Do nothing if *writer* is ``NULL``.

The writer instance is invalid after the call.

High-level API
^^^^^^^^^^^^^^

.. c:function:: int PyBytesWriter_WriteBytes(PyBytesWriter *writer, const void *bytes, Py_ssize_t size)

Write *size* bytes of *bytes* into the *writer*.

If *size* is equal to ``-1``, call ``strlen(bytes)`` to get the
string length.

On success, return ``0``.
On error, set an exception and return ``-1``.

.. c:function:: int PyBytesWriter_Format(PyBytesWriter *writer, const char *format, ...)

Similar to ``PyBytes_FromFormat()``, but write the output directly
into the writer.

On success, return ``0``.
On error, set an exception and return ``-1``.

Getters
^^^^^^^

.. c:function:: Py_ssize_t PyBytesWriter_GetSize(PyBytesWriter *writer)

Get the writer size.

.. c:function:: void* PyBytesWriter_GetData(PyBytesWriter *writer)

Get the writer data.

The pointer is valid until :c:func:`PyBytesWriter_Finish` or
:c:func:`PyBytesWriter_Discard` is called on *writer*.

Low-level API
^^^^^^^^^^^^^

.. c:function:: int PyBytesWriter_Resize(PyBytesWriter *writer, Py_ssize_t size)

Resize the writer to *size* bytes. It can be used to enlarge or to
shrink the writer.

Newly allocated bytes are left uninitialized.

On success, return ``0``.
On error, set an exception and return ``-1``.

*size* must be positive or zero.

.. c:function:: int PyBytesWriter_Grow(PyBytesWriter *writer, Py_ssize_t grow)

Resize the writer by adding *grow* bytes to the current writer size.

Newly allocated bytes are left uninitialized.

On success, return ``0``.
On error, set an exception and return ``-1``.

*size* must be positive or zero.

.. c:function:: void* PyBytesWriter_GrowAndUpdatePointer(PyBytesWriter *writer, Py_ssize_t size, void *buf)

Similar to :c:func:`PyBytesWriter_Grow`, but update also the *buf*
pointer.

On error, set an exception and return ``NULL``.

Pseudo-code::

Py_ssize_t pos = (char*)buf - (char*)PyBytesWriter_GetData(writer);
if (PyBytesWriter_Grow(writer, size) < 0) {
return NULL;
}
return (char*)PyBytesWriter_GetData(writer) + pos;


Overallocation
--------------

:c:func:`PyBytesWriter_Resize` and :c:func:`PyBytesWriter_Grow`
overallocate the internal buffer to reduce the number of ``realloc()``
calls and so reduce memory copies.


Thread safety
-------------

The API is not thread safe: a writer should only be used by a single
thread at the same time.


Soft deprecations
-----------------

Soft deprecate ``PyBytes_FromStringAndSize(NULL, size)`` and
``_PyBytes_Resize()`` APIs. These APIs treat an immutable ``bytes``
object as a mutable object. They remain available and maintained, don't
emit deprecation warning, but are no longer recommended when writing new
code.

``PyBytes_FromStringAndSize(str, size)`` is not soft deprecated. Only
calls with ``NULL`` *str* are soft deprecated.


Examples
========

High-level API
--------------

Create the bytes string ``b"Hello World!"``::

PyObject* hello_world(void)
{
PyBytesWriter *writer = PyBytesWriter_Create(0);
if (writer == NULL) {
goto error;
}
if (PyBytesWriter_WriteBytes(writer, "Hello", -1) < 0) {
goto error;
}
if (PyBytesWriter_Format(writer, " %s!", "World") < 0) {
goto error;
}
return PyBytesWriter_Finish(writer);

error:
PyBytesWriter_Discard(writer);
return NULL;
}


Create the bytes string "abc"
-----------------------------

Example creating the bytes string ``b"abc"``, with a fixed size of 3 bytes::

PyObject* create_abc(void)
{
PyBytesWriter *writer = PyBytesWriter_Create(3);
if (writer == NULL) {
return NULL;
}

char *str = PyBytesWriter_GetData(writer);
memcpy(str, "abc", 3);
return PyBytesWriter_Finish(writer);
}

GrowAndUpdatePointer() example
------------------------------

Example using a pointer to write bytes and to track the written size.

Create the bytes string ``b"Hello World"``::

PyObject* grow_example(void)
{
// Allocate 10 bytes
PyBytesWriter *writer = PyBytesWriter_Create(10);
if (writer == NULL) {
return NULL;
}

// Write some bytes
char *buf = PyBytesWriter_GetData(writer);
memcpy(buf, "Hello ", strlen("Hello "));
buf += strlen("Hello ");

// Allocate 10 more bytes
buf = PyBytesWriter_GrowAndUpdatePointer(writer, 10, buf);
if (buf == NULL) {
PyBytesWriter_Discard(writer);
return NULL;
}

// Write more bytes
memcpy(buf, "World", strlen("World"));
buf += strlen("World");

// Truncate the string at 'buf' position
// and create a bytes object
return PyBytesWriter_FinishWithPointer(writer, buf);
}


Reference Implementation
========================

`Pull request gh-131681 <https://github.com/python/cpython/pull/131681>`__.

The implementation allocates internally a :class:`bytes` object, so
:c:func:`PyBytesWriter_Finish` just returns the object without having
to copy memory.

For strings up to 256 bytes, a small internal raw buffer of bytes is
used. It avoids having to resize a :class:`bytes` object which is
inefficient. At the end, :c:func:`PyBytesWriter_Finish` creates the
:class:`bytes` object from this small buffer.

A free list is used to reduce the cost of allocating a
:c:type:`PyBytesWriter` on the heap memory.


Backwards Compatibility
=======================

There is no impact on the backward compatibility, only new APIs are
added.


Prior Discussions
=================

* March 2025: Third public API attempt, using size rather than pointers:

* `Discussion <https://discuss.python.org/t/81182/56>`_
* `Pull request gh-131681 <https://github.com/python/cpython/pull/131681>`__

* February 2025: Second public API attempt:

* `Issue gh-129813 <https://github.com/python/cpython/issues/129813>`_
and
`pull request gh-129814
<https://github.com/python/cpython/pull/129814>`_

* July 2024: First public API attempt:

* C API Working Group decision:
`Add PyBytes_Writer() API
<https://github.com/capi-workgroup/decisions/issues/39>`_
(August 2024)
* `Pull request gh-121726
<https://github.com/python/cpython/pull/121726>`_:
first public API attempt (July 2024)

* March 2016:
`Fast _PyAccu, _PyUnicodeWriter and _PyBytesWriter APIs to produce
strings in CPython <https://vstinner.github.io/pybyteswriter.html>`_:
Article on the original private ``_PyBytesWriter`` C API.


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.