PEP 782: Add PyBytesWriter C API #4325
Merged
Commits (16), all by vstinner:
- d0bfc2e PEP 782: Add PyBytesWriter C API
- 1a25a38 Add myself to CODEOWNERS
- 79bdc30 Apply suggestions from code review
- 496a1fa Remove PyPI study
- 1adf08c Merge branch 'main' into pep782
- 54ebc67 Update
- a4aa0f2 Update peps/pep-0782.rst
- 7380b81 Update
- f536c9e Update
- c26246e Implementation
- dee2be5 Update
- d6a63ae Soft deprecations
- 8a2fcb6 Apply suggestions from code review
- 07e69a0 Address Benedikt's review
- 79403f2 Update
- 141f531 Update
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,372 @@ | ||
| PEP: 782 | ||
| Title: Add PyBytesWriter C API | ||
| Author: Victor Stinner <vstinner@python.org> | ||
| Status: Draft | ||
| Type: Standards Track | ||
| Created: 27-Mar-2025 | ||
| Python-Version: 3.14 | ||
| Post-History: | ||
| `18-Feb-2025 <https://discuss.python.org/t/81182>`__ | ||
|
|
||
|
|
||
| .. highlight:: c | ||
|
|
||
|
|
||
| Abstract | ||
| ======== | ||
|
|
||
| Add a new ``PyBytesWriter`` C API to create ``bytes`` objects. | ||
|
|
||
| Soft deprecate ``PyBytes_FromStringAndSize(NULL, size)`` and | ||
| ``_PyBytes_Resize()`` APIs. These APIs treat an immutable ``bytes`` | ||
| object as a mutable object. They remain available and maintained and do | ||
| not emit deprecation warnings, but they are no longer recommended when | ||
| writing new code. | ||
|
|
||
|
|
||
| Rationale | ||
| ========= | ||
|
|
||
| Disallow creation of incomplete/inconsistent objects | ||
| ---------------------------------------------------- | ||
|
|
||
| Creating a Python :class:`bytes` object using | ||
| ``PyBytes_FromStringAndSize(NULL, size)`` and ``_PyBytes_Resize()`` | ||
| treats an immutable :class:`bytes` object as mutable. It goes against | ||
| the principle that :class:`bytes` objects are immutable. It also creates | ||
| an incomplete or "invalid" object since bytes are not initialized. In | ||
| Python, a :class:`bytes` object should always have its bytes fully | ||
| initialized. | ||
|
|
||
| * `Avoid creating incomplete/invalid objects api-evolution#36 | ||
| <https://github.com/capi-workgroup/api-evolution/issues/36>`_ | ||
| * `Disallow mutating immutable objects api-evolution#20 | ||
| <https://github.com/capi-workgroup/api-evolution/issues/20>`_ | ||
| * `Disallow creation of incomplete/inconsistent objects problems#56 | ||
| <https://github.com/capi-workgroup/problems/issues/56>`_ | ||
|
|
||
| Inefficient allocation strategy | ||
| ------------------------------- | ||
|
|
||
| When creating a bytes string and the output size is unknown, one | ||
| strategy is to allocate a short buffer and extend it (to the exact size) | ||
| each time a larger write is needed. | ||
|
|
||
| This strategy is inefficient because it requires enlarging the buffer | ||
| multiple times. It's more efficient to overallocate the buffer the | ||
| first time that a larger write is needed. It reduces the number of | ||
| expensive ``realloc()`` operations, which can imply a memory copy. | ||
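
The difference between the two strategies can be sketched in plain C, without the Python headers. This is only a toy model: ``toy_buffer``, ``toy_append`` and ``toy_count_reallocs`` are hypothetical names, and the doubling factor is an arbitrary choice for the sketch, not the factor used by any CPython implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy buffer for illustration; not the PyBytesWriter code. */
typedef struct {
    char *data;
    size_t size;       /* bytes written so far */
    size_t allocated;  /* bytes reserved by the last realloc() */
    int reallocs;      /* number of realloc() calls performed */
} toy_buffer;

/* Append n bytes. If overallocate is non-zero, reserve twice the needed
   size whenever the buffer must grow (arbitrary factor for this sketch). */
static int toy_append(toy_buffer *b, const void *src, size_t n,
                      int overallocate)
{
    assert(b != NULL && src != NULL);
    size_t needed = b->size + n;
    if (needed > b->allocated) {
        size_t new_alloc = overallocate ? needed * 2 : needed;
        char *p = realloc(b->data, new_alloc);
        if (p == NULL) {
            return -1;
        }
        b->data = p;
        b->allocated = new_alloc;
        b->reallocs++;
    }
    memcpy(b->data + b->size, src, n);
    b->size = needed;
    return 0;
}

/* Append `chunks` chunks of `chunk_size` bytes (1 to 64) and return the
   number of realloc() calls, or -1 on memory error. */
static int toy_count_reallocs(int chunks, size_t chunk_size, int overallocate)
{
    static const char zeros[64] = {0};
    toy_buffer b = {NULL, 0, 0, 0};
    assert(chunk_size > 0 && chunk_size <= sizeof(zeros));
    for (int i = 0; i < chunks; i++) {
        if (toy_append(&b, zeros, chunk_size, overallocate) < 0) {
            free(b.data);
            return -1;
        }
    }
    int count = b.reallocs;
    free(b.data);
    return count;
}
```

With 100 chunks of 8 bytes, the exact-size variant calls ``realloc()`` once per chunk, while the doubling variant needs only a handful of calls.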
|
|
||
|
|
||
| Specification | ||
| ============= | ||
|
|
||
| API | ||
| --- | ||
|
|
||
| .. c:type:: PyBytesWriter | ||
|
|
||
| A Python :class:`bytes` writer instance created by | ||
| :c:func:`PyBytesWriter_Create`. | ||
|
|
||
| The instance must be destroyed by :c:func:`PyBytesWriter_Finish` or | ||
| :c:func:`PyBytesWriter_Discard`. | ||
|
|
||
| Create, Finish, Discard | ||
| ^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| .. c:function:: PyBytesWriter* PyBytesWriter_Create(Py_ssize_t size) | ||
|
|
||
| Create a :c:type:`PyBytesWriter` to write *size* bytes. | ||
|
|
||
| If *size* is greater than zero, allocate *size* bytes for the | ||
| returned buffer. | ||
|
|
||
| On error, set an exception and return ``NULL``. | ||
|
|
||
| *size* must be positive or zero. | ||
|
|
||
| .. c:function:: PyObject* PyBytesWriter_Finish(PyBytesWriter *writer) | ||
|
|
||
| Finish a :c:type:`PyBytesWriter` created by | ||
| :c:func:`PyBytesWriter_Create`. | ||
|
|
||
| On success, return a Python :class:`bytes` object. | ||
| On error, set an exception and return ``NULL``. | ||
|
|
||
| The writer instance is invalid after the call in any case. | ||
|
|
||
| .. c:function:: PyObject* PyBytesWriter_FinishWithSize(PyBytesWriter *writer, Py_ssize_t size) | ||
|
|
||
| Similar to :c:func:`PyBytesWriter_Finish`, but resize the writer | ||
| to *size* bytes before creating the :class:`bytes` object. | ||
|
|
||
| .. c:function:: PyObject* PyBytesWriter_FinishWithPointer(PyBytesWriter *writer, void *buf) | ||
|
|
||
| Similar to :c:func:`PyBytesWriter_Finish`, but resize the writer | ||
| using *buf* pointer before creating the :class:`bytes` object. | ||
|
|
||
| Pseudo-code:: | ||
|
|
||
| Py_ssize_t size = (char*)buf - (char*)PyBytesWriter_GetData(writer); | ||
| return PyBytesWriter_FinishWithSize(writer, size); | ||
|
|
||
| Set an exception and return ``NULL`` if *buf* pointer is outside the | ||
| internal buffer bounds. | ||
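
The pointer-to-size conversion and its bounds check can be modelled in plain C. This is a sketch under assumptions: ``toy_size_from_pointer`` is a hypothetical helper, and it assumes that a pointer one past the last byte is accepted, which the text above does not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a write pointer into a size, rejecting
   pointers outside [data, data + allocated]. Integer comparison is used
   so that out-of-range pointers can be compared safely. */
static int toy_size_from_pointer(const char *data, size_t allocated,
                                 const void *buf, size_t *size)
{
    assert(data != NULL && size != NULL);
    uintptr_t start = (uintptr_t)data;
    uintptr_t pos = (uintptr_t)buf;
    if (pos < start || pos - start > allocated) {
        return -1;  /* the real API would also set an exception */
    }
    *size = (size_t)(pos - start);
    return 0;
}
```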
|
|
||
| .. c:function:: void PyBytesWriter_Discard(PyBytesWriter *writer) | ||
|
|
||
| Discard a :c:type:`PyBytesWriter` created by :c:func:`PyBytesWriter_Create`. | ||
|
|
||
| Do nothing if *writer* is ``NULL``. | ||
|
|
||
| The writer instance is invalid after the call. | ||
|
|
||
| High-level API | ||
| ^^^^^^^^^^^^^^ | ||
|
|
||
| .. c:function:: int PyBytesWriter_WriteBytes(PyBytesWriter *writer, const void *bytes, Py_ssize_t size) | ||
|
|
||
| Write *size* bytes of *bytes* into the *writer*. | ||
|
|
||
| If *size* is equal to ``-1``, call ``strlen(bytes)`` to get the | ||
| string length. | ||
|
|
||
| On success, return ``0``. | ||
| On error, set an exception and return ``-1``. | ||
|
|
||
| .. c:function:: int PyBytesWriter_Format(PyBytesWriter *writer, const char *format, ...) | ||
|
|
||
| Similar to ``PyBytes_FromFormat()``, but write the output directly | ||
| into the writer. | ||
|
|
||
| On success, return ``0``. | ||
| On error, set an exception and return ``-1``. | ||
|
|
||
| Getters | ||
| ^^^^^^^ | ||
|
|
||
| .. c:function:: Py_ssize_t PyBytesWriter_GetSize(PyBytesWriter *writer) | ||
|
|
||
| Get the writer size. | ||
|
|
||
| .. c:function:: void* PyBytesWriter_GetData(PyBytesWriter *writer) | ||
|
|
||
| Get the writer data. | ||
|
|
||
| The pointer is valid until :c:func:`PyBytesWriter_Finish` or | ||
| :c:func:`PyBytesWriter_Discard` is called on *writer*. | ||
|
|
||
| Low-level API | ||
| ^^^^^^^^^^^^^ | ||
|
|
||
| .. c:function:: int PyBytesWriter_Resize(PyBytesWriter *writer, Py_ssize_t size) | ||
|
|
||
| Resize the writer to *size* bytes. It can be used to enlarge or to | ||
| shrink the writer. | ||
|
|
||
| Newly allocated bytes are left uninitialized. | ||
|
|
||
| On success, return ``0``. | ||
| On error, set an exception and return ``-1``. | ||
|
|
||
| *size* must be positive or zero. | ||
|
|
||
| .. c:function:: int PyBytesWriter_Grow(PyBytesWriter *writer, Py_ssize_t grow) | ||
|
|
||
| Resize the writer by adding *grow* bytes to the current writer size. | ||
|
|
||
| Newly allocated bytes are left uninitialized. | ||
|
|
||
| On success, return ``0``. | ||
| On error, set an exception and return ``-1``. | ||
|
|
||
| *size* must be positive or zero. | ||
|
|
||
| .. c:function:: void* PyBytesWriter_GrowAndUpdatePointer(PyBytesWriter *writer, Py_ssize_t size, void *buf) | ||
|
|
||
| Similar to :c:func:`PyBytesWriter_Grow`, but also update the *buf* | ||
| pointer. | ||
|
|
||
| On error, set an exception and return ``NULL``. | ||
|
|
||
| Pseudo-code:: | ||
|
|
||
| Py_ssize_t pos = (char*)buf - (char*)PyBytesWriter_GetData(writer); | ||
| if (PyBytesWriter_Grow(writer, size) < 0) { | ||
| return NULL; | ||
| } | ||
| return (char*)PyBytesWriter_GetData(writer) + pos; | ||
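
Why the pointer has to be recomputed can be shown with a toy model in plain C: growing the buffer may move it, so the write position is saved as an offset and reapplied to the new base address. ``toy_writer``, ``toy_grow_and_update_pointer`` and ``toy_demo`` are hypothetical names used only for this sketch; they model the bookkeeping in the pseudo-code above, not the actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy writer; only models the pointer bookkeeping. */
typedef struct {
    char *data;
    size_t size;
} toy_writer;

static void *toy_grow_and_update_pointer(toy_writer *w, size_t grow, void *buf)
{
    assert(w != NULL && buf != NULL);
    /* Save the write position as an offset before the buffer may move. */
    size_t pos = (size_t)((char *)buf - w->data);
    char *p = realloc(w->data, w->size + grow);
    if (p == NULL) {
        return NULL;  /* w->data is still valid and owned by the caller */
    }
    w->data = p;
    w->size += grow;
    /* Reapply the offset to the (possibly new) base address. */
    return w->data + pos;
}

/* Build "Hello World" in two steps; return 1 if the result is as expected. */
static int toy_demo(void)
{
    toy_writer w = {malloc(6), 6};
    if (w.data == NULL) {
        return 0;
    }
    char *buf = w.data;
    memcpy(buf, "Hello ", 6);
    buf += 6;

    buf = toy_grow_and_update_pointer(&w, 5, buf);
    if (buf == NULL) {
        free(w.data);
        return 0;
    }
    memcpy(buf, "World", 5);
    buf += 5;

    int ok = (buf - w.data == 11) && memcmp(w.data, "Hello World", 11) == 0;
    free(w.data);
    return ok;
}
```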
|
|
||
|
|
||
| Overallocation | ||
| -------------- | ||
|
|
||
| :c:func:`PyBytesWriter_Resize` and :c:func:`PyBytesWriter_Grow` | ||
| overallocate the internal buffer to reduce the number of ``realloc()`` | ||
| calls and so reduce memory copies. | ||
|
|
||
|
|
||
| Thread safety | ||
| ------------- | ||
|
|
||
| The API is not thread safe: a writer should only be used by a single | ||
| thread at a time. | ||
|
|
||
|
|
||
| Soft deprecations | ||
| ----------------- | ||
|
|
||
| Soft deprecate ``PyBytes_FromStringAndSize(NULL, size)`` and | ||
| ``_PyBytes_Resize()`` APIs. These APIs treat an immutable ``bytes`` | ||
| object as a mutable object. They remain available and maintained and do | ||
| not emit deprecation warnings, but they are no longer recommended when | ||
| writing new code. | ||
|
|
||
| ``PyBytes_FromStringAndSize(str, size)`` is not soft deprecated. Only | ||
| calls with ``NULL`` *str* are soft deprecated. | ||
|
|
||
|
|
||
| Examples | ||
| ======== | ||
|
|
||
| High-level API | ||
| -------------- | ||
|
|
||
| Create the bytes string ``b"Hello World!"``:: | ||
|
|
||
| PyObject* hello_world(void) | ||
| { | ||
| PyBytesWriter *writer = PyBytesWriter_Create(0); | ||
| if (writer == NULL) { | ||
| goto error; | ||
| } | ||
| if (PyBytesWriter_WriteBytes(writer, "Hello", -1) < 0) { | ||
| goto error; | ||
| } | ||
| if (PyBytesWriter_Format(writer, " %s!", "World") < 0) { | ||
| goto error; | ||
| } | ||
| return PyBytesWriter_Finish(writer); | ||
|
|
||
| error: | ||
| PyBytesWriter_Discard(writer); | ||
| return NULL; | ||
| } | ||
|
|
||
|
|
||
| Create the bytes string "abc" | ||
| ----------------------------- | ||
|
|
||
| Example creating the bytes string ``b"abc"``, with a fixed size of 3 bytes:: | ||
|
|
||
| PyObject* create_abc(void) | ||
| { | ||
| PyBytesWriter *writer = PyBytesWriter_Create(3); | ||
| if (writer == NULL) { | ||
| return NULL; | ||
| } | ||
|
|
||
| char *str = PyBytesWriter_GetData(writer); | ||
| memcpy(str, "abc", 3); | ||
| return PyBytesWriter_Finish(writer); | ||
| } | ||
|
|
||
| GrowAndUpdatePointer() example | ||
| ------------------------------ | ||
|
|
||
| Example using a pointer to write bytes and to track the written size. | ||
|
|
||
| Create the bytes string ``b"Hello World"``:: | ||
|
|
||
| PyObject* grow_example(void) | ||
| { | ||
| // Allocate 10 bytes | ||
| PyBytesWriter *writer = PyBytesWriter_Create(10); | ||
| if (writer == NULL) { | ||
| return NULL; | ||
| } | ||
|
|
||
| // Write some bytes | ||
| char *buf = PyBytesWriter_GetData(writer); | ||
| memcpy(buf, "Hello ", strlen("Hello ")); | ||
| buf += strlen("Hello "); | ||
|
|
||
| // Allocate 10 more bytes | ||
| buf = PyBytesWriter_GrowAndUpdatePointer(writer, 10, buf); | ||
| if (buf == NULL) { | ||
| PyBytesWriter_Discard(writer); | ||
| return NULL; | ||
| } | ||
|
|
||
| // Write more bytes | ||
| memcpy(buf, "World", strlen("World")); | ||
| buf += strlen("World"); | ||
|
|
||
| // Truncate the string at 'buf' position | ||
| // and create a bytes object | ||
| return PyBytesWriter_FinishWithPointer(writer, buf); | ||
| } | ||
|
|
||
|
|
||
| Reference Implementation | ||
| ======================== | ||
|
|
||
| `Pull request gh-131681 <https://github.com/python/cpython/pull/131681>`__. | ||
|
|
||
| The implementation internally allocates a :class:`bytes` object, so | ||
| :c:func:`PyBytesWriter_Finish` just returns the object without having | ||
| to copy memory. | ||
|
|
||
| For strings up to 256 bytes, a small internal raw buffer of bytes is | ||
| used. It avoids having to resize a :class:`bytes` object, which is | ||
| inefficient. At the end, :c:func:`PyBytesWriter_Finish` creates the | ||
| :class:`bytes` object from this small buffer. | ||
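
The small-buffer idea can be illustrated with a toy model in plain C. This is a sketch under assumptions: all ``toy_small_*`` names are hypothetical, the heap block stands in for the :class:`bytes` object, and only the 256-byte threshold comes from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_SMALL_SIZE 256  /* threshold taken from the text above */

/* Hypothetical toy writer: inline storage first, heap storage later. */
typedef struct {
    char small[TOY_SMALL_SIZE];
    char *heap;   /* NULL while the inline buffer is in use */
    size_t size;
} toy_small_writer;

static void toy_small_init(toy_small_writer *w)
{
    w->heap = NULL;
    w->size = 0;
}

static char *toy_small_data(toy_small_writer *w)
{
    return w->heap ? w->heap : w->small;
}

static int toy_small_resize(toy_small_writer *w, size_t size)
{
    assert(w != NULL);
    if (w->heap == NULL && size <= TOY_SMALL_SIZE) {
        w->size = size;  /* still fits inline: no allocation at all */
        return 0;
    }
    char *p;
    if (w->heap == NULL) {
        /* First switch to the heap: copy what was written inline. */
        p = malloc(size);
        if (p != NULL) {
            memcpy(p, w->small, w->size);
        }
    }
    else {
        p = realloc(w->heap, size ? size : 1);
    }
    if (p == NULL) {
        return -1;
    }
    w->heap = p;
    w->size = size;
    return 0;
}

static void toy_small_free(toy_small_writer *w)
{
    free(w->heap);
    w->heap = NULL;
    w->size = 0;
}
```

In this model, short outputs never touch the allocator; only an output larger than the inline buffer pays for a heap allocation and one copy.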
|
|
||
| A free list is used to reduce the cost of allocating a | ||
| :c:type:`PyBytesWriter` on the heap memory. | ||
|
|
||
|
|
||
| Backwards Compatibility | ||
| ======================= | ||
|
|
||
| There is no impact on backward compatibility: only new APIs are | ||
| added. | ||
|
|
||
|
|
||
| Prior Discussions | ||
| ================= | ||
|
|
||
| * March 2025: Third public API attempt, using size rather than pointers: | ||
|
|
||
| * `Discussion <https://discuss.python.org/t/81182/56>`_ | ||
| * `Pull request gh-131681 <https://github.com/python/cpython/pull/131681>`__ | ||
|
|
||
| * February 2025: Second public API attempt: | ||
|
|
||
| * `Issue gh-129813 <https://github.com/python/cpython/issues/129813>`_ | ||
| and | ||
| `pull request gh-129814 | ||
| <https://github.com/python/cpython/pull/129814>`_ | ||
|
|
||
| * July 2024: First public API attempt: | ||
|
|
||
| * C API Working Group decision: | ||
| `Add PyBytes_Writer() API | ||
| <https://github.com/capi-workgroup/decisions/issues/39>`_ | ||
| (August 2024) | ||
| * `Pull request gh-121726 | ||
| <https://github.com/python/cpython/pull/121726>`_: | ||
| first public API attempt (July 2024) | ||
|
|
||
| * March 2016: | ||
| `Fast _PyAccu, _PyUnicodeWriter and _PyBytesWriter APIs to produce | ||
| strings in CPython <https://vstinner.github.io/pybyteswriter.html>`_: | ||
| Article on the original private ``_PyBytesWriter`` C API. | ||
|
|
||
|
|
||
| Copyright | ||
| ========= | ||
|
|
||
| This document is placed in the public domain or under the | ||
| CC0-1.0-Universal license, whichever is more permissive. | ||