From 6a6e24b42d91ae4e9b42d770472542b742d3d855 Mon Sep 17 00:00:00 2001
From: Victor Stinner
Date: Mon, 31 Mar 2025 18:34:27 +0200
Subject: [PATCH 1/4] PEP 782: Address Steve's review

---
 peps/pep-0782.rst | 57 ++++++++++++++++++++++++++++++-----------------
 1 file changed, 37 insertions(+), 20 deletions(-)

diff --git a/peps/pep-0782.rst b/peps/pep-0782.rst
index b1310c08d0f..3d48af52b9c 100644
--- a/peps/pep-0782.rst
+++ b/peps/pep-0782.rst
@@ -81,7 +81,7 @@ Create, Finish, Discard
    Create a :c:type:`PyBytesWriter` to write *size* bytes.
 
    If *size* is greater than zero, allocate *size* bytes for the
-   returned buffer.
+   returned buffer, and set the writer size to *size*.
 
    On error, set an exception and return NULL.
 
@@ -107,14 +107,14 @@ Create, Finish, Discard
    Similar to :c:func:`PyBytesWriter_Finish`, but resize the writer
    using *buf* pointer before creating the :class:`bytes` object.
 
-   Pseudo-code::
+   Set an exception and return ``NULL`` if *buf* pointer is outside the
+   internal buffer bounds.
+
+   Function pseudo-code::
 
        Py_ssize_t size = (char*)buf - (char*)PyBytesWriter_GetData(writer);
        return PyBytesWriter_FinishWithSize(writer, size);
 
-   Set an exception and return ``NULL`` if *buf* pointer is outside the
-   internal buffer bounds.
-
 .. c:function:: void PyBytesWriter_Discard(PyBytesWriter *writer)
 
    Discard a :c:type:`PyBytesWriter` created by :c:func:`PyBytesWriter_Create`.
@@ -128,7 +128,8 @@ High-level API
 
 .. c:function:: int PyBytesWriter_WriteBytes(PyBytesWriter *writer, const void *bytes, Py_ssize_t size)
 
-   Write *size* bytes of *bytes* into the *writer*.
+   Write *size* bytes of *bytes* at the *writer* end,
+   and add *size* to the writer size.
 
    If *size* is equal to ``-1``, call ``strlen(bytes)`` to get the
    string length.
@@ -138,8 +139,8 @@ High-level API
 
 .. c:function:: int PyBytesWriter_Format(PyBytesWriter *writer, const char *format, ...)
 
-   Similar to ``PyBytes_FromFormat()``, but write the output directly
-   into the writer.
+   Similar to ``PyBytes_FromFormat()``, but write the output directly at
+   the writer end. Then add the written size to the writer size.
 
    On success, return ``0``.
    On error, set an exception and return ``-1``.
@@ -153,7 +154,7 @@ Getters
 
 .. c:function:: void* PyBytesWriter_GetData(PyBytesWriter *writer)
 
-   Get the writer data.
+   Get the writer data: start of the internal buffer.
 
    The pointer is valid until :c:func:`PyBytesWriter_Finish` or
    :c:func:`PyBytesWriter_Discard` is called on *writer*.
@@ -182,16 +183,21 @@ Low-level API
    On success, return ``0``.
    On error, set an exception and return ``-1``.
 
-   *size* must be positive or zero.
+   *size* can be negative to shrink the writer.
 
 .. c:function:: void* PyBytesWriter_GrowAndUpdatePointer(PyBytesWriter *writer, Py_ssize_t size, void *buf)
 
    Similar to :c:func:`PyBytesWriter_Grow`, but update also the *buf*
    pointer.
 
+   The *buf* pointer is moved if the internal buffer is moved in memory.
+   The *buf* position inside the internal buffer is left unchanged.
+
    On error, set an exception and return ``NULL``.
 
-   Pseudo-code::
+   *buf* must not be ``NULL``.
+
+   Function pseudo-code::
 
        Py_ssize_t pos = (char*)buf - (char*)PyBytesWriter_GetData(writer);
        if (PyBytesWriter_Grow(writer, size) < 0) {
@@ -207,6 +213,10 @@ Overallocation
 overallocate the internal buffer to reduce the number of ``realloc()``
 calls and so reduce memory copies.
 
+:c:func:`PyBytesWriter_Finish` trims overallocations: it shrinks the
+internal buffer to the exact size when creating the final :class:`bytes`
+object.
+
 
 Thread safety
 -------------
@@ -315,17 +325,20 @@ Reference Implementation
 
 `Pull request gh-131681 `__.
 
-The implementation allocates internally a :class:`bytes` object, so
-:c:func:`PyBytesWriter_Finish` just returns the object without having
-to copy memory.
+Notes on the CPython reference implementation which are not part of the
+Specification:
+
+* The implementation allocates internally a :class:`bytes` object, so
+  :c:func:`PyBytesWriter_Finish` just returns the object without having
+  to copy memory.
 
-For strings up to 256 bytes, a small internal raw buffer of bytes is
-used. It avoids having to resize a :class:`bytes` object which is
-inefficient. At the end, :c:func:`PyBytesWriter_Finish` creates the
-:class:`bytes` object from this small buffer.
+* For strings up to 256 bytes, a small internal raw buffer of bytes is
+  used. It avoids having to resize a :class:`bytes` object which is
+  inefficient. At the end, :c:func:`PyBytesWriter_Finish` creates the
+  :class:`bytes` object from this small buffer.
 
-A free list is used to reduce the cost of allocating a
-:c:type:`PyBytesWriter` on the heap memory.
+* A free list is used to reduce the cost of allocating a
+  :c:type:`PyBytesWriter` on the heap memory.
 
 
 Backwards Compatibility
@@ -334,6 +347,10 @@ Backwards Compatibility
 There is no impact on the backward compatibility, only new APIs are
 added.
 
+``PyBytes_FromStringAndSize(NULL, size)`` and ``_PyBytes_Resize()`` APIs
+are soft deprecated. No new warning is emitted when these functions are
+used and they are not planned for removal.
+
 
 Prior Discussions
 =================

From 4397c537f64e4e9c9ab031c16e69ee30ec827880 Mon Sep 17 00:00:00 2001
From: Victor Stinner
Date: Mon, 31 Mar 2025 18:53:06 +0200
Subject: [PATCH 2/4] Add PyBytes_FromStringAndSize() example

---
 peps/pep-0782.rst | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/peps/pep-0782.rst b/peps/pep-0782.rst
index 3d48af52b9c..65582cad13c 100644
--- a/peps/pep-0782.rst
+++ b/peps/pep-0782.rst
@@ -320,6 +320,34 @@ Create the bytes string ``b"Hello World"``::
     }
 
 
+Update ``PyBytes_FromStringAndSize()`` code
+-------------------------------------------
+
+Example of code using the soft deprecated
+``PyBytes_FromStringAndSize(NULL, size)`` API::
+
+  PyObject *result = PyBytes_FromStringAndSize(NULL, num_bytes);
+  if (result == NULL) {
+      return NULL;
+  }
+  if (safe_memcpy(PyBytes_AS_STRING(result), start, num_bytes) < 0) {
+      Py_CLEAR(result);
+  }
+  return result;
+
+It can now be updated to::
+
+  PyBytesWriter *writer = PyBytesWriter_Create(num_bytes);
+  if (writer == NULL) {
+      return NULL;
+  }
+  if (safe_memcpy(PyBytesWriter_GetData(writer), start, num_bytes) < 0) {
+      PyBytesWriter_Discard(writer);
+      return NULL;
+  }
+  return PyBytesWriter_Finish(writer);
+
+
 Reference Implementation
 ========================
 

From 014602544974bbc53ac1d227e2ea5eb1c33763bb Mon Sep 17 00:00:00 2001
From: Victor Stinner
Date: Mon, 31 Mar 2025 19:14:46 +0200
Subject: [PATCH 3/4] Add _PyBytes_Resize() example

---
 peps/pep-0782.rst | 69 ++++++++++++++++++++++++++++++++++-------------
 1 file changed, 50 insertions(+), 19 deletions(-)

diff --git a/peps/pep-0782.rst b/peps/pep-0782.rst
index 65582cad13c..c88dbdbc741 100644
--- a/peps/pep-0782.rst
+++ b/peps/pep-0782.rst
@@ -283,8 +283,8 @@ Example creating the bytes string ``b"abc"``, with a fixed size of 3 bytes::
         return PyBytesWriter_Finish(writer);
     }
 
-GrowAndUpdatePointer() example
-------------------------------
+``GrowAndUpdatePointer()`` example
+----------------------------------
 
 Example using a pointer to write bytes and to track the written size.
 
@@ -326,26 +326,57 @@ Update ``PyBytes_FromStringAndSize()`` code
 Example of code using the soft deprecated
 ``PyBytes_FromStringAndSize(NULL, size)`` API::
 
-  PyObject *result = PyBytes_FromStringAndSize(NULL, num_bytes);
-  if (result == NULL) {
-      return NULL;
-  }
-  if (safe_memcpy(PyBytes_AS_STRING(result), start, num_bytes) < 0) {
-      Py_CLEAR(result);
-  }
-  return result;
+    PyObject *result = PyBytes_FromStringAndSize(NULL, num_bytes);
+    if (result == NULL) {
+        return NULL;
+    }
+    if (safe_memcpy(PyBytes_AS_STRING(result), start, num_bytes) < 0) {
+        Py_CLEAR(result);
+    }
+    return result;
 
 It can now be updated to::
 
-  PyBytesWriter *writer = PyBytesWriter_Create(num_bytes);
-  if (writer == NULL) {
-      return NULL;
-  }
-  if (safe_memcpy(PyBytesWriter_GetData(writer), start, num_bytes) < 0) {
-      PyBytesWriter_Discard(writer);
-      return NULL;
-  }
-  return PyBytesWriter_Finish(writer);
+    PyBytesWriter *writer = PyBytesWriter_Create(num_bytes);
+    if (writer == NULL) {
+        return NULL;
+    }
+    if (safe_memcpy(PyBytesWriter_GetData(writer), start, num_bytes) < 0) {
+        PyBytesWriter_Discard(writer);
+        return NULL;
+    }
+    return PyBytesWriter_Finish(writer);
+
+
+Update ``_PyBytes_Resize()`` code
+---------------------------------
+
+Example of code using the soft deprecated ``_PyBytes_Resize()`` API::
+
+    PyObject *v = PyBytes_FromStringAndSize(NULL, size);
+    if (v == NULL) {
+        return NULL;
+    }
+    char *p = PyBytes_AS_STRING(v);
+
+    // ... fill bytes into 'p' ...
+
+    if (_PyBytes_Resize(&v, (p - PyBytes_AS_STRING(v)))) {
+        return NULL;
+    }
+    return v;
+
+It can now be updated to::
+
+    PyBytesWriter *writer = PyBytesWriter_Create(size);
+    if (writer == NULL) {
+        return NULL;
+    }
+    char *p = PyBytesWriter_GetData(writer);
+
+    // ... fill bytes into 'p' ...
+
+    return PyBytesWriter_FinishWithPointer(writer, p);
 
 
 Reference Implementation

From 80a522bde7d14cdaed84c0751d61d394eb5cdb0f Mon Sep 17 00:00:00 2001
From: Victor Stinner
Date: Mon, 31 Mar 2025 19:26:49 +0200
Subject: [PATCH 4/4] Update

---
 peps/pep-0782.rst | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/peps/pep-0782.rst b/peps/pep-0782.rst
index c88dbdbc741..cbf4c974b9f 100644
--- a/peps/pep-0782.rst
+++ b/peps/pep-0782.rst
@@ -80,8 +80,9 @@ Create, Finish, Discard
 
    Create a :c:type:`PyBytesWriter` to write *size* bytes.
 
-   If *size* is greater than zero, allocate *size* bytes for the
-   returned buffer, and set the writer size to *size*.
+   If *size* is greater than zero, allocate *size* bytes, and set the
+   writer size to *size*. The caller is responsible for writing *size*
+   bytes using :c:func:`PyBytesWriter_GetData`.
 
    On error, set an exception and return NULL.
 
@@ -128,8 +129,9 @@ High-level API
 
 .. c:function:: int PyBytesWriter_WriteBytes(PyBytesWriter *writer, const void *bytes, Py_ssize_t size)
 
-   Write *size* bytes of *bytes* at the *writer* end,
-   and add *size* to the writer size.
+   Grow the *writer* internal buffer by *size* bytes,
+   write *size* bytes of *bytes* at the *writer* end,
+   and add *size* to the *writer* size.
 
    If *size* is equal to ``-1``, call ``strlen(bytes)`` to get the
    string length.
@@ -140,7 +142,8 @@ High-level API
 .. c:function:: int PyBytesWriter_Format(PyBytesWriter *writer, const char *format, ...)
 
    Similar to ``PyBytes_FromFormat()``, but write the output directly at
-   the writer end. Then add the written size to the writer size.
+   the writer end. Grow the writer internal buffer on demand.
+   Then add the written size to the writer size.
 
    On success, return ``0``.
    On error, set an exception and return ``-1``.
@@ -191,7 +194,8 @@ Low-level API
    pointer.
 
    The *buf* pointer is moved if the internal buffer is moved in memory.
-   The *buf* position inside the internal buffer is left unchanged.
+   The *buf* relative position within the internal buffer is left
+   unchanged.
 
    On error, set an exception and return ``NULL``.
 
@@ -330,7 +334,7 @@ Example of code using the soft deprecated
     if (result == NULL) {
         return NULL;
     }
-    if (safe_memcpy(PyBytes_AS_STRING(result), start, num_bytes) < 0) {
+    if (copy_bytes(PyBytes_AS_STRING(result), start, num_bytes) < 0) {
         Py_CLEAR(result);
     }
     return result;
@@ -341,7 +345,7 @@ It can now be updated to::
     if (writer == NULL) {
         return NULL;
     }
-    if (safe_memcpy(PyBytesWriter_GetData(writer), start, num_bytes) < 0) {
+    if (copy_bytes(PyBytesWriter_GetData(writer), start, num_bytes) < 0) {
         PyBytesWriter_Discard(writer);
         return NULL;
     }
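
Note on the series: the pointer rule documented for
PyBytesWriter_GrowAndUpdatePointer() (the buffer may move, but the relative
position of *buf* is kept) can be tried outside CPython with a toy model.
The sketch below is only an illustration under stated assumptions:
toy_writer, toy_create, toy_grow, toy_discard and
toy_grow_and_update_pointer are hypothetical names invented here, built on
malloc()/realloc(), and are not the CPython implementation or API. It
mirrors the shape of the "Function pseudo-code" in the PEP text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for PyBytesWriter: a heap buffer plus its size.
 * It models only the pointer arithmetic, not the CPython object. */
typedef struct {
    char *data;   /* start of the internal buffer */
    size_t size;  /* writer size in bytes */
} toy_writer;

/* Allocate a writer with room for `size` bytes; return NULL on failure. */
static toy_writer *toy_create(size_t size)
{
    toy_writer *w = malloc(sizeof(*w));
    if (w == NULL) {
        return NULL;
    }
    w->data = malloc(size > 0 ? size : 1);
    if (w->data == NULL) {
        free(w);
        return NULL;
    }
    w->size = size;
    return w;
}

/* Release the writer and its buffer. */
static void toy_discard(toy_writer *w)
{
    if (w != NULL) {
        free(w->data);
        free(w);
    }
}

/* Grow the buffer by `extra` bytes; realloc() may move it in memory.
 * Return 0 on success, -1 on failure (the old buffer stays valid). */
static int toy_grow(toy_writer *w, size_t extra)
{
    size_t new_size = w->size + extra;
    char *moved = realloc(w->data, new_size > 0 ? new_size : 1);
    if (moved == NULL) {
        return -1;
    }
    w->data = moved;
    w->size = new_size;
    return 0;
}

/* Remember the offset of `buf`, grow, then rebuild the pointer from the
 * possibly relocated start of the buffer. */
static void *toy_grow_and_update_pointer(toy_writer *w, size_t extra,
                                         void *buf)
{
    assert(buf != NULL);  /* mirrors "*buf* must not be NULL" */
    size_t pos = (size_t)((char *)buf - w->data);
    if (toy_grow(w, extra) < 0) {
        return NULL;
    }
    return w->data + pos;
}
```

A caller keeps writing through the returned pointer rather than the old
one, since the old pointer may refer to freed memory once the buffer has
been reallocated.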