diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index 0062df25475..b0123590eda 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -671,6 +671,8 @@ peps/pep-0789.rst @njsmith peps/pep-0790.rst @hugovk peps/pep-0791.rst @vstinner # ... +peps/pep-0793.rst @encukou +# ... peps/pep-0801.rst @warsaw # ... peps/pep-2026.rst @hugovk diff --git a/peps/pep-0793.rst b/peps/pep-0793.rst new file mode 100644 index 00000000000..58c6e20f378 --- /dev/null +++ b/peps/pep-0793.rst @@ -0,0 +1,607 @@ +PEP: 793 +Title: PyModExport: A new entry point for C extension modules +Author: Petr Viktorin +Discussions-To: Pending +Status: Draft +Type: Standards Track +Created: 23-May-2025 +Python-Version: 3.15 +Post-History: `14-Mar-2025 `__, + + +Abstract +======== + +In this PEP, we propose a new entry point for C extension modules, by which +one can define a module using an array of ``PyModuleDef_Slot`` structures +without an enclosing ``PyModuleDef`` structure. +This allows extension authors to avoid using a statically allocated +``PyObject``, lifting the most common obstacle to making one compiled library +file usable with both regular and free-threaded builds of CPython. + +To make this viable, we also specify new module slot types to replace +``PyModuleDef``'s fields, and to allow adding a *token* similar to the +``Py_tp_token`` used for type objects. + +We also add an API for defining modules from slots dynamically. + + +Background & Motivation +======================= + +The memory layout of Python objects differs between regular and free-threading +builds. +So, an ABI that supports both regular and free-threading builds cannot include +the current ``PyObject`` memory layout. To stay compatible with existing ABI +(and API), it cannot support statically allocated Python objects. + +There is one type of object that is needed in most extension modules +and is allocated statically in virtually all cases: the ``PyModuleDef`` returned +from the module export hooks (that is, ``PyInit_*`` functions). + +Module export hooks (``PyInit_*`` functions) can return two kinds of objects: + +1. A fully initialized module object (for so-called + *single-phase initialization*). This was the only option in 3.4 and below. + Modules created this way have surprising (but backwards-compatible) + behaviour around multiple interpreters or repeated loading. + (Specifically, the *contents* of such a module's ``__dict__`` are shared + across all instances of the module object.) + + The module returned is typically created using the ``PyModule_Create`` + function, which requires a statically allocated (or at least long-lived) + ``PyModuleDef`` struct. + + It is possible to bypass this using the lower-level ``PyModule_New*`` API. + This avoids the need for ``PyModuleDef``, but offers much less functionality. + +2. A ``PyModuleDef`` object containing a description of how to create a module + object. This option, *multi-phase initialization*, was introduced in + :pep:`489`; see its motivation for why it exists. + +The interpreter cannot distinguish between these cases before the export hook +is called. + + +The interpreter switch +---------------------- + +Python 3.12 added a way for modules to mark whether they may be +loaded in a subinterpreter: the ``Py_mod_multiple_interpreters`` slot. +Setting it to the “not supported” value signals that an extension +can only be loaded in the main interpreter. + +Unfortunately, Python can only get this information by *calling* the +module export hook. +For single-phase modules, that creates the module object and runs arbitrary +initialization code. +For modules that set ``Py_mod_multiple_interpreters`` to “not supported”, +this initialization needs to happen in the main interpreter. + +To make this work, if a new module is loaded in a sub-interpreter, Python +temporarily switches to the main interpreter, calls the export hook +there, and then either switches back and redoes the import, or fails. + +This unnecessary and fragile extra work highlights the underlying design issue: +Python has no way to get information about an extension +before the extension can, potentially, fully initialize itself. + + +Rationale +========= + +For avoiding the module export hook requiring a statically allocated +``PyObject*``, two options come to mind: + +- Returning a *dynamically* allocated object, whose ownership is transferred + to the interpreter. This stucture could be very similar to the existing + ``PyModuleDef``, since it needs to contain the same data. + Unlike the existing ``PyModuleDef``, this one would need to be + reference-counted so that it both outlives “its” module and does not leak. + +- Adding a new export hook, which does not return a ``PyObject*``. + + This was considered already for Python 3.5 in :pep:`489`, but rejected: + + Keeping only the PyInit hook name, even if it’s not entirely appropriate + for exporting a definition, yielded a much simpler solution. + + Alas, after a decade of fixing the implications of this choice, the solution + is no longer simple. + +A new hook will also allow Python to avoid the second issue mentioned in +Motivation -- the interpreter switch. +Effectivelly, it will add a new phase to multi-phase initialization, in which +Python can check whether the module is compatible. + + +Using slots without a wrapper struct +------------------------------------ + +The existing ``PyModuleDef`` is a struct with some fixed fields and +a “slots” array. +Unlike slots, the fixed fields cannot be individually deprecated and replaced. +This proposal does away with fixed fields and proposes using a slots array +directly, without a wrapper struct. + +The ``PyModuleDef_Slot`` struct does have some downsides compared to fixed fields. +We believe these are fixable, but leave that out of scope of this PEP +(see “Improving slots in general” in the Possible Future Directions section). + + +Tokens +------ + +A static ``PyModuleDef`` has another purpose besides describing +how a module should be created. +As a statically allocated singleton that remains attached to the module object, +it allows extension authors to check whether a given Python module is “theirs”: +if a module object has a known ``PyModuleDef``, its module state will have +a known memory layout. + +An analogous issue was solved for types by adding ``Py_tp_token``. +This proposal adds the same mechanism to modules. + +Unlike types, the import mechanism often has a pointer that's known to be +suitable as a token value; in these cases it can provide a default token. +Thus, module tokens do not need a variant of the inelegant ``Py_TP_USE_SPEC``. + + +Specification +============= + +The export hook +--------------- + +When importing an extension module, Python will now first look for an export hook +like this: + +.. code-block:: c + + PyModuleDef_Slot *PyModExport_(PyObject *spec); + +where ```` is the name of the module. +For non-ASCII names, it will instead look for ``PyModExportU_``, +with ```` encoded as for existing ``PyInitU_*`` hooks +(that is, *punycode*-encoded with hyphens replaced by underscores). + +If not found, the import will continue as in previous Python versions (that is, +by looking up a ``PyInit_*`` or ``PyInitU_*`` function). + +If found, Python will call the hook with the appropriate +``importlib.machinery.ModuleSpec`` object as *spec*. +To support duck-typing, extensions should not type-check this object, and +if possible, implement fallbacks for any missing attributes. +(The argument is mainly meant for introspection, testing, or use with +specialized loaders.) + +On failure, the export hook must return NULL with an exception set. +This will cause the import to fail. +(Python will not fall back to ``PyInit_*`` on error.) + +On success, the hook must return a pointer to an array of +``PyModuleDef_Slot`` structs. +Python will then create a module based on the given slots by calling functions +proposed below: ``PyModule_FromSlotsAndSpec`` and ``PyModule_Exec``. +See their description for requirements on the slots array. + +The returned array and all data it points to (recursively) must remain valid +and constant until runtime shutdown. +(We expect functions to export a static constant, or one of several constants +chosen depending on, for example, ``Py_Version``. Dynamic behaviour should +generally happen in the ``Py_mod_create`` and ``Py_mod_exec`` functions.) + + +Dynamic creation +---------------- + +A new function will be added to create a module from an array of slots: + +.. code-block:: c + + PyObject *PyModule_FromSlotsAndSpec(PyModuleDef_Slot *slots, PyObject *spec) + +The *slots* argument must point to an array of ``PyModuleDef_Slot`` structures, +terminated by a slot with ``slot=0`` (typically written as ``{0}`` in C). +There are no required slots, though *slots* must not be ``NULL``. +It follows that minimal input contains only the terminator slot. + +The *spec* argument is a duck-typed ModuleSpec-like object, meaning that any +attributes defined for ``importlib.machinery.ModuleSpec`` have matching +semantics. +The ``name`` attribute is required, but this limitation may be lifted in the +future. +The ``name`` will be used *instead of* the ``Py_mod_name`` slot (just like +``PyModule_FromDefAndSpec`` ignores ``PyModuleDef.m_name``). + +The slots arrays for both ``PyModule_FromSlotsAndSpec`` and the new export hook +will only allow up to one ``Py_mod_exec`` slot. +Arrays in ``PyModuleDef.m_slots`` may have more; this will not change. +This limitation is easy to work around and multiple ``exec`` slots are rarely +used [#multiexec]_. + +For modules created without a ``PyModuleDef``, the ``Py_mod_create`` function +will be called with ``NULL`` for the second argument (*def*). +(In the future, if we find a use case for passing the input slots array, a new +slot with an updated signature can be added.) + +Unlike the ``PyModExport_*`` hook, the *slots* array may be changed or +destroyed after the ``PyModule_FromSlotsAndSpec`` call. +(That is, Python must take a copy of all input data.) +As an exception, any ``PyMethodDef`` array given by ``Py_mod_methods`` +must be statically allocated (or be otherwise guaranteed to outlive the +objects created from it). This limitation may be lifted in the future. + +A new function, ``PyModule_Exec``, will be added to run the ``exec`` slot(s) for a module. +This acts like ``PyModule_ExecDef``, but supports modules created using slots, +and does not take an explicit *def*: + +.. code-block:: c + + int PyModule_Exec(PyObject *module) + +Calling this is required to fully initialize a module. +``PyModule_FromSlotsAndSpec`` will *not* run it (just like +``PyModule_FromDefAndSpec`` does not call ``PyModule_ExecDef``). + +For modules created from a *def*, calling this is equivalent to +calling ``PyModule_ExecDef(module, PyModule_GetDef(module))``. + + +Tokens +------ + +Module objects will optionally store a “token”: a ``void*`` pointer +similar to ``Py_tp_token`` for types. + +If specified, using a new ``Py_mod_token`` slot, the module token must: + +- outlive the module, so it's not reused for something else while the module + exists; and +- "belong" to the extension module where the module lives, so it will not + clash with other extension modules. + +(Typically, it should point to a static constant.) + +Modules created using the ``PyModule_FromSlotsAndSpec`` or the +``PyModExport_`` export hook can use a new ``Py_mod_token`` slot +to set the token. + +Modules created from a ``PyModuleDef`` will have the token set to that +definition. An explicit ``Py_mod_token`` slot will we rejected for these. +(This allows implementations to share storage for the token and def.) + +For modules created via the new export hook, the token +will be set to the address of the slots array by default. +(This does **not** apply to modules created by ``PyModule_FromSlotsAndSpec``, +as that function's input might not outlive the module.) + +The token will not be set for non-``PyModuleType`` instances. + +A ``PyModule_GetToken`` function will be added to get the token. +Since the result may be ``NULL``, it will be passed via a pointer; the function +will return 0 on success and -1 on failure: + +.. code-block:: c + + int PyModule_GetToken(PyObject *, void **token_p) + +A new ``PyType_GetModuleByToken`` function will be added, with a signature +like ``PyType_GetModuleByDef`` but a ``void *token`` argument, +and the same behaviour except matching tokens, rather than only defs. + + +New slots +--------- + +For each field of the ``PyModuleDef`` struct, except ones from +``PyModuleDef_HEAD_INIT``, a new slot ID will be provided: ``Py_mod_name``, +``Py_mod_doc``, ``Py_mod_clear``, etc. +Slots related to the module state rather than the module object will +use a ``Py_mod_state_`` prefix. +See :ref:`pep793-api-summary` for a full list. + +All new slots -- these and ``Py_tp_token`` discussed above -- may not be +repeated in the slots array, and may not be used in a +``PyModuleDef.m_slots`` array. +They may not have a ``NULL`` value (instead, the slot can be omitted entirely). + +Note that currently, for modules created from a *spec* (that is, using +``PyModule_FromDefAndSpec``), the ``PyModuleDef.m_name`` member is ignored +and the name from the spec is used instead. +All API proposed in this document creates modules from a *spec*, and it will +ignore ``Py_mod_name`` in the same way. +The slot will be optional, but extension authors are strongly encouraged to +include it for the benefit of future APIs, external tooling, debugging, +and introspection. + + +Bits & Pieces +------------- + +A ``PyMODEXPORT_FUNC`` macro will be added, similar to the ``PyMODINIT_FUNC`` +macro but with ``PyModuleDef_Slot *`` as the return type. + +A ``PyModule_GetStateSize`` function will be added to retrieve the size set +by ``Py_mod_state_size`` or ``PyModuleDef.m_size``. +Since the result may be -1 (for single-phase-init modules), it will be output +via a pointer; the function will return 0 on success and -1 on failure: + +.. code-block:: c + + int PyModule_GetStateSize(PyObject *, Py_ssize_t *result); + + +.. _pep793-api-summary: + +New API summary +--------------- +Python will load a new module export hook, with two variants: + +.. code-block:: c + + PyModuleDef_Slot *PyModExport_(PyObject *spec); + PyModuleDef_Slot *PyModExportU_(PyObject *spec); + +The following functions will be added: + +.. code-block:: c + + PyObject *PyModule_FromSlotsAndSpec(PyModuleDef_Slot *, PyObject *spec) + int PyModule_Exec(PyObject *) + int PyModule_GetToken(PyObject *, void**) + PyObject *PyType_GetModuleByToken(PyTypeObject *type, void *token) + int PyModule_GetStateSize(PyObject *, Py_ssize_t *result); + +A new macro will be added: + +.. code-block:: c + + PyMODEXPORT_FUNC + +And new slot types (``#define``\d names for small integers): + +- ``Py_mod_name`` (equivalent to ``PyModuleDef.m_name``) +- ``Py_mod_doc`` (equivalent to ``PyModuleDef.m_doc``) +- ``Py_mod_state_size`` (equivalent to ``PyModuleDef.m_size``) +- ``Py_mod_methods`` (equivalent to ``PyModuleDef.m_methods``) +- ``Py_mod_state_traverse`` (equivalent to ``PyModuleDef.m_traverse``) +- ``Py_mod_state_clear`` (equivalent to ``PyModuleDef.m_clear``) +- ``Py_mod_state_free`` (equivalent to ``PyModuleDef.m_free``) +- ``Py_mod_token`` (see above) + +All this will be added to the Limited API. + + +Backwards Compatibility +======================= + +If an existing module is ported to use the new mechanism, then +``PyModule_GetDef`` will start returning ``NULL`` for it. +(This matches ``PyModule_GetDef``'s current documentation.) +We claim that how a module was defined is an implementation detail of that +module, so this should not be considered a breaking change. + +Similarly, ``PyType_GetModuleByDef`` will not match modules that are not +defined using a *def*. +The new ``PyType_GetModuleByToken`` function may be used instead. + +The ``Py_mod_create`` function may now be called with ``NULL`` for the second +argument. +This could trip people porting from *def* to *slots*, so it needs to be +mentioned in porting notes. + + +Forward compatibility +--------------------- + +If a module defines the new export hook, CPython versions that implement this +PEP will ignore the traditional ``PyInit_*`` hook. + +Extensions that straddle Python versions are expected to define both hooks; +each build of CPython will “pick” the newest one that it supports. + + +.. _pep793-porting-notes: + +Porting guide +------------- + +Here is a guide to convert an existing module to the new API, including +some tricky edge cases. +It should be moved to a HOWTO in the documentation. + +#. Scan your code for uses of ``PyModule_GetDef``. This function will + return ``NULL`` for modules that use the new mechanism. Instead: + + * For getting the contents of the module's ``PyModuleDef``, use the C struct + directly. Alternatively, get attributes from the module using, for + example, ``PyModule_GetNameObject``, the ``__doc__`` attribute, and + ``PyModule_GetStateSize``. + (Note that Python code can mutate a module's attributes.) + * For testing if a module object is “yours”, use ``PyModule_GetToken`` + instead. + Later in this guide, you'll set the token to *be* the existing + ``PyModuleDef`` structure. + +#. Scan your code for uses of ``PyType_GetModuleByDef``, and replace them by + ``PyType_GetModuleByToken``. + Later in this guide, you'll set the token to *be* the existing + ``PyModuleDef`` structure. + +#. Look at the function identified by ``Py_mod_create``, if any. + Make sure that it does not use its second argument (``PyModuleDef``), + as it will be called with ``NULL``. + Instead of the argument, use the existing ``PyModuleDef`` struct directly. +#. If using multiple ``Py_mod_exec`` slots, consolidate them: pick one of + the functions, or write a new one, and call the others from it. + Remove all but one ``Py_mod_exec`` slots. +#. Make a copy of the existing ``PyModuleDef_Slot`` array pointed to by + the ``m_slots`` member of your ``PyModuleDef``. If you don't have an + existing slots array, create one like this: + + .. code-block:: c + + static PyModuleDef_Slot module_slots[] = { + {0} + }; + + Give this array a unique name. + Further examples will assume that you've named it ``module_slots``. + +#. Add slots for all members of the existing ``PyModuleDef`` structure. + See :ref:`pep793-api-summary` for a list of the new slots. + For example, to add a name and docstring: + + .. code-block:: c + + static PyModuleDef_Slot module_slots[] = { + {Py_mod_name, "mymodule"}, + {Py_mod_doc, (char*)PyDoc_STR("my docstring")}, + // ... (keep existing slots here) + {0} + }; + +#. If you switched from ``PyModule_GetDef`` to ``PyModule_GetToken``, + and/or from ``PyType_GetModuleByDef`` to ``PyType_GetModuleByToken``, + add a ``Py_mod_token`` slot pointing to the existing ``PyModuleDef`` struct: + + .. code-block:: c + + static PyModuleDef_Slot module_slots[] = { + // ... (keep existing slots here) + {Py_mod_token, your_module_def}, + {0} + }; + + +#. Add a new export hook. + + .. code-block:: c + + PyMODEXPORT_FUNC PyModExport_examplemodule(PyObject); + + PyMODEXPORT_FUNC + PyModExport_examplemodule(PyObject *spec) + { + return module_slots; + } + +The new export hook will be used on Python 3.15 and above. +Once your module no longer supports lower versions, delete the ``PyInit_`` +function and any unused data. + + + +Security Implications +===================== + +None known + + +How to Teach This +================= + +In addition to regular reference docs, the :ref:`pep793-porting-notes` should +be added as a new HOWTO. + + +Example +======= + +.. literalinclude:: pep-0793/examplemodule.c + :language: c + + + +Reference Implementation +======================== + +A draft implementation is available in a +`GitHub branch `_. + + +Open Issues +=========== + +(Add yours!) + + +Rejected Ideas +============== + + +Exporting a data pointer rather than a function +----------------------------------------------- + +This proposes a new module export *function*, which is expected to +return static constant data. +That data could be exported directly as a data pointer. + +With a function, we avoid dealing with a new kind of exported symbol. + +A function also allows the extension to introspect its environment in a limited +way -- for example, to tailor the returned data to the current Python version. + + +Possible Future Directions +========================== + +These ideas are out of scope for *this* proposal. + +Improving slots in general +-------------------------- + +Slots -- and specifically the existing ``PyModuleDef_Slot`` -- do have a few +shortcomings. The most important are: + +- Type safety: ``void *`` is used for data pointers, function pointers + and small integers, requiring casting that is technically undefined + behaviour in C -- but works in practice on all relevant architectures. + (For example: ``Py_tp_doc`` marks a string; ``Py_mod_gil`` an integer.) + +- Limited forward compatibility: if an extension provides a slot ID that's + unknown to the current interpreter, module creation will fail. + This makes it cumbersome to use “optional” features -- ones that should only + take effect if the interpreter supports them. + (The recently added slots ``Py_mod_gil`` and ``Py_mod_multiple_interpreters`` + are good examples.) + + One workaround is to check ``Py_Version`` in the export function, + and return a slot array suitable for the current interpreter. + +Updating defaults +----------------- + +With a new API, we could update defaults for the +``Py_mod_multiple_interpreters`` and ``Py_mod_gil`` slots. + + +The inittab +----------- + +We'll need to allow ``PyModuleDef``-less slots in the inittab -- +that is, add a new variant of ``PyImport_ExtendInittab``. +Should that be part of this PEP? + +The inittab is used for embedding, where a common/stable ABI is not that +important. So, it might be OK to leave this to a later change. + + +Footnotes +========= + +.. [#multiexec] A `quick survey`_ of multiple ``Py_mod_exec`` slots found zero + uses in the top 15,000 PyPI projects, and three in the stardard library + (including tests for the feature). + The easy workaround is consolidating the ``exec`` functions; see + :ref:`pep793-porting-notes` for details. + + .. _quick survey: https://github.com/python/peps/pull/4435#discussion_r2105731314 + + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive. diff --git a/peps/pep-0793/examplemodule.c b/peps/pep-0793/examplemodule.c new file mode 100644 index 00000000000..c05ceb73040 --- /dev/null +++ b/peps/pep-0793/examplemodule.c @@ -0,0 +1,65 @@ +/* +Example module with C-level module-global state, and a simple function to +update and query it. +Once compiled and renamed to not include a version tag (for example +examplemodule.so on Linux), this will run succesfully on both regular +and free-threaded builds. + +Python usage: + +import examplemodule +print(examplemodule.increment_value()) # 0 +print(examplemodule.increment_value()) # 1 +print(examplemodule.increment_value()) # 2 +print(examplemodule.increment_value()) # 3 + +*/ + +// Avoid CPython-version-specific ABI (inline functions & macros): +#define Py_LIMITED_API 0x030f0000 // 3.15 + +#include + +typedef struct { + int value; +} examplemodule_state; + +static PyObject * +increment_value(PyObject *module, PyObject *_ignored) +{ + examplemodule_state *state = PyModule_GetState(module); + int result = ++(state->value); + return PyLong_FromLong(result); +} + +static PyMethodDef examplemodule_methods[] = { + {"increment_value", increment_value, METH_NOARGS}, + {NULL} +}; + +static int +examplemodule_exec(PyObject *module) { + examplemodule_state *state = PyModule_GetState(module); + state->value = -1; + return 0; +} + +PyDoc_STRVAR(examplemodule_doc, "Example extension."); + +static PyModuleDef_Slot examplemodule_slots[] = { + {Py_mod_name, "examplemodule"}, + {Py_mod_doc, (char*)examplemodule_doc}, + {Py_mod_exec, (void*)examplemodule_exec}, + {Py_mod_methods, examplemodule_methods}, + {Py_mod_state_size, (void*)sizeof(examplemodule_state)}, + {0} +}; + +// Avoid "implicit declaration of function" warning: +PyMODEXPORT_FUNC PyModExport_examplemodule(PyObject *); + +PyMODEXPORT_FUNC +PyModExport_examplemodule(PyObject *spec) +{ + return examplemodule_slots; +}