Commit afc2aeb

gh-134160: "First extension module" tutorial improvements (GH-144183)
- Pass -v to pip, so compiler output is visible
- Move the call ``spam.system(3)`` up so that error handling is tested right
  after it's added
- Use `PyUnicode_AsUTF8AndSize` as `PyUnicode_AsUTF8` is not in the Limited API.
- Add a footnote about embedded NULs.
1 parent ebbb2ca commit afc2aeb

2 files changed: +39 -19 lines changed


Doc/extending/first-extension-module.rst

Lines changed: 38 additions & 18 deletions
@@ -171,7 +171,10 @@ Now, build install the *project in the current directory* (``.``) via ``pip``:
 
 .. code-block:: sh
 
-   python -m pip install .
+   python -m pip -v install .
+
+The ``-v`` (``--verbose``) option causes ``pip`` to show the output from
+the compiler, which is often useful during development.
 
 .. tip::
 
@@ -460,7 +463,7 @@ So, we'll need to *encode* the data, and we'll use the UTF-8 encoding for it.
 and the C API has special support for it.)
 
 The function to encode a Python string into a UTF-8 buffer is named
-:c:func:`PyUnicode_AsUTF8` [#why-pyunicodeasutf8]_.
+:c:func:`PyUnicode_AsUTF8AndSize` [#why-pyunicodeasutf8]_.
 Call it like this:
 
 .. code-block:: c
@@ -469,31 +472,31 @@ Call it like this:
    static PyObject *
    spam_system(PyObject *self, PyObject *arg)
    {
-       const char *command = PyUnicode_AsUTF8(arg);
+       const char *command = PyUnicode_AsUTF8AndSize(arg, NULL);
        int status = 3;
        PyObject *result = PyLong_FromLong(status);
        return result;
    }
 
-If :c:func:`PyUnicode_AsUTF8` is successful, *command* will point to the
-resulting array of bytes.
+If :c:func:`PyUnicode_AsUTF8AndSize` is successful, *command* will point to the
+resulting C string -- a zero-terminated array of bytes [#embedded-nul]_.
 This buffer is managed by the *arg* object, which means we don't need to free
 it, but we must follow some rules:
 
 * We should only use the buffer inside the ``spam_system`` function.
-  When ``spam_system`` returns, *arg* and the buffer it manages might be
+  After ``spam_system`` returns, *arg* and the buffer it manages might be
   garbage-collected.
 * We must not modify it. This is why we use ``const``.
 
-If :c:func:`PyUnicode_AsUTF8` was *not* successful, it returns a ``NULL``
+If :c:func:`PyUnicode_AsUTF8AndSize` was *not* successful, it returns a ``NULL``
 pointer.
 When calling *any* Python C API, we always need to handle such error cases.
 The way to do this in general is left for later chapters of this documentation.
 For now, be assured that we are already handling errors from
 :c:func:`PyLong_FromLong` correctly.
 
-For the :c:func:`PyUnicode_AsUTF8` call, the correct way to handle errors is
-returning ``NULL`` from ``spam_system``.
+For the :c:func:`PyUnicode_AsUTF8AndSize` call, the correct way to handle
+errors is returning ``NULL`` from ``spam_system``.
 Add an ``if`` block for this:
 
 
@@ -503,7 +506,7 @@ Add an ``if`` block for this:
    static PyObject *
    spam_system(PyObject *self, PyObject *arg)
    {
-       const char *command = PyUnicode_AsUTF8(arg);
+       const char *command = PyUnicode_AsUTF8AndSize(arg, NULL);
        if (command == NULL) {
            return NULL;
        }
@@ -512,7 +515,18 @@ Add an ``if`` block for this:
        return result;
    }
 
-That's it for the setup.
+To test that error handling works, compile again, restart Python so that
+``import spam`` picks up the new version of your module, and try passing
+a non-string value to your function:
+
+.. code-block:: pycon
+
+   >>> import spam
+   >>> spam.system(3)
+   Traceback (most recent call last):
+     ...
+   TypeError: bad argument type for built-in operation
+
 Now, all that is left is calling the C library function :c:func:`system` with
 the ``char *`` buffer, and using its result instead of the ``3``:
 
@@ -522,7 +536,7 @@ the ``char *`` buffer, and using its result instead of the ``3``:
    static PyObject *
    spam_system(PyObject *self, PyObject *arg)
    {
-       const char *command = PyUnicode_AsUTF8(arg);
+       const char *command = PyUnicode_AsUTF8AndSize(arg, NULL);
        if (command == NULL) {
            return NULL;
        }
@@ -543,7 +557,8 @@ system command:
    >>> result
    0
 
-You might also want to test error cases:
+You can also test with other commands, like ``ls``, ``dir``, or one
+that doesn't exist:
 
 .. code-block:: pycon
 
@@ -553,11 +568,6 @@ You might also want to test error cases:
    >>> result
    32512
 
-   >>> spam.system(3)
-   Traceback (most recent call last):
-     ...
-   TypeError: bad argument type for built-in operation
-
 
 The result
 ==========
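
Note on the ``32512`` shown in the hunk above: on POSIX-like systems, C's ``system`` returns a wait status rather than a plain exit code. The sketch below is not part of the commit. It assumes Python 3.9 or later, where ``os.waitstatus_to_exitcode`` is available, and the helper name ``exit_code_from_status`` is hypothetical.

```python
import os


def exit_code_from_status(status):
    """Convert a system()-style wait status into a shell exit code.

    Assumes Python 3.9+ (os.waitstatus_to_exitcode) and a status that
    follows the usual convention of storing the exit code in bits 8-15.
    """
    return os.waitstatus_to_exitcode(status)


# 32512 is 127 << 8; POSIX shells use exit code 127 for "command not found".
print(exit_code_from_status(32512))
```

Read this way, the value is consistent with the tutorial's "command that doesn't exist" test case.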
@@ -665,3 +675,13 @@ on :py:attr:`sys.path`.
    type.
 .. [#why-pyunicodeasutf8] Here, ``PyUnicode`` refers to the original name of
    the Python :py:class:`str` class: ``unicode``.
+
+   The ``AndSize`` part of the name refers to the fact that this function can
+   also retrieve the size of the buffer, using an output argument.
+   We don't need this, so we set the second argument to NULL.
+.. [#embedded-nul] We're ignoring the fact that Python strings can also
+   contain NUL bytes, which terminate a C string.
+   In other words, our function will treat ``spam.system("foo\0bar")`` as
+   ``spam.system("foo")``.
+   This possibility can lead to security issues, so the real ``os.system``
+   function checks for this case and raises an error.
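
The embedded-NUL footnote can be checked from Python without building the extension. The following is a minimal sketch, not part of the commit: the helper names ``c_string_view`` and ``os_system_rejects`` are hypothetical, and ``ctypes.create_string_buffer`` is used only to mimic how C code reads a buffer up to the first NUL byte.

```python
import ctypes
import os


def c_string_view(text):
    """Return the bytes that C code sees when it reads *text* as a C string."""
    encoded = text.encode("utf-8")
    # create_string_buffer copies the bytes and appends a terminating NUL.
    buf = ctypes.create_string_buffer(encoded)
    # .value stops at the first NUL byte, as strlen() or system() would.
    return buf.value


def os_system_rejects(command):
    """Return True if os.system refuses *command* with ValueError.

    Only pass strings with an embedded NUL here: any other string
    is actually executed by the shell.
    """
    try:
        os.system(command)
    except ValueError:
        return True
    return False


print(c_string_view("foo\0bar"))      # bytes up to the first NUL
print(os_system_rejects("foo\0bar"))  # os.system raises before running anything
```

The first call shows the truncation the footnote describes. The second shows that ``os.system`` raises ``ValueError`` instead of silently running ``foo``.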

Doc/includes/capi-extension/spammodule-01.c

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
 static PyObject *
 spam_system(PyObject *self, PyObject *arg)
 {
-    const char *command = PyUnicode_AsUTF8(arg);
+    const char *command = PyUnicode_AsUTF8AndSize(arg, NULL);
     if (command == NULL) {
         return NULL;
     }
