Commit b66af5a

Update dev guide with recent compiler and bytecode simplifications (#1154)
1 parent d994dff commit b66af5a

1 file changed (+20, -35 lines changed)

internals/compiler.rst

Lines changed: 20 additions & 35 deletions
@@ -14,8 +14,9 @@ In CPython, the compilation from source code to bytecode involves several steps:
 1. Tokenize the source code (:cpy-file:`Parser/tokenizer.c`)
 2. Parse the stream of tokens into an Abstract Syntax Tree
 (:cpy-file:`Parser/parser.c`)
-3. Transform AST into a Control Flow Graph (:cpy-file:`Python/compile.c`)
-4. Emit bytecode based on the Control Flow Graph (:cpy-file:`Python/compile.c`)
+3. Transform AST into an instruction sequence (:cpy-file:`Python/compile.c`)
+4. Construct a Control Flow Graph and apply optimizations to it (:cpy-file:`Python/flowgraph.c`)
+5. Emit bytecode based on the Control Flow Graph (:cpy-file:`Python/assemble.c`)

 The purpose of this document is to outline how these steps of the process work.

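The stages listed in this hunk can be observed from Python itself. The following is a minimal sketch using only the standard library (``tokenize``, ``ast``, the built-in ``compile()`` and ``dis``); the source string and the ``<example>`` filename are illustrative assumptions. The instruction sequence and Control Flow Graph are internal to the compiler, so only the tokens, the AST and the final bytecode are inspected here.

```python
import ast
import dis
import io
import tokenize

# Illustrative one-line module; any valid source would do.
source = "x = 1 + 2\n"

# Step 1: tokenize the source code.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
# Drop NEWLINE/ENDMARKER tokens, whose strings are empty or whitespace.
token_strings = [tok.string for tok in tokens if tok.string.strip()]

# Step 2: parse the source into an Abstract Syntax Tree.
tree = ast.parse(source)

# Remaining steps: compile() turns the AST into a code object. The
# intermediate instruction sequence and CFG are not exposed here.
code = compile(tree, "<example>", "exec")
opnames = [ins.opname for ins in dis.get_instructions(code)]

# Run the resulting bytecode to confirm it behaves like the source.
namespace = {}
exec(code, namespace)
```

The exact opcode names in ``opnames`` vary between CPython versions, which is one reason the magic number discussed later in the diff exists.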
@@ -433,18 +434,6 @@ the variable.
 As for handling the line number on which a statement is defined, this is
 handled by ``compiler_visit_stmt()`` and thus is not a worry.

-In addition to emitting bytecode based on the AST node, handling the
-creation of basic blocks must be done. Below are the macros and
-functions used for managing basic blocks:
-
-``NEXT_BLOCK(struct compiler *)``
-create an implicit jump from the current block
-to the new block
-``compiler_new_block(struct compiler *)``
-create a block but don't use it (used for generating jumps)
-``compiler_use_next_block(struct compiler *, basicblock *block)``
-set a previously created block as a current block
-
 Once the CFG is created, it must be flattened and then final emission of
 bytecode occurs. Flattening is handled using a post-order depth-first
 search. Once flattened, jump offsets are backpatched based on the
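The flattening and backpatching described in the context lines above can be illustrated with a toy model. This is a sketch under stated assumptions, not CPython's implementation: the ``cfg`` dictionary, its block names, and the ``size`` and ``successors`` fields are all hypothetical, and reversing the post-order to obtain an emission order is a choice made for this example.

```python
def postorder(cfg, entry):
    """Return block names in post-order from a depth-first walk at entry."""
    seen = set()
    order = []

    def visit(block):
        if block in seen:
            return
        seen.add(block)
        for successor in cfg[block]["successors"]:
            visit(successor)
        # A block is recorded only after all of its successors.
        order.append(block)

    visit(entry)
    return order


# Hypothetical toy CFG: instruction count and successor list per block.
cfg = {
    "entry": {"size": 3, "successors": ["then", "else"]},
    "then": {"size": 2, "successors": ["exit"]},
    "else": {"size": 2, "successors": ["exit"]},
    "exit": {"size": 1, "successors": []},
}

# Flatten: reversing the post-order puts the entry block first.
layout = list(reversed(postorder(cfg, "entry")))

# Assign each block a starting offset in the flattened sequence.
offsets = {}
position = 0
for block in layout:
    offsets[block] = position
    position += cfg[block]["size"]

# Backpatch: replace symbolic jump targets with concrete offsets.
jumps = {b: [offsets[s] for s in cfg[b]["successors"]] for b in layout}
```

Offsets can only be resolved after every block has a position, which is why backpatching happens after flattening rather than during it.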
@@ -460,15 +449,13 @@ not as simple as just suddenly introducing new bytecode in the AST ->
 bytecode step of the compiler. Several pieces of code throughout Python depend
 on having correct information about what bytecode exists.

-First, you must choose a name and a unique identifier number. The official
-list of bytecode can be found in :cpy-file:`Lib/opcode.py`. If the opcode is to
-take an argument, it must be given a unique number greater than that assigned to
-``HAVE_ARGUMENT`` (as found in :cpy-file:`Lib/opcode.py`).
-
-Once the name/number pair has been chosen and entered in :cpy-file:`Lib/opcode.py`,
-you must also enter it into :cpy-file:`Doc/library/dis.rst`, and regenerate
-:cpy-file:`Include/opcode.h` and :cpy-file:`Python/opcode_targets.h` by running
-``make regen-opcode regen-opcode-targets``.
+First, you must choose a name, implement the bytecode in
+:cpy-file:`Python/bytecodes.c`, and add a documentation entry in
+:cpy-file:`Doc/library/dis.rst`. Then run ``make regen-cases`` to
+assign a number for it (see :cpy-file:`Include/opcode_ids.h`) and
+regenerate a number of files with the actual implementation of the
+bytecodes (:cpy-file:`Python/generated_cases.c.h`) and additional
+files with metadata about them.

 With a new bytecode you must also change what is called the magic number for
 .pyc files. The variable ``MAGIC_NUMBER`` in
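The name/number pairing discussed in this hunk is visible at runtime through the ``dis`` module. A minimal sketch, assuming the running interpreter defines ``LOAD_CONST`` (chosen only as a familiar example); the numeric value differs between CPython versions, so the code never hard-codes it.

```python
import dis

# Look up the number assigned to an opcode name in this interpreter.
opcode_number = dis.opmap["LOAD_CONST"]

# dis.opname is the inverse table: number -> name.
round_trip_name = dis.opname[opcode_number]

# Net stack effect of the opcode; LOAD_CONST pushes one value.
# An oparg must be supplied because LOAD_CONST takes an argument.
effect = dis.stack_effect(opcode_number, 0)
```

Because the numbers are assigned by the build, code that inspects bytecode is better off going through ``dis.opmap`` than embedding literal opcode values.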
@@ -478,23 +465,21 @@ to be recompiled by the interpreter on import. Whenever ``MAGIC_NUMBER`` is
 changed, the ranges in the ``magic_values`` array in :cpy-file:`PC/launcher.c`
 must also be updated. Changes to :cpy-file:`Lib/importlib/_bootstrap_external.py`
 will take effect only after running ``make regen-importlib``. Running this
-command before adding the new bytecode target to :cpy-file:`Python/ceval.c` will
-result in an error. You should only run ``make regen-importlib`` after the new
-bytecode target has been added.
+command before adding the new bytecode target to :cpy-file:`Python/bytecodes.c`
+(followed by ``make regen-cases``) will result in an error. You should only run
+``make regen-importlib`` after the new bytecode target has been added.

 .. note:: On Windows, running the ``./build.bat`` script will automatically
 regenerate the required files without requiring additional arguments.

 Finally, you need to introduce the use of the new bytecode. Altering
-:cpy-file:`Python/compile.c` and :cpy-file:`Python/ceval.c` will be the primary
-places to change. You must add the case for a new opcode into the 'switch'
-statement in the ``stack_effect()`` function in :cpy-file:`Python/compile.c`.
-If the new opcode has a jump target, you will need to update macros and
-'switch' statements in :cpy-file:`Python/compile.c`. If it affects a control
-flow or the block stack, you may have to update the ``frame_setlineno()``
-function in :cpy-file:`Objects/frameobject.c`. :cpy-file:`Lib/dis.py` may need
-an update if the new opcode interprets its argument in a special way (like
-``FORMAT_VALUE`` or ``MAKE_FUNCTION``).
+:cpy-file:`Python/compile.c`, :cpy-file:`Python/bytecodes.c` will be the
+primary places to change. Optimizations in :cpy-file:`Python/flowgraph.c`
+may also need to be updated.
+If the new opcode affects a control flow or the block stack, you may have
+to update the ``frame_setlineno()`` function in :cpy-file:`Objects/frameobject.c`.
+:cpy-file:`Lib/dis.py` may need an update if the new opcode interprets its
+argument in a special way (like ``FORMAT_VALUE`` or ``MAKE_FUNCTION``).

 If you make a change here that can affect the output of bytecode that
 is already in existence and you do not change the magic number constantly, make
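The role of the magic number mentioned in this hunk can be checked from Python. The sketch below compiles a throwaway module with ``py_compile`` and compares the first four bytes of the resulting .pyc file with ``importlib.util.MAGIC_NUMBER``; the file names ``demo.py`` and ``demo.pyc`` are illustrative assumptions.

```python
import importlib.util
import os
import py_compile
import tempfile

# The magic number of the running interpreter: four bytes, of which the
# last two are always b"\r\n".
magic = importlib.util.MAGIC_NUMBER

with tempfile.TemporaryDirectory() as tmp:
    source_path = os.path.join(tmp, "demo.py")
    with open(source_path, "w") as handle:
        handle.write("x = 1\n")

    # py_compile.compile() returns the path of the .pyc file it wrote.
    pyc_path = py_compile.compile(
        source_path, cfile=os.path.join(tmp, "demo.pyc")
    )

    # Every .pyc file starts with the magic number of its compiler.
    with open(pyc_path, "rb") as handle:
        header = handle.read(4)
```

An interpreter with a different magic number rejects such a file as stale and recompiles the source on import, which is the behavior the surrounding text relies on.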
