@@ -14,8 +14,9 @@ In CPython, the compilation from source code to bytecode involves several steps:
 1. Tokenize the source code (:cpy-file:`Parser/tokenizer.c`)
 2. Parse the stream of tokens into an Abstract Syntax Tree
    (:cpy-file:`Parser/parser.c`)
-3. Transform AST into a Control Flow Graph (:cpy-file:`Python/compile.c`)
-4. Emit bytecode based on the Control Flow Graph (:cpy-file:`Python/compile.c`)
+3. Transform AST into an instruction sequence (:cpy-file:`Python/compile.c`)
+4. Construct a Control Flow Graph and apply optimizations to it (:cpy-file:`Python/flowgraph.c`)
+5. Emit bytecode based on the Control Flow Graph (:cpy-file:`Python/assemble.c`)
 
 The purpose of this document is to outline how these steps of the process work.
 
@@ -433,18 +434,6 @@ the variable.
 As for handling the line number on which a statement is defined, this is
 handled by ``compiler_visit_stmt()`` and thus is not a worry.
 
-In addition to emitting bytecode based on the AST node, handling the
-creation of basic blocks must be done. Below are the macros and
-functions used for managing basic blocks:
-
-``NEXT_BLOCK(struct compiler *)``
-    create an implicit jump from the current block
-    to the new block
-``compiler_new_block(struct compiler *)``
-    create a block but don't use it (used for generating jumps)
-``compiler_use_next_block(struct compiler *, basicblock *block)``
-    set a previously created block as a current block
-
 Once the CFG is created, it must be flattened and then final emission of
 bytecode occurs. Flattening is handled using a post-order depth-first
 search. Once flattened, jump offsets are backpatched based on the
@@ -460,15 +449,13 @@ not as simple as just suddenly introducing new bytecode in the AST ->
 bytecode step of the compiler. Several pieces of code throughout Python depend
 on having correct information about what bytecode exists.
 
-First, you must choose a name and a unique identifier number. The official
-list of bytecode can be found in :cpy-file:`Lib/opcode.py`. If the opcode is to
-take an argument, it must be given a unique number greater than that assigned to
-``HAVE_ARGUMENT`` (as found in :cpy-file:`Lib/opcode.py`).
-
-Once the name/number pair has been chosen and entered in :cpy-file:`Lib/opcode.py`,
-you must also enter it into :cpy-file:`Doc/library/dis.rst`, and regenerate
-:cpy-file:`Include/opcode.h` and :cpy-file:`Python/opcode_targets.h` by running
-``make regen-opcode regen-opcode-targets``.
+First, you must choose a name, implement the bytecode in
+:cpy-file:`Python/bytecodes.c`, and add a documentation entry in
+:cpy-file:`Doc/library/dis.rst`. Then run ``make regen-cases`` to
+assign a number for it (see :cpy-file:`Include/opcode_ids.h`) and
+regenerate a number of files with the actual implementation of the
+bytecodes (:cpy-file:`Python/generated_cases.c.h`) and additional
+files with metadata about them.
 
 With a new bytecode you must also change what is called the magic number for
 .pyc files. The variable ``MAGIC_NUMBER`` in
@@ -478,23 +465,21 @@ to be recompiled by the interpreter on import. Whenever ``MAGIC_NUMBER`` is
 changed, the ranges in the ``magic_values`` array in :cpy-file:`PC/launcher.c`
 must also be updated. Changes to :cpy-file:`Lib/importlib/_bootstrap_external.py`
 will take effect only after running ``make regen-importlib``. Running this
-command before adding the new bytecode target to :cpy-file:`Python/ceval.c` will
-result in an error. You should only run ``make regen-importlib`` after the new
-bytecode target has been added.
+command before adding the new bytecode target to :cpy-file:`Python/bytecodes.c`
+(followed by ``make regen-cases``) will result in an error. You should only run
+``make regen-importlib`` after the new bytecode target has been added.
 
 .. note:: On Windows, running the ``./build.bat`` script will automatically
    regenerate the required files without requiring additional arguments.
 
 Finally, you need to introduce the use of the new bytecode. Altering
-:cpy-file:`Python/compile.c` and :cpy-file:`Python/ceval.c` will be the primary
-places to change. You must add the case for a new opcode into the 'switch'
-statement in the ``stack_effect()`` function in :cpy-file:`Python/compile.c`.
-If the new opcode has a jump target, you will need to update macros and
-'switch' statements in :cpy-file:`Python/compile.c`. If it affects a control
-flow or the block stack, you may have to update the ``frame_setlineno()``
-function in :cpy-file:`Objects/frameobject.c`. :cpy-file:`Lib/dis.py` may need
-an update if the new opcode interprets its argument in a special way (like
-``FORMAT_VALUE`` or ``MAKE_FUNCTION``).
+:cpy-file:`Python/compile.c`, :cpy-file:`Python/bytecodes.c` will be the
+primary places to change. Optimizations in :cpy-file:`Python/flowgraph.c`
+may also need to be updated.
+If the new opcode affects a control flow or the block stack, you may have
+to update the ``frame_setlineno()`` function in :cpy-file:`Objects/frameobject.c`.
+:cpy-file:`Lib/dis.py` may need an update if the new opcode interprets its
+argument in a special way (like ``FORMAT_VALUE`` or ``MAKE_FUNCTION``).
 
 If you make a change here that can affect the output of bytecode that
 is already in existence and you do not change the magic number constantly, make