@@ -11,33 +11,40 @@ Abstract
1111
1212In CPython, the compilation from source code to bytecode involves several steps:
1313
14- 1. Tokenize the source code (:cpy-file: `Parser/tokenizer.c `)
14+ 1. Tokenize the source code (:cpy-file: `Parser/tokenizer.c `).
15152. Parse the stream of tokens into an Abstract Syntax Tree
16- (:cpy-file: `Parser/parser.c `)
17- 3. Transform AST into an instruction sequence (:cpy-file: `Python/compile.c `)
18- 4. Construct a Control Flow Graph and apply optimizations to it (:cpy-file: `Python/flowgraph.c `)
19- 5. Emit bytecode based on the Control Flow Graph (:cpy-file: `Python/assemble.c `)
16+ (:cpy-file: `Parser/parser.c `).
17+ 3. Transform AST into an instruction sequence (:cpy-file: `Python/compile.c `).
18+ 4. Construct a Control Flow Graph and apply optimizations to it (:cpy-file: `Python/flowgraph.c `).
19+ 5. Emit bytecode based on the Control Flow Graph (:cpy-file: `Python/assemble.c `).
2020
21- The purpose of this document is to outline how these steps of the process work.
21+ This document outlines how these steps of the process work.
2222
23- This document does not touch on how parsing works beyond what is needed
24- to explain what is needed for compilation. It is also not exhaustive
25- in terms of the how the entire system works. You will most likely need
26- to read some source to have an exact understanding of all details.
23+ This document only describes parsing in enough depth to explain what is needed
24+ for understanding compilation. This document provides a detailed, though not
25+ exhaustive, view of how the entire system works. You will most likely need
26+ to read some source code to have an exact understanding of all details.
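
The steps listed above can be loosely observed from Python itself. The following is a minimal sketch, not a view into the C implementation: the pure-Python ``tokenize`` module only approximates the C tokenizer, and the built-in ``compile()`` performs steps 3 to 5 in a single call, so the instruction sequence and Control Flow Graph are not visible separately. The ``source`` string and the ``"<example>"`` filename are illustrative.

```python
import ast
import io
import tokenize

source = "x = 1 + 2\n"

# Step 1: tokenize the source (pure-Python approximation of the C tokenizer).
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
# Keep only tokens that carry visible text (drops NEWLINE and ENDMARKER).
token_strings = [tok.string for tok in tokens if tok.string.strip()]

# Step 2: parse the source into an Abstract Syntax Tree.
tree = ast.parse(source)

# Steps 3-5: compile the AST down to a code object containing bytecode.
code = compile(tree, "<example>", "exec")

# Run the resulting code object to confirm it behaves like the source.
namespace = {}
exec(code, namespace)
```

Passing ``code`` to ``dis.dis()`` shows the emitted bytecode; the exact instructions differ between Python versions.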
2727
2828
2929Parsing
3030=======
3131
3232As of Python 3.9, Python's parser is a PEG parser of a somewhat
33- unusual design (since its input is a stream of tokens rather than a
34- stream of characters as is more common with PEG parsers).
33+ unusual design. It is unusual in the sense that the parser's input is a stream
34+ of tokens rather than a stream of characters, which is more common with PEG
35+ parsers.
3536
3637The grammar file for Python can be found in
3738:cpy-file: `Grammar/python.gram `. The definitions for literal tokens
3839(such as ``: ``, numbers, etc.) can be found in :cpy-file: `Grammar/Tokens `.
3940Various C files, including :cpy-file: `Parser/parser.c ` are generated from
40- these (see :ref: `grammar `).
41+ these.
42+
43+ .. seealso ::
44+
45+ :ref: `parser ` for a detailed description of the parser.
46+
47+ :ref: `grammar ` for a detailed description of the grammar.
4148
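
To illustrate what it means for a parser to consume tokens rather than characters, here is a small sketch of a single hand-written rule function that operates on a token list. It is not the generated code in ``Parser/parser.c``: the grammar rule ``sum: NUMBER ('+' NUMBER)*``, the helper ``tokenize_source()``, and the function ``sum_rule()`` are all hypothetical, and the sketch omits the memoization and backtracking that a full PEG parser uses.

```python
import io
import tokenize


def tokenize_source(source):
    """Return the significant tokens of ``source`` (hypothetical helper)."""
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
    readline = io.StringIO(source).readline
    return [tok for tok in tokenize.generate_tokens(readline) if tok.type not in skip]


def sum_rule(tokens, pos):
    """Toy rule for ``sum: NUMBER ('+' NUMBER)*``.

    Returns ``(value, new_pos)`` on a match, or ``None`` if the rule fails.
    """
    if pos >= len(tokens) or tokens[pos].type != tokenize.NUMBER:
        return None
    value = int(tokens[pos].string)
    pos += 1
    # Repeatedly match the optional "'+' NUMBER" group.
    while (
        pos + 1 < len(tokens)
        and tokens[pos].string == "+"
        and tokens[pos + 1].type == tokenize.NUMBER
    ):
        value += int(tokens[pos + 1].string)
        pos += 2
    return value, pos


tokens = tokenize_source("1 + 2 + 39\n")
result = sum_rule(tokens, 0)  # (42, 5): the value and the position after the match
```

The rule never inspects individual characters; it only compares token types and token strings, which is the property the paragraph above describes.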
4249
4350Abstract syntax trees (AST)
@@ -133,9 +140,9 @@ Memory management
133140=================
134141
135142Before discussing the actual implementation of the compiler, a discussion of
136- how memory is handled is in order. To make memory management simple, an arena
137- is used. This means that a memory is pooled in a single location for easy
138- allocation and removal. What this gives us is the removal of explicit memory
143+ how memory is handled is in order. To make memory management simple, an ** arena **
144+ is used that pools memory in a single location for easy
145+ allocation and removal. This enables the removal of explicit memory
139146deallocation. Because memory allocation for all needed memory in the compiler
140147registers that memory with the arena, a single call to free the arena is all
141148that is needed to completely free all memory used by the compiler.
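
The idea of an arena can be modelled in a few lines of Python. This is only a conceptual sketch: the ``Arena`` class and its ``malloc()`` and ``free()`` methods are hypothetical names chosen for illustration and do not mirror the C API. It shows the one property described above, namely that every allocation is registered with the arena so that a single call releases everything.

```python
class Arena:
    """Toy model of an arena: allocations are registered for bulk release."""

    def __init__(self):
        self._blocks = []   # every block handed out by this arena
        self.freed = False

    def malloc(self, size):
        """Allocate ``size`` bytes and register the block with the arena."""
        if self.freed:
            raise RuntimeError("arena has already been freed")
        block = bytearray(size)
        self._blocks.append(block)
        return block

    @property
    def live_blocks(self):
        """Number of blocks currently owned by the arena."""
        return len(self._blocks)

    def free(self):
        """Release every registered block with one call."""
        self._blocks.clear()
        self.freed = True


arena = Arena()
first = arena.malloc(16)
second = arena.malloc(32)
blocks_before_free = arena.live_blocks  # 2
arena.free()                            # no per-block deallocation needed
blocks_after_free = arena.live_blocks   # 0
```

Callers never release ``first`` or ``second`` individually; dropping the arena is the only cleanup step.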
@@ -153,8 +160,8 @@ used. That freeing is done with ``PyArena_Free()``. This only needs to be
153160called in strategic areas where the compiler exits.
154161
155162As stated above, in general you should not have to worry about memory
156- management when working on the compiler. The technical details have been
157- designed to be hidden from you for most cases.
163+ management when working on the compiler. The technical details of memory
164+ management have been designed to be hidden from you for most cases.
158165
159166The only exception comes about when managing a PyObject. Since the rest
160167of Python uses reference counting, there is extra support added
@@ -173,7 +180,7 @@ The AST is generated from source code using the function
173180After some checks, a helper function in :cpy-file: `Parser/parser.c ` begins applying
174181production rules on the source code it receives, converting source code to
175182tokens and matching these tokens recursively to their corresponding rule. The
176- rule's corresponding rule function is called on every match. These rule
183+ production rule's corresponding rule function is called on every match. These rule
177184functions follow the format :samp: `xx_rule `, where *xx * is the grammar rule
178185that the function handles and is automatically derived from
179186:cpy-file: `Grammar/python.gram `
@@ -293,7 +300,7 @@ number is passed as the last parameter to each ``stmt_ty`` function.
293300Control flow graphs
294301===================
295302
296- A *control flow graph * (often referenced by its acronym, CFG) is a
303+ A ** control flow graph ** (often referenced by its acronym, ** CFG ** ) is a
297304directed graph that models the flow of a program. A node of a CFG is
298305not an individual bytecode instruction, but instead represents a
299306sequence of bytecode instructions that always execute sequentially.
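
The grouping of instructions into nodes can be sketched with the textbook "leader" algorithm. This is an illustration under stated assumptions, not the code in ``Python/flowgraph.c``: the instruction format ``(opname, target)``, the opcode names ``JUMP``, ``JUMP_IF_FALSE``, ``LOAD`` and ``RETURN``, and both function names are hypothetical.

```python
JUMPS = {"JUMP", "JUMP_IF_FALSE"}  # hypothetical opcodes that transfer control


def build_basic_blocks(instructions):
    """Split toy instructions into basic blocks (lists of instruction indexes).

    Each instruction is ``(opname, target)``; ``target`` is an instruction
    index for jumps and ``None`` otherwise.
    """
    if not instructions:
        return []
    # A leader starts a block: the first instruction, every jump target,
    # and every instruction that follows a jump.
    leaders = {0}
    for index, (opname, target) in enumerate(instructions):
        if opname in JUMPS:
            leaders.add(target)
            if index + 1 < len(instructions):
                leaders.add(index + 1)
    starts = sorted(leaders)
    blocks = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(instructions)
        blocks.append(list(range(start, end)))
    return blocks


def build_edges(instructions, blocks):
    """Return sorted ``(from_block, to_block)`` edges of the graph."""
    block_of = {idx: b for b, members in enumerate(blocks) for idx in members}
    edges = set()
    for b, members in enumerate(blocks):
        last = members[-1]
        opname, target = instructions[last]
        if opname in JUMPS:
            edges.add((b, block_of[target]))
        # Fall through unless the block ends in an unconditional exit.
        if opname not in ("JUMP", "RETURN") and last + 1 < len(instructions):
            edges.add((b, block_of[last + 1]))
    return sorted(edges)


program = [
    ("LOAD", None),         # 0
    ("JUMP_IF_FALSE", 4),   # 1
    ("LOAD", None),         # 2
    ("JUMP", 5),            # 3
    ("LOAD", None),         # 4
    ("RETURN", None),       # 5
]
blocks = build_basic_blocks(program)
edges = build_edges(program, blocks)
```

For this toy program the blocks are ``[[0, 1], [2, 3], [4], [5]]`` and the edges form the familiar diamond of an if/else: ``[(0, 1), (0, 2), (1, 3), (2, 3)]``.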
@@ -441,60 +448,6 @@ flattening and then a ``PyCodeObject`` is created. All of this is
441448handled by calling ``assemble() ``.
442449
443450
444- Introducing new bytecode
445- ========================
446-
447- Sometimes a new feature requires a new opcode. But adding new bytecode is
448- not as simple as just suddenly introducing new bytecode in the AST ->
449- bytecode step of the compiler. Several pieces of code throughout Python depend
450- on having correct information about what bytecode exists.
451-
452- First, you must choose a name, implement the bytecode in
453- :cpy-file: `Python/bytecodes.c `, and add a documentation entry in
454- :cpy-file: `Doc/library/dis.rst `. Then run ``make regen-cases `` to
455- assign a number for it (see :cpy-file: `Include/opcode_ids.h `) and
456- regenerate a number of files with the actual implementation of the
457- bytecodes (:cpy-file: `Python/generated_cases.c.h `) and additional
458- files with metadata about them.
459-
460- With a new bytecode you must also change what is called the magic number for
461- .pyc files. The variable ``MAGIC_NUMBER `` in
462- :cpy-file: `Lib/importlib/_bootstrap_external.py ` contains the number.
463- Changing this number will lead to all .pyc files with the old ``MAGIC_NUMBER ``
464- to be recompiled by the interpreter on import. Whenever ``MAGIC_NUMBER `` is
465- changed, the ranges in the ``magic_values `` array in :cpy-file: `PC/launcher.c `
466- must also be updated. Changes to :cpy-file: `Lib/importlib/_bootstrap_external.py `
467- will take effect only after running ``make regen-importlib ``. Running this
468- command before adding the new bytecode target to :cpy-file: `Python/bytecodes.c `
469- (followed by ``make regen-cases ``) will result in an error. You should only run
470- ``make regen-importlib `` after the new bytecode target has been added.
471-
472- .. note :: On Windows, running the ``./build.bat`` script will automatically
473- regenerate the required files without requiring additional arguments.
474-
475- Finally, you need to introduce the use of the new bytecode. Altering
476- :cpy-file: `Python/compile.c `, :cpy-file: `Python/bytecodes.c ` will be the
477- primary places to change. Optimizations in :cpy-file: `Python/flowgraph.c `
478- may also need to be updated.
479- If the new opcode affects a control flow or the block stack, you may have
480- to update the ``frame_setlineno() `` function in :cpy-file: `Objects/frameobject.c `.
481- :cpy-file: `Lib/dis.py ` may need an update if the new opcode interprets its
482- argument in a special way (like ``FORMAT_VALUE `` or ``MAKE_FUNCTION ``).
483-
484- If you make a change here that can affect the output of bytecode that
485- is already in existence and you do not change the magic number constantly, make
486- sure to delete your old .py(c|o) files! Even though you will end up changing
487- the magic number if you change the bytecode, while you are debugging your work
488- you will be changing the bytecode output without constantly bumping up the
489- magic number. This means you end up with stale .pyc files that will not be
490- recreated.
491- Running ``find . -name '*.py[co]' -exec rm -f '{}' + `` should delete all .pyc
492- files you have, forcing new ones to be created and thus allow you test out your
493- new bytecode properly. Run ``make regen-importlib `` for updating the
494- bytecode of frozen importlib files. You have to run ``make `` again after this
495- for recompiling generated C files.
496-
497-
498451Code objects
499452============
500453
@@ -613,12 +566,30 @@ Important files
613566
614567 * :cpy-file: `Lib/opcode.py `: Master list of bytecode; if this file is
615568 modified you must modify several other files accordingly
616- (see "`Introducing New Bytecode `_")
617569
618570 * :cpy-file: `Lib/importlib/_bootstrap_external.py `: Home of the magic number
619571 (named ``MAGIC_NUMBER ``) for bytecode versioning.
620572
621573
574+ Objects
575+ =======
576+
577+ * :cpy-file: `Objects/locations.md `: Describes the location table
578+ * :cpy-file: `Objects/frame_layout.md `: Describes the frame stack
579+ * :cpy-file: `Objects/object_layout.md `: Describes object layout for 3.11 and later
580+ * :cpy-file: `Objects/exception_handling_notes.txt `: Exception handling notes
581+
582+
583+ Specializing Adaptive Interpreter
584+ =================================
585+
586+ Adding a specializing, adaptive interpreter to CPython will bring significant
587+ performance improvements. These documents provide more information:
588+
589+ * :pep: `659 `: Specializing Adaptive Interpreter
590+ * :cpy-file: `Python/adaptive.md `: Adding or extending a family of adaptive instructions
591+
592+
622593References
623594==========
624595