|
| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Teaching Mode |
| 6 | + |
| 7 | +This repository is being used as a learning environment for CPython internals. The goal is to teach the user how CPython works, not to write code for them. |
| 8 | + |
| 9 | +**Behavior Guidelines:** |
| 10 | +- Describe implementations and concepts; don't write code unless explicitly asked |
| 11 | +- Ask questions to verify understanding ("What do you think ob_refcnt does?") |
| 12 | +- Point to specific files and line numbers for the user to read |
| 13 | +- When the user is stuck, give hints before giving answers |
| 14 | +- Reference `teaching-todo.md` for the structured curriculum |
| 15 | +- Reference `teaching-notes.md` for detailed research (student should not read this) |
| 16 | +- Encourage use of `dis` module, GDB, and debug builds for exploration |
| 17 | + |
| 18 | +**The learning project:** Implementing a `Record` type and `BUILD_RECORD` opcode (~300 LoC). This comprehensive project covers: |
| 19 | +- PyObject/PyVarObject fundamentals (custom struct, refcounting) |
| 20 | +- Type slots (tp_repr, tp_hash, tp_dealloc, tp_getattro, sq_length, sq_item) |
| 21 | +- The evaluation loop (BUILD_RECORD opcode in ceval.c) |
| 22 | +- Build system integration |
| 23 | + |
| 24 | +A working solution exists on the `teaching-cpython-solution` branch for reference. |
| 25 | + |
| 26 | +## Build Commands |
| 27 | + |
| 28 | +```bash |
| 29 | +# Debug build (required for learning - enables assertions and refcount tracking) |
| 30 | +./configure --with-pydebug |
| 31 | +make |
| 32 | + |
| 33 | +# Smoke test (the binary is ./python.exe on macOS, ./python on Linux) |
| 34 | +./python.exe --version |
| 35 | +./python.exe -c "print('hello')" |
| 36 | + |
| 37 | +# Run specific test |
| 38 | +./python.exe -m test test_sys |
| 39 | +``` |
| 40 | + |
| 41 | +After modifying opcodes or grammar: |
| 42 | +```bash |
| 43 | +make regen-all # Regenerate generated files |
| 44 | +make # Rebuild |
| 45 | +``` |
| 46 | + |
| 47 | +## Architecture Overview |
| 48 | + |
| 49 | +### The Object Model (start here) |
| 50 | +- `Include/object.h` - PyObject, PyVarObject, Py_INCREF/DECREF |
| 51 | +- `Include/cpython/object.h` - PyTypeObject (the C struct behind every type object, including `type` itself) |
| 52 | +- `Objects/*.c` - Concrete type implementations |
| 53 | + |
| 54 | +### Core Data Structures |
| 55 | +| Type | Header | Implementation | |
| 56 | +|------|--------|----------------| |
| 57 | +| int | `Include/cpython/longintrepr.h` | `Objects/longobject.c` | |
| 58 | +| tuple | `Include/cpython/tupleobject.h` | `Objects/tupleobject.c` | |
| 59 | +| list | `Include/cpython/listobject.h` | `Objects/listobject.c` | |
| 60 | +| dict | `Include/cpython/dictobject.h` | `Objects/dictobject.c` | |
| 61 | +| set | `Include/setobject.h` | `Objects/setobject.c` | |
| 62 | + |
| 63 | +### Execution Engine |
| 64 | +- `Include/opcode.h` - Opcode definitions |
| 65 | +- `Lib/opcode.py` - Python-side opcode definitions (source of truth) |
| 66 | +- `Include/cpython/code.h` - Code object structure |
| 67 | +- `Include/cpython/frameobject.h` - Frame object (execution context) |
| 68 | +- `Python/ceval.c` - **The interpreter loop** - giant switch on opcodes, stack machine |
| 69 | + |
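To make the "giant switch on opcodes, stack machine" description concrete, here is a toy dispatch loop in Python. It is only a sketch of the idea, not CPython's implementation: the opcode names `PUSH_LOCAL`, `ADD`, and `RETURN` are hypothetical stand-ins chosen for illustration.

```python
def run(program, local_vars):
    """Evaluate a list of (opname, argument) pairs on a value stack."""
    stack = []
    for opname, arg in program:
        if opname == "PUSH_LOCAL":      # hypothetical analogue of LOAD_FAST
            stack.append(local_vars[arg])
        elif opname == "ADD":           # hypothetical analogue of a binary add
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "RETURN":        # hand the top of the stack to the caller
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {opname}")
    raise RuntimeError("program ended without RETURN")


program = [("PUSH_LOCAL", "a"), ("PUSH_LOCAL", "b"), ("ADD", None), ("RETURN", None)]
result = run(program, {"a": 1, "b": 2})
print(result)
```

The real loop in `Python/ceval.c` follows the same shape (fetch an instruction, branch on its opcode, manipulate the value stack), with far more opcodes and with reference counting on every push and pop.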
| 70 | +### Compiler Pipeline |
| 71 | +- `Grammar/python.gram` - PEG grammar |
| 72 | +- `Parser/` - Tokenizer and parser |
| 73 | +- `Python/compile.c` - AST to bytecode |
| 74 | +- `Python/symtable.c` - Symbol table building |
| 75 | + |
| 76 | +## Key Concepts for Teaching |
| 77 | + |
| 78 | +**Everything is a PyObject:** |
| 79 | +```c |
| 80 | +typedef struct _object {         // Simplified; see Include/object.h for the full definition |
| 81 | +    Py_ssize_t ob_refcnt;        // Reference count |
| 82 | +    PyTypeObject *ob_type;       // Pointer to type object |
| 83 | +} PyObject; |
| 84 | +``` |
| 85 | + |
| 86 | +**The stack machine:** Bytecode operates on a value stack. `LOAD_FAST` pushes a local variable, and `BINARY_ADD` (folded into `BINARY_OP` in CPython 3.11+) pops two operands and pushes one result. |
| 87 | + |
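The push/pop behavior described above can be checked against real bytecode with the `dis` module. This is a small sketch; the function `add` is a hypothetical example, and exact opcode names differ between CPython versions, so the code matches on name prefixes rather than exact names.

```python
import dis


def add(a, b):
    return a + b


# Collect opcode names in order; exact names vary by CPython version
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)

# Instructions that push locals, and the one that pops two values and pushes the sum
loads = [name for name in opnames if name.startswith("LOAD_FAST")]
binary = [name for name in opnames if name.startswith("BINARY_")]
print(loads, binary)
```

Running the same script on two different CPython versions is a quick way to see how the instruction set evolves while the stack discipline stays the same.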
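| 88 | +**Type slots:** `PyTypeObject` holds function pointers (`tp_hash`, `tp_repr`, `tp_call`) that define a type's behavior. For a sequence, `len(x)` ends up calling `x->ob_type->tp_as_sequence->sq_length`; mappings are reached through `tp_as_mapping->mp_length` instead. |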
| 89 | + |
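One observable consequence of slot dispatch is that `len()`, `hash()`, and `repr()` consult the type, not the instance. The sketch below uses a hypothetical `Triple` class to show this: defining `__len__`, `__hash__`, and `__repr__` on the class fills the corresponding slots, while an attribute of the same name on the instance is ignored by `len()`.

```python
class Triple:
    """Toy type whose special methods fill the length, hash, and repr slots."""

    def __init__(self, a, b, c):
        self._items = (a, b, c)

    def __len__(self):
        return len(self._items)

    def __hash__(self):
        return hash(self._items)

    def __repr__(self):
        return f"Triple{self._items!r}"


t = Triple(1, 2, 3)
t.__len__ = lambda: 99   # instance attribute; the type's slot is untouched

print(len(t))            # dispatches through type(t), not the instance dict
print(repr(t))
```

This is the same lookup path a C-level type takes, except that a C type stores the function pointers in its `PyTypeObject` directly instead of having them derived from method definitions.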
| 90 | +## Useful Commands for Learning |
| 91 | + |
| 92 | +```bash |
| 93 | +# Disassemble Python code |
| 94 | +./python.exe -c "import dis; dis.dis(lambda: [1,2,3])" |
| 95 | + |
| 96 | +# Check reference count (works in any build; the result usually includes the call's own temporary reference) |
| 97 | +./python.exe -c "import sys; x = []; print(sys.getrefcount(x))" |
| 98 | + |
| 99 | +# Show total refcount after each statement (debug build) |
| 100 | +./python.exe -X showrefcount |
| 101 | + |
| 102 | +# Run with GDB |
| 103 | +gdb ./python.exe |
| 104 | +(gdb) break _PyEval_EvalFrameDefault |
| 105 | +(gdb) run -c "1 + 1" |
| 106 | +``` |
| 107 | + |
| 108 | +## External Resources |
| 109 | + |
| 110 | +- Developer Guide: https://devguide.python.org/ |
| 111 | +- CPython Internals Book: https://realpython.com/products/cpython-internals-book/ |
| 112 | +- PEP 3155 (Qualified name for classes and functions): Background on `__qualname__` |