peps/pep-0768.rst: 64 additions & 19 deletions
@@ -141,8 +141,10 @@ A new structure is added to PyThreadState to support remote debugging:

This structure is appended to ``PyThreadState``, adding only a few fields that
are **never accessed during normal execution**. The ``debugger_pending_call`` field
indicates when a debugger has requested execution, while ``debugger_script_path``
provides a filesystem path to a Python source file (.py) that will be executed when
the interpreter reaches a safe point. The path must point to a Python source file,
not compiled Python code (.pyc) or any other format.
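
For an attaching tool that reads and writes the target process's memory, this
layout can be mirrored directly. The following ``ctypes`` sketch is purely
illustrative and not part of the proposal; it assumes the two fields are a
32-bit flag followed by a fixed-size character buffer, with 512 bytes standing
in for whatever value is ultimately chosen for ``MAX_SCRIPT_PATH_SIZE``:

.. code-block:: python

    import ctypes

    # Illustrative only: the real MAX_SCRIPT_PATH_SIZE is an implementation
    # trade-off, not something this sketch fixes.
    MAX_SCRIPT_PATH_SIZE = 512

    class RemoteDebuggerSupport(ctypes.Structure):
        """Assumed layout of the fields appended to PyThreadState."""
        _fields_ = [
            ("debugger_pending_call", ctypes.c_int32),
            ("debugger_script_path", ctypes.c_char * MAX_SCRIPT_PATH_SIZE),
        ]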

The value for ``MAX_SCRIPT_PATH_SIZE`` will be a trade-off between binary size
and how long the paths to debugging scripts can be. To limit the memory overhead per
@@ -177,7 +179,7 @@ debugger support:
These offsets allow debuggers to locate critical debugging control structures in
the target process's memory space. The ``eval_breaker`` and ``remote_debugger_support``
offsets are relative to each ``PyThreadState``, while the ``debugger_pending_call``
and ``debugger_script_path`` offsets are relative to each ``_PyRemoteDebuggerSupport``
structure, allowing the new structure and its fields to be found regardless of
where they are in memory. ``debugger_script_path_size`` informs the attaching
tool of the size of the buffer.
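
As a purely illustrative sketch of how a tool might combine these offsets (the
numeric values below are invented, and ``tstate_addr`` is assumed to be the
remote address of a ``PyThreadState`` that the tool has already located):

.. code-block:: python

    # The offset values here are invented; a real tool reads them from the
    # debug offsets exposed by the target process, along with
    # debugger_script_path_size to learn the buffer's capacity.
    OFFSETS = {
        "remote_debugger_support": 0x1A0,  # relative to PyThreadState
        "eval_breaker": 0x40,              # relative to PyThreadState
        "debugger_pending_call": 0x0,      # relative to _PyRemoteDebuggerSupport
        "debugger_script_path": 0x4,       # relative to _PyRemoteDebuggerSupport
    }

    def control_addresses(tstate_addr: int) -> dict[str, int]:
        """Compute the remote addresses a debugger needs to write to, given
        the address of a PyThreadState in the target process."""
        support = tstate_addr + OFFSETS["remote_debugger_support"]
        return {
            "eval_breaker": tstate_addr + OFFSETS["eval_breaker"],
            "debugger_pending_call": support + OFFSETS["debugger_pending_call"],
            "debugger_script_path": support + OFFSETS["debugger_script_path"],
        }
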
@@ -200,13 +202,19 @@ When a debugger wants to attach to a Python process, it follows these steps:

5. Write control information:

- Most debuggers will pause the process before writing to its memory, as is
standard practice for tools like GDB, which use SIGSTOP or ptrace for this
purpose. Pausing prevents races when writing to process memory. Profilers and
other tools that don't wish to stop the process can still use this interface,
but they must handle the possible races themselves, which is a normal
consideration for profilers.

- Write a file path to a Python source file (.py) into the
``debugger_script_path`` field in ``_PyRemoteDebuggerSupport``.
- Set ``debugger_pending_call`` flag in ``_PyRemoteDebuggerSupport`` to 1
- Set ``_PY_EVAL_PLEASE_STOP_BIT`` in the ``eval_breaker`` field

Once the interpreter reaches the next safe point, it will execute the Python code
contained in the file specified by the debugger. A sketch of the write sequence
from step 5 is shown below.
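
The sketch is illustrative only: ``proc.read`` and ``proc.write`` stand in for
whatever remote-memory primitive the tool uses (``process_vm_writev``,
``WriteProcessMemory``, Mach VM calls, and so on), the bit value and field
widths are assumptions, and a real tool must also respect
``debugger_script_path_size`` and preserve the other bits of ``eval_breaker``:

.. code-block:: python

    import struct

    # Invented value: the real _PY_EVAL_PLEASE_STOP_BIT is an interpreter constant.
    _PY_EVAL_PLEASE_STOP_BIT = 1 << 5

    def inject_script(proc, addrs: dict[str, int], script_path: str) -> None:
        """Perform the step-5 writes; proc.read/proc.write are assumed to
        operate on raw bytes in the (paused) target process."""
        # NUL-terminated path to the .py file, into the fixed-size buffer.
        proc.write(addrs["debugger_script_path"], script_path.encode() + b"\x00")
        # Tell the thread that a debugger call is pending.
        proc.write(addrs["debugger_pending_call"], struct.pack("<i", 1))
        # Ask the eval loop to stop at the next safe point, preserving any
        # bits that are already set in eval_breaker.
        current = int.from_bytes(proc.read(addrs["eval_breaker"], 4), "little")
        new_value = current | _PY_EVAL_PLEASE_STOP_BIT
        proc.write(addrs["eval_breaker"], struct.pack("<I", new_value))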

Interpreter Integration
-----------------------
@@ -237,7 +245,7 @@ to be audited or disabled if desired by a system's administrator.
    if (tstate->eval_breaker) {
        if (tstate->remote_debugger_support.debugger_pending_call) {
            /* Consume the pending-call flag before running the script. */
            tstate->remote_debugger_support.debugger_pending_call = 0;
            const char *path = tstate->remote_debugger_support.debugger_script_path;
            if (*path) {
                /* Audit hooks can veto execution of the debugger script. */
                if (0 != PySys_Audit("debugger_script", "%s", path)) {
                    PyErr_Clear();
@@ -273,28 +281,35 @@ arbitrary Python code within the context of a specified Python process:

.. code-block:: python

    def remote_exec(pid: int, script: str | bytes | PathLike) -> None:
        """
        Executes a file containing Python code in a given remote Python process.

        This function returns immediately, and the code will be executed by the
        target process's main thread at the next available opportunity, similarly
        to how signals are handled. There is no interface to determine when the
        code has been executed. The caller is responsible for making sure that
        the file still exists whenever the remote process tries to read it and
        that it hasn't been overwritten.

        Args:
            pid (int): The process ID of the target Python process.
            script (str | bytes | PathLike): The path to a file containing
                the Python code to be executed.
        """

An example usage of the API would look like:

.. code-block:: python

    import sys
    import uuid

    # Execute a print statement in a remote Python process with PID 12345
    script = f"/tmp/{uuid.uuid4()}.py"
    with open(script, "w") as f:
        f.write("print('Hello from remote execution!')")
    try:
        sys.remote_exec(12345, script)
    except Exception as e:
        print(f"Failed to execute code: {e}")

@@ -322,6 +337,36 @@ feature. This way, tools can offer a useful error message explaining why they
won't work, instead of believing that they have attached and then never having
their script run.

Multi-threading Considerations
------------------------------

The overall execution pattern resembles how Python handles signals internally.
The interpreter guarantees that injected code only runs at safe points, never
interrupting atomic operations within the interpreter itself. This approach
ensures that debugging operations cannot corrupt the interpreter state while
still providing timely execution in most real-world scenarios.

However, debugging code injected through this interface can execute in any
thread. This behavior is different from how Python handles signals, since
signal handlers can only run in the main thread. If a debugger wants to inject
code into every running thread, it must inject it into every ``PyThreadState``.
If a debugger wants to run code in the first available thread, it needs to
inject it into every ``PyThreadState``, and that injected code must check
whether it has already been run by another thread (likely by setting some flag
in the globals of some module).
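
A possible shape for such a run-once guard in the injected file is sketched
below; the use of the ``builtins`` namespace, the attribute name, and the
payload are all arbitrary choices made for illustration:

.. code-block:: python

    import builtins
    import threading

    # Every thread runs this same injected file; dict.setdefault is atomic,
    # so exactly one thread's token "wins" the shared flag.
    _token = threading.get_ident()
    _winner = builtins.__dict__.setdefault("_remote_debug_winner", _token)

    if _winner == _token:
        # Placeholder payload: only the winning thread reaches this point.
        print("injected code running in thread", _token)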

Note that the Global Interpreter Lock (GIL) continues to govern execution as
normal when the injected code runs. This means if a target thread is currently
executing a C extension that holds the GIL continuously, the injected code
won't be able to run until that operation completes and the GIL becomes
available. However, the interface introduces no additional GIL contention
beyond what the injected code itself requires. Importantly, the interface
remains fully compatible with Python's free-threaded mode.

After injecting code, it may be useful for a debugger to follow up by sending
some pre-registered signal to the process. The signal can interrupt blocking
I/O or sleeps waiting on external resources and give the interpreter a safe
opportunity to run the injected code, as the sketch below illustrates.
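
Purely as an illustration (the signal choice, the function names, and the
POSIX-only ``os.kill`` call are assumptions, not part of this proposal), the
target could register a harmless handler ahead of time and the debugger could
nudge it after performing the control writes described earlier:

.. code-block:: python

    import os
    import signal

    # In the target process, registered ahead of time (e.g. at startup):
    def install_wakeup_handler():
        # Harmless handler; delivery merely interrupts blocking calls.
        signal.signal(signal.SIGUSR1, lambda signum, frame: None)

    # In the debugger, after the control writes have been made:
    def nudge(target_pid: int) -> None:
        # Wake the process so it reaches a safe point sooner.
        os.kill(target_pid, signal.SIGUSR1)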

Backwards Compatibility
=======================
@@ -454,8 +499,8 @@ Rejected Ideas
Writing Python code into the buffer
-----------------------------------

We have chosen to have debuggers write the path to a file containing Python code
into a buffer in the remote process. This has been deemed
more secure than writing the Python code to be executed itself into a buffer in
the remote process, because it means that an attacker who has gained arbitrary
writes in a process but not arbitrary code execution or file system