diff --git a/peps/pep-0768.rst b/peps/pep-0768.rst
index 32cbe17ba5d..b266904f203 100644
--- a/peps/pep-0768.rst
+++ b/peps/pep-0768.rst
@@ -141,8 +141,10 @@ A new structure is added to PyThreadState to support remote debugging:
 This structure is appended to ``PyThreadState``, adding only a few fields that
 are **never accessed during normal execution**. The ``debugger_pending_call`` field
-indicates when a debugger has requested execution, while ``debugger_script``
-provides Python code to be executed when the interpreter reaches a safe point.
+indicates when a debugger has requested execution, while ``debugger_script_path``
+provides a filesystem path to a Python source file (.py) that will be executed when
+the interpreter reaches a safe point. The path must point to a Python source file,
+not compiled Python code (.pyc) or any other format.

 The value for ``MAX_SCRIPT_PATH_SIZE`` will be a trade-off between binary size
 and how big debugging scripts' paths can be. To limit the memory overhead per
@@ -177,7 +179,7 @@ debugger support:
 These offsets allow debuggers to locate critical debugging control structures in
 the target process's memory space. The ``eval_breaker`` and ``remote_debugger_support``
 offsets are relative to each ``PyThreadState``, while the ``debugger_pending_call``
-and ``debugger_script`` offsets are relative to each ``_PyRemoteDebuggerSupport``
+and ``debugger_script_path`` offsets are relative to each ``_PyRemoteDebuggerSupport``
 structure, allowing the new structure and its fields to be found regardless of
 where they are in memory. ``debugger_script_path_size`` informs the attaching
 tool of the size of the buffer.
@@ -200,13 +202,19 @@ When a debugger wants to attach to a Python process, it follows these steps:

 5. Write control information:

-   - Write a filename containing Python code to be executed into the
-     ``debugger_script`` field in ``_PyRemoteDebuggerSupport``.
+   - Most debuggers will pause the process before writing to its memory. This is
+     standard practice for tools like GDB, which use SIGSTOP or ptrace to pause the process.
+     This approach prevents races when writing to process memory. Profilers and other tools
+     that don't wish to stop the process can still use this interface, but they need to
+     handle possible races. This is a normal consideration for profilers.
+
+   - Write a file path to a Python source file (.py) into the
+     ``debugger_script_path`` field in ``_PyRemoteDebuggerSupport``.
    - Set ``debugger_pending_call`` flag in ``_PyRemoteDebuggerSupport`` to 1
    - Set ``_PY_EVAL_PLEASE_STOP_BIT`` in the ``eval_breaker`` field

-Once the interpreter reaches the next safe point, it will execute the script
-provided by the debugger.
+Once the interpreter reaches the next safe point, it will execute the Python code
+contained in the file specified by the debugger.
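+
+As a rough sketch of step 5 from the attaching tool's side (illustrative only:
+``read_memory`` and ``write_memory`` are hypothetical helpers standing in for
+``process_vm_writev``, ``ptrace`` or an equivalent platform facility, and the
+offsets are assumed to have already been read from the debug offsets structure
+in the earlier steps), the writes could look like:
+
+.. code-block:: python
+
+    def send_debugger_script(pid, tstate_addr, offsets, script_path, please_stop_bit):
+        # ``offsets`` holds the field offsets read from the target's debug
+        # offsets structure; ``please_stop_bit`` is the value of
+        # ``_PY_EVAL_PLEASE_STOP_BIT`` for the target CPython version.
+        support_addr = tstate_addr + offsets["remote_debugger_support"]
+
+        path = str(script_path).encode() + b"\x00"
+        if len(path) > offsets["debugger_script_path_size"]:
+            raise ValueError("script path does not fit in the remote buffer")
+
+        # Step 5a: write the path to the Python source file.
+        write_memory(pid, support_addr + offsets["debugger_script_path"], path)
+
+        # Step 5b: set the debugger_pending_call flag to 1.
+        write_memory(pid, support_addr + offsets["debugger_pending_call"],
+                     (1).to_bytes(4, "little"))
+
+        # Step 5c: set _PY_EVAL_PLEASE_STOP_BIT in this thread's eval_breaker
+        # field (read-modify-write; the target is assumed to be stopped, as
+        # discussed above). Field widths and byte order are platform dependent
+        # and only illustrative here.
+        breaker_addr = tstate_addr + offsets["eval_breaker"]
+        breaker = int.from_bytes(read_memory(pid, breaker_addr, 8), "little")
+        write_memory(pid, breaker_addr,
+                     (breaker | please_stop_bit).to_bytes(8, "little"))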

 Interpreter Integration
 -----------------------
@@ -237,7 +245,7 @@ to be audited or disabled if desired by a system's administrator.
     if (tstate->eval_breaker) {
         if (tstate->remote_debugger_support.debugger_pending_call) {
             tstate->remote_debugger_support.debugger_pending_call = 0;
-            const char *path = tstate->remote_debugger_support.debugger_script;
+            const char *path = tstate->remote_debugger_support.debugger_script_path;
             if (*path) {
                 if (0 != PySys_Audit("debugger_script", "%s", path)) {
                     PyErr_Clear();
@@ -273,16 +281,21 @@ arbitrary Python code within the context of a specified Python process:

 .. code-block:: python

-    def remote_exec(pid: int, code: str, timeout: int = 0) -> None:
+    def remote_exec(pid: int, script: str|bytes|PathLike) -> None:
         """
-        Executes a block of Python code in a given remote Python process.
+        Executes a file containing Python code in a given remote Python process.
+
+        This function returns immediately, and the code will be executed by the
+        target process's main thread at the next available opportunity, similarly
+        to how signals are handled. There is no interface to determine when the
+        code has been executed. The caller is responsible for making sure that
+        the file still exists whenever the remote process tries to read it and that
+        it hasn't been overwritten.

         Args:
             pid (int): The process ID of the target Python process.
-            code (str): A string containing the Python code to be executed.
-            timeout (int): An optional timeout for waiting for the remote
-                process to execute the code. If the timeout is exceeded a
-                ``TimeoutError`` will be raised.
+            script (str|bytes|PathLike): The path to a file containing
+                the Python code to be executed.
         """

 An example usage of the API would look like:
@@ -290,11 +303,13 @@ An example usage of the API would look like:
 .. code-block:: python

     import sys
+    import uuid

     # Execute a print statement in a remote Python process with PID 12345
+    script = f"/tmp/{uuid.uuid4()}.py"
+    with open(script, "w") as f:
+        f.write("print('Hello from remote execution!')")
     try:
-        sys.remote_exec(12345, "print('Hello from remote execution!')", timeout=3)
-    except TimeoutError:
-        print(f"The remote process took too long to execute the code")
+        sys.remote_exec(12345, script)
     except Exception as e:
         print(f"Failed to execute code: {e}")
@@ -322,6 +337,36 @@ feature.
 This way, tools can offer a useful error message explaining why they
 won't work, instead of believing that they have attached and then never having
 their script run.

+Multi-threading Considerations
+------------------------------
+
+The overall execution pattern resembles how Python handles signals internally.
+The interpreter guarantees that injected code only runs at safe points, never
+interrupting atomic operations within the interpreter itself. This approach
+ensures that debugging operations cannot corrupt the interpreter state while
+still providing timely execution in most real-world scenarios.
+
+However, debugging code injected through this interface can execute in any
+thread. This behavior is different than how Python handles signals, since
+signal handlers can only run in the main thread. If a debugger wants to inject
+code into every running thread, it must inject it into every ``PyThreadState``.
+If a debugger wants to run code in the first available thread, it needs to
+inject it into every ``PyThreadState``, and that injected code must check
+whether it has already been run by another thread (likely by setting some flag
+in the globals of some module).
+
+Note that the Global Interpreter Lock (GIL) continues to govern execution as
+normal when the injected code runs. This means if a target thread is currently
+executing a C extension that holds the GIL continuously, the injected code
+won't be able to run until that operation completes and the GIL becomes
+available. However, the interface introduces no additional GIL contention
+beyond what the injected code itself requires. Importantly, the interface
+remains fully compatible with Python's free-threaded mode.
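+
+As a minimal sketch of the "run once in the first available thread" pattern
+described above (illustrative only, not part of the proposed API; the choice of
+``builtins`` as the module holding the flag, and the flag and payload names, are
+arbitrary), the script a debugger injects into every ``PyThreadState`` could
+look like:
+
+.. code-block:: python
+
+    # Injected into every thread; only the first thread that reaches a safe
+    # point runs the payload, the rest see the flag and do nothing.
+    import builtins
+
+    if not getattr(builtins, "_debug_payload_has_run", False):
+        builtins._debug_payload_has_run = True  # flag in a module's globals
+
+        # The actual debugging payload goes here; printing the current stack
+        # is just an example.
+        import traceback
+        traceback.print_stack()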
+
+It may be useful for a debugger that injected some code to be run to follow
+that up by sending some pre-registered signal to the process, which can
+interrupt any blocking I/O or sleep states waiting for external resources, and
+allow a safe opportunity to run the injected code.

 Backwards Compatibility
 =======================
@@ -454,8 +499,8 @@ Rejected Ideas
 Writing Python code into the buffer
 -----------------------------------

-We have chosen to have debuggers write the code to be executed into a file
-whose path is written into a buffer in the remote process. This has been deemed
+We have chosen to have debuggers write the path to a file containing Python code
+into a buffer in the remote process. This has been deemed
 more secure than writing the Python code to be executed itself into a buffer in
 the remote process, because it means that an attacker who has gained arbitrary
 writes in a process but not arbitrary code execution or file system