From b4e3c9932fdac943c80dc8c50ad8060119ba967e Mon Sep 17 00:00:00 2001
From: Pablo Galindo
Date: Wed, 26 Feb 2025 22:12:57 +0000
Subject: [PATCH 01/10] PEP 768: Add some clarifications and minor edits

---
 peps/pep-0768.rst | 41 ++++++++++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/peps/pep-0768.rst b/peps/pep-0768.rst
index 32cbe17ba5d..5c5d2bc2c32 100644
--- a/peps/pep-0768.rst
+++ b/peps/pep-0768.rst
@@ -141,8 +141,10 @@ A new structure is added to PyThreadState to support remote debugging:
 
 This structure is appended to ``PyThreadState``, adding only a few fields
 that are **never accessed during normal execution**. The ``debugger_pending_call`` field
-indicates when a debugger has requested execution, while ``debugger_script``
-provides Python code to be executed when the interpreter reaches a safe point.
+indicates when a debugger has requested execution, while ``debugger_script_path``
+provides a filesystem path to a Python source file (.py) that will be executed when
+the interpreter reaches a safe point. The path must point to a Python source file,
+not compiled Python code (.pyc) or any other format.
 
 The value for ``MAX_SCRIPT_PATH_SIZE`` will be a trade-off between binary size
 and how big debugging scripts' paths can be. To limit the memory overhead per
@@ -177,7 +179,7 @@ debugger support:
 These offsets allow debuggers to locate critical debugging control structures in
 the target process's memory space. The ``eval_breaker`` and ``remote_debugger_support``
 offsets are relative to each ``PyThreadState``, while the ``debugger_pending_call``
-and ``debugger_script`` offsets are relative to each ``_PyRemoteDebuggerSupport``
+and ``debugger_script_path`` offsets are relative to each ``_PyRemoteDebuggerSupport``
 structure, allowing the new structure and its fields to be found regardless of
 where they are in memory. ``debugger_script_path_size`` informs the attaching
 tool of the size of the buffer.
@@ -200,13 +202,19 @@ When a debugger wants to attach to a Python process, it follows these steps:
 
 5. Write control information:
 
-   - Write a filename containing Python code to be executed into the
-     ``debugger_script`` field in ``_PyRemoteDebuggerSupport``.
+   - Most debuggers will pause the process before writing to its memory. This is
+     standard practice for tools like GDB, which use SIGSTOP or ptrace to pause the process.
+     This approach prevents races when writing to process memory. Profilers and other tools
+     that don't wish to stop the process can still use this interface, but they need to
+     handle possible races, which is a normal consideration for profilers in general.
+
+   - Write a file path to a Python source file (.py) into the
+     ``debugger_script_path`` field in ``_PyRemoteDebuggerSupport``.
    - Set ``debugger_pending_call`` flag in ``_PyRemoteDebuggerSupport`` to 1
    - Set ``_PY_EVAL_PLEASE_STOP_BIT`` in the ``eval_breaker`` field
 
-Once the interpreter reaches the next safe point, it will execute the script
-provided by the debugger.
+Once the interpreter reaches the next safe point, it will execute the Python code
+contained in the file specified by the debugger.
 
 Interpreter Integration
 -----------------------
@@ -237,7 +245,7 @@ to be audited or disabled if desired by a system's administrator.
     if (tstate->eval_breaker) {
         if (tstate->remote_debugger_support.debugger_pending_call) {
             tstate->remote_debugger_support.debugger_pending_call = 0;
-            const char *path = tstate->remote_debugger_support.debugger_script;
+            const char *path = tstate->remote_debugger_support.debugger_script_path;
             if (*path) {
                 if (0 != PySys_Audit("debugger_script", "%s", path)) {
                     PyErr_Clear();
@@ -273,16 +281,17 @@ arbitrary Python code within the context of a specified Python process:
 
 .. code-block:: python
 
-    def remote_exec(pid: int, code: str, timeout: int = 0) -> None:
+    def remote_exec(pid: int, code: str) -> None:
         """
         Executes a block of Python code in a given remote Python process.
 
+        This function returns immediately, and the code will be executed at the next
+        available opportunity in the target process, similar to how signals are handled.
+        There is no way to determine when or if the code has been executed.
+
         Args:
             pid (int): The process ID of the target Python process.
             code (str): A string containing the Python code to be executed.
-            timeout (int): An optional timeout for waiting for the remote
-                process to execute the code. If the timeout is exceeded a
-                ``TimeoutError`` will be raised.
         """
 
 An example usage of the API would look like:
@@ -292,9 +301,7 @@ An example usage of the API would look like:
     import sys
     # Execute a print statement in a remote Python process with PID 12345
     try:
-        sys.remote_exec(12345, "print('Hello from remote execution!')", timeout=3)
-    except TimeoutError:
-        print(f"The remote process took too long to execute the code")
+        sys.remote_exec(12345, "print('Hello from remote execution!')")
     except Exception as e:
         print(f"Failed to execute code: {e}")
 
@@ -454,8 +461,8 @@ Rejected Ideas
 Writing Python code into the buffer
 -----------------------------------
 
-We have chosen to have debuggers write the code to be executed into a file
-whose path is written into a buffer in the remote process. This has been deemed
+We have chosen to have debuggers write the path to a file containing Python code
+into a buffer in the remote process. This has been deemed
 more secure than writing the Python code to be executed itself into a buffer in
 the remote process, because it means that an attacker who has gained arbitrary
 writes in a process but not arbitrary code execution or file system

From 1dfc8b4e7a3ad90944e44fcd75bfddac98a7a077 Mon Sep 17 00:00:00 2001
From: Pablo Galindo
Date: Wed, 26 Feb 2025 22:33:15 +0000
Subject: [PATCH 02/10] add paragraph about multiple threads

---
 peps/pep-0768.rst | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/peps/pep-0768.rst b/peps/pep-0768.rst
index 5c5d2bc2c32..f03de901fca 100644
--- a/peps/pep-0768.rst
+++ b/peps/pep-0768.rst
@@ -329,6 +329,37 @@ feature. This way, tools can offer a useful error message explaining why they
 won't work, instead of believing that they have attached and then never having
 their script run.
 
+Multi-threading Considerations
+------------------------------
+
+Debugging code injected through this interface executes opportunistically in
+whichever thread first encounters a safe evaluation point after the request is
+made. This behavior mirrors how Python handles signals, providing a reliable
+execution model without adding overhead. For developers needing to target
+specific threads, the debug script can be installed only on the desired thread
+structure or in all of them if needed.
+
+The Global Interpreter Lock (GIL) continues to govern execution as normal when
+debug code runs. This means if a target thread is currently executing a C
+extension that holds the GIL without releasing it, the debug code will wait
+until that operation completes and the GIL becomes available. However, the
+interface introduces no additional GIL contention beyond what the debugging
+code itself requires. Importantly, the interface remains fully compatible with
+Python's free-threaded mode, where the GIL is not held, allowing debugger code
+to execute in any available thread.
+
+In situations where all threads in the target process are blocked—waiting on I/O
+operations, sleep states, or external resources—the debugging code might not
+execute immediately. In these cases, debuggers can send a pre-registered signal
+to the process, which will interrupt the sleep state and force thread scheduling,
+creating an opportunity for the debug code to run or leverage any other mechanism
+that can force the target process to resume execution.
+
+The execution pattern closely resembles how Python handles signals internally.
+The interpreter guarantees that debug code only runs at safe points, never
+interrupting atomic operations within the interpreter itself. This approach
+ensures that debugging operations cannot corrupt the interpreter state while
+still providing timely execution in most real-world scenarios.
 
 Backwards Compatibility
 =======================

From 02ee13900c93860df780febe77943dfb8ffa303a Mon Sep 17 00:00:00 2001
From: Pablo Galindo
Date: Wed, 26 Feb 2025 22:51:48 +0000
Subject: [PATCH 03/10] fixup! add paragraph about multiple threads

---
 peps/pep-0768.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/peps/pep-0768.rst b/peps/pep-0768.rst
index f03de901fca..835085845de 100644
--- a/peps/pep-0768.rst
+++ b/peps/pep-0768.rst
@@ -110,7 +110,7 @@ expressions, and step through code dynamically. This approach would align
 Python's debugging capabilities with those of other major programming languages
 and debugging tools that support this mode.
 
-Specification
+/Specification
 =============
 
 
@@ -334,10 +334,10 @@ Multi-threading Considerations
 
 Debugging code injected through this interface executes opportunistically in
 whichever thread first encounters a safe evaluation point after the request is
-made. This behavior mirrors how Python handles signals, providing a reliable
-execution model without adding overhead. For developers needing to target
+made. This behavior is different on how Python handles signals in the sense that
+signal handlers can only run in the main thread. For developers needing to target
 specific threads, the debug script can be installed only on the desired thread
-structure or in all of them if needed.
+structure.
 
 The Global Interpreter Lock (GIL) continues to govern execution as normal when
 debug code runs. This means if a target thread is currently executing a C

From 5afff45d652af9a9ef10dce88191d424e2448abd Mon Sep 17 00:00:00 2001
From: Pablo Galindo
Date: Wed, 26 Feb 2025 22:53:20 +0000
Subject: [PATCH 04/10] fixup! fixup! add paragraph about multiple threads

---
 peps/pep-0768.rst | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/peps/pep-0768.rst b/peps/pep-0768.rst
index 835085845de..5535910233b 100644
--- a/peps/pep-0768.rst
+++ b/peps/pep-0768.rst
@@ -332,12 +332,12 @@ their script run.
 Multi-threading Considerations
 ------------------------------
 
-Debugging code injected through this interface executes opportunistically in
-whichever thread first encounters a safe evaluation point after the request is
-made. This behavior is different on how Python handles signals in the sense that
-signal handlers can only run in the main thread. For developers needing to target
-specific threads, the debug script can be installed only on the desired thread
-structure.
+Debugging code injected through this interface executes opportunistically in the
+thread where the debugging information has been written first encounters a safe
+evaluation point after the request is made. This behavior is different on how
+Python handles signals in the sense that signal handlers can only run in the
+main thread. For developers needing to target any thread, the debug script can
+be installed on all threads.
 
 The Global Interpreter Lock (GIL) continues to govern execution as normal when
 debug code runs. This means if a target thread is currently executing a C

From bcae670a57be04e94f0ae98e1cf73ae691b57f23 Mon Sep 17 00:00:00 2001
From: Pablo Galindo
Date: Wed, 26 Feb 2025 22:57:01 +0000
Subject: [PATCH 05/10] fixup! fixup! fixup! add paragraph about multiple threads

---
 peps/pep-0768.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/peps/pep-0768.rst b/peps/pep-0768.rst
index 5535910233b..10c91f75093 100644
--- a/peps/pep-0768.rst
+++ b/peps/pep-0768.rst
@@ -111,7 +111,7 @@ Python's debugging capabilities with those of other major programming languages
 and debugging tools that support this mode.
 
 /Specification
-=============
+==============
 
 
 This proposal introduces a safe debugging mechanism that allows external

From e38edd65cdf1c2118d8916068f6b099fff8e9d67 Mon Sep 17 00:00:00 2001
From: Matt Wozniski
Date: Wed, 26 Feb 2025 18:14:13 -0500
Subject: [PATCH 06/10] fixup! fixup! fixup! fixup! add paragraph about multiple threads

---
 peps/pep-0768.rst | 55 +++++++++++++++++++++++------------------------
 1 file changed, 27 insertions(+), 28 deletions(-)

diff --git a/peps/pep-0768.rst b/peps/pep-0768.rst
index 10c91f75093..a6c3037a2df 100644
--- a/peps/pep-0768.rst
+++ b/peps/pep-0768.rst
@@ -110,8 +110,8 @@ expressions, and step through code dynamically. This approach would align
 Python's debugging capabilities with those of other major programming languages
 and debugging tools that support this mode.
 
-/Specification
-==============
+Specification
+=============
 
 
 This proposal introduces a safe debugging mechanism that allows external
@@ -206,7 +206,7 @@ When a debugger wants to attach to a Python process, it follows these steps:
      standard practice for tools like GDB, which use SIGSTOP or ptrace to pause the process.
      This approach prevents races when writing to process memory. Profilers and other tools
      that don't wish to stop the process can still use this interface, but they need to
-     handle possible races, which is a normal consideration for profilers in general.
+     handle possible races. This is a normal consideration for profilers.
 
    - Write a file path to a Python source file (.py) into the
      ``debugger_script_path`` field in ``_PyRemoteDebuggerSupport``.
@@ -332,35 +332,34 @@ their script run.
 Multi-threading Considerations
 ------------------------------
 
-Debugging code injected through this interface executes opportunistically in the
-thread where the debugging information has been written first encounters a safe
-evaluation point after the request is made. This behavior is different on how
-Python handles signals in the sense that signal handlers can only run in the
-main thread. For developers needing to target any thread, the debug script can
-be installed on all threads.
-
-The Global Interpreter Lock (GIL) continues to govern execution as normal when
-debug code runs. This means if a target thread is currently executing a C
-extension that holds the GIL without releasing it, the debug code will wait
-until that operation completes and the GIL becomes available. However, the
-interface introduces no additional GIL contention beyond what the debugging
-code itself requires. Importantly, the interface remains fully compatible with
-Python's free-threaded mode, where the GIL is not held, allowing debugger code
-to execute in any available thread.
-
-In situations where all threads in the target process are blocked—waiting on I/O
-operations, sleep states, or external resources—the debugging code might not
-execute immediately. In these cases, debuggers can send a pre-registered signal
-to the process, which will interrupt the sleep state and force thread scheduling,
-creating an opportunity for the debug code to run or leverage any other mechanism
-that can force the target process to resume execution.
-
-The execution pattern closely resembles how Python handles signals internally.
-The interpreter guarantees that debug code only runs at safe points, never
+The overall execution pattern resembles how Python handles signals internally.
+The interpreter guarantees that injected code only runs at safe points, never
 interrupting atomic operations within the interpreter itself. This approach
 ensures that debugging operations cannot corrupt the interpreter state while
 still providing timely execution in most real-world scenarios.
 
+However, debugging code injected through this interface can execute in any
+thread. This behavior is different than how Python handles signals, since
+signal handlers can only run in the main thread. If a debugger wants to inject
+code into every running thread, it must inject it into every ``PyThreadState``.
+If a debugger wants to run code in the first available thread, it needs to
+inject it into every ``PyThreadState``, and that injected code must check
+whether it has already been run by another thread (likely by setting some flag
+in the globals of some module).
+
+Note that the Global Interpreter Lock (GIL) continues to govern execution as
+normal when the injected code runs. This means if a target thread is currently
+executing a C extension that holds the GIL continuously, the injected code
+won't be able to run until that operation completes and the GIL becomes
+available. However, the interface introduces no additional GIL contention
+beyond what the injected code itself requires. Importantly, the interface
+remains fully compatible with Python's free-threaded mode.
+
+It may be useful for a debugger that injected some code to be run to follow
+that up by sending some pre-registered signal to the process, which can
+iterrupt any blocking I/O or sleep states waiting for external resources, and
+allow a safe opportunity to run the injected code.
+
 Backwards Compatibility
 =======================
 

From d53b85a1362ed94add0397e3bd2611266369b21c Mon Sep 17 00:00:00 2001
From: Matt Wozniski
Date: Wed, 26 Feb 2025 20:01:34 -0500
Subject: [PATCH 07/10] Update `remote_exec` to use a path

This makes it clearer that it is the end user's responsibility to
arrange for the file to still be valid even if the process calling
`remote_exec` dies before the remote process tries to read the file.

Signed-off-by: Matt Wozniski
---
 peps/pep-0768.rst | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/peps/pep-0768.rst b/peps/pep-0768.rst
index a6c3037a2df..07b908e04ae 100644
--- a/peps/pep-0768.rst
+++ b/peps/pep-0768.rst
@@ -281,17 +281,21 @@ arbitrary Python code within the context of a specified Python process:
 
 .. code-block:: python
 
-    def remote_exec(pid: int, code: str) -> None:
+    def remote_exec(pid: int, script: str|bytes|PathLike) -> None:
         """
-        Executes a block of Python code in a given remote Python process.
+        Executes a file containing Python code in a given remote Python process.
 
-        This function returns immediately, and the code will be executed at the next
-        available opportunity in the target process, similar to how signals are handled.
-        There is no way to determine when or if the code has been executed.
+        This function returns immediately, and the code will be executed by the
+        target process's main thread at the next available opportunity, similarly
+        to how signals are handled. There is no interface to determine when the
+        code has been executed. The caller is responsible for making sure that
+        the file still exists whenever the remote process tries to read it, or at
+        leas that it hasn't been overwritten by a malicious actor.
 
         Args:
             pid (int): The process ID of the target Python process.
-            code (str): A string containing the Python code to be executed.
+            script (str|bytes|PathLike): The path to a file containing
+                the Python code to be executed.
         """
 
 An example usage of the API would look like:
@@ -299,9 +303,13 @@ An example usage of the API would look like:
 .. code-block:: python
 
     import sys
+    import uuid
     # Execute a print statement in a remote Python process with PID 12345
+    script = f"/tmp/{uuid.uuid4()}.py"
+    with open(script, "w") as f:
+        f.write("print('Hello from remote execution!')")
     try:
-        sys.remote_exec(12345, "print('Hello from remote execution!')")
+        sys.remote_exec(12345, script)
     except Exception as e:
         print(f"Failed to execute code: {e}")
 

From 7bb9e9753bf1deaa5c253e50fd42cf4e064a17c5 Mon Sep 17 00:00:00 2001
From: Pablo Galindo Salgado
Date: Thu, 27 Feb 2025 12:39:38 +0000
Subject: [PATCH 08/10] Update pep-0768.rst

Co-authored-by: ivonastojanovic <80911834+ivonastojanovic@users.noreply.github.com>
---
 peps/pep-0768.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/peps/pep-0768.rst b/peps/pep-0768.rst
index 07b908e04ae..1c2dd89e45a 100644
--- a/peps/pep-0768.rst
+++ b/peps/pep-0768.rst
@@ -290,7 +290,7 @@ arbitrary Python code within the context of a specified Python process:
         to how signals are handled. There is no interface to determine when the
         code has been executed. The caller is responsible for making sure that
         the file still exists whenever the remote process tries to read it, or at
-        leas that it hasn't been overwritten by a malicious actor.
+        least that it hasn't been overwritten by a malicious actor.
 
         Args:
             pid (int): The process ID of the target Python process.

From 54681823f7045be895805cc97953ce81db87f9d1 Mon Sep 17 00:00:00 2001
From: Pablo Galindo Salgado
Date: Thu, 27 Feb 2025 21:56:24 +0000
Subject: [PATCH 09/10] Update peps/pep-0768.rst

---
 peps/pep-0768.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/peps/pep-0768.rst b/peps/pep-0768.rst
index 1c2dd89e45a..6027f31f69b 100644
--- a/peps/pep-0768.rst
+++ b/peps/pep-0768.rst
@@ -289,8 +289,8 @@ arbitrary Python code within the context of a specified Python process:
         target process's main thread at the next available opportunity, similarly
         to how signals are handled. There is no interface to determine when the
         code has been executed. The caller is responsible for making sure that
-        the file still exists whenever the remote process tries to read it, or at
-        least that it hasn't been overwritten by a malicious actor.
+        the file still exists whenever the remote process tries to read it and that
+        it hasn't been overwritten.
 
         Args:
             pid (int): The process ID of the target Python process.

From 95997458ff83c5e1620c73e06e10a00d8f198704 Mon Sep 17 00:00:00 2001
From: Pablo Galindo Salgado
Date: Tue, 4 Mar 2025 00:56:14 +0000
Subject: [PATCH 10/10] Update peps/pep-0768.rst

---
 peps/pep-0768.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/peps/pep-0768.rst b/peps/pep-0768.rst
index 6027f31f69b..b266904f203 100644
--- a/peps/pep-0768.rst
+++ b/peps/pep-0768.rst
@@ -365,7 +365,7 @@ remains fully compatible with Python's free-threaded mode.
 
 It may be useful for a debugger that injected some code to be run to follow
 that up by sending some pre-registered signal to the process, which can
-iterrupt any blocking I/O or sleep states waiting for external resources, and
+interrupt any blocking I/O or sleep states waiting for external resources, and
 allow a safe opportunity to run the injected code.
 
 Backwards Compatibility