From 21295beca9b3e6986527753fb845d99bcd8b1af8 Mon Sep 17 00:00:00 2001
From: Neil Schemenauer
Date: Mon, 29 Dec 2025 16:18:11 -0800
Subject: [PATCH 1/4] gh-135898: Add section to free-threading howto about memory usage.

---
 Doc/howto/free-threading-python.rst | 105 ++++++++++++++++++++++++++++
 1 file changed, 105 insertions(+)

diff --git a/Doc/howto/free-threading-python.rst b/Doc/howto/free-threading-python.rst
index 380c2be04957d5..3af6275849172e 100644
--- a/Doc/howto/free-threading-python.rst
+++ b/Doc/howto/free-threading-python.rst
@@ -165,3 +165,108 @@ to false. If the flag is true then the :class:`warnings.catch_warnings`
 context manager uses a context variable for warning filters. If the flag is
 false then :class:`~warnings.catch_warnings` modifies the global filters list,
 which is not thread-safe. See the :mod:`warnings` module for more details.
+
+
+Increased memory usage
+----------------------
+
+The free-threaded build will typically use more memory compared to the default
+build. There are multiple reasons for this, mostly due to design decisions.
+
+
+All interned strings are immortal
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For modern Python versions (since version 2.3), interning a string (e.g. with
+:func:`sys.intern`) does not cause it to become immortal. Instead, if the last
+reference to that string disappears, it will be removed from the interned
+string table. This is not the case for the free-threaded build and any interned
+string will become immortal, surviving until interpreter shutdown.
+
+
+Non-GC objects have a larger object header
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The free-threaded build uses a different :c:type:`PyObject` structure. Instead
+of having the GC related information allocated before the :c:type:`PyObject`
+structure, like in the default build, the GC related info is part of the normal
+object header. For example, on the AMD64 platform, ``None`` uses 32 bytes on
+the free-threaded build vs 16 bytes for the default build. GC objects (such as
+dicts and lists) are the same size for both builds since the free-threaded
+build does not use additional space for the GC info.
+
+
+QSBR can delay freeing of memory
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In order to safely implement lock-free data structures, a safe memory
+reclamation (SMR) scheme is used, known as quiescent state-based reclamation
+(QSBR). This means that the memory backing data structures allowing lock-free
+access will use QSBR, which defers the free operation, rather than immediately
+freeing the memory. Two examples of these data structures are the list object
+and the dictionary keys object. See ``InternalDocs/qsbr.md`` in the CPython
+source tree for more details on how QSBR is implemented. Running
+:func:`gc.collect` should cause all memory being held by QSBR to be actually
+freed. Note that even when QSBR frees the memory, mimalloc may not immediately
+return that memory to the OS and so the resident set size (RSS) of the process
+might not decrease.
+
+
+mimalloc allocator vs pymalloc
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The default build will normally use the "pymalloc" memory allocator for small
+allocations (512 bytes or smaller). The free-threaded build does not use
+pymalloc and allocates all Python objects using the "mimalloc" allocator. The
+pymalloc allocator has the following properties that help keep memory usage
+low: per-allocated-block overhead is small, effectively prevents memory
+fragmentation, and quickly returns free memory to the operating system. The
+mimalloc allocator does quite well in these respects as well but can have some
+more overhead.
+
+In the free-threaded build, mimalloc manages memory in a number of separate
+heaps (currently five). For example, all GC supporting objects are allocated
+from their own heap. Using separate heaps means that free memory in one heap
+cannot be used for an allocation that uses another heap. Also, some heaps are
+configured to use QSBR (quiescent-state based reclamation) when freeing the
+memory that backs up the heap (known as "pages" in mimalloc terminology). The
+details of QSBR are their own topic but the short summary is that it creates a
+delay between the object being freed and the memory being released, either for
+new allocations or back to the OS.
+
+The mimalloc allocator also defers returning memory back to the OS. You can
+reduce that delay by setting the environment variable
+:envvar:`!MIMALLOC_PURGE_DELAY` to ``0``. Note that this will likely reduce
+the performance of the allocator.
+
+
+Free-threaded reference counting can cause objects to live longer
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In the default build, when an object's reference count reaches zero, it is
+normally deallocated. The free-threaded build uses "biased reference
+counting", with a fast-path for objects "owned" by the current thread and a
+slow path for other objects. See :pep:`703` for additional details. Any time
+an object's reference count ends up in a "queued" state, deallocation can be
+deferred. The queued state is cleared from the "eval breaker" section of the
+bytecode evaluator.
+
+The free-threaded build also allows a different mode of reference counting,
+known as "deferred reference counting". This mode is enabled by setting a flag
+on a per-object basis. Deferred reference counting is enabled for the
+following types:
+
+* module objects
+* module top-level functions
+* class methods defined in the class scope
+* descriptor objects
+* thread-local objects, created by :class:`_thread._local`
+
+When deferred reference counting is enabled, references from Python function
+stacks are not added to the reference count. This scheme reduces the overhead
+of reference counting, especially for objects used from multiple threads.
+Because the stack references are not counted, objects with deferred reference
+counting are not immediately freed when their internal reference count goes to
+zero. Instead, they are examined by the next GC run and, if no stack
+references to them are found, they are freed. This means these objects are
+freed by the GC and not when their reference count goes to zero, as is typical.

From ffc08a2df4a66490cfc502061c3dc70b6da9fd75 Mon Sep 17 00:00:00 2001
From: Neil Schemenauer
Date: Mon, 29 Dec 2025 16:25:26 -0800
Subject: [PATCH 2/4] Replace _thread._local with thread.local.

---
 Doc/howto/free-threading-python.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Doc/howto/free-threading-python.rst b/Doc/howto/free-threading-python.rst
index 3af6275849172e..3a01c95224b76e 100644
--- a/Doc/howto/free-threading-python.rst
+++ b/Doc/howto/free-threading-python.rst
@@ -260,7 +260,7 @@ following types:
 * module top-level functions
 * class methods defined in the class scope
 * descriptor objects
-* thread-local objects, created by :class:`_thread._local`
+* thread-local objects, created by :class:`thread.local`
 
 When deferred reference counting is enabled, references from Python function
 stacks are not added to the reference count. This scheme reduces the overhead

From c2cc5d839ec430cf53c298dfa475a780c3058da0 Mon Sep 17 00:00:00 2001
From: Neil Schemenauer
Date: Mon, 29 Dec 2025 16:33:44 -0800
Subject: [PATCH 3/4] Improve wording of some text.

---
 Doc/howto/free-threading-python.rst | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/Doc/howto/free-threading-python.rst b/Doc/howto/free-threading-python.rst
index 3a01c95224b76e..b8670df169bb4b 100644
--- a/Doc/howto/free-threading-python.rst
+++ b/Doc/howto/free-threading-python.rst
@@ -207,9 +207,9 @@ freeing the memory. Two examples of these data structures are the list object
 and the dictionary keys object. See ``InternalDocs/qsbr.md`` in the CPython
 source tree for more details on how QSBR is implemented. Running
 :func:`gc.collect` should cause all memory being held by QSBR to be actually
-freed. Note that even when QSBR frees the memory, mimalloc may not immediately
-return that memory to the OS and so the resident set size (RSS) of the process
-might not decrease.
+freed. Note that even when QSBR frees the memory, the underlying memory
+allocator may not immediately return that memory to the OS and so the resident
+set size (RSS) of the process might not decrease.
 
 
 mimalloc allocator vs pymalloc
@@ -219,8 +219,8 @@ The default build will normally use the "pymalloc" memory allocator for small
 allocations (512 bytes or smaller). The free-threaded build does not use
 pymalloc and allocates all Python objects using the "mimalloc" allocator. The
 pymalloc allocator has the following properties that help keep memory usage
-low: per-allocated-block overhead is small, effectively prevents memory
-fragmentation, and quickly returns free memory to the operating system. The
+low: small per-allocated-block overhead, effective memory fragmentation
+prevention, and quick return of free memory to the operating system. The
 mimalloc allocator does quite well in these respects as well but can have some
 more overhead.
 
@@ -230,12 +230,12 @@ from their own heap. Using separate heaps means that free memory in one heap
 cannot be used for an allocation that uses another heap. Also, some heaps are
 configured to use QSBR (quiescent-state based reclamation) when freeing the
 memory that backs up the heap (known as "pages" in mimalloc terminology). The
-details of QSBR are their own topic but the short summary is that it creates a
-delay between the object being freed and the memory being released, either for
-new allocations or back to the OS.
+use of QSBR creates a delay between all memory blocks for a page being freed
+and the memory page being released, either for new allocations or back to the
+OS.
 
-The mimalloc allocator also defers returning memory back to the OS. You can
-reduce that delay by setting the environment variable
+The mimalloc allocator also defers returning freed memory back to the OS. You
+can reduce that delay by setting the environment variable
 :envvar:`!MIMALLOC_PURGE_DELAY` to ``0``. Note that this will likely reduce
 the performance of the allocator.
 

From 97e4a9d46c9b08ccc5057c2006d7497e810d9323 Mon Sep 17 00:00:00 2001
From: Neil Schemenauer
Date: Mon, 29 Dec 2025 16:41:49 -0800
Subject: [PATCH 4/4] Argh, fix class ref.

---
 Doc/howto/free-threading-python.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Doc/howto/free-threading-python.rst b/Doc/howto/free-threading-python.rst
index b8670df169bb4b..f6c400f3de7ff8 100644
--- a/Doc/howto/free-threading-python.rst
+++ b/Doc/howto/free-threading-python.rst
@@ -260,7 +260,7 @@ following types:
 * module top-level functions
 * class methods defined in the class scope
 * descriptor objects
-* thread-local objects, created by :class:`thread.local`
+* thread-local objects, created by :class:`threading.local`
 
 When deferred reference counting is enabled, references from Python function
 stacks are not added to the reference count. This scheme reduces the overhead
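
Reviewer-side sketch, not part of the patch series: the object sizes quoted in the "Non-GC objects have a larger object header" section can be spot-checked by running the same script under a default build and a free-threaded build and comparing the output. The helper names ``is_free_threaded_build`` and ``object_sizes`` are hypothetical and exist only for this illustration. It assumes ``sysconfig.get_config_var("Py_GIL_DISABLED")`` is a reasonable way to detect the build type and that ``sys.getsizeof`` is an adequate proxy for the header sizes the text describes.

```python
import sys
import sysconfig


def is_free_threaded_build():
    """Return True when the interpreter was compiled with free threading.

    Py_GIL_DISABLED is a build-time config variable; it is missing or 0
    on default builds.
    """
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def object_sizes():
    """Report sys.getsizeof() for one non-GC object and two GC objects."""
    return {
        "None": sys.getsizeof(None),  # non-GC object: just the object header
        "list": sys.getsizeof([]),    # GC-tracked object
        "dict": sys.getsizeof({}),    # GC-tracked object
    }


if __name__ == "__main__":
    build = "free-threaded" if is_free_threaded_build() else "default"
    print(f"{build} build, Python {sys.version.split()[0]}")
    for name, size in object_sizes().items():
        print(f"{name:>5}: {size} bytes")
```

No expected numbers are given here on purpose, since they depend on platform and Python version. If the text of patch 1 holds, the ``None`` figure should differ between the two builds while the ``list`` and ``dict`` figures match.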