105 changes: 105 additions & 0 deletions Doc/howto/free-threading-python.rst
@@ -165,3 +165,108 @@ to false. If the flag is true then the :class:`warnings.catch_warnings`
context manager uses a context variable for warning filters. If the flag is
false then :class:`~warnings.catch_warnings` modifies the global filters list,
which is not thread-safe. See the :mod:`warnings` module for more details.


Increased memory usage
----------------------

The free-threaded build typically uses more memory than the default build.
There are several reasons for this, most of which stem from design decisions.


All interned strings are immortal
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the default build of modern Python versions (since version 2.3), interning a
string (e.g. with :func:`sys.intern`) does not cause it to become immortal.
Instead, when the last reference to that string disappears, it is removed from
the interned string table. This is not the case for the free-threaded build:
any interned string becomes immortal and survives until interpreter shutdown.
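
As a minimal sketch of what interning does, the example below interns two equal
strings built at run time and checks that they end up as the same object. The
helper name ``intern_all`` is hypothetical and exists only for illustration. On
the free-threaded build, each distinct string interned this way stays alive
until interpreter shutdown, so it is probably wise to avoid interning an
unbounded set of values (for example, arbitrary user input).

```python
import sys


def intern_all(values):
    """Return the interned versions of the given strings (hypothetical helper)."""
    return [sys.intern(value) for value in values]


# Build two equal strings at run time rather than writing them as literals.
first = "".join(["free", "_", "threaded"])
second = "".join(["free", "_", "threaded"])

interned_first, interned_second = intern_all([first, second])

# After interning, both names refer to the single entry in the interned table.
same_object = interned_first is interned_second
```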


Non-GC objects have a larger object header
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The free-threaded build uses a different :c:type:`PyObject` structure. Instead
of having the GC-related information allocated before the :c:type:`PyObject`
structure, as in the default build, the GC-related information is part of the
normal object header. For example, on the AMD64 platform, ``None`` uses 32
bytes on the free-threaded build versus 16 bytes on the default build. GC
objects (such as dicts and lists) are the same size in both builds, since the
free-threaded build does not use additional space for the GC information.
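
One way to observe the difference is to compare :func:`sys.getsizeof` results
for a few non-GC objects on both builds. The sketch below is illustrative only:
the helper names are hypothetical, and it assumes that the ``Py_GIL_DISABLED``
configuration variable is a reliable indicator of a free-threaded build. The
exact sizes depend on the platform and Python version, so none are shown.

```python
import sys
import sysconfig


def is_free_threaded_build():
    """Return True if this interpreter appears to be a free-threaded build."""
    # Assumption: Py_GIL_DISABLED is 1 on free-threaded builds and 0 or None
    # on other builds.
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def non_gc_object_sizes():
    """Report sys.getsizeof() for a few objects that are not tracked by the GC."""
    samples = {"None": None, "float": 1.5, "empty str": ""}
    return {name: sys.getsizeof(obj) for name, obj in samples.items()}


print("free-threaded build:", is_free_threaded_build())
for name, size in non_gc_object_sizes().items():
    print(f"{name}: {size} bytes")
```

Running the same script under both interpreters and comparing the printed
sizes gives a rough picture of the per-object header overhead.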


QSBR can delay freeing of memory
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In order to safely implement lock-free data structures, the free-threaded build
uses a safe memory reclamation (SMR) scheme known as quiescent state-based
reclamation (QSBR). Memory that backs data structures allowing lock-free access
is released through QSBR, which defers the free operation rather than freeing
the memory immediately. Two examples of these data structures are the list
object and the dictionary keys object. See ``InternalDocs/qsbr.md`` in the
CPython source tree for more details on how QSBR is implemented. Running
:func:`gc.collect` should cause all memory held by QSBR to actually be freed.
Note that even after QSBR frees the memory, the underlying memory allocator may
not immediately return that memory to the OS, so the resident set size (RSS) of
the process might not decrease.
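
The following sketch shows where an explicit :func:`gc.collect` call might be
placed after dropping a large number of lists and dicts. The function name
``build_and_drop`` and the object count are arbitrary choices for illustration.
The example does not measure memory, because RSS may stay unchanged even when
the deferred frees have been processed.

```python
import gc


def build_and_drop(count=100_000):
    """Create and discard containers whose backing memory may go through QSBR."""
    data = [{"index": i} for i in range(count)]
    size = len(data)
    del data  # Drop the only reference; some frees may be deferred.
    return size


created = build_and_drop()

# Run a full collection. Per the text above, this should also cause memory
# held by QSBR to be freed on the free-threaded build.
unreachable = gc.collect()
```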


mimalloc allocator vs pymalloc
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The default build normally uses the "pymalloc" memory allocator for small
allocations (512 bytes or smaller). The free-threaded build does not use
pymalloc and allocates all Python objects using the "mimalloc" allocator. The
pymalloc allocator has the following properties that help keep memory usage
low: small per-allocated-block overhead, effective prevention of memory
fragmentation, and quick return of free memory to the operating system. The
mimalloc allocator also performs well in these respects but can have somewhat
more overhead.

In the free-threaded build, mimalloc manages memory in a number of separate
heaps (currently five). For example, all objects that support GC are allocated
from their own heap. Using separate heaps means that free memory in one heap
cannot be used for an allocation that uses another heap. Also, some heaps are
configured to use QSBR (quiescent state-based reclamation) when freeing the
memory that backs the heap (known as "pages" in mimalloc terminology). The use
of QSBR creates a delay between the point where all memory blocks in a page
have been freed and the point where the page itself is released, either for
new allocations or back to the OS.

The mimalloc allocator also defers returning freed memory back to the OS. You
can reduce that delay by setting the environment variable
:envvar:`!MIMALLOC_PURGE_DELAY` to ``0``. Note that this will likely reduce
the performance of the allocator.
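
The variable has to be in the environment when the interpreter starts. As a
hedged sketch, the helper below (the name ``run_with_purge_delay_zero`` is
hypothetical) launches a child Python process with ``MIMALLOC_PURGE_DELAY`` set
to ``0``. It only shows how to pass the setting along. It has no useful effect
on a build that does not use mimalloc.

```python
import os
import subprocess
import sys


def run_with_purge_delay_zero(python_args):
    """Run a child interpreter with MIMALLOC_PURGE_DELAY=0 in its environment."""
    # Copy the current environment and add the override for the child only.
    env = dict(os.environ, MIMALLOC_PURGE_DELAY="0")
    return subprocess.run(
        [sys.executable, *python_args],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )


# Example: confirm that the child process sees the variable.
result = run_with_purge_delay_zero(
    ["-c", "import os; print(os.environ['MIMALLOC_PURGE_DELAY'])"]
)
child_value = result.stdout.strip()
```

Setting the variable in the shell before starting Python works equally well.
The subprocess form is convenient when only some workloads should trade
allocator performance for a faster return of memory.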


Free-threaded reference counting can cause objects to live longer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the default build, when an object's reference count reaches zero, it is
normally deallocated. The free-threaded build uses "biased reference
counting", with a fast path for objects "owned" by the current thread and a
slow path for other objects. See :pep:`703` for additional details. Any time
an object's reference count ends up in a "queued" state, deallocation can be
deferred. The queued state is cleared from the "eval breaker" section of the
bytecode evaluator.

The free-threaded build also allows a different mode of reference counting,
known as "deferred reference counting". This mode is enabled by setting a flag
on a per-object basis. Deferred reference counting is enabled for the
following types:

* module objects
* module top-level functions
* class methods defined in the class scope
* descriptor objects
* thread-local objects, created by :class:`threading.local`

When deferred reference counting is enabled, references from Python function
stacks are not added to the reference count. This scheme reduces the overhead
of reference counting, especially for objects used from multiple threads.
Because the stack references are not counted, objects with deferred reference
counting are not immediately freed when their internal reference count goes to
zero. Instead, they are examined by the next GC run and, if no stack
references to them are found, they are freed. As a result, these objects are
freed by the GC rather than at the moment their reference count reaches zero,
as would normally be the case.
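
The effect can be observed with a weak reference. The sketch below is
illustrative and rests on an assumption: that a function created by ``exec``
at the top level of a namespace is treated like a module top-level function.
The name ``observe_function_lifetime`` is hypothetical. On the default build
the function is expected to be freed as soon as its last reference goes away,
whereas on the free-threaded build it may survive until the GC runs.

```python
import gc
import weakref


def observe_function_lifetime():
    """Return (freed_immediately, freed_after_gc) for a top-level function."""
    namespace = {}
    # Define a function at the top level of a fresh namespace.
    exec("def top_level():\n    return 42\n", namespace)

    # Remove the only strong reference while keeping a weak one.
    ref = weakref.ref(namespace.pop("top_level"))
    freed_immediately = ref() is None

    # With deferred reference counting, the object is examined by the GC.
    gc.collect()
    freed_after_gc = ref() is None
    return freed_immediately, freed_after_gc


freed_immediately, freed_after_gc = observe_function_lifetime()
print("freed without GC:", freed_immediately)
print("freed after GC:", freed_after_gc)
```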