@@ -165,3 +165,108 @@ to false. If the flag is true then the :class:`warnings.catch_warnings`
165 165 context manager uses a context variable for warning filters. If the flag is
166 166 false then :class:`~warnings.catch_warnings` modifies the global filters list,
167 167 which is not thread-safe. See the :mod:`warnings` module for more details.
168+
169+
170+ Increased memory usage
171+ ----------------------
172+
173+ The free-threaded build typically uses more memory than the default build.
174+ There are several reasons for this, most of them deliberate design trade-offs.
175+
176+
177+ All interned strings are immortal
178+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
179+
180+ In the default build of modern Python versions (since version 2.3), interning
181+ a string (e.g. with :func:`sys.intern`) does not make it immortal. Instead, if
182+ the last reference to that string disappears, it is removed from the interned
183+ string table. This is not the case for the free-threaded build: any interned
184+ string becomes immortal and survives until interpreter shutdown.
185+
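As a rough illustration of why this matters, the sketch below interns a batch of distinct strings and then drops every reference to them. The helper names and the key format are hypothetical. On the default build the strings can be removed from the intern table once unreferenced; on the free-threaded build they would be expected to stay alive until shutdown, so interning unbounded, dynamically generated strings is worth avoiding.

```python
import sys
import sysconfig


def is_free_threaded_build() -> bool:
    """Return True if this interpreter was compiled with the GIL disabled."""
    # Py_GIL_DISABLED is 1 on free-threaded builds, and 0 or None otherwise.
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def intern_request_keys(count: int) -> list[str]:
    """Intern `count` distinct, hypothetical key strings and return them."""
    return [sys.intern(f"hypothetical-request-key-{i}") for i in range(count)]


keys = intern_request_keys(1_000)
# Default build: once unreferenced, these strings can leave the intern table.
# Free-threaded build: they remain allocated until interpreter shutdown.
del keys
print("free-threaded build:", is_free_threaded_build())
```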
186+
187+ Non-GC objects have a larger object header
188+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
189+
190+ The free-threaded build uses a different :c:type:`PyObject` structure. Instead
191+ of allocating the GC-related information before the :c:type:`PyObject`
192+ structure, as the default build does, it stores that information in the normal
193+ object header. For example, on the AMD64 platform, ``None`` uses 32 bytes on
194+ the free-threaded build vs 16 bytes for the default build. GC objects (such as
195+ dicts and lists) are the same size for both builds since the free-threaded
196+ build does not use additional space for the GC info.
197+
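One way to compare header sizes is to run the same small script under both builds. The sketch below uses :func:`sys.getsizeof`; the ``report_sizes`` helper is hypothetical, and the exact numbers depend on the platform and Python version, so no expected output is shown.

```python
import sys


def report_sizes() -> dict[str, int]:
    """Return the size in bytes that sys.getsizeof reports for a few objects."""
    samples = {
        "None": None,          # non-GC object
        "object()": object(),  # non-GC object
        "[]": [],              # GC object
        "{}": {},              # GC object
    }
    return {label: sys.getsizeof(value) for label, value in samples.items()}


for label, size in report_sizes().items():
    print(f"{label:10} {size} bytes")
```

Running this once with each interpreter and comparing the two outputs should show the difference for the non-GC objects.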
198+
199+ QSBR can delay freeing of memory
200+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
201+
202+ To safely implement lock-free data structures, the free-threaded build uses a
203+ safe memory reclamation (SMR) scheme known as quiescent state-based reclamation
204+ (QSBR). Memory that backs data structures supporting lock-free access is
205+ released through QSBR, which defers the free operation rather than immediately
206+ freeing the memory. Two examples of these data structures are the list object
207+ and the dictionary keys object. See ``InternalDocs/qsbr.md`` in the CPython
208+ source tree for more details on how QSBR is implemented. Running
209+ :func:`gc.collect` should cause all memory being held by QSBR to be actually
210+ freed. Note that even when QSBR frees the memory, mimalloc may not immediately
211+ return that memory to the OS and so the resident set size (RSS) of the process
212+ might not decrease.
213+
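A minimal sketch of forcing a collection after dropping a large structure follows. The ``drop_and_collect`` helper is hypothetical; the only API relied on is :func:`gc.collect`, which returns the number of unreachable objects it found. As noted above, the process RSS may still not shrink afterwards.

```python
import gc


def drop_and_collect(item_count: int) -> int:
    """Build a large list of dicts, drop it, then run a full collection."""
    data = [{"index": i} for i in range(item_count)]
    del data  # on the free-threaded build, some backing memory may be deferred
    # A full collection should also release memory still held by QSBR.
    return gc.collect()


unreachable = drop_and_collect(100_000)
print("unreachable objects found:", unreachable)
```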
214+
215+ mimalloc allocator vs pymalloc
216+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
217+
218+ The default build will normally use the "pymalloc" memory allocator for small
219+ allocations (512 bytes or smaller). The free-threaded build does not use
220+ pymalloc and allocates all Python objects using the "mimalloc" allocator. The
221+ pymalloc allocator has the following properties that help keep memory usage
222+ low: its per-block overhead is small, it effectively prevents memory
223+ fragmentation, and it quickly returns free memory to the operating system. The
224+ mimalloc allocator also does quite well in these respects but can have
225+ somewhat more overhead.
226+
227+ In the free-threaded build, mimalloc manages memory in a number of separate
228+ heaps (currently five). For example, all GC-supporting objects are allocated
229+ from their own heap. Using separate heaps means that free memory in one heap
230+ cannot be used for an allocation that uses another heap. Also, some heaps are
231+ configured to use QSBR (quiescent state-based reclamation) when freeing the
232+ memory that backs the heap (known as "pages" in mimalloc terminology). The
233+ details of QSBR are their own topic but the short summary is that it creates a
234+ delay between the object being freed and the memory being released, either for
235+ new allocations or back to the OS.
236+
237+ The mimalloc allocator also defers returning memory back to the OS. You can
238+ reduce that delay by setting the environment variable
239+ :envvar:`!MIMALLOC_PURGE_DELAY` to ``0``. Note that this will likely reduce
240+ the performance of the allocator.
241+
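The variable can be exported in the shell, or, as sketched below, set for a child interpreter launched from Python. The ``run_with_purge_delay_zero`` helper is hypothetical. Setting the variable in the environment before the interpreter starts is the conservative choice, since that guarantees the allocator can see it during initialization; on a default build the variable is expected to have no effect.

```python
import os
import subprocess
import sys


def run_with_purge_delay_zero(code: str) -> subprocess.CompletedProcess:
    """Run `code` in a child interpreter with MIMALLOC_PURGE_DELAY=0."""
    # Copy the current environment and add the variable for the child only.
    env = dict(os.environ, MIMALLOC_PURGE_DELAY="0")
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=False,
    )


result = run_with_purge_delay_zero("print('child finished')")
print(result.stdout.strip())
```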
242+
243+ Free-threaded reference counting can cause objects to live longer
244+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
245+
246+ In the default build, when an object's reference count reaches zero, it is
247+ normally deallocated. The free-threaded build uses "biased reference
248+ counting", with a fast path for objects "owned" by the current thread and a
249+ slow path for other objects. See :pep:`703` for additional details. Any time
250+ an object's reference count ends up in a "queued" state, deallocation can be
251+ deferred. The queued state is cleared from the "eval breaker" section of the
252+ bytecode evaluator.
253+
254+ The free-threaded build also allows a different mode of reference counting,
255+ known as "deferred reference counting". This mode is enabled by setting a flag
256+ on a per-object basis. Deferred reference counting is enabled for the
257+ following types:
258+
259+ * module objects
260+ * module top-level functions
261+ * class methods defined in the class scope
262+ * descriptor objects
263+ * thread-local objects, created by :class:`_thread._local`
264+
265+ When deferred reference counting is enabled, references from Python function
266+ stacks are not added to the reference count. This scheme reduces the overhead
267+ of reference counting, especially for objects used from multiple threads.
268+ Because the stack references are not counted, objects with deferred reference
269+ counting are not immediately freed when their internal reference count goes to
270+ zero. Instead, they are examined by the next GC run and, if no stack
271+ references to them are found, they are freed. In other words, these objects are
272+ freed by the GC rather than at the moment their reference count reaches zero.
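The effect can be observed, under some assumptions, with a weak reference to a module top-level function. In the sketch below the module name and the helper are hypothetical, and whether the function is still alive before the collection depends on the build and Python version, so only the state after :func:`gc.collect` is treated as predictable.

```python
import gc
import types
import weakref


def observe_top_level_function() -> tuple[bool, bool]:
    """Report whether a dropped top-level function is alive before/after GC."""
    module = types.ModuleType("hypothetical_module")
    # Define a function at the top level of the hypothetical module.
    exec("def top_level():\n    pass\n", module.__dict__)
    ref = weakref.ref(module.top_level)
    del module.top_level  # remove the only strong reference
    alive_before_gc = ref() is not None  # build-dependent
    gc.collect()
    alive_after_gc = ref() is not None
    return alive_before_gc, alive_after_gc


before, after = observe_top_level_function()
print("alive before gc.collect():", before)
print("alive after gc.collect():", after)
```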