Improve post-teardown performance by mjp41 · Pull Request #810 · microsoft/snmalloc

mjp41 · 2026-02-05T16:58:22Z

This change improves the performance of post-teardown operations. (See #809)

The main changes are:

Reduce time spent in flushing a thread
Back-off performing flush if a thread is used a lot during teardown.
Disable fork protection by default
Improve branch prediction for the spin lock.

- add perf test that compares baseline frees vs frees after ThreadAlloc teardown - cover atexit/static destruction scenario to repro post-finalisation slowdown

The teardown code can be a performance bottleneck when a lot of deallocations occur after a thread begins to be torn down. This code tracks how many operations have occured after the thread has been torndown, and will stop performing the tidying with an exponential back-off. This takes the overhead on the microbenchmark from around 50x to 10x.

This improves performance for most cases. The feature is problematic on glibc as there is potential for it to work incorrectly. This disables the feature by default, and can be enabled if someone really cares about it.

The in_use flag on pooled types is used to check for incorrect pool usage. This is not required for correct code. So moving to a Debug feature.

Copilot

Pull request overview

This PR is an experiment addressing #809 by reducing overhead in “post-teardown” allocator usage (e.g., static/global destruction), along with a few related performance-oriented tweaks and a new optional fork-protection build flag.

Changes:

Add a new perf micro-benchmark to measure deallocation cost before vs. after ThreadAlloc::teardown().
Reduce teardown/flush overhead in post-teardown scenarios (amortized flushing) and avoid expensive teardown-time validation work unless a mitigation is enabled.
Make fork protection (PreventFork) optional via a new CMake option and add small spin-lock micro-optimizations.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
src/test/perf/post_teardown/post-teardown.cc	Adds a perf test reproducing #809-like “free after teardown” behavior.
src/snmalloc/mem/pooled.h	Lowers overhead of pooled-object “in_use” tracking (debug vs release behavior).
src/snmalloc/mem/corealloc.h	Optimizes `flush()` teardown paths (skip validation unless mitigation; early-return in slab release).
src/snmalloc/global/threadalloc.h	Amortizes `flush()` calls across repeated `teardown()` invocations.
src/snmalloc/ds_aal/prevent_fork.h	Makes `PreventFork` a no-op unless `SNMALLOC_PTHREAD_FORK_PROTECTION` is enabled.
src/snmalloc/ds_aal/flaglock.h	Adds branch prediction hints to spin-lock acquisition loops.
CMakeLists.txt	Adds `SNMALLOC_PTHREAD_FORK_PROTECTION` option and enforces atfork availability when enabled.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-06T14:01:03Z

CMakeLists.txt

+add_as_define(SNMALLOC_PTHREAD_FORK_PROTECTION)
 add_as_define(SNMALLOC_PLATFORM_HAS_GETENTROPY)
-add_as_define(SNMALLOC_PTHREAD_ATFORK_WORKS)
+#add_as_define(SNMALLOC_PTHREAD_ATFORK_WORKS)


SNMALLOC_PTHREAD_ATFORK_WORKS is still computed via CHECK_CXX_SOURCE_COMPILES, but the compile definition export is commented out. As a result, the C++ code’s #ifdef SNMALLOC_PTHREAD_ATFORK_WORKS branches (e.g., prevent_fork.h’s pthread_atfork registration and src/test/func/protect_fork/protect_fork.cc) will behave as if atfork is unsupported even when CMake detected it. Please restore exporting this macro (or otherwise plumb the result into the preprocessor), especially since SNMALLOC_PTHREAD_FORK_PROTECTION depends on it.

Suggested change

#add_as_define(SNMALLOC_PTHREAD_ATFORK_WORKS)

add_as_define(SNMALLOC_PTHREAD_ATFORK_WORKS)

Copilot · 2026-02-06T14:01:03Z

src/snmalloc/ds_aal/prevent_fork.h

    static void ensure_init()
    {
-#ifdef SNMALLOC_PTHREAD_ATFORK_WORKS
+#  ifdef SNMALLOC_PTHREAD_ATFORK_WORKS
      static stl::Atomic<bool> initialised{false};

      if (initialised.load(stl::memory_order_acquire))
        return;

      pthread_atfork(prefork, postfork_parent, postfork_child);
      initialised.store(true, stl::memory_order_release);
-#endif
+#  endif


When SNMALLOC_PTHREAD_FORK_PROTECTION is enabled, ensure_init() only calls pthread_atfork under #ifdef SNMALLOC_PTHREAD_ATFORK_WORKS. If that macro is not consistently defined by the build (and it currently isn’t exported from CMake), fork protection silently becomes a no-op even though the option is ON. Consider making the pthread_atfork path depend solely on SNMALLOC_PTHREAD_FORK_PROTECTION (given the CMake fatal-error guard), or ensure the SNMALLOC_PTHREAD_ATFORK_WORKS define is always provided when detected.

Copilot · 2026-02-06T14:01:04Z

src/snmalloc/ds_aal/prevent_fork.h

    }
  };
+#else
+  // The fork protection can cost a lot and it generally not required.


Grammar: "it generally not required" should be "it is generally not required".

Suggested change

// The fork protection can cost a lot and it generally not required.

// The fork protection can cost a lot and it is generally not required.

Copilot · 2026-02-06T14:01:04Z

src/snmalloc/global/threadalloc.h

+      times_teardown_called++;
+      if (bits::is_pow2(times_teardown_called) || times_teardown_called < 128)
+        alloc->flush();
      AllocPool<Config>::release(alloc);
      alloc = const_cast<Alloc*>(&default_alloc);


ThreadAlloc::teardown() now sometimes releases an allocator back to AllocPool without calling alloc->flush(). Since flush() is what drains fast free lists, processes the message queue, and posts the remote-dealloc cache, skipping it can leave remote frees unposted and memory retained in an allocator that may sit idle in the pool for a long time. If the intent is to amortize teardown cost, consider introducing a cheaper “post-teardown flush” that at least processes/publishes remote deallocations (and/or document why it’s safe to skip a full flush before pooling).

mjp41 added 2 commits February 5, 2026 16:58

Add perf microbenchmark for post-teardown frees (issue #809)

657bc67

- add perf test that compares baseline frees vs frees after ThreadAlloc teardown - cover atexit/static destruction scenario to repro post-finalisation slowdown

mjp41 force-pushed the post_teardown_perf branch from 75c1fc7 to bbd7edc Compare February 5, 2026 16:59

mjp41 mentioned this pull request Feb 5, 2026

snmalloc pessimal performance in post-main cleanup #809

Closed

akrieger added a commit to akrieger/Cataclysm-DDA that referenced this pull request Feb 5, 2026

Apply teardown optimizations from microsoft/snmalloc#810

024a9da

akrieger added a commit to akrieger/Cataclysm-DDA that referenced this pull request Feb 5, 2026

Apply teardown optimizations from microsoft/snmalloc/pull/810

0cf759c

mjp41 marked this pull request as ready for review February 6, 2026 11:11

mjp41 added 4 commits February 6, 2026 11:38

Make pthread_fork protection opt in.

ecf4a5c

This improves performance for most cases. The feature is problematic on glibc as there is potential for it to work incorrectly. This disables the feature by default, and can be enabled if someone really cares about it.

Add branch predictor hints to spin lock

1680012

Make in use checking only in debug builds

e0c80df

The in_use flag on pooled types is used to check for incorrect pool usage. This is not required for correct code. So moving to a Debug feature.

Clangformat

898e77d

mjp41 requested review from SchrodingerZhu and Copilot February 6, 2026 12:56

Copilot started reviewing on behalf of mjp41 February 6, 2026 13:54 View session

Copilot AI reviewed Feb 6, 2026

View reviewed changes

mjp41 changed the title ~~Address #809~~ Improve post-teardown performance Feb 6, 2026

mjp41 added 2 commits February 6, 2026 14:10

Code review!

52cf718

Fix protect fork benchmarks

0c05f5b

SchrodingerZhu approved these changes Feb 6, 2026

View reviewed changes

Clangformat

a7b3711

mjp41 merged commit 6dbf08a into main Feb 6, 2026
185 checks passed

mjp41 deleted the post_teardown_perf branch February 6, 2026 17:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve post-teardown performance#810

Improve post-teardown performance#810
mjp41 merged 9 commits intomainfrom
post_teardown_perf

mjp41 commented Feb 5, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 6, 2026

Uh oh!

Copilot AI Feb 6, 2026

Uh oh!

Copilot AI Feb 6, 2026

Uh oh!

Copilot AI Feb 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	#add_as_define(SNMALLOC_PTHREAD_ATFORK_WORKS)
	add_as_define(SNMALLOC_PTHREAD_ATFORK_WORKS)

	// The fork protection can cost a lot and it generally not required.
	// The fork protection can cost a lot and it is generally not required.

Conversation

mjp41 commented Feb 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mjp41 commented Feb 5, 2026 •

edited

Loading