Skip to content

Improve post-teardown performance#810

Merged
mjp41 merged 9 commits intomainfrom
post_teardown_perf
Feb 6, 2026
Merged

Improve post-teardown performance#810
mjp41 merged 9 commits intomainfrom
post_teardown_perf

Conversation

@mjp41
Copy link
Member

@mjp41 mjp41 commented Feb 5, 2026

This change improves the performance of post-teardown operations. (See #809)

The main changes are:

  • Reduce time spent in flushing a thread
  • Back-off performing flush if a thread is used a lot during teardown.
  • Disable fork protection by default
  • Improve branch prediction for the spin lock.

mjp41 added 2 commits February 5, 2026 16:58
- add perf test that compares baseline frees vs frees after ThreadAlloc teardown
- cover atexit/static destruction scenario to repro post-finalisation slowdown
The teardown code can be a performance bottleneck when a lot of deallocations occur after a thread begins to be torn down.

This code tracks how many operations have occured after the thread has been torndown, and will stop performing the tidying with an exponential back-off.

This takes the overhead on the microbenchmark from around 50x to 10x.
@mjp41 mjp41 force-pushed the post_teardown_perf branch from 75c1fc7 to bbd7edc Compare February 5, 2026 16:59
akrieger added a commit to akrieger/Cataclysm-DDA that referenced this pull request Feb 5, 2026
akrieger added a commit to akrieger/Cataclysm-DDA that referenced this pull request Feb 5, 2026
@mjp41 mjp41 marked this pull request as ready for review February 6, 2026 11:11
mjp41 added 4 commits February 6, 2026 11:38
This improves performance for most cases.  The feature is problematic on glibc as there is potential for it to work incorrectly.

This disables the feature by default, and can be enabled if someone really cares about it.
The in_use flag on pooled types is used to check for incorrect pool usage.  This is not required for correct code.  So moving to a Debug feature.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR is an experiment addressing #809 by reducing overhead in “post-teardown” allocator usage (e.g., static/global destruction), along with a few related performance-oriented tweaks and a new optional fork-protection build flag.

Changes:

  • Add a new perf micro-benchmark to measure deallocation cost before vs. after ThreadAlloc::teardown().
  • Reduce teardown/flush overhead in post-teardown scenarios (amortized flushing) and avoid expensive teardown-time validation work unless a mitigation is enabled.
  • Make fork protection (PreventFork) optional via a new CMake option and add small spin-lock micro-optimizations.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/test/perf/post_teardown/post-teardown.cc Adds a perf test reproducing #809-like “free after teardown” behavior.
src/snmalloc/mem/pooled.h Lowers overhead of pooled-object “in_use” tracking (debug vs release behavior).
src/snmalloc/mem/corealloc.h Optimizes flush() teardown paths (skip validation unless mitigation; early-return in slab release).
src/snmalloc/global/threadalloc.h Amortizes flush() calls across repeated teardown() invocations.
src/snmalloc/ds_aal/prevent_fork.h Makes PreventFork a no-op unless SNMALLOC_PTHREAD_FORK_PROTECTION is enabled.
src/snmalloc/ds_aal/flaglock.h Adds branch prediction hints to spin-lock acquisition loops.
CMakeLists.txt Adds SNMALLOC_PTHREAD_FORK_PROTECTION option and enforces atfork availability when enabled.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

CMakeLists.txt Outdated
add_as_define(SNMALLOC_PTHREAD_FORK_PROTECTION)
add_as_define(SNMALLOC_PLATFORM_HAS_GETENTROPY)
add_as_define(SNMALLOC_PTHREAD_ATFORK_WORKS)
#add_as_define(SNMALLOC_PTHREAD_ATFORK_WORKS)
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SNMALLOC_PTHREAD_ATFORK_WORKS is still computed via CHECK_CXX_SOURCE_COMPILES, but the compile definition export is commented out. As a result, the C++ code’s #ifdef SNMALLOC_PTHREAD_ATFORK_WORKS branches (e.g., prevent_fork.h’s pthread_atfork registration and src/test/func/protect_fork/protect_fork.cc) will behave as if atfork is unsupported even when CMake detected it. Please restore exporting this macro (or otherwise plumb the result into the preprocessor), especially since SNMALLOC_PTHREAD_FORK_PROTECTION depends on it.

Suggested change
#add_as_define(SNMALLOC_PTHREAD_ATFORK_WORKS)
add_as_define(SNMALLOC_PTHREAD_ATFORK_WORKS)

Copilot uses AI. Check for mistakes.
Comment on lines 46 to +56
static void ensure_init()
{
#ifdef SNMALLOC_PTHREAD_ATFORK_WORKS
# ifdef SNMALLOC_PTHREAD_ATFORK_WORKS
static stl::Atomic<bool> initialised{false};

if (initialised.load(stl::memory_order_acquire))
return;

pthread_atfork(prefork, postfork_parent, postfork_child);
initialised.store(true, stl::memory_order_release);
#endif
# endif
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When SNMALLOC_PTHREAD_FORK_PROTECTION is enabled, ensure_init() only calls pthread_atfork under #ifdef SNMALLOC_PTHREAD_ATFORK_WORKS. If that macro is not consistently defined by the build (and it currently isn’t exported from CMake), fork protection silently becomes a no-op even though the option is ON. Consider making the pthread_atfork path depend solely on SNMALLOC_PTHREAD_FORK_PROTECTION (given the CMake fatal-error guard), or ensure the SNMALLOC_PTHREAD_ATFORK_WORKS define is always provided when detected.

Copilot uses AI. Check for mistakes.
}
};
#else
// The fork protection can cost a lot and it generally not required.
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammar: "it generally not required" should be "it is generally not required".

Suggested change
// The fork protection can cost a lot and it generally not required.
// The fork protection can cost a lot and it is generally not required.

Copilot uses AI. Check for mistakes.
Comment on lines +117 to 121
times_teardown_called++;
if (bits::is_pow2(times_teardown_called) || times_teardown_called < 128)
alloc->flush();
AllocPool<Config>::release(alloc);
alloc = const_cast<Alloc*>(&default_alloc);
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ThreadAlloc::teardown() now sometimes releases an allocator back to AllocPool without calling alloc->flush(). Since flush() is what drains fast free lists, processes the message queue, and posts the remote-dealloc cache, skipping it can leave remote frees unposted and memory retained in an allocator that may sit idle in the pool for a long time. If the intent is to amortize teardown cost, consider introducing a cheaper “post-teardown flush” that at least processes/publishes remote deallocations (and/or document why it’s safe to skip a full flush before pooling).

Copilot uses AI. Check for mistakes.
@mjp41 mjp41 changed the title Address #809 Improve post-teardown performance Feb 6, 2026
@mjp41 mjp41 merged commit 6dbf08a into main Feb 6, 2026
185 checks passed
@mjp41 mjp41 deleted the post_teardown_perf branch February 6, 2026 17:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants