Avoid use-after-free with stdexec::run_loop #1746
Conversation
@ccotter This should fix your additional concerns regarding lifetime. Would you mind actually checking it with Relacy for me?
After thinking about it, I would actually prefer this over #1742, since it fixes all users of `run_loop`. @ccotter, @ericniebler, what do you think?
This still ends up being problematic... one race (verified with Relacy, as I described in #1742 (comment)) will be that a background thread can push the no-op task, but immediately after the push in https://github.com/NVIDIA/stdexec/blob/6302fbe/include/stdexec/__detail/__atomic_intrusive_queue.hpp#L44, and before line 49, the sync_wait thread observes the no-op item, processes it, decrements `__task_count_` to 0, then exits `run()` and destroys the state.

I'm nearly convinced this can't be done with atomics, and that we need to move back to the mutex/cv.

It may not be worth dealing with atomics in run_loop. I'm not sure where else it'll be used, but from the sync_wait perspective, it will be used sparingly in programs (in many cases, exactly once in `main()`).
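For illustration, here is a minimal sketch of that window, assuming a Treiber-style push that publishes the node and then notifies in a separate step; the struct and member names are hypothetical simplifications of the linked `__atomic_intrusive_queue`, not stdexec's actual code.

```cpp
#include <atomic>

struct __task {
  __task* __next_ = nullptr;
};

struct __queue_sketch {
  std::atomic<__task*> __head_{nullptr};

  void push(__task* __t) noexcept {
    __task* __old = __head_.load(std::memory_order_relaxed);
    do {
      __t->__next_ = __old;
    } while (!__head_.compare_exchange_weak(
      __old, __t, std::memory_order_acq_rel));
    // ^ the publish (like line 44 of the linked file): the consumer can now
    //   pop the no-op item, drop __task_count_ to 0, return from run(), and
    //   destroy the object that owns this queue...
    __head_.notify_all();  // ...before this wake-up (like line 49) runs:
                           // a potential use-after-free on __head_
  }
};
```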
> This still ends up being problematic... one race (verified with Relacy as I described in #1742 (comment)) will be that a background thread can push the no-op task, but immediately after the push in https://github.com/NVIDIA/stdexec/blob/6302fbe/include/stdexec/__detail/__atomic_intrusive_queue.hpp#L44, and before line 49, the sync_wait thread observes the no-op item, processes it, decrements `__task_count_` to 0, then exits `run()` and destroys the state.

Thanks for checking, too bad...

> I'm nearly convinced this can't be done with atomics, and that we need to move back to the mutex/cv.

I have one more idea which I want to try...

> It may not be worth dealing with atomics in run_loop. I'm not sure where else it'll be used, but from the sync_wait perspective, it will be used sparingly in programs (in many cases, exactly once in `main()`).

It is used in `exec::single_thread_context` as well, and I think it's indeed a nice abstraction for these purposes.

Regarding `sync_wait` usage, I slightly disagree: there are cases where you want to call it in multiple places in a program. It's the primary way to 'return' from the async world, basically one way to switch from red to blue functions, and sometimes essential to interface with other libraries.
The latest commit addresses the one remaining use-after-free bug. I ran the Relacy tests myself, and they now execute cleanly. I also added the sync_wait test, and accounted for the missing support for `std::atomic_ref` in Relacy. @ccotter thanks for the hints!
We need to synchronize returning from `__run_loop_base::run` with potentially concurrent calls to `__run_loop_base::finish`. This is done by introducing a counter, ensuring proper completion of all tasks in flight. Also see NVIDIA#1742 for additional information.
- Properly synchronizing `finish` with `run` using task counts
- Adding a sync_wait Relacy test for verification
- Adapting the atomic wrappers to account for missing `std::atomic_ref` in Relacy
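To make that concrete, here is a minimal hedged sketch of the counter idea, with hypothetical names rather than the PR's actual code; the comment on the final decrement-and-notify marks the delicate step whose ordering the Relacy tests were used to validate.

```cpp
#include <atomic>
#include <cstddef>

struct __run_loop_sketch {
  std::atomic<std::size_t> __task_count_{0};  // finish() calls in flight
  std::atomic<bool> __finishing_{false};

  void finish() noexcept {
    // Register this call before publishing the stop signal, so that run()
    // cannot return while we are still touching members.
    __task_count_.fetch_add(1, std::memory_order_relaxed);
    __finishing_.store(true, std::memory_order_release);
    __finishing_.notify_all();
    // The delicate step: once the count hits 0, run() may return and the
    // caller may destroy *this, so ordering this decrement against the
    // final notify correctly is the whole game.
    __task_count_.fetch_sub(1, std::memory_order_acq_rel);
    __task_count_.notify_all();
  }

  void run() noexcept {
    __finishing_.wait(false, std::memory_order_acquire);
    // Drain in-flight finish() calls before returning and letting the
    // caller destroy the loop.
    for (auto __n = __task_count_.load(std::memory_order_acquire); __n != 0;
         __n = __task_count_.load(std::memory_order_acquire)) {
      __task_count_.wait(__n, std::memory_order_acquire);
    }
  }
};
```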
Force-pushed from 113e104 to 8baf34b
First off, I want to say I admire the exploration of different ways for atomics to safely implement a race/deadlock/memory-bug-free sync_wait/run_loop. Thanks for adding the Relacy test in - I think that would have helped detect the bug when it crept in with #1683. I'm glad it was able to help prove the correctness of this latest version. I added #1749 to see if we should add these tests into a more standard workflow like CI.
True. There are many programs that I think will have exactly one sync_wait, in main. There will be some programs that need to transfer from red to blue functions, although I don't know if sync_wait's implementation choice (atomics vs mutex) would dominate those programs' performance; and if it did, I think it'd be time for those programs to re-evaluate why they have so many sync_waits. And yes, I did ignore the fact that run_loop is not just an implementation detail of sync_wait, and that it's used in static_thread_pool, but also that it's a standalone C++26 component which may be used outside of sync_wait/static_thread_pool! We need to consider all of this for how we address the existing bug.

In the most recent version of the fix, I think what can happen now is that […]

If I can try to summarize, and with a myopic view of only addressing sync_wait for the sake of technical discussion, the core of what we are trying to solve simplifies to creating a class with an interface like the below (I know the actual […]).

Goal

There are a few ways we can try to implement this.

Using atomics with wait/notify

The problem with this is that a use-after-free may occur if the thread pool thread sets is_done to true, then wait sees is_done is true and exits, leading to the sync_object being destroyed, all before the thread pool thread can call is_done.notify_one().

Using atomics with a spin loop (no wait/notify)

This doesn't have any use-after-free bugs, but relies on spinning in some cases.

Heap-allocated state with atomic wait/notify

This is safe because it ensures that even if the thread that owns the sync_object goes out of scope, the signalling thread still has access to the atomic communication variable for as long as it needs it. However, this incurs the cost of a heap allocation plus shared_ptr's atomics.

Using mutex/cv

I believe this is safe in terms of races/deadlocks and memory, with no extra spinning.

Summary

I'm not sure if this explores the full design space, but I think it shows some of the tradeoffs. We know the first version (atomics with wait/notify) has use-after-free bugs that I don't think we can get around, given the inherent nature of […]. The spin loop version is memory safe, but has the unfortunate spinning. Then we get to the heap solution, which is safe at the cost of the heap allocation plus the shared_ptr's atomics. For some cases, like sync_wait using run_loop, a mutex/cv is probably no worse than atomics, as there will be almost no contention (and on some platforms, no additional kernel calls compared to atomics). But for longer-lived / contended use cases, like static_thread_pool (or others, given this is a public C++26 class), I wonder (without providing much more to go on) if we can specifically address finish() with a mutex/cv, without the core (and possibly numerous/contended, as in the static_thread_pool) push/pop operations using the mutex/cv when the run_loop is not shutting down...

Anyway, this has been an interesting deep dive!
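To ground the comparison, here is a hedged sketch of the first and last variants; `sync_object` and its members are illustrative names reconstructed from the prose above (the original interface snippet did not survive extraction), not stdexec's API.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

// Variant 1: atomics with wait/notify -- the use-after-free variant.
// wait() is called by the sync_wait thread, signal() by a pool thread,
// and the object is destroyed as soon as wait() returns.
struct atomic_sync_object {
  std::atomic<bool> is_done{false};

  void wait() {
    is_done.wait(false, std::memory_order_acquire);
    // The owner may destroy *this as soon as we return...
  }

  void signal() {
    is_done.store(true, std::memory_order_release);
    // ...yet this thread still has one access left to make:
    is_done.notify_one();  // potential use-after-free
  }
};

// Variant 4: mutex/cv -- believed safe. Notifying while holding the lock
// means the waiter cannot re-acquire the mutex, return, and destroy *this
// until signal() has finished touching the members.
struct mutex_sync_object {
  std::mutex mtx;
  std::condition_variable cv;
  bool is_done = false;

  void wait() {
    std::unique_lock<std::mutex> lk(mtx);
    cv.wait(lk, [this] { return is_done; });
  }

  void signal() {
    std::lock_guard<std::mutex> lk(mtx);
    is_done = true;
    cv.notify_one();  // deliberately under the lock
  }
};
```

The spin-loop and heap-allocated variants from the list trade this differently: the former replaces blocking with a busy-wait on `is_done`, and the latter keeps the flag in a `std::shared_ptr` so the signalling thread holds it alive for as long as it needs.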