Avoid control threads race condition #20002
Conversation
When starting the dt GUI and calling `gtk_main()` we must be sure we can safely dispatch a control job. We can do so only if all control threads are running. We achieve this by adding a `control->running` atomic counter which is increased whenever one of the thread functions has started. Since we know the number of requested threads, we can check & wait (for up to 1 sec) until all threads are up and running before leaving `dt_control_jobs_init()`. The new function `gboolean dt_control_all_running()` compares running vs. requested threads and is used to check whether we can finally enter `gtk_main()`.
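For context, a minimal standalone sketch of the pattern described above, using C11 atomics instead of darktable's `dt_atomic_*` wrappers. The names (`control_sketch_t`, `running`, `num_threads`, `all_threads_running()`) are illustrative only and are not the identifiers introduced by this PR.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct control_sketch_t
{
  atomic_int running;   // incremented once by each scheduler thread as it starts up
  int num_threads;      // number of threads we asked pthread_create() for
} control_sketch_t;

// Each worker thread bumps the counter as its very first action,
// analogous to the dt_atomic_add_int(&s->running_jobs, 1) calls in this PR.
static void *worker_sketch(void *arg)
{
  control_sketch_t *c = arg;
  atomic_fetch_add(&c->running, 1);
  /* ... wait for and execute control jobs ... */
  return NULL;
}

// Counterpart of the dt_control_all_running() idea: true once every requested
// thread has reached its thread function, i.e. a dispatched job can no longer
// be missed by a not-yet-started worker.
static bool all_threads_running(control_sketch_t *c)
{
  return atomic_load(&c->running) >= c->num_threads;
}
```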
Can't reproduce a hang right now, maybe the logo change?

Sorry - I don't understand.

The darktable icon changed to the Santa hat.
```c
  dt_pthread_setname(name);
  free(params);
  const int32_t threadid_res = _control_get_threadid_res();
  dt_atomic_add_int(&s->running_jobs, 1);
```
We have three places that add 1 to `running_jobs`; how does this work then? Or am I not understanding the meaning of this new variable? Is it the current number of running jobs, or the total number of parallel threads started at this point?
It's the fixed number of running threads used by the job scheduler. Maybe a rename?
More precisely: we know the overall number of threads used. But we don't know (without the new counters) whether a specific thread is already running (that's pthread behaviour). So we test & wait until the OS scheduler has been friendly enough. On my system the logs look like this:
```
0,0669 [dt_control_jobs_init] 6 of 11 threads running
0,0680 [dt_control_jobs_init] 11 of 11 threads running
```
and if we run dt without this test we might be in trouble. Very likely a simple usleep() would do it but we can't be sure.
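For illustration, continuing the standalone sketch from the PR description above (hypothetical names, not the actual darktable code): the bounded wait polls the counter rather than relying on a blind `usleep()`, and gives up after roughly one second.

```c
#include <stdio.h>
#include <unistd.h>   // usleep()

// Sketch of the wait performed at the end of dt_control_jobs_init():
// poll until all requested threads have checked in, for at most ~1 s.
static void wait_for_threads_sketch(control_sketch_t *c)
{
  for(int iter = 0; iter < 1000 && !all_threads_running(c); iter++)
    usleep(1000);  // 1 ms per poll -> about 1 s worst case

  fprintf(stderr, "[dt_control_jobs_init] %d of %d threads running\n",
          atomic_load(&c->running), c->num_threads);
}
```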
@wpferguson this would be my "proper" solution for the hypothesis that we have a race condition as the root cause of #19992. At least: if we dispatch a control job either in lighttable or darkroom, it could be that the worker for that job is not running yet and the job thus ends in nirvana. I have to confess I couldn't trigger it a single time yet (but I am mostly on my slow notebook still).
Would you and the others discussing in #19992 be able to compile and test? (@victoryforce @gi-man)
@TurboGit the code looks absolutely safe to me.