Skip to content

Conversation

@jenshannoschwalm
Copy link
Collaborator

When starting the dt gui and calling gtk_main() we must be sure we can safely dispatch a control job. We can do so only if all control threads are running.

We achieve this by adding a control->running atomic counter which is increased whenever any of the threads code has been started. As we know the number of started threads we can check & wait (for up to 1sec) until all threads are up&running before leaving dt_control_jobs_init().

The new function gboolean dt_control_all_running() tests for running vs requested threads and is used to test if we can finally use gtk_main().

@wpferguson this would be my "proper" solution for the hypothesis, that we have a race condition as the root cause for #19992. At least: if we dispatch a control job either in lighttable or darkroom it could be the case that the worker for that job is not running yet and thus ends in nirvana. I have to confess i couldn't trigger it yet a single time. (but i am mostly on my slow notebook still).

Would you and others discussing in #19992 be able to compile and test (@victoryforce @gi-man)

@TurboGit code is absolutely safe to me.

When starting the dt gui and calling `gtk_main()` we must be sure we can safely dispatch a control job.
We can do so only if all control threads are running.

We achieve this by adding a control->running atomic counter which is increased whenever any of the threads
code has been started. As we know the number of started threads we can check & wait (for up to 1sec) until
all threads are up&running before leaving `dt_control_jobs_init()`.

The new function `gboolean dt_control_all_running()` tests for running vs requested threads and is
used to test if we can finally use gtk_main().
@jenshannoschwalm jenshannoschwalm added the bugfix pull request fixing a bug label Dec 26, 2025
@wpferguson
Copy link
Member

Can't produce a hang right now, maybe the logo change?

@jenshannoschwalm
Copy link
Collaborator Author

Can't produce a hang right now, maybe the logo change?

Sorry - i don't understand.

@TurboGit TurboGit added this to the 5.4.1 milestone Dec 26, 2025
@wpferguson
Copy link
Member

The darktable icon changed to the santa hat

dt_pthread_setname(name);
free(params);
const int32_t threadid_res = _control_get_threadid_res();
dt_atomic_add_int(&s->running_jobs, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have three add 1 for running_jobs, how is this working then? Or I'm not understanding the meaning of this new variable. Is that the current number of running jobs or the total amount of parallel threads started at this point.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the fixed number of threads running used by the job scheduler. Maybe a renaming ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More precise, we know the overall number of threads used. But we don't know (without the new counters) if a specific thread is already running (that's pthread behaviour). So we test & wait until the OS scheduler was friendly enough. On my system logs look like this

     0,0669 [dt_control_jobs_init] 6 of 11 threads running
     0,0680 [dt_control_jobs_init] 11 of 11 threads running

and if we run dt without this test we might be in trouble. Very likely a simple usleep() would do it but we can't be sure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix pull request fixing a bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants