arduarn:
You also share signal_mask but, since it is initialised before the threads are created and since pthread_create also triggers a memory barrier, it should be OK.
Look, the fact remains: multi-threaded stuff is tricky, so you have to be scrupulous. Casually sharing even an int between threads is risky. In your problem statement you said: "My handler finally gets the signal, prints its message, and sets Terminate. BUT, it does NOT terminate until I send one MORE SIGTERM!" And that is exactly what could happen if Terminate is set but sits in the cache on one core until the cache is, at some later point, synchronised and made visible to the main thread.
Are you running this on a multi-core device?
At the end of the day, if you are adamant that your test program is flawless, then the problem must be in the full app's code which we haven't seen yet.
Unless I understand a lot less than I think I do (very likely), sharing a variable, multi-core processor or not, is perfectly safe provided:
- The variable is declared as volatile
- Accesses are atomic
- Only one thread ever writes it
Volatile should negate any cache effects. I am running an RPi3, which is a 4-core ARM, so even 32-bit accesses are atomic (assuming the compiler is not stupid enough to misalign them). I have used the above rules, and they have worked flawlessly, for many years on all kinds of processors.
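For comparison, here is a minimal sketch of the same single-writer flag written with C11 `<stdatomic.h>`, which spells out the cross-thread visibility explicitly instead of relying on volatile. It assumes a C11 compiler and pthreads; `terminate_flag`, `writer_thread`, `wait_for_terminate` and `run_flag_demo` are hypothetical names for illustration, not anything from the real app.

```c
#define _POSIX_C_SOURCE 200809L  /* expose sched_yield() under strict -std= modes */
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the Terminate flag: one writer, many readers. */
static atomic_bool terminate_flag = false;

/* The single thread that is allowed to write the flag. */
static void *writer_thread(void *arg)
{
    (void)arg;
    /* Release store: pairs with the acquire load in the reader. */
    atomic_store_explicit(&terminate_flag, true, memory_order_release);
    return NULL;
}

/* Reader side: poll until the flag is observed as set. */
static bool wait_for_terminate(void)
{
    while (!atomic_load_explicit(&terminate_flag, memory_order_acquire)) {
        sched_yield();  /* give the writer a chance to run */
    }
    return true;
}

/* Starts the writer, waits for the flag, and reports whether it was seen. */
bool run_flag_demo(void)
{
    pthread_t tid;

    atomic_store(&terminate_flag, false);
    if (pthread_create(&tid, NULL, writer_thread, NULL) != 0) {
        return false;
    }
    bool seen = wait_for_terminate();
    pthread_join(tid, NULL);
    return seen;
}
```

Calling `run_flag_demo()` from `main` should return true once the writer has run. Whether this changes anything for the missing-SIGTERM symptom is a separate question; the sketch only shows the flag-sharing part.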
The app in question kicks off lots of threads, and they're often sharing data, with no problems, and there are only perhaps 3 mutexes in the entire thing, for those places where the rules cannot be enforced (in particular when using the accept() call to respond to socket requests, each of which kicks off a new handler thread).
I would not be at all surprised to find I am doing something wrong, but I don't even know where to look when the signal handling thread is not even being called! Posting the whole program would be pointless, even if I could do it, because it is a gigantic and complex application spread across multiple RPis.
What seems to be a likely problem, and something that is not at all clear to me, is the blocking of signals to the various threads. When a signal occurs, the OS will send it, more or less randomly, to any one of the threads for which that signal is masked. Right now, I suspect that is ALL threads. I've tried unblocking the signals in all but the handler thread, but that was unsuccessful, likely because I misunderstand the process and terminology. It appears to me that "blocking" refers to blocking the OS from handling the signal, in which case I would want to unblock in all threads EXCEPT the handler thread, as follows:
static sigset_t my_mask;
sigemptyset(&my_mask);
sigaddset(&my_mask, SIGINT);
sigaddset(&my_mask, SIGTERM);
sigaddset(&my_mask, SIGCONT);
sigaddset(&my_mask, SIGSEGV);
rc = pthread_sigmask(SIG_UNBLOCK, &my_mask, NULL);
if (rc != 0)
{
logprintf("Failed in pthread_sigmask in signal_thread!\n");
exit(1);
}
then block in the signal handler thread, as follows:
static sigset_t my_mask;
sigemptyset(&my_mask);
sigaddset(&my_mask, SIGINT);
sigaddset(&my_mask, SIGTERM);
sigaddset(&my_mask, SIGCONT);
sigaddset(&my_mask, SIGSEGV);
rc = pthread_sigmask(SIG_BLOCK, &my_mask, NULL);
if (rc != 0)
{
logprintf("Failed in pthread_sigmask in signal_thread!\n");
exit(1);
}
Do you know if that is correct?
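For reference, the arrangement usually described for POSIX threads is the other way round from the two snippets above: a blocked signal is held pending rather than delivered, so the signals get blocked in every thread, and one dedicated thread collects them with sigwait(). Below is a minimal sketch of that pattern, assuming a POSIX system. The names `sig_terminate`, `last_signal`, `signal_thread` and `run_sigwait_demo` are hypothetical, and the demo sends SIGTERM to itself only so that it can be run standalone. SIGSEGV is left out on purpose, since a SIGSEGV caused by a real fault goes to the faulting thread and cannot usefully be collected this way.

```c
#define _POSIX_C_SOURCE 200809L  /* expose sigwait(), kill(), pthread_sigmask() */
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

static atomic_bool sig_terminate = false;  /* hypothetical Terminate flag */
static atomic_int last_signal = 0;         /* which signal was collected */

/* Dedicated thread: waits synchronously for any signal in the set. */
static void *signal_thread(void *arg)
{
    const sigset_t *set = arg;
    int signo = 0;

    /* sigwait() returns 0 on success and stores the signal number. */
    if (sigwait(set, &signo) == 0) {
        atomic_store(&last_signal, signo);
        atomic_store(&sig_terminate, true);
    }
    return NULL;
}

/* Returns the signal number collected by the signal thread, or -1 on error. */
int run_sigwait_demo(void)
{
    sigset_t set, old;
    pthread_t tid;

    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGTERM);

    /* Block BEFORE creating any thread, so every new thread starts
       with these signals blocked as well. */
    if (pthread_sigmask(SIG_BLOCK, &set, &old) != 0) {
        return -1;
    }
    if (pthread_create(&tid, NULL, signal_thread, &set) != 0) {
        pthread_sigmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    /* Process-directed signal; it stays pending until sigwait() takes it. */
    kill(getpid(), SIGTERM);

    pthread_join(tid, NULL);
    pthread_sigmask(SIG_SETMASK, &old, NULL);  /* restore the caller's mask */
    return atomic_load(&last_signal);
}
```

In a real program the `kill()` line would go away and the other threads would simply poll the flag. Note that no thread in this sketch ever unblocks the signals, including the one calling sigwait().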
It's also not at all clear to me if/how/from whom threads "inherit" their signal handling. I've read a lot of conflicting information (one of the joys of the Internet - you can find ANY answer you want!).
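On the inheritance point, POSIX says that a thread created with pthread_create() starts with a copy of its creator's signal mask, while the handlers installed with sigaction() are shared by the whole process. A small sketch to check the mask part on a given system follows; `report_mask` and `child_inherits_block` are hypothetical names, and SIGUSR1 is just an arbitrary signal picked for the test.

```c
#define _POSIX_C_SOURCE 200809L  /* expose pthread_sigmask() and sigset helpers */
#include <pthread.h>
#include <signal.h>

/* Runs in the new thread: records whether SIGUSR1 is blocked there. */
static void *report_mask(void *arg)
{
    int *blocked = arg;
    sigset_t cur;

    /* A NULL new set changes nothing; cur receives this thread's mask. */
    if (pthread_sigmask(SIG_SETMASK, NULL, &cur) == 0) {
        *blocked = sigismember(&cur, SIGUSR1);
    }
    return NULL;
}

/* Returns 1 if the child saw SIGUSR1 blocked, 0 if not, -1 on error. */
int child_inherits_block(void)
{
    sigset_t set, old;
    pthread_t tid;
    int blocked = -1;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);

    /* Block in the creating thread first, then create the child. */
    if (pthread_sigmask(SIG_BLOCK, &set, &old) != 0) {
        return -1;
    }
    if (pthread_create(&tid, NULL, report_mask, &blocked) == 0) {
        pthread_join(tid, NULL);
    }
    pthread_sigmask(SIG_SETMASK, &old, NULL);  /* restore the caller's mask */
    return blocked;
}
```

If the mask is inherited as described, `child_inherits_block()` returns 1. Masks changed in the parent after the child is created do not affect the child.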
Regards,
Ray L.