
winning the race

While working on boehm-gc, kurt ran into a threaded test case that sometimes got stuck, spinning on the sched_yield system call. In theory, yielding allows other processes to run, but on an otherwise idle machine, it just ends up using all the CPU itself, in a futile effort to not run. This initially looked like a case of trying to recursively acquire a spinlock (not supported) somewhere in the guts of librthread. A peek at the test case made this seem a reasonable explanation (it was doing some twisty stuff, creating a new thread in a dying thread’s thread-specific data destructor), but further inspection revealed that librthread is careful to release its internal locks before calling the destructor. The bug remains a mystery.
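
The twisty part looks roughly like this. A minimal hypothetical sketch (not the actual boehm-gc test) of creating a thread from a dying thread’s thread-specific data destructor:

	/* sketch: spawn a new thread from a dying thread's TSD destructor */
	#include <pthread.h>

	static pthread_key_t key;

	static void *
	worker(void *arg)
	{
		return NULL;
	}

	/* runs while the thread that set the key is exiting */
	static void
	destructor(void *value)
	{
		pthread_t t;

		pthread_create(&t, NULL, worker, NULL);
		pthread_join(t, NULL);
	}

	static void *
	twisty(void *arg)
	{
		pthread_setspecific(key, (void *)1);
		return NULL;	/* destructor fires as this thread dies */
	}

	int
	main(void)
	{
		pthread_t t;

		pthread_key_create(&key, destructor);
		pthread_create(&t, NULL, twisty, NULL);
		pthread_join(t, NULL);
		return 0;
	}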

Mysterious bugs are interesting, but mysterious bugs also tend to earn the curiosity label and not the fix it label. Bug goes onto the back burner for a while. Just kill the process and move on, sorry. Then kurt mentions killing the process doesn’t work. Yeah, whatever, boehm does weird things with signals, who knows, use SIGKILL. Wait, what? kill -9 doesn’t work? This isn’t a bug, it’s a serious problem.

digging deeper

A little more info about the broken test. Sometimes, but not always, the process would get stuck. One thread would be idle and one thread would be spinning wildly. The machine would generally remain responsive (furiously yielding has that effect), but the process could not be killed.

The killed process (specifically, the thread containing main) is waiting in “thrdeath” in the kernel, meaning it’s circling the drain. In fact, it’s quite a ways down the drain. All it’s waiting for is for all the other threads to make their own way to exit before tearing down the last bits of the process. Why is its other thread running loose and not heading to the exit? (Refer to exit1 in kern_exit.c)
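
The wait in question is near the bottom of exit1 and, paraphrased loosely (not the verbatim source), looks something like:

	/* the main thread has to wait because it owns the pid, et al. */
	while (pr->ps_refcnt > 1)
		tsleep(&pr->ps_threads, PUSER, "thrdeath", 0);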

But first, why doesn’t SIGKILL work? That’s easy, the process has the PS_EXITING flag set, which says the process is so close to death that there’s no point delivering the signal. Bad things can happen if a partially torn down process gets a signal. (Refer to ptsignal in kern_sig.c) Back to the original question, what’s up with the spinning thread?
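
The check itself is tiny. Paraphrasing the relevant bit of ptsignal:

	/* paraphrased: a process past the point of no return gets nothing,
	 * not even SIGKILL */
	if (pr->ps_flags & PS_EXITING)
		return;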

Obviously, the runaway missed the call to exit. That’s what the single_thread_set function does: iterate over all the other threads and tell them to exit (or simply suspend, depending on arguments). This function isn’t supposed to return until all the other threads have reached single_thread_check, meaning if the main thread got past it (and it did, as evidenced by its sleeping in “thrdeath”), then the function must have returned too early, either because it forgot to wait for somebody or because one of the exiting threads started up again. The second possibility seems less likely, even though it’d be consistent with the observed behavior. When (and who) would this function forget to wait for a thread?
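
In rough outline, single_thread_set does something like the following (a compressed paraphrase of its shape, not the actual source, which handles many more per-thread states):

	pr->ps_single = p;
	pr->ps_singlecount = 0;
	TAILQ_FOREACH(q, &pr->ps_threads, p_thr_link) {
		if (q == p)
			continue;
		/* tell q to exit (or suspend) and count it */
		atomic_setbits_int(&q->p_flag, P_SUSPSINGLE);
		pr->ps_singlecount++;
		/* ...wake q if it's asleep so it notices the flag... */
	}
	/* don't return until every counted thread checks in */
	while (pr->ps_singlecount > 0)
		tsleep(&pr->ps_singlecount, PWAIT, "suspend", 0);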

single_thread_set uses the linked list of threads in struct process as the canonical list of all threads in a process. How can it be wrong? The list is manipulated in two places. One is exit1, which we’ve already looked at; since the problem is a thread not getting to exit1, we can bet that’s not it. The other place is fork1 in kern_fork.c. That’s a bingo!

time for a fix

If the process starts to exit while another thread is creating a new thread, the new thread will not yet be visible via the linked list. It won’t be told to exit. It won’t exit. And the main thread won’t wait for it, instead waiting only for the creating thread. (Thread M calls exit while thread X creates a new thread Y. M will signal X to exit and wait for it, but Y will be neither signalled nor waited for.) (Somewhat later, M will indeed wait for Y to exit, but that’s farther down. Essentially, M waits for everybody to start exiting, and once they’ve all started, M waits for everybody to finish. M was stuck at the latter point, but Y didn’t know it should start exiting. At this point, there is no longer any way to tell Y to exit, hence the bug.)
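
From userland, the colliding calls reduce to something like this hypothetical sketch (names are made up; whether it actually loses the race is entirely up to kernel timing):

	/* hypothetical sketch of the race, not the boehm-gc test */
	#include <pthread.h>
	#include <stdlib.h>
	#include <unistd.h>

	static void *
	y_main(void *arg)		/* thread Y: just needs to exist */
	{
		for (;;)
			sleep(1);
	}

	static void *
	x_main(void *arg)		/* thread X: creates Y repeatedly */
	{
		pthread_t y;

		for (;;)
			pthread_create(&y, NULL, y_main, NULL);
	}

	int
	main(void)			/* thread M */
	{
		pthread_t x;

		pthread_create(&x, NULL, x_main, NULL);
		usleep(1000);
		exit(0);	/* races against X's pthread_create */
	}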

The fix is quite simple.

--- src/sys/kern/kern_fork.c	2013/04/06 04:44:34	1.146
+++ src/sys/kern/kern_fork.c	2013/06/01 17:04:46	1.147
@@ -325,6 +325,14 @@ fork1(struct proc *curp, int exitsig, int flags, void 
 		p->p_p = pr = curpr;
 		TAILQ_INSERT_TAIL(&pr->ps_threads, p, p_thr_link);
 		pr->ps_refcnt++;
+		/*
+		 * if somebody else wants to take us to single threaded mode,
+		 * count ourselves in.
+		 */
+		if (pr->ps_single) {
+			curpr->ps_singlecount++;
+			atomic_setbits_int(&p->p_flag, P_SUSPSINGLE);
+		}
 	} else {
 		process_new(p, curpr);
 		pr = p->p_p;

When a new thread is added to the list, also check if it should immediately exit/suspend and adjust the counter appropriately. (This works because it’s X running in the context of the patched code. M is still waiting for X and won’t proceed until X indicates it has reached single_thread_check and started exiting. Bumping the counter means M will now wait for Y as well, and the flag tells Y to exit immediately after it starts running.)

atomic note

For the most part, there’s just one large kernel lock protecting everything, which is why there aren’t any explicit locks protecting the thread list, the single counter, or any of the other structures involved here (note that single_thread_set does need to lock out the scheduler, because that does run concurrently). The race exists because several operations in fork1 (such as allocating memory in pool_get) may sleep and release the kernel lock, which opens the window in which another thread can call exit. At the point where the new thread is added to the list, the main thread can either have called single_thread_set already or not yet, but it cannot be in the middle of calling it.
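
In outline, the window looks like this (a loose paraphrase of fork1, not verbatim):

	/* kernel lock held: serialized against every other thread */
	...
	/* pool_get may sleep, releasing the kernel lock; exit1 and
	 * single_thread_set can run in another thread right here */
	p = pool_get(&proc_pool, PR_WAITOK);
	...
	/* kernel lock held again: the insert itself can't interleave
	 * with single_thread_set, which is why the patch above works */
	TAILQ_INSERT_TAIL(&pr->ps_threads, p, p_thr_link);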

Posted 03 Jun 2013 07:13 by tedu Updated: 03 Jun 2013 21:18
Tagged: openbsd programming