Advanced Namespace Tools blog 16 December 2016

Hubfs: patching the orphaned readers bug

In the previous post, I described discovering and diagnosing a bug in the hubfs pipe/message multiplexer, where if multiple readers were reading from a hub, some would be forgotten when the ring buffer of queued 9p read requests "wrapped around" to the beginning.

The bug has now been patched, using different logic than I originally anticipated. My first thoughts were to create a "wraparound flag" and counter of unanswered messages, then have the msgsend() procedure backtrack to deal with the read requests that would be orphaned otherwise. I wasn't thrilled with this design because it seemed a bit complex and would require either awkward shuffling of variables or some ugly mostly duplicated code.

I tried explaining the bug to my friend Maevele by describing a small town with a line of numbered post office boxes and a mail delivery person who walked the line in sequence and kept track of what messages had been retrieved or not. "This seems like a weird way to deliver mail..." was her initial response, but then I revealed it was all just a metaphor for handling 9p messages. She thought for a little bit, and suggested "Couldn't you just change the numbers on the boxes?"

Of course! Rather than having complex logic changing the pattern the mail carrier walks, it was much simpler to "change the numbers on the boxes" and copy the pointers from the end of the message queue to the beginning. That way, no extra variables or variant message-answering loops were needed. Adding the new logic took about ten minutes of work, and after fixing the inevitable off-by-one error in my first version of the change, I spent awhile running tests of the new version. Now, all readers were correctly tracked when the queue of 9p read requests wrapped around to the beginning. Here is the old, broken logic:

if(h->qrnum >= MAXQ -2){
	h->qrnum = 1;
	h->qrans = 1;
}

Here is the new, fixed logic:

if(h->qrnum >= MAXQ - 2){
	j = 1;
	for(i = h->qrans; i <= h->qrnum; i++) {
		h->qreqs[j] = h->qreqs[i];
		h->rstatus[j] = h->rstatus[i];
		j++;
	}
	h->qrnum = h->qrnum - h->qrans + 1;
	h->qrans = 1;
}

The fixed version is now in the hubfs and antsexperiments repos on bitbucket. The same logic fix has now been added to the two other places in the code (freeze mode and write queueing) which are parallel.