Advanced Namespace Tools blog 2 January 2017

9front plan9rc Boot Script Improvements

Technical Debt Origins and Accounting

In 9front/ANTS there are two different boot methods supported: a modified version of the 9front bootrc script which includes an ANTS section to create the independent administrative namespace, or a 9front-specific version of the ANTS plan9rc script, selected by setting "bootcmd=plan9rc" in plan9.ini.

(For systems which don't need to boot unattended, it is often useful to leave "bootfile=" blank in plan9.ini, which allows you to specify your choice of kernel and change/enter other plan9.ini variables before the kernel is loaded. You can also interrupt the pre-kernel boot by spamming any key on the keyboard to achieve the same result.)

The ANTS independent boot namespace has its origins in work I did in 2010 in Bell Labs Plan 9, which I called the "rootlessboot" kernel. In standard Labs plan 9, the boot process was handled by C programs until the handoff to /rc/bin/termrc or cpurc. The original version of the plan9rc script was written then, which replaced almost all of the C code and duplicated much of the original behavior while also creating an independent namespace making use of only kernel resources.

The original Labs boot design involved compiling the needed userspace programs for booting (fossil, venti, factotum, ipconfig) into the built-in /boot directory. I started by building off that mechanism by adding more and more programs into /boot. Very early in the process, I modified rc to look for /boot/rcmain instead of /rc/lib/rcmain, and changed the default path to include /boot along with the current directory and /bin. As things evolved, I started using a ramdisk to create a more complete root namespace, binding /boot under /bin, and eventually using a paqfs to hold binaries, but there were still coded-in assumptions about things being found in /boot.

Additionally, 9front has its own minimal boot environment created for its own bootrc script, although that environment isn't preserved or accessible later. However, there was a lot of duplication between the contents of the 9front bootfs.paq and the ANTS bootpaq. The plan9rc boot and ANTS scripts were pretty much just ignoring the 9front bootpaq.

So, the goals of paying down the technical debt are:

Compile, Reboot, Compile and Reboot again

This kind of work has a fairly painful and time consuming development workflow. When the program being modified and debugged is the entire kernel and each test run requires rebooting a machine, the process of making small incremental changes, and testing them and fixing problems is quite time consuming. I do my testing and debugging using some of my vultr vps nodes, which is relatively quick and painless compared to working with physical hardware, but still involves a lot of repeated cpu-ing, repository syncing, and web console management-ing.

The first thing to do was fix something that was broken in the plan9rc script: use of the Cache File Server (cfs) to speed up the use of filesystems acquired from the network. I had a completely broken line of code that looked like this:

cfs -a tcp!$fs!564 -f $cfs -z boot

The biggest problem here is that cfs doesn't even HAVE a -z flag. I had at one point made a customized cfs which did have a -z flag to provide a file descriptor in /srv, but I had moved that to "retired" awhile ago, and I can't remember if it ever actually worked properly. So, I did what all good coders do: borrow code that actually works. I just took how the 9front bootrc sets up cfs using an existing /srv/boot and adopted it:

{/bin/cfs -s -f $cfs </srv/boot &} | echo 0 >/srv/cfs
mv /srv/boot /srv/boot.nocfs
mv /srv/cfs /srv/boot

With that taken care of, there was some more cleanup to be done in the handling of the bootargs variable and starting the root fileserver. When I adapted plan9rc to 9front, I left fossil and kfs under the heading "case local" in the "dogetrootfs" fn, with different cases for cwfs and hjfs. This was based on some weird parsing logic, and 9front has ditched kfs anyway. Unlike fossil+venti, I don't see any reason to try to preserve kfs support in ANTS, so I restructured the bootargs parsing and dogetrootfs case structure to have a more consistent treatment of the different disk fs options. Additionally, I added some trivial sanity checks and simple auto-detection logic, so for instance the cwfs case tries to find an fscache partition if the user supplied data isn't correct:

if (! test -e $bootpartition)
	bootpartition=`{ls /dev/sd*/fscache}

With those improvements made to the base plan9rc script, it was time to bring the 9front bootfs.paq into the ANTS namespace and remove duplicate programs. The first was accomplished by a few small changes. First, the 9front boot.c sets up the paqfs with a mountpoint rather than a /srv file, so I added a line to plan9rc:

paqfs -s -S frontpaq /boot/bootfs.paq

And added a few lines to the root/lib/namespace file so that newly created processes via cpu in will have the correct namespace:

mount -a #s/frontpaq /mnt/broot
bind -a /mnt/broot /
bind -a /mnt/broot/lib /lib

With those changes made, it was time to remove duplication. The way to do this was to compare the 9front file 9/boot/bootfs.proto with the ANTS files cfg/mkpaq and cfg/toolcopy, and remove anything from the ANTS bootpaq and tools.tgz which was present in the 9front bootfs.paq.

Removing Customized rc and /boot Pathing

The final step was removing the customized rc, changing any scripts which used #!/boot rather than #!/bin, and adjusting any pathed invocations. The only exception to this is that there is still a customized factotum placed in /boot, to make the factotum -x option available. As a result, invocations of factotum were adjusted to use /boot/factotum rather than simply factotum. Previously, the customized rcmain file added /boot to the path prior to checking bin, so alternate versions of utilities in /boot would have priority automatically. With /boot/factotum the only remaining use of a /boot path, it was easy to just add that prefix to its occurences in the boot scripts.

The build script also had to be adjusted, with the addition of a "build frontpatched" target in addition to the basic "build patched", because the Labs version of ANTS is still using the original architecture. (There were also several places in this process where it was necessary to add another customized copy of something such as the mkpaq and toolcopy scripts into the frontmods subdirectory, and make sure it was being bound on top of the original versions.)

With all of this done, it was just a matter of another few compile-reboot-test cycles to glue down any loose edges. The whole de-crufting process took a full coding workday and almost 20 small incremental commits. In fact, I discovered another namespace issue while doing this writeup which required another fix to the namespace files.

The Next (and final?) Step

Having come this far, I think I should carry through to "the end" - ending the use of two separate paqfs. I am planning on making an ANTS customized version of the bootfs.proto file in 9front to add in the extra programs in the mkpaq and finish bringing the boot namespace into compliance with standard conventions. In particular, this will involve changing scripts that are expecting to find utilities like listen1 in bin rather than bin/aux. (Ants uses a "flat" structure for bin without subdirectories currently.)

And with another few hours of tiny patches and reboot testing, the ANTS bootpaq has been entirely replaced by a fattened-up 9front bootfs.paq. There were no real surprises and fewer changes were necessary than I was expecting. The only unexpected change was discovering that the custom ANTS namespace files had a mistaken mount -a rather than mount -b for /srv/factotum to /mnt. I'm not certain of how that error hadn't been causing trouble before. What has changed in the namespace structure to expose that problem?

All that is really remaining now would be to change the toolcopy script/tools.tgz to also use subdirs of bin for programs that conventionally appear there, but that would require some annoying sub-binds in the namespace files. Remember, creating a union bind does NOT recursively union bind the subdirectories. If the boot ramfs and the boot paqfs are both unioned to /bin, whichever is first in the union order will take precedence for walks to the subdirectories, so you would need additional binds for bin/aux, bin/ip, etc. My instinct is to leave the tools.tgz structure alone for now.