Advanced Namespace Tools blog 1 January 2017

Testing System Updates in RAM

My local native cpu/file server is a bit of a strange beast: 9front mostly works fine on it, but I have never been able to get it to support GUI use or to boot from the hard drive. I use it as a cpu server via Drawterm. To boot it, I start from the 9front cd, which takes an hjfs root from the disk, and then I use reboot path/to/kernel to restart with a locally compiled ANTS kernel that supports fossil+venti and uses that for its root.

As a result, rebooting is an extra hassle, and because it is my main system, I have been extra-conservative (read: lazy) about applying updates. I tend to use my remote Vultr nodes for keeping up to date with 9front and for development and testing. The arrival of the new year, however, seemed like a good time to get caught up. Because I was quite far behind the main distribution, I was slightly worried that I might have issues performing the update. Since I use fossil, I could of course always restore to a previous snapshot if so, but it occurred to me that I could test a full system update entirely non-destructively.

Fossil on a RAM disk for Update Testing

ANTS has a short script called "ramfossil" which starts a ramfs, fills a file with 100mb of zeros, and then initializes a fossil on it to the rootscore of your choice. (A "rootscore" is the score, a SHA-1 hash, that identifies the root of the filesystem as stored in Venti.) Because fossil is designed as a temporary cache for Venti blocks, a freshly initialized fossil uses almost no storage: only new or changed blocks are stored in the fossil, while the snapshotted data stays in Venti, accessible through the fossil but not stored within it. The first step was making a snapshot of the current state of the filesystem:

con /srv/fscons
fsys main snap -a
[a rootscore such as this is printed: 9ed9dd563697a419fe4bd906b69adb5ee6f41029]
[exit the console with ctrl-d, then q]
9fs 9fat
echo '9ed9dd563697a419fe4bd906b69adb5ee6f41029' >>/n/9fat/altroots
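To make the "fossil as a cache in front of Venti" idea concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions: real fossil and Venti work on a tree of fixed-size blocks, while this flattens the tree into a path-to-score map, and the Venti and FossilCache classes are hypothetical. What it does show is the property the update test relies on: a fossil initialized from a rootscore holds zero local bytes, reads fall through to Venti, only changed data is stored locally, and the old rootscore still names the old tree after a new snapshot.

```python
import hashlib


class Venti:
    """Toy write-once block store: each block is addressed by its SHA-1 score."""

    def __init__(self):
        self.blocks = {}

    def write(self, data: bytes) -> str:
        score = hashlib.sha1(data).hexdigest()  # 40 hex digits
        self.blocks[score] = data                # same data always gets the same score
        return score

    def read(self, score: str) -> bytes:
        return self.blocks[score]


class FossilCache:
    """Toy writable layer: unchanged files are read through from Venti,
    changed files are held locally until the next snapshot."""

    def __init__(self, venti: Venti, root: dict):
        self.venti = venti
        self.root = dict(root)  # path -> score taken from the snapshot
        self.dirty = {}         # path -> new data; the only locally stored bytes

    @classmethod
    def from_score(cls, venti: Venti, score: str) -> "FossilCache":
        """Initialize from a root score, loosely like flfmt -v."""
        listing = venti.read(score).decode()
        root = dict(line.rsplit(" ", 1) for line in listing.splitlines())
        return cls(venti, root)

    def read(self, path: str) -> bytes:
        if path in self.dirty:
            return self.dirty[path]
        return self.venti.read(self.root[path])

    def write(self, path: str, data: bytes) -> None:
        self.dirty[path] = data

    def local_bytes(self) -> int:
        return sum(len(d) for d in self.dirty.values())

    def snap(self) -> str:
        """Archive dirty data to Venti and return the new root score."""
        for path, data in self.dirty.items():
            self.root[path] = self.venti.write(data)
        self.dirty.clear()
        listing = "\n".join(f"{p} {s}" for p, s in sorted(self.root.items()))
        return self.venti.write(listing.encode())


venti = Venti()
base = FossilCache(venti, {})
base.write("/bin/ls", b"old ls")
base.write("/lib/ndb/local", b"config")
rootscore = base.snap()                        # roughly "fsys main snap -a"

fs = FossilCache.from_score(venti, rootscore)  # roughly "ramfossil <rootscore>"
print(fs.local_bytes())                        # 0
fs.write("/bin/ls", b"new ls")
print(fs.local_bytes())                        # 6
print(fs.read("/lib/ndb/local"))               # b'config'
```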

ANTS, by convention, stores rootscores in a file called "rootscor" in the 9fat partition; for this process, I started another file called "altroots" so I would know which scores belonged to the update process. With the snapshot complete, I made a slightly edited version of the ramfossil script to provide more than the default 100mb of storage. I still made it too small (only 200mb), because I underestimated how much space a full rebuild of all libraries and binaries would use.
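Since altroots is just a list of scores appended with >>, picking the newest one to hand to ramfossil is easy to automate. A minimal sketch, assuming one score per line with the newest last; latest_rootscore is a hypothetical helper, not part of ANTS, and it relies only on Venti scores being SHA-1 hashes written as 40 hex digits.

```python
import re

# Venti scores are SHA-1 hashes, written as 40 lowercase hex digits.
SCORE_RE = re.compile(r"^[0-9a-f]{40}$")


def latest_rootscore(text: str) -> str:
    """Return the last well-formed score in an altroots-style file."""
    lines = (line.strip() for line in text.splitlines())
    scores = [s for s in lines if SCORE_RE.match(s)]
    if not scores:
        raise ValueError("no rootscore found")
    return scores[-1]
```

Anything that isn't a bare score (blank lines, stray notes) is skipped rather than treated as an error.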

ramfossil 9ed9dd563697a419fe4bd906b69adb5ee6f41029
rerootwin -f ramfossil
service=con
. $home/lib/profile
grio -s -c 0x00cc00ff

This ramfossil command initializes the fossil to the given score. The rerootwin command performs a chroot-like operation to leave the old namespace and take a new root from /srv/ramfossil, while keeping the previous terminal devices available at /mnt/term. Setting service=con and running the user profile finish the construction of a standard namespace. (The output of the "ns" command still looks quite different from usual, due to a different order of binds and mounts and the terminal devices being "passed through" a srvfs.) Starting grio lets you work within the new root in multiple windows, and the -c flag sets an unusual color, so you can easily tell that you are working in a nonstandard namespace.
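The -c values look like the 32-bit 0xRRGGBBAA color notation used by Plan 9's draw library. Assuming grio reads them the same way, the bytes split out as below; rgba is a hypothetical helper for illustration. The second value is the one used for the ordinary disk-rooted grio at the end of this post.

```python
def rgba(value: int) -> tuple:
    """Split a 32-bit 0xRRGGBBAA color into (red, green, blue, alpha) bytes."""
    return (value >> 24) & 0xFF, (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


print(rgba(0x00CC00FF))  # (0, 204, 0, 255): opaque green
print(rgba(0x9EEEEEFF))  # (158, 238, 238, 255): opaque pale blue-green
```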

sysupdate
cd /
. /sys/lib/rootstub
cd /sys/src
mk install

This is just the standard update procedure from the 9front FQA. I ran into some complications because I had set my ramfossil size too small to hold all the data generated by a full system rebuild, so I ended up needing to re-snapshot and re-initialize the fossil file system a few times. This was a bit sketchy, and if I repeat this process in the future, I will plan the sizing and the update/compile steps more carefully. Eventually, though, I managed to rebuild everything successfully.
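The ramdisk size is just a record count times a block size, so it is worth doing the arithmetic before starting. A small sketch, assuming ramfossil's dd uses 512-byte records (dd's default when no block size is given); under that assumption the "400000+0 records" in the boot-environment transcript further down come to about 205 MB, which lines up with the 200mb figure. The helper names are hypothetical.

```python
BLOCK = 512  # assumed dd record size (dd's default when no block size is given)


def dd_count(size_bytes: int, block: int = BLOCK) -> int:
    """Records dd must write to cover size_bytes, rounded up."""
    return -(-size_bytes // block)


def records_to_mb(records: int, block: int = BLOCK) -> float:
    """Size in decimal megabytes of a file made of `records` full records."""
    return records * block / 1_000_000


print(records_to_mb(400_000))   # 204.8
print(dd_count(1_000_000_000))  # 1953125
```

How much a full rebuild actually needs was not measured here, so any new target size is still a guess to be checked.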

Building a New Kernel and Testing

At this point, I was ready to build a new kernel and test the updated system. Because of the peculiar multistage boot process mentioned above, I had to make a couple of customizations to the ANTS boot scripts. Since I boot first from a cd, the plan9.ini stored in the disk's 9fat isn't read, so the necessary environment variables aren't set. The solution is to comment out one line in the ANTS frontmods/boot/boot.c to force plan9rc to run instead of the bootrc script (normally this requires setting bootcmd=plan9rc in plan9.ini), and then to set the environment variables that would normally come from plan9.ini at the start of the plan9rc script:

fatpath=/dev/sdE0/9fat
tgzfs=tools.tgz
privpassword=supersecret
bootargs='local!/dev/sdE0/fossil'
venti='#S/sdE0/arenas tcp!127.1!17034'

With these changes, the ANTS kernel knows what to do even without a plan9.ini to set up the environment variables. I made a final snapshot from the ramfossil to venti, copied the kernel and tools.tgz to 9fat, where a kernel without fossil support can reach them, and then used the reboot command with its optional kernel path argument to start the new kernel.
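Moving plan9.ini settings to the top of plan9rc is mostly a matter of rc quoting: rc strings go in single quotes, and a literal quote inside them is written as two quotes. A minimal sketch, assuming simple name=value lines (no menu logic); ini_to_rc and rc_quote are hypothetical helpers, and quoting every value containing characters outside a small safe set is a conservative choice rather than an rc requirement. Fed the equivalent plan9.ini lines, it reproduces the five assignments above.

```python
import re

# Values made only of these characters are left unquoted (a conservative choice).
SAFE = re.compile(r"^[A-Za-z0-9_./-]+$")


def rc_quote(value: str) -> str:
    """Quote a value for rc: wrap it in single quotes, doubling embedded quotes."""
    if SAFE.match(value):
        return value
    return "'" + value.replace("'", "''") + "'"


def ini_to_rc(ini_text: str) -> str:
    """Turn plan9.ini-style name=value lines into rc assignment lines."""
    out = []
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("["):
            continue  # skip blank lines, comments and [section] headers
        name, sep, value = line.partition("=")
        if sep:
            out.append(f"{name.strip()}={rc_quote(value.strip())}")
    return "\n".join(out)


ini = """[common]
fatpath=/dev/sdE0/9fat
tgzfs=tools.tgz
privpassword=supersecret
bootargs=local!/dev/sdE0/fossil
venti=#S/sdE0/arenas tcp!127.1!17034
"""
print(ini_to_rc(ini))
```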

The boot parameters still started the old disk fossil, but that was fine; I wasn't going to use it yet. From my laptop, I accessed the ANTS service namespace via:

drawterm -B -a 0.0.0.0 -c tcp!192.168.99.99!17060 -u glenda

This put me into the "rootless" environment, with the kernel's compiled-in /boot software and the 9fat tools.tgz loaded into the boot ramdisk root. Now I needed to instantiate another ramfossil from the last saved rootscore. A minor issue popped up here: the script expected the fossil binaries to be in a subdirectory of /bin, whereas the ANTS boot environment uses a flat /bin without subdirectories. I am about to patch this in the next commit, but for now I had to run part of the ramfossil script by hand:

gnot: ramfossil 01a3f549f940d907818cba97b7dd4aff9d1d4b39
400000+0 records in
400000+0 records out
fossil/conf: '/bin/fossil' not a directory
fossil/flfmt: '/bin/fossil' not a directory
fossil/fossil: '/bin/fossil' not a directory
gnot: fossilconf -w /n/ramdisk/fossil /tmp/ramfosconf
gnot: flfmt -y -v 01a3f549f940d907818cba97b7dd4aff9d1d4b39 /n/ramdisk/fossil
17e00c0d0c1a99593df0489bc30ab8bf97d35749
aed6dd569697a419fe4bd706b64adb5ee6f41037
b9155cebce31a3cbe91db46069a7cecac94be7bb
gnot: fossil -f /n/ramdisk/fossil
fsys: dialing venti at tcp!127.1!17034
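The errors above come from names like fossil/conf being looked up as /bin/fossil/conf, while in the flat boot /bin, /bin/fossil is the fossil binary itself. One way to tolerate both layouts is a lookup with a fallback, sketched below in Python. This is not the author's upcoming patch; the FLAT_NAMES table is hypothetical and simply copies the flat names used by hand in the transcript (fossilconf, flfmt, fossil).

```python
# Hypothetical mapping from subdirectory-style names to the flat names
# used in the ANTS boot environment (taken from the transcript above).
FLAT_NAMES = {
    "fossil/conf": "fossilconf",
    "fossil/flfmt": "flfmt",
    "fossil/fossil": "fossil",
}


def resolve(cmd: str, bin_entries: set, bindir: str = "/bin") -> str:
    """Prefer the normal /bin/fossil/... layout, fall back to the flat boot /bin."""
    if cmd in bin_entries:
        return f"{bindir}/{cmd}"
    flat = FLAT_NAMES.get(cmd)
    if flat is not None and flat in bin_entries:
        return f"{bindir}/{flat}"
    raise FileNotFoundError(cmd)


flat_boot_bin = {"fossil", "flfmt", "fossilconf"}
print(resolve("fossil/conf", flat_boot_bin))  # /bin/fossilconf
```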

With this done, I was able to repeat the rerootwin -f ramfossil process, and now I was inside a fully updated userspace environment running on a fully updated ANTS kernel. I am currently testing everything I normally do, and once I am satisfied that everything is in good order, I will use fossil/flfmt -v to re-initialize the disk fossil to a new snapshot from the up-to-date ramdisk fossil, at which point I can rerootwin into a standard disk-based environment.

(Note: all the rootscores in this post have been altered. In general, rootscores should be kept private so someone with network read access to your venti server can't snoop on your data.)

Updating the Disk Fileserver

Everything has been working as expected, so now it's time to take the final step and update the hard drive to the latest rootscore. The old fossil was started at boot and has been running unused in the background. We need to kill off the services that were using it and halt it:

kill cs |rc
kill dns |rc
con /srv/fscons
fsys all halt

Now we exit the fossil console (ctrl-d, then q) and run:

ps -a |grep fossil

This shows a ton of fossil processes. Fortunately, the disk fossil's processes are recognizable because they use a different amount of memory (due to a larger cache setting), and they are all clustered together. Running kill fossil without piping its output to rc just prints a list of kill commands. I then select the part of that list matching the processes I noted in the ps output and use the rio menu's "send" command to send them to the shell for execution. The rest is simple, done from the service namespace:

fossil -f /dev/sdE0/fossil
rerootwin -f fossil
service=con
. $home/lib/profile
ndb/cs
ndb/dns -r
grio -s -c 0x9eeeeeff

And we have arrived at a standard disk-rooted environment, fully updated.
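Looking back at the process-selection step, picking out the disk fossil by memory size is easy to get wrong by eye. Below is a sketch of the same selection done mechanically. It assumes each ps -a line starts with user and pid and carries the memory size as a number followed by K, and it prints lines in the style of kill(1)'s output; kill_lines is a hypothetical helper, and its output should still be reviewed before being sent to rc.

```python
import re

SIZE = re.compile(r"^(\d+)K$")  # assumed: ps shows memory size like "98304K"


def size_kb(fields: list):
    """Return the first NNNK field as an int, or None if there is none."""
    for f in fields:
        m = SIZE.match(f)
        if m:
            return int(m.group(1))
    return None


def kill_lines(ps_output: str, wanted_kb: int, name: str = "fossil") -> list:
    """kill(1)-style commands for `name` processes using exactly wanted_kb."""
    cmds = []
    for line in ps_output.splitlines():
        fields = line.split()  # assumed layout: user pid ... size state command...
        if len(fields) < 3 or not any(name in f for f in fields[2:]):
            continue
        if size_kb(fields) == wanted_kb:
            cmds.append(f"echo kill>/proc/{fields[1]}/note # {name}")
    return cmds
```

Pass it the ps -a output and the size noted for the disk fossil; the grep line and the ramfossil's processes drop out because their sizes differ.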

Benefits, Lessons, and Ideas

This was the first time I have done a system update using this method, so it took me a while and I made a few mistakes along the way, but the final result was as intended: testing a full system update without any change to the primary disk filesystem. This was possible because of the following ANTS features:

After testing and finding this kind of flow useful, I think it would be good to support launching ram-rooted fossils directly from the plan9rc script, along with the ability to select a stored rootscore and re-initialize a disk fossil to it. The functionality is mostly identical, differing only in which target fs is given to flfmt.
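The point that the ram and disk cases differ only in flfmt's target can be made concrete. This sketch, in Python rather than rc, builds the same flfmt -y -v invocation used in the boot transcript above for either target; the TARGETS table and flfmt_command are hypothetical, the paths are the ones used in this post, and outside the flat boot /bin the binary would be fossil/flfmt.

```python
import re

SCORE_RE = re.compile(r"^[0-9a-f]{40}$")  # Venti scores: 40 hex digits

# Hypothetical target table using the paths that appear in this post.
TARGETS = {
    "ram": "/n/ramdisk/fossil",
    "disk": "/dev/sdE0/fossil",
}


def flfmt_command(target: str, score: str) -> list:
    """Argument vector that re-initializes `target` to the snapshot `score`.

    The fossil on that target must already be halted; flfmt overwrites it.
    """
    if target not in TARGETS:
        raise ValueError(f"unknown target {target!r}")
    if not SCORE_RE.match(score):
        raise ValueError("rootscore must be 40 hex digits")
    return ["flfmt", "-y", "-v", score, TARGETS[target]]


print(" ".join(flfmt_command("disk", "01a3f549f940d907818cba97b7dd4aff9d1d4b39")))
```

Validating the score before formatting matters more for the disk target, since a typo there re-initializes the primary fossil.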

Happy 2017 and may your namespaces flourish in the New Year!