Advanced Namespace Tools blog

07 March 2018

History of ANTS, part 3

This post, 3rd in the series, will complete the story of the work done prior to the 2013 synthesis of disparate components into the first ANTS release.

Writable /proc/pid/ns

I got this idea from a 9fans post by Roman Shaposhnik in 2008 that occurred in a discussion of automounting. He wrote: "I would imagine that making '#p'//ns writable and receptive to messages of exact same format that is being output right now (plus an 'unmount X Y' message) would be a very natural thought in a Plan9 environment. Yet, it wasn't implemented that way which makes me believe that I do (as usual) overlook something obvious here." The followup discussion didn't delve deeply into the idea, but to me it was something that seemed fundamentally correct conceptually. Whenever you have a representation of a structure presented in the form of a file, it seems natural that you should be able to modify that file to cause corresponding modification of the structure it represents.
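As a rough illustration of that principle, here is a toy model in Python, not the kernel code this post describes. It assumes a namespace can be modeled as a dictionary from mount point to an ordered list of sources, and that messages look like the bind/mount lines the ns file prints, plus the "unmount X Y" message from the quote. The function name apply_ns_message and the dictionary layout are made up for this sketch, and only the -a and -b flags are interpreted.

```python
import shlex


def apply_ns_message(ns, line):
    """Apply one textual namespace message to a toy namespace model.

    ns maps a mount point to an ordered list of sources (a union directory).
    This is an illustrative model only, not how the Plan 9 kernel stores it.
    """
    words = shlex.split(line)  # handles simple quoting such as '#c'
    if not words:
        raise ValueError("empty message")
    op, args = words[0], words[1:]

    # An optional flag word such as -a, -b, or -bc follows the operation.
    flags = ""
    if args and args[0].startswith("-"):
        flags, args = args[0][1:], args[1:]

    if op in ("bind", "mount"):
        if len(args) < 2:
            raise ValueError("need a source and a mount point: " + line)
        src, target = args[0], args[1]
        current = ns.get(target, [])
        if "b" in flags:        # add before the existing entries
            ns[target] = [src] + current
        elif "a" in flags:      # add after the existing entries
            ns[target] = current + [src]
        else:                   # default: replace
            ns[target] = [src]
    elif op == "unmount":
        if len(args) != 2:
            raise ValueError("expected: unmount X Y")
        src, target = args
        remaining = [s for s in ns.get(target, []) if s != src]
        if remaining:
            ns[target] = remaining
        else:
            ns.pop(target, None)
    else:
        raise ValueError("unknown operation: " + op)
    return ns


ns = {}
for msg in ["bind '#c' /dev", "bind -a '#I' /net",
            "bind -b /net.alt /net", "unmount /net.alt /net"]:
    apply_ns_message(ns, msg)
print(ns)  # {'/dev': ['#c'], '/net': ['#I']}
```

The point of the sketch is only that the same text format can serve for both reading and writing the structure: each line written to the file is parsed and applied as one modification.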

When I was looking for something else to work on after I recovered from the intensity of creating the pipe-muxing code, I remembered that idea and thought I would try to implement it. It was my first in-kernel hacking project and I'm glad that I was naive enough to just dive right in. The implementation of namespaces in /sys/src/9/port/chan.c is more or less the heart of the system, and deciding to modify it was a bit presumptuous for a relative newcomer. Recognizing my limitations though, I decided to adopt as conservative an approach to the modification as I could work out - rather than modifying the existing functions in chan.c to take an additional process parameter, I would just create a duplicate set of near-identical functions which would only be invoked via the /proc mechanism. Furthermore, to invoke them from within devproc.c, I would copy-paste most of the code used by the standard userspace invocation of the bind/mount syscalls.

In general, copy-paste style development is regarded as a poor approach, but I believe it is justified in some contexts. In particular, I desperately wanted to avoid breaking or destabilizing the kernel. The way to ensure that I didn't cause some kind of breakage that wasn't apparent in my tests was simply to avoid modifying the existing working codepaths. By using a copy-and-modify system for implementing my patch, I could guarantee that the new logic I was adding didn't change the behavior of any existing code. The modified code was entirely isolated to the execution path triggered by writing to the /proc/pid/ns files.

"Rootless" boot, plan9rc, and the admin namespace

This is what I think of as the "core idea" of ANTS - restructuring the boot process so that a self-sufficient environment independent of the main root filesystem is created and persists independently of the standard user namespace. It all got started because I was frustrated that I couldn't respecify the ip address of my venti server to fossil at boot time. The parameter telling the fossil fs where to dial was set in plan9.ini, and if that changed for some reason while the system was off, the system would boot, fossil would dial the wrong address and fail to be able to provide a root, and the kernel would panic. The only way to fix this was to boot from the livecd so that I could add the right data to plan9.ini.

This struck me as an unnecessary hassle - if you were tcp booting, you could enter the ip of the fileserver to dial at boot time, so why wasn't there a similar ability to prompt for a venti address? So, I dove into the boot process and made a simple mod to let you set venti=ask in plan9.ini for a prompt. I also applied the same =ask mechanism to the sysname variable, because many things are run from /cfg/sysname so it seemed like a useful knob for parameterizing system behavior. This got me thinking about the boot process in general, and about other issues that were problematic at the time, such as fossil unreliability (thankfully much improved since then).
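The =ask behavior can be sketched in a few lines. This is a Python illustration of the logic only, not the actual boot code. It assumes plan9.ini settings are name=value lines, and the names parse_ini and resolve_boot_var, the prompt text, and the sample address are all hypothetical.

```python
def parse_ini(text):
    """Parse name=value lines into a dict, skipping blank and malformed lines."""
    config = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep and name:
            config[name] = value
    return config


def resolve_boot_var(config, name, ask=input):
    """Return the value of a boot variable, prompting when it is set to 'ask'.

    The answer is stored back into config so later boot steps see it.
    """
    value = config.get(name)
    if value == "ask":
        value = ask(name + ": ").strip()  # hypothetical prompt text
        config[name] = value
    return value


# Usage with a canned answer standing in for what the user would type.
config = parse_ini("venti=ask\nsysname=helix\n")
venti = resolve_boot_var(config, "venti", ask=lambda prompt: "192.168.1.5\n")
sysname = resolve_boot_var(config, "sysname")  # already set, so no prompt
```

A variable with an ordinary value passes through untouched, so an existing plan9.ini keeps working and only entries explicitly set to ask trigger a prompt.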

It seemed to me unfortunate that if a system had a problem with the root fileserver - either because it was tcp booted and there was a network interruption, or because fossil ran into an issue - that even though the kernel was still working fine, the whole system had to be rebooted. If the kernel was still running properly, why couldn't we start new processes and spawn a new environment? With the standard namespace setup, however, this was impossible - every process on the box was fundamentally dependent on the file descriptor for the root fs, and if data stopped flowing through it, the whole system would freeze up into an unresponsive state.

But Plan 9 has independent per process namespaces, so if there was a way of creating a namespace that was truly independent of the main root, we could use it as an "escape pod" to keep working and rebuild a new environment. To make this work, the boot process had to be changed. Standard Plan 9 from Bell Labs (this was all before 9front was created) had the fileservers and factotum compiled into the kernel, and used C programs to control boot. What if we compiled rc into the kernel, and put boot under the control of rc, with the ability to drop into an interactive shell that wasn't attached to the disk or network root?

Getting this to happen was a lot of challenging and very educational work for me. Just compiling in rc and starting it was relatively simple - but the kernel alone didn't provide enough of an environment for rc to work properly. The namespace of kernel-devices-only was inadequate. What was needed was to create a workable mini-environment so rc could operate normally. I decided to start a ramdisk at boot, and make a skeleton fs with a workable minimal environment within it. This led to an intense period of trial-and-error as I learned exactly what parts of the namespace were essential. Some of my most satisfying moments in software development were when things started to actually work and I was able to boot a kernel and drop into an interactive rc session with a minimal set of standard tools and do things with no standard root fs at all.

I got excited by the possibilities, and realized that I could make the minimal boot namespace truly self-sufficient if I was able to get rio working in that environment, and allow the user to access the independent namespace via cpu. Once that was worked out, I realized that it would be very powerful to be able to create a subrio that would be inside the standard user namespace - this led to the creation of the "rerootwin" script, a tool I find very useful but which hasn't caught on with other users as much as I expected. Plan 9 has a nice mechanism to enter a new namespace, auth/newns, which uses a namespace file to rebuild the namespace entirely. Unfortunately, trying to use newns with the standard namespace file when you are cpu'd in remotely doesn't work right - when you are connected via cpu/drawterm, your input/output is coming from devices bound from /mnt/term, and not only does the standard ns file not know to bind from there, but /mnt/term won't even exist. The solution is one that I still think is quite clever - what is needed is to use srvfs of /mnt/term to create a mountable connection to the originating fs of the machine being cpu'd from, and then have a customized namespace file that mounts the main root from /srv and also mounts the srvfs of /mnt/term and binds the devices in place. This is what the "rerootwin" script does, and I find it to be an essential component of working with multiple independent namespaces.

Release, and development hiatus

I announced both the writable ns mod and the rootless boot system to 9fans at the start of January 2010, and received little response other than skepticism that these mechanisms were useful. In the years since, I have learned that even though feedback from others can be invaluable, it is generally a mistake to be emotionally dependent on the reactions of other people to feel fulfilled or satisfied by one's own work. Due to the lack of demand for additional work on my projects, as well as other factors in life, I didn't really continue to build on what I had done. I played around with a few projects and ideas for the rest of 2010, but released nothing of note. Personal factors also started to supervene: I had remained close to family in adulthood, and my father's worsening health began to sap my energy and attention for other things. He died in early 2012, and the rest of that year I was focused on helping my mother in the aftermath, as well as moving through my own grieving process. The story of ANTS resumes in 2013.