compiling an openbsd kernel 50% faster

This is approximately as wise as taking off from Mars in a ragtop rocket, but don’t worry, the math all checks out.

My theory is that compiling less code will be faster than compiling more code, but first we must find the code so we know not to compile it.

code

The OpenBSD kernel source tree has a total of 6.93M lines and 383.41M bytes of code. That’s counting c, h, and s files, but excluding Makefiles, some minor awk scripts, and the like. We can break that down by directory; for example, the virtual memory directory uvm is 29.39k lines and 816.41k bytes. The arch subtree which contains all the machine dependent CPU and platform support is 729.42k lines and 20.46M bytes.
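
Something like this sketch produces comparable totals, assuming you run it from the top of the kernel tree (e.g. /usr/src/sys) and point it at whichever subtree you're curious about.

# rough totals: lines and bytes of the .c, .h, and .s files under a subtree
find uvm -name '*.[chs]' -exec cat {} + | wc -l -c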

By far, the largest directory is dev, housing most device drivers, at 5.69M lines and 348.84M bytes. Specifically, dev/pci/drm/amd contains 3.33M lines and 273.93M bytes. 71% of the OpenBSD kernel by size is support for modern (and semi-modern) Radeon graphics. (Older models are supported by the relatively featherweight drm/radeon driver, at 201.19k lines and 6.71M bytes.)

A lot of this code is just header files filled with enums like AZALIA_F0_CODEC_INPUT_PIN_PARAMETER_AUDIO_WIDGET_CAPABILITIES_TYPE_VOLUME_KNOB_RESERVED which is defined four times in four headers (it’s always 6, if you’re curious), but never used in a c file. So the impact on compile times is probably not as bad as it first appears, but it’s still not great. We’re still going to be pouring hundreds of megabytes of text through the lexer, though not all of it will result in expensive codegen.
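
If you want to check that sort of claim, a rough grep sketch from the same directory will do; per the above, only headers should turn up.

# which files mention it at all? allegedly no .c file does
grep -rl AZALIA_F0_CODEC_INPUT_PIN_PARAMETER_AUDIO_WIDGET_CAPABILITIES_TYPE_VOLUME_KNOB_RESERVED dev/pci/drm/amd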

start

Here are some numbers to start. Compiling a current kernel on my laptop takes almost five minutes to create a 22 megabyte kernel.

4m40.69s real    14m01.75s user     2m50.25s system
21.9M   bsd
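
For the record, the numbers come from an ordinary kernel build, something like this sketch (config name and job count approximate):

cd /usr/src/sys/arch/amd64/compile/GENERIC.MP
time make -j4          # build timing as reported by ksh
du -h bsd              # size of the linked kernel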

Let’s cut those numbers down by excluding all the amdgpu code. The simple approach would be to edit the Makefile to remove all its objects, but someone on the internet told me Makefiles are scary because they contain tabs, and my god, the horror. The correct approach would be to run config after editing the kernel config, but the plot restricts us from doing that.
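
For reference, the proper way is only a couple of commands, roughly this sketch (really you'd edit the config by hand rather than trust a sed pattern):

# disable the driver in the kernel config, then regenerate the build directory
cd /usr/src/sys/arch/amd64/conf
sed -i.bak '/amdgpu/s/^/#/' GENERIC
cd ../compile/GENERIC.MP && make config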

peak dumb

If the build expects all the same object files, we can at least put the least possible amount of code into them. Compile one empty file, and use the resulting object file for everything we want to replace.

# one empty file gives us a do-nothing object
echo > dummy.c
cc -c dummy.c

# for every amdgpu source listed in the Makefile, drop in a copy of the dummy
# object and an empty .d dependency file
files=`grep -o "drm/amd.*\.c" Makefile`
for f in $files ; do
        c=`basename $f`
        cp dummy.o ${c%c}o
        echo > ${c%c}d
done

I initially compiled a new blank file for each o file, but that’s really slow. It would be even faster to use ln, but I’m worried about overflowing the link count.

Now we just run make to build the rest of the kernel as normal. Just. This won’t result in a functional kernel yet. We’ll discover at the end that some symbols required to link are missing, but they are very few. We need to create one stub c file that supplies them.

cat > amdgpu_kms.c << __EOF
#include <sys/param.h>
#include <sys/device.h>
int
amdgpu_probe(struct device *parent, void *match, void *aux)
{
        return 0;
}
const struct cfattach amdgpu_ca = {
        sizeof (struct device), amdgpu_probe,
};

struct cfdriver amdgpu_cd = {
        NULL, "amdgpu", DV_DULL
};
__EOF

cc -c amdgpu_kms.c

That’s all that the rest of the kernel needs to know about the amdgpu driver. And now we have a complete set of objects and symbols that links. Once the set of dummy objects has been generated in about two seconds, I’m down to three minutes to compile a 16 megabyte kernel.

3m12.37s real     9m39.34s user     1m54.67s system
16.6M   bsd

That’s a 31% reduction in build time. Good enough to compile nearly 50% more kernels per hour. And a similar improvement in size. The gains aren’t directly proportional to the driver size because despite its voluminous text, it doesn’t all turn into machine code.
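
Spelled out, with the real times converted to seconds:

# 4m40.69s = 280.69s before, 3m12.37s = 192.37s after
echo 'scale=3; (280.69 - 192.37) / 280.69' | bc   # ~.314, the 31% reduction
echo 'scale=3; 280.69 / 192.37' | bc              # ~1.459, nearly half again as many kernels per hour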

For reference, a build with amdgpu configured out of the kernel, the proper way, has identical numbers.

3m12.38s real     9m39.18s user     1m54.71s system
16.6M   bsd

limit break

I started on this nightmare quest after reading a mailing list post regarding memory use while relinking the kernel during boot. I generally agree with the thread conclusion that if you want to relink, you need sufficient memory, and if you don’t want sufficient memory, you need to not relink. But today the bad ideas circus is in town.

Less object code should mean less memory required to link. Here is the original total size of all the object files, along with the data limit (ulimit -d, in kilobytes) needed to link the kernel; below this limit, linking fails with out of memory errors.

385M    total
ulimit -d 165000

Our modified kernel build with fake objects has much less code and requires less memory to link.

209M    total
ulimit -d 110000
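
Both measurements amount to roughly this sketch, run in the compile directory (the relink invocation is approximate):

# total size of the objects going into the link, then relink under a data limit
du -ch *.o | tail -1
ulimit -d 110000
make bsd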

With some more effort, we could create additional zombie object files for other large drivers, although by now the keen observer has noticed that as big as the kernels are, they are substantially smaller than the set of object files going into them. The object files are so large because the kernel is initially compiled with debug symbols. They are stripped out after linking, but we can strip them beforehand. After running strip -g, the required memory is substantially reduced.

38.1M   total
ulimit -d 55000
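
The stripping itself is just a pass over the objects before relinking, roughly:

strip -g *.o           # drop debug info from every object
ulimit -d 55000
make bsd               # relink, approximately as before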

I literally only know how to debug with printf, but other developers may not approve.

conk

I was curious what would be involved in editing a post-compilation kernel to remove large drivers. And now I will be patenting this technique to prevent anyone else from doing something so idiotic. Should you find yourself stranded on a remote planet and your last means to reestablish communication with earth is relinking an openbsd kernel in only 32 megabytes, we can work out a license in exchange for a cut of your book deal.

Posted 02 May 2022 14:38 by tedu Updated: 02 May 2022 14:38
Tagged: openbsd