starting from scratch bugs
Or everything I didn’t know about unix. The OpenBSD source tree has lots of example code for solving any number of problems, but I like to do things my own way. Occasionally this means something gets overlooked. A few examples. Previous thoughts on rewrites and reuse: “out with the old, in with the less” and “hoarding and reuse”.
Quite a few programs contain variations of the same parse.y file. This file originated in pfctl (649 revisions!) but spread to bgpd, relayd, and smtpd, among many other places. The grammar changes every time of course, but the lexer that drives it stays mostly the same. parse.y also implements a few other grammar niceties, like including other files and setting macros (variables). This gives a consistent feel to each config file.
When developing doas, I knew I wanted the config grammar to feel familiar, but I also felt that include files were a bit overkill when the most common doas.conf would be exactly one line long, and a dozen lines would qualify as exceptionally long. And similarly for other features which were likely to be more complicated than useful. I didn’t want to dig through and extract the good parts, nor keep it all (hoarding!), so I started from scratch.
Writing a simple lexer from scratch isn’t hard, but remembering all the little things that need polish is. The default yacc error message of “you suck” isn’t very helpful, so we need to keep track of line numbers, etc. And then you probably want to allow spaces in string arguments, so you need support for quotes. A little of this, a little of that, but all discarded by rewriting from scratch. Fortunately, zhuk was able to add back the essential features without much difficulty.
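To make the "little things" concrete, here is a minimal sketch of a hand-rolled tokenizer that counts lines and accepts double-quoted strings. It is not the doas lexer. The names (struct lexer, next_token, the TOK_* values) are made up for illustration, and it reads from a string rather than a file to keep it short.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical lexer state: an input cursor plus the current line number. */
struct lexer {
	const char *p;	/* next unread character */
	int lineno;	/* 1-based line of the cursor, for error messages */
};

enum { TOK_EOF, TOK_NEWLINE, TOK_WORD, TOK_ERROR };

/*
 * Copy the next token into buf (buflen must be at least 1).
 * A word ends at whitespace, except inside double quotes, where
 * spaces and tabs are kept. Returns one of the TOK_* values.
 */
int
next_token(struct lexer *lx, char *buf, size_t buflen)
{
	size_t n = 0;
	int quoted = 0;

	/* Skip blanks but not newlines; a newline is its own token. */
	while (*lx->p == ' ' || *lx->p == '\t')
		lx->p++;
	if (*lx->p == '\0')
		return TOK_EOF;
	if (*lx->p == '\n') {
		lx->p++;
		lx->lineno++;
		return TOK_NEWLINE;
	}
	for (; *lx->p != '\0'; lx->p++) {
		char c = *lx->p;

		if (c == '"') {
			/* Toggle quoting; the quote itself is dropped. */
			quoted = !quoted;
			continue;
		}
		if (c == '\n' && quoted)
			break;	/* reported as unterminated below */
		if (!quoted && (c == ' ' || c == '\t' || c == '\n'))
			break;
		if (n + 1 >= buflen) {
			fprintf(stderr, "line %d: token too long\n",
			    lx->lineno);
			return TOK_ERROR;
		}
		buf[n++] = c;
	}
	if (quoted) {
		fprintf(stderr, "line %d: unterminated string\n",
		    lx->lineno);
		return TOK_ERROR;
	}
	buf[n] = '\0';
	return TOK_WORD;
}
```

Given the input permit "two words", this returns two words: permit, and two words with the space kept and the quotes stripped. A real lexer would also want comments, escapes, and keyword lookup, which is exactly the polish that is easy to forget.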
Much like writing a parser with yacc, writing a daemon starts simple enough, but requires some care. Many OpenBSD daemons are influenced by the design of bgpd by henning, borrowing bits of code that solve these problems. I’m a snowflake, so rebound also started from scratch. It’s just a dumb proxy, so no need for the privsep framework (imsg, etc.) used elsewhere, or even libevent since the internal state machine has so few states it’s easy enough to drive by hand.
One little bug rebound had was that log messages had incorrect timestamps, off by the local machine’s timezone offset. This is strange; it’s just using syslog. The syslog function prepends the timestamp to the message before sending it to syslogd, adjusted for local time. Except, oops, rebound doesn’t log anything until after it’s done a chroot into an empty directory. There are no timezone files in here to read, so the timestamp remains UTC. The fix for this is to stick a tzset call somewhere, but this is hardly obvious. The syslog documentation doesn’t mention timestamps. The tzset documentation is focused on the TZ environment variable (which isn’t set) and not on the magic side effect of loading the tzinfo database. A search for tzset changes suggests this is a frequent first timer bug. Of course, those developers smart enough to copy henning’s log.c from bgpd pick up this fix for free. (Subsequently, syslog was fixed to not bother with the timestamp, since syslogd is capable of inserting them on its own.)
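The ordering is the whole fix, so a sketch may help. This is not rebound's code or henning's log.c. The function name enter_jail and the "mydaemon" ident are invented for the example, and the openlog call is an extra assumption of mine: it only matters on systems where syslog talks to a socket that is unreachable after the chroot.

```c
#include <syslog.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: do everything that needs files outside the
 * jail first, then chroot. Returns 0 on success, -1 on failure.
 */
int
enter_jail(const char *dir)
{
	/*
	 * Load the timezone data now, while the zoneinfo files are
	 * still reachable. Later local time conversions reuse it.
	 */
	tzset();

	/*
	 * Assumption: open the log connection early as well.
	 * LOG_NDELAY asks for the connection to be opened immediately
	 * instead of at the first message.
	 */
	openlog("mydaemon", LOG_PID | LOG_NDELAY, LOG_DAEMON);

	/* Only now give up the view of the rest of the filesystem. */
	if (chroot(dir) == -1 || chdir("/") == -1)
		return -1;
	return 0;
}
```

Calling chroot requires root, so a daemon would run this early in startup, before dropping privileges. With syslogd inserting timestamps itself, the tzset call matters less for logging these days, but the same ordering applies to anything else that lazily reads files on first use.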
Alternatively, there’s also the daemontools school of thought, but we’re here, not there.