a prog by any other name

What is a name, really?

Sometimes two similar programs are really the same program with two names. For example, grep and egrep are two commands that perform very similar functions and are therefore implemented as a single program. Running ls -i and observing the inode number of each file will reveal that there is only one file. Calling the program as egrep is shorthand for grep -E and does the same thing.
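
The check that ls -i does by eye can be sketched in C: two paths name the same file when stat reports the same device and inode numbers for both. The function name same_file is made up for this illustration.

```c
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 1 if both paths refer to the same file
 * (same device and inode), 0 if they differ, -1 if either stat fails.
 */
int
same_file(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) == -1 || stat(b, &sb) == -1)
		return -1;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

On a system where grep and egrep are links to one file, comparing their two paths this way should return 1. Since stat follows symlinks, the same holds when one name is a symlink to the other.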

names

In fact, every program has three names: its name in the filesystem, the name it has been invoked with, and whatever it believes its own name to be. Under normal circumstances the first two will be the same, but it is possible to call execve with a path and argv[0] not in alignment. Sometimes by accident, as in mv.

execl("/bin/rm", "cp", "-rf", "--", from, NULL);

We’re calling rm, but we’re calling it cp. Fortunately, rm is not one of the programs that cares what we call it.

A second program that may change behavior based on name is test, which is more frequently invoked with the name [ in shell scripts. But what happens when invoked as bin/[? Prior to 5.7, OpenBSD looked at the entirety of argv[0], not just the basename, so the name did not match [ and it behaved like test. So there’s another set of choices for each name, the full path and the basename. For convenience, most programs choose to deal with just the basename (excepting bugs).

It’s even possible on some systems for argv[0] to be NULL.
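
A minimal sketch of the basename approach, with a guard for the NULL case. Both function names and the "unknown" fallback are assumptions made for illustration, not what any particular libc does.

```c
#include <string.h>

/*
 * Hypothetical helper: reduce argv[0] to the name after the last '/'.
 * A NULL or empty argv[0] yields a fixed fallback instead of crashing.
 */
const char *
name_from_argv0(const char *argv0)
{
	const char *slash;

	if (argv0 == NULL || *argv0 == '\0')
		return "unknown";
	slash = strrchr(argv0, '/');
	return slash != NULL ? slash + 1 : argv0;
}

/* Returns 1 when invoked as [, whether as "[", "bin/[" or "/bin/[". */
int
invoked_as_bracket(const char *argv0)
{
	return strcmp(name_from_argv0(argv0), "[") == 0;
}
```

Comparing the whole of argv[0] against "[" would miss the bin/[ case. Comparing only the basename catches it.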

progname

The BSD ancients provided a simple variable to simplify things.

extern char *__progname;

Before main is called, __progname will be set to the basename of argv[0]. Most BSD programs will then use this variable for their sense of their own name. Returning to our example of grep, if we run the program with no arguments, we want usage to print the correct name:

carbolite:~> grep
usage: grep [-abcEFGHhIiLlnoqRsUVvwxZ] [-A num] [-B num] [-C[num]]
        [-e pattern] [-f file] [--binary-files=value] [--context[=num]]
        [--line-buffered] [pattern] [file ...]
carbolite:~> egrep
usage: egrep [-abcEFGHhIiLlnoqRsUVvwxZ] [-A num] [-B num] [-C[num]]
        [-e pattern] [-f file] [--binary-files=value] [--context[=num]]
        [--line-buffered] [pattern] [file ...]

Internally to libc, progname is also used by common error functions such as err and errx.
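
Roughly, these functions put the program name in front of the caller's message. The sketch below imitates only the formatting half. format_warnx is a made-up name, and it takes the name as a parameter and writes to a buffer, where the real functions read progname themselves and print to stderr (err and errx then exit).

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the formatting done by warnx(3) and
 * friends: writes "name: message\n" into buf. Returns the length
 * written, or -1 if buf is too small.
 */
int
format_warnx(char *buf, size_t len, const char *name, const char *fmt, ...)
{
	va_list ap;
	int n, m;

	n = snprintf(buf, len, "%s: ", name);
	if (n < 0 || (size_t)n >= len)
		return -1;
	va_start(ap, fmt);
	m = vsnprintf(buf + n, len - n, fmt, ap);
	va_end(ap);
	if (m < 0 || (size_t)m >= len - n)
		return -1;
	n += m;
	/* need room for the newline and the terminating NUL */
	if ((size_t)n + 1 >= len)
		return -1;
	buf[n++] = '\n';
	buf[n] = '\0';
	return n;
}
```

Because the name arrives as data, the same code serves grep and egrep without either one knowing about the other.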

At some point, an accessor function getprogname was added to NetBSD, then FreeBSD. It made its way into OpenBSD as well in order to prevent a progname gap from developing. There is also a setter function, setprogname, which is not to be confused with the slightly different setproctitle. There’s no getter for proctitle, however, unless you count ps.

silly

Using progname occasionally gives some funny results. Despite not having any alternative modes, signify will print its usage according to the invoked name.

carbolite:/tmp> ln -s /usr/bin/signify gorilla
carbolite:/tmp> ./gorilla
usage:  gorilla -C [-q] -p pubkey -x sigfile [file ...]
        gorilla -G [-n] [-c comment] -p pubkey -s seckey
        gorilla -S [-e] [-x sigfile] -s seckey -m message
        gorilla -V [-eq] [-x sigfile] -p pubkey -m message

Is this helpful? I doubt it. But using progname in usage was the prevailing style, so I copied it. Some newer programs I’ve written just use hard coded names.

carbolite:/tmp> ln -s /usr/bin/doas banana
carbolite:/tmp> ./banana
usage: doas [-ns] [-a style] [-C config] [-u user] command [args]

Which of these alternatives makes the most sense? If the user gives a program a silly name, should the command think of itself that way? Note that this is mostly a concern for command line utilities only.

carbolite:/tmp> ln -s /usr/X11R6/bin/xcalc abacus
carbolite:/tmp> ./abacus

Still creates a window titled “Calculator”. On the other hand

carbolite:/tmp> xcalc 99 beads
xcalc: unknown options: 99 beads
Usage:  xcalc [-rpn] [-stipple]
carbolite:/tmp> ./abacus 99 beads
./abacus: unknown options: 99 beads
Usage:  ./abacus [-rpn] [-stipple]

xcalc doesn’t originate from BSD, so it doesn’t use progname. Instead it echoes back the entirety of argv[0] here.

trouble

Now, where were we? Should a program that is otherwise name agnostic print its original name or what the user picks for it?

“A foolish consistency is the hobgoblin of little minds.” Or perhaps in this case, a little consistency is the hobgoblin of foolish minds? Do we care to appease foolish users who rename utilities unnecessarily?

One can argue using progname increases the reusable modularity of the code. This is an excellent case for the use of progname in libc functions, like those noted above. A scan of the src tree, however, reveals that most other occurrences are restricted to usage functions. But how frequently can the usage from one utility be copied to another without modification?

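Sometimes using progname can even complicate the code. The signify usage could just be a fixed string, but for the attempt at flexibility. Depending on compile-time options, though, the name is printed either once or four times, and printf arguments all have to come at the end of the call, after the complete format string, so they can’t be tucked inside the #ifndef. Positional arguments (%1$s) let every line reuse the one argument.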

        fprintf(stderr, "usage:"
#ifndef VERIFYONLY
            "\t%1$s -C [-q] -p pubkey -x sigfile [file ...]\n"
            "\t%1$s -G [-n] [-c comment] -p pubkey -s seckey\n"
            "\t%1$s -S [-e] [-x sigfile] -s seckey -m message\n"
#endif
            "\t%1$s -V [-eq] [-x sigfile] -p pubkey -m message\n",
            __progname);

The time I spent looking up positional arguments to make this work was probably better spent doing something else.

The getprogname man page notes another danger. This is a user supplied string. It cannot be trusted in a security context. Most of the time this is harmless, but sometimes error messages are constructed with multiple iterations of snprintf. One wrong format string and things go bad. This may be a negligible risk, but we avoid mistakes by avoiding things that can become mistakes.
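
A sketch of the safe habit: the name only ever appears as an argument to %s, never as the format itself. build_message is a hypothetical helper, not something from the man page.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build "name: detail" with the name passed as
 * data. Writing snprintf(buf, len, name) instead would let a name
 * such as "%s%n" be interpreted as a format string.
 * Returns the length written, or -1 on error or truncation.
 */
int
build_message(char *buf, size_t len, const char *name, const char *detail)
{
	int n;

	n = snprintf(buf, len, "%s: %s", name, detail);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

With a hostile name like "%s%n", the output simply contains those characters literally. The trouble starts when a buffer built this way is later handed to another printf-style call as its format.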

On that note, another possible bug: syslog uses progname by default. A user may be able to evade log monitoring by invoking doas with a different name. (Just fixed.)
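
One way to pin the log identity, sketched with the standard openlog call. This is an illustration of the idea, not necessarily the fix that went into doas. The helper name, the message text, and the choice of LOG_AUTH are all assumptions.

```c
#include <syslog.h>

/* Fixed identity for log messages, independent of argv[0]. */
static const char log_ident[] = "doas";

/*
 * Hypothetical helper: log under the fixed identity above. Without
 * the openlog call, syslog(3) typically tags messages with the
 * program name, which the user controls through argv[0].
 */
void
log_command(const char *user, const char *cmd)
{
	openlog(log_ident, LOG_PID, LOG_AUTH);
	syslog(LOG_INFO, "%s ran command %s", user, cmd);
	closelog();
}
```

With the identity fixed, anything watching the logs for "doas" sees the entry whatever name the binary was invoked as.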

My current thoughts are that progname is a useful abstraction when flexibility is desired, but that doesn’t mean it should be used all the time. Like a lot of best practices, its use has slowly devolved towards cargo cult programming.

Posted 28 Apr 2016 12:26 by tedu Updated: 29 Apr 2016 02:22
Tagged: c openbsd programming