features are faults redux
Last week I gave a talk for the security class at Notre Dame based on features are faults, with some added commentary. It was an exciting trip, with the opportunity to meet and talk with the computer vision group as well. Some other highlights include the Indiana skillet I had for breakfast, which came with pickles and was amazing, and explaining the many wonders of cvs to the Linux users group over lunch. After that came the talk, which went a little something like this.
I’m a developer with the OpenBSD project. If you’ve never heard of OpenBSD, it’s a free unix-like system. So kind of like Linux, but better in every way. Totally unbiased opinion.
I got started with OpenBSD back about the same time I started college, although I had a slightly different perspective then. I was using OpenBSD because it included so many security features, therefore it must be the most secure system, right? For example, at some point I acquired a second computer. What’s the first thing anybody does when they get a second computer? That’s right, set up a kerberos domain. The idea that more is better was everywhere. This was also around the time that ipsec was getting its final touches, and everybody knew ipsec was going to be the most secure protocol ever because it had more options than any other secure transport. We’ll revisit this in a bit.
There’s been a partial attitude adjustment since then, with more people recognizing that layering complexity doesn’t result in more security. It’s not an additive process. There’s a whole talk there, about the perfect security that people can’t or won’t use. OpenBSD has definitely switched directions, including less code, not more. All the kerberos code was deleted a few years ago.
Through OpenBSD I’m also to blame for LibreSSL, a fork of the OpenSSL library. In case you’re confused, the OpenBSD project produces a number of other open projects, but we’re not the exclusive user of the name. OpenSSH, yes. OpenVMS, no. Nothing to do with that. I’ll come back to SSL later, but the punch line is bugs are temporary, workarounds are forever. That’s another talk for another time.
features are faults
Getting closer to the subject for today, my thesis is that features are faults. What I mean by this is that a feature represents a weak point where something can go wrong. Where do you see lots of earthquakes? Near the faults.
1 bug per 100 lines of code
100 million lines of code
Let’s assume about one bug per 100 lines of code. That’s probably on the low end. Now say your operating system has 100 million lines of code. If I’ve done the math correctly, that’s literally a million bugs. So that’s one reason to avoid adding features. But that’s a solvable problem. If we pick the right language and the right compiler and the right tooling, and apply enough eyeballs and effort, we can fix all the bugs. We know how to build mostly correct software, we just don’t care.
I really want to talk about the interaction between features. Cases where in hindsight it becomes obvious that there was a bug, but it’s not always clear who or what to blame. This is a series of examples I’ve collected from the past few years. Every time something happens, the general reaction is “wow, that’s crazy” but it’s treated like an isolated incident. I figure maybe if I pummel you with enough examples you’ll see a pattern emerging.
Hypothesis: The minimum greatest interference set is two features.
As we add features to software, increasing its complexity, new unexpected behaviors start to emerge. What are the bounds? How many features can you add before craziness is inevitable? We can make some guesses. Less than a thousand for sure. Probably less than a hundred? Ten maybe? I’ll argue the answer is quite possibly two.
Corollary: No program has two features.
An interesting corollary is that it’s impossible to have a program with exactly two features. Any program with two features has at least a third, but you don’t know what it is.
My first example is a bug in the NetBSD ftp client. We had one feature, we added a second feature, and just like that we got a third misfeature.
cd coolfiles
ls
get README |more
Our story begins long ago. The origins of this bug are probably older than I am. In the dark times before the web, FTP sites used to be a pretty popular way of publishing files. You run an ftp client, connect to a remote site, and then you can browse the remote server somewhat like a local filesystem. List files, change directories, get files. Typically there would be a README file telling you what’s what, but you don’t need to download a copy to keep. Instead we can pipe the output to a program like more. Right there in the ftp client. No need to disconnect.
Fast forward a few decades, and http is the new protocol of choice. http is a much less interactive protocol, but the ftp client has some handy features for batch downloads like progress bars, etc. So let’s add http support to ftp. This works pretty well. Lots of code reused.
GET http://somefile
Status: 302 Found
Location: http://|reboot
GET http://|reboot
Status: 200 OK
http has one quirk however that ftp doesn’t have. Redirects. The server can redirect the client to a different file. So now you’re thinking, what happens if I download http://somefile and the server sends back 302 http://|reboot. ftp reconnects to the server, gets the 200, starts downloading and saves it to a file called |reboot.
Except it doesn’t. The function that saves files looks at the first character of the name and if it’s a pipe, runs that command instead. And now you just rebooted your computer. Or worse.
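A minimal sketch of how the two features compose, written in Python rather than the client's own language. The names `open_output` and `fetch` and the fake server table are hypothetical, and nothing is actually executed; the sketch only reports what the save step would have done.

```python
# Toy model of the interaction; not the real ftp client code.

def open_output(name):
    """Feature one: a leading '|' means pipe the data to a command."""
    if name.startswith("|"):
        return ("run command", name[1:])
    return ("write file", name)

def fetch(url, server, max_redirects=5):
    """Feature two: fetch over http, following redirects."""
    for _ in range(max_redirects):
        status, value = server[url]
        if status == 302:
            url = value  # the server picks the next URL...
            continue
        # ...and the last path component becomes the output name.
        name = url.rsplit("/", 1)[-1]
        return open_output(name)
    raise RuntimeError("too many redirects")

# Hypothetical malicious server, mirroring the exchange above.
server = {
    "http://somefile": (302, "http://|reboot"),
    "http://|reboot": (200, "payload"),
}

print(fetch("http://somefile", server))
```

Each function is reasonable on its own. The trouble only appears once the output name can come from the server instead of from the person at the keyboard.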
It’s pretty obvious this is not the desired behavior, but where exactly did things go wrong? Arguably, all the pieces were working according to spec. In order to see this bug coming, you needed to know how the save function worked, you needed to know about redirects, and you needed to put all the implications together.
Imagine one day a new required class of input validation was discovered. “In what?” “Exactly.”
Some time ago, the security researcher Dan Kaminsky tweeted a hypothetical conversation which I really liked. It perfectly captures the nature of the problem, and our incapacity to deal with it. He’s exaggerating a bit, but not far from the mark.
That was after the bash bug now known as shellshock. The problem there is that bash has this underdocumented feature that lets you store functions in environment variables. This obviously requires feeding environment variables to the shell parser, and since the environment comes from just about anywhere, this seems kind of dangerous, but maybe it’ll be ok. The problem they had here is that the bash parser, when it reaches the end of a function, switches back into interpreter mode and starts executing commands. This is definitely not what you want, considering scripts run by web servers have all sorts of user supplied variables. It’s easy to say in hindsight that the parser should have stopped, but flipping back to execution mode is actually pretty natural for a shell parser. The shell generally interprets things as it sees them.
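The parser behavior can be sketched with a toy model. This is not bash's grammar or its code: `split_function`, `import_env`, and the `stop_after_function` switch are hypothetical names, nested braces are ignored, and "executing" a command just means appending it to a list.

```python
# Toy model only: this is not bash's grammar or its actual code.

PREFIX = "() {"

def split_function(value):
    """Split '() { body }trailing' into (body, trailing), or return None."""
    if not value.startswith(PREFIX):
        return None
    end = value.find("}")  # toy: no nested braces
    if end == -1:
        return None
    body = value[len(PREFIX):end].strip()
    trailing = value[end + 1:].lstrip("; ").strip()
    return body, trailing

def import_env(env, stop_after_function):
    """Return (imported functions, commands that would execute)."""
    functions, executed = {}, []
    for name, value in env.items():
        parsed = split_function(value)
        if parsed is None:
            continue  # ordinary variable, never parsed as code
        body, trailing = parsed
        functions[name] = body
        if trailing and not stop_after_function:
            # The parser drops back into interpreter mode and keeps going.
            executed.append(trailing)
    return functions, executed

# A request header a web server might copy into the environment.
env = {"HTTP_USER_AGENT": "() { :; }; echo pwned"}
print(import_env(env, stop_after_function=False))
print(import_env(env, stop_after_function=True))
```

With the switch off, the text after the closing brace lands in the executed list. With it on, the function is still imported but nothing after it runs.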
Feature predates web by several years
CGI assumes nobody looks at environment
Parser assumes the thing after a function is code
Lots of assumptions. Ultimately it was decided that bash was to blame, parser bug, which is pretty obvious, but if we think about this in simple terms of trusting user input, should CGI be putting user input in environment variables? Why not blame the web server and say it should only set good environment variables?
The standard advice for secure development is that you’re not supposed to trust user input. Everybody knows this, but what does this mean? And how do we apply it? Sometimes it gets a little complicated.
Let’s say you’re making a social network. You want your users to upload profile pictures. You write some wonky boilerplate code that requires calling setjmp because libjpeg is strange like that. And then you add some code to handle png files, too. But then a user tries to upload a gif. OK, let’s add that too. Basically, for every factor of ten users you add, you’ll get somebody who wants to use a different file format. You may ask why you need to be the one who handles tiff conversion. Shouldn’t the person with the tiff selfie convert it before uploading? Turns out there’s very little overlap between people who would choose to upload tiff files and people who can convert them to another format.
The situation is getting a little out of hand with all these different formats you need to support. Wouldn’t it be great if some library could just handle it all for you? Practically magic. Actually, it’d be magick with a k. ImageMagick. Now you’ve also got support for JPEG-2000 and PhotoCD and 100 other things you’ve never even heard of.
push graphic-context
viewbox 0 0 640 480
image over 0,0 0,0 'label:@/etc/passwd'
pop graphic-context
You also have support for MSL. The magick scripting language. This is a bit of an unexpected surprise. MSL lets you do things like read files and draw their contents into an image. This is actually very convenient if you’re doing a lot of batch image manipulation. However, when users upload files like profile.msl, this is much less convenient. To make things even more fun, where my sample just reads a local file, that can be a URL. So now an attacker can probe your internal network and interact with all your passwordless microservices.
untrusted input | trusted code
input | code | code | code
To return to the question of trusting user input, what happened here? Was it an error for imagemagick to trust its input? Or was it an error to send user input to imagemagick? We could write a filter so that only recognized formats are fed to imagemagick, but that’s back to where we started. We’re using this library because we don’t know, or want to know, about every possible image format that exists. Technically, this is all there in the manual and you’re supposed to configure imagemagick to disable scripting support.
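For illustration, the filter mentioned above might look something like this sketch. The allowlist holds only the three formats named earlier, and `sniff_format` and `accept_upload` are hypothetical names, not part of any library.

```python
# Hypothetical allowlist; the three formats are just the ones named earlier.
MAGIC = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}

def sniff_format(data):
    """Return the format name if the leading bytes match, else None."""
    for magic, name in MAGIC.items():
        if data.startswith(magic):
            return name
    return None

def accept_upload(data):
    """Only hand recognized formats on to the image library."""
    fmt = sniff_format(data)
    if fmt is None:
        raise ValueError("unrecognized image format")
    return fmt

print(accept_upload(b"\x89PNG\r\n\x1a\n" + b"...rest of file..."))
```

A script that starts with `push graphic-context` is rejected, but so is the tiff selfie, which is the cost described above. A check on the leading bytes also only helps if the library is then told which decoder to use rather than being left to guess again on its own.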
More generally, advice about trusting input assumes there’s a single | barrier between the good and the bad. But more often than not, the input gets passed to several layers of code, each with different assumptions about what’s valid. Why don’t we have the first layer do the validation? Sometimes this can be tricky, when one program passes its inputs to another.
Programs executing programs is a very common pattern. You’re probably familiar with the patch program. Applies the output of the diff program to a file. Today, the only diff format in use is the “unified” diff. Before that, “context” diffs were somewhat popular, and before that diffs took the form of ed scripts. Yes, ed the editor. A diff would be the exact sequence of keystrokes needed to edit a file into the output.
To apply an ed diff, patch would do the obvious thing. Execute ed. Just one problem. ed itself is capable of executing arbitrary commands via !. We don’t want patches running arbitrary commands. Instead, patch has a mini ed parser and it tries to filter out bad commands. But this parser does not exactly match the ed parser, and it can become desynchronized, and then it accidentally lets some ! commands through to ed. And oops.
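The desynchronization can be sketched with two toy parsers. The grammar here is invented for the example and is not real ed or real patch: `<n>a` starts text input, a lone `.` ends it, and `!cmd` runs a command. The single difference between the two functions (prefix match versus full match) is also made up; the actual discrepancy in patch is not described here.

```python
import re

# Toy grammar, not real ed or patch: "<n>a" starts text input, a lone "."
# ends it, and "!cmd" runs a command.

def toy_ed(script):
    """The 'real' interpreter: returns the shell commands it would run."""
    ran, in_text = [], False
    for line in script.splitlines():
        if in_text:
            in_text = line != "."
        elif re.fullmatch(r"\d+a", line):
            in_text = True
        elif line.startswith("!"):
            ran.append(line[1:])
        # anything else is an error; the toy just stays in command mode
    return ran

def toy_filter(script):
    """The guard in front: True if it believes no '!' command is present."""
    in_text = False
    for line in script.splitlines():
        if in_text:
            in_text = line != "."
        elif re.match(r"\d+a", line):  # prefix match: the one difference
            in_text = True
        elif line.startswith("!"):
            return False
    return True

honest = "1a\nhello\n.\n!reboot\n"
sneaky = "1ax\n!reboot\n.\n"

print(toy_filter(honest), toy_filter(sneaky), toy_ed(sneaky))
```

The filter rejects the honest script but approves the sneaky one, because it thinks `1ax` began a block of text while the interpreter treats that line as an error and is still reading commands.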
Everything execs everything else. This is really basic unix philosophy, no? A collection of simple tools. These simple tools may not individually implement many features, but they certainly have many features. Every program has all of the features of all of the programs it can run.
We can consider the example of pdflatex. A tex file also allows the execution of external commands. Fortunately, the pdflatex developers are aware of this and ship a whitelist of only safe programs. “They also have no features to invoke arbitrary other programs.” Unfortunately, this is a lie. One of the whitelisted programs can also run external programs.
OK, so having established that it may not be safe to view a tex file with pdflatex, what can we use? Maybe plain less? Well, it turns out less also has a feature where it will run a helper program for various file formats. If you want to view README.gz, less can run the file through zcat and display the output instead of the raw file. This depends on local configuration, but can include some really weird stuff like cpio, which is a tar-like also-ran archive format that died about 20 years ago. Who uses less to view cpio archives? Who uploads tiff selfies?
So the program you’re using to view a file because you don’t trust the normal viewer may actually invisibly run the normal viewer anyway. Everything runs everything.
Another bit of unix trivia. I apologize for the craziness of what I’m about to say. When can one process signal another? The short answer is if the user ID matches, but the somewhat longer more complete answer involves process groups. Or sessions. A process group and a session are kind of similar, except for the part about the controlling terminal. On some older unix systems, if you don’t have a controlling terminal and you open the wrong file by accident, it becomes your controlling terminal. That’s bad. Hence, the O_NOCTTY flag to open().
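For reference, passing the flag looks like this on a unix system where Python exposes it. The helper name `open_without_ctty` is hypothetical, and `/dev/null` is only a stand-in for whatever file you were about to open.

```python
import os

def open_without_ctty(path):
    """Open path read-only; never let it become our controlling terminal."""
    return os.open(path, os.O_RDONLY | os.O_NOCTTY)

fd = open_without_ctty("/dev/null")
os.close(fd)
```

The flag only matters if the path turns out to be a terminal device, which is exactly the case nobody expects to hit.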
At this point you all think I’m speaking gibberish. And that’s kind of my point. You’re thinking there’s no way you need to know this. And you’d be right, until you’re wrong. The question isn’t when you’ll need to know this; it’s when you’ll realize you need to know this.
There’s a current trend towards using containers more. Now what happens if a process in one container has a controlling terminal in another container? Is that a shouldn’t happen or a can’t happen? I don’t actually know, but you should find a Linux expert and ask them.
All this stuff, all these features, are still there in the murky depths. But ok, I am exaggerating. Some people really are unlucky enough to never develop any unix software. Let’s leave the depressing past behind and look to the bright future. The world wide web.
There’s a lot that can be said about browser security. How many features does a browser have? But let’s talk about server side programming first. There’s three broad categories that web security vulnerabilities might fall into.
First, there’s regular stuff like sql injection. These attacks remain viable because people are people, but you can understand them from first principles. Don’t trust user input. If you transport a competent developer from 1990 forward to today, they would get this stuff right.
Second, there’s more complicated stuff which an experienced server developer, but web novice, will likely overlook. Cross site request forgery and clickjacking and various other attacks where the browser sends seemingly legit requests to the server. This doesn’t really fit in our usual security model. Our 1990s dev will get all of this wrong, because it’s the kind of stuff that results from surprising interactions of features, and you’ll never think of it until somebody explains it to you. So you should use a framework which solves all these problems you never thought about.
Third, there’s all the bugs you inherit from using a framework because somebody thought it would be clever to parse and decode all the inputs. You may or may not have any idea this is happening.
Form:
user: uuu
display name: ddd
Request: user=uuu&funname=ddd
$user = "uuu"
$funname = "ddd"
Consider the common technique of turning form values into local program values. This is really convenient and lots of frameworks offer similar functionality.
Request: user=uuu&funname=ddd&admin=1
$user = "uuu"
$funname = "ddd"
$admin = 1
So now an attacker does the clever thing of setting some extra variables in their request. And what happens? The admin variable is dutifully set to 1, even for a user who is not an admin. PHP somewhat infamously invented this technique with register_globals, but rails update_attributes makes the same bug possible. You want a framework to save you from browser features, but now you have framework features. Who will save you from that?
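The same pattern, sketched in Python instead of PHP or Ruby. The `User` class and both helper functions are hypothetical, and the allowlist variant is one common mitigation rather than what any particular framework does.

```python
class User:
    def __init__(self):
        self.user = ""
        self.funname = ""
        self.admin = 0

def assign_all(obj, params):
    """The convenient version: every request field becomes an attribute."""
    for key, value in params.items():
        setattr(obj, key, value)
    return obj

def assign_allowed(obj, params, allowed=("user", "funname")):
    """Same idea, but only fields the form actually offers are copied."""
    for key in allowed:
        if key in params:
            setattr(obj, key, params[key])
    return obj

request = {"user": "uuu", "funname": "ddd", "admin": "1"}
print(assign_all(User(), request).admin)
print(assign_allowed(User(), request).admin)
```

The first call overwrites `admin` with whatever the request said. The second leaves it alone, at the price of maintaining a list of field names by hand, which is the chore the feature was supposed to remove.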
Browsers, as you know, display images from possibly malicious servers. So from the beginning that’s a threat they’ve had to deal with. 20 years of battle hardening later and we can say that things are mostly secure. I say mostly secure because there are lots of bugs which were considered low priority, such as reading uninitialized data. I send you a corrupted image that’s missing some pixel data. This usually results in some garbage being displayed on the screen. Like this.
That’s actually my /etc/passwd file interpreted as pixel data. So who cares? Surely if someone wanted to grief me, there are far more horrible images to display than random chunks of memory. You can see why nobody really paid much attention to such bugs.
Enter the canvas element. The canvas element allows a website to draw something on screen and read back the pixel data. The canvas element of course has some basic protections like same origin policy, so you can’t steal image data from other sites. But the ability to read back data from this site is based on the assumption that if I sent it to you, I already know what it is. In other words, assume no data leaks in the image libraries. So in this case, the bug is pretty obvious, but there’s a long tail of these disregarded bugs because nobody has yet connected the bug to the internet. But it’s only a matter of time. Remember the million bugs I mentioned earlier? Many of them aren’t security relevant because they aren’t internet exposed. Yet.
Consider a library like gstreamer, which is used to play various media files. Maybe your browser uses gstreamer, maybe your cheap imitation of file explorer uses gstreamer. It’s useful code, so lots of things probably use it. gstreamer actually has a few components like gstreamer-plugins-good, gstreamer-plugins-bad, and gstreamer-plugins-ugly. (Pause.) I don’t actually want to criticize the gstreamer devs too hard. At least they’re honest. Nevertheless, some of this sounds like stuff I definitely don’t want connected to the internet. Now it’s been standard security practice for over a decade to not open attachments. Everybody knows this, right? You never open attachments. But what about tweets? Somebody a little cooler than you tweets a link. You’re going to click it right?
So now the question becomes, if I click a link on twitter, is it possible that that action will somehow involve one of the ugly plugins? If you ask around, various people will assure you that most programs will not load every plugin. Not actually reassuring.
I’m going to divert here and relate a funny unix story. Not my story. Vim gets “lililililililill” inserted in current file, and beeps a lot. Every time somebody started vim in tmux, vim would squirt some extra characters in the file. And it would beep a lot. The beeping is understandable, that’s what vim does, but where did the letters come from? This was not a bug, but a misconfiguration. An option was being set, but not the one you’d expect, from loading a file, but not the one you’d expect, from a directory, but not the one you’d expect. No malicious actor. Just one possible result from the myriad ways in which two programs can be configured. vim and tmux. A text editor and a virtual typewriter. And this was a mystery worthy of Sherlock Holmes. My point here is that there’s no way anyone can speak with any authority about which media plugins will or will not be activated.
At this point I tried to explain what’s going on with gstreamer and Nintendo Sound Files, but it’s better you just read the original posts: “Compromising a Linux desktop using... 6502 processor opcodes on the NES?!” and the followup, “Risky design decisions in Google Chrome and Fedora desktop enable drive-by downloads.” The details of exploiting these particular vulnerabilities are somewhat complex, but the punchline is that even if you never click on or open the downloaded file, merely opening the directory in a file manager is sufficient to allow exploitation. It’s only a matter of when, not if, the user will take some trivial action that activates the feature an attacker is seeking to exploit.
This seems like a good point to circle back around to something I mentioned much earlier, ipsec and other networking protocols. Is your ipsec config secure? Who knows, there are so many options it’s impossible to tell. One problem, for example, is that it’s possible to set up a tunnel with no encryption. I believe the idea was the internet would eventually consist of all these authenticated overlays, but only the innermost tunnel needed to encrypt the data. So they added the opt out feature, which I am certain has been engaged unintentionally far more often than intentionally.
Another concept, common to SSL, SSH, and ipsec, is that of cipher agility. Nobody knows if DES or RC4 is better, so we’ll let the endpoints negotiate. Then we can add new ciphers as time goes by. In theory, this is a feature that lets endpoints choose the best cipher. In practice, this is the feature that lets attackers choose the worst cipher via downgrade attacks. To fix this, we add another feature to perform secure handshakes. But older endpoints won’t support it, so we may have to try again with the old handshake, thus recreating the downgrade attack. So the next thing to try is to try the good handshake first, and if that succeeds, remember it for next time, so every endpoint gradually accumulates a list of all the other upgraded endpoints. We just keep adding more layers hoping that eventually there will be enough security piled on top to hide the underlying insecurity.
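A toy version of the negotiation makes the downgrade easy to see. The cipher names are placeholder labels, and `negotiate` and `tamper` are hypothetical; real handshakes carry far more state than a list of strings.

```python
# Hypothetical preference order, best first; the names are only labels.
PREFERENCE = ["strong", "medium", "weak"]

def negotiate(client_offer, server_supported):
    """Pick the most preferred cipher both sides claim to support."""
    for cipher in PREFERENCE:
        if cipher in client_offer and cipher in server_supported:
            return cipher
    raise ValueError("no common cipher")

def tamper(offer):
    """An attacker on the path rewrites the unauthenticated offer."""
    return [c for c in offer if c == "weak"]

client = ["strong", "medium", "weak"]
server = ["strong", "weak"]

print(negotiate(client, server))
print(negotiate(tamper(client), server))
```

Neither endpoint wanted the weak cipher, yet both ended up using it because both still accepted it. Deleting it from the two lists defeats this particular attacker without adding another handshake.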
What do we do about this? That’s a tough question. It’s much easier to poke fun at all the people who got things wrong. But we can try. My attitudes are shaped by experiences with the OpenBSD project, and I think we are doing a decent job of containing the complexity. Keep paring away at dependencies and reducing interactions. As a developer, saying “no” to all feature requests is actually very productive. It’s so much faster than implementing the feature. Sometimes users complain, but I’ve often received later feedback from users that they’d come to appreciate the simplicity.
As a user, I’ve got my browser so locked down I can’t even play quake. It’s a tough life.
There was a question about which of these vulnerabilities were found by researchers, as opposed to troublemakers. The answer was most, if not all of them, but it made me realize one additional point I hadn’t mentioned. Unlike the prototypical buffer overflow vulnerability, exploiting features is very reliable. Exploiting something like shellshock or imagetragick requires no customized assembly and is independent of CPU, OS, version, stack alignment, malloc implementation, etc. Within about 24 hours of the initial release of shellshock, I had logs of people trying to exploit it. So unless you’re on about a 12 hour patch cycle, you’re going to have a bad time.