reproducible builds are a waste of time
Sort of. Maybe. It depends.
Yesterday I read an article on Motherboard about Debian’s plan to shut down 83% of the CIA with reproducible builds. Ostensibly this defends against an attack where the compiler is modified to insert backdoors in the packages it builds. Of course, the defense only works if only some of the compilers are backdoored. The article then goes off on a bit of a tangent about self propagating compiler backdoors, which may be theoretically possible, but also terribly, unworkably fragile.
I think the idea is that if I’m worried about the CIA tampering with Debian, I can rebuild everything myself from source. Because there’s no way the CIA would be able to insert a trojan in the source package. Then I check if what I’ve built matches what they built. If I were willing to do all that, I’m not sure why I need to check that the output is the same. I would always build from scratch, and ignore upstream entirely. I can do this today. I don’t actually need the builds to match to feel confident that my build is clean. Perhaps the idea is that a team of incorruptible volunteers will be building and checking for me, much like millions of eyeballs are carefully reviewing the source to all the software I run.
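The "check if what I’ve built matches what they built" step is mechanically trivial; the hard part is getting the bits to match at all. A minimal sketch of the comparison, assuming both artifacts are already on disk. The function names are made up for illustration and are not Debian’s actual tooling.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large packages need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(mine: Path, theirs: Path) -> bool:
    """True only if the two artifacts are bit-for-bit identical."""
    return sha256_file(mine) == sha256_file(theirs)
```

Note what a match does and does not say: it says two builders turned the same source into the same bits. It says nothing about whether the source itself is free of holes.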
The original source document doesn’t actually mention deployment of the whacked SDK, just research into its development. Perhaps they use it, perhaps they rejected it as being too difficult and risky. Tricking a developer into using a whacked toolchain leaves detectable traces and it’s somewhat difficult to deny as an accident. If we assume that the CIA has access to developers’ machines, why not assume they have access to the bug database as well and are mining it for preexisting vulnerabilities to exploit? Easy, safe, deniable.
Or stick with the existing attack. Manually implant the backdoor in a selected binary, then have the (presumably compromised as well) download server offer it up only to selected, targeted users. The world at large will never see the difference. Geoexploitation is hardly new. Now we’re again back to everybody building everything from source themselves.
Or submit “harmless” refactoring patches that exploit minifier, sorry, compiler bugs to introduce new vulnerabilities. The compiler backdoors are already there, lying in wait until the proper source sequence unlocks them.
Meanwhile, software remains riddled with holes. How assuring is it, really, to know that the software you’re running has only all the holes it’s “supposed” to have? Now you can sit back and relax, knowing the bash you built on your server has the exact same shellshock hole that the bash built on the official build server has. Feel the assuring warmth wash over you. Wait, no, no, stop. Do not place your nose directly over the steaming pile. Waft, waft!
There is an argument to be made that it is naive to believe the CIA has researched this capability without deploying it. There is a better argument to be made that it is naive to believe that backdoored, Trojan-inserting compilers are the reason software is not trustworthy.
Of course, there are uses for reproducible builds besides shutting down the CIA. When migrating from an old build server to a new one, it definitely helps to know that the same product is being built. Builds that can’t be reproduced are more likely to accidentally incorporate hidden dependencies. A reproducible build is, by necessity, a deterministic build. Having struggled with “random” build failures at various times, I’m all in favor of more deterministic builds.
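To make "deterministic" concrete: two common ways a build leaks its environment into the output are timestamps and file ordering. A rough sketch, assuming the build product is a tar archive; the `deterministic_tar` helper is hypothetical, though the `tarfile` calls are standard library.

```python
import io
import tarfile
from typing import Mapping


def deterministic_tar(files: Mapping[str, bytes], mtime: int = 0) -> bytes:
    """Pack files into an uncompressed tar whose bytes depend only on the inputs."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # fixed order, not dict or filesystem order
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = mtime        # fixed timestamp instead of "now"
            info.uid = info.gid = 0   # no trace of the building user
            info.uname = info.gname = ""
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()
```

Feed it the same files in a different order, on a different day, as a different user, and the bytes come out the same. Real build systems have many more leaks than these (the reproducible builds effort uses a `SOURCE_DATE_EPOCH` environment variable for the timestamp part), but the principle is the same: pin down everything that isn’t the source.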
This post is excessively pessimistic. I’m not opposed to reproducible builds, just the hoopla. It’s rational for evil adversaries to research attacks in preparation for the day when end points have impenetrable security. It’s less rational to expend considerable effort defending against those attacks while end points are still far from secure. It’s like replacing a two inch deadbolt with a three inch deadbolt when the door is hanging in a rotted out frame. They’re not breaking in by sawing through the deadbolt. Our door frame isn’t that secure yet.
The appeal of reproducible builds is understandable. Backdoor implanting compilers is a sexy attack. It’s fun to talk about. It’s like an adversary with a magic death ray. So you sew reflective micro mirror sequins into all your clothes. Take that, evil doer! Their death ray will be reflected right back at them. Never mind your new outfit won’t do much to protect against getting stabbed with a knife. That’s boring. Let’s talk some more about the death ray.
The original post remains above so as to not obsolete anybody’s complaints. It was meant to be more polemic than persuasive; on that front I appear to have succeeded. I wanted to complain about a ridiculous article, so I complained. Ridiculously. Probably should have gone with the rants tag instead of thoughts though. (Apparently writing is only cathartic when nobody reads it. Heh.)
lunar@debian sent me a very nice email explaining, among other things, that I mischaracterized their motives. Yes. The Debian wiki is considerably more level headed. Don’t believe all the strawmen I wrote above.
The kind sir kragen, while offering up quite the beat down rebuttal, also organized my post into an ordered list of bullet points.
There are, broadly, two categories of exploit: implanted backdoors and accidental vulnerabilities. The good news is everyone agrees that one category dominates the other. The bad news is we don’t agree which one it is.
There is a good argument that implanted backdoors are the bigger concern. My attack surface is all of the programs I run, plus all of the programs run by the developers of that software, possibly recursively. If an adversary implants a backdoor in something, it’s very likely to be effective since it was designed that way. It won’t depend on a misconfiguration on my part. Are such backdoors likely to be detected, and in what timeframe? In contrast to my usual pessimism, I’m probably unreasonably optimistic here.
My argument is that the latent vulns in software represent the larger problem. It doesn’t matter that some of them only apply to some users or some configurations, because they are so legion in number there will always be another one when the attacker needs it. The Chrome browser goes to some effort to sandbox and otherwise isolate itself from itself. Nevertheless, attackers consistently find RCE exploits by stringing together 7 or 17 or however many smaller vulns.
Or, in short, when somebody sends me a link and I’m not sure I want to click on it, the knowledge that my browser was not built in a reproducible manner is not what gives me pause. Admittedly unfair comparison, since a backdoored browser wouldn’t wait until I visit a naughty link to work its magic.
Perhaps it doesn’t matter. We’re all screwed no matter what. :(
Although it’s in the context of scientific computing, Why bit wise reproducibility matters has some relevance as well. I’m debugging some problem. I make a few changes and recompile, and the problem vanishes. Did I fix it? Or did the mere act of recompiling on my system change something? If I can reproduce upstream builds at will, I can know that any observed change is the result of intended source modification, and not spurious environmental contamination.
I dismissed self propagating compiler backdoors with a wave of the hand. I tend to quickly dismiss the idea because whenever it comes up, it seems to be somebody saying, “The MIB could have implanted a backdoor in gcc in 1988 and it’d still be there and we’d never know!”, but there are aspects of the idea that are more reasonable. A few more thoughts on that. I would not claim they are impossible; just fragile. As in, the backdoor will fail to identify either the compiler itself or its victim program. “Am I compiling a compiler?” is a question that approaches “Will this program halt?” in difficulty. To the best of my knowledge, very little research has been conducted into the practical limits of making such a backdoor reliable in the face of changing inputs.
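A toy illustration of the fragility, under the assumption that the backdoor recognizes its victim by matching source text. The fingerprint string and the `is_target` function are invented for this sketch; a real implant could match on something sturdier, but it still has to match on something.

```python
# Hypothetical fingerprint the backdoor uses to recognize its victim.
TARGET_FINGERPRINT = "int check_password(const char *user, const char *pass)"


def is_target(source: str) -> bool:
    """Naive recognizer: exact substring match against the source text."""
    return TARGET_FINGERPRINT in source


# The version the backdoor was written against.
original = "int check_password(const char *user, const char *pass) { /* ... */ }"

# The same function after somebody renames one parameter.
refactored = "int check_password(const char *user, const char *pw) { /* ... */ }"
```

Here `is_target(original)` is true and `is_target(refactored)` is false. One harmless rename and the backdoor quietly stops propagating. Loosen the match to survive the rename and it starts firing on programs it was never meant to touch.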
An early effort is probably 80s and early 90s MSDOS viruses. They’d infect anything and everything, then (e.g.) delete some random file at midnight. Propagation and payload. Any file created by an infected compiler would also be infected, but any file copied by an infected xcopy.exe would also be infected. Invisible compiler backdoor or something else? On the fragile side, viruses were notorious for corrupting files because crude checks for the MZ header weren’t 100% reliable. This line of development seems to have evolved towards Stuxnet and Flame, etc.
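The MZ problem is easy to demonstrate. DOS executables begin with the two bytes `MZ`, so the crudest possible check looks like the sketch below. The function name and sample data are made up; this is not taken from any particular virus.

```python
def looks_like_dos_exe(data: bytes) -> bool:
    """Crude check: does the file begin with the two-byte 'MZ' signature?"""
    return data[:2] == b"MZ"


# Start of a plausible executable header.
real_exe_prefix = b"MZ\x90\x00\x03\x00"

# A plain text file that happens to start with the same two letters.
unlucky_text = b"MZ was here. This is a plain text note."
```

Both inputs pass the check. A virus that trusts it will happily splice code into the text note and corrupt it.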
Or consider a previous effort to install a backdoor in sshd. Even with source access, developing a backdoor that worked as intended required quite some effort. The next logical step, whacking the compiler to insert all these changes automatically, would appear even more difficult to achieve. Even small changes to the ssh source would be hard to resolve. There are, of course, easier backdoors to introduce, but given this was one that we know was developed, it’s worth considering how easily it would be converted into a compiler introduced backdoor.
Compilers do not appear to be the preferred delivery mechanism for replicating, persisting backdoors.
Although... XcodeGhost is the first compiler malware in OS X, infecting iOS apps. Ouch.
Anyway, never trust the thoughts of some rando on the internet.