some more books 3
Complex systems are hard to debug. When there’s too many pieces for any one person to understand, how can we know what we know?
soul of a new machine
At Data General, teams have to compete for resources. de Castro thinks this allows the best ideas to find their way, but it means if you can’t convince the software team to write software you need, you won’t ship. And so, you had to commit to tight schedules, the closest deadline that nobody can prove you won’t make. For those who make it, there might be a reward of some extra stock. Except nobody knows which rules are real, and which are mere suggestions. “They lived in a land of mists and mirrors. Mushroom management seemed to be practiced at all levels in their team.” By fall 1978, the team has been assembled and work is seriously underway to build Eagle. Another way to motivate the team is to call someone into a meeting, point out that VAX is going to ship soon, and say, “No one’s saying it’s your fault, but Eagle’s late, very late. It really must be designed and brought to life and be ready to go by April. Really. In just six months. That won’t be easy, but the brass think you can do it.” So I think our one year schedule is cut back here?
The team is split between hardware, Hardy Boys, and microcode, Microkids. The hardware side doesn’t want to implement a difficult function, so they ask microcode to do it. But microcode is already busy, so they only agree if hardware adds a new function to make their job easier. Computer design via horse trading. There’s certainly merit in having two teams work together to split responsibility, although the situation here sounds a bit more chaotic than ideal.
A rule of thumb when building a new system is to stick with existing chips, but West gambles on a new circuit, PAL, Programmable Array Logic.
“Not Everything Worth Doing Is Worth Doing Well”
Focus on the value the machine will deliver to customers. Don’t worry about what other computer engineers will think if they take it apart. “To some the design reviews seemed harsh and arbitrary and often technically shortsighted. Later on, though, one Hardy Boy would concede that the managers had probably known something he hadn’t yet learned: that there’s no such thing as a perfect design. Most experienced computer engineers I talked to agreed that absorbing this simple lesson constitutes the first step in learning how to get machines out the door. Often, they said, it is the most talented engineers who have the hardest time learning when to stop striving for perfection.”
After six months, the design of Eagle is taking shape, on paper at least. It’s going to have a multi board CPU, seven boards. Maybe some day miniaturization will matter to signal delay between components, but for today, CPUs spread across multiple boards run faster by doing several things at once. The use of PAL has made development faster. A complex part of the design can be indicated with a box that simply says PAL, and they’ll fill it in later.
Running a program is hard work. We have to fetch the next instruction. We have to translate its memory address to find it. We may have to swap it in. But what if the page fault handler is swapped out? That should never happen, but just in case there’s a microNova attached to the Eagle, monitoring it for fatal deadlocks. The clock inside Eagle ticks every 220 nanoseconds.
In early 1979, we have two prototype machines built and have begun the painful process of debugging. The Eagle is a complex machine, which means that many parts need to work before any of it works. This will be much harder than debugging the Eclipse, which West excelled at. But they’re still playing games with the schedule. Maybe. Are the commitments real, or does everyone pretend they’re real while secretly knowing the truth? Nobody really seems to know. But now, word comes that the FHP is going to be very late. Suddenly finishing Eagle by April isn’t a matter of pride, but of business necessity.
Next to each computer is a logic analyzer, like a hospital crash cart. It lets operators see what’s happening inside the Eagle: where signals are going, and how long they take. Every day each team writes up their fixes, and then in the morning the other team applies the same fix to their machine. This is slow, tedious work, unwrapping wires from here, moving them there, testing with an ohmmeter to verify the right connection. Every day, a very difficult, very manual merge process. This does not sound fun, and indeed they hated doing it. But allowing the machines to diverge would mean solving the same problem twice. At the same time, they have to be careful not to drop a sliver of wire by accident, or they will spend hours chasing down phantom faults.
A good anecdote about discipline in problem solving. Also, precursor to CPU binning. “On one occasion, the Hardy Boys found that while one board was failing, another one, identically wired, worked right. The problem, they discovered, lay in a single chip. They wanted to throw that chip away, put in another, and go on. But Rasala would not allow it. The suspect chip might not be defective; it might just be working a little more slowly than most of its kind. In that case, they would have a real problem to fix. It might never show up again in the lab, but when the time came to mass-produce this machine, it would almost certainly come back to haunt them. Some slow chips were inevitable, and they had to provide for them.” Timing tolerances in the real world are inevitably less friendly than in development.
As the man in charge of debugging, Rasala is getting really invested in the Eagle, taking over some anxiety from West. He feels like he’s living in the movie Duel. When his young son gets angry, he says, “I hope your machine breaks, Daddy.”
Windows 3.0 was released on June 8, 1990 and changed everything, seven years after the first public demo of Windows 1.0. Support for OS/2 is now rapidly dwindling, and the NT team is increasingly looking at Windows. Maybe this new operating system should be called Windows NT? I’d always known that NT was for new technology, but I hadn’t realized it had been in development for two years without necessarily being a Windows successor. The success of Windows 3.0 changed the status of Microsoft in the computer world, and made Windows an integral part of the brand, and thus NT followed suit.
This requires quite a shift in team focus, and kills any chance of demoing four OS/2 apps running on NT. A large API is designed, with over 1000 functions. I’m a little curious what state NT was in before this. A kernel with no API? In any case, on January 17, 1991, IBM developers are invited over to review the native NT API, and lo, it’s all Windows functions and no OS/2 functions. Such was IBM’s first notice that NT would not be supporting OS/2 as a first class citizen.
Cutler is now forced to reckon with backwards compatibility. VMS had compatibility, but only for a limited set of applications. Windows and DOS applications come in many varieties, and Microsoft has little control over them, and NT must support all of them. However, there was already code written for OS/2 to emulate DOS, so, even though NT’s kernel was entirely different, that code can be rewritten for NT. Some aspects of the OS/2 work weren’t a total waste then. You learn how to do something in one project, you can do it again for another project.
The NT team has now exploded from one group of 25 to three groups of 35, all under Cutler. “He was no longer viewed as a familiar patriarch whose mercurial personality was accepted. Most of the new people were afraid of Cutler and his temper tantrums. They spoke with him only at their own risk.” Meanwhile, at least some of the new team were happy to be working on something other than OS/2. An MS engineer and an IBM engineer had been working together when OS/2 crashed, and the IBM engineer remarked, “Wow, what a nasty problem. Glad that isn’t in my code.” Then he rebooted and ignored the problem.
Starting in 1991, we have a new direction and new timeline. Beta in August. Code complete by October 31. Testing and fixing, then final release sometime in 1992. We still don’t have a file system, however. Someone has to decide whether to write a new one or reuse an existing one.
Things are coming together, but looking back, I’m a little surprised at how little they accomplished in two years? No graphics, no networking, no back compat, no userland API? For 25 people? Even with shifts in direction, this seems like slow progress. Linux 0.01, limited as it was, doesn’t even seem that far behind? It’s probably a big error to make such comparisons with the limited information available, but at this point NT seems to be little more than a scheduler and memory mapper?
dealers of lightning
Gary Starkweather wants to play with lasers, but his bosses at Xerox’s Webster lab think it’s foolishness. Ironic, given that manipulation of light is core to their business. One interesting point is that lasers were believed to be incompatible with selenium receptors, because the laser would be too intense and lead to semiconductor fatigue, after which the receptor wouldn’t work. But in short bursts, the laser had no such effect. However, one had to build a device to test this theory. Knowing in advance that it wouldn’t work did not actually save time, but rather held the lab back. Eventually, Starkweather swings a transfer to PARC, against his bosses’ wishes. He is surprised at how barren the lab is, but quickly sets to work. Where he had trouble ordering a $2500 laser in Webster, at PARC a $15000 laser was no question.
Building a laser printer out of a copier presented several challenges. The Model 7000, the top end copier, fed pages in sideways for speed. But this would mean the printer couldn’t print one line, then the next, down the page. It would need to print one column, then the next, across the page. This required buffering an entire page of bitmapped data. On the bright side, however, this meant they wouldn’t be limited to simple text. They could render different fonts, with bold and italics. In the end, refusing to take shortcuts led to a much more versatile printer.
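The buffering requirement is easy to see in a toy model. This is only a sketch, not how the PARC hardware worked: the function name and the tiny 0/1 bitmap are made up for illustration, and it ignores which edge of the sheet enters first. Each scan line the laser draws on a sideways-fed sheet is one column of the upright page, so the very first scan line already needs a pixel from every row.

```python
def columns_for_sideways_feed(page):
    """Return the scan lines drawn when the sheet is fed sideways.

    `page` is a list of rows (top to bottom), each a list of 0/1 pixels.
    Every scan line is one column of the upright page, so it touches
    every row: the whole page must be in memory before printing starts.
    """
    if not page:
        return []
    width = len(page[0])
    return [[row[x] for row in page] for x in range(width)]


# A hypothetical 3-row, 4-column page.
page = [
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
]
scans = columns_for_sideways_feed(page)
for scan in scans:
    print(scan)
```

Once the page is a full bitmap anyway, nothing in the loop cares whether the pixels came from a fixed character set or from arbitrary fonts and graphics, which is the versatility the notes describe.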
Yet bringing the laser printer to market would require overcoming many political battles within Xerox. In 1972, LLNL put out an order request for a laser printer, knowing only Xerox could make it, but Xerox HQ turned down the bid because they refused to sell prototype equipment, fearing reputation loss. In 1974, Xerox was planning to create a new printer model, but was going to base it on CRTs, not lasers. Jack Goldman was able to intervene, only by flying executives out to Palo Alto to see the laser printer for themselves. But still, they refused to attach the printer extension to the Model 7000 copier, waiting until the release of the Model 9000 in 1977. It would become one of Xerox’s top sellers, after refusing to commercialize it for five years.
Taylor used to say PARC employed 56 or 78 of the 100 best computer scientists in the country. Fact check: PARC never employed 78 computer scientists. But the point remains, they were assembling quite a team.
Then Rolling Stone magazine, ironically funded by Max Palevsky, ran Spacewar: Fanatic Life and Symbolic Death among the Computer Bums. To the suits back East, PARC sounds out of control, even though the article was a fantastic advertisement for the people most interested in working there. Soon there were consequences. Visitors were required to sign NDAs. “Quirkily enough, the pledge attested that the visitor would not ‘import’ any of his or her ideas into PARC, a departure from customary agreements, which bar visitors from carrying proprietary information out of the lab. In any case, the goal was to protect Xerox from a claim that PARC had misappropriated someone else’s idea.”
And now we’re going to build an Alto! Well, soon. Alan Kay first pitches the idea to the group, but Jerry Elkind, budget director, is not impressed. “As a manager he responded to rationales on paper and rigorous questions asked and answered, not hazy visions of children toying with computers on grassy meadows. He was a tough customer, demanding and abrasive. He asked too many questions and, more’s the pity, they were often good ones. As Jim Mitchell once remarked, ‘Jerry Elkind knows enough to be dangerous.’” So Kay regroups, and starts to design a series of educational programs for Novas. Then Thacker and Lampson approach him and ask if they can borrow some of his budget for a new machine they are working on. Elkind was out of town for a few months, and they figure they can get something done by reappropriating budgets from around the lab.
Getting ahead of ourselves, but the Alto is going to be very popular. “To computer scientists who had spent too much of their lives working between midnight and dawn to avoid the sluggishness of mainframes burdened by primetime crowds, the Alto’s principal virtue was not its speed but its predictability. No one said it better than the CSL engineer Jim Morris: ‘The great thing about the Alto is that it doesn’t run faster at night.’”
The key realization that made them commit to the Alto was that it would be okay to use two thirds of the processor and three fourths of its memory just to run the display. Advances in hardware technology would mean that fraction would diminish over time, leading to increasingly capable machines. But you want to build the machine with the display now, to gain experience using it, and to design the software it will eventually run.
To save cost, the main processor will be responsible for nearly everything, switching between tasks as needed. (Compare with the Eagle design?) Disk controllers and other components would be eliminated in favor of CPU processing. They work out a complex interrupt task system. Later, Wes Clark asks how it compares to the TX-2, thirteen years old at that point. Turns out to be identical, but they hadn’t read the paper. So even in 1972, we were forgetting how things used to work.
The Office of Technology Licensing at Stanford has been a success, bringing in more revenue than Research Corporation ever had, including a patent for the sound processing system that would become the Yamaha DX7. Two labs, one at Stanford and one at UCSF, have invented recombinant DNA, transferring a gene from a frog into a bacterium. Niels Reimers thinks this could be patentable. The inventor, Stanley Cohen, disagrees, arguing that science belongs to everyone. Finally, an agreement is reached. The scientists would not personally benefit from the patent, to avoid impropriety, but the university would license it to corporations. And other universities would not be required to license it.
There’s one hurdle. The University of California patent office is very backlogged, and not sure this patent is worth pursuing. Stanford offers to cover all expenses, in exchange for 15 percent of revenue, then the remaining 85 percent will be split evenly. Later, when the patent returns $255 million, UC will complain that the 15 percent ($40 million) Stanford earned was too much.
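To see what UC was complaining about, here is the split worked out. This is a rough sketch, assuming the even split is between Stanford and UC and that the 15 percent comes off the top of gross revenue. The function and key names are mine, and amounts are in millions of dollars.

```python
def split_royalties(total, fee_percent=15):
    """Split patent revenue: a fee off the top, then the rest shared evenly.

    Assumes two parties (Stanford and UC) share the remainder equally.
    """
    fee = total * fee_percent / 100
    half = (total - fee) / 2
    return {
        "stanford_fee": fee,
        "stanford_total": fee + half,
        "uc_total": half,
    }


result = split_royalties(255)
for name, amount in result.items():
    print(f"{name}: {amount}")
```

Under these assumptions the fee is $38.25 million, which matches the roughly $40 million quoted above, and Stanford ends up with about $146.6 million to UC’s $108.4 million.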
More about the development of recombinant DNA: The Invention of Recombinant DNA Technology.
Mike Markkula joined Intel shortly before introduction of the 4004. One day he goes to the shipping warehouse and asks for a report on ordered but not shipped parts. The manager gets out some paper and pencil and starts adding it up. So he writes an inventory management system running on a PDP-10 by Tymshare. I think this is pretty funny. The computer maker’s son has no computer. As Intel grows, they move to a new campus on a pear orchard in Santa Clara. For several months, employees were able to pick pears, until it was replaced with a second building. I’m still amazed at how late Silicon Valley was an agricultural area. And now there’s no more room to grow.
“In general, Grove did not value marketing in Intel’s earliest years. He liked to say that engineering developed products, manufacturing built them, and sales sold them -- so what did marketing do?” So Markkula is not getting a lot of internal support, but he stays on because he likes his job. In 1974, Intel goes from being a “debt-free money machine” per Forbes, to laying off 30 percent of its workforce. Markkula gets shuffled around internally, just as the last of his stock options vest. He quits at age 33, with $10 million in 2016 dollars worth of Intel stock. He starts holding office hours in his house for other entrepreneurs. In fall 1976, two guys named Steve come to visit.
Fawn Alvarez has graduated from high school, and gets a job on the ROLM manufacturing line. “Manufacturing jobs were plentiful in Silicon Valley in the mid-1970s. Local companies were building chips, calculators, computers, peripherals, video games, and electronic equipment. In the two decades after 1964, Silicon Valley added more than 200,000 manufacturing jobs, 85 percent of which were in high tech.” Although... “The solder, which flowed over a metal bar to make what looked like a tiny waterfall of silver mercury, stuck to the metal and secured the parts in place. The boards were then washed in a chemical bath, after which workers hung them to dry like photographs in a darkroom. (‘That would be so illegal now that I can’t believe I don’t have cancer,’ Alvarez says.)”
Every employee received 12 weeks off with pay after six years. Some employees couldn’t afford to take much of a vacation, however, so the company altered the benefit to allow the option of six weeks off at double pay. The ROLM founders did not like unions, but felt any company where employees wanted to unionize had done something wrong. Better to treat employees well to begin with.
Soon Alvarez is making suggestions on how to improve the line. In 1977, she is manufacturing supervisor, at twenty years old managing employees twice her age. ROLM is going to move to a new campus in Santa Clara (what is apparently now the Levi’s Stadium parking lot). “The campus had a six-lane lap pool; volleyball, racquetball, and tennis courts; a fitness room, a sauna and steam room, waterfalls, a jogging trail, and ponds stocked with fish.” A vision of Valley offices to come. There’s also a cafeteria, which will mean fewer arguments over immigrants bringing “stinky” meals to work.
Atari needs more money. Their Grass Valley research center is building a new machine based on the 6502 microprocessor, so they are considering selling the company to Warner Communications. There’s a bit of trouble, though. Atari has very weak financial controls. Alcorn seems alright, but Bushnell and others are a bit loose in a way that worries east coast money men. Atari seems to be making money, but it’s hard to see how, and they suspect it may be a sham. Eventually, they reach a deal. Atari will receive $12 million immediately, but if they make certain revenue targets, there’s an additional $16 million.
Sandy Kurtzig is still trying to make ASK successful. She’s having trouble getting a version of MANMAN for HP computers finished, in addition to DEC and Data General. Every computer is different. She cuts back to only the HP version, and two years after starting on it, delivers MANMAN for HP, written in FORTRAN. All caps, all the time. The customer was Hughes, who provided a computer for development, but once it was done, the computer was shipped with the software, leaving her without a development machine. Quite a different era, where I think we take it for granted that computer access is the least of anyone’s troubles. “The venture capital community in Silicon Valley did not yet appreciate the importance of software. Today software accounts for more than half of venture capital investments in the Valley, but in 1977, the figure was 7 percent.” That was the year Larry Ellison tried and failed to raise VC funding for Oracle.
There have been rumors of zeroday markets for some time, but they reached public notice when “fearwall” tried to sell an Excel zeroday on eBay. They contacted Microsoft first, who declined to fix it, and so off to the market it went, before eBay pulled the listing. At first the market was mostly underground, used to develop programs to steal banking credentials. But now government agencies are getting into the bidding process, using exploits for intelligence gathering and other purposes. Charlie Miller published a paper, The Legitimate Vulnerability Market, in 2007, causing quite a stir. Governments still research exploits in house, but it’s become more expensive as it now usually takes multiple vulnerabilities to bypass defences. Outsourcing to specialists means a more reliable stream of ready to use exploits. Endgame, VUPEN, etc.
Back to stuxnet, Nico Falliere is reversing the payload. It contains a copy of the Step 7 DLL, but modified to hook all the read and write calls. Trying to figure out what it’s doing is rather difficult, because first he must figure out what the normal Step 7 DLL does. This is a complex task, for which there’s no source or references. Finally, he discovers what stuxnet is doing. Whenever Step 7 programs a PLC, it inserts a little extra code to run on the controller. Then later, if the PLC code is read back, the extra code is removed. So an infected machine will deliver a second stage payload to the PLC, but even if it’s inspected or reprogrammed, there will be no evidence.
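Why reading the code back shows nothing is easier to see in a toy model. This is purely a conceptual sketch: the classes are invented for illustration, have nothing to do with the real Step 7 interfaces, and the "code blocks" are just strings in a list. The point is only that whoever sits between the engineer and the controller controls both directions.

```python
class Controller:
    """Stand-in for a PLC: it just stores a list of code blocks."""

    def __init__(self):
        self.blocks = []


class HonestLibrary:
    """What the programming library is supposed to do."""

    def write(self, plc, blocks):
        plc.blocks = list(blocks)

    def read(self, plc):
        return list(plc.blocks)


class HookedLibrary:
    """Wraps the real library, adding a block on write and hiding it on read."""

    EXTRA = "extra-block"  # placeholder for the injected code

    def __init__(self, real):
        self.real = real

    def write(self, plc, blocks):
        self.real.write(plc, list(blocks) + [self.EXTRA])

    def read(self, plc):
        return [b for b in self.real.read(plc) if b != self.EXTRA]


plc = Controller()
lib = HookedLibrary(HonestLibrary())
lib.write(plc, ["main", "safety-check"])

seen_by_engineer = lib.read(plc)  # what the tooling reports
actually_on_plc = plc.blocks      # what the controller really holds
print(seen_by_engineer)
print(actually_on_plc)
```

The engineer’s view and the controller’s contents only disagree if you look at the controller through some path that doesn’t go through the hooked library, which is exactly the comparison that would be hard to make in the field.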
This second stage is quite sophisticated. Before doing anything, it simply monitors the system’s operations. It records normal data. Later, when it activates the sabotage code, it plays back the recorded data. Everything will appear to be operating normally, even as it’s all falling apart. It’s clear now that stuxnet was designed for sabotage and not espionage. Additionally, by disabling fail safes and returning fake operation data, it seems the intent is not merely to disable some physical system, but to damage it. Live Free or Die Hard, as they say.
When industrial control systems are compromised, bad things can happen. Chapter 9 is a quite nice summary of all the many attacks on systems. The punch line is that accidents had been happening for years, but the security implications had been resolutely ignored. Also, “Why spend money on security, they argued, when none of their competitors were doing it and no one was attacking them?” When news of stuxnet broke, Dillon Beresford bought some old Siemens PLC equipment off eBay, and quickly identified many vulnerabilities. No longer quite so secure in obscurity. Default passwords abound. “Switches and breakers for the power grid, for example, are often set up this way with default passwords so that workers who need to access them in an emergency will remember the password.” And there’s no lockout, to prevent panicked operators from disabling their own access.
We can assess the potential damage of a hacker by studying some accidental failures. The Sayano-Shushenskaya dam failed in 2009 when a turbine broke loose. A drop in power output from another power plant caused the turbine to speed up, which vibrated a bit too much, and boom goes the dynamite. Such vibrations would be easy to deliberately trigger via software control as well. A dike in Missouri collapsed in 2005 when sensors accidentally detached from the dam wall and failed to register that the reservoir was full.
We can also attack smart meters. New neighborhoods often have wifi mesh connected smart electric meters, which allow remote service cutoff. They can receive firmware updates over wifi as well. Which enables a worm to spread. The vendor says the meters lack the ability to update each other. Of course, that’s true at the factory, but the worm adds the feature for each meter to send updates to others nearby, impersonating a technician. The vendor says they’ll just drive around and flash the meters back. No, the worm also disables the update port, so it will no longer receive updates. There’s an alarming amount of pushback here, by vendors asserting that if it’s not in the manual, it’s not possible.
Another major incident is the 2003 Northeast blackout. The primary cause was sagging electric lines, overloaded with current. But a software failure that prevented alerts was a major contributor. Report.