Space Shuttle Fault Tolerance, 1970s

Handing a major piece of the national destiny to a university laboratory worked so well that NASA resolved never to do that again. And relying on North American Aviation to design and build Apollo's Command and Service Modules worked so well that NASA made sure to ask them (in their new hats labeled "Rockwell International Space Division") to reprise that role with the Space Shuttle Orbiter.

In the flight computer area, NASA's (mostly) good experience with the Saturn Launch Vehicle Digital Computer (LVDC), from IBM's Federal Systems Division, led them back to FSD for the Shuttle's General Purpose Computers (GPCs). Being under political pressure to buy "COTS" -- Commercial Off-The-Shelf -- avionics, NASA couldn't ask IBM to develop a Shuttle-rated fault-tolerant version of the LVDC, so they bought IBM/FSD System 4Pi computers used in B-52 bombers and other aerospace vehicles.

At the Lab, we AGC veterans had re-invented ourselves as fault-tolerance experts. That put us in the right place when NASA realized how much inherent aerodynamic instability was unavoidable in the Shuttle, given its requirement to maneuver for landing at any major airport in an emergency. Unlike in every other installation of 4Pi computers, recovery from a failure with worst-case timing had to be complete within four tenths of a second.

Since manual switchover to a backup system wouldn't cut it, we had to apply our expertise to designing "down-to-the-metal" software that would bring the survivability of a multi-GPC redundant set to, or even beyond, the level of the LVDC. Combined with some non-avionics Rockwell developments, such as hydraulic actuators with quadruple-redundant secondary actuators that achieved correct operation by majority voting, our work went one step beyond the Triple Modular Redundancy of the LVDC.

Despite skepticism from NASA and Rockwell about the intricacies of our design, we did not create a system that would eventually commit suicide by outsmarting itself. The tragic accidents that destroyed two Orbiters and their crews were caused by events far beyond anything the electronics could have influenced, and we did catch and control some avionics failures.

It was a psychological wrench for us to give up finding ways to make a dinky computer do six impossible things before breakfast and to focus on failures instead, but in the flight computer area, that was our contribution to making Shuttle avionics work faithfully for three decades. It also made for more memorable "war stories," and those occupy a section in my book.