method and methodology « Bug counting

Do you trust nuclear energy? I know, it’s a complex question — not the kind that easily leads to a “yes” or “no” answer. Let me rephrase it into a possibly less controversial form: do you trust nuclear engineers? Now, the odds are turning, and there’s a good chance Joe or Jane Sixpack would answer in the affirmative, without feeling the need to add qualifications to their answer.

Indeed, for all the negative stereotyping about being boring or uncreative, engineers have a generally solid reputation among the general public of being trustworthy and competent professionals. Well, I mean, traditional engineers have. Those who build bridges, buildings, and power plants. Those are the guys you can trust. Ah, if only software development were a solid engineering discipline in the same league with civil, mechanical, and nuclear engineering! We wouldn’t have to deal with lousy apps or insecure systems anymore! We could enjoy the same level of reliability and safety of, say, nuclear power which, in the immortal words of Homer J. Simpson, “has yet to cause a single proven fatality.”

Seriously, I don’t want to compare apples and oranges, but it is true that software developers may enjoy a generally poorer reputation than other professionals. The latest outcry follows an all too familiar argument: 1. Software is spectacularly unreliable; 2. Traditional engineering delivers highly reliable products; 3. Ergo, software development should adopt engineering practices (or, as the article goes, stop calling itself software engineering).

I will not pick on that particular article, even though it’d be a particularly easy target given its technical shallowness and shaky arguments (how is the dieselgate an indictment of software’s low quality?). Instead I’m interested in the more general issue of what traditional engineering has to teach to software engineering. I also want to avoid going down the line of nominalism — which all too often emerges in debating these kinds of topics — looking for the “essence” of engineering, software development, or the like. That’d be neither useful nor particularly interesting.

First of all, I believe that the gap of reliability between the best software systems and the best artifacts produced by traditional engineering is much smaller than the common opinion would lead us to believe. In fact, the line between the two is blurry, in that some of the most reliable complex software lies at the core of airplanes, cars, or control plants. If the latter are reliable, it is largely because the former is. As another, historical but still very significant, example see Feynman’s report on the Challenger disaster: among the various engineering divisions, the one responsible for avionics software is the only one whose practices pass with flying colors: “the computer software checking system and attitude is of the highest quality.” Of course most software development does not come even close to complying with the standards of quality of avionics software; but neither is the engineering of the majority of traditional consumer products with no mission-critical requirements.

Another feature traditionally ascribed to traditional engineering as opposed to software engineering is a rigorous, systematic approach that does not leave anything to chance. A nuclear engineer, for example, knows exactly what goes on in every stage of development and deployment of a nuclear power plant, and knows how to ensure that no accidents occur. Except this is not exactly true. It turns out that [Mahaffey, 2015] the riskiest activity related to harnessing nuclear power is fuel (re)processing, where fissile fuel is produced by chemical means. The major risk is that some of the fuel becomes critical, that is capable of sustaining a spontaneous chain reaction. Criticality does not only depend on the amount of fuel but also on its geometric configuration. It is practically impossible to predict every possible configuration the fuel may take while being processed, and in fact several accidents occurred due to unexpected accumulation of material in what turned out to be criticality-inducing configurations. This does not mean that the whole enterprise is left to chance, only that perfect planning is unattainable and one must deal flexibly with uncertainties. The best practices are to ensure that the operators are thoroughly trained not only in the procedures but are also fully aware of the general issues that may arise and on the lookout for unpredicted sources of danger. This attitude towards prevention and awareness was first adopted in the fuel processing plants of the Manhattan Project, when a young Feynman (him again) pointed out to his superiors that the excessive secrecy initially imposed, which prevented workers from knowing what exactly they were doing and what risks they could encounter, was counterproductive and foolishly dangerous. Prevention is the best protection, and prevention requires knowledgeable people, not drones.

Which leads us to the other main point: what can software engineering really learn from traditional engineering — and nuclear engineering in particular? It’s not the scientific foundations: informatics provides rock-solid foundations. It’s not so much the institutionalization as a licensed profession: while it may be useful in certain contexts (for example for freelance software developers) it’d have little or no relevance in others. The attitude that software engineering can learn from nuclear engineering is what to in the aftermath of an accident.

When something goes wrong in a nuclear reactor and it gets scrammed, it is imperative that the dynamics of the accident be understood down to minute details. This involves understanding the physics of the reaction, any failure of the equipment against its supposed behavior, how the personnel reacted, what practices were followed up to the time of the accident that may have altered operating conditions or generally increased risks. Based on an exhaustive post mortem, procedures, technologies, equipment, and training are revised. These activities have top priority, and must be completed to satisfaction before operations can restart. The net effect is that the chances of the same kind of problem occurring twice are minimal, and reliability improves by building on an ever growing knowledge base. (Bertrand Meyer made similar remarks about aerospace engineering.)

Two final comments. First, I believe such kind of practices are already in place, in some form at least, in the best software development environments. Besides safety-critical software, where we have several well-known case studies, the practice of large-scale software development includes inspections, regressions, and rigorous analysis. Even Facebook has given up on their “move fast and break things”, and realized the importance of stable infrastructure and rigorous analysis. Second, engineering is ultimately all about trade-offs. If you’re developing a silly app, you may just accept that “mediocre” is good enough. And that what you’re doing is not quite “engineering”.

Reference

James A. Mahaffey: Atomic Accidents: A History of Nuclear Meltdowns and Disasters: From the Ozark Mountains to Fukushima. Pegasus, 2015.

It was the best of times, it was the worst of times, it was the age of fastness, it was the age of sluggishness, it was the epoch of abstraction, it was the epoch of detail, it was the season of generality, it was the season of specificity, it was the spring of right, it was the winter of wrong, we had improving performance before us, we had degrading performance before us, we were all going direct to where Moore predicted, we were all going direct the other way — in short, the period was so far like the present period, that some of its most influential experts insisted on its being received, for good or for evil, in the superlative degree of comparison only.

Sometimes I have the impression we’re experiencing a collective delusion regarding how we’re dealing with the end of Moore’s law — the prediction that the density of transistors in integrated circuits grows exponentially over time.

The growth predicted by Moore’s law has gone on for decades, providing a veritable, Lucullan free lunch to the entire information technology industry. But now this bonanza is running out: the density growth is slowing down, and improvements in miniaturization do not directly translate into faster clock rates.

That exponential growth is not sustainable in the long term should come as no surprise to any trained engineer. In Moore’s words: The nature of exponentials is that you push them out and eventually disaster happens. Or, using Linus Torvald’s unmistakably caustic style (talking about a different exponential growth): Unending exponential growth? What drugs are those people on? I mean, really.

Rather than coming to terms with the harsh reality — the party is over — we prefer to sugar the pill and tell a less dramatic story. Sure, the clock rate of microprocessors has not increased significantly in a while, but the progress of electronics technology is still spectacular, and is giving us chips with more and more computational units crammed onto the same circuit. So, the updated story goes, the new challenge is exploiting the increasingly parallel hardware by writing increasingly concurrent software. Of course, concurrent software is notoriously hard to get right, but software technology is up to the challenge. More of Moore through concurrency.

But can concurrent programming really let us keep on riding the exponential speedup wave? There’s no question that improving the state of concurrent programming is a fascinating research challenge, both intellectually and practically. But I’m afraid the end of Moore’s law is more disruptive than we are willing to admit. And its long-term consequences cannot be countered simply by better programming.

The elephant in the room is that we know of only a handful of algorithms and programs that can be massively parallelized. In most cases, there is a significant fraction of the computation that is intrinsically sequential, which drastically limits, per Amdahl’s law, the speedup that can be obtained by adding computing cores. You can have the best concurrency model in the world, a flawless implementation, and as many cores as you like, but still be nowhere near anything like exponential speedup (in fact, not even linear speedup!).

Here’s what some Don Knuth said about it in a 2008 interview that should be quoted more often.

To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks! […]

Let me put it this way: During the past 50 years, I’ve written well over a thousand programs, many of which have substantial size. I can’t think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading. […]

[Hardware vendors] think a magic bullet will come along to make multicores speed up my kind of work; I think it’s a pipe dream. […]

From the opposite point of view, I do grant that web browsing probably will get better with multicores. I’ve been talking about my technical work, however, not recreation.

I’m inclined to give him the benefit of the doubt and assume he’s right, as I have the impression the guy knows a thing or two about algorithms and programming.

I look forward, by all means, to the challenges of combining abstraction and efficiency in concurrency research. But perhaps we should not be selling concurrent programming as a potential “fix” to the end of Moore’s law. While every exponential growth eventually stops (I mean, really), we can have the more down-to-earth goal of making the most out of the new exotic hardware architectures that are being developed. May we live in interesting times.

“Welcome to the first class of our course on formal verification. I’d like to start with some motivation: why verification is important in the real world, and what the consequences of lack of verification in software development are.” Lights go off. You hear a countdown recited in French, “Trois, deux, unité, feu!“. On the screen, a rocket blasts off its platform into the sky. Thirty-something seconds into its ascension, it awkwardly topples off and quickly disintegrates. “That was the Ariane 5 rocket’s first launch, which failed due to a software bug. The failure resulted in an estimated 300 million Euros loss. Software errors can even cost human lives, such as in the case of the Therac-25 machine for radiation therapy, whose malfunctioning resulted in several patients dying from radiation burns. Formal verification can ensure that software is free from defects before it’s deployed, thus avoiding catastrophic disasters such as those just mentioned. Now here’s the course syllabus…”

OK, I admit to slightly exaggerating the above for dramatic purposes. But, if you ever took a course on formal verification, chances are that it started with some graphic pictures of exploding rockets, unresponsive Mars rovers, or defective medical equipment, followed by stark tallies of the enormous losses, in money or human lives, resulting from the disasters. The bottom line: software quality is generally abysmal; and the systematic all-out application of formal verification is our only hope to avert further similar disasters from occurring in the future.

What’s the problem with this angle, to which I may have also been guilty of paying lip service in the past? I think this way of pitching formal verification points to motivations that are not fairly representative of the varied landscape in the research and practice of formal methods, and risks alienating software development practitioners, who have different concerns in their everyday’s work — concerns which conflict with the oversimplified picture painted above.

The first problem I see is that pointing to a handful of major disasters due to software failures may be construed as suggesting that formal verification is something only justified for mission critical applications. What about the myriad software projects that don’t cost hundreds of million dollars and whose failure does not result in ruinous losses? Is there a place for formal methods within such projects’ modest budgets? Related to these points is the misled vision that formal verification is an all-or-nothing business: either it enters every step of development or you might as well do things as you’ve always done them — but then disaster will impend. The flip side is that the only failures that matter are the catastrophic ones; everything else is not a big deal. In reality, everyday software developers generally remain genuinely concerned about software quality even when working on non mission-critical development; but also have to cope with all sorts of project constraints pulling in different directions.

A better way to frame the usage of formal verification is in terms of a continuum of trade-offs. Each trade-off balances some of the additional costs incurred by applying verification techniques against some gains resulting from higher reliability or confidence in the software’s correctness. This way, each project can pick the trade-off that best fits its requirements and budget. It’s easy to guess that the overwhelming majority of development projects occupy some intermediate position in the trade-off continuum: their failures won’t result in apocalyptic losses, nor are they of no consequence whatsoever. These are the projects that most practitioner developers are familiar with and can relate to. To each their own: Paris Métro Line 14 deployed the B method to build correct-by-construction modules through refinement; for my students learning Java and C#, writing a decent collection of unit tests and running a lint-like tool on their projects is good enough.

The second problem with the above pitching of formal methods is that it may suggest that they are a silver bullet. You want high reliability: just use formal methods (but it will cost you)! It doesn’t matter what your development process is, what programming language you’re using, whether your software is interactive, if you’re deploying in a legacy system, what your amount of technical debt is. Start using [your favorite verification technique] and your problems will evaporate!

But reliability, and practically every desirable quality property, is never the result of a single technique, tool, design choice, or individual talent. It can only follow from a sensible, well-organized process; one that is transparent, traceable, technologically up to date, based on a clear understanding of the application domain, and so on. The investigations following the failures of the examples mentioned above (Ariane 5, Therac-25, etc.) invariably pointed to communication and process shortcoming as well as, often more egregious than, technical ones. Correctness cannot be built as an afterthought; it requires planning and practice. Based on his experience at Microsoft, James Larus claimed that a root cause of software bugs is “missing, outdated, and incorrect knowledge“, which easily result from poorly organized processes. Conversely, the practical adoption of formal verification is very often slowed down by unintended process constraints; progress in this sense doesn’t necessarily require more powerful techniques but mainly integrating known techniques in a way that is not disruptive of existing practices — perhaps except for unfixable noxious practices whose disruption is desirable anyway.

The third problem I see with the “disaster-oriented” motivation for formal methods is that it seems to suggest that verification research is worthy only to the extent that it’s directly applicable to relevant software development issues and leads concrete gains, for example averting failure. This view encourages a view of science as a provider of cost-effective technology. To be sure, verification technology has had enough success stories that even long-standing skeptics must concede it can be useful in practice. (If you find the previous sentence controversial, I have blog posts planned to convince you!) But practical applicability need not be — should not be — the only goal of researchers. There must be room for speculative inquiries, also because long term progress invariably and crucially depends on them. Think, for example, of how the spectacular progress of SAT solvers has fueled advances in automatic verification in the last decade or so. The performance of SAT solvers hasn’t mushroomed overnight, but is the result of decades of research on a topic that seemed, at the time of its inception, quite removed from practicality. As Ed Clarke often recalls when presenting SAT-based model checking, if he had suggested using an NP-complete problem such as SAT as the basis for a practical verification technique at the time he was working on his Ph.D. thesis, his colleagues would have called his suggestion implausible because clearly clueless about the well-understood notion of intractable problem.

In spite of not always being pitched and motivated in the most convincing way, formal verification remains an exciting, quickly growing research topic that has made spectacular progress, which has been winning, in one form or another, the favors of more and more serious practitioners. One example among several: John Carmack‘s recent opinion that

The most important thing I have done as a programmer in recent years is to aggressively pursue static code analysis. Even more valuable than the hundreds of serious bugs I have prevented with it is the change in mindset about the way I view software reliability and code quality.

This is quite an encouraging endorsement, even if you work on verification techniques other than static analysis. Or perhaps all the “motivation” that verification research needs comes from Leibniz. In the next formal methods course I teach, I’ll try to open with his famous quote instead of discussing rockets and radiation machines:

The only way to rectify our reasonings is to make them as tangible as those of the mathematicians, so that we can find our error at a glance, and when there are disputes among persons, we can simply say: let us calculate without further ado to see who is right.

Bug counting

A little specification can go a long way

Category Archives: method and methodology

What does the nuclear engineer say to the software engineer?

Reference

Not any-Moore

What is verification for?

More than one way (also in verification)