Walter Tichy recently interviewed Westley Weimer and Martin Monperrus on the subject of Automated Bug Fixing; the interesting interview appeared in the March 2015 issue of ACM Ubiquity and is available here. In a follow-up message, he invited the AutoFix team — which includes yours truly — to comment on the interview. In this post I take up his suggestion and briefly discuss the approach to automated bug fixing that we developed in the AutoFix project. Even though I’ll refer to the whole AutoFix team (which also includes Yu “Max” Pei, Yi “Jason” Wei, Martin Nordio, Bertrand Meyer, and Andreas Zeller), these comments reflect my own views — this is my blog, after all 😉 Nonetheless, I believe my co-authors’ opinions are in line with mine; and everybody’s welcome to leave comments on this post if there’s anything I missed that they want to point out.

The term “automated bug fixing” — or, as it is more commonly referred to in software engineering research, “automated program repair” — denotes techniques and tools capable of suggesting changes that remove faulty behavior of programs. The fixes are often in the form of source-code patches but can also target binary executables or other representations. The most widespread approaches are driven by tests: failing tests identify incorrect behavior that must be corrected; a fix is a change that turns failing tests into passing tests (that is, it corrects some faulty behavior) without affecting the behavior on originally passing tests (that is, it preserves correct behavior).
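As a rough formalization of this acceptance criterion (the notation here is mine, not lifted from any particular paper): writing F for the set of originally failing tests and S for the set of originally passing tests of a program P, a candidate fix φ is accepted when

\[
\forall t \in F:\ \mathit{pass}(P \oplus \phi,\ t) \;\wedge\; \forall t \in S:\ \mathit{pass}(P \oplus \phi,\ t)
\]

where P ⊕ φ denotes the program with the fix applied. Since F and S are finite sets of tests, this is an inherently incomplete check: a change can satisfy it and still be an inadequate fix.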

AutoFix is our technique for automated program repair. Its main distinguishing feature — apparent from the very titles of the main publications describing it — is that it takes advantage of contracts (assertions such as pre- and postconditions) available in the source code. Contracts, as they are normally used in Eiffel or other contract-equipped languages, are simple specification elements embedded in the program text. The key word here is simple: don’t think quantifiers, first-order logic, and formal verification; think list /= Void, set.count > 0, and runtime checking.
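To give a concrete flavor, here is a minimal, made-up example (not one of AutoFix’s actual evaluation subjects) of what such contracts look like in Eiffel:

```eiffel
class ACCOUNT

feature -- Access

	balance: INTEGER
			-- Current balance.

feature -- Element change

	withdraw (amount: INTEGER)
			-- Remove `amount' from the account.
		require
			-- Precondition: checked at run time on every call.
			amount_positive: amount > 0
			sufficient_funds: amount <= balance
		do
			balance := balance - amount
		ensure
			-- Postcondition: checked at run time on every return.
			balance_decreased: balance = old balance - amount
		end

end
```

Nothing here calls for a theorem prover: each clause is an executable Boolean expression that either holds or fails whenever the routine runs.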

How does AutoFix use contracts? The main usage is actually independent of the details of the program repair technique: AutoFix generates random unit tests (using another tool called AutoTest) and uses contracts as oracles to classify them as passing or failing. Contracts also remain available for runtime checking throughout the various steps of the AutoFix algorithm. For example, AutoFix uses this information to sharpen the precision of its fault-localization algorithms. (As Weimer explains in the interview, fault localization answers the question “where in the program should we edit?”, before fix generation decides “what should we edit it to?”.) Contract checking also helps with regression testing of candidate fixes, and the Boolean expression of a violated assertion may suggest cases that require special handling, such as an empty list or an index on an array’s boundary.
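To illustrate the oracle role with another made-up example (again, not one of the actual faults from our experiments), suppose a random test generated by AutoTest exercises the following buggy routine:

```eiffel
class BOUNDED_STACK

create
	make

feature {NONE} -- Initialization

	make (n: INTEGER)
			-- Create an empty stack with capacity `n'.
		require
			positive_capacity: n > 0
		do
			capacity := n
			create items.make_filled (0, 1, n)
		ensure
			empty: count = 0
		end

feature -- Access

	capacity: INTEGER
			-- Maximum number of elements.

	count: INTEGER
			-- Number of elements currently on the stack.

	items: ARRAY [INTEGER]
			-- Storage, with valid indexes 1 to `capacity'.

feature -- Element change

	push (v: INTEGER)
			-- Put `v' on top of the stack.
		require
			not_full: count < capacity
		do
			-- Buggy: writes at index `count' instead of `count + 1',
			-- so the very first push targets index 0, outside the
			-- array's 1..capacity bounds.
			items [count] := v
			count := count + 1
		ensure
			one_more: count = old count + 1
			on_top: items [count] = v
		end

end
```

The very first push on a fresh stack violates the array’s index precondition (the index sits right on the array’s boundary): AutoTest classifies the test as failing without anyone having written an oracle, and the violated assertion both narrows down where the fault is and hints at the boundary case a fix must handle (here, writing at count + 1 instead of count).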

In terms of final output, the greater precision in fault localization and fix generation enabled by having contracts helps build good-quality fixes with limited search. Indeed, emphasis on the quality and acceptability of fixes is an aspect that we stressed early on in our research on AutoFix, predating the growing concern with similar aspects that characterizes some recent related work (such as PAR, Monperrus’s, and Rinard’s). Since the fixes produced by automated program repair are validated against a finite set of tests, they come with no formal guarantees of correctness or even adequacy. The work with AutoFix is no exception here because it deals with simple, and hence glaringly incomplete, contracts that cannot be used for static verification. In our experiments, however, we have manually inspected the fixes produced by AutoFix to assess their quality from a programmer’s perspective. It turns out that over half of the valid fixes produced by AutoFix are what we call proper; that is, they not only pass the available tests (a requirement that may be trivial when the tests are few or of poor quality) but are ostensibly consistent with the program’s implicit specification and can be applied to the codebase without introducing undesired behavior.

Performance is another aspect indirectly improved by the precision that contracts bring: since AutoFix’s repair process explores a search space that is better structured by means of the annotations, it needs limited computational resources. Running our experiments with AutoFix in a cloud computing environment would cost, on average, only a few cents per produced fix.

In the quickly growing landscape of automated program repair techniques, AutoFix offers an interesting trade-off that differs from other approaches. On the one hand, relying on contracts and related techniques restricts AutoFix’s applicability to programs annotated with some contracts; on the other hand, in exchange for this additional burden (which is not unreasonable!), AutoFix can produce, with limited computational resources, fixes that are often of high quality.


“Welcome to the first class of our course on formal verification. I’d like to start with some motivation: why verification is important in the real world, and what the consequences of a lack of verification in software development are.” Lights go off. You hear a countdown recited in French: “Trois, deux, unité, feu!” On the screen, a rocket blasts off its platform into the sky. Thirty-something seconds into its ascent, it awkwardly topples over and quickly disintegrates. “That was the Ariane 5 rocket’s first launch, which failed due to a software bug. The failure resulted in an estimated loss of 300 million Euros. Software errors can even cost human lives, as in the case of the Therac-25 machine for radiation therapy, whose malfunctioning resulted in several patients dying from radiation burns. Formal verification can ensure that software is free from defects before it’s deployed, thus avoiding catastrophic disasters such as those just mentioned. Now here’s the course syllabus…”

OK, I admit to slightly exaggerating the above for dramatic purposes. But if you ever took a course on formal verification, chances are that it started with some graphic pictures of exploding rockets, unresponsive Mars rovers, or defective medical equipment, followed by stark tallies of the enormous losses, in money or human lives, resulting from the disasters. The bottom line: software quality is generally abysmal, and the systematic, all-out application of formal verification is our only hope of preventing similar disasters in the future.

What’s the problem with this angle, to which I may have also been guilty of paying lip service in the past? I think this way of pitching formal verification points to motivations that are not fairly representative of the varied landscape in the research and practice of formal methods, and it risks alienating software development practitioners, who have different concerns in their everyday work — concerns that conflict with the oversimplified picture painted above.

The first problem I see is that pointing to a handful of major disasters due to software failures may be construed as suggesting that formal verification is justified only for mission-critical applications. What about the myriad software projects that don’t cost hundreds of millions of dollars and whose failure does not result in ruinous losses? Is there a place for formal methods within such projects’ modest budgets? Related to these points is the misguided view that formal verification is an all-or-nothing business: either it enters every step of development or you might as well do things as you’ve always done them — but then disaster looms. The flip side is that the only failures that matter are the catastrophic ones; everything else is not a big deal. In reality, everyday software developers generally remain genuinely concerned about software quality even when working on non-mission-critical projects; but they also have to cope with all sorts of project constraints pulling in different directions.

A better way to frame the usage of formal verification is in terms of a continuum of trade-offs. Each trade-off balances some of the additional costs incurred by applying verification techniques against some gains resulting from higher reliability or confidence in the software’s correctness. This way, each project can pick the trade-off that best fits its requirements and budget. It’s easy to guess that the overwhelming majority of development projects occupy some intermediate position in the trade-off continuum: their failures won’t result in apocalyptic losses, nor are they of no consequence whatsoever. These are the projects that most practicing developers are familiar with and can relate to. To each their own: Paris Métro Line 14 deployed the B method to build correct-by-construction modules through refinement; for my students learning Java and C#, writing a decent collection of unit tests and running a lint-like tool on their projects is good enough.

The second problem with the above pitching of formal methods is that it may suggest that they are a silver bullet. You want high reliability? Just use formal methods (but it will cost you)! It doesn’t matter what your development process is, what programming language you’re using, whether your software is interactive, whether you’re deploying on a legacy system, or how much technical debt you carry. Start using [your favorite verification technique] and your problems will evaporate!

But reliability, and practically every other desirable quality, is never the result of a single technique, tool, design choice, or individual talent. It can only follow from a sensible, well-organized process: one that is transparent, traceable, technologically up to date, based on a clear understanding of the application domain, and so on. The investigations following the failures mentioned above (Ariane 5, Therac-25, etc.) invariably pointed to communication and process shortcomings that were at least as serious as, and often more egregious than, the technical ones. Correctness cannot be built in as an afterthought; it requires planning and practice. Based on his experience at Microsoft, James Larus claimed that a root cause of software bugs is “missing, outdated, and incorrect knowledge”, which easily results from poorly organized processes. Conversely, the practical adoption of formal verification is very often slowed down by unintended process constraints; progress in this sense doesn’t necessarily require more powerful techniques, but mainly the integration of known techniques in a way that does not disrupt existing practices — except perhaps for unfixable, noxious practices whose disruption is desirable anyway.

The third problem I see with the “disaster-oriented” motivation for formal methods is that it seems to suggest that verification research is worthy only to the extent that it is directly applicable to relevant software development issues and leads to concrete gains, for example averting failures. This encourages a view of science as a mere provider of cost-effective technology. To be sure, verification technology has had enough success stories that even long-standing skeptics must concede it can be useful in practice. (If you find the previous sentence controversial, I have blog posts planned to convince you!) But practical applicability need not be — should not be — the only goal of researchers. There must be room for speculative inquiries, not least because long-term progress invariably and crucially depends on them. Think, for example, of how the spectacular progress of SAT solvers has fueled advances in automatic verification in the last decade or so. The performance of SAT solvers didn’t mushroom overnight; it is the result of decades of research on a topic that seemed, at the time of its inception, quite removed from practicality. As Ed Clarke often recalls when presenting SAT-based model checking, if he had suggested using an NP-complete problem such as SAT as the basis for a practical verification technique at the time he was working on his Ph.D. thesis, his colleagues would have dismissed the suggestion as implausible, coming from someone clearly clueless about the well-understood notion of intractable problems.
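A concrete instance of that progress is bounded model checking, which reduces the question “can the system violate property P within k steps?” to the satisfiability of a single propositional formula (this is the textbook encoding, sketched here from memory):

\[
I(s_0) \;\wedge\; \bigwedge_{i=0}^{k-1} T(s_i, s_{i+1}) \;\wedge\; \bigvee_{i=0}^{k} \neg P(s_i)
\]

where I characterizes the initial states and T the transition relation; a satisfying assignment is a counterexample trace of length at most k, and unsatisfiability means the property holds up to the bound. The whole construction is practical only because modern SAT solvers routinely dispatch formulas with hundreds of thousands of variables — precisely the payoff of those decades of “impractical” research.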

In spite of not always being pitched and motivated in the most convincing way, formal verification remains an exciting, quickly growing research topic that has made spectacular progress and has been winning, in one form or another, the favor of more and more serious practitioners. One example among several: John Carmack’s recent opinion that

The most important thing I have done as a programmer in recent years is to aggressively pursue static code analysis. Even more valuable than the hundreds of serious bugs I have prevented with it is the change in mindset about the way I view software reliability and code quality.

This is quite an encouraging endorsement, even if you work on verification techniques other than static analysis. Or perhaps all the “motivation” that verification research needs comes from Leibniz. In the next formal methods course I teach, I’ll try to open with his famous quote instead of discussing rockets and radiation machines:

The only way to rectify our reasonings is to make them as tangible as those of the mathematicians, so that we can find our error at a glance, and when there are disputes among persons, we can simply say: let us calculate without further ado to see who is right.