It was the best of times, it was the worst of times, it was the age of fastness, it was the age of sluggishness, it was the epoch of abstraction, it was the epoch of detail, it was the season of generality, it was the season of specificity, it was the spring of right, it was the winter of wrong, we had improving performance before us, we had degrading performance before us, we were all going direct to where Moore predicted, we were all going direct the other way — in short, the period was so far like the present period, that some of its most influential experts insisted on its being received, for good or for evil, in the superlative degree of comparison only.

Sometimes I have the impression we’re experiencing a collective delusion regarding how we’re dealing with the end of Moore’s law — the prediction that the density of transistors in integrated circuits grows exponentially over time.

The growth predicted by Moore’s law has gone on for decades, providing a veritable, Lucullan free lunch to the entire information technology industry. But now this bonanza is running out: the density growth is slowing down, and improvements in miniaturization do not directly translate into faster clock rates.

That exponential growth is not sustainable in the long term should come as no surprise to any trained engineer. In Moore’s own words: “The nature of exponentials is that you push them out and eventually disaster happens.” Or, in Linus Torvalds’s unmistakably caustic style (talking about a different exponential growth): “Unending exponential growth? What drugs are those people on? I mean, really.”

Rather than coming to terms with the harsh reality — the party is over — we prefer to sugar the pill and tell a less dramatic story. Sure, the clock rate of microprocessors has not increased significantly in a while, but the progress of electronics technology is still spectacular, and is giving us chips with more and more computational units crammed onto the same circuit. So, the updated story goes, the new challenge is exploiting the increasingly parallel hardware by writing increasingly concurrent software. Of course, concurrent software is notoriously hard to get right, but software technology is up to the challenge. More of Moore through concurrency.

But can concurrent programming really let us keep on riding the exponential speedup wave? There’s no question that improving the state of concurrent programming is a fascinating research challenge, both intellectually and practically. But I’m afraid the end of Moore’s law is more disruptive than we are willing to admit. And its long-term consequences cannot be countered simply by better programming.

The elephant in the room is that we know of only a handful of algorithms and programs that can be massively parallelized. In most cases, there is a significant fraction of the computation that is intrinsically sequential, which drastically limits, per Amdahl’s law, the speedup that can be obtained by adding computing cores. You can have the best concurrency model in the world, a flawless implementation, and as many cores as you like, but still be nowhere near anything like exponential speedup (in fact, not even linear speedup!).
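
To put rough numbers on Amdahl’s law, here is a small sketch (the parallel fractions are made up for illustration, not measurements from any particular program) that computes the speedup bound for a few core counts:

    # Amdahl's law: speedup on n cores when a fraction p of the work can be parallelized.
    def amdahl_speedup(p, n):
        return 1.0 / ((1.0 - p) + p / n)

    for p in (0.50, 0.90, 0.95):
        for n in (4, 64, 1024):
            print(f"parallel fraction {p:.0%}, {n:4d} cores: {amdahl_speedup(p, n):5.1f}x speedup")
        # Even with unboundedly many cores, the speedup never exceeds 1 / (1 - p).
        print(f"parallel fraction {p:.0%}, n -> infinity: {1.0 / (1.0 - p):5.1f}x at most")

Even when 95% of a program parallelizes perfectly, 1024 cores buy less than a 20x speedup, and no number of cores can push it past 20x: respectable, but a far cry from riding an exponential.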

Here’s what a certain Don Knuth said about it in a 2008 interview that should be quoted more often.

To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks! […]

Let me put it this way: During the past 50 years, I’ve written well over a thousand programs, many of which have substantial size. I can’t think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading. […]

[Hardware vendors] think a magic bullet will come along to make multicores speed up my kind of work; I think it’s a pipe dream. […]

From the opposite point of view, I do grant that web browsing probably will get better with multicores. I’ve been talking about my technical work, however, not recreation.

I’m inclined to give him the benefit of the doubt and assume he’s right, as I have the impression the guy knows a thing or two about algorithms and programming.

I look forward, by all means, to the challenges of combining abstraction and efficiency in concurrency research. But perhaps we should not be selling concurrent programming as a potential “fix” to the end of Moore’s law. While every exponential growth eventually stops (I mean, really), we can have the more down-to-earth goal of making the most out of the new exotic hardware architectures that are being developed. May we live in interesting times.

These are good times for formal verification. Amazon Web Services is using formal specifications and model checking to support rigorous design at a very large scale. Facebook, of all companies, is getting serious about static analysis, with Monoidics’s separation logic gurus and Francesco Logozzo of Clousot fame among its recent hires. John Carmack, as a programmer, is aggressively pursuing static analysis. Only thanks to formal verification has a remote but pernicious bug in Java’s standard library sort been discovered. It’s clear that — slowly, one step at a time — the offerings of formal methods are becoming practically helpful.

Still, the transition from pure research to practical applications takes time, in formal verification as in other areas. One of the intermediate steps to be covered is dealing with realistic instances: not necessarily full-blown industrial-strength systems, but software that does not discount any of the features expected of a fully usable implementation. Realistic software also moves past the challenges of benchmarks, which typically focus on one or a few complicated aspects while eliding features that are considered commonplace and hence not interestingly problematic.

Nadia Polikarpova’s presentation at the Formal Methods Symposium in Oslo a few weeks ago was precisely about dealing with realistic software in verification: our paper describes the verification of EiffelBase2, a realistic general-purpose data-structure library, against its complete functional specification.

Before you roll your eyes thinking, “Seriously?! Verification of data structures in 2015? Wasn’t the problem solved in the 1970s? Or maybe earlier! And they gave them a best paper award for this stuff? Unbelievable!”, let me explain what’s new and, hopefully, interesting about this work. It is true that the amount of work on verifying data structures is overwhelming. Not only because the effort poses interesting challenges, but also because data structures are a domain naturally amenable to formalization and analysis: everybody agrees on the expected behavior of, say, a linked stack, and it can be formalized succinctly. In other domains (reactive software, for example) even specification is a challenge, in terms of both “what” and “how”. Still and all, most data-structure verification work focuses on demonstrating certain aspects of a technique, and applies only to implementations written in a certain way, possibly with some restrictions, and not in others. In contrast, Nadia implemented and specified EiffelBase2 before verification (even if, naturally, some later adjustments were necessary to complete the verification effort) with the goal of providing a usable, realistic container collection without compromising on any aspect (object-oriented design, API usability, implementation performance, …) that is to be expected in practice. In fact, EiffelBase2 now ships with the EiffelStudio compiler as an alternative to the old EiffelBase library. (It did not fully replace it only for backward-compatibility reasons.)

Here are a few examples of EiffelBase2 features that you would expect in a usable data-structure library (think java.util or .NET Collections, or C++ STL), but which are often simplified or omitted altogether in verification research:

  • External iterators, with the possibility of statically checking that multiple iterators operating on the same structure are safe, that is, that they modify it consistently.
  • Hash tables with arbitrary user-defined objects as keys and object-based key comparison, with the possibility of statically checking that modifying objects used as keys does not affect table consistency.
  • Rich, convenient APIs offering operations for searching, replacing, inserting, and removing elements, merging containers, converting between different containers, object-based comparison (i.e., comparing objects according to their content rather than by the references pointing to them), and copy constructors.
  • Abstract specifications of data structures, consistent with the inheritance hierarchy, and provably complete for functional correctness.
  • Implementations that do not trade off performance for verifiability: array-based lists use ring buffers, hash tables automatically resize their bucket arrays to avoid performance degradation, and so on (a sketch of the ring-buffer idea follows this list).
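
To make the last point concrete, here is a minimal sketch of the ring-buffer idea behind array-based lists (in Python rather than Eiffel, and not actual EiffelBase2 code): constant-time insertion at both ends is convenient for clients, but the wrap-around index arithmetic it requires is exactly the kind of detail a full functional proof must account for.

    class RingBufferList:
        """Array-based list on a circular buffer: O(1) insertion at both ends."""

        def __init__(self, capacity=8):
            self._buffer = [None] * capacity
            self._first = 0       # physical index of the first element
            self._count = 0       # number of stored elements

        def __len__(self):
            return self._count

        def item(self, i):
            """Element at 0-based position i; physical indices wrap around."""
            assert 0 <= i < self._count                       # contract-style precondition
            return self._buffer[(self._first + i) % len(self._buffer)]

        def extend_back(self, value):
            self._grow_if_full()
            self._buffer[(self._first + self._count) % len(self._buffer)] = value
            self._count += 1

        def extend_front(self, value):
            self._grow_if_full()
            self._first = (self._first - 1) % len(self._buffer)
            self._buffer[self._first] = value
            self._count += 1

        def _grow_if_full(self):
            if self._count == len(self._buffer):
                # Re-pack the elements contiguously into a buffer twice as large.
                self._buffer = [self.item(i) for i in range(self._count)] + \
                               [None] * len(self._buffer)
                self._first = 0

Relating the wrapped physical indices to the abstract sequence of elements the client sees is the unglamorous part of the specification work.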

If some of these features do not sound impressive, it is because we are blasé: used as we are to having them in any realistic implementation, we are also happy to ignore or abstract them away when performing verification, because they do not seem to introduce complex, as opposed to merely complicated, novel challenges. However, the devil is in the details, and some sources of complexity emerge from the combination of seemingly plain and innocuous features. Even verifying something as workaday as copy constructors brought its own set of challenges, namely ensuring that the copy is consistent with the abstract class model and with the classes related to it by inheritance.

Part of the challenge of verifying EiffelBase2 was achieving all this with a good level of automation and a reasonable amount of annotations (besides specifications) — in other words, with an effort proportional to the magnitude of the challenge. To this end, we developed a verification methodology and supporting tool (AutoProof, which I presented in a previous post) with the goals of providing flexibility and proportionality: verifying simple stuff should be straightforward, verifying complex stuff (such as EiffelBase2) should still be attainable but require a bigger annotation overhead. While I personally think that being able to interact directly with the prover for the most challenging proofs may support even greater levels of flexibility, the auto-active interaction mode of AutoProof may have hit on a sweet spot in the landscape of annotations vs. automation — at least for full functional correctness in the domain of data structure libraries.

You’re welcome to add comments about features of realistic software that are often overlooked in practical verification, as well as about other success stories of formal methods that narrowed the gap between lab and field practice.

Breaking news! A recent study found that Barack Obama is, with high probability, not an American citizen! The study — destined to revive the controversy that emerged during the President’s first presidential campaign — is based on new evidence and a simple analysis using widely accepted statistical inference tools. I’ll leave it to the political pundits to analyze the grave effects that this shocking finding surely will have on the upcoming presidential campaign. This post focuses on the elegant technical machinery used to reach the unsettling conclusion.

The crux of the analysis applies, in a statistical setting, modus tollens, a basic inference rule of logic. Given two facts X and Y such that if X is true then Y is true, modus tollens derives the falsehood of X from the falsehood of Y. In formal notation:

  \begin{matrix}  X \Longrightarrow Y, \quad \neg Y \\  \hline  \neg X  \end{matrix}

For example, take X to be “It rains” and Y to be “I have an umbrella with me”. From the fact that I am carrying no umbrella, by applying modus tollens, you can conclude that it’s not raining.

The next step introduces a simple generalization of modus tollens to the case where facts are true with some probability: if X is true then Y is true with high probability. Then, when Y happens to be false, we conclude that X is unlikely to be true. If I have an umbrella with me 99% of the times when it rains, there’s only a 1% chance that it rains if I have no umbrella with me.

All this is plain and simple, but it has surprising consequences when applied to the presidential case. A randomly sampled American citizen is quite unlikely to be the President; the odds are just 1 in 321-something million. So we have that if “a person p is American” (or X) is true, then “p is not the President” (or Y) is true with high probability. But Mr. Barack Obama happens to be the President, so he is overwhelmingly unlikely to be American according to probabilistic modus tollens!

(The ironic part of the post ends here.)

Surely you’re thinking that this was a poor attempt at a joke. I would agree, were it not the case that the very same unsound inference rule is applied willy-nilly in countless scientific papers in the form of statistical hypothesis testing. The basic statistical machinery, which I’ve discussed in a previous post, tells us that, under a null hypothesis H_0, certain data D is unlikely to occur. In other words: if “the null hypothesis H_0” is true, then “the data is different from D” is true with high probability. So far so good. But then the way this fact is used in practice is the following: if we observe the unlikely D in our experiments, we conclude that the null hypothesis is unlikely, and hence we reject it — unsoundly! How’s that for a joke?

Having seen for ourselves that modus tollens does not generalize to probabilistic inference, what is a sound way to reason from data to hypotheses? We can use Bayes’s theorem, phrased in terms of conditional probabilities. P(X \mid Y) is the probability that X occurs given that Y has occurred. Then P(H_0 \mid D) — the probability that the null hypothesis H_0 is true given that we observed data D — is computed as P(D \mid H_0) \cdot P(H_0) / P(D). Even if we know that D is unlikely under the null hypothesis — that is, P(D \mid H_0) is small — we cannot dismiss the null hypothesis with confidence unless we know something about the prior probabilities of H_0 and D. To convince ourselves that Bayes’s rule leads to sound inference, we can apply it to the Barack Obama case: H_0 is “a person p is American” and D is “p is the President”. We plug in the numbers and do the simple math to see that P(H_0 \mid D), the probability that the President is American, is indeed one:

(1 / A) \cdot (A / W) / (1 / W) = 1, where A is the population of the USA and W is the world population. Bayes 1 – birthers 0.
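
For the skeptical, here is the same arithmetic spelled out; the population figures below are rounded, and the world population cancels out anyway:

    A = 321e6   # approximate US population
    W = 7.3e9   # approximate world population (any value works: it cancels out)

    p_D_given_H0 = 1 / A    # P(President | American): one president among A Americans
    p_H0 = A / W            # P(American): prior probability of being American
    p_D = 1 / W             # P(President): one president among W people

    p_H0_given_D = p_D_given_H0 * p_H0 / p_D
    print(p_H0_given_D)     # 1.0 (up to floating-point rounding): the President is American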

Now you understand the fuss about statistical hypothesis testing that has emerged in numerous experimental sciences. Sadly, this blunder is not merely a possibility; it is quite likely that it has affected the validity of numerous published experimental “findings”. In fact, the inadequacy of statistical hypothesis testing is compounded by other statistical pitfalls, such as the arbitrariness of a hard-and-fast significance threshold, the false hypothesis paradox (when studying a rare phenomenon, that is one with a low base rate, most positive results are false positives), and self-selection (the few studies that detect some rare phenomenon get published, whereas the overwhelming majority of “no effect” studies do not). In an era of big data, these problems are only becoming more likely to emerge.
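
To see how quickly the false hypothesis paradox bites, consider a hypothetical field where only 1% of the tested hypotheses are actually true, studies have 80% power, and the usual 5% significance threshold applies (all three figures are made up for illustration):

    # Hypothetical numbers, chosen only to illustrate the base-rate effect.
    n_hypotheses = 10_000
    base_rate    = 0.01    # fraction of tested hypotheses that are actually true
    power        = 0.80    # probability of detecting a true effect
    alpha        = 0.05    # significance threshold

    true_effects    = n_hypotheses * base_rate                   # 100
    true_positives  = true_effects * power                       # 80
    false_positives = (n_hypotheses - true_effects) * alpha      # 495

    fraction_false = false_positives / (true_positives + false_positives)
    print(f"{fraction_false:.0%} of 'significant' findings are false positives")   # 86%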

The take-home message is simple yet important. Statistical hypothesis testing is insufficient, by itself, to derive sound conclusions about empirical observations. It must be complemented by other analysis techniques, such as data visualization, effect sizes, confidence intervals, and Bayesian analysis. Unless you remain convinced that Obama’s not American, Elvis is alive, and the Apollo moon landings were staged. In that case, this blog is not for you — with high probability.

Dante Alighieri, the doyen of Italian writers, was born somewhere in Florence, sometime between May and June, 750 years ago. To celebrate this anniversary as a long-standing fan of his work, I will illustrate some aspects of his writing that make his work so timeless and fascinating. I will cast them as “writing tips”, since you can hardly find a more convincing writer than Dante. Sure, he was an artistic rather than a technical writer, but the basic principles of good writing are largely the same in either domain.

All of the references are from the Divine Comedy, where I cite Longfellow’s English translation next to the original Italian text (Petrocchi’s standard version). Both full texts are available with commentary here.

Start with content

Good writing starts with good content, and great writing starts with great content. High-quality content is characterized by originality, variety, and clear connections with existing knowledge.

Dante is thoroughly familiar with the major cultural figures of the past, as well as with his illustrious contemporaries. The overarching plot of the Commedia is built on a comprehensive vision of the universe and unfolds a smorgasbord of styles, moods, topics, and characters neatly organized in a tripartite structure (Inferno, Purgatorio, and Paradiso), opening with an introductory canto followed by 33 cantos per part. As if he were writing a detailed Related work section, he pays homage to many great figures of the past in the Inferno’s Canto IV, including poets (Homer, Ovid, Virgil) and their characters (Hector, Aeneas), philosophers (Plato, Aristotle, Democritus), and scientists of their time (Ptolemy, Avicenna). Dante’s Related work dutifully compares against related approaches and gives credit. He gratefully acknowledges Virgil for his writing style:

   [quoted verses omitted]

and ranks his own work sixth in “impact”, after Homer, Horace, Ovid, Lucan, and Virgil:

   [quoted verses omitted]

Abstract to generalize

When the subject matter allows it, good writing includes abstract parts that generalize its subject. Abstraction is the key to producing content that is of general interest and whose message goes beyond the specific, contingent context in which it was originally produced.

Dante’s idea of the cosmos is irremediably outdated — not to mention flat-out wrong — based as it is on Ptolemy’s astronomical models, Aristotle’s view of the physical world, and a variety of arcane, inconsistent theological arguments. Nonetheless, his writings are everlasting because his characters and images are underlain by general concepts that represent timeless mental categories. Showing a knowledge of geometry that is unusual for a poet, he describes the heavens as a series of concentric spheres of decreasing radius. The smallest sphere — symbolizing God — reduces to a point, which however is said to encompass all the larger spheres as well as the Earth. We can explain the paradox by considering the concentric spheres as three-dimensional projections of four-dimensional objects of increasing radius, whose projections appear as spheres of decreasing size. The complexity of the abstract image makes it suggestive even if unrealistic, and an unintended forerunner of speculative ideas in modern physics about higher-dimensional universes (whose poetic descriptions can draw inspiration from Dante!).

(For technical details, see Odifreddi’s explanation in Italian. If you cannot read Italian, Python to the rescue: download the goslate library and run a few lines like those sketched below to get an English translation on the fly.)
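
A minimal sketch using goslate’s Goslate API; the file name odifreddi.txt is just a placeholder for a local copy of the Italian text, and the exact incantation can certainly be improved upon:

    import goslate

    # Read a local copy of the Italian text ('odifreddi.txt' is a placeholder name).
    with open('odifreddi.txt', encoding='utf-8') as f:
        italian = f.read()

    gs = goslate.Goslate()
    print(gs.translate(italian, 'en'))   # machine-translate to English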

Make realistic — not necessarily real — examples

Abstraction is balanced by using realistic images as examples. Realistic means based on tangible, generally understandable situations, but not necessarily a faithful reflection of actual reality. Just like abstraction, exemplification is a means of expression rather than an end in itself.

Dante’s main subject matter is drastically removed from everyday experience, concerned as it is with a hypothetical afterworld populated by the ghosts of long-departed figures. To give such an abstract context a tangible flavor, he deploys a motley variety of metaphors and similes based on concrete physical images that everybody has experienced. Describing his leaving the “selva oscura” (itself a metaphor for moral perdition), Dante compares himself and his feelings to those of a shipwreck survivor who has just escaped with great difficulty from the treacherous waters onto the shore:

   [quoted verses omitted]

Adapt style to content

Every writer has their own style, but the best writers are fluent in many different styles, which they select according to what most effectively conveys the content at hand and fits the context.

Dante called his masterpiece Commedia also to indicate its sweeping variety of styles and forms. These range from the gentle “dolce stilnovo” suitable for talking about love:

   [quoted verses omitted]

to the severe chastising of Italian political practices:

   [quoted verses omitted]

to the desperate grieving tone of human tragedy:

   [quoted verses omitted]

Write memorable words

Good writing is unforgettable for its eloquence. It always picks the mots justes. It is penetrating, often strong; it does not hedge.

Dante peppered his work with indelible passages, which make for great quotations thanks to their succinctness and persuasive eloquence. Nine lines are all it takes his Ulysses to sweep up his crew and persuade them to embark on the ultimate, ill-fated expedition of exploration:

   [quoted verses omitted]

There are countless other aspects that make Dante’s work still so fascinating after over seven centuries. While some of them may appeal only to certain readers, or may reflect outdated cultural views, his style still goes a long way in terms of influence, interest, and inspiration.

Walter Tichy recently interviewed Westley Weimer and Martin Monperrus on the subject of Automated Bug Fixing; the interesting interview appears in the March 2015 issue of ACM Ubiquity and is available here. In a follow-up message, he invited the AutoFix team — which includes yours truly — to comment on the interview. In this post I take up his suggestion and briefly discuss the approach to automated bug fixing that we developed in the AutoFix project. Even if I’ll refer to the whole AutoFix team (which also includes Yu “Max” Pei, Yi “Jason” Wei, Martin Nordio, Bertrand Meyer, and Andreas Zeller), these comments reflect my own views — this is my blog, after all 😉 Nonetheless, I believe my co-authors’ opinions are in line with mine; and everybody’s welcome to leave comments on this post if there’s anything I missed that they want to point out.

The term “automated bug fixing” — or, as it is more commonly referred to in software engineering research, “automated program repair” — denotes techniques and tools capable of suggesting changes that remove faulty behavior of programs. The fixes are often in the form of source-code patches but can also target binary executables or other representations. The most widespread approaches are driven by tests: failing tests identify incorrect behavior that must be corrected; a fix is a change that turns failing tests into passing tests (that is, it corrects some faulty behavior) without affecting the behavior on originally passing tests (that is, it preserves correct behavior).
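
In pseudocode terms, the test-driven, generate-and-validate scheme just described boils down to the following loop (a deliberately simplified sketch, not the algorithm of any specific tool):

    def repair(program, passing_tests, failing_tests,
               candidate_patches, apply_patch, run):
        """Generate-and-validate repair, simplified.

        run(program, test) is assumed to return True iff the test passes."""
        for patch in candidate_patches(program, failing_tests):
            patched = apply_patch(program, patch)
            fixes_failures = all(run(patched, t) for t in failing_tests)
            preserves_passing = all(run(patched, t) for t in passing_tests)
            if fixes_failures and preserves_passing:
                yield patch    # validated only against the given tests, not proved correct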

AutoFix is our technique for automated program repair. Its main distinguishing feature — apparent from the titles of the main publications describing it and listed below — is that it takes advantage of contracts (assertions such as pre- and postconditions) available in the source code. Contracts, as they are normally used in Eiffel or in other contract-equipped languages, are simple specification elements embedded in the program text. The key word here is simple: don’t think quantifiers, first-order logic, and formal verification; think list /= Void, set.count > 0, and runtime checking.
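
For readers who have not used Eiffel, here is roughly what such simple contracts look like, transliterated into Python assertions (a sketch for illustration, not EiffelBase2 code):

    class BoundedStack:
        """A bounded stack whose routines check simple, executable contracts."""

        def __init__(self, capacity):
            assert capacity > 0                         # precondition
            self._items = []
            self._capacity = capacity

        def push(self, item):
            assert len(self._items) < self._capacity    # precondition: not full
            self._items.append(item)
            assert self._items[-1] == item              # postcondition: item is on top

        def pop(self):
            assert len(self._items) > 0                 # precondition: not empty
            return self._items.pop()

The point is that such assertions are executable: they can be checked every time the program runs, which is precisely what AutoFix exploits.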

How does AutoFix use contracts? The main usage is actually independent of the details of the program repair technique: AutoFix generates random unit tests (using another tool called AutoTest) and uses contracts as oracles to classify them into passing and failing. Contracts also remain available for runtime checking throughout the various steps of the AutoFix algorithm. For example, AutoFix uses that information to sharpen the precision of its fault localization algorithms. (As Weimer explains in the interview, fault localization answers the question: “where in the program should we edit?”, before fix generation decides “what should we edit it to?”.) Contract checking also helps regression testing of candidate fixes, and the Boolean expression of a violated assertion may suggest cases that require special handling, such as an empty list or an index on an array’s boundary.
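
A sketch of the contract-as-oracle idea (again in Python rather than Eiffel, and a simplification of what AutoTest actually does):

    def classify_random_tests(make_random_call, n_tests=1000):
        """Run randomly generated calls and use contract violations as the oracle."""
        passing, failing = [], []
        for _ in range(n_tests):
            call = make_random_call()          # e.g. a random target object and arguments
            try:
                call()
                passing.append(call)           # no contract violated: passing test
            except AssertionError as violation:
                failing.append((call, violation))   # violated contract: failing test
        # Real tools also discard invalid tests, i.e. calls that violate their own
        # precondition rather than exposing a fault in the routine under test.
        return passing, failing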

In terms of final output, the greater precision in fault localization and fix generation enabled by having contracts helps build good-quality fixes with limited search. Indeed, emphasis on the quality and acceptability of fixes is an aspect that we stressed early on in our research on AutoFix, predating the growing concern with similar aspects that characterizes some recent related work (such as PAR, Monperrus’s, and Rinard’s). Since the fixes produced by automated program repair are validated against a finite set of tests, they come with no formal guarantees of correctness or even adequacy. The work with AutoFix is no exception here because it deals with simple, and hence glaringly incomplete, contracts that cannot be used for static verification. In our experiments, however, we have manually inspected the fixes produced by AutoFix to assess their quality from a programmer’s perspective. It turns out that over half of the valid fixes produced by AutoFix are what we call proper; that is, they not only pass the available tests (a requirement that may be trivial when the tests are few or of poor quality) but are ostensibly consistent with the program’s implicit specification and can be applied to the codebase without introducing undesired behavior.

Performance is another aspect indirectly impacted by the improved precision that contracts bring: since AutoFix’s repair process explores a search space that is better structured by means of the annotations, it needs limited computational resources. Running our experiments with AutoFix in a cloud computing environment would cost, on average, only a few cents per produced fix.

In the quickly growing landscape of automated program repair techniques, AutoFix offers a particular trade-off that is different from that of other approaches. Relying on contracts and related techniques restricts AutoFix’s applicability to programs annotated with some contracts; in exchange for this additional burden (which is not an unreasonable one!), AutoFix can produce, with limited computational resources, fixes that are often of high quality.

References