These are good times for formal verification. Amazon Web Services is using formal specifications and model checking to support rigorous design at a very large scale. Facebook, of all companies, is getting serious about static analysis, with Monoidics’s separation logic gurus and Francesco Logozzo of Clousot fame among its recent hires. John Carmack, as a programmer, is aggressively pursuing static analysis. Only thanks to formal verification has a remote but pernicious bug in Java’s standard library sort been discovered. It’s clear that — slowly, one step at a time — the offerings of formal methods are becoming practically helpful.

Still, the transition from pure research to practical applications requires time, in formal verification as in other areas. Among the intermediate steps that have to be covered there’s dealing with instances that are realistic: not necessarily full-blown industrial-strength, but not discounting any features that are expected of a fully usable implementation. Realistic software also moves past the challenges of benchmarks, which typically focus on one or few complicated aspects while eliding features that are considered commonplace and hence not interestingly problematic.

Nadia Polikarpova‘s presentation at the Formal Methods Symposium in Oslo a few weeks ago was precisely about dealing with realistic software in verification: our paper describes the verification of EiffelBase2, a realistic general-purpose data-structure library, against its complete functional specification.

Before you roll your eyes thinking, “Seriously?! Verification of data structures in 2015? Wasn’t the problem solved in the 1970s? Or maybe earlier! And they gave them a best paper award for this stuff? Unbelievable!”, let me explain what’s new and, hopefully, interesting about this work. It is true that the amount of work on verifying data structures is overwhelming. Not only because the effort poses interesting challenges but also because data structures are a domain that is naturally amenable to formalization and analysis: everybody agrees on the expected behavior of, say, a linked stack, and it can be formalized succinctly. In other domains (reactive software for example) even specification is a challenge in terms of both “what” and “how”. Still and all, most data-structure verification work focuses on demonstrating certain aspects of a technique and is applicable to implementations written in a certain way, possibly with some restrictions, and not in others. In contrast, Nadia implemented and specified EiffelBase2 before verification (even if, naturally, some later adjustment were necessary to complete the verification effort) with the goal of providing a usable realistic container collection without compromising on any aspect (object-oriented design, API usability, implementation performance, …) that is to be expected in practice. In fact, EiffelBase2 now ships with the EiffelStudio compiler as an alternative to the old EiffelBase library. (It did not fully replace it only for backward compatibility reasons.)

Here are a few examples of EiffelBase2 features that you would expect in a usable data-structure library (think java.util or .NET Collections, or C++ STL), but which are often simplified or omitted altogether in verification research:

• External iterators, with the possibility of statically checking that multiple iterators operating on the same structure are safe, that is modify it consistently.
• Hash tables with arbitrary user-defined objects as keys and object-based key comparison, with the possibility of statically checking that modifying objects used as keys does not affect table consistency.
• Rich, convenient APIs offering operations for searching, replacing, inserting, and removing elements, merging containers, converting between different containers, object-based comparison (i.e., comparing objects according to their content rather than by references pointing to it), and copy constructors.
• Abstract specifications of data structures, consistent with the inheritance hierarchy, and provably complete for functional correctness.
• Implementations that do not trade performance off for verifiability: array-based lists use ring buffers, hash tables automatically resize their bucket arrays to avoid performance degradation, and so on.

If some of these features do not sound impressive it is because we are blasé, used as we are to having them in any realistic implementations, but also happy to ignore or abstract them away when performing verification because they do not seem to introduce complex, as opposed to complicated, novel challenges. However the devil’s in the details, and some sources of complexity are emergent from the combination of seemingly plain and innocuous features. Even verifying something as workaday as copy constructors brought its own set of challenges, in ensuring that the copy is consistent with the abstract class model and with its relatives by means of inheritance.

Part of the challenge of verifying EiffelBase2 was achieving all this with a good level of automation and a reasonable amount of annotations (besides specifications) — in other words, with an effort proportional to the magnitude of the challenge. To this end, we developed a verification methodology and supporting tool (AutoProof, which I presented in a previous post) with the goals of providing flexibility and proportionality: verifying simple stuff should be straightforward, verifying complex stuff (such as EiffelBase2) should still be attainable but require a bigger annotation overhead. While I personally think that being able to interact directly with the prover for the most challenging proofs may support even greater levels of flexibility, the auto-active interaction mode of AutoProof may have hit on a sweet spot in the landscape of annotations vs. automation — at least for full functional correctness in the domain of data structure libraries.

You’re welcome to add comments about features of realistic software that are often overlooked in practical verification, as well as about other success stories of formal methods that narrowed the gap between lab and field practice.

Breaking news! A recent study found that Barack Obama is, with high probability, not an American citizen! The study — destined to revive the controversy that emerged during the President’s first presidential campaign — is based on new evidence and a simple analysis using widely accepted statistical inference tools. I’ll leave it to the political pundits to analyze the grave effects that this shocking finding surely will have on the upcoming presidential campaign. This post focuses on the elegant technical machinery used to reach the unsettling conclusion.

The crux of the analysis applies, in a statistical setting, modus tollens, a basic inference rule of logic. Given two facts $X$ and $Y$ such that if $X$ is true then $Y$ is true, modus tollens derives the falsehood of $X$ from the falsehood of $Y$. In formal notation:

$\begin{matrix} X \Longrightarrow Y, \quad \neg Y \\ \hline \neg X \end{matrix}$

For example, take $X$ to be “It rains” and $Y$ to be “I have an umbrella with me”. From the fact that I am carrying no umbrella, by applying modus tollens, you can conclude that it’s not raining.

The next step introduces a simple generalization of modus tollens to the case where facts are true with some probability: if $X$ is true then $Y$ is true with high probability. Then, when $Y$ happens to be false, we conclude that $X$ is unlikely to be true. If I have an umbrella with me 99% of the times when it rains, there’s only a 1% chance that it rains if I have no umbrella with me.

All this is plain and simple, but it has surprising consequence when applied to the presidential case. A randomly sampled American citizen is quite unlikely to be the President; the odds are just 1 in 321-something millions. So we have that if “a person $p$ is American” (or $X$) is true then “$p$ is not the President” (or $Y$) is true with high probability. But Mr. Barack Obama happens to be the President, so he’s overwhelmingly unlikely to be American according to probabilistic modus tollens!

(The ironic part of the post ends here.)

Sure you’re thinking that this was a poor attempt at a joke. I would agree, were it not the case that the very same unsound inference rule is being applied willy-nilly in countless scientific papers in the form of statistical hypothesis testing. The basic statistical machinery, which I’ve discussed in a previous post, tells us that, under a null hypothesis $H_0$, a certain data $D$ is unlikely to happen. In other words: if “the null hypothesis $H_0$” is true then “the data is different than $D$” is true with high probability. So far so good. But then the way this fact is used in practice is the following: if we observe the unlikely $D$ in our experiments, we conclude that the null hypothesis is unlikely, and hence we reject it — unsoundly! How’s that for a joke?

Having seen for ourselves that modus tollens does not generalize to probabilistic inference, what is a correct inference from data to hypothesis testing? We can use Bayes’s theorem and phrase it in terms of conditional probabilities. $P(X \mid Y)$ is the probability that $X$ occurs given that $Y$ has occurred. Then $P(H_0 \mid D)$ — the probability that the null hypothesis $H_0$ is true given that we observed data $D$ — is computed as $P(D \mid H_0) \cdot P(H_0) / P(D)$. Even if we know that $D$ is unlikely under the null hypothesis — $P(D \mid H_0)$ is small — we cannot dismiss the null hypothesis with confidence unless we know something about the absolute prior probabilities of $H_0$ and $D$. To convince ourselves that Bayes’s rule leads to sound inference, we can apply it to the Barack Obama case: $H_0$ is “a person $p$ is American” and $D$ is “$p$ is the President”. We plug the numbers in and do the simple math to see that $P(H_0 \mid D)$, the probability that the President is American, is indeed one:

$(1 / A) \cdot (A / W) / (1 / W) = 1$, where $A$ is the population of the USA and $W$ is the world population. Bayes 1 – birthers 0.

Now you understand the fuss about statistical hypothesis testing that has emerged in numerous experimental sciences. Sadly, this blunder is not merely a possibility; it is quite likely that it has affected the validity of numerous published experimental “findings”. In fact, the inadequacy of statistical hypothesis testing is compounded by other statistical results such as the arbitrariness of a hard-and-fast confidence threshold, the false hypothesis paradox (when studying a rare phenomenon, that is a phenomenon with low base rates, most positive results are false positives), and self-selection (the few research efforts that detect some rare phenomenon will publish, whereas the overwhelming majority of “no effect” studies will not lead to publication). In an era of big data, these behaviors are only becoming more likely to emerge.

The take home message is simple yet important. Statistical hypothesis testing is insufficient, by itself, to derive sound conclusions about empirical observations. It must be complemented by other analysis techniques, such as data visualization, effect sizes, confidence intervals, and Bayesian analysis. Unless you remain convinced that Obama’s not American, Elvis is alive, and the Apollo moon landings were staged. In this case, this blog is not for you — with high probability.

Dante Alighieri, the doyen of Italian writers, was born somewhere in Florence, sometime between May and June, 750 years ago. To celebrate this anniversary as a long-standing fan of his work, I will illustrate some aspects of his writing that make his work so timeless and fascinating. I will cast them as “writing tips”, since you can hardly find a more convincing writer than Dante. Sure, he was more artistic than technical writer, but the basic principles of good writing are largely the same in either domain.

All of the references are from the Divine Comedy, where I cite Longfellow’s English translation next to the original Italian text (Petrocchi’s standard version). Both full texts are available with commentary here.

Good writing starts with good content, and great writing starts with great content. High-quality content is characterized by originality, variety, and clear connections with existing knowledge.

Dante is thoroughly familiar with the major cultural figures of the past, as well as with his illustrious contemporaries. The overarching plot of the Commedia is built on a comprehensive vision of the universe and unrolls a smorgasbord of styles, moods, topics, and characters neatly organized in a tripartite structure (Inferno, Purgatorio, and Paradiso), opening with an introductory canto followed by 33 cantos per part. As if he were writing a detailed Related work section, he pays homage to many a great people of the past in the Inferno’s Canto IV, including poets (Homer, Ovid, Virgil) and their characters (Hector, Aeneas), philosophers (Plato, Aristotle, Democritus), and scientists of their time (Ptolemy, Avicenna). Dante’s Related work dutifully compares against related approaches and gives credit. He gratefully acknowledges Virgil for his writing style:

and considers his own writing work sixth in “impact”, following Homer, Horace, Ovid, Lucan, Virgil:

### Abstract to generalize

When the subject matter allows so, good writing includes abstract parts that generalize its subjects. Abstraction is the key to producing content that is of general interest and whose message goes beyond the specific, contingent context in which it was originally produced.

Dante’s idea of the cosmos is irremediably outdated — not to mention flat-out wrong — based as it is on Ptolemy’s astronomical models, Aristotle’s view of the physical world, and a variety of arcane inconsistent theological arguments. Nonetheless, his writings are everlasting because his characters and images are underlain by general concepts that represent timeless mental categories. Showing a knowledge of geometry which is unusual for a poet, he describes the heavens as a series of concentric spheres of decreasing radius. The smallest sphere — symbolizing God — reduces to a point, which however is said to encompass all the larger spheres as well as the Earth. We can explain the paradox by considering the concentric spheres three-dimensional projections of four-dimensional objects of increasing radius, whose projections appear as spheres of decreasing size. The complex, abstract image makes it suggestive even if unrealistic, and is an unintended forerunner of speculative ideas in modern physics about higher-dimensional universes (whose poetic descriptions can draw inspiration from Dante!).

(For technical details, see Odifreddi‘s explanation in Italian. If you cannot read Italian, Unix to the rescue: download the Python library goslate.py and run

to get an English translation on the fly.)

### Make realistic — not necessarily real — examples

Abstraction is balanced by using realistic images as examples. Realistic means based on tangible, generally understandable situations, but not necessarily faithful reflection of the actual reality. Just like abstraction, exemplification is a means of expression rather than an end in itself.

Dante’s main subject matter is drastically removed from everyday experience, being concerned with a hypothetical afterworld populated by ghosts of long departed figures. To give such an abstract context a tangible flavor, he deploys a motley variety of metaphors and similes based on concrete physical images that everybody has experienced. Describing his leaving the “selva oscura” (itself a metaphor for moral perdition), Dante compares himself and his feelings to those of a shipwreck survivor who has just escaped with great difficulty from the treacherous waters onto the shore:

Every writer has their style, but the best writers are fluent in many different styles, which they will select according to what is most effective to convey the content at hand and to fit the context.

Dante called his masterpiece Commedia also to indicate its sweeping variety of styles and forms. Styles go from the gentle “dolce stilnovo” suitable to talk about love:

to the severe chastising of Italian political practices:

to the desperate grieving tone of human tragedy:

### Write memorable words

Good writing is unforgettable for its eloquence. It always picks the mots justes. It is penetrating, often strong; it does not hedge.

Dante peppered his work with indelible passages, which make for great quotations thanks to their succinctness and persuasive eloquence. Nine lines are all it takes his Ulysses to sweep up his crew and persuade them to embark in the ultimate ill-fated exploration expedition:

There are countless other aspects that make Dante’s work still so fascinating after over seven centuries. While some of them may appeal only to certain readers, or may reflect outdated cultural views, his style still goes a long way in terms of influence, interest, and inspiration.

Walter Tichy recently interviewed Westley Weimer and Martin Monperrus on the subject of Automated Bug Fixing; the interesting interview appears on the March 2015 issue of ACM Ubiquity and is available here. In a follow-up message, he invited the AutoFix team — which includes yours truly — to comment on the interview. In this post I take up his suggestion and briefly discuss the approach to automated bug fixing that we developed in the AutoFix project. Even if I’ll refer to the whole AutoFix team (which also includes Yu “Max” Pei, Yi “Jason” Wei, Martin Nordio, Bertrand Meyer, and Andreas Zeller), these comments reflect my own views — this is my blog, after all 😉 Nonetheless I believe my co-authors’ opinions are in line with mine; and everybody’s welcome to leave comments to this post if there’s anything I missed that they want to point out.

The term “automated bug fixing” — or, as it is more commonly referred to in software engineering research, “automated program repair” — denotes techniques and tools capable of suggesting changes that remove faulty behavior of programs. The fixes are often in the form of source-code patches but can also target binary executables or other representations. The most widespread approaches are driven by tests: failing tests identify incorrect behavior that must be corrected; a fix is a change that turns failing tests into passing tests (that is, it corrects some faulty behavior) without affecting the behavior on originally passing tests (that is, it preserves correct behavior).

AutoFix is our technique for automated program repair. Its main distinguishing feature — apparent from the titles of the main publications describing it and listed below — is that it takes advantage of contracts (assertions such as pre- and postconditions) available in the source code. Contracts, as they are normally used in Eiffel or in other contract-equipped languages, are simple specification elements embedded in the program text. The key word here is simple: don’t think quantifiers, first-order logic, and formal verification; think list /= Void, set.count > 0, and runtime checking.

How does AutoFix use contracts? The main usage is actually independent of the details of the program repair technique: AutoFix generates random unit tests (using another tool called AutoTest) and uses contracts as oracles to classify them into passing and failing. Contracts also remain available for runtime checking throughout the various steps of the AutoFix algorithm. For example, AutoFix uses that information to sharpen the precision of its fault localization algorithms. (As Weimer explains in the interview, fault localization answers the question: “where in the program should we edit?”, before fix generation decides “what should we edit it to?”.) Contract checking also helps regression testing of candidate fixes, and the Boolean expression of a violated assertion may suggest cases that require special handling, such as an empty list or an index on an array’s boundary.

In terms of final output, the greater precision in fault localization and fix generation enabled by having contracts helps build good-quality fixes with limited search. Indeed, emphasis on the quality and acceptability of fixes is an aspect that we stressed early on in our research on AutoFix, predating the growing concern with similar aspects that characterizes some recent related work (such as PAR, Monperrus’s, and Rinard’s). Since the fixes produced by automated program repair are validated against a finite set of tests, they come with no formal guarantees of correctness or even adequacy. The work with AutoFix is no exception here because it deals with simple, and hence glaringly incomplete, contracts that cannot be used for static verification. In our experiments, however, we have manually inspected the fixes produced by AutoFix to assess their quality from a programmer’s perspective. It turns out that over half of the valid fixes produced by AutoFix are what we call proper; that is, they not only pass the available tests (a requirement that may be trivial when the tests are few or of poor quality) but are ostensibly consistent with the program’s implicit specification and can be applied to the codebase without introducing undesired behavior.

Performance is another aspect indirectly impacted by improving precision through using contracts: since AutoFix’s repair process explores a search space that is better structured by means of the annotations, it needs limited computational resources. Running our experiments with AutoFix in a cloud computing environment would cost, on average, only few cents per produced fix.

In the quickly growing landscape of automated program repair techniques, AutoFix offers an interesting, particular trade-off that is different than other approaches. On the one hand, relying on contracts and related techniques restricts AutoFix’s applicability to programs annotated with some contracts; in exchange for this additional burden (which is not unreasonable!) AutoFix can produce, requiring limited computational resources, fixes that often are of high quality.

#### References

I hope nobody took the previous post too seriously. It was meant as a satirical joke, pointing out several shortcomings of the notorious LaTeX vs. Word comparison study. I still want to say a few things on the topic, though; and this time I’m completely serious.

I realize that LaTeX can be daunting for scientists who have little programming (or, more generally, computer science) background, to the point that they may well opt for user-friendlier alternatives. I cannot deny that knowing one’s way around LaTeX requires some time; and, until they reach a good level, LaTeX users may be slowed down and ineffective. However, I think it is important to at least understand why LaTeX remains predominant in scientific publishing; that is, what benefits come after a sufficient investment in learning to use it. If you have the same background I have, this post will probably sound completely obvious. Otherwise, read on.

The key features of LaTeX as a document preparation system are:

1. LaTeX is a markup language, that is it consists of annotations combined with the actual content to be displayed; the annotations define structure and appearance of the content, but both are expressed as plain text.
2. LaTeX is a set of commands that translate down to TeX and Metafont.

Both features contribute to the user unfriendliness of LaTeX compared to so-called WYSIWYG document preparation systems such as Word; but also are what makes LaTeX superior in practically every other aspect. To put it in concrete terms: I think it’s entirely possible that LaTeX will eventually be replaced, for scientific publishing, by a different document preparation system (Markdown extensions anyone?). But, if this will ever happen, the replacement system will certainly be another markup language; and will very likely translate down to TeX and Metafont.

### Why markup languages are better

Markup languages, where all information is encoded in plain text, are naturally amenable to editing using a whopping variety of tools that also operate on text files. Your favorite text editor — including Word itself, if you insist — is also a LaTeX editor, with spell checking and all other functionalities readily available. Text manipulation tools based on regular expressions — grep, sed, awk, perl, you name it — are directly applicable to LaTeX texts. Data processing environments — such as R for statistical analysis — can easily export data to LaTeX since all they have to produce is suitably encoded text output.

Version control tools, such as Subversion and Git, use lines of text as fundamental units. Hence, they work out of the box on LaTeX files to provide powerful support to collaborative editing and versioning. This goes well beyond the primitive capabilities of Word’s “track changes” function: authors can edit in parallel the same source file, inspect each other’s changes, and automatically merge individual variants into a master copy. Easy-to-read visual presentations of changes are still possible through a variety of graphical interfaces to version control software, or by means of other tools that work on text (try latexdiff for a presentation that looks like Word’s track changes).

Markup languages tend to enforce a clear separation between content and presentation. I wrote “tend to” because some markup languages do a better job than others at separating structural markup and presentation markup. LaTeX does not excel at this, since it does not always clearly distinguish between, say, an annotation that marks the beginning of a section (a structural detail) and one that marks a word to be displayed in a different font (a presentation detail). Nevertheless, writing documents in LaTeX lets you focus on content much better than WYSIWYG systems like Word, which are centered around commands to change presentation (font type and size, line spacing, page breaks, and so on) and offer limited or hard-to-use support to add complex structural information (section and subsection titles, cross references, and so on). In fact, I have the impression that only advanced Word users take advantage of functionalities such as to define styles and indexes. The situation in LaTeX is reversed: inexperienced users learn only very basic presentation-changing commands (such as to have text in bold and italic), but know well how to mark the beginning of sections, and how to introduce cross-references, bibliographic entries, and other structural information. Changing presentation details such as line spacing, font type, or the spacing of rows in tables is instead the domain of the LaTeX expert, who has hopefully also learned not to change them from the default, defined in a stylesheet, unless there is a compelling reason. In summary, markup languages help you focus on content, defer fixing presentation details to when the content is stable, and change presentation uniformly and robustly based on the intended structure of text.

Compared to intuitive WYSIWYG tools, using markup languages may slow you down: you may have to lookup the language’s syntax to do things you don’t know or remember from previous usage; and achieving the desired presentation form requires planning the organization of a document before starting writing, or however modifying it consistently at some point. But this kind of slowing down is a good thing: planning and paying attention to the structure beforehand is generally necessary to achieve a good document design.

### Why TeX is better

To understand the unique strengths of TeX (and its companion Metafont) we should have an idea of its history, whose protagonist is the legendary computer scientist Don Knuth. Working on a new edition of his magnum opus “The art of computer programming”, he became strongly dissatisfied with the typographic quality achieved by state-of-the-art technologies, in particular when it came to typesetting mathematics. To solve the problem for good, he started designing the TeX system in 1977, expecting to finish the project during his sabbatical in 1978; he was off only by a little over 10 years. Knuth tells the story in his own words in this interesting interview (the linked segment is about how much time and fretting took him to devise a suitable scalable algorithmic definition of the shape for the letter “S”: I’m pretty sure he would disagree with the LaTeX vs. Word study regarding “the appearance of the text [being] secondary to the scientific merit of an article”).

As a result, TeX is a top-notch thorough solution to the problem of providing a typesetting system that offers very high-quality output on every machine where it runs. Almost 30 years since its first stable release, and with all its idiosyncrasies, TeX’s core remains technically unsurpassed and a de facto standard.

Several features of LaTeX derive from using TeX as back-end. An important one is the possibility of defining powerful macros — a feature LaTeX directly inherits from TeX (in fact, LaTeX is essentially an extensive collection of TeX macro). The possibility of having complete control on the back-end through macros provides great flexibility to LaTeX users, if at the expense of having to deal with an occasionally abstruse syntax.

### Then: LaTeX?

If you care about typographic quality; if you think it is conducive to — not in contrast with — clarity of presentation and quality of ideas; if you’re willing to invest time now to become way more productive later; if you would like to take advantage of the outcome of decades of top-of-the-line tool development; and if you think microwaved TV dinners are a poor substitute for properly cooked food. Then, you may want to give LaTeX a serious try; or to start designing the next markup language that will replace it while remaining compatible with the TeX ecosystem.