An empirical study comparing LaTeX to Word for preparing scientific documents appeared on PLOS ONE in mid December, just in time to ignite an intense online discussion that took place over the holiday season. Among the many comments that I read, one tweet by Michele Lanza made me realize that the study was more general than I originally thought:

The Word vs. LaTeX efficiency argument is in line with microwave food vs. actual cooking

What a terrific suggestion for related research! Back from the winter break, I decided to adapt the design of the LaTeX vs. Word study in order to pit microwave food against proper cooking. The results are, I believe, quite conclusive; and my recommendations based on them compelling. While I submitted a detailed write-up for publication, I want to give you a summary of the essential findings in the rest of this post. In my empirical research, I stuck to the original study’s design as closely as possible since it seemed as much apropos for comparing cooking methods as it was for comparing document preparation systems. In the write-up that follows you will recognize passages that mirror closely the structure and even the words of the original LaTeX vs. Word paper, since those words speak for themselves — and imitation is the sincerest form of flattery.

### The study

This empirical study compares microwave usage to full-fledged cooking for food preparation. The experimental methodology was straightforward. We gathered 40 volunteers who use to cook their dinners — some using a microwave to defrost preprocessed food, some cooking from raw ingredients following a recipe. All participants used their own kitchen to run the experiments.

We set up three different sample meals: (1) a TV dinner with two food compartments; (2) a different brand of TV dinner with four food compartments; and (3) a dish consisting of 100 grams of spaghetti aglio e olio. Each participant had 30 minutes to cook each meal using his or her chosen food preparation technique, process, and equipment. Who chose full-fledged cooking was given access to a repository of raw ingredients; and who chose microwaving was given a supply of common TV dinners of different brands. The performance of each participant was measured for each sample meal by three variables: (1) the number of visual differences (food layout, appearance, color) between the cooked meal and the sample; (2) the number of differences in flavor between the cooked meal and the sample; and (3) the amount of hot edible mass produced within 30 minutes. Each participant also completed a questionnaire where they self-evaluated their performance.

The experimental results are unequivocal. Microwave users outperformed traditional cooks on most measures (p < 0.05[/latex], often even $p < 0.01$), with the only exception of the spaghetti dish. Even the most expert cooks were unable to reproduce, from raw ingredients, TV dinners that look and taste like the sample TV dinners, in contrast to the novice microwave users who managed to effortlessly heat to near perfection large amounts of prepackaged food.

These results suggest that full-fledged cooking should be adopted only to prepare complex dishes mainly consisting of oil and garlic. Experienced cooks may argue that the overall quality and healthiness of properly cooked food provides for a meal experience that is considerably more enjoyable than ingurgitating a tasteless, fat-laden, appalling microwave dinner. Although this argument may be true, the differences with the recently introduced versions of TV dinners may be a tad less spectacularly obvious than they were in the past. Thus, the results show that no reasons exist to use traditional means of cooking, except possibly for dishes that contain complex combinations of olive oil and pasta.

An unexpected result of the study was that, according to the questionnaire's answers, full-fledged cooks are highly satisfied with their experience. Despite incurring reduced usability and productivity, they assessed their work as less tiresome, less frustrating, and more enjoyable than microwave users. From a psychological perspective, the most reasonable explanation is that the cooks are delusional lunatics who are unwilling to reconsider their manifestly incorrect beliefs about their cooking ability in light of their actual poor results in faithfully reproducing low-grade industrial food.

The study's results also have implications in terms of costs of food preparation and consumption by the public. Individuals have a responsibility to act economically and efficiently, especially in cases in which their occupation is publicly funded. No reliable data is available about how many publicly-employed workers cook their own meals, and correspondingly it is unclear the amount of taxpayer's money that is spent worldwide by individuals who stubbornly insist on cooking food from raw ingredients over sticking to a more efficient and modern meal preparation system, which would free up their time to advance their respective field of occupation.

I therefore suggest that leading public institutions should consider accepting time-squandering food preparation practices by their employees only if this is justified by the prevalence of dishes involving spaghetti and garlic. In all other cases, said institutions should request employees to eat microwave food (or order take out). We believe that this would be a good policy for two reasons. First, the flavor and appearance of food is secondary to its nutritional values. And, second, preventing people from frittering away scarce culinary resources would save time and money to maximize the benefit of work and development for both individual institutions and the public.

P.S.: Some readers suggested two additional aspects in which microwave cooking is superior. First, adjusting the heating power is much easier with a microwave (where pushing a button immediately interrupts the flow of radiation) than with a traditional stove (where the stove's surface may remain hot for minutes even after power is turned off). And, second, crispy food can be properly cooked using the recently introduced hot air circulation system available in several high-end microwave ovens.

“Welcome to the first class of our course on formal verification. I’d like to start with some motivation: why verification is important in the real world, and what the consequences of lack of verification in software development are.” Lights go off. You hear a countdown recited in French, “Trois, deux, unité, feu!“. On the screen, a rocket blasts off its platform into the sky. Thirty-something seconds into its ascension, it awkwardly topples off and quickly disintegrates. “That was the Ariane 5 rocket’s first launch, which failed due to a software bug. The failure resulted in an estimated 300 million Euros loss. Software errors can even cost human lives, such as in the case of the Therac-25 machine for radiation therapy, whose malfunctioning resulted in several patients dying from radiation burns. Formal verification can ensure that software is free from defects before it’s deployed, thus avoiding catastrophic disasters such as those just mentioned. Now here’s the course syllabus…”

OK, I admit to slightly exaggerating the above for dramatic purposes. But, if you ever took a course on formal verification, chances are that it started with some graphic pictures of exploding rockets, unresponsive Mars rovers, or defective medical equipment, followed by stark tallies of the enormous losses, in money or human lives, resulting from the disasters. The bottom line: software quality is generally abysmal; and the systematic all-out application of formal verification is our only hope to avert further similar disasters from occurring in the future.

What’s the problem with this angle, to which I may have also been guilty of paying lip service in the past? I think this way of pitching formal verification points to motivations that are not fairly representative of the varied landscape in the research and practice of formal methods, and risks alienating software development practitioners, who have different concerns in their everyday’s work — concerns which conflict with the oversimplified picture painted above.

The first problem I see is that pointing to a handful of major disasters due to software failures may be construed as suggesting that formal verification is something only justified for mission critical applications. What about the myriad software projects that don’t cost hundreds of million dollars and whose failure does not result in ruinous losses? Is there a place for formal methods within such projects’ modest budgets? Related to these points is the misled vision that formal verification is an all-or-nothing business: either it enters every step of development or you might as well do things as you’ve always done them — but then disaster will impend. The flip side is that the only failures that matter are the catastrophic ones; everything else is not a big deal. In reality, everyday software developers generally remain genuinely concerned about software quality even when working on non mission-critical development; but also have to cope with all sorts of project constraints pulling in different directions.

A better way to frame the usage of formal verification is in terms of a continuum of trade-offs. Each trade-off balances some of the additional costs incurred by applying verification techniques against some gains resulting from higher reliability or confidence in the software’s correctness. This way, each project can pick the trade-off that best fits its requirements and budget. It’s easy to guess that the overwhelming majority of development projects occupy some intermediate position in the trade-off continuum: their failures won’t result in apocalyptic losses, nor are they of no consequence whatsoever. These are the projects that most practitioner developers are familiar with and can relate to. To each their own: Paris Métro Line 14 deployed the B method to build correct-by-construction modules through refinement; for my students learning Java and C#, writing a decent collection of unit tests and running a lint-like tool on their projects is good enough.

The second problem with the above pitching of formal methods is that it may suggest that they are a silver bullet. You want high reliability: just use formal methods (but it will cost you)! It doesn’t matter what your development process is, what programming language you’re using, whether your software is interactive, if you’re deploying in a legacy system, what your amount of technical debt is. Start using [your favorite verification technique] and your problems will evaporate!

But reliability, and practically every desirable quality property, is never the result of a single technique, tool, design choice, or individual talent. It can only follow from a sensible, well-organized process; one that is transparent, traceable, technologically up to date, based on a clear understanding of the application domain, and so on. The investigations following the failures of the examples mentioned above (Ariane 5, Therac-25, etc.) invariably pointed to communication and process shortcoming as well as, often more egregious than, technical ones. Correctness cannot be built as an afterthought; it requires planning and practice. Based on his experience at Microsoft, James Larus claimed that a root cause of software bugs is “missing, outdated, and incorrect knowledge“, which easily result from poorly organized processes. Conversely, the practical adoption of formal verification is very often slowed down by unintended process constraints; progress in this sense doesn’t necessarily require more powerful techniques but mainly integrating known techniques in a way that is not disruptive of existing practices — perhaps except for unfixable noxious practices whose disruption is desirable anyway.

The third problem I see with the “disaster-oriented” motivation for formal methods is that it seems to suggest that verification research is worthy only to the extent that it’s directly applicable to relevant software development issues and leads concrete gains, for example averting failure. This view encourages a view of science as a provider of cost-effective technology. To be sure, verification technology has had enough success stories that even long-standing skeptics must concede it can be useful in practice. (If you find the previous sentence controversial, I have blog posts planned to convince you!) But practical applicability need not be — should not be — the only goal of researchers. There must be room for speculative inquiries, also because long term progress invariably and crucially depends on them. Think, for example, of how the spectacular progress of SAT solvers has fueled advances in automatic verification in the last decade or so. The performance of SAT solvers hasn’t mushroomed overnight, but is the result of decades of research on a topic that seemed, at the time of its inception, quite removed from practicality. As Ed Clarke often recalls when presenting SAT-based model checking, if he had suggested using an NP-complete problem such as SAT as the basis for a practical verification technique at the time he was working on his Ph.D. thesis, his colleagues would have called his suggestion implausible because clearly clueless about the well-understood notion of intractable problem.

In spite of not always being pitched and motivated in the most convincing way, formal verification remains an exciting, quickly growing research topic that has made spectacular progress, which has been winning, in one form or another, the favors of more and more serious practitioners. One example among several: John Carmack‘s recent opinion that

The most important thing I have done as a programmer in recent years is to aggressively pursue static code analysis. Even more valuable than the hundreds of serious bugs I have prevented with it is the change in mindset about the way I view software reliability and code quality.

This is quite an encouraging endorsement, even if you work on verification techniques other than static analysis. Or perhaps all the “motivation” that verification research needs comes from Leibniz. In the next formal methods course I teach, I’ll try to open with his famous quote instead of discussing rockets and radiation machines:

The only way to rectify our reasonings is to make them as tangible as those of the mathematicians, so that we can find our error at a glance, and when there are disputes among persons, we can simply say: let us calculate without further ado to see who is right.

Tired of the standard textbook descriptions of the Quicksort algorithm? Here’s a rhyming presentation suitable for all ages 🙂

As a reading guide, here’s a matching pseudo-code partial implementation that follows the procedure described in the rhyme. Enjoy your holidays!

If you’re familiar with computer science, you’re also familiar with the notion of abstraction. Abstraction is ubiquitous in computing. Whereas it belongs to all of science in one way or another, it has a pivotal role in computing where everything abstracts something more concrete and concrete computations are made of abstractions.

If you’re familiar with music, you’re also familiar with the notion of abstraction. Abstraction is ubiquitous in music. Whereas it belongs to all of the arts in one way or another, it has a pivotal role in music where everything abstracts something more concrete and concrete music is made of abstractions.

The similarity between computing and music with respect to abstraction runs deeper. Abstraction is introduced to solve similar problems that originate in a tension between flexibility and stability, generality and concreteness, effectiveness and understandability. In this post, I illustrate the analogy with one concrete example in music and one concrete example in computing. To make sure that the discussion does not become too — ahem — abstract, I recommend you first read the part you’re more familiar with, so that the analogies in describing its counterpart will be easier to follow.

### Abstraction in music: the sonata form

A music composer has to reconcile his aspiration of creativity with the listener’s expectation of a structure familiar to her. Ideally, the composer’s be unencumbered, free to express his ideas by whatever musical items he deems appropriate; but such an approach has a high risk of alienating the listener, who may not be able to follow the ideas expressed in music without any structure for orientation. This is why music (like other arts) has developed forms, which are conventions that help achieve trade-offs between constraining the composer and convincing the listener. The sonata form is one of the most popular of such musical forms as proved by its malleability and longevity.

The sonata form is a prescription to organize the structure of a musical piece in a way that is regular but also leaves plenty of room for the outcomes of artistic creativity. A piece in this form consists of three main sequential parts: exposition, development, and recapitulation. Let’s illustrate them on the first movement of Beethoven’s Piano Sonata #9 (Op. 14 No. 1); you can listen to it played by Claudio Arrau in this YouTube video (I’ll link to specific segments as I introduce the sonata’s parts) and follow the music on one of these scores in the public domain.

The exposition (bars 1–60) introduces a number of themes (or subjects), which are melodies expressing the ground content of the whole piece. By opening with a presentation of unadulterated themes, the composer offers a convenient entry point to the listener: by becoming familiar with the primary ingredients of the sonata, she becomes capable of following the composer’s creation into unexpected territory without losing a general sense of direction. Beethoven’s sonata introduces three themes: the first theme (bars 1–21) begins with a first melodic line in the tonic (E major), which is then varied to construct a modulation into the dominant (B major); the second theme (bars 22–38) takes over in this key with a cantabile character, contrasted by the ensuing third theme (bars 39–56) which is more rhythmic in nature. A codetta (bars 57–60) takes up the last bars of the exposition and connects the third theme to a repetition of the exposition and then, after the repetition, to the second main sonata part: the development.

The development (bars 61–90) is a sonata’s central part, where the composer has substantial freedom to explore his ideas without major structural constraints. By varying and recombining the thematic material previously presented, he can lay out a compelling and imaginative musical argument while building atop the foundations laid in the exposition. The development in Beethoven’s sonata is fairly short, but it manages to explore some interesting fluid tonal contrasts between major and minor keys, and to expand the scant material of the codetta in a more dignified context (from bar 83).

The development is followed by the third sonata part: the recapitulation (bars 91–147). As the name suggests, the recapitulation is a concluding summary where the by-now familiar themes make their last appearance. The composer still has room for expressing his creative ideas in the recapitulation, as themes often change attributes such as presentation order, key, rhythm, and other melodic or harmonic details. Beethoven, for example, ventures into a minor key while transitioning from the first to the second theme in the recapitulation, and introduces other ingenious surprises that reward the attentive listener. The movement closes with a coda (from bar 148), which expands the material of the exposition’s codetta while restoring the awaited tonic key for an irenic conclusion.

The imperfectly symmetric exposition and recapitulation constitute an interface between the deep concrete musical ideas stirring in the composer’s mind and the ears and brains of the listener, who can so appropriate, in a generalized form, some of those ideas and enjoy them.

### Abstraction in computing: the modular routine

A computer programmer has to reconcile her creative aspiration of producing an efficient implementation with the user’s expectation of an interface easy to understand and exporting predictable behavior. Ideally, the programmer’s be free to resort to any technical means she deems appropriate to build an implementation that is fast, uses little memory, or is idiomatic in the chosen programming language; but too much freedom has a high risk of producing a program that is hard to use because its exact requirements and guarantees are arduous to fathom. This is why computing (like other scientific and engineering disciplines) has developed techniques for modularization, which are approaches that discipline how program code is structured in such a way that the effects of modular components can be understood in isolation as well as in combination. The modular routine is one of the most popular elements of modular design as proved by its ubiquity across programming languages.

The modular routine combines a piece of code (the body or implementation) with a well-defined interface. The interface consists of a signature, a precondition, and a postcondition. Let’s illustrate these parts on the algorithm for binary search in sorted arrays; here’s an implementation taken from Tim Bray’s, which I adapted to pseudo-Eiffel and annotated with a suitable complete functional specification.

The program text begins (line 1) with the signature, which declares the data the routine operates on. The input data consists of the integer array a, which includes a.count elements from a[0] to a[a.count - 1], and of the integer scalar target, whose value is searched among a‘s elements. The output data is also an integer scalar, referred to as Result in the rest of search_binary.

Routine search_binary works correctly under certain assumptions about properties of the input data passed to it by callers. These assumptions are encoded as the precondition (lines 2–4) which includes two clauses. First, a must refer to a valid array object (line 3). Second, a‘s content must be sorted in nondecreasing order (line 4). The second precondition clause is specific to binary search, whose algorithm leads meaningless results if applied to sequences that are not ordered.

The body (lines 5–27) is a routine’s central part, where the programmer has substantial freedom to work out a suitable implementation without major restrictions beyond those given by the programming language’s features. The body of search_binary begins with a declaration of three integer local variables (line 6), which will be used to keep track of intermediate states in the computation. A loop occupies the major part of the body and does the heavy lifting of searching for the target. As we understand from the loop invariant (lines 10–13), the interval that goes from low to high (excluded) marks, at any point during the computation, the range of values in a that have not been searched for yet. Conversely, the elements at indexes from 0 to low are not greater than target; and the elements at indexes from high to a.count - 1 are greater than it. The body of search_binary exits the loop only when the whole array a has been searched. At that point, the final conditional (lines 23–27) can conclude whether a value equal to target exists at all in a: if low is still -1 or points to an element not equal to — and hence less than because of the invariant — target, the routine returns the invalid index -1 to denote “element not found”; otherwise, a[low] is a valid array element equal to target, and the routine returns its index. The choice of decoupling searching from determining whether target is in the array is an example of the programmer’s freedom to design an implementation that achieves specific, advantageous trade-offs. Bray points out why this approach is preferable to performing two tests in the loop (one for less than, and one for equal to) in the hope of exiting the loop earlier if target is found: in a search where the search interval shrinks exponentially, “nearly all” values are found in the last iteration; hence minimizing the operations per iteration is the most efficient solution. Other programmers, given different constraints, may however follow other approaches; for example, Joshua Bloch’s implementation of binary search in Java’s OpenJDK uses a loop with three-way comparison.

Understanding all these details of the implementation would be burdensome to users who just want to call search_binary to avail of its functionality. This is why the body is followed by the last routine part: the postcondition (lines 28–31), which summarizes the routine’s output in relation to the value of the input arguments. The postcondition of search_binary includes three clauses, defining the output whether the value searched for has been found or not. If Result denotes a valid index of a (line 29), then it points to an element with value target; otherwise Result is -1 (line 30), and hence a has no such element. The last postcondition clause (line 31) specifies that these are the only possible values Result may take; that is, the postcondition is complete. The functional postcondition nicely abstracts not only the details of the specific implementation of binary search, but even the fact that a binary search algorithm has been executed. Any search routine, be it binary, sequential, or following any other technique, could export the same postcondition in its interface to communicate to the user its input/output behavior.

The mirroring precondition and postcondition constitute a rigorous interface between the ingenious ephemeral details of the efficient implementation engineered by the programmer and the general terse information that is available to the user, who can so reuse, in different contexts, the programmer’s code and take advantage of its reliable behavior.

### Coda

Although the analogy between sonatas and routines is imperfect, it unveils deep conceptual connections. As the key to managing complexity, abstraction underpins any efforts to further knowledge; and helps push to new horizons music and programs alike.

Perl’s motto “There’s more than one way to do it” applies to verification too. The proliferation of formal methods may seem haphazard to greenhorns; the claimed differences between variants of the same technique exaggerated. Granted, not every improvement is a groundbreaking breakthrough (anyway, not most of the time). But the space of solutions to the verification problem is huge and the theoretical limits outstanding; hence the interesting trade-offs are many, each sensible given different constraints. One size does not fit all.

Nadia‘s guest post about dimensions in verification covered the big picture. The follow-up point I’d like to make is that there’s a significant variability also between variants of the same verification technique. Francesco Logozzo‘s post about abstract interpretation triggered these thoughts of mine; in fact, I’m going to build on his example to illustrate my point.

Francesco presents the example of a routine that computes the maximum element in an array, using a for loop that scans the array from its first to its last element, to show that his CodeContracts static checker — based on abstract interpretation — is not limited to dealing with “simple” properties such as nonnullness: it is capable of inferring the complete functional postcondition of max, namely (using a more readable syntax):

\forall k\colon 0 \leq k < \texttt{a.Length} \Longrightarrow \texttt{a}[k] \leq \texttt{Result}[/latex] $\exists k\colon 0 \leq k < \texttt{a.Length} \wedge \texttt{a}[k] = \texttt{Result}$

This is remarkable, but let's try to modify the example to see where we can go from here. First, encode max as a function that operates on an attribute a by scanning it from last to first. (The following code is C#.)

Function max_downward and the original one max have the very same input/output behavior, and indeed they implement two variants of the same simple algorithm. Ran on max_downward, the CodeContracts checker correctly infers that Result is an element of a (the existentially quantified postcondition above), and that it is greater than or equal to a's last element, but it falls short of the complete functional postcondition (it misses the universally quantified postcondition).

I don't know the details of CodeContracts checker's implementation of an abstract interpreter, and hence I cannot track down precisely the source of this behavior. However, there is no reason to be surprised about this restriction: any sound verification technique edges on intractability; as users, it normally is not difficult to push it until undecidability rears its ugly head and, in this case, spills some precision. If anything, we should be surprised whenever a verification technique works flexibly in different conditions!

To continue our example, we accept to give up some automation in exchange for more flexibility. We turn to deductive verification; more precisely, auto-active verifiers. Rustan Leino and Michał Moskal introduced the term "auto-active" to denote tools that are partly automatic and partly interactive. (By the way, my proposal to call them "inter-matic" has yet to catch on.) They are automatic in that they do not require interactive guidance while they try to verify an input program. But they still indirectly interact with users through annotations, such as pre- and postconditions, invariants, and suggestions such as intermediate assertions, which are an additional burden to write and crucially influence their capabilities and performance.

Even among auto-active verifiers, there is room for considerable variations in terms of flexibility and simplicity. Let's first consider Dafny, a state-of-the-art verifier mainly developed by the SMT whisperer — a.k.a. Rustan Leino. Dafny has a relatively simple input language, with built-in support for specification models such as mathematical sequences, of which we're going to take advantage in our example. Here's a variant of the example that verifies in Dafny: have a glance at the code, and then I'll point out the interesting parts.

Besides the array a, class PSList ("Possibly Sorted" List) includes a Boolean attribute sorted, whose role is storing whether the content of a is sorted. One can imagine a complete implementation were this flag is set whenever the array is processed by a sorting algorithm or is populated by elements in order. The new variant of maximum, called max_ps, takes advantage of this additional piece of information: if sorted is true, it immediately returns the value a[a.Length - 1] which is initially assigned to Result. Indeed, the maximum of a sequence that is sorted in increasing order is its last element. The correctness of max_ps relies on this fact; which we encode, following Dafny's idiomatic approach to representation invariants, as part of predicate Valid: if sorted is true then the content a[..] of a is sorted in increasing order. Valid is supposed to be an invariant property, which methods of class PSList can rely on and have to preserve. In fact, method max_ps lists Valid in its pre- and postcondition; and Dafny uses its clause about sortedness in the proof of branch if sorted { ... }. (In case you're wondering: the assert there is needed to help Dafny convert between a's content and its sequence representation a[..].)

This example worked smoothly thanks to Dafny's transparent syntax, which explicitly represents all salient features of the code and its annotations. But suppose we make the example more realistic (and more complex) by adding max_ps to the interface of a full-fledged general-purpose list data structure. In this case, we may prefer an auto-active tool with more options and control knobs, which gives additional flexibility in exchange for a steeper learning curve. AutoProof is a verifier that tries to offer such a trade off; thus, the last version of our maximum example is in Eiffel — AutoProof's input language.

Class PS_LIST builds its functionalities by extending those of some verified library implementation of a list with all the bells and whistles. In practice, FULL_FLEDGED_LIST could be some list implementation among those in EiffelBase2. A couple of hints should suffice to explain the example's (imperfectly highlighted) syntax: across expressions represent universal (across ... all) and existential (across ... some) quantification over integer intervals x |..| y; sequence is an inherited model attribute that describes the list's content as a mathematical sequence (indexed from 1, unlike the C# and Dafny examples).

We encode the sortedness property as part of the class invariant, in a clause named when_sorted, which appears at the bottom of PS_LIST's text. Remember that we are inheriting all of the functionalities of a full-fledged list, including a detailed, complex class invariant to which we add clause when_sorted. Dealing with the kind of full functional specifications that are needed to reason about realistic implementations has some practical consequences: since the invariant is a lot of specification, carrying it around in the proof of every routine of the class is likely to bog down the verifier and to render the proof of even trivial properties a challenge. AutoProof, however, has been developed with these situations in mind. Routine max_ps can recall the only invariant element that is needed precisely where it is needed in the proof, using the assertion check inv_only ("when_sorted") end in the conditional branch if sorted then ... . The many other consistency properties inherited with the invariant are simply ignored to the benefit of a straightforward, agile proof.