Bug counting « A little specification can go a long way

I hope nobody took the previous post too seriously. It was meant as a satirical joke, pointing out several shortcomings of the notorious LaTeX vs. Word comparison study. I still want to say a few things on the topic, though; and this time I’m completely serious.

I realize that LaTeX can be daunting for scientists who have little programming (or, more generally, computer science) background, to the point that they may well opt for user-friendlier alternatives. I cannot deny that knowing one’s way around LaTeX requires some time; and, until they reach a good level, LaTeX users may be slowed down and ineffective. However, I think it is important to at least understand why LaTeX remains predominant in scientific publishing; that is, what benefits come after a sufficient investment in learning to use it. If you have the same background I have, this post will probably sound completely obvious. Otherwise, read on.

The key features of LaTeX as a document preparation system are:

LaTeX is a markup language, that is it consists of annotations combined with the actual content to be displayed; the annotations define structure and appearance of the content, but both are expressed as plain text.
LaTeX is a set of commands that translate down to TeX and Metafont.

Both features contribute to the user unfriendliness of LaTeX compared to so-called WYSIWYG document preparation systems such as Word; but also are what makes LaTeX superior in practically every other aspect. To put it in concrete terms: I think it’s entirely possible that LaTeX will eventually be replaced, for scientific publishing, by a different document preparation system (Markdown extensions anyone?). But, if this will ever happen, the replacement system will certainly be another markup language; and will very likely translate down to TeX and Metafont.

Why markup languages are better

Markup languages, where all information is encoded in plain text, are naturally amenable to editing using a whopping variety of tools that also operate on text files. Your favorite text editor — including Word itself, if you insist — is also a LaTeX editor, with spell checking and all other functionalities readily available. Text manipulation tools based on regular expressions — grep, sed, awk, perl, you name it — are directly applicable to LaTeX texts. Data processing environments — such as R for statistical analysis — can easily export data to LaTeX since all they have to produce is suitably encoded text output.

Version control tools, such as Subversion and Git, use lines of text as fundamental units. Hence, they work out of the box on LaTeX files to provide powerful support to collaborative editing and versioning. This goes well beyond the primitive capabilities of Word’s “track changes” function: authors can edit in parallel the same source file, inspect each other’s changes, and automatically merge individual variants into a master copy. Easy-to-read visual presentations of changes are still possible through a variety of graphical interfaces to version control software, or by means of other tools that work on text (try latexdiff for a presentation that looks like Word’s track changes).

Markup languages tend to enforce a clear separation between content and presentation. I wrote “tend to” because some markup languages do a better job than others at separating structural markup and presentation markup. LaTeX does not excel at this, since it does not always clearly distinguish between, say, an annotation that marks the beginning of a section (a structural detail) and one that marks a word to be displayed in a different font (a presentation detail). Nevertheless, writing documents in LaTeX lets you focus on content much better than WYSIWYG systems like Word, which are centered around commands to change presentation (font type and size, line spacing, page breaks, and so on) and offer limited or hard-to-use support to add complex structural information (section and subsection titles, cross references, and so on). In fact, I have the impression that only advanced Word users take advantage of functionalities such as to define styles and indexes. The situation in LaTeX is reversed: inexperienced users learn only very basic presentation-changing commands (such as to have text in bold and italic), but know well how to mark the beginning of sections, and how to introduce cross-references, bibliographic entries, and other structural information. Changing presentation details such as line spacing, font type, or the spacing of rows in tables is instead the domain of the LaTeX expert, who has hopefully also learned not to change them from the default, defined in a stylesheet, unless there is a compelling reason. In summary, markup languages help you focus on content, defer fixing presentation details to when the content is stable, and change presentation uniformly and robustly based on the intended structure of text.

Compared to intuitive WYSIWYG tools, using markup languages may slow you down: you may have to lookup the language’s syntax to do things you don’t know or remember from previous usage; and achieving the desired presentation form requires planning the organization of a document before starting writing, or however modifying it consistently at some point. But this kind of slowing down is a good thing: planning and paying attention to the structure beforehand is generally necessary to achieve a good document design.

Why TeX is better

To understand the unique strengths of TeX (and its companion Metafont) we should have an idea of its history, whose protagonist is the legendary computer scientist Don Knuth. Working on a new edition of his magnum opus “The art of computer programming”, he became strongly dissatisfied with the typographic quality achieved by state-of-the-art technologies, in particular when it came to typesetting mathematics. To solve the problem for good, he started designing the TeX system in 1977, expecting to finish the project during his sabbatical in 1978; he was off only by a little over 10 years. Knuth tells the story in his own words in this interesting interview (the linked segment is about how much time and fretting took him to devise a suitable scalable algorithmic definition of the shape for the letter “S”: I’m pretty sure he would disagree with the LaTeX vs. Word study regarding “the appearance of the text [being] secondary to the scientific merit of an article”).

As a result, TeX is a top-notch thorough solution to the problem of providing a typesetting system that offers very high-quality output on every machine where it runs. Almost 30 years since its first stable release, and with all its idiosyncrasies, TeX’s core remains technically unsurpassed and a de facto standard.

Several features of LaTeX derive from using TeX as back-end. An important one is the possibility of defining powerful macros — a feature LaTeX directly inherits from TeX (in fact, LaTeX is essentially an extensive collection of TeX macro). The possibility of having complete control on the back-end through macros provides great flexibility to LaTeX users, if at the expense of having to deal with an occasionally abstruse syntax.

Then: LaTeX?

If you care about typographic quality; if you think it is conducive to — not in contrast with — clarity of presentation and quality of ideas; if you’re willing to invest time now to become way more productive later; if you would like to take advantage of the outcome of decades of top-of-the-line tool development; and if you think microwaved TV dinners are a poor substitute for properly cooked food. Then, you may want to give LaTeX a serious try; or to start designing the next markup language that will replace it while remaining compatible with the TeX ecosystem.

An empirical study comparing LaTeX to Word for preparing scientific documents appeared on PLOS ONE in mid December, just in time to ignite an intense online discussion that took place over the holiday season. Among the many comments that I read, one tweet by Michele Lanza made me realize that the study was more general than I originally thought:

The Word vs. LaTeX efficiency argument is in line with microwave food vs. actual cooking

What a terrific suggestion for related research! Back from the winter break, I decided to adapt the design of the LaTeX vs. Word study in order to pit microwave food against proper cooking. The results are, I believe, quite conclusive; and my recommendations based on them compelling. While I submitted a detailed write-up for publication, I want to give you a summary of the essential findings in the rest of this post. In my empirical research, I stuck to the original study’s design as closely as possible since it seemed as much apropos for comparing cooking methods as it was for comparing document preparation systems. In the write-up that follows you will recognize passages that mirror closely the structure and even the words of the original LaTeX vs. Word paper, since those words speak for themselves — and imitation is the sincerest form of flattery.

The study

This empirical study compares microwave usage to full-fledged cooking for food preparation. The experimental methodology was straightforward. We gathered 40 volunteers who use to cook their dinners — some using a microwave to defrost preprocessed food, some cooking from raw ingredients following a recipe. All participants used their own kitchen to run the experiments.

We set up three different sample meals: (1) a TV dinner with two food compartments; (2) a different brand of TV dinner with four food compartments; and (3) a dish consisting of 100 grams of spaghetti aglio e olio. Each participant had 30 minutes to cook each meal using his or her chosen food preparation technique, process, and equipment. Who chose full-fledged cooking was given access to a repository of raw ingredients; and who chose microwaving was given a supply of common TV dinners of different brands. The performance of each participant was measured for each sample meal by three variables: (1) the number of visual differences (food layout, appearance, color) between the cooked meal and the sample; (2) the number of differences in flavor between the cooked meal and the sample; and (3) the amount of hot edible mass produced within 30 minutes. Each participant also completed a questionnaire where they self-evaluated their performance.

The experimental results are unequivocal. Microwave users outperformed traditional cooks on most measures ( $\text{[math]}$ p < 0.05[/latex], often even [latex]p < 0.01[/latex]), with the only exception of the spaghetti dish. Even the most expert cooks were unable to reproduce, from raw ingredients, TV dinners that look and taste like the sample TV dinners, in contrast to the novice microwave users who managed to effortlessly heat to near perfection large amounts of prepackaged food.

These results suggest that full-fledged cooking should be adopted only to prepare complex dishes mainly consisting of oil and garlic. Experienced cooks may argue that the overall quality and healthiness of properly cooked food provides for a meal experience that is considerably more enjoyable than ingurgitating a tasteless, fat-laden, appalling microwave dinner. Although this argument may be true, the differences with the recently introduced versions of TV dinners may be a tad less spectacularly obvious than they were in the past. Thus, the results show that no reasons exist to use traditional means of cooking, except possibly for dishes that contain complex combinations of olive oil and pasta.

An unexpected result of the study was that, according to the questionnaire’s answers, full-fledged cooks are highly satisfied with their experience. Despite incurring reduced usability and productivity, they assessed their work as less tiresome, less frustrating, and more enjoyable than microwave users. From a psychological perspective, the most reasonable explanation is that the cooks are delusional lunatics who are unwilling to reconsider their manifestly incorrect beliefs about their cooking ability in light of their actual poor results in faithfully reproducing low-grade industrial food.

The study’s results also have implications in terms of costs of food preparation and consumption by the public. Individuals have a responsibility to act economically and efficiently, especially in cases in which their occupation is publicly funded. No reliable data is available about how many publicly-employed workers cook their own meals, and correspondingly it is unclear the amount of taxpayer’s money that is spent worldwide by individuals who stubbornly insist on cooking food from raw ingredients over sticking to a more efficient and modern meal preparation system, which would free up their time to advance their respective field of occupation.

I therefore suggest that leading public institutions should consider accepting time-squandering food preparation practices by their employees only if this is justified by the prevalence of dishes involving spaghetti and garlic. In all other cases, said institutions should request employees to eat microwave food (or order take out). We believe that this would be a good policy for two reasons. First, the flavor and appearance of food is secondary to its nutritional values. And, second, preventing people from frittering away scarce culinary resources would save time and money to maximize the benefit of work and development for both individual institutions and the public.

P.S.: Some readers suggested two additional aspects in which microwave cooking is superior. First, adjusting the heating power is much easier with a microwave (where pushing a button immediately interrupts the flow of radiation) than with a traditional stove (where the stove’s surface may remain hot for minutes even after power is turned off). And, second, crispy food can be properly cooked using the recently introduced hot air circulation system available in several high-end microwave ovens.

“Welcome to the first class of our course on formal verification. I’d like to start with some motivation: why verification is important in the real world, and what the consequences of lack of verification in software development are.” Lights go off. You hear a countdown recited in French, “Trois, deux, unité, feu!“. On the screen, a rocket blasts off its platform into the sky. Thirty-something seconds into its ascension, it awkwardly topples off and quickly disintegrates. “That was the Ariane 5 rocket’s first launch, which failed due to a software bug. The failure resulted in an estimated 300 million Euros loss. Software errors can even cost human lives, such as in the case of the Therac-25 machine for radiation therapy, whose malfunctioning resulted in several patients dying from radiation burns. Formal verification can ensure that software is free from defects before it’s deployed, thus avoiding catastrophic disasters such as those just mentioned. Now here’s the course syllabus…”

OK, I admit to slightly exaggerating the above for dramatic purposes. But, if you ever took a course on formal verification, chances are that it started with some graphic pictures of exploding rockets, unresponsive Mars rovers, or defective medical equipment, followed by stark tallies of the enormous losses, in money or human lives, resulting from the disasters. The bottom line: software quality is generally abysmal; and the systematic all-out application of formal verification is our only hope to avert further similar disasters from occurring in the future.

What’s the problem with this angle, to which I may have also been guilty of paying lip service in the past? I think this way of pitching formal verification points to motivations that are not fairly representative of the varied landscape in the research and practice of formal methods, and risks alienating software development practitioners, who have different concerns in their everyday’s work — concerns which conflict with the oversimplified picture painted above.

The first problem I see is that pointing to a handful of major disasters due to software failures may be construed as suggesting that formal verification is something only justified for mission critical applications. What about the myriad software projects that don’t cost hundreds of million dollars and whose failure does not result in ruinous losses? Is there a place for formal methods within such projects’ modest budgets? Related to these points is the misled vision that formal verification is an all-or-nothing business: either it enters every step of development or you might as well do things as you’ve always done them — but then disaster will impend. The flip side is that the only failures that matter are the catastrophic ones; everything else is not a big deal. In reality, everyday software developers generally remain genuinely concerned about software quality even when working on non mission-critical development; but also have to cope with all sorts of project constraints pulling in different directions.

A better way to frame the usage of formal verification is in terms of a continuum of trade-offs. Each trade-off balances some of the additional costs incurred by applying verification techniques against some gains resulting from higher reliability or confidence in the software’s correctness. This way, each project can pick the trade-off that best fits its requirements and budget. It’s easy to guess that the overwhelming majority of development projects occupy some intermediate position in the trade-off continuum: their failures won’t result in apocalyptic losses, nor are they of no consequence whatsoever. These are the projects that most practitioner developers are familiar with and can relate to. To each their own: Paris Métro Line 14 deployed the B method to build correct-by-construction modules through refinement; for my students learning Java and C#, writing a decent collection of unit tests and running a lint-like tool on their projects is good enough.

The second problem with the above pitching of formal methods is that it may suggest that they are a silver bullet. You want high reliability: just use formal methods (but it will cost you)! It doesn’t matter what your development process is, what programming language you’re using, whether your software is interactive, if you’re deploying in a legacy system, what your amount of technical debt is. Start using [your favorite verification technique] and your problems will evaporate!

But reliability, and practically every desirable quality property, is never the result of a single technique, tool, design choice, or individual talent. It can only follow from a sensible, well-organized process; one that is transparent, traceable, technologically up to date, based on a clear understanding of the application domain, and so on. The investigations following the failures of the examples mentioned above (Ariane 5, Therac-25, etc.) invariably pointed to communication and process shortcoming as well as, often more egregious than, technical ones. Correctness cannot be built as an afterthought; it requires planning and practice. Based on his experience at Microsoft, James Larus claimed that a root cause of software bugs is “missing, outdated, and incorrect knowledge“, which easily result from poorly organized processes. Conversely, the practical adoption of formal verification is very often slowed down by unintended process constraints; progress in this sense doesn’t necessarily require more powerful techniques but mainly integrating known techniques in a way that is not disruptive of existing practices — perhaps except for unfixable noxious practices whose disruption is desirable anyway.

The third problem I see with the “disaster-oriented” motivation for formal methods is that it seems to suggest that verification research is worthy only to the extent that it’s directly applicable to relevant software development issues and leads concrete gains, for example averting failure. This view encourages a view of science as a provider of cost-effective technology. To be sure, verification technology has had enough success stories that even long-standing skeptics must concede it can be useful in practice. (If you find the previous sentence controversial, I have blog posts planned to convince you!) But practical applicability need not be — should not be — the only goal of researchers. There must be room for speculative inquiries, also because long term progress invariably and crucially depends on them. Think, for example, of how the spectacular progress of SAT solvers has fueled advances in automatic verification in the last decade or so. The performance of SAT solvers hasn’t mushroomed overnight, but is the result of decades of research on a topic that seemed, at the time of its inception, quite removed from practicality. As Ed Clarke often recalls when presenting SAT-based model checking, if he had suggested using an NP-complete problem such as SAT as the basis for a practical verification technique at the time he was working on his Ph.D. thesis, his colleagues would have called his suggestion implausible because clearly clueless about the well-understood notion of intractable problem.

In spite of not always being pitched and motivated in the most convincing way, formal verification remains an exciting, quickly growing research topic that has made spectacular progress, which has been winning, in one form or another, the favors of more and more serious practitioners. One example among several: John Carmack‘s recent opinion that

The most important thing I have done as a programmer in recent years is to aggressively pursue static code analysis. Even more valuable than the hundreds of serious bugs I have prevented with it is the change in mindset about the way I view software reliability and code quality.

This is quite an encouraging endorsement, even if you work on verification techniques other than static analysis. Or perhaps all the “motivation” that verification research needs comes from Leibniz. In the next formal methods course I teach, I’ll try to open with his famous quote instead of discussing rockets and radiation machines:

The only way to rectify our reasonings is to make them as tangible as those of the mathematicians, so that we can find our error at a glance, and when there are disputes among persons, we can simply say: let us calculate without further ado to see who is right.

Tired of the standard textbook descriptions of the Quicksort algorithm? Here’s a rhyming presentation suitable for all ages 🙂

A base-ball and a truck, 
a large shawl with a puck
lie in the dormitory
in order desultory.

I should clean it up,
I ought to tidy up,
but sorting takes time,
it can cost many a dime!

"It's untrue that's expensive", 
replied Sam a bit pensive. 
"Tony Hoare and other men,
showed that sorting's N log N."

"All right then", I said, 
"Let's do it. Go ahead."
The result will be prime:
sorting in quasilinear time!

A quick sort is driven
by an item named pivot.
Our primary ambition
is build a partition 
where pivot is placed
with items well spaced:
where small ones go sinister;
but big ones, right administer.

As pivot "puck" I choose,
so no one is confused.
The items I partition
with only one omission:
partitioning in-place
can be a tricky case;
I will not flesh it out,
go read to clear any doubt!

There's no way we get stuck
while partitioning with "puck",
after switching it with "truck".

A base-ball and a puck, 
a large shawl with a truck,
partition the dormitory
in order mandatory.
A ball's smaller than a puck,
which is smaller than shawls or trucks.
If you think this is heretical,
just assume order alphabetical.

Sam said, "With your move,
of which I fully approve,
we're in a good position
to progress in our mission.
Based on the built partition,
we now have the permission
to split the work in two —
what an excellent breakthrough!"

A base-ball and a puck, 
partitioning we will;
from the shawl with the truck, 
a partition we'll distill.
Processing a pair,
even with great care,
won't be too much work —
no reason for a irk.

But wait, there's more —
to the Quicksort algorithm 
by Sir C. A. R. Hoare.

Partitioning a pair
is just the same, I swear,
as ordering the two —
correctness will ensue.
By binary splitting, 
in logarithmic sittings, 
we jump to the base case
of the sorted ranking chase.

The base-ball and the puck, 
partitioned they are;
a large shawl and a truck, 
are well ordered so far.
A room that was chaotic,
a task a tad robotic,
the dormitory is neat:
we can all feel upbeat.

There's still a final catch
to ensure you win the match.
Be careful what method you're using
whenever the pivot you're choosing.
A badly picked pivot
can really, truly limit
performance of Quicksort:
runtime will not be short,
you will be disappointed,
when N log N you expect,
but N times N you get!

100

101

102

A base-ball and a truck,

a large shawl with a puck

lie in the dormitory

in order desultory.

I should clean it up,

I ought to tidy up,

but sorting takes time,

it can cost many a dime!

"It's untrue that's expensive",

replied Sam a bit pensive.

"Tony Hoare and other men,

showed that sorting's N log N."

"All right then", I said,

"Let's do it. Go ahead."

The result will be prime:

sorting in quasilinear time!

A quick sort is driven

by an item named pivot.

Our primary ambition

is build a partition

where pivot is placed

with items well spaced:

where small ones go sinister;

but big ones, right administer.

As pivot "puck" I choose,

so no one is confused.

The items I partition

with only one omission:

partitioning in-place

can be a tricky case;

I will not flesh it out,

go read to clear any doubt!

There's no way we get stuck

while partitioning with "puck",

after switching it with "truck".

A base-ball and a puck,

a large shawl with a truck,

partition the dormitory

in order mandatory.

A ball's smaller than a puck,

which is smaller than shawls or trucks.

If you think this is heretical,

just assume order alphabetical.

Sam said, "With your move,

of which I fully approve,

we're in a good position

to progress in our mission.

Based on the built partition,

we now have the permission

to split the work in two —

what an excellent breakthrough!"

A base-ball and a puck,

partitioning we will;

from the shawl with the truck,

a partition we'll distill.

Processing a pair,

even with great care,

won't be too much work —

no reason for a irk.

But wait, there's more —

to the Quicksort algorithm

by Sir C. A. R. Hoare.

Partitioning a pair

is just the same, I swear,

as ordering the two —

correctness will ensue.

By binary splitting,

in logarithmic sittings,

we jump to the base case

of the sorted ranking chase.

The base-ball and the puck,

partitioned they are;

a large shawl and a truck,

are well ordered so far.

A room that was chaotic,

a task a tad robotic,

the dormitory is neat:

we can all feel upbeat.

There's still a final catch

to ensure you win the match.

Be careful what method you're using

whenever the pivot you're choosing.

A badly picked pivot

can really, truly limit

performance of Quicksort:

runtime will not be short,

you will be disappointed,

when N log N you expect,

but N times N you get!

As a reading guide, here’s a matching pseudo-code partial implementation that follows the procedure described in the rhyme. Enjoy your holidays!

quicksort (a: ARRAY[G])
  local
    pivot: G
    p: INTEGER
  do
    if a.count > 1 then
      pivot := -- pick an element in 'a'
      p := partition (a, pivot)
      quicksort (a[0..p))
      quicksort (a[p..a.count))
    end -- empty and singleton arrays are trivially sorted
  ensure
    is_sorted (a)
    elements (a) = old elements (a)
  end


partition (a: ARRAY[G], pivot: G): (p: INTEGER)
  require
    a.count > 0
  ensure
    0 <= p <= a.count
    forall i :: 0 <= i < p ==> a[i] <= pivot
    forall i :: p <= i < a.count ==> pivot <= a[i]
    elements (a) = old elements (a)
  end

quicksort (a: ARRAY[G])

local

pivot: G

p: INTEGER

if a.count > 1 then

pivot := -- pick an element in 'a'

p := partition (a, pivot)

quicksort (a[0..p))

quicksort (a[p..a.count))

end -- empty and singleton arrays are trivially sorted

ensure

is_sorted (a)

elements (a) = old elements (a)

end

partition (a: ARRAY[G], pivot: G): (p: INTEGER)

require

a.count > 0

ensure

0 <= p <= a.count

forall i :: 0 <= i < p ==> a[i] <= pivot

forall i :: p <= i < a.count ==> pivot <= a[i]

elements (a) = old elements (a)

end

Bug counting

A little specification can go a long way

Seriously, why LaTeX?

Why markup languages are better

Why TeX is better

Then: LaTeX?

An efficiency comparison of food preparation systems

The study

What is verification for?

The quick-sort rhyming report

Abstraction in music as in computing