February « 2015 « Bug counting

I hope nobody took the previous post too seriously. It was meant as a satirical joke, pointing out several shortcomings of the notorious LaTeX vs. Word comparison study. I still want to say a few things on the topic, though; and this time I’m completely serious.

I realize that LaTeX can be daunting for scientists who have little programming (or, more generally, computer science) background, to the point that they may well opt for user-friendlier alternatives. I cannot deny that knowing one’s way around LaTeX requires some time; and, until they reach a good level, LaTeX users may be slowed down and ineffective. However, I think it is important to at least understand why LaTeX remains predominant in scientific publishing; that is, what benefits come after a sufficient investment in learning to use it. If you have the same background I have, this post will probably sound completely obvious. Otherwise, read on.

The key features of LaTeX as a document preparation system are:

LaTeX is a markup language, that is it consists of annotations combined with the actual content to be displayed; the annotations define structure and appearance of the content, but both are expressed as plain text.
LaTeX is a set of commands that translate down to TeX and Metafont.

Both features contribute to the user unfriendliness of LaTeX compared to so-called WYSIWYG document preparation systems such as Word; but also are what makes LaTeX superior in practically every other aspect. To put it in concrete terms: I think it’s entirely possible that LaTeX will eventually be replaced, for scientific publishing, by a different document preparation system (Markdown extensions anyone?). But, if this will ever happen, the replacement system will certainly be another markup language; and will very likely translate down to TeX and Metafont.

Why markup languages are better

Markup languages, where all information is encoded in plain text, are naturally amenable to editing using a whopping variety of tools that also operate on text files. Your favorite text editor — including Word itself, if you insist — is also a LaTeX editor, with spell checking and all other functionalities readily available. Text manipulation tools based on regular expressions — grep, sed, awk, perl, you name it — are directly applicable to LaTeX texts. Data processing environments — such as R for statistical analysis — can easily export data to LaTeX since all they have to produce is suitably encoded text output.

Version control tools, such as Subversion and Git, use lines of text as fundamental units. Hence, they work out of the box on LaTeX files to provide powerful support to collaborative editing and versioning. This goes well beyond the primitive capabilities of Word’s “track changes” function: authors can edit in parallel the same source file, inspect each other’s changes, and automatically merge individual variants into a master copy. Easy-to-read visual presentations of changes are still possible through a variety of graphical interfaces to version control software, or by means of other tools that work on text (try latexdiff for a presentation that looks like Word’s track changes).

Markup languages tend to enforce a clear separation between content and presentation. I wrote “tend to” because some markup languages do a better job than others at separating structural markup and presentation markup. LaTeX does not excel at this, since it does not always clearly distinguish between, say, an annotation that marks the beginning of a section (a structural detail) and one that marks a word to be displayed in a different font (a presentation detail). Nevertheless, writing documents in LaTeX lets you focus on content much better than WYSIWYG systems like Word, which are centered around commands to change presentation (font type and size, line spacing, page breaks, and so on) and offer limited or hard-to-use support to add complex structural information (section and subsection titles, cross references, and so on). In fact, I have the impression that only advanced Word users take advantage of functionalities such as to define styles and indexes. The situation in LaTeX is reversed: inexperienced users learn only very basic presentation-changing commands (such as to have text in bold and italic), but know well how to mark the beginning of sections, and how to introduce cross-references, bibliographic entries, and other structural information. Changing presentation details such as line spacing, font type, or the spacing of rows in tables is instead the domain of the LaTeX expert, who has hopefully also learned not to change them from the default, defined in a stylesheet, unless there is a compelling reason. In summary, markup languages help you focus on content, defer fixing presentation details to when the content is stable, and change presentation uniformly and robustly based on the intended structure of text.

Compared to intuitive WYSIWYG tools, using markup languages may slow you down: you may have to lookup the language’s syntax to do things you don’t know or remember from previous usage; and achieving the desired presentation form requires planning the organization of a document before starting writing, or however modifying it consistently at some point. But this kind of slowing down is a good thing: planning and paying attention to the structure beforehand is generally necessary to achieve a good document design.

Why TeX is better

To understand the unique strengths of TeX (and its companion Metafont) we should have an idea of its history, whose protagonist is the legendary computer scientist Don Knuth. Working on a new edition of his magnum opus “The art of computer programming”, he became strongly dissatisfied with the typographic quality achieved by state-of-the-art technologies, in particular when it came to typesetting mathematics. To solve the problem for good, he started designing the TeX system in 1977, expecting to finish the project during his sabbatical in 1978; he was off only by a little over 10 years. Knuth tells the story in his own words in this interesting interview (the linked segment is about how much time and fretting took him to devise a suitable scalable algorithmic definition of the shape for the letter “S”: I’m pretty sure he would disagree with the LaTeX vs. Word study regarding “the appearance of the text [being] secondary to the scientific merit of an article”).

As a result, TeX is a top-notch thorough solution to the problem of providing a typesetting system that offers very high-quality output on every machine where it runs. Almost 30 years since its first stable release, and with all its idiosyncrasies, TeX’s core remains technically unsurpassed and a de facto standard.

Several features of LaTeX derive from using TeX as back-end. An important one is the possibility of defining powerful macros — a feature LaTeX directly inherits from TeX (in fact, LaTeX is essentially an extensive collection of TeX macro). The possibility of having complete control on the back-end through macros provides great flexibility to LaTeX users, if at the expense of having to deal with an occasionally abstruse syntax.

Then: LaTeX?

If you care about typographic quality; if you think it is conducive to — not in contrast with — clarity of presentation and quality of ideas; if you’re willing to invest time now to become way more productive later; if you would like to take advantage of the outcome of decades of top-of-the-line tool development; and if you think microwaved TV dinners are a poor substitute for properly cooked food. Then, you may want to give LaTeX a serious try; or to start designing the next markup language that will replace it while remaining compatible with the TeX ecosystem.

An empirical study comparing LaTeX to Word for preparing scientific documents appeared on PLOS ONE in mid December, just in time to ignite an intense online discussion that took place over the holiday season. Among the many comments that I read, one tweet by Michele Lanza made me realize that the study was more general than I originally thought:

The Word vs. LaTeX efficiency argument is in line with microwave food vs. actual cooking

What a terrific suggestion for related research! Back from the winter break, I decided to adapt the design of the LaTeX vs. Word study in order to pit microwave food against proper cooking. The results are, I believe, quite conclusive; and my recommendations based on them compelling. While I submitted a detailed write-up for publication, I want to give you a summary of the essential findings in the rest of this post. In my empirical research, I stuck to the original study’s design as closely as possible since it seemed as much apropos for comparing cooking methods as it was for comparing document preparation systems. In the write-up that follows you will recognize passages that mirror closely the structure and even the words of the original LaTeX vs. Word paper, since those words speak for themselves — and imitation is the sincerest form of flattery.

The study

This empirical study compares microwave usage to full-fledged cooking for food preparation. The experimental methodology was straightforward. We gathered 40 volunteers who use to cook their dinners — some using a microwave to defrost preprocessed food, some cooking from raw ingredients following a recipe. All participants used their own kitchen to run the experiments.

We set up three different sample meals: (1) a TV dinner with two food compartments; (2) a different brand of TV dinner with four food compartments; and (3) a dish consisting of 100 grams of spaghetti aglio e olio. Each participant had 30 minutes to cook each meal using his or her chosen food preparation technique, process, and equipment. Who chose full-fledged cooking was given access to a repository of raw ingredients; and who chose microwaving was given a supply of common TV dinners of different brands. The performance of each participant was measured for each sample meal by three variables: (1) the number of visual differences (food layout, appearance, color) between the cooked meal and the sample; (2) the number of differences in flavor between the cooked meal and the sample; and (3) the amount of hot edible mass produced within 30 minutes. Each participant also completed a questionnaire where they self-evaluated their performance.

The experimental results are unequivocal. Microwave users outperformed traditional cooks on most measures ( $\text{[math]}$ p < 0.05[/latex], often even [latex]p < 0.01[/latex]), with the only exception of the spaghetti dish. Even the most expert cooks were unable to reproduce, from raw ingredients, TV dinners that look and taste like the sample TV dinners, in contrast to the novice microwave users who managed to effortlessly heat to near perfection large amounts of prepackaged food.

These results suggest that full-fledged cooking should be adopted only to prepare complex dishes mainly consisting of oil and garlic. Experienced cooks may argue that the overall quality and healthiness of properly cooked food provides for a meal experience that is considerably more enjoyable than ingurgitating a tasteless, fat-laden, appalling microwave dinner. Although this argument may be true, the differences with the recently introduced versions of TV dinners may be a tad less spectacularly obvious than they were in the past. Thus, the results show that no reasons exist to use traditional means of cooking, except possibly for dishes that contain complex combinations of olive oil and pasta.

An unexpected result of the study was that, according to the questionnaire’s answers, full-fledged cooks are highly satisfied with their experience. Despite incurring reduced usability and productivity, they assessed their work as less tiresome, less frustrating, and more enjoyable than microwave users. From a psychological perspective, the most reasonable explanation is that the cooks are delusional lunatics who are unwilling to reconsider their manifestly incorrect beliefs about their cooking ability in light of their actual poor results in faithfully reproducing low-grade industrial food.

The study’s results also have implications in terms of costs of food preparation and consumption by the public. Individuals have a responsibility to act economically and efficiently, especially in cases in which their occupation is publicly funded. No reliable data is available about how many publicly-employed workers cook their own meals, and correspondingly it is unclear the amount of taxpayer’s money that is spent worldwide by individuals who stubbornly insist on cooking food from raw ingredients over sticking to a more efficient and modern meal preparation system, which would free up their time to advance their respective field of occupation.

I therefore suggest that leading public institutions should consider accepting time-squandering food preparation practices by their employees only if this is justified by the prevalence of dishes involving spaghetti and garlic. In all other cases, said institutions should request employees to eat microwave food (or order take out). We believe that this would be a good policy for two reasons. First, the flavor and appearance of food is secondary to its nutritional values. And, second, preventing people from frittering away scarce culinary resources would save time and money to maximize the benefit of work and development for both individual institutions and the public.

P.S.: Some readers suggested two additional aspects in which microwave cooking is superior. First, adjusting the heating power is much easier with a microwave (where pushing a button immediately interrupts the flow of radiation) than with a traditional stove (where the stove’s surface may remain hot for minutes even after power is turned off). And, second, crispy food can be properly cooked using the recently introduced hot air circulation system available in several high-end microwave ovens.

Bug counting

A little specification can go a long way

Monthly Archives: February 2015

Seriously, why LaTeX?

Why markup languages are better

Why TeX is better

Then: LaTeX?

An efficiency comparison of food preparation systems

The study