It was the best of times, it was the worst of times, it was the age of fastness, it was the age of sluggishness, it was the epoch of abstraction, it was the epoch of detail, it was the season of generality, it was the season of specificity, it was the spring of right, it was the winter of wrong, we had improving performance before us, we had degrading performance before us, we were all going direct to where Moore predicted, we were all going direct the other way — in short, the period was so far like the present period, that some of its most influential experts insisted on its being received, for good or for evil, in the superlative degree of comparison only.

Sometimes I have the impression we’re experiencing a collective delusion regarding how we’re dealing with the end of Moore’s law — the prediction that the density of transistors in integrated circuits grows exponentially over time.

The growth predicted by Moore’s law has gone on for decades, providing a veritable, Lucullan free lunch to the entire information technology industry. But now this bonanza is running out: the density growth is slowing down, and improvements in miniaturization do not directly translate into faster clock rates.

That exponential growth is not sustainable in the long term should come as no surprise to any trained engineer. In Moore’s words: “The nature of exponentials is that you push them out and eventually disaster happens.” Or, in Linus Torvalds’s unmistakably caustic style (talking about a different exponential growth): “Unending exponential growth? What drugs are those people on? I mean, really.”

Rather than coming to terms with the harsh reality — the party is over — we prefer to sugar the pill and tell a less dramatic story. Sure, the clock rate of microprocessors has not increased significantly in a while, but the progress of electronics technology is still spectacular, and is giving us chips with more and more computational units crammed onto the same circuit. So, the updated story goes, the new challenge is exploiting the increasingly parallel hardware by writing increasingly concurrent software. Of course, concurrent software is notoriously hard to get right, but software technology is up to the challenge. More of Moore through concurrency.

But can concurrent programming really let us keep on riding the exponential speedup wave? There’s no question that improving the state of concurrent programming is a fascinating research challenge, both intellectually and practically. But I’m afraid the end of Moore’s law is more disruptive than we are willing to admit. And its long-term consequences cannot be countered simply by better programming.

The elephant in the room is that we know of only a handful of algorithms and programs that can be massively parallelized. In most cases, there is a significant fraction of the computation that is intrinsically sequential, which drastically limits, per Amdahl’s law, the speedup that can be obtained by adding computing cores. You can have the best concurrency model in the world, a flawless implementation, and as many cores as you like, but still be nowhere near anything like exponential speedup (in fact, not even linear speedup!).
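As a reminder of how unforgiving the arithmetic is, here is the usual statement of Amdahl's law, where p denotes the fraction of the computation that can be parallelized and n the number of cores (the notation is mine, chosen just for illustration):

    S(n) = \frac{1}{(1 - p) + p/n} \qquad \text{so that} \qquad \lim_{n \to \infty} S(n) = \frac{1}{1 - p}

Even for a program that is 95% parallelizable (p = 0.95), the overall speedup can never exceed 20, no matter how many cores you add; and most programs are nowhere near 95% parallelizable.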

Here’s what a certain Don Knuth said about it in a 2008 interview that should be quoted more often.

To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks! […]

Let me put it this way: During the past 50 years, I’ve written well over a thousand programs, many of which have substantial size. I can’t think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading. […]

[Hardware vendors] think a magic bullet will come along to make multicores speed up my kind of work; I think it’s a pipe dream. […]

From the opposite point of view, I do grant that web browsing probably will get better with multicores. I’ve been talking about my technical work, however, not recreation.

I’m inclined to give him the benefit of the doubt and assume he’s right, as I have the impression the guy knows a thing or two about algorithms and programming.

I look forward, by all means, to the challenges of combining abstraction and efficiency in concurrency research. But perhaps we should not be selling concurrent programming as a potential “fix” to the end of Moore’s law. While every exponential growth eventually stops (I mean, really), we can have the more down-to-earth goal of making the most out of the new exotic hardware architectures that are being developed. May we live in interesting times.

I hope nobody took the previous post too seriously. It was meant as a satirical joke, pointing out several shortcomings of the notorious LaTeX vs. Word comparison study. I still want to say a few things on the topic, though; and this time I’m completely serious.

I realize that LaTeX can be daunting for scientists who have little programming (or, more generally, computer science) background, to the point that they may well opt for user-friendlier alternatives. I cannot deny that knowing one’s way around LaTeX requires some time; and, until they reach a good level, LaTeX users may be slowed down and ineffective. However, I think it is important to at least understand why LaTeX remains predominant in scientific publishing; that is, what benefits come after a sufficient investment in learning to use it. If you have the same background I have, this post will probably sound completely obvious. Otherwise, read on.

The key features of LaTeX as a document preparation system are:

  1. LaTeX is a markup language; that is, it consists of annotations combined with the actual content to be displayed. The annotations define the structure and appearance of the content, but both are expressed as plain text.
  2. LaTeX is a set of commands that translate down to TeX and Metafont.

Both features contribute to the user unfriendliness of LaTeX compared to so-called WYSIWYG document preparation systems such as Word; but they are also what makes LaTeX superior in practically every other aspect. To put it in concrete terms: I think it’s entirely possible that LaTeX will eventually be replaced, for scientific publishing, by a different document preparation system (Markdown extensions anyone?). But, if this ever happens, the replacement system will certainly be another markup language; and it will very likely translate down to TeX and Metafont.

Why markup languages are better

Markup languages, where all information is encoded in plain text, are naturally amenable to editing using a whopping variety of tools that also operate on text files. Your favorite text editor — including Word itself, if you insist — is also a LaTeX editor, with spell checking and all other functionalities readily available. Text manipulation tools based on regular expressions — grep, sed, awk, perl, you name it — are directly applicable to LaTeX texts. Data processing environments — such as R for statistical analysis — can easily export data to LaTeX since all they have to produce is suitably encoded text output.
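For instance, here is a minimal Python sketch of the kind of text processing that becomes trivial once your documents are plain text (the file name paper.tex is just a placeholder for illustration):

    import re

    # Extract all section titles from a LaTeX source file.
    # A rough sketch: it ignores starred variants (\section*) and
    # sectioning commands whose argument spans multiple lines.
    with open("paper.tex") as tex:
        source = tex.read()

    for title in re.findall(r"\\section\{([^}]*)\}", source):
        print(title)

Doing the same on a proprietary binary format requires specialized tools, if it is possible at all.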

Version control tools, such as Subversion and Git, use lines of text as fundamental units. Hence, they work out of the box on LaTeX files to provide powerful support to collaborative editing and versioning. This goes well beyond the primitive capabilities of Word’s “track changes” function: authors can edit in parallel the same source file, inspect each other’s changes, and automatically merge individual variants into a master copy. Easy-to-read visual presentations of changes are still possible through a variety of graphical interfaces to version control software, or by means of other tools that work on text (try latexdiff for a presentation that looks like Word’s track changes).

Markup languages tend to enforce a clear separation between content and presentation. I wrote “tend to” because some markup languages do a better job than others at separating structural markup from presentation markup. LaTeX does not excel at this, since it does not always clearly distinguish between, say, an annotation that marks the beginning of a section (a structural detail) and one that marks a word to be displayed in a different font (a presentation detail). Nevertheless, writing documents in LaTeX lets you focus on content much better than WYSIWYG systems like Word, which are centered around commands that change presentation (font type and size, line spacing, page breaks, and so on) and offer limited or hard-to-use support for adding complex structural information (section and subsection titles, cross-references, and so on). In fact, I have the impression that only advanced Word users take advantage of functionality such as defining styles and indexes. The situation in LaTeX is reversed: inexperienced users learn only very basic presentation-changing commands (such as those for bold and italic text), but know well how to mark the beginning of sections, and how to introduce cross-references, bibliographic entries, and other structural information. Changing presentation details such as line spacing, font type, or the spacing of rows in tables is instead the domain of the LaTeX expert, who has hopefully also learned not to change the defaults defined in a stylesheet unless there is a compelling reason. In summary, markup languages help you focus on content, defer fixing presentation details until the content is stable, and change presentation uniformly and robustly based on the intended structure of the text.

Compared to intuitive WYSIWYG tools, using markup languages may slow you down: you may have to look up the language’s syntax to do things you don’t know or don’t remember from previous usage; and achieving the desired presentation requires planning the organization of a document before you start writing, or at least revising it consistently at some point. But this kind of slowing down is a good thing: planning and paying attention to structure beforehand is generally necessary to achieve a good document design.

Why TeX is better

To understand the unique strengths of TeX (and its companion Metafont) we should have an idea of its history, whose protagonist is the legendary computer scientist Don Knuth. Working on a new edition of his magnum opus “The art of computer programming”, he became strongly dissatisfied with the typographic quality achieved by state-of-the-art technologies, in particular when it came to typesetting mathematics. To solve the problem for good, he started designing the TeX system in 1977, expecting to finish the project during his sabbatical in 1978; he was off only by a little over 10 years. Knuth tells the story in his own words in this interesting interview (the linked segment is about how much time and fretting it took him to devise a suitable, scalable algorithmic definition of the shape of the letter “S”: I’m pretty sure he would disagree with the LaTeX vs. Word study regarding “the appearance of the text [being] secondary to the scientific merit of an article”).

As a result, TeX is a top-notch, thorough solution to the problem of providing a typesetting system that offers very high-quality output on every machine where it runs. Almost 30 years after its first stable release, and with all its idiosyncrasies, TeX’s core remains technically unsurpassed and a de facto standard.

Several features of LaTeX derive from using TeX as back-end. An important one is the possibility of defining powerful macros — a feature LaTeX directly inherits from TeX (in fact, LaTeX is essentially an extensive collection of TeX macros). The possibility of having complete control over the back-end through macros gives LaTeX users great flexibility, albeit at the expense of having to deal with an occasionally abstruse syntax.

Then: LaTeX?

If you care about typographic quality; if you think it is conducive to — not in contrast with — clarity of presentation and quality of ideas; if you’re willing to invest time now to become way more productive later; if you would like to take advantage of the outcome of decades of top-of-the-line tool development; and if you think microwaved TV dinners are a poor substitute for properly cooked food. Then, you may want to give LaTeX a serious try; or to start designing the next markup language that will replace it while remaining compatible with the TeX ecosystem.

If you’re familiar with computer science, you’re also familiar with the notion of abstraction. Abstraction is ubiquitous in computing. Whereas it belongs to all of science in one way or another, it has a pivotal role in computing where everything abstracts something more concrete and concrete computations are made of abstractions.

If you’re familiar with music, you’re also familiar with the notion of abstraction. Abstraction is ubiquitous in music. Whereas it belongs to all of the arts in one way or another, it has a pivotal role in music where everything abstracts something more concrete and concrete music is made of abstractions.

The similarity between computing and music with respect to abstraction runs deeper. Abstraction is introduced to solve similar problems that originate in a tension between flexibility and stability, generality and concreteness, effectiveness and understandability. In this post, I illustrate the analogy with one concrete example in music and one concrete example in computing. To make sure that the discussion does not become too — ahem — abstract, I recommend you first read the part you’re more familiar with, so that the analogies in describing its counterpart will be easier to follow.

Abstraction in music: the sonata form

A music composer has to reconcile his aspiration to creativity with the listener’s expectation of a structure familiar to her. Ideally, the composer would be unencumbered, free to express his ideas through whatever musical means he deems appropriate; but such an approach has a high risk of alienating the listener, who may not be able to follow the ideas expressed in music without any structure for orientation. This is why music (like other arts) has developed forms, which are conventions that help achieve trade-offs between constraining the composer and convincing the listener. The sonata form is one of the most popular of such musical forms, as its malleability and longevity prove.

The sonata form is a prescription to organize the structure of a musical piece in a way that is regular but also leaves plenty of room for the outcomes of artistic creativity. A piece in this form consists of three main sequential parts: exposition, development, and recapitulation. Let’s illustrate them on the first movement of Beethoven’s Piano Sonata #9 (Op. 14 No. 1); you can listen to it played by Claudio Arrau in this YouTube video (I’ll link to specific segments as I introduce the sonata’s parts) and follow the music on one of these scores in the public domain.

The exposition (bars 1–60) introduces a number of themes (or subjects), which are melodies expressing the ground content of the whole piece. By opening with a presentation of unadulterated themes, the composer offers a convenient entry point to the listener: by becoming familiar with the primary ingredients of the sonata, she becomes capable of following the composer’s creation into unexpected territory without losing a general sense of direction. Beethoven’s sonata introduces three themes: the first theme (bars 1–21) begins with a first melodic line in the tonic (E major), which is then varied to construct a modulation into the dominant (B major); the second theme (bars 22–38) takes over in this key with a cantabile character, contrasted by the ensuing third theme (bars 39–56) which is more rhythmic in nature. A codetta (bars 57–60) takes up the last bars of the exposition and connects the third theme to a repetition of the exposition and then, after the repetition, to the second main sonata part: the development.

The development (bars 61–90) is a sonata’s central part, where the composer has substantial freedom to explore his ideas without major structural constraints. By varying and recombining the thematic material previously presented, he can lay out a compelling and imaginative musical argument while building atop the foundations laid in the exposition. The development in Beethoven’s sonata is fairly short, but it manages to explore some interesting fluid tonal contrasts between major and minor keys, and to expand the scant material of the codetta in a more dignified context (from bar 83).

The development is followed by the third sonata part: the recapitulation (bars 91–147). As the name suggests, the recapitulation is a concluding summary where the by-now familiar themes make their last appearance. The composer still has room for expressing his creative ideas in the recapitulation, as themes often change attributes such as presentation order, key, rhythm, and other melodic or harmonic details. Beethoven, for example, ventures into a minor key while transitioning from the first to the second theme in the recapitulation, and introduces other ingenious surprises that reward the attentive listener. The movement closes with a coda (from bar 148), which expands the material of the exposition’s codetta while restoring the awaited tonic key for an irenic conclusion.

The imperfectly symmetric exposition and recapitulation constitute an interface between the deep concrete musical ideas stirring in the composer’s mind and the ears and brains of the listener, who can thus appropriate, in a generalized form, some of those ideas and enjoy them.

Abstraction in computing: the modular routine

A computer programmer has to reconcile her creative aspiration to produce an efficient implementation with the user’s expectation of an interface that is easy to understand and exports predictable behavior. Ideally, the programmer would be free to resort to any technical means she deems appropriate to build an implementation that is fast, uses little memory, or is idiomatic in the chosen programming language; but too much freedom has a high risk of producing a program that is hard to use because its exact requirements and guarantees are arduous to fathom. This is why computing (like other scientific and engineering disciplines) has developed techniques for modularization, which are approaches that discipline how program code is structured so that the effects of modular components can be understood in isolation as well as in combination. The modular routine is one of the most popular elements of modular design, as its ubiquity across programming languages proves.

The modular routine combines a piece of code (the body or implementation) with a well-defined interface. The interface consists of a signature, a precondition, and a postcondition. Let’s illustrate these parts on the algorithm for binary search in sorted arrays; here’s an implementation taken from Tim Bray’s blog, which I adapted to pseudo-Eiffel and annotated with a suitable, complete functional specification.

The program text begins (line 1) with the signature, which declares the data the routine operates on. The input data consists of the integer array a, which includes a.count elements from a[0] to a[a.count - 1], and of the integer scalar target, whose value is searched among a’s elements. The output data is also an integer scalar, referred to as Result in the rest of search_binary.

Routine search_binary works correctly under certain assumptions about properties of the input data passed to it by callers. These assumptions are encoded as the precondition (lines 2–4), which includes two clauses. First, a must refer to a valid array object (line 3). Second, a’s content must be sorted in nondecreasing order (line 4). The second precondition clause is specific to binary search, whose algorithm yields meaningless results if applied to sequences that are not sorted.

The body (lines 5–27) is a routine’s central part, where the programmer has substantial freedom to work out a suitable implementation without major restrictions beyond those given by the programming language’s features. The body of search_binary begins with a declaration of three integer local variables (line 6), which will be used to keep track of intermediate states in the computation. A loop occupies the major part of the body and does the heavy lifting of searching for the target. As we understand from the loop invariant (lines 10–13), the interval that goes from low to high (excluded) marks, at any point during the computation, the range of values in a that have not been searched for yet. Conversely, the elements at indexes from 0 to low are not greater than target; and the elements at indexes from high to a.count - 1 are greater than it. The body of search_binary exits the loop only when the whole array a has been searched. At that point, the final conditional (lines 23–27) can conclude whether a value equal to target exists at all in a: if low is still -1 or points to an element not equal to — and hence less than because of the invariant — target, the routine returns the invalid index -1 to denote “element not found”; otherwise, a[low] is a valid array element equal to target, and the routine returns its index. The choice of decoupling searching from determining whether target is in the array is an example of the programmer’s freedom to design an implementation that achieves specific, advantageous trade-offs. Bray points out why this approach is preferable to performing two tests in the loop (one for less than, and one for equal to) in the hope of exiting the loop earlier if target is found: in a search where the search interval shrinks exponentially, “nearly all” values are found in the last iteration; hence minimizing the operations per iteration is the most efficient solution. Other programmers, given different constraints, may however follow other approaches; for example, Joshua Bloch’s implementation of binary search in Java’s OpenJDK uses a loop with three-way comparison.

Understanding all these details of the implementation would be burdensome to users who just want to call search_binary to avail of its functionality. This is why the body is followed by the last routine part: the postcondition (lines 28–31), which summarizes the routine’s output in relation to the value of the input arguments. The postcondition of search_binary includes three clauses, defining the output both when the value searched for has been found and when it has not. If Result denotes a valid index of a (line 29), then it points to an element with value target; otherwise Result is -1 (line 30), and hence a has no such element. The last postcondition clause (line 31) specifies that these are the only possible values Result may take; that is, the postcondition is complete. The functional postcondition nicely abstracts not only the details of the specific implementation of binary search, but even the fact that a binary search algorithm has been executed. Any search routine, be it binary, sequential, or following any other technique, could export the same postcondition in its interface to communicate to the user its input/output behavior.
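The pseudo-Eiffel listing whose line numbers the walkthrough above refers to is not reproduced here; as a rough approximation of its structure, here is a sketch in Python, with the precondition and postcondition rendered as assertions (the assertion style and other details are mine, not the original annotated implementation):

    def search_binary(a, target):
        # Precondition: a is a valid sequence, sorted in nondecreasing order.
        assert a is not None
        assert all(a[i] <= a[i + 1] for i in range(len(a) - 1))

        low, high = -1, len(a)
        # Loop invariant: every element a[0..low] is <= target, every element
        # a[high..] is > target; the positions strictly between low and high
        # have not been examined yet.
        while high - low > 1:
            mid = (low + high) // 2
            if a[mid] > target:
                high = mid
            else:
                low = mid

        # Searching is decoupled from the final membership test, as discussed above.
        Result = low if (low != -1 and a[low] == target) else -1

        # Postcondition: either Result is a valid index of an occurrence of target,
        # or Result is -1 and target does not occur in a.
        assert (0 <= Result < len(a) and a[Result] == target) or \
               (Result == -1 and target not in a)
        return Result

For example, calling search_binary([1, 3, 5, 7], 5) returns 2, while searching for a value that does not occur returns -1.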

The mirroring precondition and postcondition constitute a rigorous interface between the ingenious ephemeral details of the efficient implementation engineered by the programmer and the general terse information that is available to the user, who can thus reuse, in different contexts, the programmer’s code and take advantage of its reliable behavior.

Coda

Although the analogy between sonatas and routines is imperfect, it unveils deep conceptual connections. As the key to managing complexity, abstraction underpins any effort to further knowledge; and it helps push music and programs alike toward new horizons.

Last week I attended the second Workshop on Software Correctness and Reliability organized by Peter Müller and Martin Vechev here at ETH. As in the first edition one year ago, the workshop featured an impressive collection of speakers who talked about diverse topics in verification. Videos of the talks will be publicly available, so there’s no need to write detailed summaries. Instead, I give my telegraphic impressions of each talk in the form of one-liners. If I used Twitter these could’ve been live tweets; except that I, like Machete, don’t tweet 🙂 (not at the moment anyway). As the name suggests, the impressions do not necessarily talk about the main intended message of each talk; but at least they’re short: if anything tickles your fancy go and watch the videos!

James Larus
Improving the software development culture is not the job of academics; their job is solving technical problems.
Martin Rinard
Just because it passes all tests, it doesn’t mean the program is correct.
David Basin
Security is a form of enforceable safety.
Rajeev Alur
Regular functions or: the power of automata with strings.
Armando Solar-Lezama
Synthesis to improve the code of programmers who use loops where they could use SQL.
Anders Møller
Static analysis to check the code of programmers who use jQuery where they could use loops.
Martin Vechev
Machine learning to guess the meaning of minified (or otherwise obscure) programs.
Peter Müller
Teenage mobile app developers benefit from abstract interpretation.
Vijay Saraswat
Failing asynchronously is OK, as long as someone cleans up.
Ranjit Jhala
Piggybacking type systems to prove properties of (post-)functional languages.
(Rustan Leino featured in absentia as the one who knows what to do with universal quantifiers — the “SMT whisperer”.)
Ahmed Bouajjani
Checking linearizability may be hard, but checking trace inclusion need not be.

Since I skipped the last session, I cannot offer my impressions of Mayur Naik’s and Manfred Broy’s talks. If you were there or have watched the videos, feel free to share yours in the comments.

This summer, Sebastian Nanz and I have finally figured out what the best programming language is. The answer is…

Of course you immediately understood that the incipit is a joke. When it comes to complex feature-laden artifacts like general-purpose programming languages there is no such thing as the best tool for the job. In the reality of software development, different programmers with different skills, different attitudes, and different mindsets solve different problems following different methods, different practices, and different processes in different environments, using different tools and different programming languages. As a result, each programming language design strives to find trade-offs that are convenient to someone in the motley crowd of programmers.

Still, the trade-offs of different languages should be demonstrable by impacting measurable features of programs written in those languages. In this recent work [Nanz and Furia, 2014], we have tried to contribute empirical evidence to better our understanding of how programming languages can be used in practice. One of the aspects we were interested in investigating was whether one can find empirical evidence to justify some of the folk knowledge about programming languages, which is very often passed on as a series of ipse dixit that should be self-evident — except that sometimes different authorities have dissenting opinions!

Before summarizing the results of our study, here’s something about the methodology. An important decision was to use Rosetta Code as the source of raw data in the form of programs. Rather than hosting full projects — a service provided by other sites such as GitHub and Bitbucket — Rosetta Code focuses on well-defined programming tasks that can be implemented by small programs (the average program in our study is around 30 lines of code); this makes implementations of the same task in different programming languages directly comparable. The revision history and submission practices of Rosetta Code also suggest that programs are often revised by multiple programmers, and hence are likely of good quality on average; and the task list includes many relevant problems that are often part of large real-world projects. This setup helped make sound inter-language comparisons based on proper language usage, thus reducing dispersion and bias in the data. Based on a combination of their popularity in TIOBE and Rosetta Code, we selected 8 languages in four categories: C and Go as procedural languages; C# and Java as object-oriented languages; F# and Haskell as functional languages; and Python and Ruby as scripting languages. If your favorite language is not there, do not despair: let us know in the comments why you think it deserves to be included; we might consider it for future work.

Let’s have a look at the main results (see the paper for all the details). The biggest surprise is that there are no huge surprises: well-trained programmers and software engineers will recognize several well-known adages about the advantages of certain programming language features over others. To make this apparent, I’ve collected excerpts from classic texts on programming languages that somewhat match our empirical findings.

Conciseness

It is generally understood that practical expressiveness boils down to conciseness:

The main benefit of the use of expressive languages seems to be the ability to abstract from programming patterns with simple statements and to state the purpose of a program in the concisest possible manner.

We have come to believe that the major negative consequence of a lack of expressiveness is the abundance of programming patterns to make up for the missing, non-expressible constructs.

[Felleisen, 1991]

Higher-order features such as list comprehensions, reflection, higher-order functions, and idiomatic support for lists and maps should increase the level of practical expressiveness, and hence conciseness:

Higher-order procedures can serve as powerful abstraction mechanisms, vastly increasing the expressive power of our language. [Pg. 75]

[…] expressive power […] is attained […] by accumulation and filtering [on lists]. [Pg. 81]

Elevate the conceptual level at which we can design our programs [means enhancing] the expressive power of our language. [Pg. 108]

[Abelson and Sussman, 1996]

Such higher-order features are more readily available in functional and scripting languages than imperative languages:

Against Java, we can say that (compared to, say, Python) some parts of it appear over-complex and others deficient.

[Pg. 340 in Raymond, 2003]

We measured conciseness in terms of lines of code, comparing solutions in each language against those in other languages. Our numbers draw a picture that is largely consistent with the above quotations: functional and scripting languages provide significantly more concise code than procedural and object-oriented languages. Their higher-order features increase practical expressiveness, to wit, conciseness. While in principle one can follow a functional style using object-oriented languages, it is idiomatic support that seems to make a tangible difference in practice.
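To give a concrete flavor of what idiomatic higher-order support buys in practice, compare two ways of summing the squares of the even numbers in a list (a toy example of my own, not one of the Rosetta Code tasks):

    numbers = [3, 1, 4, 1, 5, 9, 2, 6]

    # Explicit-loop style, typical of procedural code.
    total = 0
    for x in numbers:
        if x % 2 == 0:
            total += x * x

    # Comprehension (higher-order) style, idiomatic in functional and
    # scripting languages: one line, and closer to the problem statement.
    total = sum(x * x for x in numbers if x % 2 == 0)

Differences of this kind, multiplied over an entire program, are the sort of thing that line counts capture.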

Performance

Performance is another often controversial programming language feature. We tried to contribute to the understanding of performance in practice by distinguishing between two kinds of tests. Much of the controversy when discussing performance may derive from conflating these two kinds of problems, which represent very different conditions.

The first kind of performance comparison targets raw execution speed on large inputs; for example, sorting million-element arrays or compressing tens of megabytes of data. The outcome of our experiments using Rosetta Code tasks on such problems is what most people would expect: C is indisputably the fastest — if it were a race, it’d lap all the other languages. A bit more generally, language features cost raw speed, and more features tend to cost more speed. In fact, the only runner-up (still from a distance) is Go, a language that is richer than C — it offers automatic memory management and strong typing — but deliberately renounces other expressive features, such as inheritance and genericity, that have become commonplace in modern high-level programming languages.

Programs that require maximum speed […] are good candidates for C. [Pg. 326]

Python cannot compete with C or C++ on raw execution speed. [Pg. 337]

[Raymond, 2003]

[The] main problem [of automatic memory management], however, is that “useful” processing time is lost when the garbage collector is invoked.

[Pg. 168 in Ghezzi and Jazayeri, 1997]

Most of the time, however, the extreme differences in raw speed that emerge with algorithmically-intensive programs on large inputs do not matter much because such jobs are absent or extremely infrequent in the overwhelming majority of applications, which hardly ever have to deal with number crunching. How many million-element arrays did your web browser have to sort while you were browsing the news? To understand the performance differences that more commonly occur in everyday conditions, we identified a second kind of comparison target, consisting of well-defined problems on inputs of moderate size, such as checksum algorithms and string manipulation tasks. The results are quite different for this second kind of everyday problem. Scripting and functional languages tend to emerge as the fastest, even surpassing C. More generally, the absolute differences between languages are smallish, which means that every language is usable, and engineering concerns other than raw speed become more relevant.

Most applications do not actually need better performance than Python offers.

[Pg. 337 in Raymond, 2003]

To sum up, the most significant, and somewhat neglected, finding that surfaced from our performance comparisons is this distinction between “raw speed” and “everyday” performance requirements, and the corresponding emphasis on the latter for most workaday programming tasks.

Failure proneness

Counting faults (or, more appropriately for this blog, bugs) is often used to measure the quality or success of software projects. The errors that are of principal interest in that context are those resulting from program behavior that diverges from specification; for example, a banking application that increases your balance when you withdraw money (although this is a bug most of us could live with 🙂 ). In contrast, our comparison of programming languages looked for errors that are independent of a program’s specification and have to do with what we might characterize as well-formedness, such as typing and compatibility errors. This is an admittedly restricted notion of error, but it lends itself to effective detection and analysis. The classic view on the problem of detecting such errors is clear:

[Checking for the presence of well-formedness errors] can be accomplished in different ways that can be classified in two broad categories: static and dynamic. […] In general, if a check can be performed statically, it is preferable to do so instead of delaying the check to run time […].

[Pg. 137 in Ghezzi and Jazayeri, 1997]

To see which languages follow this prescription and tend to perform more checks statically, we counted, for each language, what fraction of the programs that compile correctly also terminate without error (exit status zero). The compiled strongly-typed languages (that is, all compiled languages but C, which is weakly typed) clearly emerged as those with the fewest runtime failures triggered in our experiments; their compilers do a fairly good job of catching errors at compile time by type checking and other static analyses. In contrast, the interpreted languages triggered runtime failures more frequently; nearly all checks but syntactic ones are done at runtime, when things can go wrong in many different ways.
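In essence, the measurement boils down to something like the following Python sketch (purely illustrative: this is not the harness actually used in the study, and the function name is invented):

    import subprocess

    def failure_free_fraction(executables, timeout=30):
        # Fraction of the given programs that terminate with exit status zero.
        ok = 0
        for exe in executables:
            try:
                if subprocess.run([exe], capture_output=True,
                                  timeout=timeout).returncode == 0:
                    ok += 1
            except subprocess.TimeoutExpired:
                pass  # a program that does not terminate counts as a failure
        return ok / len(executables) if executables else 0.0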

Go was the least failure prone of the compiled strongly-typed languages. Given that we analyzed a large number of programs written by different contributors, we are reluctant to explain this difference mainly by attributing better programming skills to the authors of the Go programs in our sample. Instead, this result refines the picture of what compiled strongly-typed languages can achieve: Go’s type system is more restricted than that of functional or object-oriented languages, which may help achieve type safety by minimizing dark corners where errors may originate.

Our results on failure proneness cannot settle the debate about static vs. dynamic checks. This is another issue where a multitude of technical and organizational concerns concur to create several different local optima.

Be fruitful and replicate

No single study, no matter how carefully designed and executed, can claim to provide conclusive evidence about complex issues such as the relative merits and practical impact of programming language features. Scientific inductive knowledge grows slowly, one little piece of evidence at a time. Technological factors further complicate the picture, since they may shift the importance of certain features and render some obsolete (for example, the exponential growth of processor speed in past decades has favored the development of feature-rich graphical user interfaces that were previously impractical). Yet, it is encouraging that we have found significant concordance between our empirical results and others’ points of view. A lot of what you learned from the classics about programming languages was right — as long as you picked the right classics!

References

  1. Sebastian Nanz and Carlo A. Furia: A comparative study of programming languages in Rosetta Code. Technical report arXiv.org:1409.0252, September 2014.
  2. Matthias Felleisen: On the expressive power of programming languages. Science of Computer Programming, 17(1-3):35-75, 1991.
  3. Harold Abelson, Gerald Jay Sussman, and Julie Sussman: Structure and interpretation of computer programs. 2nd edition, MIT Press, 1996.
  4. Eric S. Raymond: The art of UNIX programming. Addison-Wesley, 2003. Available online.
  5. Carlo Ghezzi and Mehdi Jazayeri: Programming language concepts. 3rd edition, Wiley & Sons, 1997.