It was the best of times, it was the worst of times, it was the age of fastness, it was the age of sluggishness, it was the epoch of abstraction, it was the epoch of detail, it was the season of generality, it was the season of specificity, it was the spring of right, it was the winter of wrong, we had improving performance before us, we had degrading performance before us, we were all going direct to where Moore predicted, we were all going direct the other way — in short, the period was so far like the present period, that some of its most influential experts insisted on its being received, for good or for evil, in the superlative degree of comparison only.

Sometimes I have the impression we’re experiencing a collective delusion regarding how we’re dealing with the end of Moore’s law — the prediction that the density of transistors in integrated circuits grows exponentially over time.

The growth predicted by Moore’s law has gone on for decades, providing a veritable, Lucullan free lunch to the entire information technology industry. But now this bonanza is running out: the density growth is slowing down, and improvements in miniaturization do not directly translate into faster clock rates.

That exponential growth is not sustainable in the long term should come as no surprise to any trained engineer. In Moore’s words: “The nature of exponentials is that you push them out and eventually disaster happens.” Or, in Linus Torvalds’s unmistakably caustic style (talking about a different exponential growth): “Unending exponential growth? What drugs are those people on? I mean, really.”

Rather than coming to terms with the harsh reality — the party is over — we prefer to sugar the pill and tell a less dramatic story. Sure, the clock rate of microprocessors has not increased significantly in a while, but the progress of electronics technology is still spectacular, and is giving us chips with more and more computational units crammed onto the same circuit. So, the updated story goes, the new challenge is exploiting the increasingly parallel hardware by writing increasingly concurrent software. Of course, concurrent software is notoriously hard to get right, but software technology is up to the challenge. More of Moore through concurrency.

But can concurrent programming really let us keep on riding the exponential speedup wave? There’s no question that improving the state of concurrent programming is a fascinating research challenge, both intellectually and practically. But I’m afraid the end of Moore’s law is more disruptive than we are willing to admit. And its long-term consequences cannot be countered simply by better programming.

The elephant in the room is that we know of only a handful of algorithms and programs that can be massively parallelized. In most cases, there is a significant fraction of the computation that is intrinsically sequential, which drastically limits, per Amdahl’s law, the speedup that can be obtained by adding computing cores. You can have the best concurrency model in the world, a flawless implementation, and as many cores as you like, but still be nowhere near anything like exponential speedup (in fact, not even linear speedup!).
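As a reminder of how unforgiving the arithmetic is, here is the usual statement of Amdahl's law, where p denotes the fraction of the computation that can be parallelized and n the number of cores (the notation is mine, chosen just for illustration):

    S(n) = \frac{1}{(1 - p) + p/n} \qquad \text{so that} \qquad \lim_{n \to \infty} S(n) = \frac{1}{1 - p}

Even for a program that is 95% parallelizable (p = 0.95), the overall speedup can never exceed 20, no matter how many cores you add; and most programs are nowhere near 95% parallelizable.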

Here’s what a certain Don Knuth said about it in a 2008 interview that should be quoted more often.

To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks! […]

Let me put it this way: During the past 50 years, I’ve written well over a thousand programs, many of which have substantial size. I can’t think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading. […]

[Hardware vendors] think a magic bullet will come along to make multicores speed up my kind of work; I think it’s a pipe dream. […]

From the opposite point of view, I do grant that web browsing probably will get better with multicores. I’ve been talking about my technical work, however, not recreation.

I’m inclined to give him the benefit of the doubt and assume he’s right, as I have the impression the guy knows a thing or two about algorithms and programming.

I look forward, by all means, to the challenges of combining abstraction and efficiency in concurrency research. But perhaps we should not be selling concurrent programming as a potential “fix” to the end of Moore’s law. While every exponential growth eventually stops (I mean, really), we can have the more down-to-earth goal of making the most out of the new exotic hardware architectures that are being developed. May we live in interesting times.

I hope nobody took the previous post too seriously. It was meant as a satirical joke, pointing out several shortcomings of the notorious LaTeX vs. Word comparison study. I still want to say a few things on the topic, though; and this time I’m completely serious.

I realize that LaTeX can be daunting for scientists who have little programming (or, more generally, computer science) background, to the point that they may well opt for user-friendlier alternatives. I cannot deny that knowing one’s way around LaTeX requires some time; and, until they reach a good level, LaTeX users may be slowed down and ineffective. However, I think it is important to at least understand why LaTeX remains predominant in scientific publishing; that is, what benefits come after a sufficient investment in learning to use it. If you have the same background I have, this post will probably sound completely obvious. Otherwise, read on.

The key features of LaTeX as a document preparation system are:

  1. LaTeX is a markup language; that is, it consists of annotations combined with the actual content to be displayed. The annotations define the structure and appearance of the content, but both are expressed as plain text.
  2. LaTeX is a set of commands that translate down to TeX and Metafont.

Both features contribute to the user unfriendliness of LaTeX compared to so-called WYSIWYG document preparation systems such as Word; but they are also what makes LaTeX superior in practically every other aspect. To put it in concrete terms: I think it’s entirely possible that LaTeX will eventually be replaced, for scientific publishing, by a different document preparation system (Markdown extensions anyone?). But, if this ever happens, the replacement system will certainly be another markup language; and it will very likely translate down to TeX and Metafont.

Why markup languages are better

Markup languages, where all information is encoded in plain text, are naturally amenable to editing using a whopping variety of tools that also operate on text files. Your favorite text editor — including Word itself, if you insist — is also a LaTeX editor, with spell checking and all other functionalities readily available. Text manipulation tools based on regular expressions — grep, sed, awk, perl, you name it — are directly applicable to LaTeX texts. Data processing environments — such as R for statistical analysis — can easily export data to LaTeX since all they have to produce is suitably encoded text output.
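For instance, here is a minimal Python sketch of the kind of text processing that becomes trivial once your documents are plain text (the file name paper.tex is just a placeholder for illustration):

    import re

    # Extract all section titles from a LaTeX source file.
    # A rough sketch: it ignores starred variants (\section*) and
    # sectioning commands whose argument spans multiple lines.
    with open("paper.tex") as tex:
        source = tex.read()

    for title in re.findall(r"\\section\{([^}]*)\}", source):
        print(title)

Doing the same on a proprietary binary format requires specialized tools, if it is possible at all.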

Version control tools, such as Subversion and Git, use lines of text as fundamental units. Hence, they work out of the box on LaTeX files to provide powerful support to collaborative editing and versioning. This goes well beyond the primitive capabilities of Word’s “track changes” function: authors can edit in parallel the same source file, inspect each other’s changes, and automatically merge individual variants into a master copy. Easy-to-read visual presentations of changes are still possible through a variety of graphical interfaces to version control software, or by means of other tools that work on text (try latexdiff for a presentation that looks like Word’s track changes).

Markup languages tend to enforce a clear separation between content and presentation. I wrote “tend to” because some markup languages do a better job than others at separating structural markup from presentation markup. LaTeX does not excel at this, since it does not always clearly distinguish between, say, an annotation that marks the beginning of a section (a structural detail) and one that marks a word to be displayed in a different font (a presentation detail). Nevertheless, writing documents in LaTeX lets you focus on content much better than WYSIWYG systems like Word, which are centered around commands that change presentation (font type and size, line spacing, page breaks, and so on) and offer limited or hard-to-use support for adding complex structural information (section and subsection titles, cross-references, and so on). In fact, I have the impression that only advanced Word users take advantage of functionality such as defining styles and indexes. The situation in LaTeX is reversed: inexperienced users learn only very basic presentation-changing commands (such as those for bold and italic text), but know well how to mark the beginning of sections, and how to introduce cross-references, bibliographic entries, and other structural information. Changing presentation details such as line spacing, font type, or the spacing of rows in tables is instead the domain of the LaTeX expert, who has hopefully also learned not to change the defaults defined in a stylesheet unless there is a compelling reason. In summary, markup languages help you focus on content, defer fixing presentation details until the content is stable, and change presentation uniformly and robustly based on the intended structure of the text.

Compared to intuitive WYSIWYG tools, using markup languages may slow you down: you may have to look up the language’s syntax to do things you don’t know or don’t remember from previous usage; and achieving the desired presentation requires planning the organization of a document before you start writing, or at least revising it consistently at some point. But this kind of slowing down is a good thing: planning and paying attention to structure beforehand is generally necessary to achieve a good document design.

Why TeX is better

To understand the unique strengths of TeX (and its companion Metafont) we should have an idea of its history, whose protagonist is the legendary computer scientist Don Knuth. Working on a new edition of his magnum opus “The art of computer programming”, he became strongly dissatisfied with the typographic quality achieved by state-of-the-art technologies, in particular when it came to typesetting mathematics. To solve the problem for good, he started designing the TeX system in 1977, expecting to finish the project during his sabbatical in 1978; he was off only by a little over 10 years. Knuth tells the story in his own words in this interesting interview (the linked segment is about how much time and fretting it took him to devise a suitable, scalable algorithmic definition of the shape of the letter “S”: I’m pretty sure he would disagree with the LaTeX vs. Word study regarding “the appearance of the text [being] secondary to the scientific merit of an article”).

As a result, TeX is a top-notch, thorough solution to the problem of providing a typesetting system that offers very high-quality output on every machine where it runs. Almost 30 years after its first stable release, and with all its idiosyncrasies, TeX’s core remains technically unsurpassed and a de facto standard.

Several features of LaTeX derive from using TeX as back-end. An important one is the possibility of defining powerful macros — a feature LaTeX directly inherits from TeX (in fact, LaTeX is essentially an extensive collection of TeX macros). The possibility of having complete control over the back-end through macros gives LaTeX users great flexibility, albeit at the expense of having to deal with an occasionally abstruse syntax.

Then: LaTeX?

If you care about typographic quality; if you think it is conducive to — not in contrast with — clarity of presentation and quality of ideas; if you’re willing to invest time now to become way more productive later; if you would like to take advantage of the outcome of decades of top-of-the-line tool development; and if you think microwaved TV dinners are a poor substitute for properly cooked food. Then, you may want to give LaTeX a serious try; or to start designing the next markup language that will replace it while remaining compatible with the TeX ecosystem.

If you’re familiar with computer science, you’re also familiar with the notion of abstraction. Abstraction is ubiquitous in computing. Whereas it belongs to all of science in one way or another, it has a pivotal role in computing where everything abstracts something more concrete and concrete computations are made of abstractions.

If you’re familiar with music, you’re also familiar with the notion of abstraction. Abstraction is ubiquitous in music. Whereas it belongs to all of the arts in one way or another, it has a pivotal role in music where everything abstracts something more concrete and concrete music is made of abstractions.

The similarity between computing and music with respect to abstraction runs deeper. Abstraction is introduced to solve similar problems that originate in a tension between flexibility and stability, generality and concreteness, effectiveness and understandability. In this post, I illustrate the analogy with one concrete example in music and one concrete example in computing. To make sure that the discussion does not become too — ahem — abstract, I recommend you first read the part you’re more familiar with, so that the analogies in describing its counterpart will be easier to follow.

Abstraction in music: the sonata form

A music composer has to reconcile his aspiration to creativity with the listener’s expectation of a structure familiar to her. Ideally, the composer would be unencumbered, free to express his ideas through whatever musical means he deems appropriate; but such an approach has a high risk of alienating the listener, who may not be able to follow the ideas expressed in music without any structure for orientation. This is why music (like other arts) has developed forms, which are conventions that help achieve trade-offs between constraining the composer and convincing the listener. The sonata form is one of the most popular of such musical forms, as its malleability and longevity prove.

The sonata form is a prescription to organize the structure of a musical piece in a way that is regular but also leaves plenty of room for the outcomes of artistic creativity. A piece in this form consists of three main sequential parts: exposition, development, and recapitulation. Let’s illustrate them on the first movement of Beethoven’s Piano Sonata #9 (Op. 14 No. 1); you can listen to it played by Claudio Arrau in this YouTube video (I’ll link to specific segments as I introduce the sonata’s parts) and follow the music on one of these scores in the public domain.

The exposition (bars 1–60) introduces a number of themes (or subjects), which are melodies expressing the ground content of the whole piece. By opening with a presentation of unadulterated themes, the composer offers a convenient entry point to the listener: by becoming familiar with the primary ingredients of the sonata, she becomes capable of following the composer’s creation into unexpected territory without losing a general sense of direction. Beethoven’s sonata introduces three themes: the first theme (bars 1–21) begins with a first melodic line in the tonic (E major), which is then varied to construct a modulation into the dominant (B major); the second theme (bars 22–38) takes over in this key with a cantabile character, contrasted by the ensuing third theme (bars 39–56) which is more rhythmic in nature. A codetta (bars 57–60) takes up the last bars of the exposition and connects the third theme to a repetition of the exposition and then, after the repetition, to the second main sonata part: the development.

The development (bars 61–90) is a sonata’s central part, where the composer has substantial freedom to explore his ideas without major structural constraints. By varying and recombining the thematic material previously presented, he can lay out a compelling and imaginative musical argument while building atop the foundations laid in the exposition. The development in Beethoven’s sonata is fairly short, but it manages to explore some interesting fluid tonal contrasts between major and minor keys, and to expand the scant material of the codetta in a more dignified context (from bar 83).

The development is followed by the third sonata part: the recapitulation (bars 91–147). As the name suggests, the recapitulation is a concluding summary where the by-now familiar themes make their last appearance. The composer still has room for expressing his creative ideas in the recapitulation, as themes often change attributes such as presentation order, key, rhythm, and other melodic or harmonic details. Beethoven, for example, ventures into a minor key while transitioning from the first to the second theme in the recapitulation, and introduces other ingenious surprises that reward the attentive listener. The movement closes with a coda (from bar 148), which expands the material of the exposition’s codetta while restoring the awaited tonic key for an irenic conclusion.

The imperfectly symmetric exposition and recapitulation constitute an interface between the deep concrete musical ideas stirring in the composer’s mind and the ears and brains of the listener, who can thus appropriate, in a generalized form, some of those ideas and enjoy them.

Abstraction in computing: the modular routine

A computer programmer has to reconcile her creative aspiration to produce an efficient implementation with the user’s expectation of an interface that is easy to understand and exports predictable behavior. Ideally, the programmer would be free to resort to any technical means she deems appropriate to build an implementation that is fast, uses little memory, or is idiomatic in the chosen programming language; but too much freedom has a high risk of producing a program that is hard to use because its exact requirements and guarantees are arduous to fathom. This is why computing (like other scientific and engineering disciplines) has developed techniques for modularization, which are approaches that discipline how program code is structured so that the effects of modular components can be understood in isolation as well as in combination. The modular routine is one of the most popular elements of modular design, as its ubiquity across programming languages proves.

The modular routine combines a piece of code (the body or implementation) with a well-defined interface. The interface consists of a signature, a precondition, and a postcondition. Let’s illustrate these parts on the algorithm for binary search in sorted arrays; here’s an implementation taken from Tim Bray’s blog, which I adapted to pseudo-Eiffel and annotated with a suitable, complete functional specification.

The program text begins (line 1) with the signature, which declares the data the routine operates on. The input data consists of the integer array a, which includes a.count elements from a[0] to a[a.count - 1], and of the integer scalar target, whose value is searched among a’s elements. The output data is also an integer scalar, referred to as Result in the rest of search_binary.

Routine search_binary works correctly under certain assumptions about properties of the input data passed to it by callers. These assumptions are encoded as the precondition (lines 2–4), which includes two clauses. First, a must refer to a valid array object (line 3). Second, a’s content must be sorted in nondecreasing order (line 4). The second precondition clause is specific to binary search, whose algorithm yields meaningless results if applied to sequences that are not sorted.

The body (lines 5–27) is a routine’s central part, where the programmer has substantial freedom to work out a suitable implementation without major restrictions beyond those given by the programming language’s features. The body of search_binary begins with a declaration of three integer local variables (line 6), which will be used to keep track of intermediate states in the computation. A loop occupies the major part of the body and does the heavy lifting of searching for the target. As we understand from the loop invariant (lines 10–13), the interval that goes from low to high (excluded) marks, at any point during the computation, the range of values in a that have not been searched for yet. Conversely, the elements at indexes from 0 to low are not greater than target; and the elements at indexes from high to a.count - 1 are greater than it. The body of search_binary exits the loop only when the whole array a has been searched. At that point, the final conditional (lines 23–27) can conclude whether a value equal to target exists at all in a: if low is still -1 or points to an element not equal to — and hence less than because of the invariant — target, the routine returns the invalid index -1 to denote “element not found”; otherwise, a[low] is a valid array element equal to target, and the routine returns its index. The choice of decoupling searching from determining whether target is in the array is an example of the programmer’s freedom to design an implementation that achieves specific, advantageous trade-offs. Bray points out why this approach is preferable to performing two tests in the loop (one for less than, and one for equal to) in the hope of exiting the loop earlier if target is found: in a search where the search interval shrinks exponentially, “nearly all” values are found in the last iteration; hence minimizing the operations per iteration is the most efficient solution. Other programmers, given different constraints, may however follow other approaches; for example, Joshua Bloch’s implementation of binary search in Java’s OpenJDK uses a loop with three-way comparison.

Understanding all these details of the implementation would be burdensome to users who just want to call search_binary to avail of its functionality. This is why the body is followed by the last routine part: the postcondition (lines 28–31), which summarizes the routine’s output in relation to the value of the input arguments. The postcondition of search_binary includes three clauses, defining the output both when the value searched for has been found and when it has not. If Result denotes a valid index of a (line 29), then it points to an element with value target; otherwise Result is -1 (line 30), and hence a has no such element. The last postcondition clause (line 31) specifies that these are the only possible values Result may take; that is, the postcondition is complete. The functional postcondition nicely abstracts not only the details of the specific implementation of binary search, but even the fact that a binary search algorithm has been executed. Any search routine, be it binary, sequential, or following any other technique, could export the same postcondition in its interface to communicate to the user its input/output behavior.
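The pseudo-Eiffel listing whose line numbers the walkthrough above refers to is not reproduced here; as a rough approximation of its structure, here is a sketch in Python, with the precondition and postcondition rendered as assertions (the assertion style and other details are mine, not the original annotated implementation):

    def search_binary(a, target):
        # Precondition: a is a valid sequence, sorted in nondecreasing order.
        assert a is not None
        assert all(a[i] <= a[i + 1] for i in range(len(a) - 1))

        low, high = -1, len(a)
        # Loop invariant: every element a[0..low] is <= target, every element
        # a[high..] is > target; the positions strictly between low and high
        # have not been examined yet.
        while high - low > 1:
            mid = (low + high) // 2
            if a[mid] > target:
                high = mid
            else:
                low = mid

        # Searching is decoupled from the final membership test, as discussed above.
        Result = low if (low != -1 and a[low] == target) else -1

        # Postcondition: either Result is a valid index of an occurrence of target,
        # or Result is -1 and target does not occur in a.
        assert (0 <= Result < len(a) and a[Result] == target) or \
               (Result == -1 and target not in a)
        return Result

For example, calling search_binary([1, 3, 5, 7], 5) returns 2, while searching for a value that does not occur returns -1.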

The mirroring precondition and postcondition constitute a rigorous interface between the ingenious ephemeral details of the efficient implementation engineered by the programmer and the general terse information that is available to the user, who can thus reuse, in different contexts, the programmer’s code and take advantage of its reliable behavior.

Coda

Although the analogy between sonatas and routines is imperfect, it unveils deep conceptual connections. As the key to managing complexity, abstraction underpins any effort to further knowledge; and it helps push music and programs alike toward new horizons.

Last week I attended the second Workshop on Software Correctness and Reliability organized by Peter Müller and Martin Vechev here at ETH. As in the first edition one year ago, the workshop featured an impressive collection of speakers who talked about diverse topics in verification. Videos of the talks will be publicly available, so there’s no need to write detailed summaries. Instead, I give my telegraphic impressions of each talk in the form of one-liners. If I used Twitter these could’ve been live tweets; except that I, like Machete, don’t tweet 🙂 (not at the moment anyway). As the name suggests, the impressions do not necessarily talk about the main intended message of each talk; but at least they’re short: if anything tickles your fancy go and watch the videos!

James Larus
Improving the software development culture is not the job of academics; their job is solving technical problems.
Martin Rinard
Just because it passes all tests, it doesn’t mean the program is correct.
David Basin
Security is a form of enforceable safety.
Rajeev Alur
Regular functions or: the power of automata with strings.
Armando Solar-Lezama
Synthesis to improve the code of programmers who use loops where they could use SQL.
Anders Møller
Static analysis to check the code of programmers who use jQuery where they could use loops.
Martin Vechev
Machine learning to guess the meaning of minified (or otherwise obscure) programs.
Peter Müller
Teenage mobile app developers benefit from abstract interpretation.
Vijay Saraswat
Failing asynchronously is OK, as long as someone cleans up.
Ranjit Jhala
Piggybacking type systems to prove properties of (post-)functional languages.
(Rustan Leino featured in absentia as the one who knows what to do with universal quantifiers — the “SMT whisperer”.)
Ahmed Bouajjani
Checking linearizability may be hard, but checking trace inclusion need not be.

Since I skipped the last session, I cannot offer my impressions of Mayur Naik’s and Manfred Broy’s talks. If you were there or have watched the videos, feel free to share yours in the comments.

This summer, Sebastian Nanz and I have finally figured out what the best programming language is. The answer is…

Of course you immediately understood that the incipit is a joke. When it comes to complex feature-laden artifacts like general-purpose programming languages there is no such thing as the best tool for the job. In the reality of software development, different programmers with different skills, different attitudes, and different mindsets solve different problems following different methods, different practices, and different processes in different environments, using different tools and different programming languages. As a result, each programming language design strives to find trade-offs that are convenient to someone in the motley crowd of programmers.

Still, the trade-offs of different languages should be demonstrable by impacting measurable features of programs written in those languages. In this recent work [Nanz and Furia, 2014], we have tried to contribute empirical evidence to better our understanding of how programming languages can be used in practice. One of the aspects we were interested in investigating was whether one can find empirical evidence to justify some of the folk knowledge about programming languages, which is very often passed on as a series of ipse dixit that should be self-evident — except that sometimes different authorities have dissenting opinions!

Before summarizing the results of our study, here’s something about the methodology. An important decision was to use Rosetta Code as the source of raw data in the form of programs. Rather than hosting full projects — a service provided by other sites such as GitHub and Bitbucket — Rosetta Code focuses on well-defined programming tasks that can be implemented by small programs (the average program in our study is around 30 lines of code); this makes implementations of the same task in different programming languages directly comparable. The revision history and submission practices of Rosetta Code also suggest that programs are often revised by multiple programmers, and hence are likely of good quality on average; and the task list includes many relevant problems that are often part of large real-world projects. This setup helped make sound inter-language comparisons based on proper language usage, thus reducing dispersion and bias in the data. Based on a combination of their popularity in TIOBE and Rosetta Code, we selected 8 languages in four categories: C and Go as procedural languages; C# and Java as object-oriented languages; F# and Haskell as functional languages; and Python and Ruby as scripting languages. If your favorite language is not there, do not despair: let us know in the comments why you think it deserves to be included; we might consider it for future work.

Let’s have a look at the main results (see the paper for all the details). The biggest surprise is that there are no huge surprises: well-trained programmers and software engineers will recognize several well-known adages about the advantages of certain programming language features over others. To make this apparent, I’ve collected excerpts from classic texts on programming languages that somewhat match our empirical findings.

Conciseness

It is generally understood that practical expressiveness boils down to conciseness:

The main benefit of the use of expressive languages seems to be the ability to abstract from programming patterns with simple statements and to state the purpose of a program in the concisest possible manner.

We have come to believe that the major negative consequence of a lack of expressiveness is the abundance of programming patterns to make up for the missing, non-expressible constructs.

[Felleisen, 1991]

Higher-order features such as list comprehensions, reflection, higher-order functions, and idiomatic support for lists and maps should increase the level of practical expressiveness, and hence conciseness:

Higher-order procedures can serve as powerful abstraction mechanisms, vastly increasing the expressive power of our language. [Pg. 75]

[…] expressive power […] is attained […] by accumulation and filtering [on lists]. [Pg. 81]

Elevate the conceptual level at which we can design our programs [means enhancing] the expressive power of our language. [Pg. 108]

[Abelson and Sussman, 1996]

Such higher-order features are more readily available in functional and scripting languages than imperative languages:

Against Java, we can say that (compared to, say, Python) some parts of it appear over-complex and others deficient.

[Pg. 340 in Raymond, 2003]

We measured conciseness in terms of lines of code, comparing solutions in each language against those in other languages. Our numbers draw a picture that is largely consistent with the above quotations: functional and scripting languages provide significantly more concise code than procedural and object-oriented languages. Their higher-order features increase practical expressiveness, to wit, conciseness. While in principle one can follow a functional style using object-oriented languages, it is idiomatic support that seems to make a tangible difference in practice.
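To give a concrete flavor of what idiomatic higher-order support buys in practice, compare two ways of summing the squares of the even numbers in a list (a toy example of my own, not one of the Rosetta Code tasks):

    numbers = [3, 1, 4, 1, 5, 9, 2, 6]

    # Explicit-loop style, typical of procedural code.
    total = 0
    for x in numbers:
        if x % 2 == 0:
            total += x * x

    # Comprehension (higher-order) style, idiomatic in functional and
    # scripting languages: one line, and closer to the problem statement.
    total = sum(x * x for x in numbers if x % 2 == 0)

Differences of this kind, multiplied over an entire program, are the sort of thing that line counts capture.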

Performance

Performance is another often controversial programming language feature. We tried to contribute to the understanding of performance in practice by distinguishing between two kinds of tests. Much of the controversy when discussing performance may derive from conflating these two kinds of problems, which represent very different conditions.

The first kind of performance comparison targets raw execution speed on large inputs; for example, sorting million-element arrays or compressing tens of megabytes of data. The outcome of our experiments using Rosetta Code tasks on such problems is what most people would expect: C is indisputably the fastest — if it were a race, it’d lap all the other languages. A bit more generally, language features cost raw speed, and more features tend to cost more speed. In fact, the only runner-up (still from a distance) is Go, a language that is richer than C — it offers automatic memory management and strong typing — but deliberately renounces other expressive features, such as inheritance and genericity, that have become commonplace in modern high-level programming languages.

Programs that require maximum speed […] are good candidates for C. [Pg. 326]

Python cannot compete with C or C++ on raw execution speed. [Pg. 337]

[Raymond, 2003]

[The] main problem [of automatic memory management], however, is that “useful” processing time is lost when the garbage collector is invoked.

[Pg. 168 in Ghezzi and Jazayeri, 1997]

Most of the time, however, the extreme differences in raw speed that emerge with algorithmically-intensive programs on large inputs do not matter much because such jobs are absent or extremely infrequent in the overwhelming majority of applications, which hardly ever have to deal with number crunching. How many million-element arrays did your web browser have to sort while you were browsing the news? To understand the performance differences that more commonly occur in everyday conditions, we identified a second kind of comparison target, consisting of well-defined problems on inputs of moderate size, such as checksum algorithms and string manipulation tasks. The results are quite different for this second kind of everyday problem. Scripting and functional languages tend to emerge as the fastest, even surpassing C. More generally, the absolute differences between languages are smallish, which means that every language is usable, and engineering concerns other than raw speed become more relevant.

Most applications do not actually need better performance than Python offers.

[Pg. 337 in Raymond, 2003]

To sum up, the most significant, and somewhat neglected, finding that surfaced from our performance comparisons is this distinction between “raw speed” and “everyday” performance requirements, and the corresponding emphasis on the latter for most workaday programming tasks.

Failure proneness

Counting faults (or, more appropriately for this blog, bugs) is often used to measure the quality or success of software projects. The errors that are of principal interest in that context are those resulting from program behavior that diverges from specification; for example, a banking application that increases your balance when you withdraw money (although this is a bug most of us could live with 🙂 ). In contrast, our comparison of programming languages looked for errors that are independent of a program’s specification and have to do with what we might characterize as well-formedness, such as typing and compatibility errors. This is an admittedly restricted notion of error, but it lends itself to effective detection and analysis. The classic view on the problem of detecting such errors is clear:

[Checking for the presence of well-formedness errors] can be accomplished in different ways that can be classified in two broad categories: static and dynamic. […] In general, if a check can be performed statically, it is preferable to do so instead of delaying the check to run time […].

[Pg. 137 in Ghezzi and Jazayeri, 1997]

To see which languages follow this prescription and tend to perform more checks statically, we counted, for each language, what fraction of the programs that compile correctly also terminate without error (exit status zero). The compiled strongly-typed languages (that is, all compiled languages but C, which is weakly typed) clearly emerged as those with the fewest runtime failures triggered in our experiments; their compilers do a fairly good job of catching errors at compile time by type checking and other static analyses. In contrast, the interpreted languages triggered runtime failures more frequently; nearly all checks but syntactic ones are done at runtime, when things can go wrong in many different ways.
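In essence, the measurement boils down to something like the following Python sketch (purely illustrative: this is not the harness actually used in the study, and the function name is invented):

    import subprocess

    def failure_free_fraction(executables, timeout=30):
        # Fraction of the given programs that terminate with exit status zero.
        ok = 0
        for exe in executables:
            try:
                if subprocess.run([exe], capture_output=True,
                                  timeout=timeout).returncode == 0:
                    ok += 1
            except subprocess.TimeoutExpired:
                pass  # a program that does not terminate counts as a failure
        return ok / len(executables) if executables else 0.0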

Go was the least failure prone of the compiled strongly-typed languages. Given that we analyzed a large number of programs written by different contributors, we are reluctant to explain this difference mainly by attributing better programming skills to the authors of the Go programs in our sample. Instead, this result refines the picture of what compiled strongly-typed languages can achieve: Go’s type system is more restricted than that of functional or object-oriented languages, which may help achieve type safety by minimizing dark corners where errors may originate.

Our results on failure proneness cannot settle the debate about static vs. dynamic checks. This is another issue where a multitude of technical and organizational concerns concur to create several different local optima.

Be fruitful and replicate

No single study, no matter how carefully designed and executed, can claim to provide conclusive evidence about complex issues such as the relative merits and practical impact of programming language features. Scientific inductive knowledge grows slowly, one little piece of evidence at a time. Technological factors further complicate the picture, since they may shift the importance of certain features and render some obsolete (for example, the exponential growth of processor speed in past decades has favored the development of feature-rich graphical user interfaces that were previously impractical). Yet, it is encouraging that we have found significant concordance between our empirical results and others’ points of view. A lot of what you learned from the classics about programming languages was right — as long as you picked the right classics!

References

  1. Sebastian Nanz and Carlo A. Furia: A comparative study of programming languages in Rosetta Code. Technical report arXiv.org:1409.0252, September 2014.
  2. Matthias Felleisen: On the expressive power of programming languages. Science of Computer Programming, 17(1-3):35-75, 1991.
  3. Harold Abelson, Gerald Jay Sussman, and Julie Sussman: Structure and interpretation of computer programs. 2nd edition, MIT Press, 1996.
  4. Eric S. Raymond: The art of UNIX programming. Addison-Wesley, 2003. Available online.
  5. Carlo Ghezzi and Mehdi Jazayeri: Programming language concepts. 3rd edition, Wiley & Sons, 1997.