It was the best of times, it was the worst of times, it was the age of fastness, it was the age of sluggishness, it was the epoch of abstraction, it was the epoch of detail, it was the season of generality, it was the season of specificity, it was the spring of right, it was the winter of wrong, we had improving performance before us, we had degrading performance before us, we were all going direct to where Moore predicted, we were all going direct the other way — in short, the period was so far like the present period, that some of its most influential experts insisted on its being received, for good or for evil, in the superlative degree of comparison only.

Sometimes I have the impression we’re experiencing a collective delusion regarding how we’re dealing with the end of Moore’s law — the prediction that the density of transistors in integrated circuits grows exponentially over time.

The growth predicted by Moore’s law has gone on for decades, providing a veritable, Lucullan free lunch to the entire information technology industry. But now this bonanza is running out: the density growth is slowing down, and improvements in miniaturization do not directly translate into faster clock rates.

That exponential growth is not sustainable in the long term should come as no surprise to any trained engineer. In Moore's words: “The nature of exponentials is that you push them out and eventually disaster happens.” Or, using Linus Torvalds's unmistakably caustic style (talking about a different exponential growth): “Unending exponential growth? What drugs are those people on? I mean, really.”

Rather than coming to terms with the harsh reality — the party is over — we prefer to sugar the pill and tell a less dramatic story. Sure, the clock rate of microprocessors has not increased significantly in a while, but the progress of electronics technology is still spectacular, and is giving us chips with more and more computational units crammed onto the same circuit. So, the updated story goes, the new challenge is exploiting the increasingly parallel hardware by writing increasingly concurrent software. Of course, concurrent software is notoriously hard to get right, but software technology is up to the challenge. More of Moore through concurrency.

But can concurrent programming really let us keep on riding the exponential speedup wave? There’s no question that improving the state of concurrent programming is a fascinating research challenge, both intellectually and practically. But I’m afraid the end of Moore’s law is more disruptive than we are willing to admit. And its long-term consequences cannot be countered simply by better programming.

The elephant in the room is that we know of only a handful of algorithms and programs that can be massively parallelized. In most cases, there is a significant fraction of the computation that is intrinsically sequential, which drastically limits, per Amdahl’s law, the speedup that can be obtained by adding computing cores. You can have the best concurrency model in the world, a flawless implementation, and as many cores as you like, but still be nowhere near anything like exponential speedup (in fact, not even linear speedup!).
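To put the bound in concrete terms (the formula is standard; the example numbers are mine, for illustration only): if a fraction s of a computation is inherently sequential, Amdahl's law limits the speedup achievable with N cores to

\text{speedup}(N) = \frac{1}{s + (1 - s)/N} < \frac{1}{s},

so even a modest s = 0.05, that is 5% of sequential work, caps the speedup at 20 no matter how many cores we add.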

Here's what a certain Don Knuth said about it in a 2008 interview that should be quoted more often.

To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks! […]

Let me put it this way: During the past 50 years, I’ve written well over a thousand programs, many of which have substantial size. I can’t think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading. […]

[Hardware vendors] think a magic bullet will come along to make multicores speed up my kind of work; I think it’s a pipe dream. […]

From the opposite point of view, I do grant that web browsing probably will get better with multicores. I’ve been talking about my technical work, however, not recreation.

I’m inclined to give him the benefit of the doubt and assume he’s right, as I have the impression the guy knows a thing or two about algorithms and programming.

I look forward, by all means, to the challenges of combining abstraction and efficiency in concurrency research. But perhaps we should not be selling concurrent programming as a potential “fix” to the end of Moore’s law. While every exponential growth eventually stops (I mean, really), we can have the more down-to-earth goal of making the most out of the new exotic hardware architectures that are being developed. May we live in interesting times.

Dante Alighieri, the doyen of Italian writers, was born somewhere in Florence, sometime between May and June, 750 years ago. To celebrate this anniversary as a long-standing fan of his work, I will illustrate some aspects of his writing that make his work so timeless and fascinating. I will cast them as “writing tips”, since you can hardly find a more convincing writer than Dante. Sure, he was more an artistic than a technical writer, but the basic principles of good writing are largely the same in either domain.

All of the references are from the Divine Comedy; I cite Longfellow's English translation next to the original Italian text (Petrocchi's standard version). Both full texts are available with commentary here.

Start with content

Good writing starts with good content, and great writing starts with great content. High-quality content is characterized by originality, variety, and clear connections with existing knowledge.

Dante is thoroughly familiar with the major cultural figures of the past, as well as with his illustrious contemporaries. The overarching plot of the Commedia is built on a comprehensive vision of the universe and unrolls a smorgasbord of styles, moods, topics, and characters neatly organized in a tripartite structure (Inferno, Purgatorio, and Paradiso), opening with an introductory canto followed by 33 cantos per part. As if he were writing a detailed Related work section, he pays homage to many a great figure of the past in the Inferno's Canto IV, including poets (Homer, Ovid, Virgil) and their characters (Hector, Aeneas), philosophers (Plato, Aristotle, Democritus), and scientists of their time (Ptolemy, Avicenna). Dante's Related work dutifully compares against related approaches and gives credit. He gratefully acknowledges Virgil for his writing style:

   

and considers his own work sixth in “impact”, following Homer, Horace, Ovid, Lucan, and Virgil:

   

Abstract to generalize

When the subject matter allows it, good writing includes abstract parts that generalize its subjects. Abstraction is the key to producing content that is of general interest and whose message goes beyond the specific, contingent context in which it was originally produced.

Dante's idea of the cosmos is irremediably outdated — not to mention flat-out wrong — based as it is on Ptolemy's astronomical models, Aristotle's view of the physical world, and a variety of arcane, inconsistent theological arguments. Nonetheless, his writings are everlasting because his characters and images are underlain by general concepts that represent timeless mental categories. Showing a knowledge of geometry that is unusual for a poet, he describes the heavens as a series of concentric spheres of decreasing radius. The smallest sphere — symbolizing God — reduces to a point, which, however, is said to encompass all the larger spheres as well as the Earth. We can explain the paradox by considering the concentric spheres as three-dimensional projections of four-dimensional objects of increasing radius, whose projections appear as spheres of decreasing size. The complexity and abstraction of the image make it suggestive even if unrealistic, and an unintended forerunner of speculative ideas in modern physics about higher-dimensional universes (whose poetic descriptions can draw inspiration from Dante!).

(For technical details, see Odifreddi's explanation in Italian. If you cannot read Italian, Unix to the rescue: download the Python library goslate.py and run it on the text to get an English translation on the fly, along the lines of the snippet below.)
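A possible incantation (a sketch only: it assumes goslate's basic Goslate().translate API and a hypothetical local file odifreddi.txt containing the Italian text):

    pip install goslate
    # feed the Italian text to Google Translate via goslate and print the English result
    python -c "import goslate, sys; print(goslate.Goslate().translate(sys.stdin.read(), 'en'))" < odifreddi.txt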

Make realistic — not necessarily real — examples

Abstraction is balanced by using realistic images as examples. Realistic means based on tangible, generally understandable situations, but not necessarily a faithful reflection of actual reality. Just like abstraction, exemplification is a means of expression rather than an end in itself.

Dante’s main subject matter is drastically removed from everyday experience, being concerned with a hypothetical afterworld populated by ghosts of long departed figures. To give such an abstract context a tangible flavor, he deploys a motley variety of metaphors and similes based on concrete physical images that everybody has experienced. Describing his leaving the “selva oscura” (itself a metaphor for moral perdition), Dante compares himself and his feelings to those of a shipwreck survivor who has just escaped with great difficulty from the treacherous waters onto the shore:

   

Adapt style to content

Every writer has their own style, but the best writers are fluent in many different styles, which they will select according to what is most effective to convey the content at hand and to fit the context.

Dante called his masterpiece Commedia also to indicate its sweeping variety of styles and forms. Styles range from the gentle “dolce stilnovo”, suitable for talking about love:

   

to the severe chastising of Italian political practices:

   

to the desperate grieving tone of human tragedy:

   

Write memorable words

Good writing is unforgettable for its eloquence. It always picks the mots justes. It is penetrating, often strong; it does not hedge.

Dante peppered his work with indelible passages, which make for great quotations thanks to their succinctness and persuasive eloquence. Nine lines are all it takes his Ulysses to sweep up his crew and persuade them to embark on the ultimate ill-fated exploration expedition:

   

There are countless other aspects that make Dante’s work still so fascinating after over seven centuries. While some of them may appeal only to certain readers, or may reflect outdated cultural views, his style still goes a long way in terms of influence, interest, and inspiration.

An empirical study comparing LaTeX to Word for preparing scientific documents appeared in PLOS ONE in mid-December, just in time to ignite an intense online discussion that took place over the holiday season. Among the many comments that I read, one tweet by Michele Lanza made me realize that the study was more general than I originally thought:

The Word vs. LaTeX efficiency argument is in line with microwave food vs. actual cooking

What a terrific suggestion for related research! Back from the winter break, I decided to adapt the design of the LaTeX vs. Word study in order to pit microwave food against proper cooking. The results are, I believe, quite conclusive, and my recommendations based on them compelling. While I submitted a detailed write-up for publication, I want to give you a summary of the essential findings in the rest of this post. In my empirical research, I stuck to the original study's design as closely as possible, since it seemed as apropos for comparing cooking methods as it was for comparing document preparation systems. In the write-up that follows you will recognize passages that closely mirror the structure and even the words of the original LaTeX vs. Word paper, since those words speak for themselves — and imitation is the sincerest form of flattery.

The study

This empirical study compares microwave usage to full-fledged cooking for food preparation. The experimental methodology was straightforward. We gathered 40 volunteers who regularly cook their own dinners — some using a microwave to defrost preprocessed food, some cooking from raw ingredients following a recipe. All participants used their own kitchen to run the experiments.

We set up three different sample meals: (1) a TV dinner with two food compartments; (2) a different brand of TV dinner with four food compartments; and (3) a dish consisting of 100 grams of spaghetti aglio e olio. Each participant had 30 minutes to cook each meal using his or her chosen food preparation technique, process, and equipment. Those who chose full-fledged cooking were given access to a repository of raw ingredients; those who chose microwaving were given a supply of common TV dinners of different brands. The performance of each participant was measured for each sample meal by three variables: (1) the number of visual differences (food layout, appearance, color) between the cooked meal and the sample; (2) the number of differences in flavor between the cooked meal and the sample; and (3) the amount of hot edible mass produced within 30 minutes. Each participant also completed a questionnaire where they self-evaluated their performance.

The experimental results are unequivocal. Microwave users outperformed traditional cooks on most measures (p < 0.05, often even p < 0.01), with the only exception of the spaghetti dish. Even the most expert cooks were unable to reproduce, from raw ingredients, TV dinners that look and taste like the sample TV dinners, in contrast to the novice microwave users who managed to effortlessly heat to near perfection large amounts of prepackaged food.

These results suggest that full-fledged cooking should be adopted only to prepare complex dishes mainly consisting of oil and garlic. Experienced cooks may argue that the overall quality and healthiness of properly cooked food provides for a meal experience that is considerably more enjoyable than ingurgitating a tasteless, fat-laden, appalling microwave dinner. Although this argument may be true, the differences with the recently introduced versions of TV dinners may be a tad less spectacularly obvious than they were in the past. Thus, the results show that no reasons exist to use traditional means of cooking, except possibly for dishes that contain complex combinations of olive oil and pasta.

An unexpected result of the study was that, according to the questionnaire's answers, full-fledged cooks are highly satisfied with their experience. Despite incurring reduced usability and productivity, they assessed their work as less tiresome, less frustrating, and more enjoyable than microwave users. From a psychological perspective, the most reasonable explanation is that the cooks are delusional lunatics who are unwilling to reconsider their manifestly incorrect beliefs about their cooking ability in light of their actual poor results in faithfully reproducing low-grade industrial food.

The study's results also have implications in terms of the costs of food preparation and consumption by the public. Individuals have a responsibility to act economically and efficiently, especially in cases in which their occupation is publicly funded. No reliable data is available about how many publicly employed workers cook their own meals, and correspondingly it is unclear how much taxpayer money is spent worldwide by individuals who stubbornly insist on cooking food from raw ingredients rather than sticking to a more efficient and modern meal preparation system, which would free up their time to advance their respective fields of occupation.

I therefore suggest that leading public institutions should consider accepting time-squandering food preparation practices by their employees only if this is justified by the prevalence of dishes involving spaghetti and garlic. In all other cases, said institutions should request that employees eat microwave food (or order takeout). We believe that this would be a good policy for two reasons. First, the flavor and appearance of food is secondary to its nutritional value. And, second, preventing people from frittering away scarce culinary resources would save time and money, maximizing the benefit of work and development for both individual institutions and the public.

P.S.: Some readers suggested two additional aspects in which microwave cooking is superior. First, adjusting the heating power is much easier with a microwave (where pushing a button immediately interrupts the flow of radiation) than with a traditional stove (where the stove's surface may remain hot for minutes even after power is turned off). And, second, crispy food can be properly cooked using the recently introduced hot air circulation system available in several high-end microwave ovens.

Tired of the standard textbook descriptions of the Quicksort algorithm? Here’s a rhyming presentation suitable for all ages 🙂

As a reading guide, here’s a matching pseudo-code partial implementation that follows the procedure described in the rhyme. Enjoy your holidays!

It’s really a combination of all three. More questions? 🙂

Seriously, the issue of the real “nature” of informatics (I don’t call it “computer science” only to avoid adding to the terminological confusion) — applied mathematics, science of information, or engineering discipline — remains a topic of intense debate in more than one circle. A related discussion is the relationship between informatics theoreticians and practitioners, and whether they can get along (yes they can and do, as Scott Aaronson patiently explains).

I must admit that discussions about these topics leave me mostly indifferent, if not mildly baffled. What we call “informatics” or “computer science” has grown from roots in different branches of science and engineering, so that its true “nature” is not in any single one of them but in their amalgam. This should be evident by looking at the history of modern informatics and its eminent pioneers. Its eclectic origins are a tremendous source of vitality and part of the reason why computer science and its applications have become ubiquitous, providing conceptual and practical analytical tools to pretty much every other science. And this is also a great deal of what makes computer science so interesting!

But perhaps I can make my position more convincing with an example. In the rest of this post, I select a computational problem and follow the train of thought that leads to a possible solution and implementation. My point is to demonstrate how solving the problem satisfactorily requires a combination of theoretical and practical tools whose intimate connection cannot be severed without ruining their efficacy.

Take the problem of enumerating prime numbers: a number is prime if it is only divisible by itself and by one; a number that is not prime is called composite. Prompted by this blog’s motto (“A little specification can go a long way”), let’s approach the problem descriptively: we provide a simple characterization of prime numbers, and then use standard tools to match the characterization against any integer that we want to test for primality. The characterization is going to use regular expressions.

Primes is not regular

Wait a minute! Is this even possible? This was my reaction too the first time I read (I can't remember where or when, but there are several articles exploring this idea) about using regular expressions to characterize primality. The notion of prime numbers involves divisibility and integer arithmetic, which are beyond the capabilities of regular expressions or their operational counterparts (finite-state automata), whose expressiveness defines the so-called regular languages.

In order to analyze the problem rigorously, we have to cast it into the framework of formal languages. Using a simple unary alphabet of letters x, the language P of prime numbers can be defined as the set of all strings of x’s whose length is a prime number:

P = \{ x^p \mid p \text{ is prime} \}.

Let's show that the language P of prime numbers is not regular. We could use the well-known pumping lemma, but a direct proof by contradiction is even more convincing. Assume that P is regular; then there exists a deterministic finite-state automaton A_P that accepts precisely the strings in P. Automaton A_P consists of a finite number N of states; take any prime p > N. (Side note: this is possible because there are infinitely many primes — if you're not familiar with Euclid's neat proof of this fact, go check it out.) Consider the accepting path of x^p when it is processed by A_P: each transition consumes exactly one x; since there are more x's in the input than states in A_P, the path must include a loop that goes from some state S back to it. Call \ell the length of such a loop, that is, the number of x's consumed while traversing it. We can see that adding a number of x's that is a multiple of \ell leads to an input that A_P still accepts. This is because the \ell extra x's traverse the loop one more time, and the behavior of the automaton from S onward does not change if the trailing input is the same, because A_P is deterministic. Going back to the definition of P, we have that A_P accepts all strings of length p + k \cdot \ell for any nonnegative integer k. In particular, taking k = p, it accepts the string of p + p \cdot \ell x's. But this is incorrect behavior: p + p \cdot \ell = p \cdot (1 + \ell) is not prime because it is divisible by p. This contradicts the hypothesis that A_P recognizes language P — the language of primes is not regular.

While we're at it, let's also prove that the language P of primes is not even context-free. We can leverage Parikh's theorem for that. A convenient way to state the theorem is: every context-free language is commutatively equivalent to some regular language. This roughly means that, if P is context-free, there exists a regular language P_r whose strings have the same number of letters as those in P, irrespective of order. But order is immaterial for a language over a unary alphabet such as P — length is the only distinguishing feature — which entails that P is context-free only if it is also regular. Since we have seen that P is not regular, it's not even context-free.

Primes as a regex

The trick to circumvent the non-regularity of the language P of primes is switching from regular expressions — à la Kleene, used in formal language theory — to regexes — à la Thompson, used in classic Unix text-processing utilities and part of the POSIX standard. Perl's Larry Wall championed this useful terminological distinction (regular expressions vs. regexes), which we'll use consistently in this presentation. Crucially, the regex feature that greatly increases the expressive power of regular expressions is backreferences, which can express repetitions of previously captured substrings. In fact, it's not hard to see that all other features of POSIX regexes are just syntactic sugar, expressible using Kleene's regular expressions.

Back to the problem of expressing the language P of primes using regexes and backreferences. It's actually easier to look at the complement problem: defining the language \overline{P} of composite numbers expressed as unary strings. A composite number c has at least one divisor (other than one and itself); that is, it can be written as a product c_1 \cdot c_2 for 1 < c_1, c_2 < c or, equivalently, as a sum of c_2 equal terms \underbrace{c_1 + \cdots + c_1}_{c_2 \text{ terms}}. Visualizing c in unary, we can write it as partitioned into c_2 groups, each consisting of c_1 letters x. This is a pattern we can express using a regex: a leading substring of two or more (that is, c_1) x's (xx+, with + denoting one or more repetitions) followed by one or more (that is, c_2 - 1) of its replicas (\1+, with \1 denoting a backreference to the first matched substring). In all, the regex (xx+)\1+ matches the composite numbers greater than one in unary notation. If we want to handle all natural numbers, we can use alternation (the | operator) to specify the special cases of 0 and 1 as composite: (xx+)\1+ | x? (with ? denoting zero or one occurrences, and assuming | has lower precedence than the other operators). We'd better do that, as no mathematician since Lebesgue has considered 1 a prime, according to Wikipedia 🙂
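A quick sanity check of the composite pattern (GNU grep assumed, since its regexes support backreferences, as discussed below; -x anchors the match to the whole line and -q suppresses output):

    # (xx+) captures a candidate divisor; \1+ requires at least one more copy of it
    printf 'xxxx\n'  | grep -q -E -x '(xx+)\1+' && echo "4 is composite"
    printf 'xxxxx\n' | grep -q -E -x '(xx+)\1+' || echo "5 is prime"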

The power of regexes

What is then the expressive power of regexes? Our primes example indicates that backreferences can express languages in classes beyond context-free. Precisely, we've shown that language P is not context-free and that regexes can express the complement language \overline{P}, but regexes are not explicitly closed under complement. However, it's clear that \overline{P} is also neither regular (because regular languages are closed under complement) nor context-free (because unary context-free languages boil down to regular languages). Alternatively, we can consider languages whose strings have the form w\,w for any string w, which are clearly expressible by means of regexes with backreferences ((.*)\1) but are neither regular nor context free (as one can show using, say, the pumping lemmas).

On the other hand, regexes seem incapable of expressing all context-free languages; in particular, they cannot express the languages of balanced characters such as parentheses — known as Dyck languages in formal language theory — which are paradigmatic examples of context-free languages. Consider D = \{ a^n b^n \mid n > 0 \}, which is well known to be context-free but not regular. Short of a rigorous proof, an argument that D is inexpressible by regexes could go as follows. As we match a regex against strings in D, each pair of captured pattern and backreference to it must occur either entirely within the a^n substring or entirely within the b^n substring. Otherwise, the backreference would deviate from the structure of strings in D, where all b's follow all a's. This constraint, however, means that backreferences cannot carry any information between the a^n and the b^n parts of the string. What is left is then limited to regular matching; hence regexes cannot express language D.

I have to remark again that this discussion applies to classic Unix/POSIX regexes, whose only non-regular operators are backreferences. Modern scripting languages often support “Perl Compatible Regular Expressions” (PCREs), a much larger superset whose expressiveness far exceeds that of Unix/POSIX regexes, as they include fully recursive patterns and even the possibility of invoking arbitrary code as subpatterns are matched. See this article for a nice overview of the expressiveness of PCREs as they are available in PHP; and this Stack Overflow discussion about how Java regexes can match patterns such as a^n b^n.

Enumerating primes in Bash

After this long detour, it's time to go back to the original problem of enumerating primes. Equipped with a descriptive characterization of composite numbers — the regex (xx+)\1+ | x? — let's turn to the Unix command-line tools that can interpret it. As an additional challenge, in the original spirit of solving the problem descriptively, we'll try to avoid explicit loops entirely in our program of sorts for enumerating primes.

First, since our regex works on unary strings, we need a way to convert numbers from decimal to unary. We can use printf's format strings to this end: the format specifier %*s reads a numeric argument followed by a string argument and prints the latter as a space-padded string whose length is the numeric argument. If the string argument is absent, it just prints as many spaces as the value of the numeric argument. Then, we use tr to turn spaces into x's and voilà: our unary representation. For example, to print 24 in unary:
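Something along these lines does the job (a sketch, using Bash's or GNU coreutils' printf together with tr):

    printf '%*s' 24 | tr ' ' 'x'   # print 24 spaces, then turn each space into an x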

There's an alternative that uses printf's format specifier %.0s without requiring tr: printf 'x%.0s' {1..24}. See printf's documentation for an explanation of how it works.

Second, we have to call the above snippet involving printf for all numbers that we want to test. Instead of an explicit loop, we can generate a sequence {0..N} of nonnegative integers up to N and pass its values on to printf using xargs's option -n1, which dispatches one argument value per invocation of printf. However, printf's format string now needs a trailing newspace character to separate unary numbers by line.
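Putting the pieces together, a loop-free generator of the unary representations of 0 through 24 might look like this (still a sketch):

    # one printf invocation per number, thanks to xargs -n1; tr converts spaces to x's
    echo {0..24} | xargs -n1 printf '%*s\n' | tr ' ' 'x'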

It's finally time to use our regex. There's a minor issue here regarding the distinction between POSIX basic and extended regexes. The former include backreferences but no alternation; the latter include alternation but no backreferences. This is hardly a problem in practice, as most GNU tools (the ones I'm assuming we're using) support backreferences in both syntaxes. As for the alternation, we don't really need to include it explicitly in the regex; instead, we use grep's option -e to specify multiple regexes and have the tool match any of them. We need to know about grep's options -E — to use the extended regex syntax (just because it's nicer, as it doesn't require backslash-escaping metacharacters) — and -v to invert the match (because we want to keep the primes and discard the composites).
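A first attempt along these lines (GNU grep assumed, since POSIX extended regexes lack backreferences):

    # drop, via the inverted match, the lines matching either regex (not anchored yet)
    echo {0..24} | xargs -n1 printf '%*s\n' | tr ' ' 'x' \
      | grep -E -v -e '(xx+)\1+' -e 'x?'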

Don't try the previous snippet yet: it doesn't work! The problem is that grep matches the regexes against any portion of the input string. Since x? trivially matches the empty substring, every line matches and the inverted output is empty. Let's introduce the ^ and $ markers to require that the regexes match whole lines (which, by the way, is the standard interpretation of regular expressions in formal language theory). You can try and see that the following snippet enumerates primes up to 24 in unary.
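Concretely, the anchored version of the sketch:

    # keep only the lines that match neither regex in full: the unary primes
    echo {0..24} | xargs -n1 printf '%*s\n' | tr ' ' 'x' \
      | grep -E -v -e '^(xx+)\1+$' -e '^x?$'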

The output in unary form is not very readable. We convert it back (without loops) using the expr utility, again with arguments dispatched one at a time using xargs. This is the only non-POSIX compliant part of the snippet. I couldn't find another way that doesn't use loops or variables; if you know one, please leave a comment. Anyway, we have our loopless prime enumerator!
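In all, a possible rendition of the complete pipeline (GNU expr assumed for the length operator):

    # generate unary numbers, drop composites (and 0, 1), convert the survivors back to decimal
    echo {0..24} | xargs -n1 printf '%*s\n' | tr ' ' 'x' \
      | grep -E -v -e '^(xx+)\1+$' -e '^x?$' \
      | xargs -n1 expr length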

A little empirical performance analysis

We have reasons to suspect that our prime enumerator is not very efficient. Enumerating primes up to 41 takes over 14 seconds on my desktop machine. This is not very good given that primality testing is in P and randomized algorithms such as Miller-Rabin are very fast in practice. It is also unsurprising: regex matching with backreferences uses backtracking to implement nondeterminism, which nullifies the otherwise linear-time performance of matching using finite-state automata. For the same reasons, memory usage should instead be roughly constant or grow very slowly: the size of the automaton used for the matching only depends on the regex to be matched and, linearly, on the length of the captured substring that is backreferenced.

Let's see if we can get some empirical data to corroborate this intuition. If the running time T(n) of our enumerator grows exponentially (with base 10, say) in the input size n, the ratio T(n) / 10^n should remain roughly constant as n increases. Input size is normally measured in number of bits (or digits) for numerical algorithms; hence let n = \log_{10}(k), where k is the number we test for primality, so that T(n) / 10^n is simply T(k) / k. Let's measure T(k) / k for each input k and see what it looks like. We allow ourselves to use loops this time, so that we can simplify the enumeration snippet using a for Bash loop.
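For instance, a loop-based sketch of the enumerator:

    for k in {30..41}
    do
        if printf '%*s\n' "$k" | tr ' ' 'x' \
             | grep -E -v -e '^(xx+)\1+$' -e '^x?$' > /dev/null
        then
            echo "$k"    # k survives the inverted match: it is prime
        fi
    done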

We enumerate from 30 because this is when grep's running time starts to become non-negligible. It's time to use another command-line utility. Not to be confused with the Bash keyword of the same name, the time utility is normally located at /usr/bin/time and offers options to customize its output using a format string. For each invocation of grep, we print the current value of k, and the CPU user time (%U) and maximum RAM resident set size (%M) taken by the process running grep. Since we now only care about these measures, we add the -q option to grep, which suppresses output. Finally, we redirect these statistics to awk, which will print on each line a triple k, T(k)/k, M(k) (T and M are time and memory). In all, run the command:
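Something like the following should do (a sketch: GNU time's -f format option is assumed, and the awk guard NF == 3 skips the extra warning line that GNU time prints whenever grep exits with nonzero status, that is, for composite numbers):

    for k in {30..41}
    do
        # time only the grep invocation; report k, user CPU time, max resident set size
        printf '%*s\n' "$k" | tr ' ' 'x' \
          | /usr/bin/time -f "$k %U %M" \
            grep -q -E -v -e '^(xx+)\1+$' -e '^x?$'
    done 2>&1 | awk 'NF == 3 { print $1, $2 / $1, $3 }'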

In case you're wondering, the 2>&1 is needed because time outputs to standard error, so a normal pipe won't work. In Bash version 4 and later you can write |& instead. Running the command on my machine gives the output:

It seems we were on the right track. Memory consumption doesn't grow noticeably. Time, instead, grows even faster than exponentially in the number of digits. If we play around a bit more, it seems that time grows approximately like a polynomial in k, and certainly asymptotically more slowly than a double exponential. See this article for a detailed discussion of the efficiency of regex matching implementations.

Conclusion

Was this just math? Just science? Just engineering? The sensible conclusion seems to be that it was a combination of all three: the theory of regular languages has a mathematical flavor, its practical implementation in Unix tools is an engineering feat, and the empirical analysis smells like experimental science. And that's the beauty of it!