Good technical writing is largely an acquired skill. As with other elements of the expertise necessary for research, one learns it by reading the classics, following experts’ advice, and practicing a lot. After sufficiently extensive training and experience, one internalizes rules and tips as second nature and applies them whenever writing a technical piece.

In this post, I want to discuss a side effect of the extensive imitative practice that one goes through to master technical writing. Some assimilated rules survive as mere relics of outdated practices; others misinterpret the original intentions behind sound advice, or its proper scope, and become unnecessary constraints that make for duller rather than more convincing writing. I call these “myths” of technical writing, because they are uncritically accepted despite lacking a convincing justification, only because they are, or are believed to be, commonly accepted practice. Some of them are about writing style, others are about typographical conventions.

Not all the myths I discuss are equally relevant or widespread, and the selection is somewhat a matter of personal opinion. My ultimate intent is to revisit some knee-jerk habits rather than perpetuate them without understanding their rationale, as I certainly have done on occasion too. Since good writing is also a matter of taste, I believe that for every rule, and for every anti-rule, there are plausible exceptions. The bottom line really is: let’s be as critical in choosing our writing style as we should be in our research activity.

Two columns are better than one.
A specter is haunting scientific publishing: the specter of two-column layouts. Multi-column page layouts were introduced at a time when printing was expensive, with the goal of cramming as much content as possible into the fewest pages. Compared to single-column layouts, two-column layouts require text in smaller fonts, complicate the placement of figures and tables, and make the formatting of special text (such as equations and program code) and text justification more troublesome (a narrow column can absorb only a few extra spaces for justification before it becomes ugly). All these features translate into less readable texts. I suspect that many publishers of scientific articles still insist on multi-column layouts largely out of habit rather than as a conscious decision. I say we use single-column layouts whenever we have a choice, such as for technical reports and extended versions.

You can’t use contracted forms of verbs.
When I started writing research papers, I was told to absolutely avoid contracted verbal forms (can’t, won’t, aren’t, …); the reason given for this restriction was that they are not acceptable in formal, rigorous writing. You may have heard and followed similar rules; indeed, it seems that a large part of scientific writing avoids contracted forms. However, the justification for this recommendation is questionable: there’s nothing inherently informal, let alone sloppy, about contracted forms. Modern English writing, including technical texts, admits contracted forms as legitimate; up-to-date copyeditors do not correct them away. The recommendation to avoid them is still sensible when a true ambiguity may arise. For instance, “it’s” is the contracted form of both “it is” and “it has”: if the intended meaning is not clear from the context, it’s advisable to disambiguate by writing the verb out in full. When ambiguity is not an issue, choosing between contracted and full forms is a matter of the flow and pace that each sentence should have. Not abusing contracted forms is the real rule to follow; there is no need to ban them.

Italicizing foreign words is de rigueur.
Maybe in Elizabethan English; not anymore in the twenty-first century. Italic should mainly be used to emphasize words; just because a word has a foreign origin does not mean one should emphasize it. I recommend the following rule of thumb to decide whether a foreign word should be typeset in italic: if it’s included in a standard English dictionary, do not italicize; if it’s not included in the dictionary, are you sure you want to use the word at all?

The passive voice is to be avoided: we always avoid it.
I believe passive forms used to be quite common in scientific writing (they may still be so in some scientific fields other than computer science). The rationale: scientific content should be objective; since the passive voice can deemphasize the subject by omitting it, it makes it easy to achieve an impersonal style that can be mistaken for an objective one. However, a sweeping usage of passive forms tends to communicate indecisiveness and vagueness; therefore it has lost popularity in technical writing, where clarity and assertiveness are paramount.
Unfortunately, these considerations are sometimes turned into a ban on the passive voice, whereas they should be indications of when and how to use it to best effect. Another incongruous reaction to the suggestion of limiting the use of the passive is overusing the first person plural (“we”) as sentence subject. Ironically, an indiscriminate use of “we” has the same problem as an indiscriminate use of the passive voice: the subject becomes unclear and impersonality prevails. Indeed, some papers may use the first person plural for such diverse subjects as:

  • the paper’s text: “we describe in this paragraph” — this paragraph’s text describes;
  • the paper’s authors: “we hope that the results are useful” — the authors hope;
  • a subset of the paper’s authors: “we ran the experiments” — the graduate student ran;
  • an algorithm, program, or technique: “we generate a set of unit tests” — the test-case generation algorithm generates;
  • the paper’s authors together with readers: “we can see that the correlation is strong” — anyone looking at the figures can see.

The last form is the preferred meaning to be attributed to the first person plural. In all other cases, we’d better clarify the subject or use the passive voice to deemphasize it.

Different floats should be numbered independently: is Figure 2 before or after Table 1?
Another mainstream typographic practice is using independent numberings for different kinds of floats (such as figures, tables, and listings) and environments (such as numbered theorems, definitions, and remarks). For example, if a document contains two tables and two figures, they are referred to as Table 1, Table 2, Figure 1, and Figure 2. The problem with this practice is that the numbering does not carry any information about the relative order of the various floats: the first figure is Figure 1 regardless of whether other floats appeared before it. If it were readily available, the missing information would help readability by suggesting whether one should search forward or backward for a certain float from the current position in the paper. A similar argument applies to numbered environments. Few authors dare to change the common practice and adopt more practical numbering schemes. The classic Concrete Mathematics [Graham, 1994], for example, numbers each theorem by the page of the book where it appears; it looks outlandish the first time you encounter it, but then it’s easy to understand and very convenient for browsing. Whenever I write documents past a certain length, I try to override the numbering scheme with a more reasonable one, even if some copyeditors have occasionally enforced the standard, inconvenient scheme.

Never use citations as nouns; see [7] for an example.
Using citations as nouns (writing “see [3]” instead of “see Graham et al. [3]”) is one of the practices introduced to save every bit of space under the strict page limits common in conference publishing. It is possibly the only one of them that I find attractive for general use, even if it is normally shunned by copyeditors. It may not be very elegant, but there are cases in which none of the alternatives seems any better.
Of course, if we have no space constraints whatsoever, and are citing few references by well-known authors, writing their names out in full is clearer. But things are very different when we’re writing a “Related work” section full of citations and we are trying to combine readability and brevity. As a simple example, imagine three references [A], [B], and [C] all originating in the same research group, led by Prof. Smith, but with slightly different (and long) author lists. We would like to give a general introduction to the three works, and then perhaps single out [B] as the most closely related to our own. Which solution do you prefer?

  • Smith’s group has worked on baz in [A], [B], and [C]. [B] describes technique foo which is most similar to ours.
  • Prancer et al. [A], Donner et al. [B], and Blitzen et al. [C] have worked on baz. Donner et al. [B] describe technique foo which is most similar to ours.

I find the first option much more readable; I’m willing to accept a tad of informality in exchange for that.

Avoid repetitions; do not reiterate but use synonyms.
Of course superfluous repetitions should be avoided. But the suggestion to use synonyms is easily misunderstood as a requirement to uncritically use up the thesaurus as extensively as possible.
Italian journalist Cesare Marchi told a funny story [Quartu, 1986] to illustrate the perils of overdoing the search for synonyms. A student overly committed to avoiding repetitions had to write a passage on Hannibal’s armed expedition across the Alps during the Second Punic War. The student mentioned that the Carthaginian general “crossed the Alps with elephants”, in the hope that Romans would be “shocked by those behemoths”; he went on to recount the challenges of “climbing mountains with those mastodons”; he continued by describing the “death of many ungulate trunked herbivore mammals”. As you can see, avoiding repetitions is not a protection against unintended comedy.
Strunk and White [1999] give a great recommendation about repetitions: express coordinate ideas in coordinate form. Indiscriminately using synonyms may blur the connections between parts and thus render a text less clear and less cogent. There is another point to make in favor of repetition that is particularly relevant to technical writing: using a synonym for a word that should have a precise, well-defined meaning risks raising doubts about what the writer really meant. If I write of bugs in section 2 of my paper and go on about faults in section 3, am I referring to the same things in both sections? A little repetitiveness is worth having for greater clarity.

Paper structure is rigid; in particular, the first section is Introduction, the last section is Conclusion(s).
Have you noticed that the first section of most papers, invariably titled “Introduction”, is often not really introductory in character? More commonly, it contains motivations for the work presented in the paper and, more extensively, an overview or summary of the main content (more detailed than the abstract but still not requiring too much background or too many definitions). If this is the case, how come we always title that first section “Introduction”? This stale convention misses the opportunity to give it a more informative title, preferably connected to the actual content rather than completely generic. Or, if it really requires no title, why not omit the title altogether? Though less strongly, similar observations apply to other standard sections of scientific articles. For example, it’s not compulsory to have a “Conclusion(s)” section unless there is some interesting material to be put there. In this respect, I welcome a practice of fields such as mathematics (as well as theoretical computer science) where there’s no stigma in ending a paper abruptly with a theorem or a proof. Provided, of course, that this is the best use of the available space.

IOKTUAP (i.e., It’s OK To Use Abbreviations Profusely)
Computer science jargon is already rife with acronyms and abbreviations, possibly more than any other technical discipline; this should be reason enough to limit the introduction of new abbreviations to the few cases where they are necessary. At a minimum, every abbreviation or acronym but the most widely used ones should be spelled out in full the first time it is introduced in a paper. However, if an abbreviation is used only once or twice in a whole document, it’s advisable to avoid introducing it in the first place.
A related problem when using abbreviations is how to choose indefinite articles: “an LTL formula” or “a LTL formula”? The rule of thumb I follow chooses according to whether the acronym is usually pronounced as such or read out in full. “LTL” is usually read “/ell tee ell/”, and hence the indefinite article should be “an”; in contrast, I prefer “a FOL formula” to “an FOL (/eff oh ell/) formula” since “FOL” is normally read out in full as “First Order Logic”.
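The article rule of thumb can be sketched in code. This is only an illustration under stated assumptions: the function names are hypothetical, the set of letters whose names begin with a vowel sound reflects one common pronunciation (“aitch” for H), and checking the first letter of the expansion is a crude spelling-based stand-in for pronunciation that fails on words such as “hour” or “university”.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Assumption: letters whose English names begin with a vowel sound
   ("ay", "ee", "eff", "aitch", "eye", "ell", "em", "en", "oh", "ar",
   "ess", "ex"). "U" is excluded because its name starts with a "y" sound. */
static bool letter_name_starts_with_vowel_sound(char c)
{
    if (c == '\0')
        return false; /* strchr would otherwise match the terminator */
    return strchr("AEFHILMNORSX", toupper((unsigned char)c)) != NULL;
}

/* Crude approximation: treats a word as starting with a vowel sound
   when its first letter is a, e, i, o, or u. */
static bool word_starts_with_vowel_letter(const char *word)
{
    if (word[0] == '\0')
        return false;
    return strchr("AEIOU", toupper((unsigned char)word[0])) != NULL;
}

/* Hypothetical helper: picks "a" or "an" for an acronym.
   read_letter_by_letter is true when the acronym is usually pronounced
   as a sequence of letters, false when it is read out in full. */
const char *indefinite_article(const char *acronym, const char *expansion,
                               bool read_letter_by_letter)
{
    bool vowel_sound = read_letter_by_letter
        ? letter_name_starts_with_vowel_sound(acronym[0])
        : word_starts_with_vowel_letter(expansion);
    return vowel_sound ? "an" : "a";
}
```

Under these assumptions, indefinite_article("LTL", "Linear Temporal Logic", true) selects “an”, whereas indefinite_article("FOL", "First Order Logic", false) selects “a”. Whether an acronym is read letter by letter remains a judgment the writer has to supply.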
As usual, transgressions against any of these rules may be excusable when trying to save space to conform to rigid page limits.

Sentences should be kept short.
Unqualified criticism about sentences being “too long” reminds me of the comment “too many notes” attributed (possibly apocryphally, but popularized by the movie Amadeus) to Emperor Joseph II after watching Mozart’s Die Entführung aus dem Serail (by the way, it’s in italic because it’s a title, not because it’s German ;-). I do not condone, let alone encourage, habits of writing long, convoluted, obscure sentences, but I object to the notion that there is a limit on sentence length that cannot be exceeded without sacrificing clarity. Sentence length, like many other issues discussed in this post, is a matter of style and content. Dry, somewhat minimalistic writing à la Dalton Trumbo does not best suit every text.
If we diligently polish our writing aiming for clarity and crispness and calibrate our style to maximize expressiveness, we will be able to quip in response to nonspecific criticism about excessive sentence length, as Mozart quipped in response to the Emperor, that, in our sentences, there are just as many words as there should be.

If you have more myths that you’d like to discuss, or if you disagree with parts of my assessment, you are welcome to leave a comment.

References

  1. B. M. Quartu, editor: Dizionario dei sinonimi e dei contrari, Rizzoli, 1986. Foreword by Cesare Marchi.
  2. William Strunk Jr. and E. B. White: The elements of style, 4th edition, Longman, 1999.
  3. Ronald L. Graham, Donald E. Knuth, and Oren Patashnik: Concrete mathematics: A foundation for computer science, 2nd edition, Addison-Wesley, 1994.

According to the lecture notes for his course on “Mathematical Writing”, Don Knuth thought about a contest for the best program “that is also a sonnet” (page 13 in the notes). I don’t know if he ever turned the idea into practice — I couldn’t find any example of this kind by him or by other authors — but I liked the idea the first time I read about it. With this post, I decided to give it a shot myself.

Knuth’s quote specifically referred to Pascal as the programming language to be used. I first considered Eiffel, another keyword-oriented language I’m familiar with, but I realized that a programming language with many keywords that are English words tends to impose more constraints, and hence makes the challenge harder. I picked the C programming language instead, also hoping to rely on the permissiveness of its type system. (This permissiveness has successfully been used to obfuscate, but our goals here are somewhat opposite as we’re trying to achieve some kind of readability.) Remember this for the next debate about programming language features: a weak type system may be better for poetry.

Going for a sonnet seemed too much for a first attempt; I opted for a short limerick instead. I made up a bunch of rules to make the task feasible but also to avoid solutions that merely exploit loopholes of the language or the compiler:

  1. Punctuation and word and line breaks are inserted as needed, and extraneous characters are ignored: what matters is the sequence of alphanumeric characters in the program text.
  2. The program must compile (say, with gcc) without errors and produce a valid executable; compilation warnings are acceptable.
  3. Comments are disallowed or, equivalently, do not contribute to the limerick’s text.
  4. Literal strings are allowed only sparingly. In particular, the trick of using a literal string as an expression statement is discouraged. (Consider this a loophole of C: we could write any poem that begins with the letters “main” by putting the rest in quotation marks as a literal string.)
  5. Bonus points for using as many keywords as possible, and for having as many verses as possible whose beginning coincides with the beginning of a statement or whose ending coincides with the ending of a statement.
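Rule 1 lends itself to a mechanical check. The following is a minimal sketch rather than part of the original challenge: the function names are hypothetical, and ignoring letter case is an assumption (verses may start with a capital letter where C requires lower case).

```c
#include <ctype.h>
#include <stdbool.h>

/* Skips non-alphanumeric characters, then returns the next alphanumeric
   character in lower case and advances past it; returns '\0' at the end. */
static char next_alnum(const char **s)
{
    while (**s != '\0' && !isalnum((unsigned char)**s))
        (*s)++;
    if (**s == '\0')
        return '\0';
    return (char)tolower((unsigned char)*(*s)++);
}

/* True if the program text and the poem contain the same sequence of
   alphanumeric characters, ignoring punctuation, whitespace, and case
   (the case-insensitivity is an assumption, not stated in rule 1). */
bool same_alnum_sequence(const char *program, const char *poem)
{
    for (;;) {
        char a = next_alnum(&program);
        char b = next_alnum(&poem);
        if (a != b)
            return false;
        if (a == '\0')
            return true;
    }
}
```

For instance, a hypothetical fragment such as “main() { double has, meaning; long painfully, seeking; }” passes this check against the verses “Main double has meaning, long painfully seeking!”, since both reduce to the same run of letters.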

Finally here’s the limerick: appropriately for this blog, it’s about bugs. As a C program (compiles with one warning due to the missing include for printf):

And formatted as a limerick:

“Define what bugs are once and for all,”
type def interrogates an eight ball.
   “Main double has meaning,
    long painfully seeking!”
While one short ran: print, fail, and return.

I made some concessions on the meter, but I guess that’s acceptable: call it an anti-limerick if you prefer. As a C program, it’s not very interesting, but at least it prints something incorrect before returning, just like the buggy program described in the limerick 🙂

If you have other examples of poetical programming please leave a comment!