I’m very glad to host a guest post by Nadia Polikarpova, a recent PhD graduate of our group on her way to a post-doctoral position at MIT. Nadia is definitely into bug counting, even though her ultimate ambitions are more of exterminating them (for the casual reader: we’re talking about software bugs — no animals were harmed in the making of her research). Thanks to this guest post of hers, the blog finally goes back to its core interests!

      — Carlo A.

Dimensions in program verification

Model checking, abstract interpretation, program logics, symbolic execution — if you are new to program verification, the variety of techniques it offers might be intimidating. Each area seems to start from a different set of basic principles and use its own terminology, which makes classification and comparison difficult.

To make matters worse, verification experts are often partisans of their school of thought: for an example, see a slide from a recent talk by Patrick Cousot. Abstract interpretation devotees may look down on other approaches, such as auto-active verification, that do not easily scale to systems with million lines of code. Auto-active verification researches, like myself, may make fun of abstract interpretation folks for being stuck with array-out-of-bounds errors, while we can prove full functional correctness (which, in turn, is uninteresting to model-checking experts, who care about temporal properties). The acolytes of the dark art of interactive theorem proving (such as Coq and Isabelle) don’t take any other approach seriously unless it comes with an air-tight proof of soundness. Meanwhile testing people consider all sound techniques pure intellectual exercise, since real programmers are not going to put up with false positives…

How do you make sense of all this and learn to appreciate relative merits and fundamental limitations of different techniques? Here is my short guide to program verification, which I worked out after studying the area for nearly six years.

There are three dimensions to a program verification technique: soundness, flexibility, and automation. Soundness (level of assurance) determines how certain we are that a program is correct if verification succeeds. Flexibility (expressiveness) measures the diversity of programs and correctness properties that the approach can handle. Automation accounts for the amount of human effort — beyond writing the program — that is needed to carry out verification. As is often the case, you can pick any two, but you cannot have all three.

The verification cube

For example, abstract interpretation is sound and fully automatic, but the kind of properties and programs it can check effectively is limited by a predefined set of abstract domains encoded in the analyzer. Bounded model checking, on the other hand, is flexible and fairly automatic — users still have to formalize requirements — but in general cannot provide a correctness guarantee for all inputs (note that we can also trade flexibility for soundness by limiting ourselves to analyzing programs with a finite state space). Deductive verification techniques are flexible and sound, but require a fair amount of manual guidance; among deductive tools, interactive proof assistants are pushing the first two dimensions to the extreme (with higher-order specification languages and tiny trusted cores), while auto-active tools are prepared to sacrifice a bit of both to retain some automation. Unbounded (CEGAR-based) model checking is sound, automatic, and inflexible; symbolic execution is flexible, automatic, and unsound.

I’m not aware of any technique that doesn’t lie close to one of those three corners of the “verification cube”. Why is this the case? Intuitively, the reason is simple: any program analysis can only consider a finite number of states, so you get the choice between only analyzing programs with finitely many states (and lose flexibility), considering only a subset of all states (and lose soundness), or over-approximating an infinite set of states by a finite set of abstract states. Abstraction, in turn, can be performed manually (this is what loop invariants are for!) or automatically (like in abstract interpretation); the former kills automation, while the latter is limited to a restricted set of properties, which kills flexibility. Naturally, acing all three dimensions has always been the holy grail of program verification. This is where all the approaches are converging, albeit taking very different routes, with each technique improving along its weak dimension little by little. Static analyzers are enriched with new abstract domains; model checkers can handle ever larger state spaces; auto-active tools compete in economy of required user annotations; proof assistants boast more and more sophisticated automated tactics. They are all working towards the same goal — still far away from the finish line — and they still differ vastly in terms of strengths and weaknesses, application domains, and user experience.

So, next time you hear someone say that their program verification technique is better that all the rest, just remember the cube.

      — Nadia

Leave a reply

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url=""> 

required

This site uses Akismet to reduce spam. Learn how your comment data is processed.