So what is a “pure programming language” anyway?

This is one of those questions that is inflammatory enough to start a flame war, so it should make a good title for a blog post. But instead of feeding the flames I want to pour some water or whatever appropriate flame-retardant material is appropriate. This post is largely related to a recent post of my friend Neel Krishnaswami who outrageously called Agda an impure functional programming language — to add to the confusion “pure” and “functional” are often unhelpfully conflated.

So what is “pure”? Languages which are not pure are commonly said to have “side-effects”. So conversely we could ask, what are “side-effects”? Changing the values of memory locations is perhaps universally considered a side-effect, and, more generally, interacting with the “outside” world is pretty much considered a side-effect (see Haskell’s IO Monad). But there are other kinds of programming language features which do not involve, strictly speaking, an interaction with the outside world but are largely seen as effects, perhaps most importantly continuations. Finally, there are yet another kind of programming language features which researchers disagree on whether they are effects or not, for example non-determinism but especially non-termination. In his post on Agda Neel has pointed out yet another feature which can be considered an effect, the nominal generativity of type declarations in languages such as Agda, also encountered in other languages.

This debate on what is and what is not an effect, and implicitly what is and what is not a pure language, is not entirely silly because the badge of purity is wore with pride by languages such as Haskell and Agda, so attributing effects to them is an insult, really. As an insult to Haskell, someone once said that it is not a pure language but it operates in the “blessed monad” of non-termination and uncaught pattern-matching failures — which is actually 100% true.

The problem is that this endless is-it-an-effect-or-not controversy doesn’t bring us closer to an understanding of what is pure and what is effectful, so I will propose a definition of pure vs. impure which is very general, in fact it transcends lambda calculus or even programming languages in general. It is a systemic proposal of purity vs. impurity — in the process I also propose to discard the term “effects” which is silly, really, because any computation has some effect — as someone pointed out, any computation will get your computer slightly warmer. And if you think this is silly, let me tell you that there are side-channel attacks based on detecting temperature changes in processors. My proposal is also in line with my greater project to give semantics to systems rather than just syntax.

So here it goes: suppose you design a system, which as all well-designed systems, has a certain equational theory. If your system is “the lambda calculus” then the key equations are alpha, beta and eta. These equations are of course essential in reasoning about your system. Now suppose that you realise that your system must be connected to other systems which do not observe the same equational theory — you have no choice. As I have argued elsewhere, any programming language to be useful requires a foreign function interface which allows it to interface with code written in a different, unknown, language. Your original system is the “pure” system and the external systems are the “impure” parts.

Programming languages (the civilised ones) are such extensions of the simply-typed lambda calculus. Add a fix-point operator, arithmetic and an if statement and you get PCF, but you also get non-termination and a previously-absent distinction between calling conventions (call-by-name and call-by-value have different equational theories). Is PCF pure? Relative to the simply-typed lambda calculus it is not, since it breaks its equational theory, but if you give it its own equational theory then it is pure. To PCF add local variables and you get Algol which, again, breaks PCF’s equational theory. Yet Algol is a very civilised language, with a very good equational theory.

So “pure” is an entirely relative term which we can meaningfully use to describe a system as equationally understood, and “impure” to denote an extension which doesn’t obey the same equations, making reasoning difficult or impossible. So I would say no language is “pure” but we have pure and impure extensions to any given reference language, which is always pure. Another way to think about impure extensions is simply as extrinsics or foreign functions.

An important role that type systems could (and should) serve is to delimit “pure” and “impure” code in the same program. By “impure” code I mean code that is observably influenced by impure features. As far as I know the only comprehensive attempt at this is John Reynolds’s Syntactic Control of Interference type system, which works on top of Algol syntax, to delimit “pure” code (i.e. without assignments) which he calls “passive” from “impure” code (i.e. which relies on memory locations) which he calls “active”. I find this a much more reasonable approach at dealing with “effects” by simply isolating them, rather than trying to give types to these effects as is the standard “monadic” approach. As I argued elsewhere this is simply too ambitious and it is unlikely to work when we interface with arbitrary systems, just because arbitrary systems are too messy and we have no right to expect them to be uniform enough to be typeable.

About Dan Ghica

Reader in Semantics of Programming Languages // University of Birmingham // //
This entry was posted in programming languages, system level semantics. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *


You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>