Why is lambda calculus named after that specific Greek letter? Do not believe the rumours!

A common myth in theoretical computer science is that the ‘λ’ in λ-calculus comes from some kind of typographical error, where the intended notation was a circumflex accent ŷ.M, which became a caret ^y.M then finally a lambda λy.M. At least on the math stack-exchange this is the most popular explanation [link].

We recently had λ-calculus pioneer Prof. Dana Scott give a series of lectures in our School. He was a PhD student of Alonzo Church. In the introduction of one of his lectures he told us the story of the Greek letters used in the λ-calculus: the α of α-conversion, the β of β-expansion, the η of η-expansion and of course λ itself.

He says that when Church was asked what the meaning of the λ was, he just replied “Eeny, meeny, miny, moe.”, which can only mean one thing: it was a random, meaningless choice. Prof. Scott claimed that the typographical-origin myth was mainly propagated by Henk Barendregt and is just pure whimsy. He asked us to stop perpetuating this silly story.

I am attaching the link to the YouTube video with apologies for the low quality.

Posted in General, programming languages, teaching | Leave a comment

So what is a “pure programming language” anyway?

This is one of those questions that is inflammatory enough to start a flame war, so it should make a good title for a blog post. But instead of feeding the flames I want to pour some water, or whatever flame-retardant material is appropriate. This post is largely related to a recent post of my friend Neel Krishnaswami, who outrageously called Agda an impure functional programming language — to add to the confusion, “pure” and “functional” are often unhelpfully conflated.

So what is “pure”? Languages which are not pure are commonly said to have “side-effects”. So conversely we could ask, what are “side-effects”? Changing the values of memory locations is perhaps universally considered a side-effect, and, more generally, interacting with the “outside” world is pretty much considered a side-effect (see Haskell’s IO Monad). But there are other kinds of programming language features which do not involve, strictly speaking, an interaction with the outside world but are largely seen as effects, perhaps most importantly continuations. Finally, there is yet another kind of programming language feature which researchers disagree on whether it is an effect or not, for example non-determinism but especially non-termination. In his post on Agda Neel points out yet another feature which can be considered an effect: the nominal generativity of type declarations in languages such as Agda, which is also encountered in other languages.

This debate on what is and what is not an effect, and implicitly what is and what is not a pure language, is not entirely silly, because the badge of purity is worn with pride by languages such as Haskell and Agda, so attributing effects to them is an insult, really. As an insult to Haskell, someone once said that it is not a pure language but it operates in the “blessed monad” of non-termination and uncaught pattern-matching failures — which is actually 100% true.

The problem is that this endless is-it-an-effect-or-not controversy doesn’t bring us closer to an understanding of what is pure and what is effectful, so I will propose a definition of pure vs. impure which is very general, so general in fact that it transcends the lambda calculus or even programming languages in general. It is a systemic proposal of purity vs. impurity — in the process I also propose to discard the term “effects”, which is silly, really, because any computation has some effect — as someone pointed out, any computation will get your computer slightly warmer. And if you think this is silly, let me tell you that there are side-channel attacks based on detecting temperature changes in processors. My proposal is also in line with my greater project to give semantics to systems rather than just syntax.

So here goes: suppose you design a system which, like all well-designed systems, has a certain equational theory. If your system is “the lambda calculus” then the key equations are alpha, beta and eta. These equations are of course essential in reasoning about your system. Now suppose that you realise that your system must be connected to other systems which do not observe the same equational theory — you have no choice. As I have argued elsewhere, any programming language, to be useful, requires a foreign function interface which allows it to interface with code written in a different, unknown, language. Your original system is the “pure” system and the external systems are the “impure” parts.

Programming languages (the civilised ones) are such extensions of the simply-typed lambda calculus. Add a fix-point operator, arithmetic and an if statement and you get PCF, but you also get non-termination and a previously-absent distinction between calling conventions (call-by-name and call-by-value have different equational theories). Is PCF pure? Relative to the simply-typed lambda calculus it is not, since it breaks its equational theory, but if you give it its own equational theory then it is pure. To PCF add local variables and you get Algol which, again, breaks PCF’s equational theory. Yet Algol is a very civilised language, with a very good equational theory.
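
To make the calling-convention point concrete, here is a small OCaml sketch (OCaml being call-by-value): once non-termination is in the picture, beta-reducing an application with a divergent argument is no longer an observational equivalence.

(* A sketch: beta-reducing away a divergent argument changes behaviour
   under call-by-value. *)
let rec loop () : int = loop ()

let lhs () = (fun _ -> 0) (loop ())  (* diverges: the argument is evaluated first *)
let rhs () = 0                       (* the beta-reduct returns immediately *)

Under call-by-name the two would be indistinguishable, which is exactly the sense in which the two calling conventions have different equational theories.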

So “pure” is an entirely relative term which we can meaningfully use to describe a system as equationally understood, and “impure” to denote an extension which doesn’t obey the same equations, making reasoning difficult or impossible. So I would say no language is “pure” but we have pure and impure extensions to any given reference language, which is always pure. Another way to think about impure extensions is simply as extrinsics or foreign functions.

An important role that type systems could (and should) serve is to delimit “pure” and “impure” code in the same program. By “impure” code I mean code that is observably influenced by impure features. As far as I know the only comprehensive attempt at this is John Reynolds’s Syntactic Control of Interference type system, which works on top of Algol syntax to delimit “pure” code (i.e. without assignments), which he calls “passive”, from “impure” code (i.e. which relies on memory locations), which he calls “active”. I find this a much more reasonable approach to dealing with “effects”: simply isolating them, rather than trying to give types to the effects themselves, as in the standard “monadic” approach. As I argued elsewhere, the latter is simply too ambitious and unlikely to work when we interface with arbitrary systems, just because arbitrary systems are too messy and we have no right to expect them to be uniform enough to be typeable.

Posted in programming languages, system level semantics | Leave a comment

Learn OCaml!

I have been teaching OCaml to first year undergraduate students for a few years now. Why not share it with the world? To access the course follow the link at the bottom of the page.

Here is what you will learn:

  1. The pure fragment of OCaml: functions, polymorphism, patterns, variants, lists, (tail) recursion, map and fold
  2. Basic algorithms and data types: list algorithms, searching, sorting, queues
  3. Basic proofs of program correctness using structural induction, applied to Peano numbers and lists and supported by our home-brewed proof assistant Yup
  4. Some general theoretical concepts: isomorphisms, the Curry-Howard correspondence

On the web site you will find: lecture notes, lecture slides, lecture videos and over 30 programming assignments. The assignments come with automatic test harnesses so you can test your answers for correctness.

The course web site is accessible from https://birmingham.instructure.com/courses/15627.

Posted in General | Leave a comment

Compilers as lazy denotational interpreters

This year I had the pleasure to teach, for the first time, a course on compilers. I have been doing research on compilers for a while now but I suppose I never really asked myself seriously “what is a compiler?”. I could recognize a compiler if I saw one, but this is not good enough for teaching. One paper that gives an excellent conceptual explanation is Ager et al.’s [PDF], but it might be a little hard to read if you are not familiar with abstract machines (Krivine, SECD) and with Standard ML syntax. In this post I will follow the same methodology, but applied to a very simple expression language which has only arithmetic, input, output and variable binding, and a very simple abstract machine. This will make it clearer, I hope, what an evaluator, an interpreter and a compiler are, and how they are connected to the key notions of abstract and virtual machine.

I should specify that this is not a recipe for writing efficient or realistic compilers. This is an exercise in understanding what a compiler is. The second caveat is that the perspective presented here is quite common for functional or other higher-level programming languages but not quite as relevant for C, which is close enough to the concrete machine that an abstract machine is not really necessary. (Unless we consider LLVM-based compilation as using a virtual machine.)

The code used in this post is OCaml. Why? Because OCaml is the best programming language bar none for implementing compilers. If you don’t believe me feel free to translate or reimplement my code in your favourite language and see how it goes.

The object language

Let us consider this simple expression language:

e ::= n | x | e ⊙ e | let x = e in e | print e | read

where n stands for numeric constants, x for variable identifiers, and ⊙ for arbitrary arithmetic operators. We shall not deal with parsing at all as it is a totally separate topic, but we start from the abstract syntax tree, specified by this OCaml type:

type expression =
 | Const of int
 | Var of string
 | Binary of string * expression * expression
 | Let of string * expression * expression
 | Output of expression
 | Input

For example, the following expression

let z = (let x = read in let y = read in x + y) in print z

has the syntax tree:

Let ("z", 
    Let ("x", Input, (Let ("y", Input, 
        (Binary ("+", Var "x", Var "y"))))), 
Output (Var "z"))

The evaluator

The first step in the formal definition of a programming language is giving its evaluation rules in an effective way. There are many ways to do this and they fall under the general topic of operational semantics, which can have several flavours (e.g. big step, small step, etc.) of which you can learn more in Andy Pitts’s excellent tutorial [link]. Without too much further theorising, here is an obvious evaluator for our expression language:

let rec evaluate = function
 | Const n -> n
 | Binary (op, e, e') ->
   let v = evaluate e in
   let v' = evaluate e' in
   op_of_opcode op v v'
 | Let (x, e, e') ->
   let v = evaluate e in 
   let e' = substitute x v e' in
   evaluate e'
 | Var x -> failwith ("undefined: " ^ x)
 | Output e ->
   let v = evaluate e in
   print_int v; print_newline ();
   v
 | Input -> read_int ()

Let’s discuss the code:

  • A constant expression Const n is evaluated to just n.
  • An operator Binary evaluates the two arguments then applies the operator. The helper function op_of_opcode decodes the operator symbol into an actual (OCaml) operation:
    let op_of_opcode = function
     | "+" -> ( + )
     | "*" -> ( * )
     | ";" -> (fun x y -> ignore x; y)
     (* ... and so on for the remaining operators *)
     | op -> failwith ("unknown operator: " ^ op)
  • The Let binder evaluates the definition of x then replaces x with it in the body expression e’ using a helper function to do the substitution:
    let rec substitute (x : string) (v : int) = function
     | Const n -> Const n
     | Binary (op, e1, e2) -> Binary (op, substitute x v e1, substitute x v e2)
     | Let (y, e1, e2) when x = y -> Let (y, substitute x v e1, e2)
     | Let (y, e1, e2) -> Let (y, substitute x v e1, substitute x v e2)
     | Var y when x = y -> Const v
     | Var y -> Var y
     | Output e -> Output (substitute x v e)
     | Input -> Input

    Note that substituting in a Let expression must ensure that the defined variable is never substituted as it would violate the obvious scoping rules of a local definition, hence the special case.

  • As a consequence, all variables in a program are substituted away at their point of definition, so we shouldn’t encounter them. If we do, they must be undefined, so we raise a failure.
  • Output evaluates its argument, prints it and returns it as a result. Since we are in an expression language we always need to produce some kind of integer result and this seems sensible.
  • Input reads an integer and returns it.
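
As a quick sanity check, here is the evaluator applied to a closed expression with no input or output (a sketch, assuming the definitions above are loaded):

let () =
  (* let x = 6 in x + 1, which should evaluate to 7 *)
  assert (evaluate (Let ("x", Const 6, Binary ("+", Var "x", Const 1))) = 7)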

This is a reasonable definition of our language’s semantics but it is rather impractical for two reasons:

  1. In order for this evaluator to execute the program it needs to modify the abstract syntax tree during the evaluation. This is clearly a bad idea for several reasons:
    1. Substitution or any other kind of AST manipulation will be a complicated operation which is unlikely to be easily mapped into smaller, machine-friendly operations.
    2. Substitution or any other kind of AST manipulation will always be inefficient in the extreme.
    3. Substitution is incompatible with using libraries, foreign-function interfaces, or runtime services.
  2. In order for the evaluator to execute a program it relies on the full power of the meta-language, in our case OCaml. Why is this a problem? We know that eventually we want our program to be executed by some machine, and it seems like a bad idea to assume that machine will be able to run OCaml, which is a big and complicated language requiring a sophisticated run-time environment. We need something simpler.

A compositional evaluator

We first remedy the problem of modifying the AST by giving a compositional evaluator in which the meaning of a term is defined from the meanings of its subterms. This is the key principle of denotational semantics which goes back to Scott and Strachey for programming languages and to Tarski for logics. To implement it we will add an environment to our evaluator, which is a dictionary mapping variable identifiers to values:

let rec cevaluate env = function
 | Const n -> n
 | Binary (op, e, e') ->
   let n = cevaluate env e in
   let n' = cevaluate env e' in
   op_of_opcode op n n'
 | Let (x, e, e') ->
   let env = (x, cevaluate env e) :: env in 
   cevaluate env e'
 | Var x -> lookup x env
 | Output e ->
   let v = cevaluate env e in
   print_int v; print_newline ();
   v
 | Input -> read_int ()

The helper lookup function is:

let rec lookup x = function
 | [] -> failwith ("undefined: " ^ x)
 | (x', n) :: env when x = x' -> n
 | _ :: env -> lookup x env

Except for the treatment of variables, which are looked up rather than substituted, the two evaluators are identical. A mathematically-minded reader could prove (e.g. in Agda or Coq) some soundness results, e.g.

∀M.∀n. evaluate M = n ⇒ cevaluate [] M = n

Note that this is not entirely a trivial exercise because of the variable scoping rules! Also, keeping track of effects (input and output) presents some further formal challenges.
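
A full proof is beyond the scope of this post, but the statement can at least be spot-checked on closed, input/output-free terms (a sketch, reusing the definitions above):

(* let z = 2 + 3 in z * z: both evaluators should give 25 *)
let sample =
  Let ("z", Binary ("+", Const 2, Const 3),
       Binary ("*", Var "z", Var "z"))

let () = assert (evaluate sample = cevaluate [] sample)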

Abstract machine and interpretation

The second step in our process is to try to write an evaluator restricted to particular operations. What operations? That depends on what our ultimate target architecture might be. We don’t necessarily want to use at this stage a genuine concrete machine specification (e.g. x86) because it would be extremely complicated, albeit doable — barely. What we want is an intermediate layer of abstraction. What we want is a set of operations that we believe can be refined efficiently into concrete machine operations. What we want is an abstract machine.

For functional languages the standard abstract machines are Krivine for call-by-name evaluation and SECD for call-by-value, but other abstract machines exist, for example the Token abstract machine used to interpret linear logic [link]. Designing abstract machines is a fascinating and fun exercise, a beautiful mix of engineering and mathematics. They truly are the nexus between language and machine which make compilation of higher-level languages possible. 

For the purpose of illustration I will use a totally made-up and very simple abstract machine, the PSM (pure stack machine). It is a stack equipped with some operations:

  • push a new element onto the stack
  • pop the top of the stack
  • peek at the n-th element of the stack (copying it to the top)
  • apply an operation to the top two elements of the stack and leave the result in their place
  • swap the top two elements of the stack
  • output the top of the stack
  • input a new element at the top of the stack.

The OCaml implementation is this:

let stack = ref []
let push x = stack := (x :: !stack)
let pop () = stack := List.tl !stack
let peek n = stack := (List.nth !stack n :: !stack)
let apply op = stack := (function x :: x' :: xs -> (op x x') :: xs) !stack
let swap () = stack := (function x :: x' :: xs -> x' :: x :: xs) !stack
let output () = print_int (List.hd !stack); print_newline ()
let input () = stack := (read_int () :: !stack)
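
As a quick illustration of the machine at work, here is 6 * 3 computed by hand with these operations (a sketch, assuming the definitions above):

let () =
  stack := [];
  push 6;       (* stack: [6]    *)
  push 3;       (* stack: [3; 6] *)
  apply ( * );  (* stack: [18]   *)
  output ()     (* prints 18     *)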

When we implement the abstract machine we purposefully move into the imperative world, because machines are naturally imperative. Machines have a state which is changed by their operations. In our case this state is simply a global stack. This presentation can be made even nicer, at the cost of some sophistication, by packaging the stack as a functor which would hide all the implementation details from the user. I leave this as an exercise, for the sake of a more elementary description.
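
For the curious, here is a minimal sketch of what such an encapsulation might look like, using a plain module with an explicit signature rather than a functor (the module name and the underflow errors are my own additions; the operations are the ones above):

module PSM : sig
  val push   : int -> unit
  val pop    : unit -> unit
  val peek   : int -> unit
  val apply  : (int -> int -> int) -> unit
  val swap   : unit -> unit
  val output : unit -> unit
  val input  : unit -> unit
end = struct
  (* the stack itself is not visible outside the module *)
  let stack : int list ref = ref []
  let push x = stack := x :: !stack
  let pop () = stack := List.tl !stack
  let peek n = stack := List.nth !stack n :: !stack
  let apply op = match !stack with
    | x :: x' :: xs -> stack := op x x' :: xs
    | _ -> failwith "stack underflow"
  let swap () = match !stack with
    | x :: x' :: xs -> stack := x' :: x :: xs
    | _ -> failwith "stack underflow"
  let output () = print_int (List.hd !stack); print_newline ()
  let input () = stack := read_int () :: !stack
end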

We are now in a position to write an interpreter for this abstract machine, i.e. a compositional evaluator which will only use the abstract machine operations:

let rec interpret symt = function 
 | Const n -> push n
 | Binary (op, e, e') -> 
   interpret symt e;
   interpret (inc symt) e';
   apply (op_of_opcode op)
 | Let (x, e, e') -> 
   interpret symt e;
   interpret ((x, 0) :: (inc symt)) e';
   swap ();
   pop ()
 | Var x -> 
   let n = lookup x symt in
   peek n
 | Output e ->
   interpret symt e;
   output ()
 | Input ->
   input ()

Let’s discuss the code:

  • A constant expression is interpreted by pushing the constant onto the stack.
  • A binary expression will interpret the operands and, assuming that the two values are left on the top of the stack, will apply the operation.
  • A let expression will interpret the definition e, whose value will be left at the top of the stack, then it will interpret the body e’ remembering the stack offset of the value for x. We want only the value of the body to remain on the stack, so we swap the value of x, which is now the second element from the top, and pop it off.
  • Variables are looked up at their known stack offset.
  • Input and output are obvious.

The interpreter uses an extra argument, a symbol table mapping variable identifiers to stack offsets, with the helper function

let inc = List.map (fun (x, n) -> (x, n+1))

All the “execution” is now done in terms of abstract machine operations, even though we are using a few other operations for managing the symbol table. That is not a serious problem. In fact this mapping between variable identifiers and stack indices could be considered part of the parsing process.

At this point we have our interpreter!

The mathematically minded reader could now think about proving some soundness results about this interpreter, which even for a small language such as this one is quite a (formal, if not mathematical) effort:

∀M.∀n. evaluate M = n ⇒ run M = [n]

where

let run m = stack := []; interpret [] m; !stack

which means running the interpreter leaves a singleton stack containing the evaluated value.

Compilation, virtual machines and native code generation

Once we have the interpreter, the compiler is surprisingly easy to implement. The interpreter runs the abstract machine instructions as it processes the program. A compiler will simply collect, in a lazy way, the instructions (and save them to a file)!

let rec compile symt = function 
 | Const n -> "push " ^ (string_of_int n) ^ "\n"
 | Binary (op, e, e') -> 
   (compile symt e) ^
   (compile (inc symt) e') ^
   "apply " ^ op ^ "\n"
 | Let (x, e, e') -> 
   (compile symt e) ^ 
   (compile ((x, 0) :: (inc symt)) e') ^
   "swap\n" ^
   "pop\n" 
 | Var x -> 
   let n = lookup x symt in
   "peek " ^ (string_of_int n) ^ "\n"
 | Output e ->
   (compile symt e) ^
   "output\n"
 | Input ->
   "input\n"

The code is strikingly similar. We could (almost) textually replace sequential composition with concatenation, and each call to a function implementing an abstract machine instruction with the name of the instruction itself. I have renamed both functions to f and placed the interpreter and the compiler side-by-side.
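
For instance, here is the Let case of each, with both functions renamed to f (only this case is shown; the other cases correspond in the same way):

(* interpreter *)
| Let (x, e, e') ->
  f symt e;
  f ((x, 0) :: (inc symt)) e';
  swap ();
  pop ()

(* compiler *)
| Let (x, e, e') ->
  (f symt e) ^
  (f ((x, 0) :: (inc symt)) e') ^
  "swap\n" ^
  "pop\n"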

For illustration let’s look at one test expression, m = let x = 6 in let y = 3 in y * x:

# m;;
- : expression =
Let ("x", Const 6, 
Let ("y", Const 3, 
Binary ("*", Var "y", Var "x")))
# evaluate m;;
- : int = 18
# cevaluate [] m;;
- : int = 18
# run m;;
- : int list = [18]
# compile [] m |> print_string;;
push 6
push 3
peek 0
peek 2
apply *
swap
pop
swap
pop
- : unit = ()

The very last step is processing the output of the compiler, the bytecode. Here we have two choices:

  1. The bytecode can be interpreted by what is usually called a virtual machine, which runs the bytecode on some concrete machine in the same way that our interpreter ran the original program on the abstract machine. Note that the bytecode is just a simpler programming language, but the principle remains the same.
  2. The bytecode can be itself compiled just by tracing the execution of the virtual machine, in the same way in which we traced the execution of the abstract machine to derive the object language compiler. Again, the principle is the same.
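
As a sketch of the first option, here is a minimal virtual machine for our bytecode, reusing the PSM operations from earlier; the line-by-line parsing is my own and assumes exactly the textual format produced by compile above:

(* execute one bytecode instruction using the PSM operations *)
let execute line =
  match String.split_on_char ' ' (String.trim line) with
  | ["push"; n]   -> push (int_of_string n)
  | ["pop"]       -> pop ()
  | ["peek"; n]   -> peek (int_of_string n)
  | ["apply"; op] -> apply (op_of_opcode op)
  | ["swap"]      -> swap ()
  | ["output"]    -> output ()
  | ["input"]     -> input ()
  | _             -> failwith ("unknown instruction: " ^ line)

(* run a whole bytecode program and return the final stack *)
let run_bytecode code =
  stack := [];
  String.split_on_char '\n' code
  |> List.filter (fun l -> String.trim l <> "")
  |> List.iter execute;
  !stack

Running run_bytecode (compile [] m) on the example above should leave [18] on the stack, matching run m.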

Conclusion

Compiler development is still, unfortunately, somewhat of a black art. I hope this post was informative, and especially I hope that the methodological importance of the abstract/virtual machine came through. This is indeed the key concept that makes compilers work. A good abstract/virtual machine must be such that the object language can be interpreted/compiled by/to it efficiently and its own instructions, the bytecode, must be interpreted/compiled efficiently by/to the concrete hardware platform targeted.

I also wanted to illustrate the role that operational semantics (the non-compositional evaluator) and denotational semantics (the compositional evaluator) can play in rational compiler development, and how they can be gradually refined into interpreters and compilers. I hope that made sense!

Posted in teaching | Leave a comment

A short fable of software engineering vs. regular engineering

An acquaintance recently took up an assembly line job with McLaren automotive — the people who make F1 and luxury sports cars. Before this gig, he used to work from his garage doing custom body and paint work, and developed quite a reputation for skill and quality. Working on the assembly line pays better but he is not very pleased with the monotony of it: all he does all day long is fit one small body panel on the chassis. The panel is part of the front wheel arch and it costs about £7,000.

One day a panel didn’t fit properly. It was off by a few millimeters, nothing that my guy could not fix in a few minutes of careful sawing and sanding. To his surprise what happened was very different. That chassis was taken to one side and work on it was suspended for days while a whole bunch of engineers tried to figure out not how to fix the problem but why the problem happened in the first place. My friend, the assembly line worker, put it down to the stupidity of engineers and the bureaucracy of management. In the end he was not informed how the problem was solved and what the investigation uncovered.

My friend was quite obviously an accomplished hacker. His hacker mentality would have fared quite well in software production, where debugging is part of the normal workflow, but was out of place in a conventional engineering setting. His story made me reflect on the huge gap between car and software engineering.

I wonder how many years until software production reaches the level of maturity where we don’t just fix bugs but, when bugs happen, we stop in our tracks, shocked that bugs happened at all in the first place, and only proceed after we have had a serious look at our processes. I wonder how many years until our software will look like this

rather than like this

Posted in General | 2 Comments

Computability: The Greatest Law of Physics

Wigner famously declared mathematics to be “unreasonably effective” in the natural sciences. There is indeed something magical about the fact that if we have a mathematical theory and some mathematically representable aspect of reality then we can carry out abstract, conceptual, mathematical reasoning and formulate some new facts about reality which will turn out to be true. Even more amazing, such theoretical reasoning can lead to genuine discoveries about reality — new observations. According to Lakatos this is the most important thing about a scientific theory. Indeed, Eddington would never have ventured to the island of Principe to take measurements of the Hyades during a solar eclipse unless Relativity had predicted that he would see something interesting. Halley would never have predicted the return of the comet that now bears his name.

The validity of a scientific theory rests on the correspondence between a mathematical model and measurements made of the physical world. The same relationship between an abstract, essentially mathematical model, and the physical world, usually a special-purpose device, lies also at the heart of computation. This relation, called the implementation relation by Chalmers, was first introduced in the context of trying to figure out what things can be said to compute. For most Computer Scientists this is not particularly exciting or interesting because of our Engineering-minded approach. We build computers so that they satisfy an implementation relation. In fact we rarely distinguish between “abstract” mathematical computation and “concrete” physical computation. We casually deal in “abstract machines”, such as the Turing Machine, without worrying too much whether we are talking about its mathematical definition or an actual physical instance. We just know that the latter can be easily built so that it implements the former.

However, for those interested in Natural Computing, the implementation relation is far more interesting because an implementation relation is discovered rather than constructed. Some naturally occurring phenomenon (the behaviour of slime mold, for example) can be used to calculate with because it implements a complex computation. Natural Computing works because such phenomena, implementing complex computations, exist. In some trivial sense any physical system can be used to compute facts about its own mathematical model, although that is hardly interesting. Natural Computing is interesting because particular physical systems can compute fast solutions to problems of general interest, which can actually encode the evolutions of other physical systems that evolve much slower or that are complicated or expensive to build.

What is truly fascinating is that no physical system can implement an abstract model which is mathematically uncomputable. For me this is indeed the greatest law of physics in terms of its generality, as it relates any abstract mathematical model to any physical system. The non-implementability of uncomputable models is an impossibility result of a staggering magnitude. And it is truly a falsifiable (in the logical, Popperian sense) scientific law. Scientists, most notably Penrose, suggest that some neuronal processes in the brain may be physical yet uncomputable, but the evidence for this is very sketchy. More generally, philosophers propose that the brain is an instance of a physical yet non-computable process, mainly based on interpretations of Goedel’s incompleteness theorem which I find rather naive. But at least this shows that we can logically conceive of such systems.

In fact I would say that not just computability in the sense of Church-Turing, but the whole of computability theory, including complexity analysis, is essentially formulating extremely general laws of physics between whole classes of formal models and any physical system which might implement them. Computer Science is truly deserving of its name!

Posted in armchair philosophy | 1 Comment

A simple proof checker for teaching

For the last few years I have been teaching functional programming to first year students. One of the main benefits of using functional programming is that you can prove simple, but non-trivial, properties of simple programs, for example, that the empty list is right-neutral for append:

append xs [] = xs

The practical problem with teaching a large group of students is that you end up with a large pile of assignments to check, which means a large number of teaching assistants of various degrees of competence in reading and understanding proofs. Note that for programming assignments we didn’t have this problem because we could use automated random unit testing. Pedagogically, having inconsistent marking standards is not only irritating to students but potentially devastating in the final exam if TAs habitually accept wrong proofs as correct in the assignments. The final exam is marked by me, and I may not be as blind to mistakes.
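
For comparison, this is roughly the flavour of random unit test we rely on for programming assignments, here checking the append property above (a sketch with made-up helper names, not the actual test harness):

(* check the property append xs [] = xs on many random lists *)
let random_list () = List.init (Random.int 10) (fun _ -> Random.int 100)

let () =
  Random.self_init ();
  for _ = 1 to 1000 do
    let xs = random_list () in
    assert (List.append xs [] = xs)
  done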

One possibility is to use proof assistants such as Coq, Agda or Isabelle, but they are a bit too intricate for first year students who are just learning the basics of proofs, logic, recursion and induction. What we want is a proof checker tailored to the needs of the beginning student. These were our design principles:

  1. The syntax of production-level proof assistants aims for concision, elegance and expressiveness. Our aim was a syntax that matched as much as possible the informal mathematical vernacular that students are used to from pen-and-paper proofs.
  2. Production-level proof assistants have various levels of interaction and automation which are not only useless to the beginning student but actively get in the way of learning proof techniques, because they may encourage a trial-and-error approach to proofs. Also, very simple fragments such as pure propositional logic may be entirely automated in such proof assistants.
  3. Even production-level proof assistants can be annoyingly clumsy when it comes to writing equational proofs. However, most of the proofs that we ask our students to write are equational. Congruence reasoning comes very easily to students, but doing it formally in proof assistants is a pain. This aspect is something we wanted automated.
  4. We also need a generally high level of user-friendliness.

The source code, documentation and a range of examples from the simplest to moderately complicated are available on Github. Here I am just going to give the proof for the very simple example above. If you find it appealing I recommend you read the full documentation on-line.

Signatures:
append : 'a list -> 'a list -> 'a list ;
Definitions:
[append nil]: forall xs:'a list.
              append [] xs = xs : 'a list ;
[append ys]:  forall xs:'a list. forall ys:'a list. forall y:'a.
              append (y::ys) xs = y :: append ys xs : 'a list ;
Theorem [append neutral]:
Statement: forall xs:'a list.
           append xs [] = xs :'a list 
Proof: by induction on list :
case []:
 we know [0]: append [] [] = [] :'a list
              because [append nil] with ([]) .
 by [0]
case (hd :: tl):
 [IH]: append tl [] = tl : 'a list .
 we know [1]: append (hd :: tl) [] = hd :: (append tl []) :'a list
              because [append ys] with ([], tl, hd) .
 we know [2]: append (hd :: tl) [] = hd :: tl : 'a list
              because equality on ([1], [IH]) .
 by [2]
QED.

A comparison with the Agda equivalent is perhaps a good illustration of our trading off succinctness for readability. Note especially the need for an explicit congruence principle in the Agda proof. The list data type and append (++) are pre-defined. (Coq is not really comparable because it is tactics-based.)

cong-∷ : {A : Set} → (x : A) → ∀ xs ys → xs ≡ ys → x ∷ xs ≡ x ∷ ys
cong-∷ x xs .xs refl = refl
unitr[] : {A : Set} → (xs : List A) → xs ++ [] ≡ xs
unitr[] [] = refl
unitr[] (x ∷ xs) = Goal where
 p : xs ++ [] ≡ xs
 p = unitr[] xs 
 Goal : x ∷ (xs ++ []) ≡ x ∷ xs
 Goal = cong-∷ x (xs ++ []) xs p

The author of the tool is my intern student Yu-Yang Lin, with essential supervision and help from Neel Krishnaswami. Funding for the project was provided by the College of Engineering and Physical Science, University of Birmingham.

Your feedback is most welcome! Even more welcome are more examples, bug reports and bug fixes. If you are looking to make a more substantial contribution, parsing errors could be vastly improved.

Posted in General | Leave a comment

What things compute?

I suppose we can agree that computers compute. We know that they do that because we make them do that. We have a program, on paper or in our head, and we have certain facilities to load that program into the computer memory. This is called implementing the program. Then, when the program runs we can compare the actual behaviour of the physical computer with the expected behaviour defined by the abstract program. Note that this expected behaviour we can determine “by hand” via manual calculations, at least in principle. Programs in the abstract are mathematical objects, programs in the concrete are behaviours of physical devices we call computers. I suspect trained computer scientists will find very little controversial (or exciting) in the clarification above. This is the so-called “implementation theory” of computing, advocated among others by the influential cog-sci philosopher David Chalmers.

This definition is, I would say, sound in the sense that things and behaviours that are described by it are always computation. But is it complete? Are there notions of computation that fall outside of it? If we say it is not complete then we should expand the definition somehow — let’s get back to that. But if we say it is complete, then what are the consequences? Implementation theory assumes that the abstract program defines the concrete program, which is an instance of it. A computer is then a device that accelerates computation by some physical means. The crucial assumption is that the abstract program, as a mathematical object, requires design and therefore intent. Accordingly, only artificial things can compute. We cannot say, for example, that a brain computes. There seems to be something that resembles “information processing” or “computation” in the way a brain behaves, but we don’t actually have the anchoring reference that is the abstract program to compare it with and say that it is an implementation of it. This may seem like nit-picking but it’s a genuine problem.

Outside artificial things designed to compute, i.e. computers, what else computes? What is computation, if anything, as a natural phenomenon? I haven’t seen a good answer so far. I have seen however two classes of bad answers.

The first bad answer is something that has to do with a “system” having “inputs” and “outputs”. This is bad on two accounts — without even going into the deeper philosophical problems of causation as a scientific concept. The first problem is that any natural system has many things that can be considered as “inputs” and “outputs”. For example if we throw a rock, and we take the initial velocity vector as an input and its space-time coordinates as an output, we can say that it is a system that computes a parabola tangent to that vector. But there is necessarily intent in the choice of which measurements are considered inputs and outputs. Any object has a trajectory in a highly-dimensional space of properties, and any projection of that trajectory on a lower-dimensional space of measurements defines a different computation. There is no one special computation the system performs. If we take intent out of the equation we have an incomprehensibly large state space and an incomprehensibly complicated trajectory through it, tracking everything down to atomic-level configurations. The second problem is that such broad definitions can quickly degenerate into pancomputationalism: everything computes. And, if everything computes, computation is an uninteresting concept. The brain computes, but so does a piece of wood. Computation is simply another word for behaviour subject to (natural) laws. The Universe is a big computer, computing its own future — grand but somewhat vacuous.

The second bad answer is something that has to do with “processing symbolic information”. As in the case of selecting particular measurements of a system as inputs and outputs, the notion of “symbol” requires not only intent but also, outside of a formal setting, very special powers of abstraction — think about how many concrete shapes a symbol such as the letter ‘a’ can take. We know the human brain can process symbolic information, but we are far from certain that any other kind of brain can do that, so this definition rules them out as computing things. Also, the process of defining and recognising symbols is itself a prerequisite of computation, but not part of computation itself — which strikes me as odd. Whereas the attempted input-output definition is too broad, this one is both too precise and too reliant on concepts that require further clarification. It is tantamount to saying “computation is what the human brain can do”. It does not explain the brain using computation, it uses the brain as an explanation for computation.

In conclusion I want to shatter our presumed consensus: computers do not compute. Not even computers. We compute. Computers just help us along. Implementation theory gives a lucid definition of computation: a relation between a pre-existing mathematical concept and the behaviour of a physical thing. The pre-existence of the mathematical concept is essential. Trying to fit a mathematical model on a pre-existing physical behaviour has a different name: scientific discovery. Where does this leave computationalism, the theory that, broadly speaking, the brain is a computer and the mind a computation? I think it exposes its vacuity. A “computational theory of the mind” is along the same lines as having “a computational theory of the weather”, in which the earth is seen as a computer and the weather what it computes. It’s a metaphor that gives the illusion of understanding, without telling us whether it will rain tomorrow or not — akin to using élan vital to “explain” life or scansoriality to “explain” a propensity for climbing.

But perhaps I am being harsh. “Computation” is vacuous only when used as an explanation, as a first-class scientific concept. But it is not vacuous as an abstract concept or as a characterisation of a style of mathematical model or as a paradigm. The parallel between energy/work as used in physics and information/computation as used in cognitive science is quite striking. In his Lectures on Physics, Feynman has this to say about energy and energy conservation principles [PDF]. I think similar considerations apply to information and computation:

[Conservation of Energy] is a most abstract idea, because it is a mathematical principle; it says that there is a numerical quantity which does not change when something happens. It is not a description of a mechanism, or anything concrete; it is just a strange fact that we can calculate some number and when we finish watching nature go through her tricks and calculate the number again, it is the same. [...]

It is important to realize that in physics today, we have no knowledge of what energy is. We do not have a picture that energy comes in little blobs of a definite amount. It is not that way. However, there are formulas for calculating some numerical quantity, and when we add it all together it gives “28”—always the same number. It is an abstract thing in that it does not tell us the mechanism or the reasons for the various formulas.

For a good survey of the problem of defining computation I recommend Piccinini’s chapter on Computationalism in the Oxford Handbook of Philosophy and Cognitive Science.

Posted in anticomputationalism | 2 Comments

Leaving the Nest: variables with interleaving scopes

A paper [PDF] with this title, co-written with Jamie Gabbay and Daniela Petrişan, will appear in Computer Science Logic (CSL 2015). Actually the full title is Leaving the Nest: Nominal techniques for variables with interleaving scopes. The “nominal techniques” part is in reference to a new mathematics of names pioneered by Gabbay and Pitts in the late 1990s. In this post I will give a very brief and informal taster/introduction to nominal techniques then tell you why our paper is actually quite interesting.

Names are a peculiar data structure. There are only two things you can do with them. Given two names you can compare them, and given a finite set of names you can find a fresh name relative to them, i.e. a name that is not in that set. There is a perhaps subtle distinction to be made between names and concrete representations of names, which can be strings or integers. Concrete representations might allow other operations as well, such as concatenation or addition, but qua names they only have equality testing and fresh generation.
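
A minimal OCaml sketch of this interface might look as follows; integers are one possible concrete representation, but the signature hides them, so nothing beyond equality testing and fresh generation leaks through (the module and function names are my own):

module Name : sig
  type t
  val equal : t -> t -> bool
  val fresh : unit -> t  (* fresh with respect to all names generated so far *)
end = struct
  type t = int
  let equal = ( = )
  let counter = ref 0
  let fresh () = incr counter; !counter
end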

Because there are only very few things we can do with names and to names, nominal structures, i.e. data structures incorporating names, have an interesting property called equivariance, which means informally that all meaningful properties of such structures are preserved by name permutations. What does this really mean? Consider the following OCaml program:

let x = ref () in let y = ref () in x == y;;

A new unit reference, a pointer, is created and assigned to x, then another one to y, then they are compared for equality. This little program is always equivalent to false. No matter what pointer-values are returned by the memory management runtime of OCaml, they will always be different. The actual value of the pointers qua names must be irrelevant! This is why the invariance is under permutations: a permutation can change names, but it will never map different names to the same name. This simple and elegant principle leads to a powerful and elegant theory of names, about which you can read in Andy Pitts’s excellent book.

Nominal techniques really come into their own when dealing with syntax which uses locally-scoped names, such as that of the lambda calculus (e.g. λx.x) or predicate logic (e.g. ∀x.x=x). In a syntactic context, the principle of equivariance leads to an equivalence of terms up-to-renaming, commonly known as alpha-equivalence. So terms such as λx.x and λy.y are equivalent. Moreover, this equivalence is also a congruence (it is preserved by syntax) so that we can work with equivalence classes under alpha-equivalence instead. This is often done informally in the study of syntax, but a solid mathematical understanding of names is required to ensure that recursive definitions over nominal structures — i.e. something as banal as substitution — are actually well defined.

A common theme in the literature, motivated by syntax, has been the use of names in inductively-defined structures, where name abstraction is a term-forming operation. By definition, these names are always well scoped, which means that their scopes are well nested. However, names can be used in a different way, not in the language itself but in the semantics, at the meta level, to mediate access to resources. Memory locations, file handles, network sockets, etc. are all examples of such resources which, much like names, can only be tested for equality and created: they are names. However, unlike syntactic names they can be not just created but also destroyed. And the pattern of creation and destruction doesn’t need to be well nested anymore.

For example, this is a sequence where the scopes of a, b, c are interleaved (we mark name-creation with ⟨ and name-destruction with ⟩): ⟨a ⟨b a⟩ ⟨c b⟩ c⟩. The name b is introduced after a is created, and when a is destroyed b is still “active”. With the (obvious) intuition of scope and alpha-equivalence, we should expect the sequence above to be equivalent to ⟨a ⟨c a⟩ ⟨b c⟩ b⟩.

In our paper we give a mathematical treatment of this phenomenon, focussing on what we believe to be the most interesting aspects: What does it mean for a variable to be free or bound in this setting? What is the right notion of alpha-equivalence when scopes are not well nested? How can we define proper functions on sequences that use names with interleaved scopes? We give syntactic, relational and algebraic models for alpha equivalence which coincide — this is certainly reassuring that we gave the right definitions.

An example which illustrates alpha equivalence and name capture quite well is interleaving of sequences with names. For example, what happens if we want to interleave the sequence ⟨a a⟩ with itself? The standard definition of sequence interleaving gives us two sequences: ⟨a a⟩ ⟨a a⟩ and ⟨a ⟨a a⟩ a⟩. However, taking into account the fact that ⟨a a⟩ = ⟨b b⟩ this should be the same as interleaving those two, which seems to give us six sequences: ⟨a a⟩ ⟨b b⟩, ⟨a ⟨b b⟩ a⟩, ⟨a ⟨b a⟩ b⟩, ⟨b b⟩ ⟨a a⟩, ⟨b ⟨a a⟩ b⟩ and ⟨b ⟨a b⟩ a⟩, but quotienting by alpha-equivalence (in the obvious way) we are left with three: ⟨a a⟩ ⟨b b⟩, ⟨a ⟨b b⟩ a⟩, ⟨a ⟨b a⟩ b⟩, which is still not the same as two. What is going on here?

⟨a a⟩ ⟨b b⟩ = ⟨a a⟩ ⟨a a⟩  – we can re-use bound variable ‘a’ instead of ‘b’
⟨a ⟨b b⟩ a⟩ = ⟨a ⟨a a⟩ a⟩  – we can reuse bound variable ‘a’ instead of ‘b’, noting that there is some name-shadowing going on

So we are left with ⟨a ⟨b a⟩ b⟩, which is the only one which genuinely requires two names in a situation akin to variable capture in the lambda calculus. With a proper notion of alpha-equivalence our definition of interleaving just works and we can skip such tedious analyses.

This has a neat diagrammatic representation — if you have read this blog you may have guessed that I like diagrams! The sequence ⟨a a⟩ represents the diagram , with the two endpoints delimiting the scope. This is nice because the name ‘a’ no longer appears, so it is “obvious” that

  • ⟨a a⟩ ⟨b b⟩ = ⟨a a⟩ ⟨a a⟩ = 
  • ⟨a ⟨b b⟩ a⟩ = ⟨a ⟨a a⟩ a⟩ = 
  • ⟨a ⟨b a⟩ b⟩ = 

In terms of more serious applications, formalisations of game semantics and other trace semantics should now further improve.

Before I sign off, a shout out to my student Bertie Wheen for proving some of the syntactic results in Agda.

Posted in programming languages | Leave a comment

Inventing an algebraic knot theory for eight year olds (V)

Towards Equations

We finally reconvened our maths club after a mid-term break (children not available) combined with the university exams period (me not available). I correctly assumed nobody remembered anything we talked about before. At least not in detail. Also, we had a significant number of new children joining our group, who had no idea what we had been up to.

In the previous sessions we discovered and invented our notation, which consisted of three combinators (X, C, I) and three operations: composition (“gluing”), tensoring (“stacking”) and dual (“flipping”).

 

 

This time we just took this as a given, the starting point of our development. If the previous sessions were analytic, trying to decompose knots into basic constants and operations, this session was synthetic, starting with formulas and drawing them up and seeing what we get.

The first formula we contemplated was C;C*, which everyone immediately identified as a circle. The knot-theoretic name for this is the unknot, a word which caused a certain amount of amusement. One student cleverly observed, unprompted, that (C;C*)* would have been the same thing, also an unknot, and we had a short discussion about how symmetry of a knot K boils down to K=K*. That was quite rewarding and fun and prepared the ground, somewhat serendipitously, for the next formula C;X;C*:

The same student rashly pointed out that this shape is also symmetric, i.e. C;X;C*=(C;X;C*)*. Is this true or false, qua knots?

The ensuing discussion was the most interesting and important part of the session. Initially the students were mostly supportive of the validity of the equation, then someone pointed out that X≠X* so, they reckoned, it must be that C;X;C*≠(C;X;C*)*.

Indeed, the shapes are not identical. The opinion of the group swung, so they all agreed that the equation is not valid and the two shapes (original and flipped version) are not equal.

But aren’t they? What do you think? We are talking knots here, and two knots should be equal when they are topologically equal, i.e. they can be deformed into each other without any tearing and gluing. So in fact they are equal because:

So here we have a genuinely interesting equation, where two formulae are equal not because the shapes they correspond to are geometrically equal (i.e. “identical”) but because they are topologically equal, i.e. there is a smooth deformation between them. Also note that the deformation must be in 3D by “twisting” the left (or right) side of the unknot — a 2D transformation would make the wire “pinch” which may or may not be allowable, depending on how we precisely set things up (if we were very serious and careful about what we are doing).

The point was quickly grasped and we moved on. It is a subtle point which I think can be damaged by over-explaining if the initial intuitions are in the right place, which they seemed to be.

Next we looked at a much more complicated formula that took a long time to draw correctly: C;(I⨂C⨂I);(X⨂X*);(I⨂C*⨂I);C*.

As you may see, if we draw this as a knot and tidy things up a bit we actually have this:

Which is really just two unknots — they don’t interlock so they can be pulled apart with no tearing and gluing, i.e. topologically. The formula for this is the much simpler C;C*;C;C*!

This point was not easy to make, and it seemed difficult for the students to appreciate it. By this time they were quite tired and the drawing of the more complex formulation of the two unknots diagram drained them of energy and patience — they made quite a few errors and had to restart several times. Showing them a much easier way to achieve the same knot diagram almost made them angry!

I hope to use this as a motivation for the future: how to simplify a formula before we draw it. Until now we have looked at equality semantically, using the underlying model. In due course, as students become more used to manipulating knot diagrams using formulas, I will start introducing some general equational properties.

Posted in teaching | Leave a comment