Leaving the Nest: variables with interleaving scopes

A paper [PDF] with this title, co-written with Jamie Gabbay and Daniela Petrişan, will appear in Computer Science Logic (CSL 2015). Actually the full title is Leaving the Nest: Nominal techniques for variables with interleaving scopes. The “nominal techniques” part is in reference to a new mathematics of names pioneered by Gabbay and Pitts in the late 1990s. In this post I will give a very brief and informal taster/introduction to nominal techniques then tell you why our paper is actually quite interesting.

Names are a peculiar data structure. There are only two things you can do with them. Given two names you can compare them, and given a finite set of names you can find a fresh name relative to them, i.e. a name that is not in that set. There is a perhaps subtle distinction to be made between names and concrete representation of names, which can be strings or integers. Concrete representation might allow other operations as well, such as concatenation or addition, but qua names they only have equality testing and fresh generation.

Because there are only very few things we can do with names and to names, nominal structures, i.e. data structures incorporating names have an interesting property called equivariance, which means informally that all meaningful properties of such structures are preserved by name permutations. What does this really mean? Consider the following OCaml program:

let x = ref () in let y = ref () in x == y;;

A new unit reference, a pointer is created and assigned to x, then another one to y, then they are compared for equality. This little program is always equivalent to false. No matter what pointer-values are returned by the memory management runtime of OCaml, they will always be different. The actual value of the pointers qua names must be irrelevant! This is why invariance is under permutation: they can change names but they will not map different names to equal names.  This simple and elegant principle leads to a powerful and elegant theory of names about which you can read in Andy Pitt’s excellent book.

Nominal techniques really come into their own when dealing with syntax which uses locally-scoped names, such that of the lambda calculus (e.g. λx.x) or predicate logic (e.g. ∀x.x=x). In a syntactic context, the principle of equivariance leads to an equivalence of terms up-to-renaming, commonly known as alpha-equivalence. So terms such as λx.x and λy.y are equivalent. Moreover, this equivalence is also a congruence (is preserved by syntax) so that we can work with equivalence classes under alpha-equivalence instead. This is done often informally in the study of syntax, but a solid mathematical understanding of names is actually required to ensure that recursive definitions over nominal structures — i.e. something as banal as substitution — are actually well defined.

A common theme in the literature, motivated by syntax, has been the use of names in inductively-defined structures, where name abstraction is a term-forming operation. By definition, these names are always well scoped, which means that their scopes are well nested. However, names can be used in a different way, not in the language itself but in the semantics, at the meta level, to mediate access to resources. Memory locations, file handlers, network sockets, etc.  are all examples of such resources which, much like names, can only be tested for equality and created: they are names. However, unlike syntactic names they can be not just created but also destroyed. And the pattern of creation and destruction doesn’t need to be well nested anymore.

For example, this is a sequence where the scopes of a, b, c are interleaved (we mark name-creation with ⟨ and name-destruction with ⟩): ⟨a ⟨b a⟩ ⟨c b⟩ c⟩. The name b is introduced after a is created and when is destroyed it is still “active”. With the (obvious) intuition of scope and alpha-equivalence, we should expect the sequence above to be equivalent to ⟨a ⟨c a⟩ ⟨b c⟩ b⟩.

In our paper we give a mathematical treatment of this phenomenon, focussing on what we believe to be the most interesting aspects: What does it mean for a variable to be free or bound in this setting? What is the right notion of alpha-equivalence when scopes are not well nested? How can we define proper functions on sequences that use names with interleaved scopes? We give syntactic, relational and algebraic models for alpha equivalence which coincide — this is certainly reassuring that we gave the right definitions.

An example which illustrates alpha equivalence and name capture quite well is interleaving of sequences with names. For example, what happens if we want to interleave the sequence ⟨a a⟩ with itself? The standard definition of sequence interleaving gives us two sequences: ⟨a a⟩ ⟨a a⟩ and ⟨a ⟨a a⟩ a⟩. However, taking into account the fact that ⟨a a⟩ = ⟨b b⟩ this should be the same as interleaving those two, which seems to give us six sequences: ⟨a a⟩ ⟨b b⟩, ⟨a ⟨b b⟩ a⟩, ⟨a ⟨b a⟩ b⟩, ⟨b b⟩ ⟨a a⟩, ⟨b ⟨a a⟩ b⟩ and ⟨b ⟨a b⟩ a⟩, but quotienting by alpha-equivalence (in the obvious way) we are left with three: ⟨a a⟩ ⟨b b⟩, ⟨a ⟨b b⟩ a⟩, ⟨a ⟨b a⟩ b⟩, which is still not the same as two. What is going on here?

⟨a a⟩ ⟨b b⟩ = ⟨a a⟩ ⟨a a⟩  – we can re-use bound variable ‘a’ instead of ‘b’
⟨a ⟨b b⟩ a⟩ = ⟨a ⟨a a⟩ a⟩  – we can reuse bound variable ‘a’ instead of ‘b’, noting that there is some name-shadowing going on

So we are left with ⟨a ⟨b a⟩ b⟩, which is the only one which genuinely requires two names in a situation akin to variable capture in the lambda calculus. With a proper notion of alpha-equivalence our definition of interleaving just works and we can skip such tedious analyses.

This has a neat diagrammatic representation — if you have read this blog you may have guessed that I like diagrams! The sequence ⟨a a⟩ represents the diagram , with the two endpoints delimiting the scope. This is nice because the name ‘a’ no longer appears, so it is “obvious” that

  • ⟨a a⟩ ⟨b b⟩ = ⟨a a⟩ ⟨a a⟩ = 
  • ⟨a ⟨b b⟩ a⟩ = ⟨a ⟨a a⟩ a⟩ = 
  • ⟨a ⟨b a⟩ b⟩ = 

In terms of more serious applications, formalisations of game semantics and other trace semantics should now further improve.

Before I sign off, a shout out to my student Bertie Wheen for proving some of the syntactic results in Agda.

Posted in programming languages | 1 Comment

Inventing an algebraic knot theory for eight year olds (V)

Towards Equations

We finally reconvened our maths club after a mid-term break (children not available) combined with the university exams period (me not available). I correctly assumed nobody remembered anything we talked about before. At least not in detail. Also, we had a significant number of new children joining our group, who had no idea about what we have been up to.

In the previous sessions we discovered and invented our notation, which consisted of three combinators (X, C, I) and three operations: composition (“gluing”), tensoring (“stacking”) and dual (“flipping”).



This time we just took this as a given, the starting point of our development. If the previous sessions were analytic, trying to decompose knots into basic constants and operations, this session was synthetic, starting with formulas and drawing them up and seeing what we get.

The first formula we contemplated was C;C*, which everyone immediately identified as a circle. The knot-theoretic name for this is the unknot, word which caused a certain amount of amusement. One student cleverly observed, unprompted, that (C;C*)* would have been the same thing, also an unknot, and we had a short discussion about how symmetry of a knot K boils down to K=K*. That was quite rewarding and fun and prepared the ground, somewhat serendipitously, for the next formula C;X;C*:

The same student rashly pointed out that this shape is also symmetric, i.e. C;X;C*=(C;X;C*)*. Is this true or false, qua knots?

The ensuing discussion was the most interesting and important part of the session. Initially the students were in majority supportive of the validity of the equation, then someone pointed out that X≠X* so, they reckoned, it must be that

Indeed, the shapes are not identical. The opinion of the group swung so they all agreed that the equation is not valid and the two shapes (original and flipped version) are not equal.

But aren’t they? What do you think? We are talking knots here, and two knots should be equal when they are topologically equal, i.e. they can be deformed into each other without any tearing and gluing. So in fact they are equal because:

So here we have a genuinely interesting equation, where two formulae are equal not because the shapes they correspond to are geometrically equal (i.e. “identical”) but because they are topologically equal, i.e. there is a smooth deformation between them. Also note that the deformation must be in 3D by “twisting” the left (or right) side of the unknot — a 2D transformation would make the wire “pinch” which may or may not be allowable, depending on how we precisely set things up (if we were very serious and careful about what we are doing).

The point was quickly grasped and we moved on. It is a subtle point which I think can be damaged by over-explaining if the initial intuitions are in the right place, which they seemed to be.

Next we looked at a much more complicated formula that took a long time to draw correctly: C;(I⨂C⨂I);(X⨂X*);(I⨂C*⨂I);C*.

As you may see, if we are drawing this as a knot and tidy things up a bit we actually have this:

Which is really just two unknots — they don’t interlock so they can be pulled apart with no tearing and gluing, i.e. topologically. The formula for this is the much simpler C;C*;C;C*!

This point was not easy to make, and it seemed difficult for the students to appreciate it. By this time they were quite tired and the drawing of the more complex formulation of the two unknots diagram drained them of energy and patience — they made quite a few errors and had to restart several times. Showing them a much easier way to achieve the same knot diagram almost made them angry!

I hope to use this as a motivation for the future: how to simplify a formula before we draw it. Until now we have looked at equality semantically, using the underlying model. In due course, as students become more used with manipulating knot diagrams using formulas, I will start introducing some general equational properties.

Posted in teaching | Leave a comment

Types: computation vs. interaction

Type Theory is at the moment the workhorse of programming language theory. It is an elegant, clever and incredibly useful framework for thinking about programming (for full disclosure, I have written several papers with the word “type” in the title). In this post I will not elaborate on the positives of Types but I will try to find at what point they become less useful. The argument below should be accessible to any programmer. To the theorist it will say nothing new, but it will say things that are not said often enough or clearly enough — I think. I hope it will not be too impolite. 

For me, this is an important exercise, because Type Theory is a collection of beautiful and seductive ideas, and as such it has the potential to get one carried away to extremes — like Marxism or Libertarianism. The beauty of Type Theory is not a pragmatic beauty. One is not (yet) in awe at what has been achieved with Type Theory. C and Javascript have this kind of pragmatic beauty, akin to Democratic Capitalism or Social Democracy. They succeed despite being messy and riddled with internal inconsistencies. They emerged, were not designed — or at least not in a principled way. Type Theory on the other hand has a beauty arising out of internal coherence, elegance and simplicity. The messy, illogical stuff is frowned upon. But can it be avoided? 

What does a computer do when it does not compute?

Lets ask a silly question first: What does a computer do? Like all simple questions, this one is steeped in ambiguity, so a simple answer cannot follow absent clarifications. What is a computer? Here there are two answers, depending whether we mean a conceptual computer, so the answer could be A Turing machine! or A von Neuman architecture! These are fine answers. But if you mean a physical computer you could say A MacBook! or An Android phone! These are also fine answers. But the language is already playing tricks on us. A conceptual computer computes, but a physical computer computes and interacts with its physical environment. 

A physical computer that merely computes will still interact with the environment, but in an unsatisfactory way: by consuming energy and warming up. For us, as users of physical computers, the computation is the boring part. Computation is what happens when a spinning wheel appears on the screen and the device becomes unresponsive. We don’t like computers when they compute, we like them when they interact. We consider them to do useful stuff when they drive physical devices. When they draw. When they sing and dance. When they forward that packet to the right node on the network. When they retrieve that interesting nugget of information from the TB hard-drive. 

Computation v. interaction provides some creative tension in the world of programming languages. Programming languages come from two directions. Some of them were birthed by logicians who wanted to get a grip on the foundations of mathematics, and who understood that computation played an important role there. This was the genesis of the lambda calculus. Modern programming languages that hail from this tradition include Haskell and the ML family (OCaml, SML, F#). Other programming languages were birthed by engineers who made physical computers and they needed to write programs for them. Fortran and C came about this way. And, of course, nowadays the two families seem to converge. Computational languages (aka “functional”) reluctantly allow interaction-oriented features (“imperative”) and all languages seem to boast of “lambdas” and “closures”, hallmarks of computational/functional languages.

Much energy has been spent proposing and defending precise definition of “functional” (or “declarative”) versus “imperative” languages, so lets not go there. In a futile attempt to avoid this controversy I will say “computational” languages for those languages and language features that focus on computation, and “interaction” languages for those languages and features that focus on interacting with the environment.

What are Types for?

The concept of types, broadly speaking, is essentially computational. It was introduced by Bertrand Russell to avoid certain paradoxes in the foundations of mathematics. In programming languages, types are syntactic guarantees that certain errors will not happen. For example

  • F is code defining a function that takes an integer as an argument and produces an integer as a result.
  • M is code computing an integer.
  • Therefore F(M) is code in which the function F is guaranteed to run correctly, as it receives an integer according to expectations, and which produces an integer as a result.

So just like in Principia Mathematica, types can be used to prevent certain kinds of errors. This is very reasonable, although there are some strings attached. Because type systems are defined compositionally on the syntax they will rule out some programs that would not cause run-time problems. They are sound (no faulty programs accepted) but not complete (some non-faulty programs are rejected). For example, here is a failing OCaml program:

# if true then 0 else 'a';;
Error: This expression has type char but an expression was expected of type int

The first error is caused because in the process of calculating the type of the program the two branches of the if statement must have compatible types, even though the ‘else’ branch is irrelevant dead code. Here is how Python handles this problem:

>>> 0 if 1 else 'a'

The type is not checked and the program runs alright. Everyone happy?

People who think they understand something about the theory of programming languages, including me, tend to agree that what Python does is wrong. Of course, millions of happy Python users disagree and get things done with it on a daily basis. Such is life — if only they listened to the experts.

Is Curry-Howard an isomorphism?

But is this all that there is to be said about types, that they inject a bit of sanity and discipline in programming languages? No, there is more to the story. Enter Curry-Howard. These two logicians noticed a quite fascinating analogy between type systems of programming languages and logics. You can formulate the types of a programming language using logical connectives: A→B can be read either as “A implies B” or “function of argument A and result B”. And then, the calculations required to determine the type of a program, where the type is written as some formula P, are closely related to the calculations required to prove that the logical formula P is true. This is truly an amazing, fascinating analogy and it holds remarkably well for a broad range of logics. Via the correspondence we can derive some truly elegant programming languages, for example Agda, truly a Shangri-La — or maybe a land of the Lotophagi, depending on one’s taste — of types.

But is it really all that surprising that a rich formal system, such as a logic’s proof system, can compute? Any complex enough formal system can compute, see e.g. rewrite systems. Without taking anything away from the elegance of the approach, the fact that it works should not be all that shocking. But the Curry-Howard method teaches us how to awaken the computation that is lurking within any logical system. Only a fool wouldn’t see the beauty and the usefulness of this.

But what about the opposite direction, can we always associate a type system with a “computer program”? That is a harder question because all sorts of things can be used to compute, some of them quite extravagant. Neural networks are an old example, as old as conventional computers. If neural networks compute, what is a “program”? Can we associate a type system with it? Not that I know of. The very question seems strange. How about chemical or DNA-based computing? Types are intrinsically associated with composition, and if a computing device is not made out of components, hoping for a meaningful type system to emerge seems to me unjustified.

To many programmers the notion of program seems to be more basic than that of type, and the two are not intrinsically connected. If a program is a proof, then what does type inference do? We already have the proof, but we don’t know if it is a proof of something? This is a curious thing, isn’t it?


To start questioning the utility, or the possibility, of a type system we don’t need to go as far as computers made of slime. Whenever we need to employ programming languages which are computation-oriented (“functional”) to interact with the real world there is a general feeling of unease. What is the right way to do it? At one end of the spectrum we have ML-languages which simply ignore, at the level of the type system, the interaction aspects. A program of type int->int will take an int, produce an int, and may or may not interact with the world. This is reasonable but to many it seems like an opportunity lost. Why not use the type system to say something about interaction as well?

Haskell does that, by using a type-theoretic device called a monad. Each monad indicates one particular kind of interaction, such as using memory or perform input-output. I am not a fan of monads, I confess. Monads, introduced by Eugenio Moggi, can be used as a language mechanism, the way Haskell uses them, or as a meta-language construct to formulate semantic models of languages that combine computation and interaction, a la ML. The latter is unquestionably useful, a matter of mathematical necessity. The former, I am not sure. Some monads (I/O) package real interaction, some have nothing to do with interaction (the “maybe” monad) and others are mock interactions (the “state” monad). The first one is reasonable, the second one is also reasonable but for completely different reasons (which I suppose is already becoming reasonable), and the third one is unreasonable on the face of it. It is not the same thing to program with state or as if there was state available, for the same reasons that driving a car simulator and a real car is not quite the same thing.

In the research lit there are many other effect systems. They are clever pieces of mathematical thinking, and very elegant within the fairly restricted bounds of what they can do. But to me, at the heart of it, they seem like languages of mock interaction rather than real interaction. They are reasonable to use as meta-languages, to describe certain interactions, but not as programming languages where the interaction should be real.

Compositionality, but at a cost

Compositionality, another word for plug-and-play, is a very pleasant property of systems. It guarantees that if the parts have some good property, the whole will have a good property. It is nice, but it comes at a cost. For example, it is a fact that global optimisations are always more efficient than composing local optimisations and global analyses are more precise than composing local analyses.

If we want to compose, we need to push all the properties at the interface and ignore the innards. This requires abstraction. This is a cost. For example, a language like OCaml won’t let you write a heterogeneously branched if statement because a type cannot be assigned to it.

if x then 0 else 'a'

For the same reason it won’t let you create a heterogeneous list

# [0; 'a'; fun x -> x + 1];;
Error: This expression has type char but an expression was expected of type int

However, unlike heterogeneous if statements, which are silly, heterogeneous lists could be useful. Here is a fairly exhaustive list of how various languages handle heterogeneous lists. In fact you can program heterogeneous lists in dependently typed languages, but it’s unreasonably complicated. Python makes no complaints:

>>> g = lambda x: x**2
>>> [1, 'a', "hello", g] [1, 'a', 'hello', <function <lambda> at 0x103e4aed8>]

To me this is one methodological weakness of type theory, the commitment of having types for all terms in the language. Why is that? Types are designed to facilitate composition, but the natural unit for program composition is not the term but the function, or even the module.  It makes sense to assign types to modules — it is methodologically consistent. But how we calculate these types could conceivably be more flexible, allowing a global perspective within the unit of composition. Types, as properties at the interface could be calculated using the entire arsenal of (decidable!) logics and static analyses. I understand it would make things like language spec and error reporting more difficult, but it should make the programming easier, if the success of languages such as Python is teaching us anything.

Compositionality v. interaction

To me it seems that the abstraction penalty induced by compositionality plays an important role in the questionable suitability of type theories for capturing interaction. Interaction with real-life systems means interacting with messy systems. In order to interact with them we need an interface language that is suitable both for the programming language and for the real-life physical system. This is not impossible, but only if we don’t require the interaction language to model the physical system itself.

Hardware design is an interesting example, as it works on two levels of abstraction. At the high level we have hardware description languages (HDL) which deal with the logical description of the circuit — the language of the Booleans, sometimes enhanced with extra values such as “undefined”. In this language the order of events can be described, but not precise timings. In this language combinatorial feedback loops, are almost always banned because the precise behaviour depends on extra-logical factors, such as precise timings. Yet without combinatorial feedback loops flip-flops and other memory elements cannot be defined.

SR flip-flop

SR flip-flop

The result is to provide memory elements as black box library components, defined and tested in a different, lower-level, language. But at the interface the two levels of abstraction are reconciled and composition can occur.

In a follow-on post I will explain how a similar approach to types can be taken in the case of programming languages.

Posted in programming languages | 1 Comment

Inventing an algebraic knot theory for eight year olds (IV)


Here is a quote that captures the ethos against which our maths club militates:

We take other men’s knowledge and opinions upon trust; which is an idle and superficial learning. We must make them our own. We are just like a man who, needing fire, went to a neighbor’s house to fetch it, and finding a very good one there, sat down to warm himself without remembering to carry any back home. What good does it do us to have our belly full of meat if it is not digested, if it is not transformed into us, if it does not nourish and support us?          – Montagne, “Of Pedantry”.

This is how mathematics is usually delivered to students: knowledge upon trust. And it is knowledge of a particularly mystical variety, where magical symbols come together in incomprehensible ways. This is the myth we need to dispel, this is the knowledge we need our kids to digest. Sometimes Algebra can be nothing more than a way to write down knots and their properties.

Last session, using the notation we developed together, went well. One unexpected hick-up was the fact that my students had not seen parentheses before, but we took that in our stride. They also hadn’t use variables before, so most of the examples were concrete. But the little use of variables we made was not confusing.

I even introduced some proper maths terminology, the Big Maths Words. First I asked them to give names to the composition operation, and they came up with some reasonable suggestions such as gluing, linking or joining. When I told them the Big Math Word was composition they liked it — it was “like music”, where you “compose” a song from notes. Indeed! They also had reasonable suggestions for the tensor, things such as stacking or building, but they didn’t much like the Big Math Word tensor. I don’t like it very much myself, I confess.

For the first 20 minutes we went over the new and tidied up notation. The rest of  the time, 30 minutes or so, we worked out examples of going from a drawn knot to the formula. It is not easy, in fact it is quite a challenge (try the exercises) so they often got things wrong. One pervasive error was getting confused by repeated tensoring (“stacking”) of tangles, K⨂K’⨂K” which often was written as K⨂K’K” — an interesting error.

Tomorrow’s session we will finally reach equations. We have already seen the equation I*=I, which was noticed while developing the notation. First we will start with similar coherence equations, which basically mean that two notations denote the same knot. Things like associativity of composition or tensor are typical coherence equations. But because children have no appreciation of parantheses and operator precedence associativity is perhaps not the best example. Functoriality of the tensor is much more interesting.

  1. Draw the tangle for (C◦C*)⨂(C◦C*).
  2. Draw the tangle for (C⨂C)◦(C*⨂C*).
In both cases the result is a couple of loops. The difference is in how we put the loops together and not in what the result is:
Functoriality means that no matter what tangles H1, H2, H3, H4 (for some reason the kids chose H as the standard variable for knots) the following two tangles, if they can be constructed, are always equal:

The unit and the co-unit also suggest compact-closed-like coherence axioms, which have a more topological flavour. Try this knot: (I⨂C)◦(C*⨂I). It looks like this:

But this is only a wire with some irrelevant bends. We have the equation

(I⨂C)◦(C*⨂I) = I

There is another similar equation, where the shape is like an ‘S’ rather than like a ‘Z’. Can you guess it?

The trove of equations is deep for this algebraic theory of knots and tangles and there is plenty of scope for children to discover equations once they understand what they represent. In addition to compact-closed we can also find equations from traced, braided and closed-monoidal categories.

But most importantly, I think the point is not to just give such equations to the students in order for them to develop skill in calculating knot equivalences using equational reasoning, so I won’t do that. That’s the kind of mystical maths I want to avoid. What is important are these two points:

  1. Understanding the idea of equational reasoning, that the same structure can be expressed in several equivalent ways, and that equations are the mathematical way to show this.
  2. Exploring the space of equations. Ideally, after a while the issues of soundness and completeness should emerge naturally.

Now, a brief comment to the cognoscenti. The categorical structure of this algebra of tangles is not an obvious one. We are using both the unit and the co-unit of compact closed categories, but our category is braided rather than symmetric. However, compact closed categories are symmetric so we are not in an obvious setting. The interaction of braiding and “compact closure” (informally speaking) gives equations like this:

C◦X◦C* = C◦C*

which correspond to this topological isomorphism of tangles:

These are two ways to describing the unknot. The category is clearly non-degenerate because we seem to be able to describe any knot. So combinatorial completeness seems an achievable result. Equational completeness (i.e. any equivalent knots can be proved to be so)  however seems like a challenging and interesting question!

In terms of related work, this seems similar enough: Peter Freyd and David Yetter, Braided compact monoidal categories with applications to low dimensional topology, Adv. Math. 77, 156-182, 1989.

If you are interested in another diagrammatic explanation of algebra also have a look at Pawel Sobocinksi’s entertaining and informative blog.

Posted in teaching | 1 Comment

Inventing an algebraic knot theory for eight year olds (III)

Unlike previous posts, this one precedes the math club meeting. The reason is that we need to tighten some loose ends in our emerging knot theory notation. The first two meetings (I and II) were all about exploring knots and ways to describe them using notations, and we made some impressive strides. But at this stage I need to exercise some authority and steer them firmly in the right direction:

  • We are going to use I rather than 0 (zero) to denote the identity of composition. Even though 0 is a neutral element for addition, as correctly spotted by the children, using it as the neutral element for composition is not idiomatic. 1 (one) is more standard, for various reason I will not get into here.
  • Using a fraction-like notation for “parallel” (tensorial) composition is also quite unidiomatic, although it is intuitive. We shall introduce the standard ⨂ notation instead.
  • We will stick with a generic duality operation _* so that our “unit” and “counit” are C and C* rather than C and D as some of the children insisted on.

These are of course small matters which we could put up with, at the expense of becoming incomprehensible to the rest of the world. The more important thing is that the interplay between “sequential” (“functional”) composition and “parallel” (“tensorial”) composition is not very easy to get right. Look at our overhand knot:

The individual components are well identified but a nice way to put them together is unclear. There is a sequential composition at the top (X◦X◦X) but how to connect that with the C, L and C* underneath is not clear from the decomposition. X “interacts” with C and C with L but not obviously sequentially or in parallel.

The way out is to introduce types. What are types? To paraphrase John Reynolds’s succinct explanation, they are a syntactic discipline that guarantees correct composition. And this is what we need here.

We shall always give knots a type (m, n) ∈ N². The left projection represents how many “loose ends” stick out to the left and the right projection how many to the right. So this is a (4, 6)-knot:

Note the implicit restriction: we allow no loose ends to stick out from the top or from the bottom. Is this a reasonable restriction? Yes, because if a bit of string goes down or up we can always “bend” it left or right then “extend” it until it lines up nicely with the other loose ends. This topological invariance of knots should be intuitively obvious to my young students, I hope.

So the types of our basic knot parts are:

Now we can say that we only allow the composition K◦K’ for knots K:(m,n) and K:(m’, n’) if n=m’, resulting in a knot K◦K’:(m, n’). Here are two composable knots:

And this is their composition, with “composition links” emphasised:

Note that because these are “strings” the loose ends don’t need to be “geometrically” aligned. It is enough to have the right number of loose ends to be able to glue them. And this is their actual composition:

Any two knots K:(m, n) and K’:(m’, n’) can be composed in parallel in a knot K⨂K’:(m+m’, n+n’). The composition of the two knots above is:

So now everything is in place to properly describe algebraically our overhand knot:

Above we introduced -•- as a shortcut for repeated sequential composition, useful for concision. Note that we needed the identity string I not just on the bottom, but also to the left and to the right, in order to make the loose ends match properly in sequential composition. This kind of “padding” with identities will be a useful trick that will allow us to define a lot of knots compositionally!


  1. (easy) Is this the only way to describe an overhand knot? Give an alternative description.
  2. (easy) Is this the only kind of overhand knot? If no, write the definition of a different overhand knot.
  3. (medium) Write the definition of a Granny Knot and its variations: the reef, the thief and the grief knots.
  4. (medium) Write a definition for the Shoelace Knot.
  5. (hard) Is there a knot that cannot be described with our notation?
Posted in teaching | 1 Comment

Chalmers’s Digital Radio

In previous posts I gave arguments against the apparently widely held belief, at least amongst readers of this blog, that all mental states and phenomena, consciousness in particular, are reducible to computational processes in the brain: computationalism. Whether this is or isn’t the case divides the philosophical community, with famous names agreeing (Searle, Penrose) or disagreeing (Dennett, Chalmers).

This post is not another argument against computationalism but an attempt to be more scholarly. I will take a brief critical look at one of the more influential papers in the area, Chalmers’s “A Computational Foundation for the Study of Cognition“ (Journal of Cognitive Science 12:323-57, 2011). There are some things I admire about this paper. For once, it sets out quite meticulously to define what a computation is, making the important distinction between abstract computation and implemented computation. Even though as a computer scientist I find many points of disagreement in the detail of how this theory is constructed, the fact that Chalmers considers it in the first place is admirable. Many computationalist philosophers fail to make this distinction, which renders some of their arguments sloppy and implausible. But at least Chalmers answers definitively some rather silly (at least to a computer scientist) questions raised by Searle and Putnam about the relationship between computation in the abstract and the concrete. The opening sections of the paper found me nodding in agreement most of the time.

The main thrust of Chalmers’s computationalist explanation is in Section 3 which spells out his theory of “organizational invariance of mental properties”. Unlike the previous section, where technical words are used in their technical sense, here technical-sounding words are used in a rather vague and informal way. Sentences like this set off all sorts of alarm bells, as the amount of sense they make is questionable: “Causal topology can be thought of as a dynamic topology analogous to the static topology of a graph or a network.” I shall not quibble with language though but try to assign sensible and charitable meaning to the argument in order to make a sensible counterpoint.

The argument is conveniently summarised by Chalmers, so I can quote it in its entirety:

The argument for this, very briefly, is a reductio. Assume conscious experience is not organizationally invariant. Then there exist systems with the same causal topology but different conscious experiences. Let us say this is because the systems are made of different materials, such as neurons and silicon; a similar argument can be given for other sorts of differences. As the two systems have the same causal topology, we can (in principle) transform the first system into the second by making only gradual changes, such as by replacing neurons one at a time with I/O equivalent silicon chips, where the overall pattern of interaction remains the same throughout. Along the spectrum of intermediate systems, there must be two systems between which we replace less than ten percent of the system, but whose conscious experiences differ. Consider these two systems, N and S, which are identical except in that some circuit in one is neural and in the other is silicon.

The key step in the thought-experiment is to take the relevant neural circuit in N, and to install alongside it a causally isomorphic silicon back-up circuit, with a switch between the two circuits. What happens when we flip the switch? By hypothesis, the system’s conscious experiences will change: say, for purposes of illustration, from a bright red experience to a bright blue experience (or to a faded red experience, or whatever). This follows from the fact that the system after the change is a version of S, whereas before the change it is just N.

Before pointing out the logical fallacy of this argument (can you spot it?) lets travel in time, to a strange and dystopian future where in the wake of an untold cataclysm humanity will have lost some of its scientific knowledge. Among the forever lost items is all knowledge of radio communication. So when the archeologists of the future discovered a radio set in working condition it caused somewhat of a sensation. They surrendered the radio for examination to their contemporary scientists and philosophers. Moreover, unbeknownst to all, one fully automated nuclear-powered radio station, hidden among the ruins of the past, was still eternally broadcasting.

The scientists of the future figured out quickly that the radio was an electronic device, plugged it in and delicately experimented with the various knobs and buttons. The on/off button and the volume knob where easy to figure out. Their initial theory was that they discovered a white-noise machine. The philosophers of the day explained at length and convincingly why ancient folk might have wanted to build such a machine. But their speculations where quickly abandoned when the scientists figured out that the most mysterious knob, which we call the dial, had a function after all: If you point it to a particular number on the scale the device starts producing music instead of noise. Being able to listen to what was probably the sacred music of the ancient culture caused widespread fascination. Historians established with some confidence that Madonna was none other than the mythical figure revered as the mother of God in the ancient religion of Christianity.

This was a most interesting discovery not only culturally but also technologically. In the dystopian future digital devices such as computers and digital music players were still manufactured. So there was a solid understanding of music-producing electronic devices. The scientists figured out rather quickly that the radio was not digital, but analog. However, its input-output behaviour was precisely that of a digital music player, so they assumed it was some kind of analog equivalent of a digital music player. They even figured out what the dial was: a broken shuffle knob.

But they were not quite sure, and that was really annoying them. They knew that the ancient civilisation had mastered some lost technologies so they kept wondering. Was that really a music player, just analog? One of their prominent philosophers argued persuasively that ‘yes’:

All the components of the radio are electronic, but analog. So you could isolate each electronic component, interface it to the rest of the device via a kind of modem, then reimplement its functionality digitally. When two digital components are connected we can just remove the superfluous modems. So, in a step-by-step way, we can replace the entire analog device with a digital device so that we end up with a digital music player.

Persuaded, most scientists gave up further investigations into the nature of the radio. Except one well funded and talented scientist who questioned the argument and set out to actually put it in practice. Painstakingly, she figured out and replaced the analogue components of the radio with digital components — until she reached the antenna of the radio. She was shocked by the behaviour of this component. It was a very basic-looking arrangement of conductive elements which produced output out of apparently nothing.  The only plausible hypothesis was that this was the memory element of the device, which stored all the music coming out of it. But it was much too simple in structure to be a memory. Our scientist was baffled. She wrote a few research papers about the stupendous analog memory technology of the ancients, but they got a mixed reception. The tabloid magazines made quite a stir around this mysterious technology but the established research publications refused to publish such speculation.

We are not sure what happened next. Did our scientist give up? Did she eventually discover electromagnetic waves and unlocked the true mystery of the radio? Most likely her lack of publications resulted in her losing her research funding.

Lets return to Chalmers now. His recipe for building a silicone computer out of a brain is similar to the above recipe for building a digital music player out of a radio set. The fallacy of the argument is called begging the question: when you assume that all brain components can be replaced with little silicone computers you already assume that all relevant functionality of that brain component is of a computational nature. But this is what the argument set out to prove in the first place, that the activity of the brain is of a computational nature.

If not computation, then what else is there in the brain that could realize mental states, in particular consciousness? Perhaps not radio-waves either. My money is on the theory that we don’t have the faintest clue. Just like our talented scientist from the future could not imagine radio waves, so we cannot understand what is going on, really, in the brain. Just like her, we understand a lot about various phenomena in the brain — except for the essence of how and why it works at a fundamental level. Our attempts to explain it using whatever concepts we have at hand will be, I have little doubt, as laughable as the miasmatic theory of disease or the phlogistonic theory of combustions. I admit, my theory is not a fascinating one.

Other than that, the paper is nice and readable. With this (fallacious) argument in place and its (mistaken) conclusion reached everything else fits neatly together. Just don’t use it as a recipe if you are trying to build a digital radio.

Posted in anticomputationalism, armchair philosophy | Leave a comment

Inventing a knot theory for eight year olds (II)

So I am running a math club for 8 year olds and we are reinventing (an) algebraic knot theory. We are not trying to reinvent existing knot theory, just to make a journey of intellectual and mathematical discovery. In the first meeting I gently shocked the children by showing them that mathematics is not only about numbers, but about patterns in general — such as knots. Being only 8 they are very theatrical in expressing emotion. WHAT!? Maths about KNOTS!? I loved the wild face expressions and hand gestures. We spent most time making and drawing knots and concluded that drawing knots is quite hard. We needed a better notation.

A notation for knots: the knotation

One preliminary observation that I made which resonated well was that there are a lot of numbers out there but we only need 10 symbols to represent all of them: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. Of course we could only use 0 and 1, or just 1, but let us not get distracted. It is about having a nice and convenient notation not just any notation. So I drew a couple of knots on the whiteboard and I encouraged the children to try to identify the “smallest pieces the knot is made of”, where by a “piece” I meant this: we draw a small circle on the whiteboard and the “piece” is the bit of knot we see inside the circle. If our circles are small enough it is easy to see that the pieces inside the circles are quite similar. So here is an overhand knot with some parts of it circled.

  •  We notice that even if we ‘zoom in’ on a knot piece, as in the case of the red circles, the interesting knot piece ‘looks the same’:

  • We notice that if a knot piece is too large, as in the case of the blue circle, it can be clearly decomposed into smaller basic knot pieces.
  • We also notice that many knot pieces which we select here and there look quite similar, as in the case of the green circles.

I don’t think there is a clear methodology one can use in inventing a good notation for a concept or set of concepts. It is a fun and creative endeavour. It is a design exercise in which many criteria must be balanced. On the one hand we have expressiveness, which means that we can use the notation to represent a large range of concepts. But we also have elegance, which is more subjective but not entirely arbitrary. We want a small (but not too small) set of notations. We want it to be also concise. We want it to be pretty.

So we explored a while and in the end, with some careful guidance, we narrowed in on these constants:

The names C and X were suggested because of the shape of the knot piece they represent. L was short for “line”. These shapes can be flipped over and result in 3 other constants:

Here there was some disagreement between me and the kids. They were happy with the flipping operation and they were happy with X* but they didn’t like C* because it looked more like a D (without the vertical bar). Some of them insisted that we call them D and X*. L* was clearly a non-issue because it was just like L. I had to put my foot down and we settled on C, X, L and the -* operator.

This exercise also presented us with our first equation: L = L*. We didn’t insist on equations because they will become more important in a few weeks.

The final part of the hour was dedicated to developing notations for assembling knots out of these constants. There are two obvious ways to compose knot pieces, horizontally and vertically. Here is how the overhand knot can be split into basic components:

How do we put them together? The bits at the top can be quite clearly written as X⋅X⋅X. The bit below seemed like C⋅C*. And at the bottom we have L. Because the bit at the bottom is quite long many kids wrote it as L⋅L or L⋅L⋅L but they quickly realised our second equation: L = L⋅L. This observation led to an unfortunate turn of events in which the kids decided that is like 0 because added with itself stays the same, and therefore changed it from L to 0. Next week we will need to change it to the correct one (1) even if I need to exercise my executive power.

This is where we ran out of time. So we didn’t have any good notation for vertical assembly. Because we didn’t have vertical assembly we couldn’t really test our notation — to the trained eye it’s quite obvious it is not quite good yet. The way C, C* and L are composed is not entirely clear because vertical and horizontal composition interact. But it is a start. I want the children to move away from how they are used to study mathematics, as a stipulated set of rules in which answers are either correct or incorrect. I don’t want them to execute the algorithm, but to invent the programming language in which it is written. Coming up with stuff that doesn’t work is OK, as long as we notice it doesn’t work and we improve on it.

But what did we accomplish? Quite a lot. In one hour a dozen 8 year olds rediscovered some of the basic combinators used in monoidal categories, such as the braid, or such as duality, unit and co-unit as used in compact closed categories. We also rediscovered composition and its identity, and some of its equations. We used none of the established names and notations, but we discovered the concepts. We are on our way to inventing a fairly original categorical model of knots and braids. Next week we will quickly figure out the monoidal tensor and move on to the really exciting bit: coherence equations.

Posted in teaching | 1 Comment

Inventing a knot theory for eight year olds (I)

Last year I experimented with teaching seven and eight year olds about monoidal categories using a very concrete model of cardboard rectangles and bits of string. The experiment was a success, with children becoming able to represent formally a model or to construct a model from a formal specification. They also developed an understanding of some key concepts, such as coherence. The physical manipulation of cardboard and strings was fun and relating it to notation and equations was surprising to the children and, I think, interesting.

This year I am reprising the experiment, with some changes. The first thing I wasn’t too happy with last year was the overhead of constructing models of cardboard and string. Too many things can go wrong, from poking holes too close to the margins so that they tear to making the wires too long or too short. Not to mention that some children did not know how to tie a knot! The motivation of building structures of cardboard and strings was also unclear, I didn’t even have a name for these structures. So this year we are only looking at knots. They are a simpler to build physical structure since all you need is the strings — no more cardboard and hole punchers. They are also something children already know. And if they don’t know how to tie them it’s OK, because that’s what we are learning. The second change I am making is slowing down the pace and making the process more exploratory. I will not try to teach them anything in particular but we will aim to develop together a term algebra for knots.

The first meeting of our club started with a general discussion about what is mathematics. Third-graders are taught a very narrow view of the subject, which they tend to identify with arithmetic. “Tricky problems” was another common view of what mathematics is about. When I asked whether a triangle is something that mathematics deals with the kids started to question their own understanding of what it is really about.

I proposed that mathematics is the general study of patterns and this was immediately accepted. Then we talked a while about the need in mathematics to represent concepts and I introduced the concept of “notation”, illustrated with a potted history of numerals, notations for the numbers. We spent some time on hieroglyphic numerals, a topic which turned out to be very enjoyable in its own right. I wanted children to discover the inconveniences of this system compared with our Indo-Arabic system and they indeed came to that conclusion quite soon.

The next step was to introduce knots. The fact that mathematics would have something to say about knots struck many children as implausible but even the most skeptical changed their tune immediately when reminded that “knots are some kind of patterns we make with strings” and “maths is about patterns”. This abrupt change of mind and wholehearted acceptance of a previously rejected thesis makes working with children a great pleasure.

We started by making knots. The first bit of technical terminology we established was calling anything we make with strings “a knot”, including braids or bows or even a plain bit of string. This was deemed acceptable. Making knots and experimenting with various ways of tying them was of course a lot of fun.

This is also a knot!

The second step was representing knots by drawing them. The challenge was to draw a half-knot:

The children under-estimated the difficulty of the task, producing something like this:

This drawing of course makes line crossing ambiguous. Which line is above and which line is below?  I suggested leaving small gaps to indicate line crossings:

The principle was readily understood but the execution was problematic. Drawing the knot correctly proved a real challenge, with more than half the children failing to draw it correctly.

The upshot of all this struggle was a consensus that we really need a better way to write down knots, a “notation” (knotation?). The groundwork has been laid for creating a term algebra for knots next week.


Posted in teaching | 2 Comments

Two kinds of mathematics

Preliminary reading:

  • “The Origins and Motivations of Univalent Foundations” by V. Voevodsky’s [PDF]
  • “Proofs and refutations” by I. Lakatos [link]
  • “On proofs and progress in mathematics” by W. Thurston [link]

I find the recent surge in interest in the use of proof assistants in mathematics exciting. Because of the nature of proof assistants — computer systems by necessity lack mathematical intuition and must be fully formal — there is a renewed interest in logic and foundations of mathematics. What is special this time around is that there seem to be echoes of this interest among mainstream mathematicians. Because history always repeats itself some of the old arguments are being rekindled:  ”conceptual” versus “formal” mathematics, or call it “platonism” vs. “logicism” if you have a philosophical bend, or, as G. C. Rota nicely put it “truth” vs. “the game of truth”.

As most philosophical arguments, these arguments arise out of confusion. This is not slagging philosophy, just highlighting its role and methodology: clarifying confused (or confusing) concepts by employing them in dialogue, like new and untested weapons in a military exercise. Once the concepts are clarified they can be passed over to science, mathematics, etc. This passing of the baton is of course not an official ceremony, but it is a slow osmosis of ideas through the many membranes that separate various fields and sub-fields of knowledge. This is my attempted contribution to clarifying the confusion.

There is no need for a conflict between “formal” and “conceptual” mathematics, because there is no need to have only one kind of mathematics. This is as unnecessary as forcing a choice between “constructive” and “classical” mathematics: they simply are two different worlds, as explained very nicely by Andrej Bauer.  The universe of mathematics would be poorer without any of them. Similarly, the universe of mathematics would be poorer without “formal” or without “conceptual” mathematics. But what are they and what purpose do they serve?

 ”Conceptual” mathematics

This is the conventional established mathematical tradition that Lakatos and Thurston so beautifully describe in their papers. Somewhat surprisingly, conceptual mathematics is not “just” about formulating and proving theorems. It is not automatically the case that the mathematician who proves the most theorems or the hardest-to-prove theorems is the highest rated. The fascinating beauty of conceptual mathematics comes out of the creative tension between definitions, theorems and proofs. It is not always the case that the definitions are produced, then, using proofs, theorems are churned out. Sometimes the desirable theorems are known well before the definitions have been clarified (see early work on mathematical analysis). Sometimes a proof is so interesting that both definitions and theorems are adjusted to fit it (See “Proofs and Refutations”). Everything is in a state of flux and mistakes are possible but not necessarily devastating: they show room for improvement and further work.

Computer-aided proof assistants can get in the way of conceptual mathematics. Because they force a full formalisation of all concepts, the concepts can become buried under too much book-keeping and thus lose their intuitiveness. Proof assistants are excellent at checking correctness but are not necessarily the best vehicle for communicating ideas. And this is what mathematicians use proofs for: communicating ideas, not checking correctness. This is one of the reason why mathematical papers in any field are so impenetrable even to mathematicians in other fields. They use a specialised language that relies heavily on the social context in which that particular mathematics happens. Lacking the context it is almost impossible to read the papers. The mathematics happen inside mental models that are poorly described by the formalisations used in mathematical papers. Conceptual mathematics requires more than correctness of proof, this is why the Four-Colour Theorem and its proof which relies heavily on computation makes conventional mathematicians scrunch their noses in dismay.

 ”Formal” mathematics

Exploring mathematical concepts is fascinating, but sometimes we really just need to know if a statement is true or not. This is often the case if we build things and we are concerned that these things behave in a desired way. Since mathematics is the language we use to talk about things in the real world (e.g. physics or engineering) there is a practical concern with the validity of mathematical statemants. In the same context, the aesthetics of the definition-proof-theorem interaction are often of no interest.

Software is perhaps the most obvious example in which mathematical concepts are designed and used in a way that has an immediate impact on the real world. A computer program is an abstract mathematical concept that drives the behaviour of a concrete physical device, the computer. Most errors in the behaviour of a program come not from the hardware but from the software itself — they are of a purely mathematical nature. But programming languages and programs (existing ones, not as a matter of necessity) are ugly mathematical objects, devoid of conceptual elegance and encumbered by complications and contingencies. Their genesis is not driven by a taste for elegance, as most conceptual mathematics, but mainly by how efficient they are in driving the physical computer, itself a complicated and ad hoc device, to perform certain tasks. However the correctness of software is of crucial importance, and the only realistic way in which it can be achieved is by using formal mathematics, as embodied by proof assistants.

But it’s not just software. Properties of security protocols and encryption schemes are also examples of mathematical properties where correctness trumps in importance conceptual coherence.

Conceptual mathematicians would of course rather that their theorems were true. This is why computer checking of conceptual mathematics is of some interest. And when a massive formalisation effort such as Gonthier et. al.’s formalisation of the Feit–Thompson (odd order) theorem turns out an insignificant amount of errors there is good reason for high-fives and some sighs of relief.

 ”Formal conceptual” or “Conceptual formal” mathematics

The area of overlap between formal and conceptual mathematics is a fascinating one. This is where the Univalent Foundations project is happening.  What is the right foundation for mathematics? Not just in terms of being sound but also in making conceptual mathematics more like formal mathematics (i.e. checkable) and formal mathematics more like conceptual mathematics (i.e. readable).

Other areas of mathematics that should be both formal and conceptual are those where the choice of foundational structure is important: type theory, logic, constructive mathematics. Unlike classical mathematics, which take an everything-goes approach to its logical foundations, these areas must be more discerning in what rules are or are not acceptable. A proof of a property must not trespass into illegal territory, and for informal proofs it can be quite difficult to assess that.

As most computer scientists my own mathematical work has elements of the formal and the conceptual. An example of the mostly formal is recent work on distributing the Krivine machine [link][PDF] whereas an example of the mostly conceptual is work on game semantics [link][PDF]. I cannot brag that I have done a lot of work in mathematics that is both formal and conceptual, but if you forced me to give the best example it’s perhaps work on formalising some of the structures encountered in game semantics to facilitate reasoning [link][PDF].

Posted in armchair philosophy, proof assistants | 1 Comment

Two views of programming language design

In computer science, the Boolean or logical data type is a data type, having two values (usually denoted true and false), intended to represent the truth values of logic and Boolean algebra. [Wikipedia]

In computer science, a tagged union, also called a variantvariant recorddiscriminated uniondisjoint union, or sum type, is a data structure used to hold a value that could take on several different, but fixed types. [Wikipedia]

In certain circles there is only one way to think of a programming language: lambda calculus and some extra stuff. Depending on where you add this extra stuff, you get two different, although not incompatible, views on programming language design.

I hope we can all agree that the simply typed lambda calculus is a useful starting point for a programming language. But it must be merely a starting point. Because the only values are functions, comparing results of computations requires some theoretical somersaults. It also makes operations such as input or output problematic. So we need to add stuff to it. Without undue theorising let us jump into an example: booleans.

The first way of doing it is a minimalistic one: we just add a new type to the language (lets call it bool ) and some typed constants for manipulating booleans:

true, false : bool
not         : bool -> bool
and, or     : bool -> bool -> bool 

Some are ground-type constants, and some are unary or binary first-order constants (a.k.a. operators).  For easier future reference, lets call this the axiomatic approach because, if you keep in mind the Curry-Howard correspondence, it is sort of like adding arbitrary axioms to a natural deduction system.

This works but it is unsatisfactory on several accounts. The first one is the rather arbitrary way in which we are adding types and constants to the language. The second one is that the behaviour of booleans and the associated constants is not (yet) specified, so we need to add this essential information somewhere (language reference, operational semantics, etc.).

The third objection is that whenever we require yet another data type we need to extend our programming language, with all the hassle that entails. We even need to change the parser. For example, adding natural numbers would require yet another new type nat and a bunch of constants:

0, 1, 2, ...           : nat
neg                    : nat -> nat 
eq, gt, lt             : nat -> nat -> bool

Then how about more complicated data types such as lists or trees, or types for dealing with state or other computational effects? There must be a better way.

A better way, which we shall call the type-theoretic approach, seems to be to provide the programming language with a way of constructing its own types, such as algebraic types or dependent types.  Many functional programming language offer this facility. In OCaml, for example, booleans and naturals and the associated operations can be defined:

type bool = True | False ;;
let not = function 
 | True -> False
 | False -> True;;
type nat = Zero | Suc of nat;;
let rec plus m n = match m with 
 | Zero -> n
 | Suc m -> Suc (plus m n);;

This second — type theoretic — approach is clearly more elegant, solving the problems of the axiomatic approach: no more adhockery, behaviour is defined, the language can be extended uniformly in a nice and elegant way.

But is it always better, from all points of view?

There is at least one dubious scenario: interfacing with the physical world, including the computer itself. The built-in data types of the computer (char, int, float, etc.) are terribly ad-hoc and lacking in uniformity. However, the computer as a physical device is at its most efficient when working with these built-in types, because it has been designed to work with them. But it is very tricky to make a compiler that just knows how to connect a user-defined type (e.g. for natural numbers) and its constants with hardware-supplied operations. For starters, the match may not be perfect. The user-defined natural numbers for example don’t have size restrictions, so you might say that compiling them as computer integers is just wrong. But in general, since the type-theoretic approach defines new concepts within the language, their implementation proceeds via the definitions, simplified into the core of the language. This unpacking of definitions obscures whatever intended connections might be with a physical reality.

A similar related scenario is when a program needs to interface with a physical device, which can usually be reduced to sending bits and receiving bits. Here the type-theoretic approach becomes uncomfortably constraining and some languages simply abandon typing discipline when I/O is concerned. A similar problem does not arise in the case of the axiomatic approach, mainly because the axiomatic approach is too simplistic to even ask the question in the first place.

Consider memory (RAM) as a physical device. How do you interface with it? The program either sends a tuple representing an address and some data to be written at that address, to which the memory replies with an acknowledgement; or the program sends an address and receives the data stored at that address. To add this memory to a lambda calculus in an axiomatic way is not a problem.

We introduce the type of variables simply as a tuple of a write and a read operation with the signatures as informally described above. And a ram (second-order) operation would take a function of a var, bind that to the memory, use it, then release the binding upon exiting the body of the function.

type mem = (nat -> nat -> nat) x (nat -> nat)
ram : (mem -> nat) -> nat

Read and write are just projections to select the right “method”. For example, a program such as

#new ram m;
m[10] := m[10] + 1

would be desugared with the syntax above as

ram (\m.(proj1 m)(10)((proj2 m)(10) + 1)

A type-theoretic approach to effects is perhaps possible, using the rather well known monads or the funkier lenses or the more sophisticated algebraic effects. However, it is very difficult in general to make types match whatever arbitrary interface your physical system or device might have. And it is even more difficult to make the compiler hook up the generated code to the device interface, because of the same reason mentioned above: in the type-theoretic approach complex types are first usually translated into a simpler core language, and in the translation process the type-theoretic hooks that allow the connection to the physical device can get obscured or even lost.

Note that the device doesn’t need to be physical, it can also be a data type that doesn’t have a convenient algebraic representation. For example, in the axiomatic approach I can give the right interface to a circular buffer but fully specifying it type-theoretically is very hard — maybe not impossible-hard, but research-topic-hard at best.

To summarise, the axiomatic approach views the underlying lambda calculus simply as middleware for hooking arbitrary stuff together, whereas the type-theoretic approach sees the (appropriately enhanced) lambda calculus as a proper programming language. Fortunately the two approaches are not incompatible, because there is nothing stopping us from postulating axioms in a type theory. It may be considered in questionable taste, but it can be helpful!

Posted in programming languages | 3 Comments