The category of cardboard rectangles and bits of wool string (II)

In the second session of our category-theory club for school children we took a giant leap from the informality of diagrams towards formal mathematical notations. Our starting point was the cardboard-and-string diagrams from the previous session, which we had already established were the same even though the strings had different lengths and colours and the bits of cardboard were of slightly different sizes. The reason they were the same (I chose not to use “equal” because of its possibly overloaded meaning for the children) was that they were created following the same instructions.

The next step was to draw these diagrams on paper. Each child was asked to do this, and there were relatively few problems in performing the task reasonably well. As one would expect, the differences between the diagrams drawn by each child were now quite staggering, but the idea that they were still the same diagram thankfully persisted, because they were obviously accurate (as much as possible) renderings on paper of things that were the same in reality.

With this part of the work concluded successfully we moved on to what I thought would be the most challenging step: introducing a combinator notation for diagrams, inspired by compact closed categories. As a preliminary step the children were asked to draw the diagrams using only straight lines and right angles (the example was chosen carefully so that no braiding was needed), and then we went over the diagram and tried to identify the simplest components. I actually introduced the term “component” as “a thing out of which we make other things”, and after some ooh-ing and aah-ing the children seemed happy enough to accept the Big Word. Unfortunately, at this critical stage a logistical problem struck: the room did not have a whiteboard or a blackboard, just an electronic “smart”-board. The irony of the term is obvious, and I am preparing a separate rant about the faddish abandonment of perfectly good classical tools of learning for fickle and fragile technology-heavy alternatives. So instead of drawing a big diagram on a big whiteboard and identifying the components there, I had to walk around each table and do it with each group of children, which wasted a lot of time.

The initial set of combinators we identified were: I (a line, the identity), E (a feedback loop, the compact-closed unit), U (a feedforward loop, the compact-closed co-unit) and f (a rectangle box, the only morphism so far in our category). We also discovered two ways to combine components: writing them next to each other (composition) or stacking them on top of each other (tensor).
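For readers who want to play with the notation, here is a minimal sketch in Haskell (my own rendering and names, not something we showed the children): the four basic components and the two ways of combining them, as an algebraic data type.

-- The basic components and the two ways of putting them together.
data Diagram
  = I                     -- a straight line (the identity)
  | E                     -- a feedback loop (the compact-closed unit)
  | U                     -- a feedforward loop (the compact-closed co-unit)
  | F                     -- the rectangle box, the only morphism so far
  | Diagram :>: Diagram   -- written next to each other (composition)
  | Diagram :|: Diagram   -- stacked on top of each other (tensor)
  deriving Show

-- A made-up example formula (this syntax does not check that wires match up):
example :: Diagram
example = (I :|: E) :>: (F :|: I) :>: (U :|: I)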

The two obvious tasks were, first, to write the formula corresponding to our running example diagram and, second, to draw a diagram from a given formula. Some typical results are below.

Posted in teaching | Comments Off

Why computers can’t think (III)

This is the final instalment of a series of posts on machine consciousness. It’s a fascinating topic but now that the Easter holidays are over I will need to refocus my attention on the equally fascinating topics I am actually paid to research.

The motivation for this series of posts is the recent personal realisation that consciousness (having subjective first-person experiences) cannot be explained as a purely computational phenomenon. The consciousness-as-computation view is widely held, especially among computer scientists. It is a very convenient materialist and physicalist answer to some of the deepest questions about the mind. I used to subscribe to this view, but now I realise it is based on some misunderstandings of what computation is. The gist of the argument is that a computing device, if it’s complex enough, could somehow develop consciousness. Our own human consciousness is given as an instance of this, because our brains, as computing devices, have the required level of complexity. To emphasise: we are not talking about the ability to mimic a consciousness, we are talking about having a consciousness.

In the first post on this topic I argued that from the Church-Turing thesis it follows that any complex computing device can be reduced to a Universal Turing Machine, which is a simple device with a very, very long tape — so there is nothing qualitatively special about the complexity of the device. In the second post I argued that even the universality of a Turing Machine is not a special property, because given a conscious UTM we can construct a finite state machine which can have some of the subjective experiences of the UTM. Neither of these two arguments conclusively proves that a computer cannot think, but they show that the common view that computers can think is based on some bad arguments and a misunderstanding of what computers are.

No doubt, our brain does a lot of computation. As a computer it is an enormous finite state machine. The question is: does our consciousness arise out of the computation the brain does, or does it arise out of its being a brain? In other words, if we were to simulate all its computations in a computer, would that computer have the subjective experiences we have (joy, love, despair, confusion, etc.), or is consciousness a non-computational function of the brain’s biological processes? Considering the arguments in my previous two posts, I incline towards the latter. It’s not an idealist or dualist position; it’s still materialist and physicalist, but it entails that there is a lot about the brain that we don’t know.

I will now attempt a stronger argument that machines cannot think, by trying to derive a contradiction. Because there is nothing special about UTMs as far as consciousness goes, and because the brain itself is a FSM, let us focus on the (im)possibility of a FSM having a subjective experience. For simplicity let us further focus on deterministic FSMs, since they are simpler yet just as expressive as non-deterministic ones.

Before we proceed, let us set some things straight. A FSM is commonly understood as an abstract structure, whereas we will consider the physical realisation of a FSM as a physical system with inputs, outputs and changing internal state. The following is not obvious, but essential: if a FSM is to have a subjective experience (consciousness), and if consciousness is computational, then the nature of that subjective experience will depend only on the abstract FSM and not on the way the FSM is physically realised. So if a particular FSM is realised as a brain and it has certain subjective experiences, then if we realised it as a silicon computer it would have precisely the same experiences. This is required for the sake of the argument: if we allow the subjective experience to depend on the physical realisation of the FSM then it is no longer a computational phenomenon and we are done. Another, even less obvious, assumption is that if a FSM can give rise to a subjective experience then that particular subjective experience is not associated with the FSM as a structure, but must be associated with a particular run of that FSM. In other words, if the FSM has a subjective experience it should arise only out of its interaction with the environment and the evolution of its internal state, i.e. the computation. As before, the argument is that if it arises otherwise then consciousness is not of a computational nature, and we are done. A final and essential clarification, of what it means to implement an abstract FSM as a concrete physical system: it means to establish a FSM homomorphism from the latter to the former. This means that if we can observe the states, inputs and outputs of the concrete FSM we can “decode” the states, inputs and outputs of the abstract FSM (using the homomorphism) in such a way that the transitions are preserved.

Let us now try to derive a contradiction out of the existence of consciousness-producing FSMs. More precisely, we will show that the consciousness of the brain cannot arise simply out of its computational activity. Computationally, we know that the brain must be a FSM. Let us take two runs of that FSM, of equal length, but corresponding to clearly distinct mental states (e.g. one is taken on a happy day, one on a sad day). These runs correspond to traces of inputs, outputs and internal states. If consciousness is computation, whenever we run a FSM isomorphic to the brain-as-a-FSM so that the two traces are realised, that isomorphic FSM will feel happy, respectively sad.

But if, as we established, the subjective experience depends on the run rather than on the FSM, then in each case we can discard all the states that are not used and end up with two non-isomorphic FSMs which realise two distinct subjective experiences if fed with particular inputs, and some other behaviour we don’t care about if fed with other inputs. Now let us construct the product of these two FSMs, i.e. the FSM with transition function

f (s_1, s_2) (i_1, i_2) = p (f_1 s_1 i_1, f_2 s_2 i_2)

where f_1 and f_2 are the two transition functions and p is the isomorphism

p ((s_1, o_1), (s_2, o_2)) = ((s_1, s_2), (o_1, o_2)).
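For the programmatically inclined, here is a minimal sketch of the construction in Haskell (a Mealy-style presentation of deterministic FSMs; the names are mine, not notation from the argument):

data FSM s i o = FSM { start :: s, step :: s -> i -> (s, o) }

-- The product machine runs both components in lockstep on paired inputs.
productFSM :: FSM s1 i1 o1 -> FSM s2 i2 o2 -> FSM (s1, s2) (i1, i2) (o1, o2)
productFSM m1 m2 = FSM
  { start = (start m1, start m2)
  , step  = \(s1, s2) (i1, i2) ->
      let (s1', o1) = step m1 s1 i1
          (s2', o2) = step m2 s2 i2
      in  ((s1', s2'), (o1, o2))
  }

Each projection (taking fst or snd of states, inputs and outputs) is a FSM homomorphism, so every run of the product machine decodes into a run of each component machine.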

And this gives us the contradiction: a physical realisation of the product automaton is an implementation of both automata, by the composition of the implementation homomorphism with each projection homomorphism. But if subjective experience is purely computational then the same run of the product automaton must give rise to two distinct yet simultaneous subjective experiences. But having two distinct subjective experiences simultaneously — assuming that even makes sense — is not the same experience as having just one of the two alone. To me, this is a contradiction.

A FSM cannot have a subjective experience just by virtue of its computational behaviour. I suppose this must be for the same reason that a computer doesn’t get wet if it simulates a liquid and it doesn’t become radioactive if it simulates a nuclear explosion — no matter how precise the simulation. Why is a brain different? Why or how does a brain “make” consciousness? We only know that it does, because we experience it first hand, but we know little else.

A final qualification: when I say that machines cannot think I mean that they cannot think qua machines. This argument doesn’t rule out panpsychism — the doctrine that everything is more-or-less conscious — or idealism — the doctrine that the mind has a non-physical nature. But this argument doesn’t require them either. For details, I refer the reader to the work of Searle, who proposes a modest and sensible programme starting from the doctrine that consciousness is a biological process specific to the brain.

Acknowledgements. Steve Zdancewic and Lennart Augustsson pointed out that the brain must be a FSM, thus simplifying this argument quite a lot.

Posted in armchair philosophy | 6 Comments

The category of cardboard rectangles and bits of wool string (I)

I am conducting an educational experiment: teaching children in year 2 and 3 (7-8 years old) about categorical concepts, more precisely about monoidal categories. Monoidal categories are an appealing formalism because of their connections with diagrams [1]. So we just discover together a bunch of stuff about diagrams.

The first lesson started with a general conversation on what Mathematics is and, unsurprisingly, the children were surprised to learn that it is not all about arithmetic. The idea of mathematics without numbers seemed strangely exciting, at least to some of them — a boy clenched his fist and muttered ‘yes!’ to himself when I announced that we would do the mathematics of shapes. Because this is highly experimental, only 10 children, among the more gifted in their year, were invited to participate. I divided them up into teams and assigned them tasks. The first task was to cut rectangles out of cardboard, punch holes in the four corners and label both the rectangles and the corners: these were going to be the morphisms of the category. Preparing 8 such rectangles proved to be unexpectedly challenging and took quite a lot of time. The children in each team first spent quite some time deciding on how they would cooperate. Two common models were parallel (everyone does the same thing independently) and pipelined (each one does a specialised task). There was also the unexpected difficulty of operating a hole puncher so that it punches just one hole rather than a pair of holes, with at least one team being unable to figure out how to insert the cardboard into the puncher to achieve this. The same team also ruined 5 or 6 “morphisms” because the hole was punched too close to the edge.

As is usually done in categories of diagrams, identities and other structural morphisms are represented by lines. In our case we were going to use wool strings, tied to the holes. The hole patterns are, of course, the types of the morphisms, i.e. the objects. The next task was simply to get familiar with the mechanics of tying bits of string to little holes in cardboard rectangles, a skill essential for building more complex morphisms out of simpler morphisms by composition. Just like before, the level of dexterity of the children was unexpectedly low. For example, several did not know how to tie a double knot and had to learn that first. In the end, however, all teams mastered the task.

The final task was for each team to construct a composite morphism, specified in a relational style as a list of connections between node labels: 1c to 2a, 1d to 2b, 2c to 1a, 2d to 1b. With the cardboard morphisms at the ready and the string-tying skills mastered, this task was completed relatively painlessly and error-free. Each team displayed their creation.

At the end of the lesson, with all the diagrams displayed, I asked the children whether these were the same diagrams or different diagrams. Intuitively, because they knew they had built the diagrams using the same instructions, they all answered that they were the same. I challenged their answer, objecting that the cardboard was of different colours and sizes and the strings were of different colours and lengths, to which they replied that these don’t matter. Indeed, what we wanted to matter is just the what-is-connected-to-what-ness of the diagram.

To summarise, in this first lesson we covered the basic skills of constructing morphisms in the category of cardboard rectangles and bits of wool string: making cardboard morphisms and composing them by tying the bits of string, using a relational-style specification. The children also started to develop the basic intuitions which will lead towards a topological notion of equality of diagrams.

Figure. Two equal morphisms in the category of cardboard rectangles and bits of wool string.

Posted in teaching | 2 Comments

Why computers can’t (really) think (II)

This is a follow-up to my previous post Why computers can’t think. The reactions and comments to it helped me tighten and clarify the argument.

I am going to use the term mental event to denote a subjective first-person experience which is delimited in time, so that we can talk of a beginning and an end of the experience. For example, when you stick your hand in ice-cold water you start feeling the cold, and when you withdraw it you stop feeling the cold, perhaps after a few seconds’ delay. It is not important that the temporal delimitation is precise, just that it is finite.

Suppose, for the sake of argument, that consciousness is a computational phenomenon, so mental events can occur in a computer. For simplicity, let us reduce this computer to a Universal Turing Machine (UTM) with a (very long) tape — the Church-Turing thesis says that no matter how complicated a computer or a computer program is, we can always reduce it to a UTM. This means that when we run the UTM on that particular tape a mental event will occur. Let us consider the state of the UTM when the mental event occurs (the internal state, the tape state, the head position) and also let us discard all the tape that is not used by the UTM during the occurrence of the mental event.

The behaviour of a UTM is determined by its state so, necessarily, whenever we run the UTM from that internal state on that fragment of tape the mental event will occur. But a UTM with a finite tape has only finitely many configurations (controller state, head position, tape contents), so it is equivalent, computationally, to a finite state machine, to an automaton. So for any mental event we can build an automaton that creates that mental event.

I hope this consequence is striking on at least two accounts. First, we have a FSM which can experience consciousness. Second, we have a stunted kind of consciousness which consists of one single mental state all the time — it doesn’t even have the absence of that state. These two consequences are not quite a contradiction, logically, but I find them unacceptable. If we want to accept conscious UTMs we must accept conscious finite state automata, and we must accept the fact that consciousness can be dissected into atomic mental events which can be reproduced by automata.

Notes

  1. In the above we need to make an important distinction which sometimes gets lost: I am not talking about whether computers can pass the Turing test (I think they will eventually be able to do that), so I am not talking about whether computers can exhibit something like consciousness. I am also not talking about whether we know how to detect for sure whether a computer has conscious thoughts. What I am talking about is whether a computer can have a subjective first-person experience — or, more precisely still, what other things we need to accept if we believe computers to be able to have consciousness. The other topics are of course interesting and important, they are just not the subject of this argument.
  2. The mind-as-computation view is the most prevalent materialist explanation of the mind, so what are we left with if we reject it? God? Magic? Not necessarily. One unavoidable consequence is that our brains do more than computation. They do a lot of computation, that is established, but they do more than that. They ‘do’ consciousness in a way that a computer without a brain cannot do. We don’t know the details though. Materialism means that reality can be explained as particles moving in fields of force, so the necessary consequence of this argument is that there are yet unknown particles or fields involved in the operation of the brain. Penrose, in The Emperor’s New Mind, tries, for example, to blame consciousness on quantum effects, which is rather unconvincing, but it is not silly as an attempt to find physical distinctions between brains and computers.
  3. Much of the mind-as-computation view (Dennett is a prime exponent of it) rests on the premiss that there is a qualitative distinction between simple and complex systems. However, the Church-Turing thesis makes it quite clear that the only difference is in the length and contents of the tape of a UTM. Concepts such as complex system, virtual machine or internal architecture sound mysterious and ungraspable, so it seems plausible to attribute other mysterious and ungraspable phenomena, such as consciousness, to them. But if you go through a book by Dennett and replace “complex system” with “a simple controller with a very long tape” it is quite apparent how silly those arguments are.
Posted in armchair philosophy | 1 Comment

Why computers can’t think

I have a weakness for philosophy of the mind, but I find most of the writing on the topic either impenetrable or laughable. Recently, a good friend who is a philosopher recommended Searle’s book Mind: A Brief Introduction. It is indeed an excellent book. It makes lucid and careful arguments and, gratifying for a non-philosopher, it shows why most influential theories of the mind are wrong. The programme it sets is very modest and meant to delegate matters from philosophy to science as quickly as possible, but more on this later.

Searle is famous, among other things, for the Chinese Room argument, which is meant to clarify that mimicking understanding and having the experience of understanding are not the same thing. I found the argument compelling. The argument is also meant to have a farther-reaching consequence: that the mind cannot be (just) a computational process.

The second part of the argument is less convincing, because you cannot in general prove impossibility by example. It’s close to a straw-man fallacy. I think a different argument is needed here and I will try to formulate it. It takes the Church-Turing thesis as its starting point: anything that can be computed can be computed by a Turing machine. If the mind is a computational phenomenon it must be exhibited by an appropriately programmed Turing machine.

If you don’t know what a Turing Machine is, have a look at one such device. It consists of a tape, a read-write head, and a controller (which can be as small as a two-state, three-symbol automaton). I take it as obvious that this device cannot experience thought (i.e. consciousness), but let us elaborate for the sake of argument. The subjective experience of thought should come, if it arises at all, from the contents of the tape, the software. But that is not possible because TM computation is a local phenomenon: at any moment the machine is only “aware” of the controller state and the tape symbol under the head; the TM is not “aware” of the global contents of the tape. Which would mean that the tape itself experiences thought. But the software on the tape can be a mind only insofar as it works for a particular encoding of the TM, e.g. the set of symbols. So the “seat” of artificial consciousness is not in the hardware nor in the software, but in the relation between them. But a mind is a real concept, and cannot be contingent on an abstract one. A TM and its tape cannot be or have a mind.

Searle doesn’t quite make this point, but he makes a similar one. A computation is defined by the intention to compute. A computer is a thing that is used by someone to compute — just like a tool is an object with an intended function, otherwise it is just stuff or perhaps an artistic artefact. Any physical system, to the extent that it has a state that changes according to laws that are known, can be used to compute. But in the absence of intent it is not a computer, it is simply a physical system. So the reason why mind cannot be explained as a computational phenomenon is that computation presupposes intent, which presupposes a mind. This would be a circular definition.

A computation is a supra-conscious behaviour: it expresses the (conscious) intent of another mind. For someone who practises and teaches mathematical subjects the distinction between conscious and algorithmic thought is quite clear. A student (but also a mathematician) can carry out a computation (or a proof) by following an algorithm either deterministically (a simple calculation) or heuristically (manipulating an expression by symbol-pushing to achieve a proof objective). This activity is not experienced as conscious in the same way that a mathematical activity involving genuine understanding is. There is no a-ha! moment; the quality of it is not intellectual excitement but a vague sense of boredom. However, this activity is not unconscious either, in the sense that one is aware of something that is going on and can start it or stop it at will. I think a new term, such as supra-conscious, is required to capture this experience. I think this term also describes the experience of the Chinese Room operator.

What is mind then? I find Searle’s modest proposal acceptable: the mind is a biological phenomenon, a function of the brain. We don’t understand the details of this function, but it is not of a (purely) computational nature. The brain is not (just) a computer, so no computational simulations of the brain will produce a mind. 

Posted in armchair philosophy | 6 Comments

Real time real numbers

At the Synchronous languages workshop I attended a very interesting and amusing talk given by Timothy Bourke and Marc Pouzet about real-time real-number computation and simulation. It is a take on the classic puzzle: suppose that two cars A and B are 100 km apart and drive towards each other at 50 km/h each. A fly starts from A towards B at 80 km/h and, whenever it reaches a car, turns around and flies towards the other one. The usual question is “how many km does the fly travel before it is crushed between the two cars?”. The question with a twist is: “how many turns does the fly make before it is crushed between the two cars?”.

This is of course a Zeno-like situation and, as the problem is stated, the fly must make an infinite number of turns even though it travels a finite distance. So far so good. The amusing turn of the presentation is that there exist commercial (and expensive) software packages out there which can simulate this scenario. The market leader Simulink calculates the number of turns very efficiently: 42. Of course, 42 is an unacceptably poor approximation for infinity. Or maybe it is.

The first observation is that, of course, Simulink uses not real numbers but floating-point representations, which introduce round-off errors. The second observation is that arbitrary-precision representations for real numbers are possible and libraries are actually available. How would Simulink work if it used arbitrary precision? For now let us ignore issues of efficiency: producing an answer as hilariously wrong as 42 cannot be excused by the fact that it was produced “efficiently”. It is rather preferable that an answer is not produced at all! Even if a Zeno-like computation were to diverge, without producing any digits of output, it would be preferable to a misleadingly confident 42.
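To see why floating point must give a finite answer, here is a minimal sketch (in Haskell, not the Simulink model; the leg-by-leg formulation and the names are mine). On each leg the gap between the cars shrinks by the factor (80 - 50)/(80 + 50) = 3/13, so with exact reals the recursion never terminates; with IEEE doubles the gap eventually rounds down to zero and the count of legs comes out finite (not 42, since the rounding details differ, but finite for the same underlying reason).

-- Count the fly's legs until round-off collapses the gap between the cars.
countLegs :: Double -> Int
countLegs gap0 = go gap0 0
  where
    ratio = (80 - 50) / (80 + 50)   -- per-leg shrink factor of the gap
    go g n
      | g <= 0    = n               -- the gap has rounded down to nothing
      | otherwise = go (g * ratio) (n + 1)

main :: IO ()
main = print (countLegs 100)        -- prints a finite number of turns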

I have not tried to set up this puzzle using arbitrary-precision arithmetic, so I am not sure how it works out. If we calculate not in ℝ but in the unit interval, would the computation produce no output digits, or would it produce a stream of digits converging towards 1? I might just try it, but as interesting as the result would be, it wouldn’t settle the discussion.

Marc Pouzet explained to me very convincingly the difference between simulation and computation in a hybrid continuous-discrete system. The continuous values, at least in this case, are outside the computation. The computation samples continuous values, converts them to floating point and operates with them. So there are not one but two sources of possible errors in the simulation. The first one is, as discussed, due to the use of floating point, which introduces round-off errors. The second one is due to the sampling error, when the external real value is converted for the first time into a floating-point representation. One of the most interesting sampling constructs is a time-sampling construct (up) which monitors an (external) continuous function and issues a (boolean) signal when the function changes sign.

How would both these sources of errors be affected by switching to arbitrary precision? Programmatic numerical errors will be eliminated, in the usual sense: all digits of a value that are calculated are correctly calculated. But what about sampling errors? In arbitrary precision arithmetic equality is not decidable because, simplistically put, you don’t know how many digits to look at before you can figure out whether 0.999999… and 1.00000… are equal or not.
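Here is a toy sketch of the difficulty, using plain decimal digit streams rather than a proper exact-real representation (in practice signed digits or shrinking intervals would be used). Digit-by-digit comparison can answer “different” after finitely many digits, but to answer “equal” it would have to inspect every digit, so on equal inputs it runs forever; worse, on the two streams below it answers “different” even though they denote the same real number.

-- The first element is the integer part, the rest are decimal digits.
type Digits = [Integer]

sameDigits :: Digits -> Digits -> Bool
sameDigits (x:xs) (y:ys) = x == y && sameDigits xs ys
sameDigits _      _      = True    -- only reachable for finite lists

nines, one :: Digits
nines = 0 : repeat 9   -- 0.99999...
one   = 1 : repeat 0   -- 1.00000...
-- sameDigits nines one is False after one digit; sameDigits one one never returns.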

So what would happen to Simulink if we wanted to use arbitrary precision both in the computation and in the simulation? Something beautiful would happen! Simulink would no longer allow you to describe physically unrealisable models. Physical functions must be computable. The fix is very simple: whenever you sample a continuous value, the margin of error of the sampling process must be specified. The margin of error will tell you that you don’t need to bother looking at the digits that mean less than the sampling error because they are irrelevant.

This is a somewhat subtle point: even if you are sampling the continuous external value into a value which you represent as arbitrary precision, it still needs to be made computable through the specification of a sampling error. The mechanism of arbitrary precision used in the computation is only meant not to introduce additional errors, but it cannot compute the uncomputable! And sampling a real number with infinite precision is uncomputable because it requires an infinite amount of information to be transmitted in a finite amount of time.

How about the up construct: it is external and it only needs to communicate a finite amount of information to the program (1 bit). Why is it unrealistic? The uncomputability of up comes out of the infinitely precise timing of the signal. In other words, the amount of information is still infinite because in addition to the bit representing the signal we have the potentially infinite precision of the moment at which it is received.

Of course, specifying a sampling error is perfectly reasonable: any instrument that can be produced must have finite precision, therefore must introduce a sampling error. The use of arbitrary precision arithmetic only prevents you from writing simulations that are nonsensical anyway. This is a good thing!

In the case of our initial puzzle, the introduction of a sampling error would quickly lead to the identification of a special case in the simulation, when the cars A and B are so close that the fly cannot tell whether it is on A or on B. And in this case the simulation could stop, and the answer could be 42! However, in this case the answer would be correct, because this would be correct behaviour for a system in which the sensors have a (known) sampling error. The greater the precision of the instruments, the more turns the fly will be able to make before it enters the uncertainty area.

As my colleague and area expert Martín Escardó can attest, I was quite skeptical regarding the usefulness of arbitrary precision real numbers. I was clinging to the faulty intuition that the real real numbers are things that you can grasp and work with. They are more complicated than that, it turns out. And the fact that the arbitrary precision representation forces you to face their subtlety, and thus avoid unrealistic computation (and simulation) scenarios is to me a strong indication that arbitrary precision real numbers could be quite useful!

Further reading

  • M.H. Escardo. Computing with real numbers represented as infinite sequences of digits in Haskell. CCA’2009 tutorial [zipfile]
  • Zélus: A synchronous language with ODEs [link]
  • T. Bourke and M. Pouzet. A slow afternoon chez PARKAS and a very fast fly [link]
Posted in General | 5 Comments

Searching/fumbling for a proof assistant

I was always very impressed with people who use proof assistants to do their maths. It seems like something from the future: mechanised, unerring, complicated, slightly incomprehensible; it makes traditional pen-and-paper (well, LaTeX) maths seem quaint, artisanal, even obsolete. The contrast could not be more striking than when it comes to proving that some complicated system (e.g. an operational semantics) satisfies some boring sanity property (e.g. reduction preserves typing). For such work, lying at the border between mathematics and bureaucracy, the use of proof assistants makes such a big difference that it is quite likely that in the not-too-distant future their use will be mandated. Mathematically, such proofs are rarely interesting (the interesting bits can always be factored out into nice-to-present lemmas) but they are quite important. It’s a lot more like making sure a piece of code is free of bugs than showing that a piece of maths is interesting.

So I am sold on the importance of proof assistants, but which one to use? The first one I tried was Coq. It is used very broadly, it has an established community of developers, and it has been the engine behind some heroic achievements, such as finally and conclusively proving the Four Colour Theorem.

After a couple of weeks of reading about it, watching some tutorials, playing with it, I produced a tutorial/demo/taster where I prove some simple things about lists in Coq. There was enough information in this tutorial to allow some (smarter than average) first year students to work out some of their assignments using Coq, which was very impressive to me.

More recently I became exposed to Agda and, for the sake of comparison, I reworked the same tutorial/demo/taster in Agda this time. Agda is newer and smaller than Coq, and not quite as well known.

There is something a bit strange about giving a tutorial when you yourself are a beginner. From a certain point of view it is dangerous, because you may say or do things that are, for the expert, unusual or wrong. On the other hand, there is something positive to be said about the innocent approach of the non-expert. Where I fumble you are likely to fumble; where I get stuck, you are likely to get stuck. These are the kind of tutorials where you don’t actually learn Coq/Agda, but you learn whether you want to learn Coq/Agda.

You can obviously draw your own conclusions from what you see in the videos, but I will also share my impressions. Both systems are wonderful and there is something about them that tells me that this is what the future should be like.

Working with Coq felt like driving a tank. Working with Agda felt like driving a Formula 1 car. Coq feels big and powerful, but not very elegant. Agda feels slick and futuristic, but you can only wonder how the lack of a tactics language will work out in the trenches of a big boring project. Some of the more wonderful features of Agda (mixfix operators, seamless Unicode support, interactive development) I can see trickling into mainstream programming languages. The elegant proof-term syntax of Agda I can see appealing more to mathematically inclined users, whereas the nuclear-powered tactic language of Coq is bound to appeal to hackers who just want to get (impressive) things done. But that’s just my opinion, and I don’t think it’s a very original or unique one.

Perhaps the main difference, for my money, between the two is the way I get stuck trying to get things done. In Coq it is quite often the case that I know how to achieve a proof goal but I cannot figure out which tactic I should use to achieve it, or how to find it — so I blame Coq. Whereas in Agda, when I get stuck it’s because I made a mistake and I get an incomprehensible error message — so I blame myself. Because of that, I end up being quite frustrated working with Coq whereas I derive a lot of satisfaction working with Agda.

 

Posted in proof assistants | 4 Comments

Seamless computing

In an earlier post (“Remember machine independence?”) I pointed out my dissatisfaction with the so-called domain-specific languages (DSLs) which are popping up everywhere nowadays in support of new and esoteric programming platforms. The name DSL bugs me because I find it misleading. The right name would be ASL: architecture-specific language. They are not meant to solve problems in a particular domain, they are meant to program a particular architecture. Many such languages have some of the trappings of higher-level programming languages but also enjoy unconventional syntax and semantics, designed to match the platform.

I am not buying it. I think it is a mistake. I think the first and foremost principle of a higher-level programming language is that of machine independence. A higher-level programming language should allow the programmer to solve a problem rather than drive a physical system. This was a crazy idea in the 1950s, which went through all the stages that a crazy idea goes through:

First they ignore you. Then they ridicule you. And then they attack you and want to burn you. And then they build monuments to you. (Nicholas Klein)

The “you” in this quote was John Backus. I am not sure whether he got a monument out of it (or whether anyone really wanted to burn him), but he did get a Turing Award, which is not bad. Not accidentally, the title of his lecture on the occasion of the award was Can Programming Be Liberated From the von Neumann Style? [PDF]. And machine independence figured high on the agenda.

Now the idea of machine independence is getting ridiculed again. The reason is exactly the same as back in the 1950s: computer programming experts are comfortable with machine-dependence in a programming language, and are worried that machine-independence will cause a substantial loss of efficiency. This is both true and misguided at the same time, unless you are working on system-level programming. For application-level programming (scientific, corporate, etc.) it rarely helps that your program runs twice as fast if it takes twice as long to write.

In a 1958 paper called Languages, Logic, Learning and Computers (J. of Computers and Automation), J.W. Carr III writes this:

One would like to be able to use these computers up to the day of the next one’s arrival, with interchange of problems and yet efficiency. Can this be done?

Carr is referring to the upgrade from the IBM-650 to the “totally different” IBM-701 and IBM-704. This was before FORTRAN had been invented, so the answer was not obvious at the time. But can we extend this point to our situation and replace ‘computers’ with ‘architectures’? Should our programs, written for the CPU, not also work for the GPU, FPGA, cloud, etc.? The skeptics say ‘no’ and often argue that a mismatch between the programming language model and the system model leads to inefficiency. Perhaps, but in many circumstances loss of performance is a price worth paying. If we look at the languages of today, especially functional languages, they do not match the CPU-based architecture very well at all. They are declarative where computers are imperative. They use functions where computers use jumps. To compile one for the other efficiently a lot of optimisation work is put in, and the native systems are augmented by runtime systems providing essential services such as garbage collection. But this does not invalidate the principle of having good programming languages even if they don’t match the architecture on which they run. Designing machines especially to run functional languages is actually a cool and interesting idea (see the REDUCERON).

Here is the same objection, taken from Carr’s paper:

The problems of efficiency have been often emphasized by programmers who believe strongly in the continuance of hand-programming over translational techniques.

“Hand-programming” means programming in assembly or even binary. “Translational” means using what we now call compilers. It’s safe to say history vindicates Carr and not those overly concerned with efficiency.

Bob Harper’s blog has a number of excellent posts on the duality between programming in the language and programming for the machine. They are all quite relevant to the point I am making, so have a look.

Machine-independence is clearly the necessary ingredient to achieve the kind of portability considered desirable by Carr. So we are doing something about it. We are developing a compiler for a conventional imperative and functional programming language (we call it Verity) which would allow effective and efficient compilation to any architecture. We already have a back end for compiling to FPGAs, which we call the Geometry of Synthesis project. It supports higher-order functions, recursion, separate compilation and foreign function interfaces to native HDL, all in an architecture-agnostic way. I wrote about this project on this blog before.

Distributed seamless computing

Today I want to talk about distributed computing. From the point of view of supporting architecture-independent programming, distributed computing is a mess. Let’s start with a simplistic program, just to illustrate a point, a program that in a functional programming language I might write as:

let f x = x * x in f 3 + f 4

And suppose that I want to distribute this code on two nodes, one that would execute the function f, one that would execute the main program. I can choose a high-level distributed programming language such as ERLANG. Here is how the code looks:

-module(f).
-export([c/1, main/0]).
%% c squares each number it receives and sends the result back;
%% main spawns the squaring process and computes f 3 + f 4.
c(A_pid) -> receive X -> A_pid ! X * X end, c(A_pid).
main() ->
    C_pid = spawn(f, c, [self()]), C_pid ! 3,
    receive X -> C_pid ! 4, receive Y -> X + Y end end.

In the first program I describe logical operations such as multiplication and addition, and a logical structure to the program. In the second program I describe a system: I spawn processes and send messages and handle messages. A program which should be trivial is no longer trivial.

Things get even worse if I choose to distribute my program on three nodes: one for f, one for the function applications, one for whatever is left of the main program. This is the ERLANG code for this:

-module(f2).
-export([c/0, b/2, main/0]).
%% Node C: squares each request and replies to the requesting process.
c() -> receive {Pid, X} -> Pid ! X * X end, c().
%% Node B: on demand, applies the squaring service to 3 or 4 and forwards the result to A.
b(A_pid, C_pid) ->
    receive
        request0 -> C_pid ! {self(), 3}, receive X -> A_pid ! X end;
        request1 -> C_pid ! {self(), 4}, receive X -> A_pid ! X end
    end,
    b(A_pid, C_pid).
%% Node A: spawns C and B, then adds the two results.
main() ->
    C_pid = spawn(f2, c, []),
    B_pid = spawn(f2, b, [self(), C_pid]),
    B_pid ! request0,
    receive X -> B_pid ! request1, receive Y -> X + Y end end.

Can you tell that these two ERLANG programs compute the same result? If you initially wrote the first two-node version and decided to re-factor to three nodes how hard would the transition be? You need to rewrite most of the code.

There is a better way. The way you indicate nodes to the compiler should be no more than this:

let f x = (x * x)@A in f 3 + f 4

or

let f x = (x * x)@A in (f 3)@B + (f 4)@B

Is this possible? To the programmer who needs to struggle with the intricacies of writing distributed code this may seem like a pipe dream. But it is possible. Theorists of programming languages have known since the late 80s and early 90s that computation and communication are closely related. Whether it was Milner’s Functions as Processes [PDF], Girard’s Geometry of Interaction [DOI] or the early work on Game Semantics [see my survey], the message is the same: function calls are nothing but structured communication protocols. And this is not just pure esoteric theory; we really knew how to write compilers and interpreters based on it. Nick Benton wrote a very cool tutorial which also applies functions-as-processes to distribution. Why hasn’t this compiler been written yet? Because we were afraid it might be too inefficient.

I think we should be braver and more sanguine and have a go at it: the worry could be misplaced or exaggerated. With my student Olle Fredriksson we have made the first step, writing a compiler for a core functional programming language which can be distributed using lightweight compiler annotations just as described above. We call it seamless computing because it is distributed computing where all the communication between components happens via function calls rather than via explicit sending and handling of messages. Explicit communication is the seam in distributed computing; this style of distribution is free of it.

A paper describing the compiler was presented at the International Symposium on Trustworthy Global Computing earlier this year [PDF] and there is a video based on it which you can watch online. The compiler is available for download from veritygos.org. It is based on Girard’s Geometry of Interaction and it compiles down to C and MPI. There are some good things and some bad things regarding its efficiency. On the one hand, the compiler does not require a garbage collector, which is very desirable in a distributed setting. On the other hand, the tokens being exchanged during the computation have a size proportional to the depth of the call stack, which can be large, especially for recursive calls. But this can be avoided by a better implementation, still without requiring a distributed garbage collector. This is ongoing work. We are also currently expanding the compiler to handle the whole of Verity, so that it has concurrency and shared state. We are making great progress and it’s a lot of fun, both as a theoretical challenge and as an implementation and engineering one.

We will keep you posted as things develop!

Posted in seamless computing | 2 Comments

The ultimate programming language?

In several earlier posts I considered how the conventional notion of contextual equivalence, which relies on quantification over syntactic contexts, is sometimes not suitable, and how a more direct notion of equivalence should be used instead. This notion of equivalence, which I like to call system-level semantics, should be induced directly by what is observable by the execution environment, as defined not only by the language but also by abstract models of the compiler and the operating system.

More concretely, what I proposed (in a paper [PDF] with Nikos Tzevelekos) is that the system context should be defined along the lines of a Dolev-Yao attacker. This context is omnipotent but not omniscient: the program can hide information (data, code) from the context, but once the context gets this information it can use it in an unrestricted way. What we show then in our paper is how this attacker can produce “low-level” attacks against code, i.e. attacks which cannot be carried out in the language itself.

But what if the system-level context and the syntactic context are equally expressive? Then you have a fully abstract system-level semantics. This means that the attacker/context cannot produce any behaviours that cannot be produced in the language itself. Why is this cool? Because it means that all your correctness arguments about the source level apply to the compiled code as well. You can think of a fully abstract system-level semantics as an abstract specification for a tamper-proof compilation scheme.

In a previous post I described such a fully abstract SLS. Full abstraction is achieved by using monitors that serve as tamper detectors: if the environment behaves in ways which are illegal then the program takes defensive action (stops executing). So full abstraction is achieved by introducing runtime checks for detecting bad behaviour, as in control-flow integrity, for example (see Abadi et al. [PDF]).

But can you achieve a fully abstract SLS the other way around? By having a programming language that is so expressive that it can implement any attacks performed by a DY-style environment? I think the answer is ‘yes’. And such a language is surprisingly civilised: lambda calculus with state and control introduced in a uniform way. This language, called lambda-mu-hashref, was introduced especially to study security properties in a language capable of producing very powerful contexts. A fully abstract SLS-type semantics is given in the paper.

I find this quite remarkable, the fact that a civilised programming language which is not that much different from ML can induce contexts as powerful as a DY attacker. Any (reasonable) behaviour should be possible to implement using it. There is nothing else you can add to it to make it more expressive. It is, from this point of view, the ultimate programming language. 

Posted in system level semantics | Comments Off

Two consequences of full abstraction

In a recent paper [pdf] with my student Zaid Al-Zobaidi we describe two enhancements to the GoI compiler.

The first one is a tamper-proof mechanism which guarantees that no low-level attacks against the compiled code are possible. This is very nice to have because it means that all properties that you can prove about a program at the code level are preserved by the compilation process. Of course, since we are generating hardware (rather than machine code) the ways in which a malicious environment can mount an attack are very constrained. Most importantly, the code cannot be changed, as there is no code, just a fixed circuit. However, the compiler is compositional and can interface with libraries written in other languages, and such libraries can violate language invariants in unexpected ways. For example, in the hardware model calls/returns to/from functions are associated with fixed ports in the interface, so a malicious environment can return from a function that wasn’t even called!

The way our synthesised hardware can be made tamper-proof is by providing interaction monitors which take defensive action (e.g. halting the circuit) if the environment behaves in ways which are inconsistent with the behaviour of circuits obtained from compiling legal programs — e.g. circuits which act like functions that return without being called. This is possible because we have a fully abstract model of our programming language, i.e. a model which correctly and completely characterises programs written in the language. With the addition of tamper-detecting monitors, the game semantics and the system-level semantics of the language become the same: the system must play by the rules we impose!

We are not the first to note the importance of fully abstract compilation in the prevention of low-level attacks [Abadi 98] but, as far as we can tell, ours is the first compiler to do that for a substantial programming language. Of course, when compiling to machine code this task becomes hard to impossible.

The second improvement is due to the fact that the environment can observe the code only in a limited way, so more states of the system appear equivalent from the outside. This is quite similar in principle to bisimulation in a typed context, although the technical details differ. Our laxer notion of ‘equivalence’ lacks transitivity, so it is not really an equivalence. However, from the point of view of reducing the state space of the automata, hiding behind a monitor turns out to be very effective.

The paper was presented by Zaid at ICE 2012.

Posted in game semantics, Geometry of Synthesis, system level semantics | 1 Comment