Why computers can’t think

I have a weakness for philosophy of the mind, but I find most of the writing on the topic either impenetrable or laughable. Recently, a good friend who is a philosopher recommended Searle's book Mind: A Brief Introduction. It is indeed an excellent book. It makes lucid and careful arguments and, gratifyingly for a non-philosopher, it shows why most influential theories of the mind are wrong. The programme it sets out is very modest, meant to delegate matters from philosophy to science as quickly as possible, but more on this later.

Searle is famous, among other things, for the Chinese Room argument, which is meant to clarify that mimicking understanding and having the experience of understanding are not the same thing. I found the argument compelling. The argument is also meant to have a farther reaching consequence, that the mind cannot be (just) a computational process.

The second part of the argument is less convincing, because you cannot in general prove impossibility by example. It’s close to a straw-man fallacy. I think a different argument is needed here and I will try to formulate it. It takes the Church-Turing thesis as its starting point: anything that can be computed can be computed by a Turing machine. If the mind is a computational phenomenon it must be exhibited by an appropriately programmed Turing machine.

If you don't know what a Turing Machine is, have a look at one such device. It consists of a tape, a read-write head, and a controller (which can be as small as a two-state, three-symbol automaton). I take it as obvious that this device cannot experience thought (i.e. consciousness), but let us elaborate for the sake of argument. The subjective experience of thought should come, if it arises at all, from the contents of the tape, the software. But that is not possible, because TM computation is a local phenomenon: at any moment the machine is only "aware" of the controller state and the symbol under the head. The TM is never "aware" of the global contents of the tape. Which would mean that the tape itself experiences thought. But the software on the tape can be a mind only insofar as it works for a particular encoding of the TM, e.g. the set of symbols. So the "seat" of artificial consciousness is neither in the hardware nor in the software, but in the relation between them. But a mind is a real concept, and cannot be contingent on an abstract one. A TM and its tape cannot be or have a mind.
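
For concreteness, here is a minimal sketch of a Turing machine in OCaml (a toy machine invented for illustration, not the famous two-state three-symbol one alluded to above). The point to notice is that the transition function consults only the controller state and the single symbol under the head; the rest of the tape is invisible to it, which is the locality the argument above relies on.

(* A sketch of a Turing machine; the concrete machine is invented for illustration. *)
type symbol = Blank | I
type state  = A | B
type move   = Left | Right

(* transition: (state, symbol under head) -> (new state, symbol to write, move) *)
let delta : state * symbol -> state * symbol * move = function
  | A, Blank -> (B, I, Right)
  | A, I     -> (A, I, Right)
  | B, Blank -> (A, I, Left)
  | B, I     -> (B, Blank, Left)

(* the tape as a zipper: cells to the left, the scanned cell, cells to the right *)
type tape = { left : symbol list; head : symbol; right : symbol list }

let shift m t = match m with
  | Right ->
      let h, rs = (match t.right with r :: rs -> (r, rs) | [] -> (Blank, [])) in
      { left = t.head :: t.left; head = h; right = rs }
  | Left ->
      let h, ls = (match t.left with l :: ls -> (l, ls) | [] -> (Blank, [])) in
      { left = ls; head = h; right = t.head :: t.right }

(* one step of execution; note that delta never sees t.left or t.right *)
let step (q, t) =
  let q', s, m = delta (q, t.head) in
  (q', shift m { t with head = s })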

Searle doesn’t quite make this point, but he makes a similar one. A computation is defined by the intention to compute. A computer is a thing that is used by someone to compute — just like a tool is an object with an intended function, otherwise it is just stuff or perhaps an artistic artefact. Any physical system, to the extent that it has a state that changes according to laws that are known, can be used to compute. But in the absence of intent it is not a computer, it is simply a physical system. So the reason why mind cannot be explained as a computational phenomenon is that computation presupposes intent, which presupposes a mind. This would be a circular definition.

A computation is a supra-conscious behaviour: it expresses the (conscious) intent of another mind. For someone who practices and teaches mathematical subjects the distinction between conscious and algorithmic thought is quite clear. A student (but also a mathematician) can carry out a computation (or a proof) by following an algorithm either deterministically (a simple calculation) or heuristically (manipulation of an expression by symbol-pushing to achieve a proof objective). This activity is not experienced as conscious in the same way that a mathematical activity involving genuine understanding is. There is no a-ha! moment; the quality of it is not intellectual excitement but a vague sense of boredom. However, this activity is not unconscious either, in the sense that one is aware that something is going on and can start or stop it at will. I think a new term, such as supra-conscious, is required to capture this experience. I think this term also describes the experience of the Chinese Room operator.

What is mind then? I find Searle’s modest proposal acceptable: the mind is a biological phenomenon, a function of the brain. We don’t understand the details of this function, but it is not of a (purely) computational nature. The brain is not (just) a computer, so no computational simulations of the brain will produce a mind. 


Real time real numbers

At the Synchronous Languages workshop I attended a very interesting and amusing talk given by Timothy Bourke and Marc Pouzet regarding real-time real-number computation and simulation. It's a take on the classic puzzle: suppose that two cars A and B are 100 km apart and they drive towards each other at 50 km/hr each. A fly flies from A towards B at 80 km/hr and, when it reaches B, flies back towards A, and so on: whenever the fly reaches a car it turns and flies to the other car. The usual question is "how many kilometres does the fly travel before it is crushed between the two cars?". The question with a twist is: "how many turns does the fly make before it is crushed between the two cars?".

This is of course a Zeno-like situation and, as the problem is stated, the fly must make an infinite number of turns even though it travels a finite distance. So far so good. The amusing turn of the presentation is that there exist commercial (and expensive) software packages out there which can simulate this scenario. The market leader Simulink calculates the number of turns very efficiently: 42. Of course, 42 is an unacceptably poor approximation of infinity. Or is it?
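
Out of curiosity, here is what a naive simulation does, as an OCaml sketch (my own toy reconstruction, nothing to do with how Simulink actually works). In exact arithmetic the cars meet after exactly one hour, so the fly covers 80 km in infinitely many ever-shorter legs; each leg shrinks the gap between the cars by the factor (80 - 50)/(80 + 50) = 3/13, and with floating-point numbers the gap eventually rounds down to zero, so the turn counter stops at some finite value. Which finite value depends on the representation and, in a real tool, on the solver and its tolerances, so it need not be 42 specifically.

(* A naive floating-point count of the fly's turns.  Each leg ends when the
   fly meets the car it is heading for; during a leg of duration g/130 hours
   the cars close 100 * g/130 km, so the gap shrinks by the factor 30/130.
   We count legs until round-off makes the gap exactly 0.0. *)
let () =
  let car_v = 50.0 and fly_v = 80.0 in
  let ratio = (fly_v -. car_v) /. (fly_v +. car_v) in   (* 3/13 *)
  let gap = ref 100.0 and turns = ref 0 in
  while !gap > 0.0 do
    gap := !gap *. ratio;     (* the fly reaches a car and turns around *)
    incr turns
  done;
  Printf.printf "the fly turned %d times\n" !turns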

The first observation is that, of course, Simulink uses not real numbers but floating-point representations, which introduce round-off errors. The second observation is that arbitrary-precision representations for real numbers are possible and libraries are actually available. How would Simulink work if it used arbitrary precision? For now let us ignore issues of efficiency: producing an answer as hilariously wrong as 42 cannot be excused by the fact that it was produced "efficiently". It would be preferable that no answer is produced at all! Even if a Zeno-like computation diverged, producing no digits of output, that would be preferable to a misleadingly confident 42.

I have not tried to set up this puzzle using arbitrary-precision arithmetic, so I am not sure how it works out. If we calculate not in ℝ but in the unit interval, would the computation produce no output digits, or would it produce a stream of digits converging towards 1? I might just try it, but as interesting as the result would be, it wouldn't settle the discussion.

Marc Pouzet explained to me very convincingly the difference between simulation and computation in a hybrid continuous-discrete system. The continuous values, at least in this case, are outside the computation. The computation samples continuous values, converts them to floating point and operates with them. So there are not one but two sources of possible errors in the simulation. The first one is, as discussed, the use of floating point, which introduces round-off errors. The second one is the sampling error, introduced when the external real value is converted for the first time into a floating-point representation. One of the most interesting sampling constructs is a time-sampling construct (up) which monitors an (external) continuous function and issues a (boolean) signal when the function changes sign.

How would both these sources of errors be affected by switching to arbitrary precision? Programmatic numerical errors will be eliminated, in the usual sense: all digits of a value that are calculated are correctly calculated. But what about sampling errors? In arbitrary precision arithmetic equality is not decidable because, simplistically put, you don’t know how many digits to look at before you can figure out whether 0.999999… and 1.00000… are equal or not.
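
Here is a toy illustration in OCaml of why exact equality is problematic, with a real number represented naively as a function producing approximations to any requested accuracy (a float stands in for a rational here, and this is in no way a substitute for a proper exact-arithmetic library such as Escardó's):

(* A toy "exact real": a function that, for any n, returns an approximation
   within 2^-n of the true value. *)
type real = int -> float

let half  : real = fun _ -> 0.5                                    (* 0.5000... *)
let half' : real = fun n -> 0.5 +. 2.0 ** float_of_int (-(n + 1))  (* also 0.5, approached from above *)

(* Answers false as soon as the two reals are provably different; but if they
   are in fact equal it keeps asking for more precision forever. *)
let rec equal ?(n = 1) (x : real) (y : real) : bool =
  if abs_float (x n -. y n) > 2.0 *. 2.0 ** float_of_int (-n)
  then false
  else equal ~n:(n + 1) x y

(* equal half (fun _ -> 0.75) returns false; equal half half' never returns *)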

So what would happen to Simulink if we wanted to use arbitrary precision both in the computation and in the simulation? Something beautiful would happen! Simulink would no longer allow you to describe physically unrealisable models. Physical functions must be computable. The fix is very simple: whenever you sample a continuous value, the margin of error of the sampling process must be specified. The margin of error tells you that you need not bother looking at the digits that contribute less than the sampling error, because they are irrelevant.
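
By contrast, a comparison with a declared margin of error is unproblematic. Reusing the toy representation from the sketch above, it inspects only as much precision as the margin requires and always terminates:

(* With a declared margin of error eps, the comparison only ever needs enough
   precision to resolve differences bigger than eps, so it always terminates. *)
let equal_within (eps : float) (x : real) (y : real) : bool =
  (* pick n so that 2^-n is well below eps *)
  let rec enough n = if 2.0 ** float_of_int (-n) < eps /. 4.0 then n else enough (n + 1) in
  let n = enough 1 in
  abs_float (x n -. y n) <= eps /. 2.0

(* e.g. equal_within 1e-6 half half' = true, and it returns immediately *)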

This is a somewhat subtle point: even if you are sampling the continuous external value into a value which you represent as arbitrary precision, it still needs to be made computable through the specification of a sampling error. The mechanism of arbitrary precision used in the computation is only meant not to introduce additional errors, but it cannot compute the uncomputable! And sampling a real number with infinite precision is uncomputable because it requires an infinite amount of information to be transmitted in a finite amount of time.

How about the up construct: it is external and it only needs to communicate a finite amount of information to the program (1 bit). Why is it unrealistic? The uncomputability of up comes out of the infinitely precise timing of the signal. In other words, the amount of information is still infinite because in addition to the bit representing the signal we have the potentially infinite precision of the moment at which it is received.

Of course, specifying a sampling error is perfectly reasonable: any instrument that can be produced must have finite precision, therefore must introduce a sampling error. The use of arbitrary precision arithmetic only prevents you from writing simulations that are nonsensical anyway. This is a good thing!

In the case of our initial puzzle, the introduction of a sampling error would quickly lead to the identification of a special case in the simulation, when the cars A and B are so close that the fly cannot tell whether it is on A or on B. At that point the simulation could stop, and the answer could be 42! This time, however, the answer would be correct, because this is the correct behaviour for a system in which the sensors have a (known) sampling error. The greater the precision of the instruments, the more turns the fly will be able to make before it enters the uncertainty area.

As my colleague and area expert Martín Escardó can attest, I was quite skeptical regarding the usefulness of arbitrary precision real numbers. I was clinging to the faulty intuition that the real real numbers are things that you can grasp and work with. They are more complicated than that, it turns out. And the fact that the arbitrary precision representation forces you to face their subtlety, and thus avoid unrealistic computation (and simulation) scenarios is to me a strong indication that arbitrary precision real numbers could be quite useful!

Further reading

  • M.H. Escardo. Computing with real numbers represented as infinite sequences of digits in Haskell. CCA’2009 tutorial [zipfile]
  • Zélus: A synchronous language with ODEs [link]
  • T. Bourke and M. Pouzet. A slow afternoon chez PARKAS and a very fast fly [link]

Searching/fumbling for a proof assistant

I was always very impressed with people who use proof assistants to do their maths. It seems like something from the future: mechanised, unerring, complicated, slightly incomprehensible, and it makes traditional pen-and-paper (well, LaTeX) maths seem quaint, artisanal, even obsolete. The contrast could not be more striking than when proving that some complicated system (e.g. an operational semantics) satisfies some boring sanity property (e.g. reduction preserves typing). For such work, lying at the border between mathematics and bureaucracy, the use of proof assistants makes such a big difference that it is quite likely that in the not-too-distant future their use will be mandated. Mathematically, such proofs are rarely interesting (and the interesting bits can always be factored out into nice-to-present lemmas) but they are quite important. It's a lot more like making sure a piece of code is free of bugs than showing that a piece of maths is interesting.

So I am sold on the importance of proof assistants, but which one to use? The first one I tried was Coq. It is used very broadly, it has an established community of developers, and it has been the engine behind some heroic achievements, such as finally and conclusively proving the Four Colour Theorem.

After a couple of weeks of reading about it, watching some tutorials, playing with it, I produced a tutorial/demo/taster where I prove some simple things about lists in Coq. There was enough information in this tutorial to allow some (smarter than average) first year students to work out some of their assignments using Coq, which was very impressive to me.

More recently I became exposed to Agda and, for the sake of comparison, I reworked the same tutorial/demo/taster, this time in Agda. Agda is newer and smaller than Coq, and not quite as well known.

There is something a bit strange about doing a tutorial when you yourself are a beginner. From a certain point of view it is dangerous, because you may say or do things that are, for the expert, unusual or wrong. On the other hand, there is something positive to say about the innocent approach of the non-expert. Where I fumble you are likely to fumble; where I get stuck, you are likely to get stuck. These are the kind of tutorials where you don't actually learn Coq/Agda, but you learn whether you want to learn Coq/Agda.

You can obviously draw your own conclusions from what you see in the videos, but I will also share my impressions. Both systems are wonderful and there is something about them that tells me that this is what the future should be like.

Working with Coq felt like driving a tank. Working with Agda felt like driving a Formula 1 car. Coq feels big and powerful, but not very elegant. Agda feels slick and futuristic, but you can only wonder how the lack of a tactics language will work out in the trenches of a big boring project. Some of the more wonderful features of Agda (mixfix operators, seamless Unicode support, interactive development) I can see trickling into mainstream programming languages. The elegant proof-term syntax of Agda I can see appealing more to mathematically inclined users, whereas the nuclear-powered tactic language of Coq is bound to appeal to hackers who just want to get (impressive) things done. But that's just my opinion, and I don't think it's a very original or unique one.

Perhaps the main difference, for my money, between the two is the way I get stuck trying to get things done. In Coq it is quite often the case that I know how to achieve a proof goal but I cannot figure out which tactic I should use to achieve it, or how to find it — so I blame Coq. Whereas in Agda, when I get stuck it's because I made a mistake and I get an incomprehensible error message — but then I blame myself. Because of that, I end up being quite frustrated working with Coq, whereas I derive a lot of satisfaction working with Agda.

 


Seamless computing

In an earlier post ("Remember machine independence?") I pointed out my dissatisfaction with the so-called domain-specific languages (DSLs) which are popping up everywhere nowadays in support of new and esoteric programming platforms. The name DSL bugs me because I find it misleading. The right name would be ASL: architecture-specific language. They are not meant to solve problems in a particular domain, they are meant to program a particular architecture. Many such languages have some of the trappings of higher-level programming languages but also enjoy unconventional syntax and semantics, designed to match the platform.

I am not buying it. I think it is a mistake. I think the first and foremost principle of a higher-level programming language is that of machine independence. A higher-level programming language should allow the programmer to solve a problem rather than drive a physical system. This was a crazy idea in the 1950s, which went through all the stages that a crazy idea goes through:

First they ignore you. Then they ridicule you. And then they attack you and want to burn you. And then they build monuments to you. (Nicholas Klein)

The 'you' in this quote was John Backus. I am not sure whether he got a monument out of it (or whether anyone really wanted to burn him), but he did get a Turing Award, which is not bad. Not accidentally, the title of his lecture on the occasion of the award was Can Programming Be Liberated From the von Neumann Style? [PDF]. And machine independence figured high on the agenda.

Now the idea of machine independence is getting ridiculed again. Just as back in the 1950s, the reason is exactly the same: computer programming experts are comfortable with machine dependence in a programming language, and are worried that machine independence will cause a substantial loss of efficiency. This is both true and misguided at the same time, unless you are working on system-level programming. For application-level programming (scientific, corporate, etc.) it rarely helps that your program is twice as fast if it takes twice as long to write.

In a 1958 paper called Languages, Logic, Learning and Computers (J. of Computers and Automation), J.W. Carr III writes this:

One would like to be able to use these computers up to the day of the next one’s arrival, with interchange of problems and yet efficiency. Can this be done?

Carr is referring to the upgrade from the IBM-650 to the "totally different" IBM-701 and IBM-704. This is before FORTRAN had been invented, so the answer was not obvious at the time. But can we extend this point to our situation and replace 'computers' with 'architectures'? Should our programs, written for the CPU, not also work for the GPU, FPGA, cloud, etc.? The skeptics say 'no' and often argue that a mismatch between the programming-language model and the system model leads to inefficiency. Perhaps, but in many circumstances loss of performance is a price worth paying. If we look at the languages of today, especially functional languages, they do not match the CPU-based architecture very well at all. They are declarative where computers are imperative. They use functions where computers use jumps. To compile one (efficiently) for the other, a lot of optimisation work is put in, and the native systems are augmented with runtime systems to provide essential services such as garbage collection. But this does not invalidate the principle of having good programming languages even if they don't match the architecture on which they run. Designing machines especially to run functional languages is actually a cool and interesting idea (see the Reduceron).

Here is the same objection, taken from Carr’s paper:

The problems of efficiency have been often emphasized by programmers who believe strongly in the continuance of hand-programming over translational techniques.

“Hand-programming” means programming in assembly or even binary. “Translational” means using what we now call compilers. It’s safe to say history vindicates Carr and not those overly concerned with efficiency.

Bob Harper’s blog has a number of excellent posts on the duality between programming in the language and programming for the machine. They are all quite relevant to the point I am making, so have a look.

Machine-independence is clearly the necessary ingredient to achieve the kind of portability considered desirable by Carr. So we are doing something about it. We are developing a compiler for a conventional imperative and functional programming language (we call it Verity) which would allow effective and efficient compilation to any architecture. We already have a back end for compiling to FPGAs, which we call the Geometry of Synthesis project. It supports higher-order functions, recursion, separate compilation and foreign function interfaces to native HDL, all in an architecture-agnostic way. I wrote about this project on this blog before.

Distributed seamless computing

Today I want to talk about distributed computing. From the point of view of supporting architecture-independent programming, distributed computing is a mess. Let's start with a simplistic program, just to illustrate a point, a program that in a functional programming language I might write as:

let f x = x * x in f 3 + f 4

And suppose that I want to distribute this code on two nodes, one that would execute the function f, one that would execute the main program. I can choose a high-level distributed programming language such as ERLANG. Here is how the code looks:

c(A_pid) -> receive X -> A_pid ! X * X end, c(A_pid).
main() ->
    C_pid = spawn(f, c, [self()]),
    C_pid ! 3,
    receive X -> C_pid ! 4, receive Y -> X + Y end
    end.

In the first program I describe logical operations, such as multiplication and addition, and the logical structure of the program. In the second program I describe a system: I spawn processes, send messages and handle messages. A program which should be trivial is no longer trivial.

Things get even worse if I choose to distribute my program on three nodes: one for f, one for the function applications, one for whatever is left of the main program. This is the ERLANG code for this:

c() -> receive {Pid, X} -> Pid ! X * X end, c().
b(A_pid, C_pid) ->
    receive
        request0 -> C_pid ! {self(), 3}, receive X -> A_pid ! X end;
        request1 -> C_pid ! {self(), 4}, receive X -> A_pid ! X end
    end,
    b(A_pid, C_pid).
main() ->
    C_pid = spawn(f2, c, []),
    B_pid = spawn(f2, b, [self(), C_pid]),
    B_pid ! request0,
    receive X -> B_pid ! request1, receive Y -> X + Y end
    end.

Can you tell that these two ERLANG programs compute the same result? If you initially wrote the two-node version and then decided to re-factor to three nodes, how hard would the transition be? You would need to rewrite most of the code.

There is a better way. The way you indicate nodes to the compiler should not require more than this:

let f x = (x * x)@A in f 3 + f 4

or

let f x = (x * x)@A in (f 3)@B + (f 4)@B

Is this possible? To the programmer who needs to struggle with the intricacies of writing distributed code this may seem like a pipe dream. But it is possible. Theorists of programming languages have known since the late 80s and early 90s that computation and communication are closely related. Whether it is Milner's Functions as Processes [PDF], Girard's Geometry of Interaction [DOI] or the early work on Game Semantics [see my survey], the message is the same: function calls are nothing but structured communication protocols. And this is not just esoteric theory; we really do know how to write compilers and interpreters based on it. Nick Benton wrote a very cool tutorial which also applies functions-as-processes to distribution. Why haven't we written this compiler yet? Because we were afraid it might be too inefficient.
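
The slogan can be made concrete in a few lines of OCaml (using the standard threads and Event libraries; the channel plumbing is invented for the example and has nothing to do with our compiler). The function becomes a process serving requests, a call is the two-message protocol "send the argument, wait for the result", and the main program reads almost like the original let-expression:

(* Functions as processes, by hand.  Compile with OCaml's threads library. *)
let serve_square (req : int Event.channel) (resp : int Event.channel) =
  let rec loop () =
    let x = Event.sync (Event.receive req) in      (* wait for an argument   *)
    Event.sync (Event.send resp (x * x));          (* reply with the result  *)
    loop ()
  in
  loop ()

(* a "call" is just the request/response protocol *)
let call req resp x =
  Event.sync (Event.send req x);
  Event.sync (Event.receive resp)

let () =
  let req = Event.new_channel () and resp = Event.new_channel () in
  let _node = Thread.create (fun () -> serve_square req resp) () in
  (* reads like:  let f x = x * x in f 3 + f 4  *)
  let f = call req resp in
  Printf.printf "%d\n" (f 3 + f 4)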

I think we should be braver and more sanguine and have a go at it. This worry could be misplaced or exaggerated. With my student Olle Fredriksson we made the first step, which is writing a compiler for a core functional programming language which can be distributed using light-weight compiler annotations just as described above. We call it seamless computing because it is distributed computing where all the communication between components is via function calls rather than via explicit sending and handling of messages. Explicit communication is the seam in distributed computing. This distribution is communication-free.

A paper describing the compiler was presented at the International Symposium on Trustworthy Global Computing earlier this year [PDF] and there is a video based on it which you can watch online. The compiler is available for download from veritygos.org. It is based on Girard's Geometry of Interaction and it compiles down to C and MPI. There are some good things and some bad things regarding its efficiency. On the one hand, the compiler does not require a garbage collector, which is very desirable in a distributed setting. On the other, the tokens being exchanged during the computation have a size proportional to the depth of the call stack, which can be large, especially for recursive calls. But this can be avoided with a better implementation, still without requiring a distributed garbage collector. This is ongoing work. We are also currently expanding the compiler to handle the whole of Verity, so that it has concurrency and shared state. We are making great progress and it's a lot of fun, both as a theoretical challenge and as an implementation and engineering one.

We will keep you posted as things develop!


The ultimate programming language?

In several earlier posts I considered how the conventional notion of contextual equivalence, which relies on quantification over syntactic contexts, is sometimes not suitable, and a more direct notion of equivalence should be used. This notion of equivalence, which I like to call system-level semantics, should be induced directly by what is observable by the execution environment, as defined not only by the language but also by abstract models of the compiler and the operating system.

More concretely, what I proposed (in a paper [PDF] with Nikos Tzevelekos) is that the system context should be defined along the lines of a Dolev-Yao attacker. This context is omnipotent but not omniscient: the program can hide information (data, code) from the context, but once the context gets this information it can use it in an unrestricted way. What we show then in our paper is how this attacker can produce “low-level” attacks against code, i.e. attacks which cannot be carried out in the language itself.

But what if the system-level context and the syntactic context are equally expressive? Then you have a fully abstract system-level semantics. This means that the attacker/context cannot produce any behaviours that cannot be produced in the language itself. Why is this cool? Because it means that all your correctness arguments about the source level can be applied to the compiled code as well. You can think of a fully abstract system-level semantics as an abstract specification for a tamper-proof compilation scheme.

In a previous post I described such a fully abstract SLS. Full abstraction is achieved by using monitors that serve as tamper detectors: if the environment behaves in ways which are illegal then the program takes defensive action (stops executing). So full abstraction is achieved by introducing runtime checks for detecting bad behaviour, as in control-flow integrity, for example (see Abadi et al. [PDF]).

But can you achieve a fully abstract SLS the other way around? By having a programming language that is so expressive that it can implement any attacks performed by a DY-style environment? I think the answer is ‘yes’. And such a language is surprisingly civilised: lambda calculus with state and control introduced in a uniform way. This language, called lambda-mu-hashref, was introduced especially to study security properties in a language capable of producing very powerful contexts. A fully abstract SLS-type semantics is given in the paper.

I find this quite remarkable, the fact that a civilised programming language which is not that much different from ML can induce contexts as powerful as a DY attacker. Any (reasonable) behaviour should be possible to implement using it. There is nothing else you can add to it to make it more expressive. It is, from this point of view, the ultimate programming language. 


Two consequences of full abstraction

In a recent paper [pdf] with my student Zaid Al-Zobaidi we describe two enhancements to the GoI compiler.

The first one is a tamper-proof mechanism which guarantees that no low-level attacks against the compiled code are possible. This is very nice to have because it means that all properties that you can prove about a program at code level are preserved by the compilation process. Of course, since we are generating hardware (rather than machine code) the ways in which a malicious environment can mount an attack are very constrained. Most importantly, the code cannot be changed, as there is no code, just a fixed circuit. However, the compiler is compositional and can interface with libraries written in other languages, and such libraries can violate language invariants in unexpected ways. For example, in the hardware model calls/returns to/from functions are associated with fixed ports in the interface, so a malicious environment can return from a function that wasn't even called!

The way our synthesised hardware is made tamper-proof is by providing interaction monitors which take defensive action (e.g. halting the circuit) if the environment behaves in ways which are inconsistent with the behaviour of circuits obtained by compiling legal programs — e.g. if it acts like a function that returns without having been called. This is possible because we have a fully abstract model of our programming language, i.e. a model which correctly and completely characterises programs written in the language. With the addition of tamper-detecting monitors, the game semantics and the system-level semantics of the language become the same: the system must play by the rules we impose!
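
To give a flavour of the kind of check the monitors perform, here is a sketch in OCaml rather than in hardware (with the event alphabet simplified to bare calls and returns): it enforces the most basic rule violated in the example above, namely that a return is only legal when there is a pending call.

(* A toy interaction monitor.  The real monitors are synthesised as circuits
   and check the full game-semantic rules; this sketch only checks
   "no return without a pending call". *)
type event = Call | Return

exception Tamper_detected

let monitor () =
  let pending = ref 0 in                 (* number of calls awaiting a return *)
  fun (e : event) ->
    match e with
    | Call   -> incr pending
    | Return ->
        if !pending = 0 then raise Tamper_detected   (* defensive action: halt *)
        else decr pending

(* let m = monitor () in m Call; m Return; m Return   raises Tamper_detected *)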

We are not the first to note the importance of fully abstract compilation in the prevention of low-level attacks [Abadi 98], but as far as we can tell ours is the first compiler to achieve it for a substantial programming language. Of course, when compiling to machine code this task becomes hard to impossible.

The second improvement is due to the fact that the environment can observe the code only in a limited way, so more states of the system appear equivalent from the outside. This is quite similar in principle to bisimulation in a typed context, although the technical details differ. Our laxer notion of 'equivalence' lacks transitivity, so it is not really an equivalence. However, from the point of view of reducing the state space of the automata, hiding behind a monitor turns out to be very effective.

The paper was presented by Zaid at ICE 2012.


Remember machine-independence?

An interesting quotation from Dijkstra, written in 1961:

I would require of a programming language that it should facilitate the work of the programmer as much as possible, especially in the most difficult aspects of his task, such as creating confidence in the correctness of his program. This is already difficult in the case of a specific program that must produce a finite set of results. But then the programmer only has the duty to show (afterwards) that if there were any flaws in his program they apparently did not matter (e.g. when the converging of his process is not guaranteed beforehand). The duty of verification becomes much more difficult once the programmer sets himself the task of constructing algorithms with the pretence [sic] of general applicability. But the publication of such algorithms is precisely one of the important fields of application for a generally accepted machine independent programming language. In this connection, the dubious quality of many ALGOL 60 programs published so far is a warning not to be ignored. [On the Design of Machine Independent Programming Languages]

I am bringing this up because to me it seems a lesson forgotten. There is a tidal wave of new architectures (manycore, GPU, FPGA, hybrid, cloud, distributed, etc) coming and people are happily producing special-purpose languages for each of them. There is a fashion now to call such languages Domain Specific Languages (DSL), but that is to me a misnomer. DSLs should be problem specific, whereas these are architecture-specific. They are ASLs.

At the low-level we certainly need (C-like) languages to program these systems. But what worries me is the embedding of “DSLs” into languages such as Haskell, resulting in strange mongrels, architecture-specific high-level languages. They offer you the poor performance of a high-level language in exchange for the poor abstraction of a low-level language. In fairness, they do offer you a much better syntax than the native languages, but is this enough? Aren’t we settling for too little?

I don’t think we should give up on machine independence just yet. I have recently given two talks on compiling conventional higher level languages into unconventional architectures. The first one is on the Geometry of Synthesis (compiling into circuits), invited talk at Mathematics of Program Construction 2012. The second is on compiling to distributed architectures, a talk at the Games for Logic and Languages 2012.


Game semantics, nominally

I have an upcoming paper in MFPS28, about using nominal techniques to represent “justification pointers” in game semantics (jointly with Murdoch Gabbay).

In game semantics we represent "plays" as sequences with "pointers" from each element to some earlier element. This is an awkward structure to formalise, and many papers (including my own) tend to gloss over the annoying bureaucratic details involved in formalising it (e.g. by using integer indices into the sequence).

In general I don't mind this aspect of game semantics very much, but one situation in which it becomes quite painful is teaching. It's hard not to feel a bit embarrassed when explaining them chalk-in-hand at the blackboard. It was in fact during such a lecture that Andy Pitts jokingly (I hope) booed me from the audience as I explained pointers and plays.

Ever since, I have wanted to fix this annoying little aspect of game semantics, and now I think we have, by using nominal techniques: every move in a sequence introduces a fresh name (the tip of the pointer) and uses an existing name (the tail of the arrow). It is as simple as that, really, and everything seems to just work out nicely.
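
In code the change is simply a change of data representation. Here is a sketch in OCaml (with invented labels, and strings standing in for atoms; the paper of course works with proper nominal sets): a play is a list of moves, each introducing a fresh name and mentioning the name of its justifier, with no integer indices in sight.

(* Plays with justification pointers, represented nominally. *)
type name = string                      (* stands in for an atom *)

type mv = {
  label     : string;                   (* the move data, e.g. a question or an answer *)
  this      : name;                     (* the fresh name introduced by the move       *)
  justifier : name option;              (* the name used; None for an initial move     *)
}

type play = mv list

(* the play  q0  q1(->q0)  a1(->q1)  a0(->q0)  *)
let example : play =
  [ { label = "q0"; this = "a"; justifier = None };
    { label = "q1"; this = "b"; justifier = Some "a" };
    { label = "a1"; this = "c"; justifier = Some "b" };
    { label = "a0"; this = "d"; justifier = Some "a" } ]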

Mathematically, this is perhaps trivial, but this is pretty much the point of it. The contribution of the paper is “philological”, it shows that the nominal setting gives the right language for pointer sequences.

Link to the PDF.


Armchair philosophy

I was recently reading a fun paper by Wesley Phoa, Should computer scientists read Derrida? [pdf]. I was attracted by what seemed to me a preposterous title, being quite sure the paper was a parody. Instead, I found myself confronted with a serious, beautifully written and thoughtful paper.

I will not comment on the paper itself because it deserves to be commented on seriously. I’ll just say that although I liked it quite a lot and although I recommend it as an excellent and thought-provoking read, I didn’t quite agree with its conclusions. Yet I was intrigued by the question. So I’ll have a go at giving a different answer.

First, we need to frame the entire history of semantics of programming languages in a blog post, and then let's see how we can fit Derrida in there. We need to start with logic, of course.

There are two kinds of semantics (to use JY Girard’s terminology): essentialist and existentialist. I disclose that I am 10% essentialist and 90% existentialist, so that the reader can calibrate my account taking personal bias into consideration.

Essentialists take after Tarski and aim to understand language by expressing meaning in a meta-language. So the proposition the grass is green is true if the grass is indeed green. The meaning of grass is 'grass', of is is 'is', etc. To the uninitiated this may seem funny, but it is serious business. In formal logic symbols get involved, so it seems less silly. The proposition A∧B is true if A is true and B is true, so the meaning of "∧" is 'and'.

In programming languages this approach is the oldest, and is called "Scott-Strachey" semantics, after the two influential researchers who promoted it. It is also called 'denotational' (because symbols 'denote' their meaning) or 'mathematical' (because the meaning is expressed mathematically). To map symbols into meaning it is conventional to use fat square brackets 〚 and 〛.

So what is the meaning of '2+2'? It is 〚2+2〛 = 〚2〛 + 〚2〛 = 2 + 2 = 4. Students are sometimes baffled by this interpretation even though we do our best to try to make things clearer by using different fonts to separate object and meta-language. (Because this blog is written in a meta-meta-language I need to use even more fonts.) It is easy to laugh at this and some do by calling it, maliciously, 'renotational pedantics'. Well, much of it perhaps is, but not all of it. It is really interesting and subtle when ascribing meaning to types, i.e. things like 〚int → int〛, which can be read as 'what are the functions from ints to ints describable by some programming language'.
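
In code, an essentialist semantics for a toy expression language is literally a function from object-level syntax into meta-level values. Here is a sketch in OCaml, where the interpreter plays the role of 〚-〛 and OCaml itself is the meta-language:

(* A toy denotational semantics: the object language is the syntax tree,
   the meta-language is OCaml, and "meaning" maps the former into the latter. *)
type exp = Num of int | Add of exp * exp | Mul of exp * exp

(* denote e  is the OCaml rendering of 〚e〛 *)
let rec denote = function
  | Num n        -> n
  | Add (e1, e2) -> denote e1 + denote e2
  | Mul (e1, e2) -> denote e1 * denote e2

(* 〚2+2〛 = 〚2〛 + 〚2〛 = 4 *)
let _ = denote (Add (Num 2, Num 2))    (* evaluates to 4 *)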

In programming language semantics, and in semantics in general, the paramount question is that of equivalence: two programs are equivalent if they can be interchanged to no observable effect. In an essentialist semantics A is equivalent to B if 〚A〛 = 〚B〛. This justifies the name quite well, since A and B are 'essentially' the same because their meanings are equal. Above we have a proof that 2+2 is equivalent to 4, because 〚2+2〛 = 〚4〛 = 4.

Essentialism has come under some fierce criticism. Wittgenstein, an early promoter of it, famously renounced it. Famous contemporary logicians JY Girard (The Blind Spot) and J Hintikka (Principles of Mathematics Revisited) both attack it viciously (and amusingly) because of its need for a meta-language: what is the meaning of the meta-language? "Matrioshka-turtles" is what Girard calls the hierarchy of language, meta-language, meta-meta-language, etc. — it goes all the way down to the foundations of logic, language and mathematics.

Existentialism takes a different approach: language is self-contained and we should study its dynamics in its own right. Wittgenstein uses the concept of ‘language games‘ both to understand the properties of language (which are still called ‘meaning’ although the non-essentialist reading of ‘meaning’ is a bit confusing) and to explain how language arises. For some reason existentialist approaches to meaning always end up in games. Lorenzen used games to give a model theory of logic and Gentzen used games to explain proof theory. Proving something to be true means winning a debate. When something is true a winning strategy for the debate-game exists.

In programming languages the first existentialist semantics was operational semantics, which describes the meaning of a language in terms of itself, by means of syntactic transformations. It is intuitive and incredibly useful, but it is quite simplistic. Most importantly, it does not give a good treatment of equivalence of programs. It can only formalise the definition of equivalence: A and B are equivalent if for all language contexts C[-] such that C[A] and C[B] are executable programs, C[A] and C[B] produce the same result. It is not a very useful definition, as induction over contexts is so complicated as to be useless.
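
For contrast, an existentialist (operational) semantics for the same toy language stays entirely within the syntax: meaning is given by rewriting expressions, not by mapping them anywhere. A sketch, reusing the exp type from above:

(* A toy operational semantics for the same language: meaning is given by
   syntactic transformation rather than by a map into a meta-language. *)
let rec step = function
  | Num _ as v         -> v                         (* values do not reduce *)
  | Add (Num m, Num n) -> Num (m + n)
  | Add (e1, e2)       -> Add (step e1, step e2)
  | Mul (Num m, Num n) -> Num (m * n)
  | Mul (e1, e2)       -> Mul (step e1, step e2)

let rec eval e = match e with
  | Num _ -> e
  | _     -> eval (step e)

(*  Add (Num 2, Num 2)  reduces to  Num 4, by rewriting the term itself  *)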

Something very nice about programming language semanticists is that we're not dogmatic. The battles over 'which one is better', denotational (essentialist) or operational (existentialist), were practically non-existent, largely confined to friendly pub chats. In fact one of the first great open problems was how to show that the two are technically equivalent (the technical term is 'fully abstract', for a denotational model which captures operational equivalence perfectly). Milner, who coined the term, also showed how an essentialist (denotational) semantics can be constructed (where the word 'constructed' is not necessarily meant constructively) out of an existentialist (operational) semantics. I am not sure how aware philosophers of language are of this construction, because its implications are momentous. It establishes the primacy of existentialist semantics and reduces essentialist semantics to a mathematical convenience (publishing a research paper on programming language semantics without a proof of 'adequacy' is a serious challenge).

Milner’s construction is not very useful (which takes nothing away from its importance) so finding more palatable mathematical models is still important, and proving ‘full abstraction’ is the ultimate technical challenge. This turned out to be very hard, but it was eventually done (for many languages), also using the ‘game semantics‘ I mentioned above.

Curiously, in programming language semantics games were introduced as a denotational (essentialist!) semantics, in which a program denotes a strategy, rather than as an operational (existentialist) semantics. In fact, some technical aspects of the first game models [see Inf. Comp. 163(2)] made PL Curien (and others) complain that game models were too much like Milner's construction of a denotational semantics out of an operational semantics [link].

Again, this full abstraction result was not the intended goal: what we wanted, in short, was an effective fully abstract model of PCF. But Loader's later result settled the question negatively: he showed that observational equivalence for PCF is not effective. As a matter of fact, the game-theoretic models of PCF given in 1994 by Abramsky, Jagadeesan, and Malacaria (AJM), and by Hyland and Ong (HO), offer syntax-free presentations of term models, and the fully abstract model of PCF is obtained from them by a rather brutal quotient, called "extensional collapse", which gives little more information than Milner's original term-model construction of the fully abstract model.

However, these technical issues were only manifest in the PCF model; most subsequent models, starting with Abramsky and McCusker's model of Algol, were syntax-free, complaint-free, fully abstract models.

Still, this is somewhat of an anomaly, at least historically and, I think, methodologically. Things were, however, set right by J Rathke and A Jeffrey, who came up with a neat combination of operational and game semantics which they called 'trace semantics'. P Levy renamed this 'operational game semantics', although I think for historical reasons it would be nice if it were all called just 'game semantics'. Unlike vanilla operational semantics, operational game semantics offers a neat treatment of equivalence: two programs A and B are equivalent if whatever 'move' one can make in its interaction with the environment the other can match (the technical term is 'bisimilar'). Also, unlike vanilla game semantics, operational game semantics doesn't really need the operational semantics.

What about Derrida then?

Broadly speaking, Derrida makes a critique (called deconstruction) of existing approaches to understanding language and logic on the basis that they ‘ignore the context’:

… [deconstruction is] the effort to take this limitless context into account, to pay the sharpest and broadest attention possible to context, and thus to an incessant movement of recontextualisation. The phrase … ‘there is nothing outside the text’, means nothing else: there is nothing outside the context.

This is obviously a statement exaggerated for rhetorical effect, but it's not silly. In understanding language, context is indeed essential. In understanding formal logic, however, I don't see how context plays a role, because the context is known. It is sort of the point of formalisation to do away with the difficult questions that context raises. How about programming then? Is context important? Yes, it is. I've been saying this for a while, including on this blog. Realistic programming languages exist in a complex and uncontrollable context. Take these two little programs in a simplified C-like language.

int f() { int x = 0; g(); return x; } 
versus
int f() { g(); return 0; }

Are they equivalent? We don't know what g() is. Importantly, we don't know if it's even a C function (C has a foreign function interface). If g() were "civilised", then the two are equivalent, because a civilised function g() has no access to the local variable x. However, g() may be uncivilised and poke through the stack and change x. It's not hard to write such a g(), and hackers write uncivilised functions all the time.

This may seem like a hopeless problem, but it's not. In fact, operational game semantics can model this situation very well. I call it system-level semantics, and it's game semantics with a twist. In game semantics we model program + environment as a game with rules, which embody certain assumptions on the behaviour of the environment. In a system-level semantics the game has no rules, but it has knowledge. The environment is an omnipotent but not omniscient god, sort of like Zeus. We can hide things from it. This is the same principle on which the Dolev-Yao attacker model in protocol security operates.

This does not necessarily lead to chaos, just as Derrida's deconstruction need not degenerate into relativism:

… to the extent to which it … is itself rooted in a given context, … [deconstruction] does not renounce … the “values” that are dominant in this context (for example, that of truth, etc).

The point is that any enforcement of properties is up to the program and cannot rely on the context. In a game, the program should be able to beat any opponent-context, not just those behaving in a certain way (just like a security protocol).

A system-level semantics of C can be given where local variables are not hidden (a la gcc), and in such a semantics the two programs above are not equivalent. But a system-level semantics of C can also be given where local variables are hidden (e.g. via memory randomisation), which makes the two programs equivalent in any context. In a civilised context both semantics would be equivalent, but in an unrestricted context they are different.

So should computer scientists read Derrida? No. However, as Derrida urges, they should worry more about context.


Verity now with call-by-value

Call-by-name is generally seen as an annoyance by programmers. Fortunately, if your programming language has mutable state you can force evaluation at ground type by using assignment, so by writing

new v in v := x; f(!v)

you achieve the same effect as calling f(x) by value. This is tedious, so ALGOL uses

f(val x)

to achieve the same effect with simpler syntax. We have now added this feature to Verity as well.

This was also an opportunity for me to get used to the compiler, which was written by my student Alex Smith. So I sat down with him and had him guide me in implementing this syntactic transformation in the GOS compiler. So that I don't forget all this stuff I screencasted our session, which you can watch here.
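
The transformation itself is the kind of thing that fits in a few lines over an abstract syntax tree. The sketch below is over an invented mini-AST, not the GOS compiler's actual data types, but it shows the shape of the rewrite from f(val t) to new v in v := t; f(!v):

(* A sketch of the call-by-value desugaring on an invented mini-AST. *)
type term =
  | Var    of string
  | App    of term * arg
  | New    of string * term            (* new v in t        *)
  | Assign of string * term            (* v := t            *)
  | Deref  of string                   (* !v                *)
  | Seq    of term * term              (* t1; t2            *)
and arg =
  | ByName of term                     (* ordinary argument *)
  | ByVal  of term                     (* f(val t)          *)

(* fresh local variable names for the introduced cells *)
let gensym =
  let n = ref 0 in
  fun () -> incr n; Printf.sprintf "_v%d" !n

(* f(val t)  becomes  new v in (v := t; f(!v)) *)
let rec desugar = function
  | Var x             -> Var x
  | App (f, ByName t) -> App (desugar f, ByName (desugar t))
  | App (f, ByVal t)  ->
      let v = gensym () in
      New (v, Seq (Assign (v, desugar t), App (desugar f, ByName (Deref v))))
  | New (v, t)        -> New (v, desugar t)
  | Assign (v, t)     -> Assign (v, desugar t)
  | Deref v           -> Deref v
  | Seq (t1, t2)      -> Seq (desugar t1, desugar t2)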

Warning: it is not a very exciting video, other than the mild amusement generated by my constant fumbling.

On the plus side, the whole compiler change took as long as the video, i.e. about one hour, including some simple testing. And it worked, no bugs, first time, which is yet another testament to what a nice programming language OCaml is.
