Two kinds of mathematics

Preliminary reading:

  • “The Origins and Motivations of Univalent Foundations” by V. Voevodsky [PDF]
  • “Proofs and refutations” by I. Lakatos [link]
  • “On proofs and progress in mathematics” by W. Thurston [link]

I find the recent surge in interest in the use of proof assistants in mathematics exciting. Because of the nature of proof assistants — computer systems by necessity lack mathematical intuition and must be fully formal — there is a renewed interest in logic and the foundations of mathematics. What is special this time around is that there seem to be echoes of this interest among mainstream mathematicians. Because history always repeats itself, some of the old arguments are being rekindled: “conceptual” versus “formal” mathematics, or call it “platonism” vs. “logicism” if you have a philosophical bent, or, as G. C. Rota nicely put it, “truth” vs. “the game of truth”.

Like most philosophical arguments, these arise out of confusion. This is not to slag off philosophy, just to highlight its role and methodology: clarifying confused (or confusing) concepts by employing them in dialogue, like new and untested weapons in a military exercise. Once the concepts are clarified they can be passed over to science, mathematics, etc. This passing of the baton is of course not an official ceremony, but a slow osmosis of ideas through the many membranes that separate the various fields and sub-fields of knowledge. This is my attempted contribution to clarifying the confusion.

There is no need for a conflict between “formal” and “conceptual” mathematics, because there is no need to have only one kind of mathematics. Such a conflict is as unnecessary as forcing a choice between “constructive” and “classical” mathematics: they are simply two different worlds, as explained very nicely by Andrej Bauer. The universe of mathematics would be poorer without either of them. Similarly, the universe of mathematics would be poorer without “formal” or without “conceptual” mathematics. But what are they and what purpose do they serve?

“Conceptual” mathematics

This is the conventional, established mathematical tradition that Lakatos and Thurston so beautifully describe in their papers. Somewhat surprisingly, conceptual mathematics is not “just” about formulating and proving theorems. It is not automatically the case that the mathematician who proves the most theorems, or the hardest-to-prove theorems, is the highest rated. The fascinating beauty of conceptual mathematics comes out of the creative tension between definitions, theorems and proofs. It is not always the case that the definitions are produced first and then, using proofs, theorems are churned out. Sometimes the desirable theorems are known well before the definitions have been clarified (see the early work on mathematical analysis). Sometimes a proof is so interesting that both definitions and theorems are adjusted to fit it (see “Proofs and Refutations”). Everything is in a state of flux and mistakes are possible but not necessarily devastating: they show room for improvement and further work.

Computer-aided proof assistants can get in the way of conceptual mathematics. Because they force a full formalisation of all concepts, the concepts can become buried under too much book-keeping and thus lose their intuitiveness. Proof assistants are excellent at checking correctness but are not necessarily the best vehicle for communicating ideas. And this is what mathematicians use proofs for: communicating ideas, not checking correctness. This is one of the reasons why mathematical papers in any field are so impenetrable, even to mathematicians in other fields. They use a specialised language that relies heavily on the social context in which that particular mathematics happens. Lacking the context, it is almost impossible to read the papers. The mathematics happens inside mental models that are poorly described by the formalisations used in mathematical papers. Conceptual mathematics requires more than correctness of proof, which is why the Four-Colour Theorem, whose proof relies heavily on computation, makes conventional mathematicians scrunch their noses in dismay.

“Formal” mathematics

Exploring mathematical concepts is fascinating, but sometimes we really just need to know whether a statement is true or not. This is often the case when we build things and are concerned that these things behave in a desired way. Since mathematics is the language we use to talk about things in the real world (e.g. physics or engineering), there is a practical concern with the validity of mathematical statements. In this context, the aesthetics of the definition-proof-theorem interaction are often of no interest.

Software is perhaps the most obvious example in which mathematical concepts are designed and used in a way that has an immediate impact on the real world. A computer program is an abstract mathematical concept that drives the behaviour of a concrete physical device, the computer. Most errors in the behaviour of a program come not from the hardware but from the software itself — they are of a purely mathematical nature. But programming languages and programs (existing ones, not as a matter of necessity) are ugly mathematical objects, devoid of conceptual elegance and encumbered by complications and contingencies. Their genesis is driven not by a taste for elegance, as most conceptual mathematics is, but mainly by how efficient they are at driving the physical computer, itself a complicated and ad hoc device, to perform certain tasks. However, the correctness of software is of crucial importance, and the only realistic way in which it can be achieved is by using formal mathematics, as embodied by proof assistants.

But it’s not just software. Properties of security protocols and encryption schemes are also examples of mathematical properties where correctness trumps conceptual coherence in importance.

Conceptual mathematicians would of course rather that their theorems were true. This is why computer checking of conceptual mathematics is of some interest. And when a massive formalisation effort such as Gonthier et al.’s formalisation of the Feit–Thompson (odd order) theorem turns up only an insignificant number of errors, there is good reason for high-fives and some sighs of relief.

“Formal conceptual” or “Conceptual formal” mathematics

The area of overlap between formal and conceptual mathematics is a fascinating one. This is where the Univalent Foundations project is happening. What is the right foundation for mathematics? One that is not just sound, but that also makes conceptual mathematics more like formal mathematics (i.e. checkable) and formal mathematics more like conceptual mathematics (i.e. readable).

Other areas of mathematics that should be both formal and conceptual are those where the choice of foundational structure is important: type theory, logic, constructive mathematics. Unlike classical mathematics, which takes an anything-goes approach to its logical foundations, these areas must be more discerning about which rules are or are not acceptable. A proof of a property must not trespass into illegal territory, and for informal proofs that can be quite difficult to assess.

Like that of most computer scientists, my own mathematical work has elements of both the formal and the conceptual. An example of the mostly formal is recent work on distributing the Krivine machine [link][PDF], whereas an example of the mostly conceptual is work on game semantics [link][PDF]. I cannot brag that I have done a lot of work in mathematics that is both formal and conceptual, but if you forced me to give the best example it would perhaps be the work on formalising some of the structures encountered in game semantics in order to facilitate reasoning [link][PDF].

Posted in armchair philosophy, proof assistants | Comments Off

Two views of programming language design

In computer science, the Boolean or logical data type is a data type, having two values (usually denoted true and false), intended to represent the truth values of logic and Boolean algebra. [Wikipedia]

In computer science, a tagged union, also called a variant, variant record, discriminated union, disjoint union, or sum type, is a data structure used to hold a value that could take on several different, but fixed, types. [Wikipedia]

In certain circles there is only one way to think of a programming language: lambda calculus and some extra stuff. Depending on where you add this extra stuff, you get two different, although not incompatible, views on programming language design.

I hope we can all agree that the simply typed lambda calculus is a useful starting point for a programming language. But it must be merely a starting point. Because the only values are functions, comparing results of computations requires some theoretical somersaults. It also makes operations such as input or output problematic. So we need to add stuff to it. Without undue theorising let us jump into an example: booleans.
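To make the “theoretical somersaults” concrete, here is a small OCaml sketch of the standard trick (Church booleans): when the only values are functions, booleans themselves must be encoded as functions, and a “result” can only be used by applying it, never inspected directly. The names are mine.

(* a Church boolean picks one of two arguments *)
let tru = fun x -> fun _ -> x     (* "true"  selects the first  *)
let fls = fun _ -> fun y -> y     (* "false" selects the second *)
let if_then_else b t e = b t e

let example = if_then_else tru "yes" "no"   (* evaluates to "yes" *)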

The first way of doing it is a minimalistic one: we just add a new type to the language (let’s call it bool) and some typed constants for manipulating booleans:

true, false : bool
not         : bool -> bool
and, or     : bool -> bool -> bool 

Some are ground-type constants, and some are unary or binary first-order constants (a.k.a. operators). For easier future reference, let’s call this the axiomatic approach because, if you keep in mind the Curry-Howard correspondence, it is sort of like adding arbitrary axioms to a natural deduction system.

This works but it is unsatisfactory on several accounts. The first one is the rather arbitrary way in which we are adding types and constants to the language. The second one is that the behaviour of booleans and the associated constants is not (yet) specified, so we need to add this essential information somewhere (language reference, operational semantics, etc.).

The third objection is that whenever we require yet another data type we need to extend our programming language, with all the hassle that entails. We even need to change the parser. For example, adding natural numbers would require yet another new type nat and a bunch of constants:

0, 1, 2, ...           : nat
neg                    : nat -> nat 
eq, gt, lt             : nat -> nat -> bool

Then how about more complicated data types such as lists or trees, or types for dealing with state or other computational effects? There must be a better way.

A better way, which we shall call the type-theoretic approach, seems to be to provide the programming language with a way of constructing its own types, such as algebraic types or dependent types. Many functional programming languages offer this facility. In OCaml, for example, booleans and naturals and the associated operations can be defined as follows:

type bool = True | False ;;
let not = function
  | True  -> False
  | False -> True ;;
type nat = Zero | Suc of nat ;;
let rec plus m n = match m with
  | Zero  -> n
  | Suc m -> Suc (plus m n) ;;

This second — type-theoretic — approach is clearly more elegant, solving the problems of the axiomatic approach: no more adhockery, the behaviour is defined within the language, and the language can be extended uniformly in a nice and elegant way.

But is it always better, from all points of view?

There is at least one dubious scenario: interfacing with the physical world, including the computer itself. The built-in data types of the computer (char, int, float, etc.) are terribly ad hoc and lacking in uniformity. However, the computer as a physical device is at its most efficient when working with these built-in types, because it has been designed to work with them. But it is very tricky to make a compiler that just knows how to connect a user-defined type (e.g. for natural numbers) and its constants with hardware-supplied operations. For starters, the match may not be perfect. The user-defined natural numbers, for example, have no size restrictions, so you might say that compiling them as computer integers is just wrong. But in general, since the type-theoretic approach defines new concepts within the language, their implementation proceeds via the definitions, simplified into the core of the language. This unpacking of definitions obscures whatever intended connections there might be with a physical reality.
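A minimal OCaml sketch of the mismatch; to_int is a hypothetical “compilation” of user-defined naturals into machine integers, not anything a real compiler does:

type nat = Zero | Suc of nat

(* unary naturals have no size limit, while machine int arithmetic wraps
   around, so for large enough values the two must disagree (and the
   recursion can exhaust the stack long before that) *)
let rec to_int = function
  | Zero  -> 0
  | Suc n -> 1 + to_int n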

A related scenario is when a program needs to interface with a physical device, which can usually be reduced to sending and receiving bits. Here the type-theoretic approach becomes uncomfortably constraining, and some languages simply abandon the typing discipline where I/O is concerned. A similar problem does not arise in the case of the axiomatic approach, mainly because the axiomatic approach is too simplistic to even ask the question in the first place.

Consider memory (RAM) as a physical device. How do you interface with it? The program either sends a tuple representing an address and some data to be written at that address, to which the memory replies with an acknowledgement; or the program sends an address and receives the data stored at that address. To add this memory to a lambda calculus in an axiomatic way is not a problem.

We introduce the type mem simply as a tuple of a write and a read operation, with the signatures informally described above. A ram (second-order) operation then takes a function over mem, binds its argument to the memory, uses it, then releases the binding upon exiting the body of the function.

type mem = (nat -> nat -> nat) x (nat -> nat)
ram : (mem -> nat) -> nat

Read and write are just projections to select the right “method”. For example, a program such as

#new ram m;
m[10] := m[10] + 1

would be desugared with the syntax above as

ram (\m.(proj1 m)(10)((proj2 m)(10) + 1))
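To make this concrete, here is a rough OCaml model of what the ram combinator could mean, with the store played by a hash table; the names and the use of 0 as the write acknowledgement are my own assumptions, not part of the calculus above:

(* mem pairs a write operation with a read operation, as in the signature
   above; ram allocates a fresh store, hands the pair to the body, and the
   binding disappears when the body returns *)
type mem = (int -> int -> int) * (int -> int)

let ram (body : mem -> int) : int =
  let store = Hashtbl.create 16 in
  let write a v = Hashtbl.replace store a v; 0    (* 0 plays the ack *)
  and read a = try Hashtbl.find store a with Not_found -> 0 in
  body (write, read)

(* the desugared example: m[10] := m[10] + 1 *)
let _ = ram (fun (write, read) -> write 10 (read 10 + 1))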

A type-theoretic approach to effects is perhaps possible, using the rather well-known monads, or the funkier lenses, or the more sophisticated algebraic effects. However, it is very difficult in general to make types match whatever arbitrary interface your physical system or device might have. And it is even more difficult to make the compiler hook up the generated code to the device interface, for the same reason mentioned above: in the type-theoretic approach complex types are usually first translated into a simpler core language, and in the translation process the type-theoretic hooks that allow the connection to the physical device can get obscured or even lost.
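For a flavour of the monadic route, here is a minimal state-monad sketch in OCaml (my own toy formulation, not any particular library):

(* a store maps addresses to contents; a stateful computation takes a
   store to a result and an updated store *)
type store = int -> int
type 'a st = store -> 'a * store

let return x : 'a st = fun s -> (x, s)
let ( >>= ) (m : 'a st) (k : 'a -> 'b st) : 'b st =
  fun s -> let x, s' = m s in k x s'

let read a : int st = fun s -> (s a, s)
let write a v : unit st =
  fun s -> ((), fun a' -> if a' = a then v else s a')

(* the example program  m[10] := m[10] + 1  in monadic style *)
let incr10 : unit st = read 10 >>= fun x -> write 10 (x + 1)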

Note that the device doesn’t need to be physical; it can also be a data type that doesn’t have a convenient algebraic representation. For example, in the axiomatic approach I can give the right interface to a circular buffer, but fully specifying it type-theoretically is very hard — maybe not impossible-hard, but research-topic-hard at best.

To summarise, the axiomatic approach views the underlying lambda calculus simply as middleware for hooking arbitrary stuff together, whereas the type-theoretic approach sees the (appropriately enhanced) lambda calculus as a proper programming language. Fortunately the two approaches are not incompatible, because there is nothing stopping us from postulating axioms in a type theory. It may be considered in questionable taste, but it can be helpful!

Posted in programming languages | 2 Comments

The category of cardboard rectangles and bits of wool string (II)

In the second session of our category-theory club for school children we took a giant leap from the informality of diagrams towards formal mathematical notation. Our starting point was the cardboard-and-string diagrams from the previous session, which we had already established were the same even though the strings had different lengths and colours and the bits of cardboard slightly different sizes. The reason they were the same (I chose not to use “equal” because of its possibly overloaded meaning for the children) was that they were created following the same instructions.

The next step was to draw these diagrams on paper, and each child was asked to do so; there were relatively few problems in performing this task reasonably well. As one would expect, the differences between the diagrams drawn by each child were now quite staggering, but the idea that they were still the same diagram thankfully persisted, because these were obviously renderings on paper (as accurate as possible) of things that were the same in reality.

With this part of the work concluded successfully we moved on to what I thought would be the most challenging step: introducing a combinator notation for diagrams, inspired by compact closed categories. As a preliminary step the children were required to draw the diagrams using only straight lines and right angles (the example was chosen carefully so that no braiding was needed), and then we went over the diagram and tried to identify the simplest components. I actually introduced the term “component” as “a thing out of which we make other things” and, after some ooh-ing and aah-ing, the children seemed happy enough to accept the Big Word. Unfortunately at this critical stage a logistical problem struck: the room did not have a whiteboard or a blackboard, just an electronic “smart”-board. The irony of the term is obvious, and I am preparing a separate rant on the faddish abandonment of perfectly good classical tools of learning for fickle and fragile technology-heavy alternatives. So instead of drawing a big diagram on a big whiteboard and identifying the components, I had to walk around each table and do that with each group of children, which wasted a lot of time.

The initial set of combinators we identified was: I (a line, the identity), E (a feedback loop, the compact-closed unit), U (a feedforward loop, the compact-closed co-unit) and f (a rectangular box, the only morphism so far in our category). We also discovered two ways to combine components: writing them next to each other (composition) or stacking them on top of each other (tensor).

The obvious two tasks were, first, to write the formula corresponding to our running example diagram and, second, to draw a diagram from a given formula. Some typical results are below.

Posted in teaching | Comments Off

Why computers can’t think (III)

This is the final instalment of a series of posts on machine consciousness. It’s a fascinating topic but now that the Easter holidays are over I will need to refocus my attention on the equally fascinating topics I am actually paid to research.

The motivation for this series of posts is the recent personal realisation that consciousness (having subjective first-person experiences) cannot be explained as a purely computational phenomenon. The consciousness-as-computation view is widely held, especially among computer scientists. It is a very convenient materialist and physicalist answer to some of the deepest questions about the mind. I used to subscribe to this view, but now I realise it is based on some misunderstandings of what computation is. The gist of the argument is that a computing device, if it’s complex enough, could somehow develop consciousness. Our own human consciousness is given as an instance of this, because our brains, as computing devices, have the required level of complexity. To emphasise: we are not talking about the ability to mimic a consciousness, we are talking about having a consciousness.

In the first post on this topic I argued that it follows from the Church-Turing thesis that any complex computing device can be reduced to a Universal Turing Machine, which is a simple device with a very, very long tape — so there is nothing qualitatively special about the complexity of the device. In the second post I argued that even the universality of a Turing Machine is not a special property, because given a conscious UTM we can construct a finite state machine which can have some of the subjective experiences of the UTM. Neither of these two arguments conclusively proves that a computer cannot think, but they show that the common view that computers can think is based on some bad arguments, on a misunderstanding of what computers are.

No doubt, our brain does a lot of computation. As a computer it is an enormous finite state machine. The question is: does our consciousness arise out of the computation it does, or does it arise out of its being a brain? In other words, if we were to simulate all its computations in a computer, would that computer have the subjective experiences we have (joy, love, despair, confusion, etc.), or is consciousness a non-computational function of the brain’s biological processes? Considering the arguments in my previous two posts, I incline towards the latter. It’s not an idealist or dualist position, it’s still materialist and physicalist, but it entails that there is a lot about the brain that we don’t know.

I will now attempt a stronger argument that machines cannot think, by trying to derive a contradiction. Because there is nothing special about UTMs as far as consciousness goes, and because the brain itself is a FSM, let us focus on the (im)possibility of FSMs having subjective experience. For simplicity let us further focus on deterministic FSMs, since they are simpler than and just as expressive as non-deterministic ones.

Before we proceed, let us set some things straight. A FSM is commonly understood as an abstract structure, whereas we will consider the physical realisation of a FSM as a physical system with inputs, outputs and a changing internal state. The following is not obvious, but essential: if a FSM is to have a subjective experience (consciousness), and if consciousness is computational, then the nature of that subjective experience will depend only on the abstract FSM and not on the way the FSM is physically realised. So if a particular FSM is realised as a brain and has certain subjective experiences, then if we realised it as a silicon computer it would have precisely the same ones. This is required for the sake of argument: if we allow the subjective experience to depend on the physical realisation of the FSM then it is no longer a computational phenomenon and we are done. Another, even less obvious, assumption is that if a FSM can give rise to a subjective experience then that particular subjective experience is not associated with the FSM as a structure, but must be associated with a particular run of that FSM. In other words, if the FSM has a subjective experience it should arise only out of its interaction with the environment and the evolution of its internal state, i.e. the computation. As before, the argument is that if it arises otherwise then consciousness is not of a computational nature, and we are done. A final and essential clarification, of what it means to implement an abstract FSM as a concrete physical system: it means to establish a FSM homomorphism between the latter and the former. This means that if we can observe the states, inputs and outputs of the concrete FSM we can “decode” the states, inputs and outputs of the abstract FSM (using the homomorphism) so that the transitions are preserved.

Let us now try to derive a contradiction out of the existence of consciousness-producing FSMs. More precisely, we will show that the consciousness of the brain cannot arise simply out of its computational activity. Computationally, we know that the brain must be a FSM. Let us take two runs of that FSM, of equal length, but corresponding to clearly distinct mental states (e.g. one taken on a happy day, one on a sad day). These runs correspond to traces of inputs, outputs and internal states. If consciousness is computation, then whenever we run a FSM isomorphic to the brain-as-a-FSM so that the two traces are realised, that isomorphic FSM will feel happy, respectively sad.

But if, as we established, the subjective experience depends on the run rather than on the FSM, then in each case we can discard all the states that are not used and end up with two non-isomorphic FSMs which realise two distinct subjective experiences if fed particular inputs, and some other behaviour we don’t care about if fed other inputs. Now let us construct the product of these two FSMs, i.e. the FSM with the transition function

f (s_1, s_2) (i_1, i_2) = p (f_1 s_1 i_1, f_2 s_2 i_2)

where f_1 and f_2 are the two transition functions and p is the isomorphism p ((s_1, o_1), (s_2, o_2)) = ((s_1, s_2), (o_1, o_2)).
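In OCaml-ish terms (a sketch; I write transition functions in Mealy style, from state and input to next state and output):

(* a transition function: current state and input to next state and output *)
type ('s, 'i, 'o) fsm = 's -> 'i -> 's * 'o

(* the product machine runs the two machines side by side, exactly as in
   the formula above *)
let product (f1 : ('s1, 'i1, 'o1) fsm) (f2 : ('s2, 'i2, 'o2) fsm)
    : ('s1 * 's2, 'i1 * 'i2, 'o1 * 'o2) fsm =
  fun (s1, s2) (i1, i2) ->
    let s1', o1 = f1 s1 i1 in
    let s2', o2 = f2 s2 i2 in
    ((s1', s2'), (o1, o2))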

And this gives us the contradiction: a physical realisation of the product automaton is an implementation of both automata, by composing the implementation homomorphism with each projection homomorphism. But if subjective experience is purely computational then the same run of the product automaton must give rise to two distinct yet simultaneous subjective experiences. And having two distinct subjective experiences simultaneously — assuming that even makes sense — is not the same experience as having just one of the two alone. To me, this is a contradiction.

A FSM cannot have a subjective experience just by virtue of its computational behaviour. I suppose this must be for the same reason that a computer doesn’t get wet if it simulates a liquid and it doesn’t become radioactive if it simulates a nuclear explosion — no matter how precise the simulation. Why is a brain different? Why or how does a brain “make” consciousness? We only know that it does, because we experience it first hand, but we know little else.

A final qualification: when I say that machines cannot think I mean that they cannot think qua machines. This argument doesn’t rule out panpsychism — the doctrine that everything is more-or-less conscious — or idealism — the doctrine that the mind has a non-physical nature. But this argument doesn’t require them either. For details, I refer the reader to the work of Searle, which proposes a modest and sensible programme starting from the doctrine that consciousness is a biological process specific to the brain.

Acknowledgements. Steve Zdancewic and Lennart Augustsson pointed out that the brain must be a FSM, thus simplifying this argument quite a lot.

Posted in armchair philosophy | 6 Comments

The category of cardboard rectangles and bits of wool string (I)

I am conducting an educational experiment: teaching children in years 2 and 3 (7-8 years old) about categorical concepts, more precisely about monoidal categories. Monoidal categories are an appealing formalism because of their connections with diagrams [1]. So we just discover together a bunch of stuff about diagrams.

The first lesson started with a general conversation on what mathematics is and, unsurprisingly, the children were surprised to learn that it is not all about arithmetic. The idea of mathematics without numbers seemed strangely exciting, at least to some of them — a boy clenched his fist and muttered ‘yes!’ to himself when I announced that we would do the mathematics of shapes. Because this is highly experimental, only 10 children were invited to participate, among the more gifted in their year. I divided them into teams and assigned them tasks. The first task was to cut rectangles out of cardboard, punch holes in the four corners and label both the rectangles and the corners: these were going to be the morphisms of the category. Preparing 8 such rectangles proved to be unexpectedly challenging and took quite a lot of time. The children in each team first spent quite some time deciding how they would cooperate. Two common models were parallel (everyone does the same thing independently) and pipelined (each one does a specialised task). There was also the unexpected difficulty of operating a hole puncher so as to punch just one hole rather than a pair of holes, with at least one team unable to figure out how to insert the cardboard into the puncher to achieve this. The same team struggled with 5 or 6 ruined “morphisms” where the hole was too close to the edge.

As is usually done in categories of diagrams, identities and other structural morphisms are represented by lines. In our case we were going to use wool strings, tied to the holes. The hole patterns are, of course, the types of the morphisms, i.e. the objects. The next task was just to get familiar with the mechanics of tying bits of string to little holes in cardboard rectangles. This skill was essential for building more complex morphisms out of simpler ones by composition. Just as before, the level of dexterity of the children was unexpectedly low. For example, several did not know how to tie a double knot and had to learn that first. In the end, however, all teams mastered the task.

The final task was for each team to construct a composite morphism, specified in a relational style as a list of connections between node labels: 1c to 2a, 1d to 2b, 2c to 1a, 2d to 1b. With the cardboard morphisms at the ready and the string-tying skills mastered, this task was completed relatively painlessly and error-free. Each team displayed their creation.

At the conclusion of the lesson, with all the diagrams displayed, I asked the children whether these were the same diagrams or different diagrams. Intuitively, because they knew they had built the diagrams using the same instructions, they all answered that these were the same diagrams. I challenged their answer, objecting that the cardboard was of different colours and sizes and the strings of different colours and lengths, to which they replied that these don’t matter. Indeed, what we wanted to matter is just the what-is-connected-to-what-ness of the diagram.

To summarise, in this first lesson we covered the basic skills of constructing morphisms in the category of cardboard rectangles and bits of wool string: making cardboard morphisms and composing them by tying the bits of string, following a relational-style specification. The children also started to develop the basic intuitions which will lead up towards a topological notion of equality of diagrams.

Figure. Two equal morphisms in the category of cardboard rectangles and bits of wool string.

Posted in teaching | 2 Comments

Why computers can’t (really) think (II)

This is a follow-up to my previous post Why computers can’t think. The reactions and comments to it helped me tighten and clarify the argument.

I am going to use the term mental event to denote a subjective first-person experience which is delimited in time, so that we can talk of a beginning and an end of this experience. For example, when you stick your hand in ice-cold water you start feeling the cold, and when you withdraw it you stop feeling the cold, perhaps after a few seconds’ delay. It is not important that the temporal delimitation is precise, just that it is finite.

Suppose for the sake of argument that consciousness is a computational phenomenon, so that mental events can occur in a computer. For simplicity, let us reduce this computer to a Universal Turing Machine (UTM) and a (very long) tape — the Church-Turing thesis says that no matter how complicated the computer or the computer program, we can always reduce them to a UTM. This means that when we run the UTM on that particular tape a mental event will occur. Let us consider the state of the UTM when the mental event occurs (the internal state, the tape state, the head position) and let us discard all the tape that is not used by the UTM during the occurrence of the mental event.

The behaviour of a UTM is determined by its state so, necessarily, whenever we run the UTM from that internal state on that fragment of tape the mental event will occur. But a UTM with a finite tape is equivalent, computationally, to a finite state machine, to an automaton. So for any mental event we can build an automaton that creates that mental event.

I hope this consequence is striking on at least two accounts. First, we have a FSM which can experience consciousness. Second, we have a stunted kind of consciousness which consists of one single mental state all the time — it doesn’t even have the absence of that state. These two consequences are not quite a contradiction, logically, but I find them unacceptable. If we want to accept conscious UTMs we must accept conscious finite state automata, and we must accept the fact that consciousness can be dissected into atomic mental events which can be reproduced by automata.

Notes

  1. In the above we need to make an important distinction which sometimes gets lost: I am not talking about whether computers can pass the Turing test (I think they will eventually be able to do that), so I am not talking about whether computers can exhibit something like consciousness. I am also not talking about whether we know how to detect for sure whether a computer can have conscious thoughts. What I am talking about is whether a computer can have a subjective first-person experience — or, more precisely still, what other things we need to accept if we believe computers to be capable of consciousness. The other topics are of course interesting and important, they are just not the subject of this argument.
  2. The mind-as-computation view is the most prevalent materialist explanation of the mind, so what are we left with if we reject it? God? Magic? Not necessarily. One unavoidable consequence is that our brains do more than computation. They do a lot of computation, that much is established, but they do more than that. They ‘do’ consciousness in a way that a computer without a brain cannot. We don’t know the details though. Materialism means that reality can be explained as particles moving in fields of force, so the necessary consequence of this argument is that there are as yet unknown particles or fields involved in the operation of the brain. Penrose, in The Emperor’s New Mind, tries for example to blame consciousness on quantum effects, which is rather unconvincing, but it is not silly as an attempt to find physical distinctions between brains and computers.
  3. Much of the mind-as-computation view (Dennett is a prime exponent of it) rests on the premiss that there is a qualitative distinction between simple and complex systems. However, the Church-Turing thesis makes it quite clear that the only difference is in the length and contents of the tape of a UTM. Concepts such as “complex system” or “virtual machine” or “internal architecture” sound mysterious and ungraspable, so it seems plausible to attribute other mysterious and ungraspable phenomena, such as consciousness, to them. But if you go through a book by Dennett and replace “complex system” with “a simple controller with a very long tape” it is quite apparent how silly those arguments are.
Posted in armchair philosophy | 1 Comment

Why computers can’t think

I have a weakness for philosophy of the mind, but I find most of the writing on the topic either impenetrable or laughable. Recently, a good friend who is a philosopher recommended Searle’s book Mind: A Brief Introduction. It is indeed an excellent book. It makes lucid and careful arguments and, gratifyingly for a non-philosopher, it shows why most influential theories of the mind are wrong. The programme it sets out is very modest and meant to delegate matters from philosophy to science as quickly as possible, but more on this later.

Searle is famous, among other things, for the Chinese Room argument, which is meant to clarify that mimicking understanding and having the experience of understanding are not the same thing. I found the argument compelling. The argument is also meant to have a farther-reaching consequence: that the mind cannot be (just) a computational process.

The second part of the argument is less convincing, because you cannot in general prove impossibility by example. It’s close to a straw-man fallacy. I think a different argument is needed here and I will try to formulate it. It takes the Church-Turing thesis as its starting point: anything that can be computed can be computed by a Turing machine. If the mind is a computational phenomenon it must be exhibited by an appropriately programmed Turing machine.

If you don’t know what a Turing Machine is, have a look at one such device. It consists of a tape, a read-write head, and a controller (which can be as small as a two-state, three-symbol automaton). I take it as obvious that this device cannot experience thought (i.e. consciousness), but let us elaborate for the sake of argument. The subjective experience of thought should come, if it arises at all, from the contents of the tape, the software. But that is not possible because TM computation is a local phenomenon: at any moment the machine is only “aware” of the controller state and the tape contents under the head. The TM is not “aware” of the global contents of the tape. So it would have to be the tape itself that experiences thought. But the software on the tape can be a mind only insofar as it works for a particular encoding of the TM, e.g. the set of symbols. So the “seat” of artificial consciousness is not in the hardware nor in the software, but in the relation between them. But a mind is a real concept, and cannot be contingent on an abstract one. A TM and its tape cannot be or have a mind.

Searle doesn’t quite make this point, but he makes a similar one. A computation is defined by the intention to compute. A computer is a thing that is used by someone to compute — just like a tool is an object with an intended function, otherwise it is just stuff or perhaps an artistic artefact. Any physical system, to the extent that it has a state that changes according to laws that are known, can be used to compute. But in the absence of intent it is not a computer, it is simply a physical system. So the reason why mind cannot be explained as a computational phenomenon is that computation presupposes intent, which presupposes a mind. This would be a circular definition.

A computation is a supra-conscious behaviour: it expresses the (conscious) intent of another mind. For someone who practises and teaches mathematical subjects the distinction between conscious and algorithmic thought is quite clear. A student (but also a mathematician) can carry out a computation (or a proof) by following an algorithm either deterministically (a simple calculation) or heuristically (manipulating an expression by symbol-pushing to achieve a proof objective). This activity is not experienced as conscious in the same way that mathematical activity involving genuine understanding is. There is no a-ha! moment; its quality is not intellectual excitement but a vague sense of boredom. However, this activity is not unconscious either, in the sense that one is aware that something is going on and can start it or stop it at will. I think a new term, such as supra-conscious, is required to capture this experience. I think this term also describes the experience of the Chinese Room operator.

What is mind then? I find Searle’s modest proposal acceptable: the mind is a biological phenomenon, a function of the brain. We don’t understand the details of this function, but it is not of a (purely) computational nature. The brain is not (just) a computer, so no computational simulations of the brain will produce a mind. 

Posted in armchair philosophy | 6 Comments

Real time real numbers

At the Synchronous Languages workshop I attended a very interesting and amusing talk given by Timothy Bourke and Marc Pouzet regarding real-time real-number computation and simulation. It’s a take on the classic puzzle: suppose that two cars A and B are 100 km apart and they drive towards each other at 50 km/hr each. A fly flies from A towards B at 80 km/hr and, when it reaches B, flies back to A, and so on. Whenever the fly reaches a car it turns and flies to the other car. The usual question is: how many km does the fly travel before it is crushed between the two cars? The question with a twist is: how many turns does the fly make before it is crushed between the two cars?

This is of course a Zeno-like situation and, as the problem is stated, the fly must make an infinite number of turns even though it travels a finite distance (80 km: the cars close at 100 km/hr, so they meet after exactly one hour). So far so good. The amusing turn of the presentation is that there exist commercial (and expensive) software packages out there which can simulate this scenario. The market leader Simulink calculates the number of turns very efficiently: 42. Of course, 42 is an unacceptably poor approximation of infinity. Or maybe it is.

The first observation is that, of course, Simulink uses not real numbers but floating-point representations, which introduce round-off errors. The second observation is that arbitrary-precision representations for real numbers are possible, and libraries are actually available. How would Simulink work if it used arbitrary precision? For now let us ignore issues of efficiency: producing an answer as hilariously wrong as 42 cannot be excused by the fact that it was produced “efficiently”. It is rather preferable that an answer is not produced at all! Even if a Zeno-like computation diverged, without producing any digits of output, it would be preferable to a misleadingly confident 42.
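I have not looked inside Simulink, but a few lines of OCaml with ordinary floats show the shape of the phenomenon; the stopping rule is my own, and the count it prints reflects the float format, not the mathematics:

(* track only the gap between the cars: each "turn" the fly crosses the gap
   at a closing speed of 80 + 50 km/hr while the cars close at 100 km/hr, so
   the true number of turns is infinite, but round-off stops the loop *)
let () =
  let g = ref 100.0 in
  let turns = ref 0 in
  while !g > 0.0 do
    let t = !g /. (80.0 +. 50.0) in      (* time for the fly to cross *)
    let g' = !g -. 100.0 *. t in         (* gap left when it arrives  *)
    if g' >= !g then g := 0.0            (* round-off stalled: stop   *)
    else (g := g'; incr turns)
  done;
  Printf.printf "turns: %d\n" !turns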

I have not tried to set up this puzzle using arbitrary-precision arithmetic so I am not sure how it works out. If we calculate not in ℝ but in the unit interval, would the computation produce no output digits or would it produce a stream of digits converging towards 1? I might just try it, but as interesting as the result would be, it wouldn’t settle the discussion though.

Marc Pouzet explained to me very convincingly the difference between simulation and computation in a hybrid continuous-discrete system. The continuous values, at least in this case, live outside the computation. The computation samples continuous values, converts them to floating point and operates on them. So there are not one but two sources of possible errors in the simulation. The first one is, as discussed, the use of floating point, which introduces round-off errors. The second one is the sampling error, introduced when the external real value is converted for the first time into a floating-point representation. One of the most interesting sampling constructs is a time-sampling construct (up) which monitors an (external) continuous function and issues a (boolean) signal when the function changes sign.

How would both these sources of error be affected by switching to arbitrary precision? Programmatic numerical errors would be eliminated, in the usual sense: all digits of a value that are calculated are correctly calculated. But what about sampling errors? In arbitrary-precision arithmetic equality is not decidable because, simplistically put, you don’t know how many digits to look at before you can figure out whether 0.999999… and 1.00000… are equal or not.
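A sketch of why, with a toy OCaml representation of reals in [0, 1) as infinite digit streams: an equality test can answer “different” when it finds a mismatching digit, but to answer “equal” it would have to inspect every digit, so it never returns. (And naive digit-wise comparison is not even the right test: 0.999… and 1.000… differ in every digit yet denote the same real.)

(* a real in [0, 1) as an infinite stream of decimal digits *)
type real = Cons of int * (unit -> real)

let rec repeat d = Cons (d, fun () -> repeat d)
let third = repeat 3                     (* 0.333... *)

(* returns false at the first mismatch; loops forever on equal streams *)
let rec eq (Cons (d1, k1)) (Cons (d2, k2)) =
  d1 = d2 && eq (k1 ()) (k2 ())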

So what would happen to Simulink if we wanted to use arbitrary precision both in the computation and in the simulation? Something beautiful would happen! Simulink would no longer allow you to describe physically unrealisable models. Physical functions must be computable. The fix is very simple: whenever you sample a continuous value, the margin of error of the sampling process must be specified. The margin of error will tell you that you don’t need to bother looking at the digits that mean less than the sampling error because they are irrelevant.

This is a somewhat subtle point: even if you are sampling the continuous external value into a value which you represent as arbitrary precision, it still needs to be made computable through the specification of a sampling error. The mechanism of arbitrary precision used in the computation is only meant not to introduce additional errors, but it cannot compute the uncomputable! And sampling a real number with infinite precision is uncomputable because it requires an infinite amount of information to be transmitted in a finite amount of time.

How about the up construct? It is external and it only needs to communicate a finite amount of information to the program (1 bit). Why is it unrealisable? The uncomputability of up comes from the infinitely precise timing of the signal. In other words, the amount of information is still infinite because, in addition to the bit representing the signal, we have the potentially infinite precision of the moment at which it is received.
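A sketch of what the fix could look like, assuming a hypothetical sampling interface (the type and the names are my own invention):

(* a sampled value is only known to within its stated margin of error *)
type sample = { value : float; eps : float }

(* a tolerant version of the up zero-crossing test: it answers only when
   the margin of error allows it, instead of pretending infinite precision *)
let crossed_zero (s : sample) : bool option =
  if s.value >= s.eps then Some false         (* definitely still positive *)
  else if s.value <= -. s.eps then Some true  (* definitely crossed zero   *)
  else None                                   (* too close to call         *)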

Of course, specifying a sampling error is perfectly reasonable: any instrument that can be produced must have finite precision, therefore must introduce a sampling error. The use of arbitrary precision arithmetic only prevents you from writing simulations that are nonsensical anyway. This is a good thing!

In the case of our initial puzzle, the introduction of a sampling error would quickly lead to the identification of a special case in the simulation: when cars A and B are so close that the fly cannot tell whether it is on A or on B. In this case the simulation could stop, and the answer could be 42! However, the answer would now be correct, because this is the correct behaviour for a system whose sensors have a (known) sampling error. The greater the precision of the instruments, the more turns the fly will be able to make before it enters the uncertainty area.

As my colleague and area expert Martín Escardó can attest, I was quite skeptical regarding the usefulness of arbitrary precision real numbers. I was clinging to the faulty intuition that the real real numbers are things that you can grasp and work with. They are more complicated than that, it turns out. And the fact that the arbitrary precision representation forces you to face their subtlety, and thus avoid unrealistic computation (and simulation) scenarios is to me a strong indication that arbitrary precision real numbers could be quite useful!

Further reading

  • M. H. Escardó. Computing with real numbers represented as infinite sequences of digits in Haskell. CCA’2009 tutorial [zipfile]
  • Zélus: A synchronous language with ODEs [link]
  • T. Bourke and M. Pouzet. A slow afternoon chez PARKAS and a very fast fly [link]
Posted in General | 5 Comments

Searching/fumbling for a proof assistant

I was always very impressed with people who use proof assistants to do their maths. It seems like something from the future: mechanised, unerring, complicated, slightly incomprehensible, and it makes traditional pen-and-paper (well, LaTeX) maths seem quaint, artisanal, even obsolete. The contrast could not be more striking when it comes to proving that some complicated system (e.g. an operational semantics) satisfies some boring sanity property (e.g. that reduction preserves typing). For such work, lying at the border between mathematics and bureaucracy, the use of proof assistants makes such a big difference that it is quite likely that in the not-too-distant future their use will be mandated. Mathematically, such proofs are rarely interesting (and the interesting bits can always be factored out into nice-to-present lemmas) but they are quite important. It’s a lot more like making sure a piece of code is free of bugs than showing that a piece of maths is interesting.

So I am sold on the importance of proof assistants, but which one to use? The first one I tried was Coq. It is used very broadly, it has an established community of developers, and it has been the engine behind some heroic achievements, such as finally, conclusively proving the Four-Colour Theorem.

After a couple of weeks of reading about it, watching some tutorials, playing with it, I produced a tutorial/demo/taster where I prove some simple things about lists in Coq. There was enough information in this tutorial to allow some (smarter than average) first year students to work out some of their assignments using Coq, which was very impressive to me.

More recently I became exposed to Agda and, for the sake of comparison, I reworked the same tutorial/demo/taster in Agda this time. Agda is newer and smaller than Coq, and not quite as well known.

There is something a bit strange about doing a tutorial when you yourself are a beginner. From a certain point of view it is dangerous, because you may say or do things that are, to the expert, unusual or wrong. On the other hand, there is something positive to be said about the innocent approach of the non-expert. Where I fumble you are likely to fumble; where I get stuck, you are likely to get stuck. These are the kind of tutorials where you don’t actually learn Coq/Agda, but you learn whether you want to learn Coq/Agda.

You can obviously draw your own conclusions from what you see in the videos, but I will also share my impressions. Both systems are wonderful and there is something about them that tells me that this is what the future should be like.

Working with Coq felt like driving a tank. Working with Agda felt like driving a Formula 1 car. Coq feels big and powerful, but not very elegant. Agda feels slick and futuristic, but you can only wonder how the lack of a tactics language will work out in the trenches of a big boring project. Some of the more wonderful features of Agda (mixfix operators, seamless Unicode support, interactive development) I can see trickling into mainstream programming languages. The elegant proof-term syntax of Agda I can see appealing more to mathematically inclined users, whereas the nuclear-powered tactic language of Coq is bound to appeal to hackers who just want to get (impressive) things done. But that’s just my opinion, and I don’t think it’s a very original or unique one.

Perhaps the main difference, for my money, between the two is the way I get stuck trying to get things done. In Coq it is quite often the case that I know how to achieve a proof goal but I cannot figure out which tactic I should use to achieve it, or how to find it — so I blame Coq. Whereas in Agda, when I get stuck it’s because I made a mistake and I get an incomprehensible error message — but then I blame myself. Because of this, I end up quite frustrated working with Coq, whereas I derive a lot of satisfaction working with Agda.


Posted in proof assistants | 4 Comments

Seamless computing

In an earlier post (“Remember machine independence?”) I pointed out my dissatisfaction with the so-called domain-specific languages (DSLs) which are popping up everywhere nowadays, in support of new and esoteric programming platforms. The name DSL bugs me because I find it misleading. The right name would be ASL: architecture-specific language. They are not meant to solve problems in a particular domain, they are meant to program a particular architecture. Many such languages have some of the trappings of higher-level programming languages but also feature unconventional syntax and semantics, designed to match the platform.

I am not buying it. I think it is a mistake. I think the first and foremost principle of a higher-level programming language is machine independence. A higher-level programming language should allow the programmer to solve a problem rather than drive a physical system. This was a crazy idea in the 1950s, and it went through all the stages that a crazy idea goes through:

First they ignore you. Then they ridicule you. And then they attack you and want to burn you. And then they build monuments to you. (Nicholas Klein)

The ‘you’ in this quote was John Backus. I am not sure whether he got a monument out of it (or whether anyone really wanted to burn him), but he did get a Turing Award, which is not bad. Not accidentally, the title of his lecture on the occasion of the award was Can Programming Be Liberated From the von Neumann Style? [PDF]. And machine independence figured high on the agenda.

Now the idea of machine independence is being ridiculed again, and for exactly the same reason as in the 1950s: computer programming experts are comfortable with machine dependence in a programming language, and are worried that machine independence will cause a substantial loss of efficiency. This is both true and misguided, unless you are working on system-level programming. For application-level programming (scientific, corporate, etc.) it rarely helps that your program is twice as fast if it takes twice as long to write.

In a 1958 paper called Languages, Logic, Learning and Computers (J. of Computers and Automation), J. W. Carr III writes this:

One would like to be able to use these computers up to the day of the next one’s arrival, with interchange of problems and yet efficiency. Can this be done?

Carr is referring to the upgrade from the IBM-650 to the “totally different” IBM-701 and IBM-704. This was before FORTRAN had been invented, so the answer was not obvious at the time. But we can extend this point to our situation and replace ‘computers’ with ‘architectures’: should our programs, written for the CPU, not also work for the GPU, the FPGA, the cloud, etc.? The skeptics say ‘no’ and often argue that a mismatch between the programming language model and the system model leads to inefficiency. Perhaps, but in many circumstances loss of performance is a price worth paying. If we look at the languages of today, especially functional languages, they do not match the CPU-based architecture very well at all. They are declarative where computers are imperative. They use functions where computers use jumps. To compile one for the other (efficiently) a lot of optimization work is put in, and the native systems are augmented by runtime systems to provide essential services such as garbage collection. But this does not invalidate the principle of having good programming languages even if they don’t match the architecture on which they run. Designing machines especially to run functional languages is actually a cool and interesting idea (see the REDUCERON).

Here is the same objection, taken from Carr’s paper:

The problems of efficiency have been often emphasized by programmers who believe strongly in the continuance of hand-programming over translational techniques.

“Hand-programming” means programming in assembly or even binary. “Translational” means using what we now call compilers. It’s safe to say history vindicates Carr and not those overly concerned with efficiency.

Bob Harper’s blog has a number of excellent posts on the duality between programming in the language and programming for the machine. They are all quite relevant to the point I am making, so have a look.

Machine-independence is clearly the necessary ingredient to achieve the kind of portability considered desirable by Carr. So we are doing something about it. We are developing a compiler for a conventional imperative and functional programming language (we call it Verity) which would allow effective and efficient compilation to any architecture. We already have a back end for compiling to FPGAs, which we call the Geometry of Synthesis project. It supports higher-order functions, recursion, separate compilation and foreign function interfaces to native HDL, all in an architecture-agnostic way. I wrote about this project on this blog before.

Distributed seamless computing

Today I want to talk about distributed computing. From the point of view of supporting architecture-independent programming, distributed computing is a mess. Let’s start with a simplistic program, just to illustrate a point, a program that in a functional programming language I might write as:

let f x = x * x in f 3 + f 4

And suppose that I want to distribute this code on two nodes, one that would execute the function f, one that would execute the main program. I can choose a high-level distributed programming language such as ERLANG. Here is how the code looks:

%% c is the "function" process: receive an argument, send back its square
c(A_pid) -> receive X -> A_pid ! X * X end, c(A_pid).

%% main spawns c and makes the two "calls" by message passing
main() ->
  C_pid = spawn(f, c, [self()]), C_pid ! 3,
  receive X -> C_pid ! 4, receive Y -> X + Y end
  end.

In the first program I describe logical operations, such as multiplication and addition, and a logical structure for the program. In the second program I describe a system: I spawn processes, send messages and handle messages. A program which should be trivial is no longer trivial.

Things get even worse if I choose to distribute my program on three nodes: one for f, one for the function applications, and one for whatever is left of the main program. Here is the ERLANG code:

%% c: the squaring process, now replying to whoever sent the request
c() -> receive {Pid, X} -> Pid ! X * X end, c().

%% b: the process performing the two function applications
b(A_pid, C_pid) ->
  receive
    request0 -> C_pid ! {self(), 3}, receive X -> A_pid ! X end;
    request1 -> C_pid ! {self(), 4}, receive X -> A_pid ! X end
  end,
  b(A_pid, C_pid).

%% main drives b, which in turn drives c
main() ->
  C_pid = spawn(f2, c, []),
  B_pid = spawn(f2, b, [self(), C_pid]),
  B_pid ! request0,
  receive X -> B_pid ! request1, receive Y -> X + Y end
  end.

Can you tell that these two ERLANG programs compute the same result? If you initially wrote the first, two-node version and then decided to re-factor to three nodes, how hard would the transition be? You would need to rewrite most of the code.

There is a better way. The way you indicate nodes to the compiler should require no more than this:

let f x = (x * x)@A in f 3 + f 4

or

let f x = (x * x)@A in (f 3)@B + (f 4)@B

Is this possible? To the programmer who needs to struggle with the intricacies of writing distributed code this may seem like a pipe dream. But it is possible. Theorists of programming languages have known since the late 80s and early 90s that computation and communication are closely related. Whether it is Milner’s Functions as Processes [PDF], Girard’s Geometry of Interaction [DOI] or the early work on game semantics [see my survey], the message is the same: function calls are nothing but structured communication protocols. And this is not just pure esoteric theory; we really know how to write compilers and interpreters based on it. Nick Benton wrote a very cool tutorial which also applies functions-as-processes to distribution. Why haven’t we written this compiler yet? Because we were afraid it might be too inefficient.
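The idea is easy to make concrete even in OCaml with the standard threads library (a sketch; the channel names and the protocol are mine): the squaring “function” runs as its own process, and a call is just a request message followed by a reply.

(* function calls as structured communication: f runs as a separate thread,
   and calling it means sending the argument on one channel and waiting for
   the result on another *)
let () =
  let req : int Event.channel = Event.new_channel () in
  let rep : int Event.channel = Event.new_channel () in
  let _server = Thread.create (fun () ->
      while true do
        let x = Event.sync (Event.receive req) in
        Event.sync (Event.send rep (x * x))
      done) ()
  in
  let f x = Event.sync (Event.send req x);
            Event.sync (Event.receive rep) in
  Printf.printf "%d\n" (f 3 + f 4)        (* prints 25, as in the Erlang code *)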

I think we should be braver and more sanguine and have a go at it. The worry could be misplaced or exaggerated. With my student Olle Fredriksson we have made the first step: a compiler for a core functional programming language which can be distributed using lightweight compiler annotations, just as described above. We call it seamless computing because it is distributed computing where all the communication between components happens via function calls rather than via explicit sending and handling of messages. Explicit communication is the seam in distributed computing. From the programmer’s point of view, this distribution is communication-free.

A paper describing the compiler was presented at the International Symposium on Trustworthy Global Computing earlier this year [PDF] and there is a video based on it which you can watch online. The compiler is available for download from veritygos.org. It is based on Girard’s Geometry of Interaction and it compiles down to C and MPI. There are some good things and some bad things regarding its efficiency. On the one hand, the compiler does not require a garbage collector, which is very desirable in a distributed setting. On the other, the tokens being exchanged during the computation have a size proportional to the depth of the call stack, which can be large, especially for recursive calls. But this can be avoided with a better implementation, still without requiring a distributed garbage collector. This is ongoing work. We are also currently expanding the compiler to handle the whole of Verity, so that it has concurrency and shared state. We are making great progress and it’s a lot of fun, both as a theoretical challenge and as an implementation and engineering one.

We will keep you posted as things develop!

Posted in seamless computing | 2 Comments