Inventing an algebraic knot theory for eight year olds (IV)


Here is a quote that captures the ethos against which our maths club militates:

We take other men’s knowledge and opinions upon trust; which is an idle and superficial learning. We must make them our own. We are just like a man who, needing fire, went to a neighbor’s house to fetch it, and finding a very good one there, sat down to warm himself without remembering to carry any back home. What good does it do us to have our belly full of meat if it is not digested, if it is not transformed into us, if it does not nourish and support us?          – Montagne, “Of Pedantry”.

This is how mathematics is usually delivered to students: knowledge upon trust. And it is knowledge of a particularly mystical variety, where magical symbols come together in incomprehensible ways. This is the myth we need to dispel, this is the knowledge we need our kids to digest. Sometimes Algebra can be nothing more than a way to write down knots and their properties.

Last session, using the notation we developed together, went well. One unexpected hick-up was the fact that my students had not seen parentheses before, but we took that in our stride. They also hadn’t use variables before, so most of the examples were concrete. But the little use of variables we made was not confusing.

I even introduced some proper maths terminology, the Big Maths Words. First I asked them to give names to the composition operation, and they came up with some reasonable suggestions such as gluing, linking or joining. When I told them the Big Math Word was composition they liked it — it was “like music”, where you “compose” a song from notes. Indeed! They also had reasonable suggestions for the tensor, things such as stacking or building, but they didn’t much like the Big Math Word tensor. I don’t like it very much myself, I confess.

For the first 20 minutes we went over the new and tidied up notation. The rest of  the time, 30 minutes or so, we worked out examples of going from a drawn knot to the formula. It is not easy, in fact it is quite a challenge (try the exercises) so they often got things wrong. One pervasive error was getting confused by repeated tensoring (“stacking”) of tangles, K⨂K’⨂K” which often was written as K⨂K’K” — an interesting error.

Tomorrow’s session we will finally reach equations. We have already seen the equation I*=I, which was noticed while developing the notation. First we will start with similar coherence equations, which basically mean that two notations denote the same knot. Things like associativity of composition or tensor are typical coherence equations. But because children have no appreciation of parantheses and operator precedence associativity is perhaps not the best example. Functoriality of the tensor is much more interesting.

  1. Draw the tangle for (C◦C*)⨂(C◦C*).
  2. Draw the tangle for (C⨂C)◦(C*⨂C*).
In both cases the result is a couple of loops. The difference is in how we put the loops together and not in what the result is:
Functoriality means that no matter what tangles H1, H2, H3, H4 (for some reason the kids chose H as the standard variable for knots) the following two tangles, if they can be constructed, are always equal:

The unit and the co-unit also suggest compact-closed-like coherence axioms, which have a more topological flavour. Try this knot: (I⨂C)◦(C*⨂I). It looks like this:

But this is only a wire with some irrelevant bends. We have the equation

(I⨂C)◦(C*⨂I) = I

There is another similar equation, where the shape is like an ‘S’ rather than like a ‘Z’. Can you guess it?

The trove of equations is deep for this algebraic theory of knots and tangles and there is plenty of scope for children to discover equations once they understand what they represent. In addition to compact-closed we can also find equations from traced, braided and closed-monoidal categories.

But most importantly, I think the point is not to just give such equations to the students in order for them to develop skill in calculating knot equivalences using equational reasoning, so I won’t do that. That’s the kind of mystical maths I want to avoid. What is important are these two points:

  1. Understanding the idea of equational reasoning, that the same structure can be expressed in several equivalent ways, and that equations are the mathematical way to show this.
  2. Exploring the space of equations. Ideally, after a while the issues of soundness and completeness should emerge naturally.

Now, a brief comment to the cognoscenti. The categorical structure of this algebra of tangles is not an obvious one. We are using both the unit and the co-unit of compact closed categories, but our category is braided rather than symmetric. However, compact closed categories are symmetric so we are not in an obvious setting. The interaction of braiding and “compact closure” (informally speaking) gives equations like this:

C◦X◦C* = C◦C*

which correspond to this topological isomorphism of tangles:

These are two ways to describing the unknot. The category is clearly non-degenerate because we seem to be able to describe any knot. So combinatorial completeness seems an achievable result. Equational completeness (i.e. any equivalent knots can be proved to be so)  however seems like a challenging and interesting question!

In terms of related work, this seems similar enough: Peter Freyd and David Yetter, Braided compact monoidal categories with applications to low dimensional topology, Adv. Math. 77, 156-182, 1989.

If you are interested in another diagrammatic explanation of algebra also have a look at Pawel Sobocinksi’s entertaining and informative blog.

Posted in teaching | Leave a comment

Inventing an algebraic knot theory for eight year olds (III)

Unlike previous posts, this one precedes the math club meeting. The reason is that we need to tighten some loose ends in our emerging knot theory notation. The first two meetings (I and II) were all about exploring knots and ways to describe them using notations, and we made some impressive strides. But at this stage I need to exercise some authority and steer them firmly in the right direction:

  • We are going to use I rather than 0 (zero) to denote the identity of composition. Even though 0 is a neutral element for addition, as correctly spotted by the children, using it as the neutral element for composition is not idiomatic. 1 (one) is more standard, for various reason I will not get into here.
  • Using a fraction-like notation for “parallel” (tensorial) composition is also quite unidiomatic, although it is intuitive. We shall introduce the standard ⨂ notation instead.
  • We will stick with a generic duality operation _* so that our “unit” and “counit” are C and C* rather than C and D as some of the children insisted on.

These are of course small matters which we could put up with, at the expense of becoming incomprehensible to the rest of the world. The more important thing is that the interplay between “sequential” (“functional”) composition and “parallel” (“tensorial”) composition is not very easy to get right. Look at our overhand knot:

The individual components are well identified but a nice way to put them together is unclear. There is a sequential composition at the top (X◦X◦X) but how to connect that with the C, L and C* underneath is not clear from the decomposition. X “interacts” with C and C with L but not obviously sequentially or in parallel.

The way out is to introduce types. What are types? To paraphrase John Reynolds’s succinct explanation, they are a syntactic discipline that guarantees correct composition. And this is what we need here.

We shall always give knots a type (m, n) ∈ N². The left projection represents how many “loose ends” stick out to the left and the right projection how many to the right. So this is a (4, 6)-knot:

Note the implicit restriction: we allow no loose ends to stick out from the top or from the bottom. Is this a reasonable restriction? Yes, because if a bit of string goes down or up we can always “bend” it left or right then “extend” it until it lines up nicely with the other loose ends. This topological invariance of knots should be intuitively obvious to my young students, I hope.

So the types of our basic knot parts are:

Now we can say that we only allow the composition K◦K’ for knots K:(m,n) and K:(m’, n’) if n=m’, resulting in a knot K◦K’:(m, n’). Here are two composable knots:

And this is their composition, with “composition links” emphasised:

Note that because these are “strings” the loose ends don’t need to be “geometrically” aligned. It is enough to have the right number of loose ends to be able to glue them. And this is their actual composition:

Any two knots K:(m, n) and K’:(m’, n’) can be composed in parallel in a knot K⨂K’:(m+m’, n+n’). The composition of the two knots above is:

So now everything is in place to properly describe algebraically our overhand knot:

Above we introduced -•- as a shortcut for repeated sequential composition, useful for concision. Note that we needed the identity string I not just on the bottom, but also to the left and to the right, in order to make the loose ends match properly in sequential composition. This kind of “padding” with identities will be a useful trick that will allow us to define a lot of knots compositionally!


  1. (easy) Is this the only way to describe an overhand knot? Give an alternative description.
  2. (easy) Is this the only kind of overhand knot? If no, write the definition of a different overhand knot.
  3. (medium) Write the definition of a Granny Knot and its variations: the reef, the thief and the grief knots.
  4. (medium) Write a definition for the Shoelace Knot.
  5. (hard) Is there a knot that cannot be described with our notation?
Posted in teaching | 1 Comment

Chalmers’s Digital Radio

In previous posts I gave arguments against the apparently widely held belief, at least amongst readers of this blog, that all mental states and phenomena, consciousness in particular, are reducible to computational processes in the brain: computationalism. Whether this is or isn’t the case divides the philosophical community, with famous names agreeing (Searle, Penrose) or disagreeing (Dennett, Chalmers).

This post is not another argument against computationalism but an attempt to be more scholarly. I will take a brief critical look at one of the more influential papers in the area, Chalmers’s “A Computational Foundation for the Study of Cognition“ (Journal of Cognitive Science 12:323-57, 2011). There are some things I admire about this paper. For once, it sets out quite meticulously to define what a computation is, making the important distinction between abstract computation and implemented computation. Even though as a computer scientist I find many points of disagreement in the detail of how this theory is constructed, the fact that Chalmers considers it in the first place is admirable. Many computationalist philosophers fail to make this distinction, which renders some of their arguments sloppy and implausible. But at least Chalmers answers definitively some rather silly (at least to a computer scientist) questions raised by Searle and Putnam about the relationship between computation in the abstract and the concrete. The opening sections of the paper found me nodding in agreement most of the time.

The main thrust of Chalmers’s computationalist explanation is in Section 3 which spells out his theory of “organizational invariance of mental properties”. Unlike the previous section, where technical words are used in their technical sense, here technical-sounding words are used in a rather vague and informal way. Sentences like this set off all sorts of alarm bells, as the amount of sense they make is questionable: “Causal topology can be thought of as a dynamic topology analogous to the static topology of a graph or a network.” I shall not quibble with language though but try to assign sensible and charitable meaning to the argument in order to make a sensible counterpoint.

The argument is conveniently summarised by Chalmers, so I can quote it in its entirety:

The argument for this, very briefly, is a reductio. Assume conscious experience is not organizationally invariant. Then there exist systems with the same causal topology but different conscious experiences. Let us say this is because the systems are made of different materials, such as neurons and silicon; a similar argument can be given for other sorts of differences. As the two systems have the same causal topology, we can (in principle) transform the first system into the second by making only gradual changes, such as by replacing neurons one at a time with I/O equivalent silicon chips, where the overall pattern of interaction remains the same throughout. Along the spectrum of intermediate systems, there must be two systems between which we replace less than ten percent of the system, but whose conscious experiences differ. Consider these two systems, N and S, which are identical except in that some circuit in one is neural and in the other is silicon.

The key step in the thought-experiment is to take the relevant neural circuit in N, and to install alongside it a causally isomorphic silicon back-up circuit, with a switch between the two circuits. What happens when we flip the switch? By hypothesis, the system’s conscious experiences will change: say, for purposes of illustration, from a bright red experience to a bright blue experience (or to a faded red experience, or whatever). This follows from the fact that the system after the change is a version of S, whereas before the change it is just N.

Before pointing out the logical fallacy of this argument (can you spot it?) lets travel in time, to a strange and dystopian future where in the wake of an untold cataclysm humanity will have lost some of its scientific knowledge. Among the forever lost items is all knowledge of radio communication. So when the archeologists of the future discovered a radio set in working condition it caused somewhat of a sensation. They surrendered the radio for examination to their contemporary scientists and philosophers. Moreover, unbeknownst to all, one fully automated nuclear-powered radio station, hidden among the ruins of the past, was still eternally broadcasting.

The scientists of the future figured out quickly that the radio was an electronic device, plugged it in and delicately experimented with the various knobs and buttons. The on/off button and the volume knob where easy to figure out. Their initial theory was that they discovered a white-noise machine. The philosophers of the day explained at length and convincingly why ancient folk might have wanted to build such a machine. But their speculations where quickly abandoned when the scientists figured out that the most mysterious knob, which we call the dial, had a function after all: If you point it to a particular number on the scale the device starts producing music instead of noise. Being able to listen to what was probably the sacred music of the ancient culture caused widespread fascination. Historians established with some confidence that Madonna was none other than the mythical figure revered as the mother of God in the ancient religion of Christianity.

This was a most interesting discovery not only culturally but also technologically. In the dystopian future digital devices such as computers and digital music players were still manufactured. So there was a solid understanding of music-producing electronic devices. The scientists figured out rather quickly that the radio was not digital, but analog. However, its input-output behaviour was precisely that of a digital music player, so they assumed it was some kind of analog equivalent of a digital music player. They even figured out what the dial was: a broken shuffle knob.

But they were not quite sure, and that was really annoying them. They knew that the ancient civilisation had mastered some lost technologies so they kept wondering. Was that really a music player, just analog? One of their prominent philosophers argued persuasively that ‘yes’:

All the components of the radio are electronic, but analog. So you could isolate each electronic component, interface it to the rest of the device via a kind of modem, then reimplement its functionality digitally. When two digital components are connected we can just remove the superfluous modems. So, in a step-by-step way, we can replace the entire analog device with a digital device so that we end up with a digital music player.

Persuaded, most scientists gave up further investigations into the nature of the radio. Except one well funded and talented scientist who questioned the argument and set out to actually put it in practice. Painstakingly, she figured out and replaced the analogue components of the radio with digital components — until she reached the antenna of the radio. She was shocked by the behaviour of this component. It was a very basic-looking arrangement of conductive elements which produced output out of apparently nothing.  The only plausible hypothesis was that this was the memory element of the device, which stored all the music coming out of it. But it was much too simple in structure to be a memory. Our scientist was baffled. She wrote a few research papers about the stupendous analog memory technology of the ancients, but they got a mixed reception. The tabloid magazines made quite a stir around this mysterious technology but the established research publications refused to publish such speculation.

We are not sure what happened next. Did our scientist give up? Did she eventually discover electromagnetic waves and unlocked the true mystery of the radio? Most likely her lack of publications resulted in her losing her research funding.

Lets return to Chalmers now. His recipe for building a silicone computer out of a brain is similar to the above recipe for building a digital music player out of a radio set. The fallacy of the argument is called begging the question: when you assume that all brain components can be replaced with little silicone computers you already assume that all relevant functionality of that brain component is of a computational nature. But this is what the argument set out to prove in the first place, that the activity of the brain is of a computational nature.

If not computation, then what else is there in the brain that could realize mental states, in particular consciousness? Perhaps not radio-waves either. My money is on the theory that we don’t have the faintest clue. Just like our talented scientist from the future could not imagine radio waves, so we cannot understand what is going on, really, in the brain. Just like her, we understand a lot about various phenomena in the brain — except for the essence of how and why it works at a fundamental level. Our attempts to explain it using whatever concepts we have at hand will be, I have little doubt, as laughable as the miasmatic theory of disease or the phlogistonic theory of combustions. I admit, my theory is not a fascinating one.

Other than that, the paper is nice and readable. With this (fallacious) argument in place and its (mistaken) conclusion reached everything else fits neatly together. Just don’t use it as a recipe if you are trying to build a digital radio.

Posted in anticomputationalism, armchair philosophy | Leave a comment

Inventing a knot theory for eight year olds (II)

So I am running a math club for 8 year olds and we are reinventing (an) algebraic knot theory. We are not trying to reinvent existing knot theory, just to make a journey of intellectual and mathematical discovery. In the first meeting I gently shocked the children by showing them that mathematics is not only about numbers, but about patterns in general — such as knots. Being only 8 they are very theatrical in expressing emotion. WHAT!? Maths about KNOTS!? I loved the wild face expressions and hand gestures. We spent most time making and drawing knots and concluded that drawing knots is quite hard. We needed a better notation.

A notation for knots: the knotation

One preliminary observation that I made which resonated well was that there are a lot of numbers out there but we only need 10 symbols to represent all of them: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. Of course we could only use 0 and 1, or just 1, but let us not get distracted. It is about having a nice and convenient notation not just any notation. So I drew a couple of knots on the whiteboard and I encouraged the children to try to identify the “smallest pieces the knot is made of”, where by a “piece” I meant this: we draw a small circle on the whiteboard and the “piece” is the bit of knot we see inside the circle. If our circles are small enough it is easy to see that the pieces inside the circles are quite similar. So here is an overhand knot with some parts of it circled.

  •  We notice that even if we ‘zoom in’ on a knot piece, as in the case of the red circles, the interesting knot piece ‘looks the same’:

  • We notice that if a knot piece is too large, as in the case of the blue circle, it can be clearly decomposed into smaller basic knot pieces.
  • We also notice that many knot pieces which we select here and there look quite similar, as in the case of the green circles.

I don’t think there is a clear methodology one can use in inventing a good notation for a concept or set of concepts. It is a fun and creative endeavour. It is a design exercise in which many criteria must be balanced. On the one hand we have expressiveness, which means that we can use the notation to represent a large range of concepts. But we also have elegance, which is more subjective but not entirely arbitrary. We want a small (but not too small) set of notations. We want it to be also concise. We want it to be pretty.

So we explored a while and in the end, with some careful guidance, we narrowed in on these constants:

The names C and X were suggested because of the shape of the knot piece they represent. L was short for “line”. These shapes can be flipped over and result in 3 other constants:

Here there was some disagreement between me and the kids. They were happy with the flipping operation and they were happy with X* but they didn’t like C* because it looked more like a D (without the vertical bar). Some of them insisted that we call them D and X*. L* was clearly a non-issue because it was just like L. I had to put my foot down and we settled on C, X, L and the -* operator.

This exercise also presented us with our first equation: L = L*. We didn’t insist on equations because they will become more important in a few weeks.

The final part of the hour was dedicated to developing notations for assembling knots out of these constants. There are two obvious ways to compose knot pieces, horizontally and vertically. Here is how the overhand knot can be split into basic components:

How do we put them together? The bits at the top can be quite clearly written as X⋅X⋅X. The bit below seemed like C⋅C*. And at the bottom we have L. Because the bit at the bottom is quite long many kids wrote it as L⋅L or L⋅L⋅L but they quickly realised our second equation: L = L⋅L. This observation led to an unfortunate turn of events in which the kids decided that is like 0 because added with itself stays the same, and therefore changed it from L to 0. Next week we will need to change it to the correct one (1) even if I need to exercise my executive power.

This is where we ran out of time. So we didn’t have any good notation for vertical assembly. Because we didn’t have vertical assembly we couldn’t really test our notation — to the trained eye it’s quite obvious it is not quite good yet. The way C, C* and L are composed is not entirely clear because vertical and horizontal composition interact. But it is a start. I want the children to move away from how they are used to study mathematics, as a stipulated set of rules in which answers are either correct or incorrect. I don’t want them to execute the algorithm, but to invent the programming language in which it is written. Coming up with stuff that doesn’t work is OK, as long as we notice it doesn’t work and we improve on it.

But what did we accomplish? Quite a lot. In one hour a dozen 8 year olds rediscovered some of the basic combinators used in monoidal categories, such as the braid, or such as duality, unit and co-unit as used in compact closed categories. We also rediscovered composition and its identity, and some of its equations. We used none of the established names and notations, but we discovered the concepts. We are on our way to inventing a fairly original categorical model of knots and braids. Next week we will quickly figure out the monoidal tensor and move on to the really exciting bit: coherence equations.

Posted in teaching | 1 Comment

Inventing a knot theory for eight year olds (I)

Last year I experimented with teaching seven and eight year olds about monoidal categories using a very concrete model of cardboard rectangles and bits of string. The experiment was a success, with children becoming able to represent formally a model or to construct a model from a formal specification. They also developed an understanding of some key concepts, such as coherence. The physical manipulation of cardboard and strings was fun and relating it to notation and equations was surprising to the children and, I think, interesting.

This year I am reprising the experiment, with some changes. The first thing I wasn’t too happy with last year was the overhead of constructing models of cardboard and string. Too many things can go wrong, from poking holes too close to the margins so that they tear to making the wires too long or too short. Not to mention that some children did not know how to tie a knot! The motivation of building structures of cardboard and strings was also unclear, I didn’t even have a name for these structures. So this year we are only looking at knots. They are a simpler to build physical structure since all you need is the strings — no more cardboard and hole punchers. They are also something children already know. And if they don’t know how to tie them it’s OK, because that’s what we are learning. The second change I am making is slowing down the pace and making the process more exploratory. I will not try to teach them anything in particular but we will aim to develop together a term algebra for knots.

The first meeting of our club started with a general discussion about what is mathematics. Third-graders are taught a very narrow view of the subject, which they tend to identify with arithmetic. “Tricky problems” was another common view of what mathematics is about. When I asked whether a triangle is something that mathematics deals with the kids started to question their own understanding of what it is really about.

I proposed that mathematics is the general study of patterns and this was immediately accepted. Then we talked a while about the need in mathematics to represent concepts and I introduced the concept of “notation”, illustrated with a potted history of numerals, notations for the numbers. We spent some time on hieroglyphic numerals, a topic which turned out to be very enjoyable in its own right. I wanted children to discover the inconveniences of this system compared with our Indo-Arabic system and they indeed came to that conclusion quite soon.

The next step was to introduce knots. The fact that mathematics would have something to say about knots struck many children as implausible but even the most skeptical changed their tune immediately when reminded that “knots are some kind of patterns we make with strings” and “maths is about patterns”. This abrupt change of mind and wholehearted acceptance of a previously rejected thesis makes working with children a great pleasure.

We started by making knots. The first bit of technical terminology we established was calling anything we make with strings “a knot”, including braids or bows or even a plain bit of string. This was deemed acceptable. Making knots and experimenting with various ways of tying them was of course a lot of fun.

This is also a knot!

The second step was representing knots by drawing them. The challenge was to draw a half-knot:

The children under-estimated the difficulty of the task, producing something like this:

This drawing of course makes line crossing ambiguous. Which line is above and which line is below?  I suggested leaving small gaps to indicate line crossings:

The principle was readily understood but the execution was problematic. Drawing the knot correctly proved a real challenge, with more than half the children failing to draw it correctly.

The upshot of all this struggle was a consensus that we really need a better way to write down knots, a “notation” (knotation?). The groundwork has been laid for creating a term algebra for knots next week.


Posted in teaching | 2 Comments

Two kinds of mathematics

Preliminary reading:

  • “The Origins and Motivations of Univalent Foundations” by V. Voevodsky’s [PDF]
  • “Proofs and refutations” by I. Lakatos [link]
  • “On proofs and progress in mathematics” by W. Thurston [link]

I find the recent surge in interest in the use of proof assistants in mathematics exciting. Because of the nature of proof assistants — computer systems by necessity lack mathematical intuition and must be fully formal — there is a renewed interest in logic and foundations of mathematics. What is special this time around is that there seem to be echoes of this interest among mainstream mathematicians. Because history always repeats itself some of the old arguments are being rekindled:  ”conceptual” versus “formal” mathematics, or call it “platonism” vs. “logicism” if you have a philosophical bend, or, as G. C. Rota nicely put it “truth” vs. “the game of truth”.

As most philosophical arguments, these arguments arise out of confusion. This is not slagging philosophy, just highlighting its role and methodology: clarifying confused (or confusing) concepts by employing them in dialogue, like new and untested weapons in a military exercise. Once the concepts are clarified they can be passed over to science, mathematics, etc. This passing of the baton is of course not an official ceremony, but it is a slow osmosis of ideas through the many membranes that separate various fields and sub-fields of knowledge. This is my attempted contribution to clarifying the confusion.

There is no need for a conflict between “formal” and “conceptual” mathematics, because there is no need to have only one kind of mathematics. This is as unnecessary as forcing a choice between “constructive” and “classical” mathematics: they simply are two different worlds, as explained very nicely by Andrej Bauer.  The universe of mathematics would be poorer without any of them. Similarly, the universe of mathematics would be poorer without “formal” or without “conceptual” mathematics. But what are they and what purpose do they serve?

 ”Conceptual” mathematics

This is the conventional established mathematical tradition that Lakatos and Thurston so beautifully describe in their papers. Somewhat surprisingly, conceptual mathematics is not “just” about formulating and proving theorems. It is not automatically the case that the mathematician who proves the most theorems or the hardest-to-prove theorems is the highest rated. The fascinating beauty of conceptual mathematics comes out of the creative tension between definitions, theorems and proofs. It is not always the case that the definitions are produced, then, using proofs, theorems are churned out. Sometimes the desirable theorems are known well before the definitions have been clarified (see early work on mathematical analysis). Sometimes a proof is so interesting that both definitions and theorems are adjusted to fit it (See “Proofs and Refutations”). Everything is in a state of flux and mistakes are possible but not necessarily devastating: they show room for improvement and further work.

Computer-aided proof assistants can get in the way of conceptual mathematics. Because they force a full formalisation of all concepts, the concepts can become buried under too much book-keeping and thus lose their intuitiveness. Proof assistants are excellent at checking correctness but are not necessarily the best vehicle for communicating ideas. And this is what mathematicians use proofs for: communicating ideas, not checking correctness. This is one of the reason why mathematical papers in any field are so impenetrable even to mathematicians in other fields. They use a specialised language that relies heavily on the social context in which that particular mathematics happens. Lacking the context it is almost impossible to read the papers. The mathematics happen inside mental models that are poorly described by the formalisations used in mathematical papers. Conceptual mathematics requires more than correctness of proof, this is why the Four-Colour Theorem and its proof which relies heavily on computation makes conventional mathematicians scrunch their noses in dismay.

 ”Formal” mathematics

Exploring mathematical concepts is fascinating, but sometimes we really just need to know if a statement is true or not. This is often the case if we build things and we are concerned that these things behave in a desired way. Since mathematics is the language we use to talk about things in the real world (e.g. physics or engineering) there is a practical concern with the validity of mathematical statemants. In the same context, the aesthetics of the definition-proof-theorem interaction are often of no interest.

Software is perhaps the most obvious example in which mathematical concepts are designed and used in a way that has an immediate impact on the real world. A computer program is an abstract mathematical concept that drives the behaviour of a concrete physical device, the computer. Most errors in the behaviour of a program come not from the hardware but from the software itself — they are of a purely mathematical nature. But programming languages and programs (existing ones, not as a matter of necessity) are ugly mathematical objects, devoid of conceptual elegance and encumbered by complications and contingencies. Their genesis is not driven by a taste for elegance, as most conceptual mathematics, but mainly by how efficient they are in driving the physical computer, itself a complicated and ad hoc device, to perform certain tasks. However the correctness of software is of crucial importance, and the only realistic way in which it can be achieved is by using formal mathematics, as embodied by proof assistants.

But it’s not just software. Properties of security protocols and encryption schemes are also examples of mathematical properties where correctness trumps in importance conceptual coherence.

Conceptual mathematicians would of course rather that their theorems were true. This is why computer checking of conceptual mathematics is of some interest. And when a massive formalisation effort such as Gonthier et. al.’s formalisation of the Feit–Thompson (odd order) theorem turns out an insignificant amount of errors there is good reason for high-fives and some sighs of relief.

 ”Formal conceptual” or “Conceptual formal” mathematics

The area of overlap between formal and conceptual mathematics is a fascinating one. This is where the Univalent Foundations project is happening.  What is the right foundation for mathematics? Not just in terms of being sound but also in making conceptual mathematics more like formal mathematics (i.e. checkable) and formal mathematics more like conceptual mathematics (i.e. readable).

Other areas of mathematics that should be both formal and conceptual are those where the choice of foundational structure is important: type theory, logic, constructive mathematics. Unlike classical mathematics, which take an everything-goes approach to its logical foundations, these areas must be more discerning in what rules are or are not acceptable. A proof of a property must not trespass into illegal territory, and for informal proofs it can be quite difficult to assess that.

As most computer scientists my own mathematical work has elements of the formal and the conceptual. An example of the mostly formal is recent work on distributing the Krivine machine [link][PDF] whereas an example of the mostly conceptual is work on game semantics [link][PDF]. I cannot brag that I have done a lot of work in mathematics that is both formal and conceptual, but if you forced me to give the best example it’s perhaps work on formalising some of the structures encountered in game semantics to facilitate reasoning [link][PDF].

Posted in armchair philosophy, proof assistants | 1 Comment

Two views of programming language design

In computer science, the Boolean or logical data type is a data type, having two values (usually denoted true and false), intended to represent the truth values of logic and Boolean algebra. [Wikipedia]

In computer science, a tagged union, also called a variantvariant recorddiscriminated uniondisjoint union, or sum type, is a data structure used to hold a value that could take on several different, but fixed types. [Wikipedia]

In certain circles there is only one way to think of a programming language: lambda calculus and some extra stuff. Depending on where you add this extra stuff, you get two different, although not incompatible, views on programming language design.

I hope we can all agree that the simply typed lambda calculus is a useful starting point for a programming language. But it must be merely a starting point. Because the only values are functions, comparing results of computations requires some theoretical somersaults. It also makes operations such as input or output problematic. So we need to add stuff to it. Without undue theorising let us jump into an example: booleans.

The first way of doing it is a minimalistic one: we just add a new type to the language (lets call it bool ) and some typed constants for manipulating booleans:

true, false : bool
not         : bool -> bool
and, or     : bool -> bool -> bool 

Some are ground-type constants, and some are unary or binary first-order constants (a.k.a. operators).  For easier future reference, lets call this the axiomatic approach because, if you keep in mind the Curry-Howard correspondence, it is sort of like adding arbitrary axioms to a natural deduction system.

This works but it is unsatisfactory on several accounts. The first one is the rather arbitrary way in which we are adding types and constants to the language. The second one is that the behaviour of booleans and the associated constants is not (yet) specified, so we need to add this essential information somewhere (language reference, operational semantics, etc.).

The third objection is that whenever we require yet another data type we need to extend our programming language, with all the hassle that entails. We even need to change the parser. For example, adding natural numbers would require yet another new type nat and a bunch of constants:

0, 1, 2, ...           : nat
neg                    : nat -> nat 
eq, gt, lt             : nat -> nat -> bool

Then how about more complicated data types such as lists or trees, or types for dealing with state or other computational effects? There must be a better way.

A better way, which we shall call the type-theoretic approach, seems to be to provide the programming language with a way of constructing its own types, such as algebraic types or dependent types.  Many functional programming language offer this facility. In OCaml, for example, booleans and naturals and the associated operations can be defined:

type bool = True | False ;;
let not = function 
 | True -> False
 | False -> True;;
type nat = Zero | Suc of nat;;
let rec plus m n = match m with 
 | Zero -> n
 | Suc m -> Suc (plus m n);;

This second — type theoretic — approach is clearly more elegant, solving the problems of the axiomatic approach: no more adhockery, behaviour is defined, the language can be extended uniformly in a nice and elegant way.

But is it always better, from all points of view?

There is at least one dubious scenario: interfacing with the physical world, including the computer itself. The built-in data types of the computer (char, int, float, etc.) are terribly ad-hoc and lacking in uniformity. However, the computer as a physical device is at its most efficient when working with these built-in types, because it has been designed to work with them. But it is very tricky to make a compiler that just knows how to connect a user-defined type (e.g. for natural numbers) and its constants with hardware-supplied operations. For starters, the match may not be perfect. The user-defined natural numbers for example don’t have size restrictions, so you might say that compiling them as computer integers is just wrong. But in general, since the type-theoretic approach defines new concepts within the language, their implementation proceeds via the definitions, simplified into the core of the language. This unpacking of definitions obscures whatever intended connections might be with a physical reality.

A similar related scenario is when a program needs to interface with a physical device, which can usually be reduced to sending bits and receiving bits. Here the type-theoretic approach becomes uncomfortably constraining and some languages simply abandon typing discipline when I/O is concerned. A similar problem does not arise in the case of the axiomatic approach, mainly because the axiomatic approach is too simplistic to even ask the question in the first place.

Consider memory (RAM) as a physical device. How do you interface with it? The program either sends a tuple representing an address and some data to be written at that address, to which the memory replies with an acknowledgement; or the program sends an address and receives the data stored at that address. To add this memory to a lambda calculus in an axiomatic way is not a problem.

We introduce the type of variables simply as a tuple of a write and a read operation with the signatures as informally described above. And a ram (second-order) operation would take a function of a var, bind that to the memory, use it, then release the binding upon exiting the body of the function.

type mem = (nat -> nat -> nat) x (nat -> nat)
ram : (mem -> nat) -> nat

Read and write are just projections to select the right “method”. For example, a program such as

#new ram m;
m[10] := m[10] + 1

would be desugared with the syntax above as

ram (\m.(proj1 m)(10)((proj2 m)(10) + 1)

A type-theoretic approach to effects is perhaps possible, using the rather well known monads or the funkier lenses or the more sophisticated algebraic effects. However, it is very difficult in general to make types match whatever arbitrary interface your physical system or device might have. And it is even more difficult to make the compiler hook up the generated code to the device interface, because of the same reason mentioned above: in the type-theoretic approach complex types are first usually translated into a simpler core language, and in the translation process the type-theoretic hooks that allow the connection to the physical device can get obscured or even lost.

Note that the device doesn’t need to be physical, it can also be a data type that doesn’t have a convenient algebraic representation. For example, in the axiomatic approach I can give the right interface to a circular buffer but fully specifying it type-theoretically is very hard — maybe not impossible-hard, but research-topic-hard at best.

To summarise, the axiomatic approach views the underlying lambda calculus simply as middleware for hooking arbitrary stuff together, whereas the type-theoretic approach sees the (appropriately enhanced) lambda calculus as a proper programming language. Fortunately the two approaches are not incompatible, because there is nothing stopping us from postulating axioms in a type theory. It may be considered in questionable taste, but it can be helpful!

Posted in programming languages | 2 Comments

The category of cardboard rectangles and bits of wool string (II)

In the second session of our category-theory club for school children we took a giant leap from the informality of diagrams towards formal mathematical notations. Our starting point were the cardboard-and-string diagrams from the previous session which we have already established that they were the same even though the strings had different lengths and colours and the bits of cardboard slightly different sizes. The reason they were the same (I chose not to use “equal” because of possibly overloaded meaning to the children) was because they were created following the same instructions.

The next step was now to draw on paper these diagrams and each child was requested to do that, and there were relatively few problems in performing this task reasonably. As one would expect, the differences between the resulting diagrams drawn by each child were now quite staggering but the idea that they were still the same diagram thankfully persisted because these were obviously accurate (as much as possible) renderings on paper of things that were the same in reality.

With this part of the work concluded successfully we moved on to what I thought will be the most challenging step: introducing a combinator notation for diagrams, inspired by compact closed categories. As a preliminary step they were required to draw the diagrams using only straight lines and right angles (the example was chosen carefully so no braiding was needed) and then we went over the diagram and tried to identify the simplest components. I actually introduced the term “component” as “a thing out of which we make other things” and after some ooh-ing and aah-ing the children seemed happy enough to accept the Big Word. Unfortunately at this critical stage a logistical problem struck: the room did not have a whiteboard or a blackboard, just an electronic “smart”-board. The irony of the term is obvious, and I am preparing a separate rant for the faddish abandonment of perfectly good classical tools of learning for fickle and fragile technology-heavy alternatives. So instead of drawing a big diagram on a big whiteboard and identifying the components I had to walk around each table and do that with each group of children, which wasted a lot of time.

The initial set of combinators we identified were: I (a line, the identity), E (a feedback loop, the compact-closed unit), U (a feedforward loop, the compact-closed co-unit) and f (a rectangle box, the only morphism so far in our category). We also discovered two ways to combine components, writing them next to each other (composition) or stacked on top of each other (tensor).

The obvious two tasks were first to write the formula corresponding to our running example diagram then drawing a diagram from a given formula. Some typical results are below.

Posted in teaching | Leave a comment

Why computers can’t think (III)

This is the final instalment of a series of posts on machine consciousness. It’s a fascinating topic but now that the Easter holidays are over I will need to refocus my attention on the equally fascinating topics I am actually paid to research.

The motivation for this series of posts is the recent personal realisation that consciousness (having subjective first-person experiences) cannot be explained as a purely computational phenomenon. The consciousness-as-computation view is widely held, especially among computer scientists. It is a very convenient materialist and physicalist answer to some of the deepest questions about the mind. I used to subscribe to this view, but now I realise it is based on some misunderstandings of what computation is. The gist of the argument is that a computing device, if it’s complex enough, could somehow develop consciousness. Our own human consciousness is given as an instance of this, because our brains, as computing devices, have the required level of complexity. To emphasise: we are not talking about the ability to mimic a consciousness, we are talking about having a consciousness.

In the first post on this topic I argued that from the Church-Turing thesis it follows that any complex computing device can be reduced to a Universal Turing Machine, which is a simple device, with a very very long tape — so there is nothing qualitatively special about the complexity of the device. In the second post I argued that even the universality of a Turing Machine is not a special property, because given a conscious UTM we can construct a finite state machine which can have some of the subjective experiences of the UTM. Neither of these two arguments are conclusively proving that a computer cannot think, but they show that the common view that computers can think is based on some bad arguments, a misunderstanding of what computers are.

No doubt, our brain does a lot of computation. As a computer it is an enormous finite state machine. The question is: does our consciousness arise out of the computation it does, or does it arise out of it being a brain? In other words, if we were to simulate all its computations in a computer, would that computer have the subjective experiences we have (joy, love, despair, confusion, etc.) or is it a non-computational function of the brain biological processes? Considering the arguments in my previous two posts, I incline towards the latter. It’s not an idealist or dualist position, it’s still materialist and physicalist, but it entails that there is a lot about the brain that we don’t know.

I will attempt now a stronger argument that machines cannot think by trying to derive a contradiction. Because there is nothing special about UTMs as far as consciousness goes, and because the brain itself is a FSM, let us focus on the (im)possibility of FSMs to have a subjective experience. For simplicity let us further focus on deterministic FSMs since they are simpler and as expressive as the non-deterministic ones.

Before we proceed, let us set some things straight. A FSM is commonly understood as an abstract structure, whereas we will consider the physical realisation of a FSM as a physical system with inputs, outputs and changing internal state. The following is not obvious, but essential: if a FSM is to have a subjective experience (consciousness) and if consciousness is computational then the nature of that subjective experience will only depend on the abstract FSM and not on the way the FSM is physically realised. So if a particular FSM is realised as a brain and it has certain subjective experiences, then if we realised it as a silicone computer it will have precisely the same. This is required for the sake of argument: if we allow the subjective experience to depend on the physical realisation of the FSM then it is no longer a computational phenomenon and we are done. Another even less obvious assumption is that if a FSM can give rise to a subjective experience then that particular subjective experience is not associated with the FSM as a structure, but must be associated with a particular run of that FSM. In other words if the FSM has a subjective experience it should arise only out of its interaction with the environment and the evolution of its internal state, i.e. the computation. As before, the argument is that if it arises otherwise then it makes consciousness not of a computational nature, and we are done. A final and essential clarification, of what it means to implement an abstract FSM as a concrete physical system: it means to establish a FSM homomorphism between the latter and the former. This means that if we can observe the states, inputs and outputs of the concrete FSM we can “decode” the states, inputs and outputs of the abstract FSM (using the homomorphism) so that the transitions are preserved.

Let us now try to derive a contradiction out of the existence of consciousness-producing FSMs. More precisely, we will show that the consciousness of the brain cannot arise simply out of its computational activity. Computationally, we know that the brain must be a FSM. Let us take two runs of that FSM, of equal length, but corresponding to clearly distinct mental states (e.g. one is taken on a happy day, one on a sad day). These runs correspond to traces of inputs, outputs and internal states. If consciousness is computation, whenever we run a FSM isomorphic to the brain-as-a-FSM so that the two traces are realised, that isomorphic FSM will feel happy, respectively sad.

But if, as we established, the subjective experiences depends on the run rather than on the FSM, in each case we can discard all the states that are not used and end up with two non-isomorphic FSMs which can realise two distinct subjective experiences if fed with particular inputs, and some other behaviour we don’t care about if fed with other inputs. Now let us construct the product of these two FSMs, i.e. the FSM with the transition

f(s_1,s_2)(i_1, i_2)=p(f_1 s_1 i_1, f_2 s_2 i_2)

where f_i are the two transition functions and p is the isomorphism  p ((s_1, o_1), (s_2, o_2)) = ((s_1, s_2), (o_1, o_2))

And this gives us the contradiction: a physical realisation of the product automaton is an implementation of both automata, by the composition of the implementation homomorphism with each projection homomorphism. But if subjective experience is purely computational then the same run of the product automaton must give rise to two distinct yet simultaneous subjective experiences. But having two distinct subjective experiences simultaneously — assuming that even makes sense — is not the same experience as having just one of the two alone. To me, this is a contradiction.

A FSM cannot have a subjective experience just by virtue of its computational behaviour. I suppose this must be for the same reason that a computer doesn’t get wet if it simulates a liquid and it doesn’t become radioactive if it simulates a nuclear explosion — no matter how precise the simulation. Why is a brain different? Why or how does a brain “make” consciousness? We only know that it does, because we experience it first hand, but we know little else.

A final qualification: when I say that machines cannot think I mean that they cannot think qua machines. This argument doesn’t rule out panpsychism — the doctrine that everything is more-or-less conscious — or idealism — the doctrine that the mind has a non-physical nature. But this argument doesn’t require them either. For details, I refer the reader to the work of Searle which proposes a modest and sensible programme which starts from the doctrine that consciousness is a biological process specific to the brain.

Acknowledgements. Steve Zdancewic and Lennart Augstsson pointed out that the brain must be a FSM thus simplifying this argument quite a lot.

Posted in anticomputationalism, armchair philosophy | 6 Comments

The category of cardboard rectangles and bits of wool string (I)

I am conducting an educational experiment: teaching children in year 2 and 3 (7-8 years old) about categorical concepts, more precisely about monoidal categories. Monoidal categories are an appealing formalism because of their connections with diagrams [1]. So we just discover together a bunch of stuff about diagrams.

The first lesson started with a general conversation on what Mathematics is and, unsurprisingly, children were surprised to learn that it is not all about arithmetic. The idea of mathematics without numbers seemed strangely exciting at least to some of them — a boy clenched his fist and muttered ‘yes!’ to himself when I announced that we will do the mathematics of shapes. Because this is highly experimental only 10 children were invited to participate, among the more gifted in their year. I divided them up into teams and assigned them tasks. The first task was to cut rectangles out of cardboard, punch holes in the four corners and label both the rectangles and the corners: these were going to be the morphisms of the category. Preparing 8 such rectangles proved to be unexpectedly challenging and took quite a lot of time. The children in each team first spent quite some time deciding on how they will cooperate. Two common models were parallel (everyone does the same thing independently) and pipelined (each one does a specialised task). There was also the unexpected difficulty of operating a hole puncher so that you punch just one hole rather than a pair of holes, with at least one team being unable to figure out how to insert the cardboard into the puncher to achieve this. The same team struggled with 5 or 6 ruined “morphisms” because the hole was too close to the edge.

As is usually done in categories of diagrams, identities and other structural morphisms are represented by lines. In our case we were going to use wool strings, tied to the holes. The hole patterns are, of course, the type of the morphism, i.e. the objects.  The next task was just to get familiar with the mechanics of tying bits of string to little holes in cardboard rectangles. This was a skill essential in building more complex morphisms out of simpler morphisms and compositions. Just like before, the level of dexterity of the children was unexpectedly low. For example several did not know how to tie a double knot and they had to learn that first. In the end, however, all teams mastered the task.

The final task was for each team to construct a composite morphism, which was specified in a relational style as a list of connections between node labels: 1c to 2a, 1d to 2b, 2c to 1a, 2d to 1b. With the cardboard morphisms at the ready and the string-tying skills mastered, this task was completed relatively painlessly and error free. Each team displayed their creation.

In the conclusion of the lesson, with all the diagrams displayed, I asked the children whether these were the same diagrams or different diagrams. Intuitively, because they knew they built the diagrams using the same instructions, they all answered that these were the same diagrams. I challenged their answer, objecting that the cardboard was different colours and sizes and the strings were different colours and lengths, to which they replied that these don’t matter. Indeed, what we wanted to matter is just the what-is-connected-to-what-ness of the diagram.

To summarise, in this first lesson we covered the basic skills of constructing morphisms in the the category of cardboard rectangles and bits of wool string: making cardboard morphisms and composing by tying the bits of string, using a relational-style specification. The children also started to develop the basic intuitions which will lead up towards a topological notion of equality of diagrams.

Figure. Two equal morphisms in the category of cardboard rectangles and bits of wool string.

Posted in teaching | 2 Comments