In computer science, the Boolean or logical data type is a data type, having two values (usually denoted true and false), intended to represent the truth values of logic and Boolean algebra. [Wikipedia]
In computer science, a tagged union, also called a variant, variant record, discriminated union, disjoint union, or sum type, is a data structure used to hold a value that could take on several different, but fixed types. [Wikipedia]
In certain circles there is only one way to think of a programming language: lambda calculus and some extra stuff. Depending on where you add this extra stuff, you get two different, although not incompatible, views on programming language design.
I hope we can all agree that the simply typed lambda calculus is a useful starting point for a programming language. But it must be merely a starting point. Because the only values are functions, comparing results of computations requires some theoretical somersaults. It also makes operations such as input or output problematic. So we need to add stuff to it. Without undue theorising let us jump into an example: booleans.
The first way of doing it is a minimalistic one: we just add a new type to the language (lets call it bool ) and some typed constants for manipulating booleans:
true, false : bool not : bool -> bool and, or : bool -> bool -> bool
Some are ground-type constants, and some are unary or binary first-order constants (a.k.a. operators). For easier future reference, lets call this the axiomatic approach because, if you keep in mind the Curry-Howard correspondence, it is sort of like adding arbitrary axioms to a natural deduction system.
This works but it is unsatisfactory on several accounts. The first one is the rather arbitrary way in which we are adding types and constants to the language. The second one is that the behaviour of booleans and the associated constants is not (yet) specified, so we need to add this essential information somewhere (language reference, operational semantics, etc.).
The third objection is that whenever we require yet another data type we need to extend our programming language, with all the hassle that entails. We even need to change the parser. For example, adding natural numbers would require yet another new type nat and a bunch of constants:
0, 1, 2, ... : nat neg : nat -> nat eq, gt, lt : nat -> nat -> bool
Then how about more complicated data types such as lists or trees, or types for dealing with state or other computational effects? There must be a better way.
A better way, which we shall call the type-theoretic approach, seems to be to provide the programming language with a way of constructing its own types, such as algebraic types or dependent types. Many functional programming language offer this facility. In OCaml, for example, booleans and naturals and the associated operations can be defined:
type bool = True | False ;;
let not = function | True -> False | False -> True;;
type nat = Zero | Suc of nat;;
let rec plus m n = match m with | Zero -> n | Suc m -> Suc (plus m n);;
This second — type theoretic — approach is clearly more elegant, solving the problems of the axiomatic approach: no more adhockery, behaviour is defined, the language can be extended uniformly in a nice and elegant way.
But is it always better, from all points of view?
There is at least one dubious scenario: interfacing with the physical world, including the computer itself. The built-in data types of the computer (char, int, float, etc.) are terribly ad-hoc and lacking in uniformity. However, the computer as a physical device is at its most efficient when working with these built-in types, because it has been designed to work with them. But it is very tricky to make a compiler that just knows how to connect a user-defined type (e.g. for natural numbers) and its constants with hardware-supplied operations. For starters, the match may not be perfect. The user-defined natural numbers for example don’t have size restrictions, so you might say that compiling them as computer integers is just wrong. But in general, since the type-theoretic approach defines new concepts within the language, their implementation proceeds via the definitions, simplified into the core of the language. This unpacking of definitions obscures whatever intended connections might be with a physical reality.
A similar related scenario is when a program needs to interface with a physical device, which can usually be reduced to sending bits and receiving bits. Here the type-theoretic approach becomes uncomfortably constraining and some languages simply abandon typing discipline when I/O is concerned. A similar problem does not arise in the case of the axiomatic approach, mainly because the axiomatic approach is too simplistic to even ask the question in the first place.
Consider memory (RAM) as a physical device. How do you interface with it? The program either sends a tuple representing an address and some data to be written at that address, to which the memory replies with an acknowledgement; or the program sends an address and receives the data stored at that address. To add this memory to a lambda calculus in an axiomatic way is not a problem.
We introduce the type of variables simply as a tuple of a write and a read operation with the signatures as informally described above. And a ram (second-order) operation would take a function of a var, bind that to the memory, use it, then release the binding upon exiting the body of the function.
type mem = (nat -> nat -> nat) x (nat -> nat) ram : (mem -> nat) -> nat
Read and write are just projections to select the right “method”. For example, a program such as
#new ram m; m := m + 1
would be desugared with the syntax above as
ram (\m.(proj1 m)(10)((proj2 m)(10) + 1)
A type-theoretic approach to effects is perhaps possible, using the rather well known monads or the funkier lenses or the more sophisticated algebraic effects. However, it is very difficult in general to make types match whatever arbitrary interface your physical system or device might have. And it is even more difficult to make the compiler hook up the generated code to the device interface, because of the same reason mentioned above: in the type-theoretic approach complex types are first usually translated into a simpler core language, and in the translation process the type-theoretic hooks that allow the connection to the physical device can get obscured or even lost.
Note that the device doesn’t need to be physical, it can also be a data type that doesn’t have a convenient algebraic representation. For example, in the axiomatic approach I can give the right interface to a circular buffer but fully specifying it type-theoretically is very hard — maybe not impossible-hard, but research-topic-hard at best.
To summarise, the axiomatic approach views the underlying lambda calculus simply as middleware for hooking arbitrary stuff together, whereas the type-theoretic approach sees the (appropriately enhanced) lambda calculus as a proper programming language. Fortunately the two approaches are not incompatible, because there is nothing stopping us from postulating axioms in a type theory. It may be considered in questionable taste, but it can be helpful!