“Programming languages should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary.”
— The Revised5 Report on the Algorithmic Language Scheme [1]
When studying programming languages, language theorists need an underlying model, to give a structure on which to prove or demonstrate concepts. Such models are usually defined in mathematical logic for a few reasons:
A good model should be as simple as possible, yet powerful enough that we can use it to model a large class of programming languages. One such model is the λ-calculus (Lambda calculus), and it is frequently used as a mathematical model for functional languages.
Aside: This is why many functional languages incorporate a λ in their logos.
Our goal in this chapter is to define the λ-calculus and demonstrate its utility in expressing different entities you should already be very familiar with while working with programming languages, e.g. booleans, lists, and natural numbers. λ-calculus itself is actually quite simple, and it uses the power of abstraction to represent all these features. We will look at the semantics of λ-calculus informally in this module. The following module will revisit concepts in this chapter and introduce formal semantics. The module after that will discuss adding types to λ-calculus.
What we will show is that even though λ-calculus has a paucity of concepts, it can nonetheless express all interesting computations. This fact gives programming language designers a baseline understanding for when features of their language are computationally powerful.
This short section will prepare you for the concepts that are to come, because the λ-calculus can seem extremely foreign. Basically, think of this as warning and mental preparation!
The λ-calculus is a mathematical language. In mathematical languages, we generally think of expressions as equivalent if they have the same value. For instance, in arithmetic, 2 + 2 is equivalent to 4; you can substitute 2 + 2 for 4, or vice-versa, and always result in the same value (so long as you’re careful about precedence). However, in the λ-calculus and other computational languages, there’s a further wrinkle: computation. Since the λ-calculus will be used to represent computation, equivalences can be complex. Thus, we tend not to think of equivalence, but of computation, with precise “from” and “to” states. In a mathematical sense, 2 + 2 is equivalent to 4, but also, if you’re actually performing the computation, you would replace 2 + 2 with 4, but never replace 4 with 2 + 2.
In the λ-calculus, instead of traditional mathematical operators (like +), we only have functions and function calls. More precisely, we have abstractions and applications. Abstractions are written with a λ, the name of the parameter, a dot (.), then the body. Applications work by substitution, so if we “call” the “function” λx.x + 2 with the argument 2, we get 2 + 2, by substituting x for 2. However, we have no + or 2 in the λ-calculus; you’ll see in the rest of this module how the lack of such basics doesn’t limit what we can do.
The syntax of the λ-calculus is as follows, presented in Backus Normal Form (BNF):
⟨Expr⟩ | ::= ⟨V ar⟩∣⟨Abs⟩∣⟨App⟩∣(⟨Expr⟩) | ||
⟨V ar⟩ | ::= a∣b∣c∣ | ||
⟨Abs⟩ | ::= λ⟨V ar⟩.⟨Expr⟩ | ||
⟨App⟩ | ::= ⟨Expr⟩⟨Expr⟩ | ||
The four rules define the four elements of the syntax of λ-calculus: expressions, variables, abstractions, and application.
To bridge these concepts with terms you may be more familiar with, “abstractions” are essentially functions, and “applications” are essentially function calls. But, don’t take this equivalence too far: the behavior of abstractions and applications may not match your expectations if you assume they behave exactly as in a programming language you’re familiar with.
Since applications are simply expressions concatenated together, we need precedence and associativity rules to understand how to read them. λ-terms are parsed as follows if without parentheses to indicate precedence:
Here are a few more examples to illustrate the precedence of λ-expressions:
λ-term | Equivalent λ-term, with least parenthesis necessary |
λx.((xy)λz.z) | λx.xyλz.z |
((λx.x)y) | (λx.x)y |
((xw)(zy)) | xw(zy) |
((xy)λx.z) | xyλx.z |
Exercise 1. Verify that the meaning of the expression will change when parentheses are removed for the terms in the right column.
We’ve now described the syntax of the λ-calculus, but syntax alone doesn’t tell us an expression’s meaning. We will now discuss the meaning of λ-expressions; more specifically, how to “compute” in the λ-calculus. Intuitively, we can see that a λ-expression consists of functions and calls, but more precisely:
Note that all abstractions have exactly one parameter. We’ll see soon that this does not limit the expressibility of the λ-calculus.
The only “type” in the λ-calculus is a function. So, all expressions are understood to be functions, and thus expressions like xy are always legal; in this case, the expression denotes the application of the function x to the argument y.
Aside: Note that we generally don’t give the functions defined in lambda calculus a name. That’s why many languages use the term “lambda” or “lambda functions” to refer to anonymous (nameless) functions.
For the following sections, we will be talking about the operational semantics of λ-calculus in an informal way. The formal introduction of operational semantics will be seen in the next module.
First we will start by discussing the simplest entity: variables. To be specific, we shall determine where variables obtain their meaning, and whether two occurrences of the same name refer to the same variable.
Consider the identity function: the simplest function, that just returns its only parameter. In the λ-calculus, it is denoted as λx.x. The x inside the body of the abstraction must refer to the same x in the variable position (the argument) of the abstraction. In formal terms, the latter is a binding occurrence of the former, and x is a bound variable. An occurrence of a variable that is not involved in a binding occurrence is called free.
In many programming languages, a free variable would be a semantic or runtime error. In the λ-calculus, it is valid, but just as meaningless; there simply are no per se errors in the λ-calculus.
Here are a few more examples to help build your intuition:
Example 2. In the λ-expression λx.x(λz.x)y, both occurrences of x are bound to the abstraction having variable x, and y is free.
Example 3. In the λ-expression (λx.x)(λz.x)y, the first occurrence is x is bound to the abstraction (λx.x). The second x and y are free variables.
Informally, we can see the set of bound variables for an expression E contains all variables which appear inside an abstractions that define them. This informal definition is fine for our understanding, but we will also define this property formally, as a formal definition can be used as a basis for proofs. Formal definitions tend to leverage the structural and recursive nature of the syntax; specifically, such definition will structurally and recursively define the property on every kind of expression. In this case, we need to have one definition each for abstraction, application and variable. Now let’s formally define the notions of free and bound:
Definition 1. The set of bound variables of an expression E, denoted by BV [E], is defined as follows:
BV [x] | = ∅ | ||
BV [λx.L] | = BV [L] ∪{x} | ||
BV [MN] | = BV [M] ∪ BV [N] | ||
Variable x is bound in expression E if x ∈ BV [E].
We can then define the set of free variables in a similar way:
Definition 2. The set of free variables of an expression E, denoted by FV [E], is defined as follows:
FV [x] | = {x} | ||
FV [λx.L] | = FV [L] ∖{x} | ||
FV [MN] | = FV [M] ∪ FV [N] | ||
Variable x is free in expression E if x ∈ FV [E]. An expression E is closed if FV [E] = ∅; that is, an expression is closed if it has no free variables. A closed expression is called a combinator.
Note that it is possible for a variable to be both free and bound. However, each occurrence of a variable in an expression is either free or bound, but not both. This is why our definition of “closed” depends on FV instead of BV .
Example 5. In the expression xλx.x, x is both free and bound. the first occurrence of x is free, and the second one is bound.
Example 6. Figure 1 is a λ-expression in which arrows are drawn from each bound variable occurrence to its binding occurrence. Variable occurrences with no corresponding arrow are free.
Exercise 2. Provide a modified definition of bound variables, so one can track which expression a bound variable is bound to. Note: a variable can be bound to multiple expressions!
Bound variables get their meaning from the binding occurrences; on the other hand, free variables do not have a meaning within an expression. For free variables to be meaningful, we would have to rely on an external definition, and of course, if we included that external definition as part of the expression, then the variable would now be bound. Thus, an expression being a combinator means that the computation can proceed without any additional information.
Example 7. For analogy, consider the following C function:
int f(int x){ return x + y; }
In the function f, x is bound to the only parameter and y is free. While f is valid in C, the computation can only proceed if y is defined externally.
Computation in the λ-calculus is based on the notion of reduction. An expression is reduced until no further reductions are possible, or some other stopping condition is reached. We will discuss these stopping conditions in detail in the next section. The primary reduction mechanism in the λ-calculus is known as the β-reduction (beta reduction) and is the subject of this section.
First, we identify which expression can be reduced.
Note that redex is short for reducible expressions, and the plural of redex is redices.
Consider the redex (λx.M)N in the definition. Following our usual analogy, λx.M is a function with parameter x and argument N. The expectation here is that this evaluates to M with every occurrence of x substituted for N. In fact, since there could be another abstraction in M which defines x again, we only replace each occurrence of x which is bound to outermost x in λx.M. This is the same as the free occurrences of x inside M.
Note that this substitution style is not how most programming languages are evaluated. Instead, languages have stacks or stores which contain the values of variables, and a variable’s value is looked up when the variable is encountered. Substitution is sufficient, and clearer, for the λ-calculus, but we’ll see when and why we need to actually store variables in later modules.
Since this process involves substitution, we need to have a formal notation for substitution: Let M[N∕x] be the substitution of N for all free occurrences of x in M. That is, M[N∕x] is the expression M, with all occurrences of the variable x replaced with the expression N. Using this notation, we define β-reduction as follows:
Definition 4. (β-reduction) Let M and N be λ-expressions, x a variable. The relation →β (β-reduction) is defined by the rule
Further, if C[(λx.M)N] denotes an expression C in which (λx.M)N appears as a subterm, then
We can describe these processes as “(λx.M)N β-reduces to M[N∕x]” and “in C, (λx.M)N β-reduces to M[N∕x]”, respectively.
There are a few things that are worth noting in this definition:
We will also introduce the following notations:
Repeated application of β reduction (→β n, →β ∗, →β +) is, in essence, our computation. The concepts of β-expansion and β-equivalence will give us a starting place to discuss whether two computations are equivalent, even if they’re not equivalently written.
Now, we need to take a step back and look at substitution. The intuition here is to just replace every free occurrence of a variable, say x, with an expression, say T, in an expression E. Remember that every expression is a variable, an abstraction, or an application, the latter two of which can contain subexpressions. So, we need to describe the process of substitution for each of these cases:
As usual, we want a formal definition:
Definition 5. (Substitution, provisional) Let E and T be λ-expression and x be a variable, Denote E[T∕x] the substitution of T for x in E, defined below:
x[T∕x] | = T | ||
y[T∕x] | = y (if y≠x) | ||
(MN)[T∕x] | = M[T∕x]N[T∕x] | ||
(λx.M)[T∕x] | = λx.M | ||
(λy.M)[T∕x] | = λy.M[T∕x] | ||
Here are a few examples of β-reduction and substitution in action:
Earlier, we said that defining abstractions so they only take one parameter will not hinder expressibility. The last two examples illustrate this point: λx.λy.x is an abstraction which can be used like a function which takes two expressions as arguments and produces the first one; similarly, λx.λy.y is an abstraction that takes two expressions as arguments and produces the second one. The difference between those two functions is that (λx.λy.x)a reduces to λy.a, which produces the first argument passed to it when any second argument is passed, while the second function produces an identity function (λy.y), regardless of what the argument for parameter x is. This style of reduction-by-substitution allows us to build multi-parameter functions as just a special case of single-parameter functions.
Aside: You might have heard the term currying, named after Haskell Curry, which is the process that converts a function taking multiple arguments into nested one-parameter functions which return functions accepting the remaining arguments. In λ-calculus, this is the most natural style of passing multiple arguments.
Also earlier, we said that the identity function is (λx.x), and now it is (λy.y). Similar to functions, if we replace the identifier in the variable part of an abstraction and their bounded occurrences with another identifier, the behavior of the abstraction should be the same. We express this observation with a principle known as α-conversion (alpha conversion):
Definition 6. (α-conversion) Let E be a λ-expression. We define the relation =α (α-equivalence) by the rule:
given that y∉FV [E]. If C[M] is an expression C containing M as a subterm, and M =α N, then C[M] =α C[N]. Further, =α is an equivalence relation. Finally, α-conversion is the replacement of a term with an α-equivalent term.
Applying α-conversions to an expression should not change its behavior; therefore, α-equivalent expressions should have the same meaning... or do they? Consider the following pairs of reductions:
(λx.λy.x)ab | →β (λy.x)[a∕x]b | ||
= (λy.x[a∕x])b | |||
= (λy.a)b | |||
→β a[b∕y] | |||
= a | |||
(λx.λy.x)yz | →β (λy.x)[y∕x]z | ||
= (λy.x[y∕x])z | |||
= (λy.y)z | |||
→β y[z∕y] | |||
= z | |||
As noted earlier, the expectation was λx.λy.x to behave like a function that given two arguments will produce the first one, yet for the second example we have seen the opposite behavior! The difference between the two examples is:
Of course, if we were to replace λx.λy.x with some α-equivalent expression such as λc.λd.c, then the two reductions would behaving correctly again.... but what about (λc.λd.c)dc?
What actually happened is that the binding occurrence of x in the right example is changing. Before reduction, x is bound to the x in the outer abstraction. However, after the reduction against y, the binding occurrence for x, which is now a y, changed, so that this x is now bound to the inner abstraction. We call this behavior dynamic binding. More specifically, dynamic binding refers to systems where its binding occurrence could change in the middle of a reduction. While dynamic binding is useful in some cases, it is undesirable for now. We want static binding instead: binding occurrences should never change throughout the computation. Thus, a change in the definition of substitution is required:
Definition 7. (Substitution, corrected) Let E and T be λ-expression and x be a variable, Denote E[T∕x] the substitution of T for x in E, defined below:
x[T∕x] | = T | ||
y[T∕x] | = y (if y≠x) | ||
(MN)[T∕x] | = M[T∕x]N[T∕x] | ||
(λx.M)[T∕x] | = λx.M | ||
(λy.M)[T∕x] | = λy.M[T∕x] (if y≠x,y∉FV [T]) | ||
(λy.M)[T∕x] | = λz.M[z∕y][T∕x] (if y≠x,y ∈ FV [T]; z is a “new” variable) | ||
This definition is mostly the same as the previous one, except for one part: instead of letting the abstraction capture the variable, we rename the variable in the abstraction and the bounded occurrences beforehand to some name that was never used before. Of course, a new variable would never have free occurrences in T above, so the capturing behavior will never occur.
We will demonstrate our new, now correct, substitution, by recomputing the β-reduction of (λx.λy.x)yz above:
We will now take a closer look at how computation proceeds in the λ-calculus. Essentially, computation in the λ-calculus is a series of reductions. But, we have not yet decided what to reduce and what not to reduce. Let’s start with an obvious option: reduce everything. If the expression contains a β-redex, we β-reduce it, and repeat the process until no β-redex is found in the expression.
However, this process may not terminate. It is possible for reduction of a β-redex to produce a β-redex ad infinitum. Thus, not all terms have a β-normal form:
This expression has no β-normal form since reduction by taking the only β-redex yields the original expression.
For that reason, it is worthwhile to consider other kinds of reduction rules, or reduce to other normal forms. For example, recall that sometimes you might be tempted to write this in Racket:
1(map (lambda (x) (f x)) lst)
While it is always recommended to do this instead:
1(map f lst)
Here is an intuition: if two functions accept the same set of values S as argument and produce the same value when
supplied the same argument, then these two functions are equal. Clearly, (lambda (x) (f x))
and f
are the same function
by this intuition, since the expression (equal? ((lambda (x) (f x)) a) (f a))
would be true for any given valid
input a
. Put in λ-calculus, (λx.fx)y = fy, since all we’re doing is passing through the y as x, and thus,
λx.fx = f.
Aside: This intuition is called function extensionality.
We will formalize this intuition in λ-calculus by defining another kind of reduction, called η-reduction (eta reduction):
Definition 9. (η-reduction) η-reduction is denoted by the following rule:
If C[M] denotes an expression in which M occurs as a subterm, and M →η M′ then C[M] →η C[M′].
Analogously to the definition of β-expansion and β-conversion, we can define an η-expansion relation, ←η and η-conversion relation, = η. Note that η-redices are not reduced during β-reduction, and β-redices are not reduced during η-reduction. A reduction in which both reductions may occur is called βη-reduction.
Our goal is to reach some stopping condition, and we wish to define that stopping condition syntactically. These conditions are, in general, referred as normal forms. The normal form which we will spend most of the time talking about is β-normal form; however, other normal forms exists as well. We can speak, for example, of η-normal form and of βη-normal form.
Among the various alternative definitions of normal form that have been studied, the most important for our purpose in known as weak normal form (WNF), defined below:
Definition 10. An expression E is in weak normal form (WNF) if every β-redex in E lies within the body of some abstraction.
The intuition behind WNF is that, in real programming languages, computation does not occur inside a function until it is called. Hence, we do not consider redices that occur inside an abstraction as candidates for reduction until the abstraction itself has been supplied with an argument and reduced, at which point the redices inside become exposed.
Example 13. The term λx.(λy.y)(λz.(λw.w)z) is in WNF but not in β-normal form. The term contains two β-redices: (λy.y)(λz.(λw.w)z) and (λw.w)z, but the former lies within the “ λx ” abstraction, and the latter lies within the “ λz ” abstraction.
Every term in β-normal form is also in WNF, as WNF is a strictly weaker criteria.
Whenever we speak of “normal form” without qualification, we are referring to β-normal form.
So far, we have not specified which redex to choose at any given time. For example, we have two choices of β-redex to reduce in the expression (λy.y)((λx.x)b).
(λy.y)((λx.x)b) | →β (λy.y)(x[b∕x]) | ||
= (λy.y)b | |||
→β y[b∕y] | |||
= b | |||
(λy.y)((λx.x)b) | →β y[(λx.x)b∕y] | ||
= (λx.x)b | |||
→β x[b∕x] | |||
= b | |||
In this case, the two reductions taking different paths have reduced to the same expression. Is this true for all of λ-expressions which have a normal form? Luckily, the answer is yes! This theorem was proved by Alonzo Church and J.Barkley Rosser [2].
Theorem 1. (Church-Rosser, 1936) For λ-expression E1,E2, and E3, if E1 →β ∗E2 and E1 →β ∗E3, then there exists an expression E4 such that E2 →β ∗E4 and E3 →β ∗E4 (up to α-equivalence).
The proof is provided in a separate document, but you don’t have to be familiar with it:
Aside: Proof of Church-Rosser Theorem ( https://student.cs.uwaterloo.ca/~cs442/W25/extras/c-r-thm-proof.pdf)
The Church-Rosser Theorem is one of the fundamental theoretical results about λ-calculus. As illustrated in Figure 2, it guarantees that when faced with a choice of redex, it is always possible to arrive at the same final expression regardless of your choice. More precisely, if an expression E1 can be reduced by zero or more reduction steps to either expression E2 and E3, then there exists some other expression E4 to which both E2 and E3 can be reduced.
This idea implies that there is a unique normal form for any expression. The theorem says that E2 and E3 can be reduced to some other expression, and normal forms are irreducible by definition. One important thing to note is that the Church-Rosser Theorem does not guarantee the existence of β-normal form; the theorem just indicates that if the reduction terminates, it will reach a unique normal form. We summarize this idea in the following corollary:
Proof. Let E1 be an expression that reduces to normal forms E2 and E3. By the Church-Rosser Theorem, there is an expression E4 such that E2 →β ∗E4 and E3 →β ∗E4 (up to α-equivalence). However, E2 and E3 are both normal forms, hence irreducible. Therefore, the only possible reduction to E4 from E2 and E3 must take zero steps, so we have E2 =α E3 =α E4. □
Although the Church-Rosser Theorem guarantees a unique β-normal form, a λ-expression may have several different instances of other kinds of normal forms.
Exercise 4. Prove or disprove the following statement:
For λ-expression E1,E2, and E3, if E1 →β E2 and E1 →β E3, then there exists an expression E4 such that E2 →β E4 and E3 →β E4 (up to α-equivalence).
Note the absence of asterisk as superscript for all →β operators.
Although the Church-Rosser Theorem guarantees that no matter how we choose redices to reduce, we can never reach an expression from which we can’t reach the unique β-normal form given that one exists. However, most real-world programming languages have much more clearly defined policies regarding the order in which computation proceeds, because order of evaluation can affect the meaning of programs in most languages. We are going to see that even in λ-calculus, the policy for choice of redex can matter when dealing with expressions that could have possibly infinite reduction paths. We will examine two such policies, known as reduction strategies, for the λ-calculus. A reduction strategy is a policy which, given a λ-term, decides which redex to reduce next. We will see that even in the presence of the Church-Rosser Theorem, the two strategies have quite different properties.
We first consider the reduction strategy you commonly see in “mainstream” programming languages called Applicative Order Reduction(AOR): we will always choose the leftmost, innermost redex at a step. A redex is innermost if it contains no other redices.
In this example we can see why AOR is similar to the programming languages you have seen2 : the argument to an abstraction is reduced to normal form before it is substituted. For this reason, AOR is sometimes dubbed as call-by-value, and demostrates a semantic property called eager evaluation.
Here is another example with AOR:
This example does, in fact, have a normal form: y. Yet we cannot reach it using the AOR strategy. The primary goal for reductions is to reach the normal form, but AOR fails to do so in this particular case.
Let us consider another reduction strategy, known as Normal Order Reduction(NOR). This time, we always choose the leftmost, outermost redex at each step. A redex is outermost if it is not contained in any other redex. Here is (λx.fx)((λy.gy)z) reduced in NOR:
One of the properties of NOR is that arguments to a function are not evaluated until they are needed. Above, at each reduction step, the formal parameter is replaced verbatim with the argument—that is, without simplifying the argument further before substitution. NOR is sometimes called call-by-name, and demonstrates a semantic property called lazy evaluation, which we will explore later.
Under NOR, (λx.y)((λx.xx)(λx.xx)) reduces to the normal form correctly:
As demonstrated, while NOR reduces this expression immediately to y, AOR immediately gets caught in an infinite reduction of the argument, and makes no progress.
In general, our goal in reducing an expression is to reach the normal form, but we’ve seen at least one example in which one obvious reduction strategy fails to do so. One might therefore ask whether there is always a reduction strategy that will reach a normal form. Luckily, there is. NOR always reaches the normal form, if one exists:
Theorem 2. (Standardization, 1958) If an expression has a normal form, then Normal Order Reduction is guaranteed to reach it.
The proof of the Standardization Theorem was due to Curry and Feys [3]. We do not present it here.
Thus, even though the Church-Rosser Theorem guarantees that no choice of redex puts the β-normal form (if it exists) out of reach, our choice of reduction strategy is nonetheless important.
Aside: In a purely functional setting, the Church-Rosser Theorem guarantees that the reduction will lead to the unique normal form. However, in any programming languages with state, such as mutable variables or fields of objects or I/O, a difference in the order of evaluation could cause completely different behaviors.
A notorious example in C and C++ is the order of evaluation for arguments of functions and operators. Consider the following C++ code snippet:
1int f(){ cout << 'f'; return 1; } 2int g(){ cout << 'g'; return 2; } 3int add(int x, int y){ return x + y; } 4 5int main(){ 6 int i = add(f(),g()); 7 cout << endl; 8 return i; 9}
What will be the order of the characters printed? The answer is “depends on the implementation”, since all possible permutations of the output are allowed according the C++ standard. The C++ standard does not specify the order of evaluation for arguments of functions and operators. Hence, the choice of which argument to evaluate first is completely up to the particular compiler. For example, if a particular compiler chooses to always evaluate the rightmost argument first, the output would be gf.
So far, we’ve introduced λ-calculus as a model of computation. However, we haven’t discussed how expressive it is compared to the programming languages you use. Alonzo Church, the inventor of λ-calculus, intended to use λ-calculus to provide a foundation for all of mathematics, but it was shown to be inconsistent for this purpose by Kleene and Rosser in 1935. Nonetheless, Church and Turing had proved that their models of computation, the λ-calculus and the Turing Machine, are equivalent in terms of expressive power. While computation in the Turing Machine is rather cumbersome to manipulate and analyze due to the stateful nature of it, programs in λ-calculus are stateless, making them much easier to perform formal analysis on.
But, it’s not obvious in λ-calculus alone whether such analysis would be useful. To prove that λ-calculus is useful, we need to show that it is expressive enough to represent the kinds of computation we might want in real languages.
In this section, we are going to discuss how to imitate real-world programming using λ-calculus, by showing how to implement different (functional) programming constructs. More specifically, we are going to introduce
Before we start our discussion, it would be nice to have some kind of shorthand notation for “the λ-calculus representation of x” where x is some entity you would see in modern programming languages. We are going to use the double square brackets (⟦,⟧) to denote this correspondence. For example, we show the λ-calculus representation for the identity function (id for short) here:
Note that this shorthand notation is not a part or extension of the language of λ-calculus; an expression only becomes a λ-expression after we expend all the shorthand notations into their corresponding λ-expression.
It is sometimes also useful to denote some arbitrary expressions; we will use upper case letters for them.
Before we introduce the primitive for Booleans, we should note that Booleans are often used in conditional expressions. Thus, it would be helpful if we picture how to represent conditional expressions before we designate what true and false are. The simplest conditional expression would look like this:
where B is the Boolean value, T and F are the expressions to take if B is true or false, respectively. As every expression is a function in λ-calculus, we can imagine it’s a function that takes three arguments: the boolean value and the two expressions. Since the boolean values are also functions, we can let those values be a selector which given two values as arguments, and produce one of them.
Alternatively, with the arguments applied:
Then it follows that our ⟦true⟧ and ⟦false⟧ would be, respectively, functions that given two values as arguments, produce the first or the second value.
These should look familiar!
To simplify it a bit further, we can just apply β-reduction three times and obtain a representation which is just a concatenation of B, T and F:
We are going to use this concatenation-based representation to cut down our number of reductions needed.
Modelling conditionals in this way might seem strange, but we can show that it works by tracing the reduction with B being true or false.
⟦if true then T else F⟧ | := ⟦true⟧ ⟦T⟧ ⟦F⟧ | ||
= (λx.λy.x) ⟦T⟧ ⟦F⟧ | |||
→β (λy. ⟦T⟧ ) ⟦F⟧ | |||
→β ⟦T⟧ | |||
⟦if false then T else F⟧ | := ⟦false⟧ ⟦T⟧ ⟦F⟧ | ||
= (λx.λy.y) ⟦T⟧ ⟦F⟧ | |||
→β (λy.y) ⟦F⟧ | |||
→β ⟦F⟧ | |||
Using the construction of boolean values and primitives, we can then build boolean operators as follows:
⟦and p q⟧ | = ⟦if p then q else false⟧ | ||
⟦and⟧ | = λp.λq.pq(λx.λy.y) |
⟦or p q⟧ | = ⟦if p then true else q⟧ | ||
⟦or⟧ | = λp.λq.p(λx.λy.x)q |
⟦not p⟧ | = ⟦if p then false else true⟧ | ||
⟦not⟧ | = λp.p(λx.λy.y)(λx.λy.x) |
We will show that our ⟦not⟧ is correct by showing that ⟦not true⟧ and ⟦not false⟧ behave correctly. Ideally, we want them to evaluate to ⟦false⟧ and ⟦true⟧ , respectively.
Example 19. Reductions of ⟦not true⟧ and ⟦not false⟧ .
⟦not true⟧ | = ⟦not⟧ ⟦true⟧ | ||
= (λb.(b ⟦false⟧ ⟦true⟧ )) ⟦true⟧ | |||
→β ⟦true⟧ ⟦false⟧ ⟦true⟧ | |||
= (λx.λy.x)(λx.λy.y) ⟦true⟧ | |||
=α (λx.λy.x)(λx.λz.z) ⟦true⟧ | |||
→β (λy.(λx.λz.z)) ⟦true⟧ | |||
→β (λx.λz.z) | |||
= ⟦false⟧ |
⟦not false⟧ | = ⟦not⟧ ⟦false⟧ | ||
= (λb.(b ⟦false⟧ ⟦true⟧ )) ⟦false⟧ | |||
→β ⟦false⟧ ⟦false⟧ ⟦true⟧ | |||
= (λx.λy.y)(λx.λy.y) ⟦true⟧ | |||
→β (λy.y) ⟦true⟧ | |||
→β ⟦true⟧ |
Here is a question: what is the difference between the if you see here and the one you see in real-life programming languages? Consider the AOR strategy. Our ⟦if true then T else F⟧ will actually reduce both the true and false branches before returning to the one we want. Similarly, our and and or constructions don’t perform short-circuit evaluation either. While this isn’t what a mainstream programming language would do, the Church-Rosser theorem provides us with the guarantee that as long as there are no infinite reductions on either branch, we will be able to reach the desired expression. Considering the absence of side effects in λ-calculus, the only downside of evaluating both branches would be the decrease in efficiency. We will discuss implementing short-circuit evaluation later.
Aside: The fact that booleans are represented by their behavior should also look familiar if you’ve read through the Smalltalk module. In a way, Smalltalk’s booleans are a lot like λ-calculus booleans!
Video 2.1 ( https://student.cs.uwaterloo.ca/~cs442/W25/videos/2.1/): Lambda calculus booleans
Programs generally require storage facilities in order to compute their results. As you saw in your first-year Racket course (if you’re a Waterloo undergrad), the simplest expandable storage facility is the list. However, we will start by talking about the fundamental data structure, pair, which is just a combination of two data values. Like in Racket, by nesting pairs within each other, we can create lists with arbitrary length, as well as trees. We’ll refer to the two elements of a pair as its head and its tail3 , respectively.
The intuition here is that we’re trying to select either head or tail, given a pair. Hence, a pair is basically a function that has the head and tail stored in it. In order to access the individual elements, the pair must accept some function, which is called a selector, as a parameter. The selectors will be functions that, given two parameters, produce the former (i.e. the head) or the latter (i.e. the tail). Hey, doesn’t that sound familiar? That’s what our true and false are doing!
⟦⟨h,t⟩⟧ | := λs. ⟦if s then h else t⟧ | ||
= λs.sht |
Therefore, the function head and tail, which actually extract the values from a list, should pass true or false into the list given as parameter.
⟦head⟧ | := λl.l ⟦true⟧ = λl.lλx.λy.x | ||
⟦tail⟧ | := λl.l ⟦false⟧ = λl.lλx.λy.y |
We then implement a function cons (from constructor) that takes two arguments (the head and the tail) and returns the pair containing them:
⟦cons⟧ | := λh.λt. ⟦⟨h,t⟩⟧ | ||
= λh.λt.λs.sht |
We will show that the pairs exhibit the correct behaviour, namely:
⟦head (cons A B)⟧ | |||
= ⟦head⟧ ( ⟦cons⟧ ⟦A⟧ ⟦B⟧ ) | |||
= ⟦head⟧ ((λh.λt.λs.sht) ⟦A⟧ ⟦B⟧ ) | |||
→β 2 ⟦head⟧ (λs.s ⟦A⟧ ⟦B⟧ ) | |||
= (λl.lλx.λy.x)(λs.s ⟦A⟧ ⟦B⟧ ) | |||
→β (λs.s ⟦A⟧ ⟦B⟧ )λx.λy.x | |||
→β (λx.λy.x) ⟦A⟧ ⟦B⟧ | |||
→β 2 ⟦A⟧ | |||
⟦tail (cons A B)⟧ | |||
= ⟦tail⟧ ( ⟦cons⟧ ⟦A⟧ ⟦B⟧ ) | |||
= ⟦tail⟧ ((λh.λt.λs.sht) ⟦A⟧ ⟦B⟧ ) | |||
→β 2 ⟦tail⟧ (λs.s ⟦A⟧ ⟦B⟧ ) | |||
= (λl.lλx.λy.y)(λs.s ⟦A⟧ ⟦B⟧ ) | |||
→β (λs.s ⟦A⟧ ⟦B⟧ )λx.λy.y | |||
→β (λx.λy.y) ⟦A⟧ ⟦B⟧ | |||
→β 2 ⟦B⟧ | |||
Finally, we need a way to denote the empty list ( ⟦nil⟧ ). In fact, we’re going to work with what we have first: how can we tell that a list is not empty? Well, the selectors used in a pair can choose to produce other expressions than the head or the tail; in this case, the selector will produce ⟦false⟧ when given the head and the tail as parameters:
This guarantees when given a non-empty list (i.e. a pair), it will always produce false (we are going to show that soon). Then it follows that we can just let ⟦nil⟧ be something that makes ⟦null?⟧ return true. As a reminder, a pair is a function that accepts a selector and passes the two parts of the pair into the selector, in the hope that the selector returns one of the parts. In this case, the empty list is just going to ignore the selector (i.e., not use it) and just produce true.
Again, as verification, we wish our constructions to behave correctly:
⟦null? nil⟧ | |||
= λl.lλh.λt. ⟦false⟧ (λs. ⟦true⟧ ) | |||
→β (λs. ⟦true⟧ )λh.λt. ⟦false⟧ | |||
→β ⟦true⟧ | |||
⟦null? (cons A B)⟧ | |||
= ⟦null?⟧ ( ⟦cons⟧ ⟦A⟧ ⟦B⟧ ) | |||
→β 2 ⟦null?⟧ λs.s ⟦A⟧ ⟦B⟧ | |||
= (λl.lλh.λt. ⟦false⟧ )λs.s ⟦A⟧ ⟦B⟧ | |||
→β (λs.s ⟦A⟧ ⟦B⟧ )λh.λt. ⟦false⟧ | |||
→β (λh.λt. ⟦false⟧ ) ⟦A⟧ ⟦B⟧ | |||
→β 2 ⟦false⟧ | |||
Similar to what you have seen in Racket, a nested construction would allow us to construct lists of arbitrary length. Here is an example:
Example 20. Construction of a list (list a b c):
⟦(list a b c)⟧ | = ⟦(cons a (cons b (cons c nil)))⟧ | ||
= ⟦cons⟧ a( ⟦cons⟧ b( ⟦cons⟧ c ⟦nil⟧ )) | |||
= (λh.λt.λs.sht)a((λh.λt.λs.sht)b((λh.λt.λs.sht)c ⟦nil⟧ )) | |||
→β 6λs.saλs.sbλs.sc ⟦nil⟧ | |||
= λs.saλs.sbλs.scλx.λx.λy.x | |||
Video 2.2 ( https://student.cs.uwaterloo.ca/~cs442/W25/videos/2.2/): Lambda calculus pairs and lists
Exercise 6. Define the function ⟦second⟧ 4 , which gets the second element from the list.
After introducing lists, it’s easy to represent numbers: just represent them using lists! An empty list would be 0, a list of one element would be 1, etc.
Exercise 7. Define the following entities using this idiom: ⟦0⟧ , ⟦1⟧ , ⟦pred⟧ , ⟦succ⟧ , ⟦isZero?⟧ , where ⟦pred n⟧ and ⟦succ n⟧ are functions that produce the number that’s one less or one more than n respectively. Verify that your solution works by showing that
However, there is a cleverer solution introduced by Alonzo Church called Church numerals. In Church’s representation of numbers, ⟦n⟧ is defined as a function that takes a function f and a value x5 , and produces the result of f applied n times to its argument x. More specifically:
⟦0⟧ | = λf.λx.x | ||
⟦1⟧ | = λf.λx.fx | ||
⟦2⟧ | = λf.λx.f(fx) | ||
⟦3⟧ | = λf.λx.f(f(fx)) | ||
We will consider how to perform basic arithmetic with the Church numerals (except division, which is rather complicated). First, let’s consider addition. Given two Church numerals m and n, we want to have
A way to get to this is to apply f n times on x, and then apply f m times to the result. Our addition function will accept two arguments, m and n:
Then, it must produce a number, which is a function that takes some f and some x:
Applying f n times on x would just be nfx, since n itself is a Church numeral that takes f and x:
At last, apply the result of nfx to f by m times:
To demonstrate that it works, we will show 2 + 3 = 5. Starting at this point, we will highlight some parts in a few reduction steps to clearly illustrate what reduction to take.
⟦2 + 3⟧ | = ⟦+⟧ ⟦2⟧ ⟦3⟧ | ||
= (λm.λn.λf.λx.mf(nfx)) ⟦2⟧ ⟦3⟧ | |||
→β 2λf.λx. ⟦2⟧ f( ⟦3⟧ fx) | |||
= λf.λx.function arg 1 arg 2 | |||
→β 2λf.λx.f(f( ⟦3⟧ fx)) | |||
= λf.λx.f(f(function arg 1 arg 2)) | |||
→β 2λf.λx.f(f(f(f(fx)))) | |||
= ⟦5⟧ |
A special case of addition is the ⟦succ⟧ function, which produces the number that’s one more than a given number (the successor):
Video 2.3 ( https://student.cs.uwaterloo.ca/~cs442/W25/videos/2.3/): Lambda calculus numbers and addition
Addition was pretty simple, but subtraction is harder. While it’s easy to take away a number in real-world mathematics, it’s impossible to just “take away” a function application from an expression. There is no way for us to “undo” a function application in λ-calculus. An alternative arises from the observation that succ is easy. We can instead use a pair ⟨a,b⟩ to track a number a and its successor. For each given pair ⟨a,b⟩, we can create a new pair ⟨a + 1,a⟩. Start with ⟨0,0⟩ and applying this procedure n times, we would obtain ⟨n,n − 1⟩. Taking the tail of the resulting pair gives us the predecessor of n. Then, we “simply” need to take the predecessor the correct number of times.
Let’s get started. A function ⟦pred⟧ should take one argument, the number n, and the value produced should be a number as well. That is, the value produced should be in the form of λf.λx.:
We need to get the tail of the complicated computation stated above. The result from that computation gives us a number which takes two parameters, so we need to also pass the parameter of our newly made number into there:
We can η-reduce now and get rid of f and x.
The computation itself would repeat n times. But, repeating n times is just the number n in Church numerals:
Then the computation should take a pair, and the starting pair is ⟨0,0⟩:
Then fill in the computation itself:
To keep our explanation brief, we will just stop here.
Exercise 8. Verify that our definition of ⟦pred⟧ is correct by working through an example of your own.
Subtraction is defined in the following way: given m and n, produce m applied to pred n times.
Video 2.4 ( https://student.cs.uwaterloo.ca/~cs442/W25/videos/2.4/): Lambda calculus predecessor and subtraction
Multiplication of m and n is defined as the n-fold repetition of f, m times:
Exponentiation (mn) is just the n-fold repetition of m itself:
Suppose one wants to find length of a list in λ-calculus. Being an expert in recursion, they write this down:
⟦len⟧ | := λl. ⟦if (null? l) then 0 else (succ (len (tail l)))⟧ | ||
= λl.( ⟦null?⟧ l) ⟦0⟧ ( ⟦succ⟧ ( ⟦len⟧ ( ⟦tail⟧ l))) |
It looks great, except ⟦len⟧ is defined in terms of itself. There is no way for us to replace ⟦len⟧ if it’s a self definition: remember, no functions in λ-calculus have a name, so we can’t simply have ⟦len⟧ refer to the name ⟦len⟧ , as that’s actually just a shorthand for the expansion of ⟦len⟧ . Hence, this definition is invalid. So what should we do now? We can try to solve the equation. We can “factor out” the ⟦len⟧ on the right hand side by β-expansion:
⟦len⟧ | = λl.( ⟦null?⟧ l) ⟦0⟧ ( ⟦succ⟧ ( ⟦len⟧ ( ⟦tail⟧ l))) | ||
←β(λf.λl.( ⟦null?⟧ l) ⟦0⟧ ( ⟦succ⟧ (f( ⟦tail⟧ l)))) ⟦len⟧ | |||
Now, we have the equation ⟦len⟧ = (λf.λl.( ⟦null?⟧ l) ⟦0⟧ ( ⟦succ⟧ (f( ⟦tail⟧ l)))) ⟦len⟧ . If we can solve this for ⟦len⟧ , then we’ll have a definition of ⟦len⟧ that does not include ⟦len⟧ . Let
So our equation just becomes ⟦len⟧ = F ⟦len⟧ . To solve the equation, we will use the theory of fixed points.
In mathematics, a fixed point of a function is a value that, when passed to the function, yields itself. In other words, the fixed points of a function f are the values of x such that f(x) = x. For example, for f(x) = x2 − 6, x = 3 is a fixed point.
Consider the following expression:
X | = (λx.f(xx))(λx.f(xx)) | ||
→β f((λx.f(xx))(λx.f(xx))) | |||
= fX |
Since X = fX, X is a fixed point of f, whatever f is. In this case, f is just a place holder for the function whose fixed point we wish to find. By setting f to the desired expression, we can find a fixed point for any λ-expression! We can construct a general fixed-point combinator6 by modifying X to accept an argument for specifying f. What we get is the Curry’s Paradoxical Combinator (or simply the Y combinator):
The critical property of the Y combinator is the following:
Y g | = (λf.(λx.f(xx))(λx.f(xx)))g | ||
→β (λx.g(xx))(λx.g(xx)) | |||
→β g((λx.g(xx))(λx.g(xx))) | |||
←βg(Y g) |
So for any g, we have Y g = βg(Y g). Note that this sequence can be repeated to give g(g(Y g)),g(g(g(Y g))) and so on. This sequence of repeated applications of g does appear to capture the general character of recursive computation. Y is in fact one of the broader class of combinators, defined below:
Let’s go back to the list length example. Remember that we had arrived at ⟦len⟧ = F ⟦len⟧ , by factoring out ⟦len⟧ . We observe that solving our defining equation for len is equivalent to finding a fixed point for the function F. According to our observation, the fixed point is just Y F. Hence the following definition:
Let us step through an example. This example is quite long, so bear with us:
⟦len⟧ ( ⟦cons⟧ a( ⟦cons⟧ b ⟦nil⟧ )) | |||
= β ⟦len⟧ (λs.saλs.sb ⟦nil⟧ ) | |||
= Y F(λs.saλs.sb ⟦nil⟧ ) | |||
= (λf.(λx.f(xx))(λx.f(xx)))F(λs.saλs.sb ⟦nil⟧ ) | (1) | ||
Let’s pause this example for a bit here and think about a very important question: should we use AOR or NOR as our reduction strategy in this case? Well, the AOR redex here is (λx.f(xx))(λx.f(xx)), which does not have a normal form; continuing to reduce it results in something that looks like f(f(f())). AOR always chooses the leftmost innermost redex, which means we’ll fall into this infinite reduction trap while reducing in the step labeled (1). So, we have to use NOR here.
Let’s go back to our reduction now:
So far, we have completed one recursive “call”. Under NOR, we would substitute the whole expression after ⟦succ⟧ into the function succ, and this would lead us to the correct result. However, in order to illustrate the next recursive reduction, we will choose ( ⟦tail⟧ (λs.saλs.sb ⟦nil⟧ )) and ((λx.F(xx))(λx.F(xx))) as our next redices to reduce instead:
We have now seen that the Y combinator indeed works under NOR. However, we sometimes do want AOR in λ-calculus, since it more closely resembles how many real-world programming languages are evaluated, so we need to consider how to make recursion work with something like AOR. Here are a few observations we can make from the above example, and how we may need to change AOR to compensate:
Even with AOE, the step labeled (2) will still have an infinite reduction, since (λx.F(xx))(λx.F(xx)) part in (2) will be reduced to F(F(F((F((λx.F(xx))(λx.F(xx)))))) under AOE. Therefore, we need modify our fixed-point combinator to wrap the “repetition” part, (xx), in an abstraction7 to prevent AOE from choosing it as next redex to reduce:
We have mentioned that our ⟦if⟧ does not perform short-circuit evaluation, and it shouldn’t be a problem as long as both branches don’t have infinitely reducing expression. But, whoops, we do have an infinitely reducing expression in the if expression within the step labeled (3), and that branch is the one we don’t intend to take! As a result, we do need to have a short-circuited version of ⟦if⟧ . As you may have guessed, the same trick works again. Let’s wrap both branches in their respective abstractions:
The short-circuiting works, as ⟦B⟧ (λx. ⟦T⟧ )(λx. ⟦F⟧ ) reduces to either (λx. ⟦T⟧ ) or (λx. ⟦F⟧ ) before x is passed into either of those two functions under AOE.
With these three modifications, the reduction will work under a modified form of AOR, AOE. Note that the same modification could be made to NOR to yield NOE, which we will come back to in Module 5.
As our search for a proper definition of substitution has taught us, the names of variables can often get in the way of the true meaning of an expression. Also, as we have seen many times, a name can have different meanings. In the expression λy.a(λy.b(λy.cy)y)y, each y does not have the same meaning, because there are three different bindings of y. Of course, we can α-convert all of the repeated occurrences of y to prevent accidental capture of variables. Here, we present an alternative solution to this problem presented by deBruijn in 1972 [5], as part as his implementation of Automath [4], a tool for automatically verifying mathematical proofs.
In the deBruijn notation, variables are replaced with integers. Thus the expression λx.λy.x in the deBruijn notation is λ.λ.2. The integers indicate the number of function bodies that must be escaped to locate the binding for the variable. Note that a single binding can therefore be represented by several different integers, depending on how many further function bodies the reference is embedded in.
Example 21. The expression λx.(λy.x)x has the deBruijn equivalent λ.((λ.2)1), in which x is replaced by both 2 and 1.
Free variables can be handled in a number of ways, of which the simplest is to leave them unconverted. Importantly, in the deBruijn notation, unlike in our conventional notation, all α-equivalent expressions have exactly one representation.
The definition of β-reduction for the deBruijn notation is
We also need a new definition for substitution:
n[N∕m] | = | ||
(M1M2)[N∕m] | = M1[N∕m]M2[N∕m] | ||
(λ.M)[N∕m] | = λ.M[N∕m + 1] |
where
renamem,i(j) | = | ||
renamem,i(N1N2) | = renamem,i(N1)renamem,i(N2) | ||
renamem,i(λ.N) | = λ.renamem,i+1(N) |
The deBruijn notation is useful for avoiding the cost of α-conversion. Substitution and β-reduction in deBruijn notation are essentially operations on integers, which is often faster than operations on names. Furthermore, deBruijn notation exposes the concept of a “stack frame”: the integers can be regarded as indices to a stack of values mapping such indices to the expression those bound variables refer to after a reduction.
However, deBruijn notation is considerably harder to read and understand, and often a program evaluating λ-calculus in deBruijn notation needs to provide facilities to convert one notation to another. Such conversions can be costly, offsetting the performance benefit deBruijn notation itself provides.
Many programming languages are based on λ-calculus to some degree, so there is a long history of implementing λ-calculus concepts. Generally speaking, a λ-calculus expression itself can be stored as an abstract syntax tree, in which each node is labeled with its type (variables, abstractions, applications). For instance, the λ-calculus expression λb.λt.λf.btf (i.e., ⟦if⟧ ) could be represented as the following tree:
To evaluate it, you simply walk the tree, following the evaluation strategy of your choice (i.e., choose whether to enter an abstraction first), and replacing applications with their substitutions. To do this, you need to carry a list of bindings, so you know what to substitute. This can get complicated, since a subtree may reuse a variable name, and further complicated because you need to be able to generate a fresh name in order to avoid conflicts.
With deBruijn notation, you instead carry a stack of bindings, and a variable is simply an index into that stack. This makes evaluation simpler, but requires an extra step, and makes debugging more difficult. Modified to use deBruijn notation, our tree would instead look like this:
Of course, with deBruijn notation, it’s also easier to verify that a λ-calculus interpreter is correct, since you don’t need a final α-equivalence check on the final output.
In the next module, we will describe the behavior of λ-calculus expressions with formal rigor, by introducing formal semantics.
[1] Harold Abelson, R. Kent Dybvig, Christopher T. Haynes, Guillermo Juan Rozas, NI Adams, Daniel P. Friedman, E Kohlbecker, GL Steele, David H Bartley, Robert Halstead, et al. Revised 5 report on the algorithmic language scheme. Higher-order and symbolic computation, 11(1):7–105, 1998.
[2] Alonzo Church and J Barkley Rosser. Some properties of conversion. Transactions of the American Mathematical Society, 39(3):472–482, 1936.
[3] Haskell B. Curry and R. Feys. Combinatory Logic. Number Vol. I in Combinatory Logic. North-Holland Publishing Company, 1968.
[4] Nicolaas Govert De Bruijn. The mathematical language automath, its usage, and some of its extensions. In Symposium on automatic demonstration, pages 29–61. Springer, 1970.
[5] Nicolaas Govert De Bruijn. Lambda calculus notation with nameless dummies, a tool for automatic formula manipulation, with application to the church-rosser theorem. In Indagationes Mathematicae (Proceedings), volume 75, pages 381–392. North-Holland, 1972.
Copyright © 2020–2025 Yangtian Zi, Gregor Richards, Brad Lushman, and Anthony Cox.
This module is intended for CS442 at University of Waterloo.
Any other use requires permission from the above named copyright holder(s).
© Gregor Richards. | Design by TEMPLATED.