title | filename | chapternum
---|---|---
Loops and infinity | lec_06_loops | 6
- Learn the model of Turing machines, which can compute functions of arbitrary input lengths.
- See a programming-language description of Turing machines, using NAND-TM programs, which add loops and arrays to NAND-CIRC.
- See some basic syntactic sugar and equivalence of variants of Turing machines and NAND-TM programs.
"An algorithm is a finite answer to an infinite number of questions.", Attributed to Stephen Kleene.
"The bounds of arithmetic were however outstepped the moment the idea of applying the [punched] cards had occurred; and the Analytical Engine does not occupy common ground with mere "calculating machines."" ... In enabling mechanism to combine together general symbols, in successions of unlimited variety and extent, a uniting link is established between the operations of matter and the abstract mental processes of the most abstract branch of mathematical science. ", Ada Augusta, countess of Lovelace, 1843
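The model of Boolean circuits (or equivalently, the NAND-CIRC programming language) has one very significant drawback: a Boolean circuit can only compute a *finite* function, that is, a function whose inputs all have one fixed length.
Let us consider the case of the simple parity or XOR function, which maps a string $x\in \{0,1\}^*$ of any length to $1$ if the number of ones in $x$ is odd and to $0$ otherwise.
Using a loop, a single short piece of code computes this function for inputs of every length:

    # s is the "running parity", initialized to 0
    s = 0
    i = 0
    while i < len(X):
        u = NAND(s, X[i])
        v = NAND(s, u)
        w = NAND(X[i], u)
        s = NAND(v, w)   # s is now the XOR of s and X[i]
        i += 1
    Y[0] = s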
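The loop above can be tried out directly. The following is a minimal Python sketch, assuming the input is given as a list of bits; the helper names `NAND` and `xor_with_loop` are chosen here for illustration only.

```python
def NAND(a: int, b: int) -> int:
    """NAND of two bits."""
    return 1 - (a & b)


def xor_with_loop(X):
    """Parity of the bits in X, using only NAND inside a loop."""
    s = 0  # running parity
    for x_i in X:
        u = NAND(s, x_i)
        v = NAND(s, u)
        w = NAND(x_i, u)
        s = NAND(v, w)  # s becomes XOR(s, x_i)
    return s


if __name__ == "__main__":
    print(xor_with_loop([1, 0, 1, 1]))
```

The same four NAND lines are reused for every input bit, so the code does not grow with the input length.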
Generally an algorithm is, as we quote above, "a finite answer to an infinite number of questions". To express an algorithm we need to write down a finite set of instructions that will enable us to compute on arbitrarily long inputs. To describe and execute an algorithm we need the following components (see algcomponentfig{.ref}):
- The finite set of instructions to be performed.
- Some "local variables" or finite state used in the execution.
- A potentially unbounded working memory to store the input as well as any other values we may require later.
- While the memory is unbounded, at every single step we can only read and write to a finite part of it, and we need a way to address which parts we want to read from and write to.
- If we only have a finite set of instructions but our input can be arbitrarily long, we will need to repeat instructions (i.e., loop back). We need a mechanism to decide when we will loop and when we will halt.
In this chapter we will show how we can extend the model of Boolean circuits / straight-line programs so that it can capture these kinds of constructs. We will see two ways to do so:
- Turing machines, invented by Alan Turing in 1936, are a hypothetical abstract device that yields a finite description of an algorithm that can handle arbitrarily long inputs.
- The NAND-TM programming language extends NAND-CIRC with the notion of loops and arrays to obtain finite programs that can compute a function with arbitrarily long inputs.
It turns out that these two models are equivalent, and in fact they are equivalent to a great many other computational models including programming languages you may be familiar with such as C, Lisp, Python, JavaScript, etc. This notion, known as Turing equivalence or Turing completeness, will be discussed in chapequivalentmodels{.ref}. See chaploopoverviewfig{.ref} for an overview of the models presented in this chapter and chapequivalentmodels{.ref}.
::: {.remark title="Finite vs infinite computation" #infinite}
Previously in this book we studied the computation of finite functions
In this chapter we consider functions that take inputs of unbounded size, such as the function
To contrast with the finite case, we will sometimes call a function
Some texts present the task of computing a function
"Computing is normally done by writing certain symbols on paper. We may suppose that this paper is divided into squares like a child's arithmetic book.. The behavior of the [human] computer at any moment is determined by the symbols which he is observing, and of his 'state of mind' at that moment... We may suppose that in a simple operation not more than one symbol is altered.",
"We compare a man in the process of computing ... to a machine which is only capable of a finite number of configurations... The machine is supplied with a 'tape' (the analogue of paper) ... divided into sections (called 'squares') each capable of bearing a 'symbol' ", Alan Turing, 1936
"What is the difference between a Turing machine and the modern computer? It's the same as that between Hillary's ascent of Everest and the establishment of a Hilton hotel on its peak." , Alan Perlis, 1982.
The "granddaddy" of all models of computation is the Turing Machine. Turing machines were defined in 1936 by Alan Turing in an attempt to formally capture all the functions that can be computed by human "computers" (see humancomputersfig{.ref}) that follow a well-defined set of rules, such as the standard algorithms for addition or multiplication.
Turing thought of such a person as having access to as much "scratch paper" as they need. For simplicity we can think of this scratch paper as a one-dimensional piece of graph paper (or tape, as it is commonly referred to), which is divided into "cells", where each cell can hold a single symbol (e.g., one digit or letter, and more generally some element of a finite alphabet). At any point in time, the person can read from and write to a single cell of the paper, and based on its contents can update their finite mental state and/or move to the cell immediately to the left or right of the current one.
{#steamturingmachine .margin }
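Turing modeled such a computation by a "machine" that maintains one of $k$ states and operates on a tape whose cells hold symbols from a finite alphabet $\Sigma$, as follows: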
- Initially the machine is at state $0$ (known as the "starting state") and the tape is initialized to $\triangleright,x_0,\ldots,x_{n-1},\varnothing,\varnothing,\ldots$. We use the symbol $\triangleright$ to denote the beginning of the tape, and the symbol $\varnothing$ to denote an empty cell. We will always assume that the alphabet $\Sigma$ is a (potentially strict) superset of $\{ \triangleright, \varnothing , 0 , 1 \}$.
- The location $i$ to which the machine points is set to $0$.
- At each step, the machine reads the symbol $\sigma = T[i]$ that is in the $i^{th}$ location of the tape, and based on this symbol and its state $s$ decides on:
    - What symbol $\sigma'$ to write on the tape.
    - Whether to move Left (i.e., $i \leftarrow i-1$), Right (i.e., $i \leftarrow i+1$), Stay in place, or Halt the computation.
    - What is going to be the new state $s \in [k]$.
- The set of rules the Turing machine follows is known as its transition function.
- When the machine halts, its output is the binary string obtained by reading the tape from the beginning until the head position, dropping all symbols such as $\triangleright$, $\varnothing$, etc. that are not either $0$ or $1$.
{#turingmachinecomponentsfig .margin }
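Let $PAL$ (for palindromes) be the function that on input $x\in \{0,1\}^*$ outputs $1$ if $x$ is an (even length) palindrome and outputs $0$ otherwise.
We now show a Turing Machine $M$ that computes $PAL$.
In our case, $M$ will use the alphabet $\{0,1,\triangleright,\varnothing,\times\}$ and will have $k=13$ states, which we label as follows: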
---
caption: ''
alignment: ''
table-width: ''
id: ''
---
State, Label
0, `START`
1,`RIGHT_0`
2,`RIGHT_1`
3,`LOOK_FOR_0`
4,`LOOK_FOR_1`
5,`RETURN`
6,`REJECT`
7,`ACCEPT`
8,`OUTPUT_0`
9,`OUTPUT_1`
10,`0_AND_BLANK`
11,`1_AND_BLANK`
12,`BLANK_AND_STOP`
We describe the operation of our Turing Machine $M$ in words:

- $M$ starts in state `START` and will go right, looking for the first symbol that is $0$ or $1$. If we find $\varnothing$ before we hit such a symbol then we will move to the `OUTPUT_1` state that we describe below.
- Once $M$ finds such a symbol $b \in \{0,1\}$, $M$ deletes $b$ from the tape by writing the $\times$ symbol, enters either the `RIGHT_0` or `RIGHT_1` mode according to the value of $b$, and starts moving rightwards until it hits the first $\varnothing$ or $\times$ symbol.
- Once we find this symbol we go into the state `LOOK_FOR_0` or `LOOK_FOR_1` depending on whether we were in the state `RIGHT_0` or `RIGHT_1`, and make one left move.
- In the state `LOOK_FOR_`$b$, we check whether the value on the tape is $b$. If it is, then we delete it by changing its value to $\times$, and move to the state `RETURN`. Otherwise, we change to the `OUTPUT_0` state.
- The `RETURN` state means we go back to the beginning. Specifically, we move leftward until we hit the first symbol that is not $0$ or $1$, in which case we change our state to `START`.
- The `OUTPUT_`$b$ states mean that we are going to output the value $b$. In both these states we go left until we hit $\triangleright$. Once we do so, we make a right step, and change to the `0_AND_BLANK` or `1_AND_BLANK` state according to the value of $b$. In the latter states, we write the corresponding value, and then move right and change to the `BLANK_AND_STOP` state, in which we write $\varnothing$ to the tape and halt.
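The above description can be turned into a table describing, for each combination of state and symbol, what the Turing machine will do when it is in that state and reads that symbol. This table is known as the transition function of the Turing machine.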
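To make the description of the palindrome machine concrete, here is a minimal Python sketch of it. It is one possible reading of the informal description, not the book's own table: the character `x` stands for $\times$, states are represented by their labels rather than by numbers, and the choices of "stay" versus "move" in the corner cases are assumptions made for this illustration. The unused labels `REJECT` and `ACCEPT` are omitted.

```python
BLANK, BEGIN, CROSS = "∅", "▷", "x"
BITS = ("0", "1")


def delta(state, sym):
    """Map (state, symbol) to (new state, symbol to write, move in L/R/S/H)."""
    if state == "START":
        if sym in BITS:
            return "RIGHT_" + sym, CROSS, "R"     # delete b, remember it
        if sym == BLANK:
            return "OUTPUT_1", sym, "S"           # nothing left to match
        return "START", sym, "R"                  # skip ▷ and crossed cells
    if state in ("RIGHT_0", "RIGHT_1"):
        if sym in BITS:
            return state, sym, "R"
        return "LOOK_FOR_" + state[-1], sym, "L"  # hit ∅ or x: step back
    if state in ("LOOK_FOR_0", "LOOK_FOR_1"):
        if sym == state[-1]:
            return "RETURN", CROSS, "L"           # matching bit: delete it
        return "OUTPUT_0", sym, "S"
    if state == "RETURN":
        if sym in BITS:
            return "RETURN", sym, "L"
        return "START", sym, "S"
    if state in ("OUTPUT_0", "OUTPUT_1"):
        if sym == BEGIN:
            return state[-1] + "_AND_BLANK", sym, "R"
        return state, sym, "L"
    if state in ("0_AND_BLANK", "1_AND_BLANK"):
        return "BLANK_AND_STOP", state[0], "R"    # write the output bit
    if state == "BLANK_AND_STOP":
        return state, BLANK, "H"
    raise ValueError(f"unknown state {state!r}")


def run_pal(x):
    """Run the machine on the bit string x and return its output string."""
    tape = [BEGIN] + list(x) + [BLANK]
    state, head = "START", 0
    while True:
        state, tape[head], move = delta(state, tape[head])
        if move == "H":
            break
        if move == "R":
            head += 1
        elif move == "L":
            head = max(head - 1, 0)
        if head == len(tape):
            tape.append(BLANK)                    # the tape is unbounded
    return "".join(c for c in tape[: head + 1] if c in BITS)


if __name__ == "__main__":
    for x in ["", "0110", "01", "010"]:
        print(repr(x), "->", run_pal(x))
```

Under this reading the machine outputs `1` exactly on palindromes of even length: the middle symbol of an odd-length input is crossed out in `START` and then cannot be matched in the `LOOK_FOR_` state.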
The formal definition of Turing machines is as follows:
::: {.definition title="Turing Machine" #TM-def}
A (one tape) Turing machine with $k$ states and alphabet $\Sigma \supseteq \{0,1, \triangleright, \varnothing \}$ is represented by a transition function $\delta_M:[k]\times \Sigma \rightarrow [k] \times \Sigma \times \{\mathsf{L},\mathsf{R}, \mathsf{S}, \mathsf{H} \}$.

For every $x\in \{0,1\}^*$, the output of $M$ on input $x$, denoted by $M(x)$, is the result of the following process:

- We initialize $T$ to be the sequence $\triangleright,x_0,x_1,\ldots,x_{n-1},\varnothing,\varnothing,\ldots$, where $n=|x|$. (That is, $T[0]=\triangleright$, $T[i+1]=x_{i}$ for $i\in [n]$, and $T[i]=\varnothing$ for $i>n$.)
- We also initialize $i=0$ and $s=0$.
- We then repeat the following process:
    1. Let $(s',\sigma',D) = \delta_M(s,T[i])$.
    2. Set $s \leftarrow s'$, $T[i] \leftarrow \sigma'$.
    3. If $D=\mathsf{R}$ then set $i \leftarrow i+1$, if $D=\mathsf{L}$ then set $i \leftarrow \max\{i-1,0\}$. (If $D = \mathsf{S}$ then we keep $i$ the same.)
    4. If $D=\mathsf{H}$ then halt.
- If the process above halts, then $M$'s output, denoted by $M(x)$, is the string $y\in \{0,1\}^*$ obtained by concatenating all the symbols in $\{0,1\}$ in positions $T[0],\ldots, T[i]$ where $i$ is the final head position.
- If the Turing machine does not halt then we denote $M(x)=\bot$.
:::
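The process in the definition translates almost line by line into code. The following Python sketch assumes the transition function is given as a dictionary keyed by (state, symbol) pairs; the names `run_tm` and `PARITY_DELTA` and the step cap `max_steps` are hypothetical. The cap is only a practical stand-in for $\bot$, since a simulator cannot in general tell whether a machine will eventually halt. The example machine, which computes parity with two states, is likewise made up for this illustration.

```python
BLANK, BEGIN = "∅", "▷"


def run_tm(delta, x, max_steps=10_000):
    """Simulate a one-tape Turing machine on the bit string x.

    delta maps (state, symbol) to (new state, new symbol, D) with D one of
    "L", "R", "S", "H". Returns the output string, or None if the machine
    did not halt within max_steps steps.
    """
    tape = [BEGIN] + list(x) + [BLANK]
    i, s = 0, 0
    for _ in range(max_steps):
        s, tape[i], d = delta[(s, tape[i])]
        if d == "R":
            i += 1
        elif d == "L":
            i = max(i - 1, 0)
        elif d == "H":
            return "".join(c for c in tape[: i + 1] if c in ("0", "1"))
        if i == len(tape):
            tape.append(BLANK)  # cells default to the blank symbol
    return None


# Hypothetical example: state s in {0, 1} holds the parity seen so far.
# Input bits are overwritten with "x" so that only the answer is output.
PARITY_DELTA = {}
for s in (0, 1):
    PARITY_DELTA[(s, BEGIN)] = (s, BEGIN, "R")
    PARITY_DELTA[(s, "x")] = (s, "x", "R")
    PARITY_DELTA[(s, "0")] = (s, "x", "R")
    PARITY_DELTA[(s, "1")] = (1 - s, "x", "R")
    PARITY_DELTA[(s, BLANK)] = (s, str(s), "H")

if __name__ == "__main__":
    print(run_tm(PARITY_DELTA, "1011"))
```

Note that the symbol is written before the head moves, matching the order of the steps in the definition.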
::: { .pause } You should make sure you see why this formal definition corresponds to our informal description of a Turing Machine. To get more intuition on Turing Machines, you can explore some of the online available simulators such as Martin Ugarte's, Anthony Morphett's, or Paul Rendell's. :::
One should not confuse the transition function
In our formal definition, we identified the machine
We now turn to making one of the most important definitions in this book, that of computable functions.
::: {.definition title="Computable functions" #computablefuncdef}
Let $F:\{0,1\}^* \rightarrow \{0,1\}^*$ be a (total) function and let $M$ be a Turing machine. We say that $M$ computes $F$ if for every $x\in \{0,1\}^*$, $M(x)=F(x)$.

We say that a function $F$ is computable if there exists a Turing machine $M$ that computes it.
:::
Defining a function "computable" if and only if it can be computed by a Turing machine might seem "reckless" but, as we'll see in chapequivalentmodels{.ref}, it turns out that being computable in the sense of computablefuncdef{.ref} is equivalent to being computable in essentially any reasonable model of computation. This is known as the Church-Turing thesis. (Unlike the extended Church-Turing thesis, which we discussed in PECTTsec{.ref}, the Church-Turing thesis itself is widely believed and there are no candidate devices that attack it.)
::: {.bigidea #definecompidea } We can precisely define what it means for a function to be computable by any possible algorithm. :::
This is a good point to remind the reader that functions are not the same as programs:
A Turing machine (or program)
We will often pay special attention to functions
We define
::: {.remark title="Functions vs. languages" #decidablelanguagesrem}
Many texts use the terminology of "languages" rather than functions to refer to computational tasks.
The name "language" has its roots in formal language theory as pursued by linguists such as Noam Chomsky.
A formal language is a subset $L \subseteq \{0,1\}^*$ (or more generally $L \subseteq \Sigma^*$ for some finite alphabet $\Sigma$).
In this book we stick to the terminology of functions rather than languages, but all definitions and results can be easily translated back and forth by using the equivalence between the function
One crucial difference between circuits/straight-line programs and Turing machines is the following.
Looking at a NAND-CIRC program X
and Y
variables).
Furthermore, we are guaranteed that if we invoke
In contrast, given any Turing machine
If a machine
For example, consider the partial function
::: {.definition title="Computable (partial or total) functions" #computablepartialfuncdef}
Let
Note that if
::: {.remark title="Bot symbol" #botsymbol}
We often use
If a partial function C
programming language, on inputs
The name "Turing machine", with its "tape" and "head" evokes a physical object, while in contrast we think of a program as a piece of text.
But we can think of a Turing machine as a program as well.
For example, consider the Turing Machine $M$ for palindromes that we described above. We can write it as a program of the following form:

    # Gets an array Tape initialized to
    # [">", x_0 , x_1 , .... , x_(n-1), "∅", "∅", ...]
    # At the end of the execution, Tape[1] is equal to 1
    # if x is a palindrome and is equal to 0 otherwise
    def PAL(Tape):
        head = 0
        state = 0  # START
        while state != 12:  # 12 is BLANK_AND_STOP
            if state == 0 and Tape[head] == '0':
                Tape[head] = 'x'  # delete the symbol
                state = 1         # RIGHT_0
                head += 1         # move right
            elif state == 0 and Tape[head] == '1':
                Tape[head] = 'x'  # delete the symbol
                state = 2         # RIGHT_1
                head += 1         # move right
            ... # more if statements here
The particular details of this program are not important. What matters is that we can describe Turing machines as programs.
Moreover, note that when translating a Turing machine into a program, the tape becomes a list or array that can hold values from the finite set
More generally we can think of every Turing Machine
    # Gets an array Tape initialized to
    # [">", x_0 , x_1 , .... , x_(n-1), "∅", "∅", ...]
    def M(Tape):
        state = 0
        i = 0  # holds head location
        while True:
            # Move head, modify state, write to tape
            # based on current state and cell at head.
            # Below are just examples of how the program looks
            # for a particular transition function.
            if Tape[i] == "0" and state == 7:      # δ_M(7,"0")=(19,"1","R")
                Tape[i] = "1"  # write first, then move the head
                i += 1
                state = 19
            elif Tape[i] == ">" and state == 13:   # δ_M(13,">")=(15,"0","S")
                Tape[i] = "0"
                state = 15
            # ... further elif branches, one per (state, symbol) pair ...
            elif Tape[i] == ">" and state == 29:   # δ_M(29,">")=(.,.,"H")
                break  # Halt
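If we wanted to use only Boolean (i.e., $0$/$1$-valued) variables, then we can encode the `state` variable using $\ceil{\log k}$ bits. Similarly, we can represent each element of the alphabet $\Sigma$ using $\ell=\ceil{\log |\Sigma|}$ bits and hence replace the $\Sigma$-valued array `Tape[]` with $\ell$ Boolean-valued arrays `Tape0[]`, $\ldots$, `Tape`$(\ell-1)$`[]`.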
We now introduce the NAND-TM programming language, which aims to capture the power of a Turing machine in a programming language formalism. Just like the difference between Boolean circuits and Turing Machines, the main difference between NAND-TM and NAND-CIRC is that NAND-TM models a single uniform algorithm that can compute a function that takes inputs of arbitrary lengths. To do so, we extend the NAND-CIRC programming language with two constructs:
- Loops: NAND-CIRC is a straight-line programming language: a NAND-CIRC program of $s$ lines takes exactly $s$ steps of computation and hence in particular cannot even touch more than $3s$ variables. Loops allow us to capture in a short program the instructions for a computation that can take an arbitrary amount of time.
- Arrays: A NAND-CIRC program of $s$ lines touches at most $3s$ variables. While we can use variables with names such as `Foo_17` or `Bar[22]`, they are not true arrays, since the number in the identifier is a constant that is "hardwired" into the program.
Thus a good way to remember NAND-TM is using the following informal equation:

$$\text{NAND-TM} = \text{NAND-CIRC} + \text{loops} + \text{arrays}$$ {#eqnandloops}
As we will see, adding loops and arrays to NAND-CIRC is enough to capture the full power of all programming languages! Hence we could replace "NAND-TM" with any of Python, C, Javascript, OCaml, etc. in the lefthand side of eqnandloops{.eqref}. But we're getting ahead of ourselves: this issue will be discussed in chapequivalentmodels{.ref}.
Concretely, the NAND-TM programming language adds the following features on top of NAND-CIRC (see nandtmfig{.ref}):
- We add a special integer valued variable `i`. All other variables in NAND-TM are Boolean valued (as in NAND-CIRC).
- Apart from `i`, NAND-TM has two kinds of variables: scalars and arrays. Scalar variables hold one bit (just as in NAND-CIRC). Array variables hold an unbounded number of bits. At any point in the computation we can access the array variables at the location indexed by `i` using `Foo[i]`. We cannot access the arrays at locations other than the one pointed to by `i`.
- We use the convention that arrays always start with a capital letter, and scalar variables (which are never indexed with `i`) start with lowercase letters. Hence `Foo` is an array and `bar` is a scalar variable.
- The input and output `X` and `Y` are now considered arrays with values of zeroes and ones. (There are also two other special arrays `X_nonblank` and `Y_nonblank`, see below.)
- We add a special `MODANDJUMP` instruction that takes two Boolean variables $a,b$ as input and does the following:
    - If $a=1$ and $b=1$ then `MODANDJUMP(`$a,b$`)` increments `i` by one and jumps to the first line of the program.
    - If $a=0$ and $b=1$ then `MODANDJUMP(`$a,b$`)` decrements `i` by one and jumps to the first line of the program. (If `i` is already equal to $0$ then it stays at $0$.)
    - If $a=1$ and $b=0$ then `MODANDJUMP(`$a,b$`)` jumps to the first line of the program without modifying `i`.
    - If $a=b=0$ then `MODANDJUMP(`$a,b$`)` halts execution of the program.
- The `MODANDJUMP` instruction always appears in the last line of a NAND-TM program and nowhere else.
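The four cases of `MODANDJUMP` listed above can be summarized in a few lines. This is only an illustrative Python sketch: the function name `mod_and_jump` and its return convention (new index plus a halt flag) are choices made here, not part of the NAND-TM language.

```python
def mod_and_jump(a: int, b: int, i: int):
    """Return (new value of i, halted) for MODANDJUMP(a, b).

    When halted is False, execution resumes at the first line of the program.
    """
    if a == 0 and b == 0:
        return i, True           # halt
    if b == 1:
        if a == 1:
            i = i + 1            # a=1, b=1: increment i
        else:
            i = max(i - 1, 0)    # a=0, b=1: decrement i, but never below 0
    return i, False              # a=1, b=0 leaves i unchanged


if __name__ == "__main__":
    for a, b in [(1, 1), (0, 1), (1, 0), (0, 0)]:
        print((a, b), mod_and_jump(a, b, 3))
```

One way to read the two arguments: `b` says whether `i` changes at all, and when it does, `a` selects the direction.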
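Default values. We need one more convention to handle "default values".
Turing machines have the special symbol $\varnothing$ to indicate that a tape location is "blank" or "uninitialized". In NAND-TM there is no such symbol, and all variables are Boolean, containing either $0$ or $1$. All variables and locations of arrays default to $0$ if they have not been initialized to another value.
To keep track of whether a $0$ in an array corresponds to a true zero or to an uninitialized cell, a programmer can always add to an array `Foo` a "companion array" `Foo_nonblank` and set `Foo_nonblank[i]` to $1$ whenever the `i`'th location is initialized.
In particular we will use this convention for the input and output arrays `X` and `Y`.
A NAND-TM program has four special arrays `X`, `X_nonblank`, `Y`, and `Y_nonblank`.
When a NAND-TM program is executed on input $x\in \{0,1\}^*$ of length $n$, the first $n$ cells of the array `X` are initialized to $x_0,\ldots,x_{n-1}$ and the first $n$ cells of the array `X_nonblank` are initialized to $1$. (All uninitialized cells default to $0$.)
The output of a NAND-TM program is the string `Y[`$0$`]`, $\ldots$, `Y[`$m-1$`]` where $m$ is the smallest integer such that `Y_nonblank[`$m$`]`$=0$.
A NAND-TM program gets called with `X` and `X_nonblank` initialized to contain the input, and writes to `Y` and `Y_nonblank` to produce the output.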
Formally, NAND-TM programs are defined as follows:
::: {.definition title="NAND-TM programs" #NANDTM}
A NAND-TM program consists of a sequence of lines of the form `foo = NAND(bar,blah)` ending with a line of the form `MODANDJUMP(foo,bar)`, where `foo`, `bar`, `blah` are either scalar variables (sequences of letters, digits, and underscores) or array variables of the form `Foo[i]` (starting with a capital letter and indexed by `i`). The program has the array variables `X`, `X_nonblank`, `Y`, `Y_nonblank` and the index variable `i` built in, and can use additional array and scalar variables.

If $P$ is a NAND-TM program and $x\in \{0,1\}^*$ is an input, then an execution of $P$ on $x$ is the following process:

- The arrays `X` and `X_nonblank` are initialized by `X[`$i$`]`$=x_i$ and `X_nonblank[`$i$`]`$=1$ for all $i\in [|x|]$. All other variables and cells are initialized to $0$. The index variable `i` is also initialized to $0$.
- The program is executed line by line. When the last line `MODANDJUMP(foo,bar)` is executed we do as follows:
    a. If `foo`$=1$ and `bar`$=0$ then jump to the first line without modifying the value of `i`.
    b. If `foo`$=1$ and `bar`$=1$ then increment `i` by one and jump to the first line.
    c. If `foo`$=0$ and `bar`$=1$ then decrement `i` by one (unless it is already zero) and jump to the first line.
    d. If `foo`$=0$ and `bar`$=0$ then halt and output `Y[`$0$`]`, $\ldots$, `Y[`$m-1$`]` where $m$ is the smallest integer such that `Y_nonblank[`$m$`]`$=0$.
:::
As the name implies, NAND-TM programs are a direct implementation of Turing machines in programming language form. We will show the equivalence below but you can already see how the components of Turing machines and NAND-TM programs correspond to one another:
---
caption: 'Turing Machine and NAND-TM analogs'
alignment: 'LL'
table-width: '1/1'
id: TMvsNANDTMtable
---
**Turing Machine** | **NAND-TM program**
*State:* single register that takes values in $[k]$ | *Scalar variables:* Several variables such as `foo`, `bar` etc.. each taking values in $\{0,1\}$.
*Tape:* One tape containing values in a finite set $\Sigma$. Potentially infinite but $T[t]$ defaults to $\varnothing$ for all locations $t$ that have not been accessed. | *Arrays:* Several arrays such as `Foo`, `Bar` etc.. for each such array `Arr` and index $j$, the value of `Arr` at position $j$ is either $0$ or $1$. The value defaults to $0$ for position that have not been written to.
*Head location:* A number $i\in \mathbb{N}$ that encodes the position of the head. | *Index variable:* The variable `i` that can be used to access the arrays.
*Accessing memory:* At every step the Turing machine has access to its local state, but can only access the tape at the position of the current head location. | *Accessing memory:* At every step a NAND-TM program has access to all the scalar variables, but can only access the arrays at the location `i` of the index variable
*Control of location:* In each step the machine can move the head location by at most one position. | *Control of index variable:* In each iteration of its main loop the program can modify the index `i` by at most one.
We now present some examples of NAND-TM programs
::: {.example title="XOR in NAND-TM" #XORENANDPP}
The following is a NAND-TM program to compute the XOR function on inputs of arbitrary length.
That is, $XOR:\{0,1\}^* \rightarrow \{0,1\}$ such that $XOR(x) = \sum_{i=0}^{|x|-1} x_i \mod 2$ for every $x\in \{0,1\}^*$.

    temp_0 = NAND(X[0],X[0])
    Y_nonblank[0] = NAND(X[0],temp_0)
    temp_2 = NAND(X[i],Y[0])
    temp_3 = NAND(X[i],temp_2)
    temp_4 = NAND(Y[0],temp_2)
    Y[0] = NAND(temp_3,temp_4)
    MODANDJUMP(X_nonblank[i],X_nonblank[i])
:::
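One way to check a NAND-TM program such as the XOR example is to run it with a small interpreter. The sketch below follows the execution rules of the NAND-TM definition, under some assumptions of its own: the names `run_nandtm` and `XOR_PROGRAM` are hypothetical, constant indices such as `X[0]` are accepted alongside `X[i]` because the example uses them, and the cap `max_iters` is a practical guard that is not part of the language.

```python
import re
from collections import defaultdict

XOR_PROGRAM = """
temp_0 = NAND(X[0],X[0])
Y_nonblank[0] = NAND(X[0],temp_0)
temp_2 = NAND(X[i],Y[0])
temp_3 = NAND(X[i],temp_2)
temp_4 = NAND(Y[0],temp_2)
Y[0] = NAND(temp_3,temp_4)
MODANDJUMP(X_nonblank[i],X_nonblank[i])
"""


def run_nandtm(source, x, max_iters=10_000):
    """Run a NAND-TM program on the bit list x; return the output bit list.

    Returns None if the program has not halted after max_iters iterations.
    """
    lines = [ln.replace(" ", "") for ln in source.strip().splitlines() if ln.strip()]
    *body, last = lines
    jump_a, jump_b = re.fullmatch(r"MODANDJUMP\(([^,]+),([^,]+)\)", last).groups()
    parsed = [re.fullmatch(r"(.+)=NAND\(([^,]+),([^,]+)\)", ln).groups() for ln in body]

    scalars = defaultdict(int)                      # every variable defaults to 0
    arrays = defaultdict(lambda: defaultdict(int))  # every array cell defaults to 0
    for j, bit in enumerate(x):
        arrays["X"][j] = bit
        arrays["X_nonblank"][j] = 1
    i = 0

    def locate(token):
        """Return (storage, key) for a scalar name or an array reference."""
        m = re.fullmatch(r"(\w+)\[(i|\d+)\]", token)
        if m is None:
            return scalars, token
        name, idx = m.groups()
        return arrays[name], (i if idx == "i" else int(idx))

    def read(token):
        store, key = locate(token)
        return store[key]

    for _ in range(max_iters):
        for target, a, b in parsed:
            store, key = locate(target)
            store[key] = 1 - read(a) * read(b)
        a_val, b_val = read(jump_a), read(jump_b)
        if a_val == 0 and b_val == 0:               # halt and collect output
            out, m = [], 0
            while arrays["Y_nonblank"][m] == 1:
                out.append(arrays["Y"][m])
                m += 1
            return out
        if b_val == 1:
            i = i + 1 if a_val == 1 else max(i - 1, 0)
    return None


if __name__ == "__main__":
    print(run_nandtm(XOR_PROGRAM, [1, 0, 1, 1]))
```

On an input of length $n$ the XOR program runs its loop $n+1$ times: the last iteration reads the default values `X[n]` $=0$ and `X_nonblank[n]` $=0$, which leaves `Y[0]` unchanged and halts.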
::: {.example title="Increment in NAND-TM" #INCENANDPP}
We now present a NAND-TM program to compute the increment function.
That is, $INC:\{0,1\}^* \rightarrow \{0,1\}^*$ such that for every $x\in \{0,1\}^n$, $INC(x)$ is the $n+1$ bit long string $y$ such that if $X = \sum_{i=0}^{n-1}x_i \cdot 2^i$ is the number represented by $x$, then $y$ is the (least-significant digit first) binary representation of the number $X+1$.

We start by showing the program using "syntactic sugar": shorthand for NAND-CIRC programs we have seen before that compute simple functions such as `IF`, `XOR` and `AND` (as well as the constant `one` function and the function `COPY` that just maps a bit to itself).

    carry = IF(started,carry,one(started))
    started = one(started)
    Y[i] = XOR(X[i],carry)
    carry = AND(X[i],carry)
    Y_nonblank[i] = one(started)
    MODANDJUMP(X_nonblank[i],X_nonblank[i])

The above is not, strictly speaking, a valid NAND-TM program. If we "open up" all of the syntactic sugar, we get the following valid program that computes the same function.

    temp_0 = NAND(started,started)
    temp_1 = NAND(started,temp_0)
    temp_2 = NAND(started,started)
    temp_3 = NAND(temp_1,temp_2)
    temp_4 = NAND(carry,started)
    carry = NAND(temp_3,temp_4)
    temp_6 = NAND(started,started)
    started = NAND(started,temp_6)
    temp_8 = NAND(X[i],carry)
    temp_9 = NAND(X[i],temp_8)
    temp_10 = NAND(carry,temp_8)
    Y[i] = NAND(temp_9,temp_10)
    temp_12 = NAND(X[i],carry)
    carry = NAND(temp_12,temp_12)
    temp_14 = NAND(started,started)
    Y_nonblank[i] = NAND(started,temp_14)
    MODANDJUMP(X_nonblank[i],X_nonblank[i])
:::
::: { .pause } Working out the above two examples can go a long way towards understanding the NAND-TM language. See the appendix and our GitHub repository for a full specification of the NAND-TM language. :::
Given the above discussion, it might not be surprising that Turing machines turn out to be equivalent to NAND-TM programs. Indeed, we designed the NAND-TM language to have this property. Nevertheless, this is an important result, and the first of many other such equivalence results we will see in this book.
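For every $F:\{0,1\}^* \rightarrow \{0,1\}^*$, $F$ is computable by a NAND-TM program $P$ if and only if there is a Turing machine $M$ that computes $F$.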
::: {.proofidea data-ref="TM-equiv-thm"}
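To prove such an equivalence theorem, we need to show two directions. We need to be able to (1) transform a Turing machine $M$ into a NAND-TM program $P$ that computes the same function as $M$ and (2) transform a NAND-TM program $P$ into a Turing machine $M$ that computes the same function as $P$.

The idea of the proof is illustrated in tmvsnandppfig{.ref}.
To show (1), given a Turing machine $M$, we will create a NAND-TM program $P$ that will have an array `Tape` for the tape of $M$ and scalar (i.e., non-array) variable(s) `state` for the state of $M$.
Specifically, since the state of a Turing machine is not in $\{0,1\}$ but rather in a larger set $[k]$, we will use the variables `state_`$0$, $\ldots$, `state_`$\ell-1$ to encode it.
Similarly, to encode the larger alphabet $\Sigma$ of the tape, we will use the arrays `Tape_`$0$, $\ldots$, `Tape_`$\ell'-1$, such that the $i^{th}$ locations of these arrays together encode the $i^{th}$ symbol of the tape.
Using the fact that every finite function can be computed by a NAND-CIRC program, we will be able to compute the transition function of $M$, replacing moving left and right by decrementing and incrementing `i` respectively.

We show (2) using very similar ideas. Given a program $P$, we will create a Turing machine $M$ whose states encode the values of $P$'s scalar variables and whose alphabet encodes the values of $P$'s arrays at a single location. The machine $M$ then simulates each iteration of the program $P$ by updating its state and tape accordingly.
:::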
::: {.proof data-ref="TM-equiv-thm"}
We start by proving the "if" direction of TM-equiv-thm{.ref}. Namely we show that given a Turing machine
The key observation is that by NAND-univ-thm{.ref} we can compute every finite function using a NAND-CIRC program.
In particular, consider the transition function
- We encode $[k]$ using $\{0,1\}^\ell$ and $\Sigma$ using $\{0,1\}^{\ell'}$, where $\ell = \ceil{\log k}$ and $\ell' = \ceil{\log |\Sigma|}$.
- We encode the set $\{\mathsf{L},\mathsf{R}, \mathsf{S},\mathsf{H} \}$ using $\{0,1\}^2$. We will choose the encoding $\mathsf{L} \mapsto 01$, $\mathsf{R} \mapsto 11$, $\mathsf{S} \mapsto 10$, $\mathsf{H} \mapsto 00$. (This conveniently corresponds to the semantics of the `MODANDJUMP` operation.)
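The encoding just described turns the transition function into a finite function on bit strings, which can be tabulated explicitly. The Python sketch below does this under assumptions made for illustration: `encode_transition`, `to_bits` and `parity_delta` are hypothetical names, bits are written least-significant first, symbols are numbered by their position in a list, and the example transition function is a made-up two-state parity machine.

```python
from itertools import product

# Direction encoding from the text: L -> 01, R -> 11, S -> 10, H -> 00.
DIR_BITS = {"L": (0, 1), "R": (1, 1), "S": (1, 0), "H": (0, 0)}


def to_bits(value, width):
    """Binary representation of value as a tuple, least-significant bit first."""
    return tuple((value >> j) & 1 for j in range(width))


def encode_transition(delta, k, alphabet):
    """Tabulate delta as a map from ell + ell' bits to ell + ell' + 2 bits."""
    ell = max(1, (k - 1).bit_length())                  # ceil(log2 k) for k >= 2
    ell_sym = max(1, (len(alphabet) - 1).bit_length())  # ceil(log2 |alphabet|)
    table = {}
    for s, (a, sym) in product(range(k), enumerate(alphabet)):
        s2, sym2, d = delta(s, sym)
        key = to_bits(s, ell) + to_bits(a, ell_sym)
        table[key] = (
            to_bits(s2, ell) + to_bits(alphabet.index(sym2), ell_sym) + DIR_BITS[d]
        )
    return table


def parity_delta(s, sym):
    """Hypothetical 2-state machine: the state is the parity of the bits read."""
    if sym == "1":
        return 1 - s, "x", "R"
    if sym == "0":
        return s, "x", "R"
    if sym == "∅":
        return s, str(s), "H"
    return s, sym, "R"


if __name__ == "__main__":
    table = encode_transition(parity_delta, 2, ["▷", "∅", "0", "1", "x"])
    for key, value in sorted(table.items()):
        print(key, "->", value)
```

Since the resulting table is a finite function, it can be computed by a NAND-CIRC program, and its last two output bits can be fed directly to `MODANDJUMP`.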
Hence we can identify ComputeM
that computes this function
INPUT: $x\in \{0,1\}^*$
OUTPUT: $M(x)$ -if $M$ halts on $x$. Otherwise go into infinite loop
# We use variables `state_`$0$ $\ldots$ `state_`$\ell-1$ to encode $M$'s state
# We use arrays `Tape_`$0$`[]` $\ldots$ `Tape_`$\ell'-1$`[]` to encode $M$'s tape
# We omit the initial and final "book keeping" to copy input to `Tape` and copy output from `Tape`
# Use the fact that transition is finite and computable by NAND-CIRC program:
`state_`$0$ $\ldots$ `state_`$\ell-1$, `Tape_`$0$`[i]`$\ldots$ `Tape_`$\ell'-1$`[i]`, `dir0`,`dir1` $\leftarrow$ `TRANSITION(` `state_`$0$ $\ldots$ `state_`$\ell-1$, `Tape_`$0$`[i]`$\ldots$ `Tape_`$\ell'-1$`[i]`, `dir0`,`dir1` `)`
`MODANDJMP(dir0,dir1)`
Every step of the main loop of the above program perfectly mimics the computation of the Turing Machine
For the other direction, suppose that
Specifically, consider the function i
in the beginning of an iteration, outputs all the new values of these variables at the last line of the iteration, right before the MODANDJUMP
instruction is executed.
If foo
and bar
are the two variables that are used as input to the MODANDJUMP
instruction, then this means that based on the values of these variables we can compute whether i
will increase, decrease or stay the same, and whether the program will halt or jump back to the beginning.
Hence a Turing machine can simulate an execution of
- The machine $M_P$ encodes the contents of the array variables of $P$ in its tape, and the contents of the scalar variables in (part of) its state. Specifically, if $P$ has $\ell$ local variables and $t$ arrays, then the state space of $M_P$ will be large enough to encode all $2^\ell$ assignments to the local variables and the alphabet $\Sigma$ of $M_P$ will be large enough to encode all $2^t$ assignments for the array variables at each location. The head location corresponds to the index variable `i`.
- Recall that every line of the program $P$ corresponds to reading and writing either a scalar variable, or an array variable at the location `i`. In one iteration of $P$ the value of `i` remains fixed, and so the machine $M_P$ can simulate this iteration by reading the values of all array variables at `i` (which are encoded by the single symbol in the alphabet $\Sigma$ located at the `i`-th cell of the tape), reading the values of all scalar variables (which are encoded by the state), and updating both. The transition function of $M_P$ can output $\mathsf{L},\mathsf{S},\mathsf{R}$ depending on whether the values given to the `MODANDJUMP` operation are $01$, $10$ or $11$ respectively.
- When the program halts (i.e., `MODANDJUMP` gets $00$) then the Turing machine will enter into a special loop to copy the results of the `Y` array into the output and then halt. We can achieve this by adding a few more states.
The above is not a full formal description of a Turing Machine, but our goal is just to show that such a machine exists. One can see that
::: {.remark title="Running time equivalence (optional)" #polyequivrem} If we examine the proof of TM-equiv-thm{.ref} then we can see that every iteration of the loop of a NAND-TM program corresponds to one step in the execution of the Turing machine. We will come back to this question of measuring number of computation steps later in this course. For now the main take away point is that NAND-TM programs and Turing Machines are essentially equivalent in power even when taking running time into account. :::
Once you understand the definitions of both NAND-TM programs and Turing Machines, TM-equiv-thm{.ref} is fairly straightforward.
Indeed, NAND-TM programs are not as much a different model from Turing Machines as they are simply a reformulation of the same model using programming language notation.
You can think of the difference between a Turing machine and a NAND-TM program as the difference between representing a number using decimal or binary notation.
In contrast, the difference between a function
---
caption: 'Specification vs Implementation formalisms'
alignment: 'LL'
table-width: ''
id: specvsimp
---
*Setting* ; *Specification* ; *Implementation*
_Finite computation_ ; __Functions__ mapping $\{0,1\}^n$ to $\{0,1\}^m$ ; __Circuits__, __Straightline programs__
_Infinite computation_ ; __Functions__ mapping $\{0,1\}^*$ to $\{0,1\}$ or to $\{0,1\}^*$. ; __Algorithms__, __Turing Machines__, __Programs__
Just like we did with NAND-CIRC in finiteuniversalchap{.ref}, we can use "syntactic sugar" to make NAND-TM programs easier to write. For starters, we can use all of the syntactic sugar of NAND-CIRC, and so have access to macro definitions and conditionals (i.e., if/then). But we can go beyond this and achieve for example:
- Inner loops such as the `while` and `for` operations common to many programming languages.
- Multiple index variables (e.g., not just `i` but we can add `j`, `k`, etc.).
- Arrays with more than one dimension (e.g., `Foo[i][j]`, `Bar[i][j][k]` etc.)
In all of these cases (and many others) we can implement the new feature as mere "syntactic sugar" on top of standard NAND-TM, which means that the set of functions computable by NAND-TM with this feature is the same as the set of functions computable by standard NAND-TM. Similarly, we can show that the set of functions computable by Turing Machines that have more than one tape, or tapes of more dimensions than one, is the same as the set of functions computable by standard Turing machines.
We can implement more advanced looping constructs than the simple `MODANDJUMP`.
For example, we can implement `GOTO`.
A `GOTO` statement corresponds to jumping to a certain line in the execution.
For example, if we have code of the form

    "start": do foo
        GOTO("end")
    "skip": do bar
    "end": do blah

then the program will only do `foo` and `blah`, as when it reaches the line `GOTO("end")` it will jump to the line labeled with `"end"`.
We can achieve the effect of `GOTO` in NAND-TM using conditionals.
In the code below, we assume that we have a variable `pc` that can take strings of some constant length.
This can be encoded using a finite number of Boolean variables `pc_0`, `pc_1`, $\ldots$, `pc_`$k-1$, and so when we write below `pc = "label"` what we mean is something like `pc_0 = 0`, `pc_1 = 1`, $\ldots$ (where the bits $0,1,\ldots$ correspond to the encoding of the finite string `"label"` as a string of length $k$).
We also assume that we have access to conditionals (i.e., `if` statements), which we can emulate using syntactic sugar in the same way as we did in NAND-CIRC.
To emulate a GOTO statement, we will first modify a program P of the form

    do foo
    do bar
    do blah

to have the following form (using syntactic sugar for `if`):

    pc = "line1"
    if (pc=="line1"):
        do foo
        pc = "line2"
    if (pc=="line2"):
        do bar
        pc = "line3"
    if (pc=="line3"):
        do blah

These two programs do the same thing.
The variable `pc` corresponds to the "program counter" and tells the program which line to execute next.
We can see that if we wanted to emulate a `GOTO("line3")` then we could simply modify the instruction `pc = "line2"` to be `pc = "line3"`.

In NAND-CIRC we could only have `GOTO`s that go forward in the code, but since in NAND-TM everything is encompassed within a large outer loop, we can use the same ideas to implement `GOTO`s that can go backwards, as well as conditional loops.
Other loops. Once we have `GOTO`, we can emulate all the standard loop constructs such as `while`, `do .. until` or `for` in NAND-TM as well. For example, we can replace the code

    while foo:
        do blah
    do bar

with

    "loop":
        if NOT(foo): GOTO("next")
        do blah
        GOTO("loop")
    "next":
        do bar
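The program-counter idea can be seen in action in an ordinary language. The Python sketch below emulates the `while` loop above with a `pc` variable inside one big outer loop, which plays the role of NAND-TM's jump back to the first line. The function name `run_with_pc`, the extra labels `"body"` and `"halt"`, and the choice of `count < n` as the loop condition `foo` are assumptions made for this illustration.

```python
def run_with_pc(n):
    """Emulate `while foo: do blah` followed by `do bar` with a program counter.

    Here foo is the condition count < n. Returns the list of actions taken.
    """
    pc = "loop"
    count = 0
    trace = []
    while True:                  # the single outer loop
        if pc == "loop":
            # if NOT(foo): GOTO("next"), otherwise fall into the loop body
            pc = "body" if count < n else "next"
        elif pc == "body":
            trace.append("blah")
            count += 1
            pc = "loop"          # GOTO("loop"): a backward jump
        elif pc == "next":
            trace.append("bar")
            pc = "halt"
        elif pc == "halt":
            break
    return trace


if __name__ == "__main__":
    print(run_with_pc(2))
```

Each pass through the outer loop executes exactly one labeled block, so a backward `GOTO` is nothing more than assigning an earlier label to `pc`.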
::: {.remark title="GOTO's in programming languages" #gotorem}
The GOTO
statement was a staple of most early programming languages, but has largely fallen out of favor and is not included in many modern languages such as Python, Java, Javascript.
In 1968, Edsger Dijsktra wrote a famous letter titled "Go to statement considered harmful." (see also xkcdgotofig{.ref}).
The main trouble with GOTO
is that it makes analysis of programs more difficult by making it harder to argue about invariants of the program.
When a program contains a loop of the form:
for j in range(100):
    do something
do blah
you know that the line of code `do blah` can only be reached if the loop ended, in which case you know the value of `j` at that point.
In contrast, if the program might jump to `do blah` from any other point in the code, then it's very hard for you as the programmer to know what you can rely upon in this code.
As Dijkstra said, such invariants are important because "our intellectual powers are rather geared to master static relations and .. our powers to visualize processes evolving in time are relatively poorly developed" and so "we should ... do ...our utmost best to shorten the conceptual gap between the static program and the dynamic process."
That said, `GOTO` is still a major part of lower level languages where it is used to implement higher level looping constructs such as `while` and `for` loops.
For example, even though Java doesn't have a `GOTO` statement, the Java Bytecode (which is a lower level representation of Java) does have such a statement.
Similarly, Python bytecode has instructions such as `POP_JUMP_IF_TRUE` that implement the `GOTO` functionality, and similar instructions are included in many assembly languages.
The way we use `GOTO` to implement a higher level functionality in NAND-TM is reminiscent of the way these various jump instructions are used to implement higher level looping constructs.
:::
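The remark's point about Python bytecode can be observed with the standard `dis` module. The snippet below is a small sketch: `count_to` is a hypothetical example function, and the exact jump opcode names that appear (for instance `POP_JUMP_IF_TRUE`, `POP_JUMP_IF_FALSE`, or `JUMP_BACKWARD`) depend on the CPython version, so the code only filters for names containing `JUMP`.

```python
import dis


def count_to(n):
    """A plain while loop; the compiler lowers it to jump instructions."""
    i = 0
    while i < n:
        i += 1
    return i


# Collect the names of all jump-like instructions in the compiled function.
# Which names show up varies between CPython versions (assumption noted above).
jump_ops = [ins.opname for ins in dis.get_instructions(count_to)
            if "JUMP" in ins.opname]
```

Printing `jump_ops`, or calling `dis.dis(count_to)` for the full listing, shows that the `while` loop has been compiled into conditional and unconditional jumps.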
While NAND-TM adds extra operations over NAND-CIRC, it is not exactly accurate to say that NAND-TM programs or Turing machines are "more powerful" than NAND-CIRC programs or Boolean circuits.
NAND-CIRC programs, having no loops, are simply not applicable for computing functions with an unbounded number of inputs.
Thus, to compute a function
The key difference between NAND-CIRC and NAND-TM is that NAND-TM allows us to express the fact that the algorithm for computing parities of length-$100$ strings is really the same one as the algorithm for computing parities of length-$5$ strings (or similarly the fact that the algorithm for adding
This notion of a single algorithm that can compute functions of all input lengths is known as uniformity of computation, and hence we think of Turing machines / NAND-TM as a uniform model of computation, as opposed to Boolean circuits or NAND-CIRC, which are nonuniform models, where we have to specify a different program for every input length.
Looking ahead, we will see that this uniformity leads to another crucial difference between Turing machines and circuits. Turing machines can have inputs and outputs that are longer than the description of the machine as a string, and in particular there exists a Turing machine that can "self replicate" in the sense that it can print its own code. This notion of "self replication", and the related notion of "self reference", are crucial to many aspects of computation, as well as, of course, to life itself, whether in the form of digital or biological programs.
For now, what you ought to remember is the following differences between uniform and non uniform computational models:
- Non uniform computational models: Examples are NAND-CIRC programs and Boolean circuits. These are models where each individual program/circuit can compute a finite function $f:\{0,1\}^n \rightarrow \{0,1\}^m$. We have seen that every finite function can be computed by some program/circuit. To discuss computation of an infinite function $F:\{0,1\}^* \rightarrow \{0,1\}^*$ we need to allow a sequence $\{ P_n \}_{n\in \N}$ of programs/circuits (one for every input length), but this does not capture the notion of a single algorithm to compute the function $F$.

- Uniform computational models: Examples are Turing machines and NAND-TM programs. These are models where a single program/machine can take inputs of arbitrary length and hence compute an infinite function $F:\{0,1\}^* \rightarrow \{0,1\}^*$. The number of steps that a program/machine takes on some input is not bounded in advance, and in particular there is a chance that it will enter into an infinite loop. Unlike the nonuniform case, we have not shown that every infinite function can be computed by some NAND-TM program/Turing Machine. We will come back to this point in chapcomputable{.ref}.
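The contrast between the two kinds of models can be made concrete for the parity function. The sketch below is illustrative only: `parity_program`, `run_program`, and `parity_uniform` are hypothetical helper names, the generated lines are NAND-CIRC-like rather than strictly valid NAND-CIRC (the final `Y[0] = s` is a plain assignment), and Python's `exec` stands in for a NAND-CIRC evaluator.

```python
def NAND(a, b):
    """NAND of two bits."""
    return 1 - (a & b)


def parity_program(n):
    """Nonuniform: return loop-free source lines for parity on n >= 1 bits.

    A different (and longer) program P_n is produced for every length n.
    """
    lines = [
        "t = NAND(X[0], X[0])",   # t = NOT X[0]
        "one = NAND(X[0], t)",    # always 1
        "s = NAND(one, one)",     # running parity, starts at 0
    ]
    for i in range(n):            # the loop lives in the generator, not in P_n
        lines += [
            f"u = NAND(s, X[{i}])",
            "v = NAND(s, u)",
            f"w = NAND(X[{i}], u)",
            "s = NAND(v, w)",     # s = s XOR X[i]
        ]
    lines.append("Y[0] = s")
    return lines


def run_program(lines, X):
    """Evaluate a generated straight-line program on the input bits X."""
    env = {"NAND": NAND, "X": list(X), "Y": [0]}
    exec("\n".join(lines), env)
    return env["Y"][0]


def parity_uniform(X):
    """Uniform: one fixed piece of code that handles every input length."""
    s = 0
    for x in X:
        u = NAND(s, x)
        v = NAND(s, u)
        w = NAND(x, u)
        s = NAND(v, w)
    return s
```

Here `parity_program(3)` and `parity_program(5)` are different texts of different lengths, whereas the text of `parity_uniform` does not change with the input length.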
- Turing machines capture the notion of a single algorithm that can evaluate functions of every input length.
- They are equivalent to NAND-TM programs, which add loops and arrays to NAND-CIRC.
- Unlike NAND-CIRC or Boolean circuits, the number of steps that a Turing machine takes on a given input is not fixed in advance. In fact, a Turing machine or a NAND-TM program can enter into an infinite loop on certain inputs, and not halt at all.
::: {.exercise title="Explicit NAND TM programming" #majoritynandtm}
Produce the code of a (syntactic-sugar free) NAND-TM program
::: {.exercise title="Computable functions examples" #computable}
Prove that the following functions are computable. For all of these functions, you do not have to fully specify the Turing Machine or the NAND-TM program that computes the function, but rather only prove that such a machine or program exists:

- $INC:\{0,1\}^* \rightarrow \{0,1\}^*$ which takes as input a representation of a natural number $n$ and outputs the representation of $n+1$.

- $ADD:\{0,1\}^* \rightarrow \{0,1\}^*$ which takes as input a representation of a pair of natural numbers $(n,m)$ and outputs the representation of $n+m$.

- $MULT:\{0,1\}^* \rightarrow \{0,1\}^*$, which takes a representation of a pair of natural numbers $(n,m)$ and outputs the representation of $n\cdot m$.

- $SORT:\{0,1\}^* \rightarrow \{0,1\}^*$ which takes as input the representation of a list of natural numbers $(a_0,\ldots,a_{n-1})$ and returns its sorted version $(b_0,\ldots,b_{n-1})$ such that for every $i\in [n]$ there is some $j \in [n]$ with $b_i=a_j$ and $b_0 \leq b_1 \leq \cdots \leq b_{n-1}$.
:::
::: {.exercise title="Two index NAND-TM" #twoindexex}
Define NAND-TM' to be the variant of NAND-TM where there are two index variables `i` and `j`.
Arrays can be indexed by either `i` or `j`.
The operation MODANDJMP
takes four variables j
, decrement j
or keep it in the same value (corresponding to
::: {.exercise title="Two tape Turing machines" #twotapeex}
Define a two tape Turing machine to be a Turing machine which has two separate tapes and two separate heads. At every step, the transition function gets as input the location of the cells in the two tapes, and can decide whether to move each head independently.
Prove that for every function
::: {.exercise title="Two dimensional arrays" #twodimnandtmex}
Define NAND-TM'' to be the variant of NAND-TM where just like NAND-TM' defined in twoindexex{.ref} there are two index variables `i` and `j`, but now the arrays are two dimensional and so we index an array `Foo` by `Foo[i][j]`.
Prove that for every function
::: {.exercise title="Two dimensional Turing machines" #twodimtapeex}
Define a two-dimensional Turing machine to be a Turing machine in which the tape is two dimensional. At every step the machine can move $\mathsf{U}$p, $\mathsf{D}$own, $\mathsf{L}$eft, $\mathsf{R}$ight, or $\mathsf{S}$tay.
Prove that for every function
::: {.exercise}
Prove the following closure properties of the set
- If $F \in \mathbf{R}$ then the function $G(x) = 1 - F(x)$ is in $\mathbf{R}$.

- If $F,G \in \mathbf{R}$ then the function $H(x) = F(x) \vee G(x)$ is in $\mathbf{R}$.

- If $F \in \mathbf{R}$ then the function $F^*$ is in $\mathbf{R}$ where $F^*$ is defined as follows: $F^*(x)=1$ iff there exist some strings $w_0,\ldots,w_{k-1}$ such that $x = w_0 w_1 \cdots w_{k-1}$ and $F(w_i)=1$ for every $i\in [k]$.

- If $F \in \mathbf{R}$ then the function $$ G(x) = \begin{cases} 1 & \exists_{y \in \{0,1\}^{|x|}} F(xy) = 1 \\ 0 & \text{otherwise} \end{cases} $$ is in $\mathbf{R}$.
:::
::: {.exercise title="Oblivious Turing Machines (challenging)" #obliviousTMex}
Define a Turing Machine
Prove that for every function
Prove that for every $F:\{0,1\}^* \rightarrow \{0,1\}^*$, the function $F$ is computable if and only if the following function $G:\{0,1\}^* \rightarrow \{0,1\}$ is computable, where
::: {.exercise title="Uncomputability via counting" #uncomputabilityviacountingex}
Recall that
::: {.exercise title="Not every function is computable" #uncountablefuncex}
Prove that the set of all total functions from
Augusta Ada Byron, countess of Lovelace (1815-1852) lived a short but turbulent life, but she is today best known for her collaboration with Charles Babbage (see [@stein1987ada] for a biography). Ada took an immense interest in Babbage's analytical engine, which we mentioned in compchap{.ref}. In 1842-3, she translated from Italian a paper of Menabrea on the engine, adding copious notes (longer than the paper itself). The quote in the chapter's beginning is taken from Nota A in this text. Lovelace's notes contain several examples of programs for the analytical engine, and because of this she has been called "the world's first computer programmer", though it is not clear whether they were written by Lovelace or Babbage himself [@holt2001ada]. Regardless, Ada was clearly one of very few people (perhaps the only one outside of Babbage himself) to fully appreciate how significant and revolutionary the idea of mechanizing computation truly is.
The books of Shetterly [@shetterly2016hidden] and Sobel [@sobel2017the] discuss the history of human computers (who were female, more often than not) and their important contributions to scientific discoveries in astronomy and space exploration.
Alan Turing was one of the intellectual giants of the 20th century. He was not only the first person to define the notion of computation, but also invented and used some of the world's earliest computational devices as part of the effort to break the Enigma cipher during World War II, saving millions of lives. Tragically, Turing committed suicide in 1954, following his conviction in 1952 for homosexual acts and a court-mandated hormonal treatment. In 2009, British prime minister Gordon Brown made an official public apology to Turing, and in 2013 Queen Elizabeth II granted Turing a posthumous pardon. Turing's life is the subject of a great book and a mediocre movie.
Sipser's text [@SipserBook] defines a Turing machine as a seven tuple consisting of the state space, input alphabet, tape alphabet, transition function, starting state, accepting state, and rejecting state. Superficially this looks like a very different definition than TM-def{.ref} but it is simply a different representation of the same concept, just as a graph can be represented in either adjacency list or adjacency matrix form.
One difference is that Sipser considers a general set of states
Sipser considers also functions with input in
Another definition used in the literature is that a Turing machine
One of the first programming-language formulations of Turing machines was given by Wang [@Wang1957]. Our formulation of NAND-TM is aimed at making the connection with circuits more direct, with the eventual goal of using it for the Cook-Levin Theorem, as well as results such as