frap_book.tex

\documentclass{amsbook}

\usepackage{hyperref,url,amsmath,amssymb,proof,stmaryrd,tikz-cd,mathabx}

\newtheorem{theorem}{Theorem}[chapter]
\newtheorem{lemma}[theorem]{Lemma}

\theoremstyle{definition}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{example}[theorem]{Example}
\newtheorem{xca}[theorem]{Exercise}

\theoremstyle{remark}
\newtheorem{remark}[theorem]{Remark}

\numberwithin{section}{chapter}
\numberwithin{equation}{chapter}

\makeindex

\begin{document}

\frontmatter

\title{Formal Reasoning About Programs}

\author{Adam Chlipala}
\address{MIT, Cambridge, MA, USA}
\email{adamc@csail.mit.edu}

\begin{abstract}
  \emph{Briefly}, this book is about an approach to bringing software engineering up to speed with more traditional engineering disciplines, providing a mathematical foundation for rigorous analysis of realistic computer systems. As civil engineers apply their mathematical canon to reach high certainty that bridges will not fall down, the software engineer should apply a different canon to argue that programs behave properly. As other engineering disciplines have their computer-aided-design tools, computer science has proof assistants, IDEs for logical arguments. We will learn how to apply these tools to certify that programs behave as expected.

  \emph{More specifically}: Introductions to two intertangled subjects: the Coq proof assistant, a tool for machine-checked mathematical theorem proving; and formal logical reasoning about the correctness of programs.
\end{abstract}

\maketitle

\newpage

For more information, see the book's home page:

\begin{center} \url{http://adam.chlipala.net/frap/} \end{center}

\thispagestyle{empty}
\mbox{}\vfill
\begin{center}

Copyright Adam Chlipala 2015-2021.


This work is licensed under a
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The license text is available at:

\end{center}

\begin{center} \url{https://creativecommons.org/licenses/by-nc-nd/4.0/} \end{center}

\newpage

\setcounter{page}{4}

\tableofcontents

\mainmatter

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Why Prove the Correctness of Programs?}

The classic engineering disciplines all have their standard mathematical techniques that are applied to the design of any artifact, before it is deployed, to gain confidence about its safety, suitability for some purpose, and so on.
The engineers in a discipline more or less agree on what are ``the rules'' to be followed in vetting a design.
Those rules are specified with a high degree of rigor, so that it isn't a matter of opinion whether a design is safe.
Why doesn't software engineering have a corresponding agreed-upon standard, whereby programmers convince themselves that their systems are safe, secure, and correct?
The concepts and tools may not quite be ready yet for broad adoption, but they have been under development for decades.
This book introduces one particular tool and a body of ideas for how to apply it to different tasks in program proof.

As this document is in a very early draft stage, no more will be said here, in favor of jumping right into the technical material.
Eventually, there will no doubt be some sort of historical overview here, as part of a general placing-in-context of the particular approach that will come next.
There will also be plenty of scholarly citations (here and throughout the book).
In this early version, you get to take the author's word for it that we are about to learn a promising approach!

However, one overarching element of our strategy is important enough to deserve to be called out here.
We will study a variety of different approaches for formalizing what a program should do and for proving that a program does what it should.
At every step, we will pay close attention to the \emph{common foundation} that underlies everything.
For one thing, we will be proving all of our theorems with the Coq proof assistant, a powerful framework for writing and machine-checking proofs.
Coq itself is based on a relatively small set of core features, much like a well-designed programming language, and in both we build up increasingly sophisticated abstractions as libraries.
Those features can be thought of as the core of all mathematical reasoning.

We will also apply a recipe specific to program proof.
When we encounter a new challenge, to prove a new kind of property about a new kind of program, we will generally be considering four broad elements that appear in nearly all techniques.

\begin{itemize}
  \item \index{encoding}\textbf{Encoding.}
    Every programming language has both \index{syntax}\emph{syntax}, which defines what programs look like, and \index{semantics}\emph{semantics}, which defines how programs behave when run.
    Even when these elements seem obvious intuitively, we often find that there are surprisingly subtle choices to be made in defining syntax and semantics at the highest level of rigor.
    Seemingly minor decisions can have big impacts on how smoothly our proofs go.

  \item \textbf{Invariants.}
    Nearly every theorem about a program is stated in terms of a \index{transition system}\emph{transition system}, with some set of states and a relation for stepping from one state to the next, moving forward in time.
    Nearly every program proof also works by finding an \index{invariant}\emph{invariant} of a transition system, or a property that always holds of every state reachable from some starting state.
    The concept of invariant is very close to being a direct reinterpretation of mathematical induction, that glue of every serious mathematical development, known and loved by all.

  \item \index{abstraction}\textbf{Abstraction.}
    Often a transition system is too complex to analyze directly.
    Instead, we \emph{abstract} it with another transition system that is somehow more tractable, proving that the new system preserves all relevant properties of the original.

  \item \index{modularity}\textbf{Modularity.}
    Similarly, when a transition system is too complex, we often break it into separate \emph{modules} and use some well-behaved composition operators to reassemble them into the whole.
    Often abstraction and modularity go together, as we decompose a system both \index{horizontal decomposition}\emph{horizontally} (i.e., with modularity), splitting it into more manageable parts, and \index{vertical decomposition}\emph{vertically} (i.e., with abstraction), simplifying parts in ways that preserve key properties.
    We can even alternate between strategies, breaking a system into parts, abstracting one as a simpler part, further decomposing that part into pieces, and so on.
\end{itemize}

\newcommand{\encoding}[0]{\marginpar{\fbox{\textbf{Encoding}}}}

In the course of the book, we will never quite define any of these meta-techniques in complete formality.
Instead, we'll meet many examples of each, called out by eye-catching margin notes.
Generalizing from the examples should help the reader start developing an intuition for when to use each element and for the common design patterns that apply.

The core subject matter of the book is often grouped under traditional disciplinary headers like \index{semantics}\emph{semantics}, \index{programming-languages theory}\emph{programming-languages theory}, \index{formal methods}\emph{formal methods}, and \index{verification}\emph{verification}.
Often these different traditions have their own competing terminology for shared concepts.
We'll follow one particular set of unified terminology and notation, cherry-picked from the conventions of different communities.
There really is a huge amount of commonality across everything that we'll study, so we don't want to distract by constantly translating between notations.
It is quite important to be literate in the standard notational conventions, which are almost always implemented with \index{\LaTeX{}}\LaTeX{}, and we stick entirely to that kind of notation in this book.
However, we follow another, much less usual convention: while we give theorem and lemma statements, we rarely give their proofs.
The reason is that the author and many other researchers today feel that proofs on paper have outlived their usefulness.
Instead, the proofs are all found in the parallel world of the accompanying Coq source code.

That is, each chapter of this book has a corresponding Coq source file, distributed with the general book source code.
The Coq sources are heavily commented and may even, in many cases, be feasible to read without also reading the book chapters.
More importantly, the Coq sources aren't just meant to be \emph{read}.
They are meant to be \emph{executed}.
We suggest stepping through them interactively, seeing intermediate states of proofs as appropriate.
The book proper can be read without the Coq sources, to learn the standard background material of program proof; and the Coq sources can be read without the book proper, to learn a particular concrete realization of those ideas.
However, they go better together.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Formalizing Program Syntax}\label{syntax}

\section{Concrete Syntax}

The definition of a program starts with the definition of a programming language, and the definition of a programming language starts with its \emph{syntax}\index{syntax}, which covers which sorts of phrases are basically well-formed.
In the next chapter, we turn to \emph{semantics}\index{semantics}, which, in the course of saying what programs \emph{mean}, may impose further validity conditions.
Turning to examples, let's start with \emph{concrete syntax}\index{concrete syntax}, which decrees which sequences of characters are acceptable.
For a simple language of arithmetic expressions, we might accept the following strings as valid.
$$\begin{array}{l}
  3 \\
  x \\
  3 + x \\
  y * (3 + x)
\end{array}$$

Plenty of other strings might be invalid, like these.
$$\begin{array}{l}
  1 + + \; 2 \\
  x \; y \; z
\end{array}$$

Rather than appeal to our intuition about grade-school arithmetic, we prefer to formalize concrete syntax with a \emph{grammar}\index{grammar}, following a style known as \emph{Backus-Naur Form (BNF)}\index{Backus-Naur Form}\index{BNF}.
We have a set of \emph{nonterminals}\index{nonterminal} (e.g., $e$ below), standing for sets of allowable strings.
Some are defined by appeal to existing sets, as below, when we define constants $n$ in terms of the well-known set $\mathbb N$\index{N@$\mathbb N$} of natural numbers\index{natural numbers} (nonnegative integers).
\encoding
$$\begin{array}{rrcl}
  \textrm{Constants} & n &\in& \mathbb N \\
  \textrm{Variables} & x &\in& \mathsf{Strings} \\
  \textrm{Expressions} & e &::=& n \mid x \mid e + e \mid e \times e
\end{array}$$

To interpret the grammar in plain English: we assume sets of constants and variables, based on well-known sets of natural numbers and strings, respectively.
We then define expressions to include constants, variables, addition, and multiplication.
Crucially, the last two cases are specified \emph{recursively}: we show how to build bigger expressions out of smaller ones.

Incidentally, we're already seeing how many different formal notations creep into the discussion of formal program proofs.
All of this content is typeset in \LaTeX{}\index{\LaTeX{}}, and it may be helpful to consult the book sources, to see how it's all done.

Throughout the subject, one of our most crucial tools will be \emph{inductive definitions}\index{inductive definition}, explaining how to build up bigger sets from smaller ones.
The recursive nature of the grammar above is implicitly giving an inductive definition.
A more general notation for inductive definitions provides a series of \emph{inference rules}\index{inference rules} that define a set.
Formally, the set is defined to be \emph{the smallest one that satisfies all the rules}.
Each rule has \emph{premises}\index{premise} and a \emph{conclusion}\index{conclusion}.
We illustrate with four rules that together are equivalent to the BNF grammar above, for defining a set $\mathsf{Exp}$ of expressions.
\encoding
$$\infer{n \in \mathsf{Exp}}{
  n \in \mathbb N
}
\quad \infer{x \in \mathsf{Exp}}{
  x \in \mathsf{Strings}
}
\quad \infer{e_1 + e_2 \in \mathsf{Exp}}{
  e_1 \in \mathsf{Exp}
  & e_2 \in \mathsf{Exp}
}
\quad \infer{e_1 \times e_2 \in \mathsf{Exp}}{
  e_1 \in \mathsf{Exp}
  & e_2 \in \mathsf{Exp}
}$$

The general reading of an inference rule is: \textbf{if} all the facts above the horizontal line are true, \textbf{then} the fact below the line is true, too.
The rule implicitly needs to hold for \emph{all} values of the \emph{metavariables}\index{metavariable} (like $n$ and $e_1$) that appear within it; we can model them more explicitly with a sort of top-level universal quantification.
Newcomers to semantics often react negatively to seeing this style of definition, but very quickly it becomes apparent as a remarkably compact notation for expressing many concepts.
Think of it as a domain-specific programming language for mathematical definitions, an analogy that becomes quite concrete in the associated Coq code!

\section{Abstract Syntax}

After that brief interlude with concrete syntax, we now drop all formal treatment of it, for the rest of the book!
Instead, we concern ourselves with \emph{abstract syntax}\index{abstract syntax}, the real heart of language definitions.
Now programs are \emph{abstract syntax trees}\index{abstract syntax tree} (\emph{ASTs}\index{AST}), corresponding to inductive type definitions in Coq or algebraic datatype\index{algebraic datatype} definitions in Haskell\index{Haskell}.
Such types can be defined by enumerating their \emph{constructor}\index{constructor} functions with types.
\encoding
\begin{eqnarray*}
  \mathsf{Const} &:& \mathbb{N} \to \mathsf{Exp} \\
  \mathsf{Var} &:& \mathsf{Strings} \to \mathsf{Exp} \\
  \mathsf{Plus} &:& \mathsf{Exp} \times \mathsf{Exp} \to \mathsf{Exp} \\
  \mathsf{Times} &:& \mathsf{Exp} \times \mathsf{Exp} \to \mathsf{Exp}
\end{eqnarray*}

Note that the ``$\times$'' here is not the multiplication operator of concrete syntax, but rather the Cartesian-product operator\index{Cartesian product} of set theory, to indicate a type of pairs!

Such a list of constructors defines the set $\mathsf{Exp}$ to contain exactly those terms that can be built up with the constructors.
In inference-rule notation:
\encoding
$$\infer{\mathsf{Const}(n) \in \mathsf{Exp}}{
  n \in \mathbb N
}
\quad \infer{\mathsf{Var}(x) \in \mathsf{Exp}}{
  x \in \mathsf{Strings}
}
\quad \infer{\mathsf{Plus}(e_1, e_2) \in \mathsf{Exp}}{
  e_1 \in \mathsf{Exp}
  & e_2 \in \mathsf{Exp}
}
\quad \infer{\mathsf{Times}(e_1, e_2) \in \mathsf{Exp}}{
  e_1 \in \mathsf{Exp}
  & e_2 \in \mathsf{Exp}
}$$

Actually, semanticists get tired of writing such verbose descriptions, so proofs on paper tend to use exactly the sort of notation that we associated with concrete syntax.
The trick is mental desugaring of the concrete-syntax notation into abstract syntax!
We will generally not dwell on the particularities of that process.
Instead, we repeatedly illustrate it by example, using Coq code that starts with abstract syntax, accompanied by \LaTeX{}-based ``code'' in this book that applies concrete syntax freely.

Abstract syntax is handy for writing \emph{recursive definitions}\index{recursive definition} of functions.
Here is one in the clausal\index{clausal function definition} style of Haskell\index{Haskell}.
\begin{eqnarray*}
  \mathsf{size}(\mathsf{Const}(n)) &=& 1 \\
  \mathsf{size}(\mathsf{Var}(x)) &=& 1 \\
  \mathsf{size}(\mathsf{Plus}(e_1, e_2)) &=& 1 + \mathsf{size}(e_1) + \mathsf{size}(e_2) \\
  \mathsf{size}(\mathsf{Times}(e_1, e_2)) &=& 1 + \mathsf{size}(e_1) + \mathsf{size}(e_2)
\end{eqnarray*}

It is important that we include \emph{one clause per constructor of the inductive type}.
Otherwise, the function would not be \emph{total}\index{total function}.
We also need to be careful to ensure \emph{termination}\index{termination of recursive definitions}, by making recursive calls only on the arguments of the constructors.
This termination criterion, adopted by Coq, is called \emph{primitive recursion}\index{primitive recursion}.

\newcommand{\size}[1]{{\left \lvert #1 \right \rvert}}

It is also common to associate a recursive definition with a new notation.
For example, we might prefer to write $\size{e}$ for $\mathsf{size}(e)$, as follows.
\begin{eqnarray*}
  \size{\mathsf{Const}(n)} &=& 1 \\
  \size{\mathsf{Var}(x)} &=& 1 \\
  \size{\mathsf{Plus}(e_1, e_2)} &=& 1 + \size{e_1} + \size{e_2} \\
  \size{\mathsf{Times}(e_1, e_2)} &=& 1 + \size{e_1} + \size{e_2}
\end{eqnarray*}

\newcommand{\depth}[1]{{\left \lceil #1 \right \rceil}}

Let's continue to exercise our creative license and write $\depth{e}$ for the \emph{depth} of $e$, that is, the length of the longest downward path from the syntax-tree root to any leaf.
\begin{eqnarray*}
  \depth{\mathsf{Const}(n)} &=& 1 \\
  \depth{\mathsf{Var}(x)} &=& 1 \\
  \depth{\mathsf{Plus}(e_1, e_2)} &=& 1 + \max(\depth{e_1}, \depth{e_2}) \\
  \depth{\mathsf{Times}(e_1, e_2)} &=& 1 + \max(\depth{e_1}, \depth{e_2})
\end{eqnarray*}


\section{Structural Induction Principles}

The main reason to prefer abstract syntax is that, while strings of text \emph{seem} natural and simple to our human brains, they are really a lot of trouble to treat in complete formality.
Inductive trees are much nicer to manipulate.
Considering the name, it's probably not surprising that the main thing we want to do on them is \emph{induction}\index{induction}, an activity most familiar in the form of \emph{mathematical induction}\index{mathematical induction} over the natural numbers.
In this book, we will not dwell on many proofs about natural numbers, instead presenting the more general and powerful idea of \emph{structural induction}\index{structural induction} that subsumes mathematical induction in a formal sense, based on viewing the natural numbers as one simple inductively defined set.

There is a general recipe to go from an inductive definition to its associated induction principle.
When we define set $S$ inductively, we gain an induction principle for proving that some predicate $P$ holds for all elements of $S$.
To make this conclusion, we must discharge one proof obligation per rule of the inductive definition.
Recall our last rule-based definition above, for the abstract syntax of $\mathsf{Exp}$.
To derive an $\mathsf{Exp}$ structural induction principle, we produce a new set of rules, cloning each rule with two key modifications:
\begin{enumerate}
  \item Replace each conclusion, of the form $E \in S$, with a conclusion $P(E)$.  That is, the obligations involve \emph{showing} that $P$ holds of certain terms.
  \item For each premise $E \in S$, add a companion premise $P(E)$.  That is, the obligation allows \emph{assuming} that $P$ holds of certain terms.  Each such assumption is called an \emph{inductive hypothesis}\index{inductive hypothesis} (\emph{IH}\index{IH}).
\end{enumerate}

That mechanical procedure derives the following four proof obligations, associated with an inductive proof that $\forall x \in \mathsf{Exp}. \; P(x)$.
$$\infer{P(\mathsf{Const}(n))}{
  n \in \mathbb N
}
\quad \infer{P(\mathsf{Var}(x))}{
  x \in \mathsf{Strings}
}$$
$$\quad \infer{P(\mathsf{Plus}(e_1, e_2))}{
  e_1 \in \mathsf{Exp}
  & P(e_1)
  & e_2 \in \mathsf{Exp}
  & P(e_2)
}
\quad \infer{P(\mathsf{Times}(e_1, e_2))}{
  e_1 \in \mathsf{Exp}
  & P(e_1)
  & e_2 \in \mathsf{Exp}
  & P(e_2)
}$$

In other words, to establish $\forall x \in \mathsf{Exp}. \; P(x)$, we need to prove that each of these inference rules is valid.

To see induction in action, we prove a theorem giving a sanity check on our two recursive definitions from earlier: depth can never exceed size.
\begin{theorem}
  For all $e \in \mathsf{Exp}$, $\depth{e} \leq \size{e}$.
\end{theorem}
\begin{proof}
  By induction on the structure of $e$.
\end{proof}

That sort of minimalist proof often surprises and frustrates newcomers.
Our position here is that proof checking is an activity fit for machines, not people, so we will leave out gory details, which are to be found in the accompanying Coq code, for this theorem and many others associated with this chapter.
Actually, even published proofs on paper tend to use ``proofs'' as brief as the one above, relying on the reader's experience to ``fill in the blanks''!
Unsurprisingly, fairly often there are logical errors in such arguments, leading to acceptance of bogus theorems.
For that reason, we stick to machine-checked proofs here, using the book chapters to introduce concepts, reasoning principles, and statements of key theorems and lemmas.

\section{\label{decidable}Decidable Theories}

We do, however, need to get all the proof details filled in somehow.
One of the most convenient cases is when a proof goal fits into some \emph{decidable theory}\index{decidable theory}.
We follow the sense from computability theory\index{computability theory}, where we consider some \emph{decision problem}\index{decision problem}, as a (usually infinite) set $F$ of formulas and some subset $T \subseteq F$ of \emph{true} formulas, possibly considering only those provable using some limited set of inference rules.
The decision problem is \emph{decidable} if and only if there exists some always-terminating program that, when passed some $f \in F$ as input, returns ``true'' if and only if $f \in T$.
Decidability of theories is handy because, whenever our goal belongs to the $F$ set of a decidable theory, we can discharge the goal automatically by running the deciding program that must exist.

One common decidable theory is \emph{linear arithmetic}\index{linear arithmetic}, whose $F$ set is generated by the following grammar as $\phi$.
$$\begin{array}{rrcl}
  \textrm{Constants} & n &\in& \mathbb Z \\
  \textrm{Variables} & x &\in& \mathsf{Strings} \\
  \textrm{Terms} & e &::=& x \mid n \mid e + e \mid e - e \\
  \textrm{Propositions} & \phi &::=& e = e \mid e < e \mid \neg \phi \mid \phi \land \phi
\end{array}$$

The arithmetic terms used here are \emph{linear} in the same sense as \emph{linear algebra}\index{linear algebra}: we never multiply together two terms containing variables.
Actually, multiplication is prohibited outright, but we allow multiplication by a constant as an abbreviation (logically speaking) for repeated addition.
Propositions are formed out of equality and less-than tests on terms, and we also have the Boolean negation (``not'') operator $\neg$ and conjunction (``and'') operator $\land$.
This set of propositional\index{propositional logic} operators is enough to encode the other usual inequality and propositional operators, so we allow them, too, as convenient shorthands.

Using decidable theories in a proof assistant like Coq, it is important to understand how a theory may apply to formulas that don't actually satisfy its grammar literally.
For instance, we may want to prove $f(x) - f(x) = 0$, for some fancy function $f$ well outside the grammar above.
However, we only need to introduce a new variable $y$, defined with the equation $y = f(x)$, to arrive at a new goal $y - y = 0$.
A linear-arithmetic procedure makes short work of this goal, and we may then derive the original goal by substituting back in for $y$.
Coq's tactics based on decidable theories do all that hard work for us.

\medskip

Another important decidable theory is of \emph{equality with uninterpreted functions}\index{theory of equality with uninterpreted functions}.
$$\begin{array}{rrcl}
  \textrm{Variables} & x &\in& \mathsf{Strings} \\
  \textrm{Functions} & f &\in& \mathsf{Strings} \\
  \textrm{Terms} & e &::=& x \mid f(e, \ldots, e) \\
  \textrm{Propositions} & \phi &::=& e = e \mid \neg \phi \mid \phi \land \phi
\end{array}$$

In this theory, we know nothing about the detailed properties of the variables or functions that we use.
Instead, we must reason solely from the basic properties of equality:
$$\infer[\mathsf{Reflexivity}]{e = e}{}
\quad \infer[\mathsf{Symmetry}]{e_1 = e_2}{
  e_2 = e_1
}
\quad \infer[\mathsf{Transitivity}]{e_1 = e_2}{
  e_1 = e_3
  & e_3 = e_2
}$$
$$\infer[\mathsf{Congruence}]{f(e_1, \ldots, e_n) = f'(e'_1, \ldots, e'_n)}{
  f = f'
  & e_1 = e'_1
  & \ldots
  & e_n = e'_n
}$$

\medskip

As one more example of a decidable theory, consider the algebraic structure of \emph{semirings}\index{semirings}, which may profitably be remembered as ``types that act like natural numbers.''
A semiring is any set containing two elements notated 0 and 1, closed under two binary operators notated $+$ and $\times$.
The notations are suggestive, but in fact we have free reign in choosing the set, elements, and operators, so long as the following axioms\footnote{The equations are taken almost literally from \url{https://en.wikipedia.org/wiki/Semiring}.} are satisfied:
\begin{eqnarray*}
  (a + b) + c &=& a + (b + c) \\
  0 + a &=& a \\
  a + 0 &=& a \\
  a + b &=& b + a \\
  (a \times b) \times c &=& a \times (b \times c) \\
  1 \times a &=& a \\
  a \times 1 &=& a \\
  a \times (b + c) &=& (a \times b) + (a \times c) \\
  (a + b) \times c &=& (a \times c) + (b \times c) \\
  0 \times a &=& 0 \\
  a \times 0 &=& 0
\end{eqnarray*}

The formal theory is then as follows, where we consider as ``true'' only those equalities that follow from the axioms.
$$\begin{array}{rrcl}
  \textrm{Variables} & x &\in& \mathsf{Strings} \\
  \textrm{Terms} & e &::=& 0 \mid 1 \mid x \mid e + e \mid e \times e \\
  \textrm{Propositions} & \phi &::=& e = e
\end{array}$$

Note how the applicability of the semiring theory is incomparable to the applicability of the linear-arithmetic theory.
That is, while some goals are provable via either, some are provable only via the semiring theory and some provable only by linear arithmetic.
For instance, by the semiring theory, we can prove $x(y + z) = xy + xz$, while linear arithmetic can prove $x - x = 0$.

\section{Simplification and Rewriting}

While we leave most proof details to the accompanying Coq code, it does seem important to introduce two key principles that are often implicit in proofs on paper.

The first is \emph{algebraic simplification}\index{algebraic simplification}, where we apply the defining equations of a recursive definition to simplify a goal.
For example, recall that our definition of expression size included this clause.
\begin{eqnarray*}
  \size{\mathsf{Plus}(e_1, e_2)} &=& 1 + \size{e_1} + \size{e_2}
\end{eqnarray*}
Now imagine that we are trying to prove this formula.
$$\size{\mathsf{Plus}(e, \mathsf{Const}(7))} = 2 + \size{e}$$
We may apply the defining equation to rewrite into a different formula, where we have essentially pushed the definition of $\size{\cdot}$ through the $\mathsf{Plus}$.
$$1 + \size{e} + \size{\mathsf{Const}(7)} = 2 + \size{e}$$
Another application of a different defining equation, this time for $\mathsf{Const}$, takes us to here.
$$1 + \size{e} + 1 = 2 + \size{e}$$
From here, the goal follows by linear arithmetic.

\medskip

Such a proof establishes a theorem $\forall e \in \mathsf{Exp}. \; \size{\mathsf{Plus}(e, \mathsf{Const}(7))} = 2 + \size{e}$.
We may use already-proved theorems via a more general \emph{rewriting}\index{rewriting} mechanism, applying whenever we know some quantified equality.
Within a new goal we are proving, we find some subterm that matches the lefthand side of that equality, after we choose the proper values of the quantified variables.
The process of finding those values automatically is called \emph{unification}\index{unification}.
Rewriting enables us to take the subterm we found and replace it with the righthand side of the equation.

As an example, assume that, for some $P$, we know $P(2 + \size{\mathsf{Var}(x)})$ and are trying to prove $P(\size{\mathsf{Plus}(\mathsf{Var}(x), \mathsf{Const}(7))})$.
We may use our earlier fact to rewrite the argument of $P$ in what we are trying to show, so that it now matches the argument from what we already know, at which point the proof is trivial to finish.
Here, unification found the assignment $e = \mathsf{Var}(x)$.

\medskip

\encoding
\label{metalanguage}
We close the chapter with an important note on terminology.
A formula like $P(\size{\mathsf{Plus}(\mathsf{Var}(x), \mathsf{Const}(7))})$ combines several levels of notation.
We consider that we are doing our mathematical reasoning in some \emph{metalanguage}\index{metalanguage}, which is often applicable to a wide variety of proof tasks.
We also happen to be applying it here to reason about some \emph{object language}\index{object language}, a programming language whose syntax is defined formally, here the language of arithmetic expressions.
We have $x$ as a variable of the metalanguage, while $\mathsf{Var}(x)$ is a variable expression of the object language.
It is difficult to use English to explain the distinction between the two in complete formality, but be on the lookout for places where formulas mix concepts of the metalanguage and object language!
The general patterns should soon become clear, as they are somehow already familiar to us from natural-language sentences like:
\begin{quote}
  The wise man said, ``it is time to prove some theorems.''
\end{quote}
The quoted remark could just as well be in Spanish instead of English, in which case we have two languages nested in a nontrivial way.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Data Abstraction}\label{adt}

All of the fully formal proofs in this book are worked out only in associated Coq code.
Therefore, before proceeding to more topics in program semantics and proof, it is important to develop some basic Coq competence.
Several heavily commented examples files are associated with this crucial point in the book.
We won't discuss details of Coq proving in this document, outside Appendix \ref{coqref}.
However, one of the possibilities shown off in the Coq code is worth drawing attention to, as a celebrated semantics idea in its own right, though we don't yet connect it to formalized syntax of programming languages.
That idea is \emph{data abstraction}\index{data abstraction}, one of the most central ideas in program structuring.
Let's consider the mathematical meaning of \emph{encapsulation}\index{encapsulation} in data structures.

\section{Algebraic Interfaces for Abstract Data Types}

\newcommand{\mt}[1]{\mathsf{#1}}

Consider the humble queue\index{queues}, a classic data structure that allows us to enqueue data elements and then dequeue them in the order received.
Perhaps surprisingly, there is already some complexity in efficient queue implementation.
So-called \emph{client code}\index{client code} that relies on queues shouldn't need to know about that complexity, though.
We should be able to formulate ``queue'' as an \emph{abstract data type}\index{abstract data type}, hiding implementation details.
In the setting of pure functional programming, as in Coq, here is our first cut at such a data type, as a set of types and operations, somewhat reminiscent of e.g. interfaces\index{interface} in Java\index{Java}.
Type $\mt{t}(\alpha)$ stands for queues holding data values in some type $\alpha$.

\begin{eqnarray*}
  \mt{t}(\alpha) &:& \mt{Set} \\
  \mt{empty} &:& \mt{t}(\alpha) \\
  \mt{enqueue} &:& \mt{t}(\alpha) \times \alpha \to \mt{t}(\alpha) \\
  \mt{dequeue} &:& \mt{t}(\alpha) \rightharpoonup \mt{t}(\alpha) \times \alpha
\end{eqnarray*}

A few notational conventions of note:
We declare that $\mt{t}(\alpha)$ is a type by assigning it the type $\mt{Set}$, which itself contains all the normal types of programming.
An empty queue exists for any $\alpha$, and enqueue and dequeue operations are also available for any $\alpha$.
The type of $\mt{dequeue}$ indicates function partiality\index{partial function} by the arrow $\rightharpoonup$: dequeuing yields no answer for an empty queue.
For partial function $f : A \rightharpoonup B$, we indicate lack of a mapping for $x \in A$ by writing $f(x) = \cdot$.

In normal programming, we stop at this level of detail in defining an abstract data type.
However, when we're after formal correctness proofs, we must enrich data types with \emph{specifications}\index{specifications} or ``specs.''\index{specs}  
One prominent spec style is \emph{algebraic}\index{algebraic specifications}: write out a set of \emph{laws}, quantified equalities that use the operations of the data type.
For queues, here are two reasonable laws.

$$\begin{array}{l}
  \mt{dequeue}(\mt{empty}) = \cdot \\
  \forall q. \; \mt{dequeue}(q) = \cdot \Rightarrow q = \mt{empty} \\
\end{array}$$

Actually, the inference-rule notation from last chapter also makes algebraic laws more readable, so here is a restatement.

$$\infer{\mt{dequeue}(\mt{empty}) = \cdot}{}
\quad \infer{q = \mt{empty}}{\mt{dequeue}(q) = \cdot}$$

One more rule suffices to give a complete characterization of behavior, with the familiar math notation for piecewise functions\index{piecewise functions}.

$$\infer{\mt{dequeue}(\mt{enqueue}(q, x)) = \begin{cases}
    (\mt{empty}, x), & \mt{dequeue}(q) = \cdot \\
    (\mt{enqueue}(q', x), y), & \mt{dequeue}(q) = (q', y)
  \end{cases}}{}$$

\newcommand{\concat}[2]{#1 \bowtie #2}

Now several implementations of this functionality are possible.
Here's one of the two ``obvious'' ones, where we enqueue to list fronts and dequeue from list backs.
We write $\mt{list}(\alpha)$ for the type of lists\index{lists} with data elements from $\alpha$, with $\concat{\ell_1}{\ell_2}$ for concatenation of lists $\ell_1$ and $\ell_2$, and with comma-separated lists inside square brackets for list literals.
\begin{eqnarray*}
  \mt{t(\alpha)} &=& \mt{list}(\alpha) \\
  \mt{empty} &=& [] \\
  \mt{enqueue}(q, x) &=& \concat{[x]}{q} \\
  \mt{dequeue}([]) &=& \cdot \\
  \mt{dequeue}(\concat{[x]}{q}) &=& ([], x)\textrm{, when $\mt{dequeue}(q) = \cdot$.} \\
  \mt{dequeue}(\concat{[x]}{q}) &=& (\concat{[x]}{q'}, y)\textrm{, when $\mt{dequeue}(q) = (q', y)$.}
\end{eqnarray*}

There is also a dual implementation where we enqueue to list backs and dequeue from list fronts.
\begin{eqnarray*}
  \mt{t(\alpha)} &=& \mt{list}(\alpha) \\
  \mt{empty} &=& [] \\
  \mt{enqueue}(q, x) &=& \concat{q}{[x]} \\
  \mt{dequeue}([]) &=& \cdot \\
  \mt{dequeue}(\concat{[x]}{q}) &=& (q, x)
\end{eqnarray*}

Proofs of the algebraic laws, for both implementations, appear in the associated Coq code.
Both versions actually take quadratic time in practice, assuming concatenation takes time linear in the length of its first argument.
There is a famous, more clever implementation that achieves amortized\index{amortized time} constant time (linear time to run a whole sequence of operations), but we will need to expand our algebraic style to accommodate it.


\section{Algebraic Interfaces with Custom Equivalence Relations}

We find it useful to extend the base interface of queues with a new, mathematical ``operation'':
\begin{eqnarray*}
  \mt{t}(\alpha) &:& \mt{Set} \\
  \mt{empty} &:& \mt{t}(\alpha) \\
  \mt{enqueue} &:& \mt{t}(\alpha) \times \alpha \to \mt{t}(\alpha) \\
  \mt{dequeue} &:& \mt{t}(\alpha) \rightharpoonup \mt{t}(\alpha) \times \alpha \\
  \mt{\approx} &:& \mathcal P(\mt{t}(\alpha) \times \mt{t}(\alpha))
\end{eqnarray*}

We use the ``powerset''\index{powerset} operation $\mathcal P$ to indicate that $\approx$ is a \emph{binary relation}\index{binary relation} over queues (of the same type).
Our intention is that $\approx$ be an \emph{equivalence relation}, as formalized by the following laws that we add.
$$\infer[\mathsf{Reflexivity}]{a \approx a}{}
\quad \infer[\mathsf{Symmetry}]{a \approx b}{b \approx a}
\quad \infer[\mathsf{Transitivity}]{a \approx c}{a \approx b & b \approx c}$$

Now we rewrite the original laws to use $\approx$ instead of equality.
We implicitly lift $\approx$ to apply to results of the partial function $\mt{dequeue}$: nonexistent results $\cdot$ are related, and existent results $(q_1, x_1)$ and $(q_2, x_2)$ are related iff $q_1 \approx q_2$ and $x_1 = x_2$.
$$\infer{\mt{dequeue}(\mt{empty}) = \cdot}{}
\quad \infer{q \approx \mt{empty}}{\mt{dequeue}(q) = \cdot}$$

$$\infer{\mt{dequeue}(\mt{enqueue}(q, x)) \approx \begin{cases}
    (\mt{empty}, x), & \mt{dequeue}(q) = \cdot \\
    (\mt{enqueue}(q', x), y), & \mt{dequeue}(q) = (q', y)
  \end{cases}}{}$$

What's the payoff from this reformulation?
Well, first, it passes the sanity check that the two queue implementations from the last section comply, with $\approx$ instantiated as simple equality.
However, we may now also handle the classic \emph{two-stack queue}\index{two-stack queue}.
Here is its implementation, relying on list-reversal function $\mt{rev}$ (which takes linear time).
\begin{eqnarray*}
  \mt{t(\alpha)} &=& \mt{list}(\alpha) \times \mt{list}(\alpha) \\
  \mt{empty} &=& ([], []) \\
  \mt{enqueue}((\ell_1, \ell_2), x) &=& (\concat{[x]}{\ell_1}, \ell_2) \\
  \mt{dequeue}(([], [])) &=& \cdot \\
  \mt{dequeue}((\ell_1, \concat{[x]}{\ell_2})) &=& ((\ell_1, \ell_2), x) \\
  \mt{dequeue}((\ell_1, [])) &=& (([], q'_1), x)\textrm{, when $\mt{rev}(\ell_1) = \concat{[x]}{q'_1}$.}
\end{eqnarray*}

The basic trick is to encode a queue as a pair of lists $(\ell_1, \ell_2)$.
We try to enqueue into $\ell_1$ by adding elements to its front in constant time, and we try to dequeue from $\ell_2$ by removing elements from its front in constant time.
However, sometimes we run out of elements in $\ell_2$ and need to \emph{reverse} $\ell_1$ and transfer the result into $\ell_2$.
The suitable equivalence relation formalizes this plan.
\begin{eqnarray*}
  \mt{rep}((\ell_1, \ell_2)) &=& \concat{\ell_1}{\mt{rev}(\ell_2)} \\
  q_1 \approx q_2 &=& \mt{rep}(q_1) = \mt{rep}(q_2)
\end{eqnarray*}

We can prove both that this $\approx$ is an equivalence relation and that the other queue laws are satisfied.
As a result, client code (and its correctness proofs) can use this fancy code, effectively viewing it as a simple queue, with the two-stack nature hidden.

Why did we need to go through the trouble of introducing custom equivalence relations?
Consider the following two queues.
Are they equal?
(We write $\pi_1$ for the function that projects out the first element of a pair.)
\begin{eqnarray*}
  \mt{enqueue}(\mt{empty}, 2) &\stackrel{?}{=}& \pi_1(\mt{dequeue}(\mt{enqueue}(\mt{enqueue}(\mt{empty}, 1), 2)))
\end{eqnarray*}

No, they aren't equal!  The first expression reduces to $([2], [])$, while the second reduces to $([], [2])$.
This data structure is \emph{noncanonical}\index{noncanonical}, in the sense that the same logical value may have multiple physical representations.
The equivalence relation lets us indicate which physical representations are equivalent.

\section{Representation Functions}

That last choice of equivalence relations suggests another specification style, based on \emph{representation functions}\index{representation functions}.
We can force every queue to include a function to convert to a standard, canonical representation.
Real executable programs shouldn't generally call that function; it's most useful to us in phrasing the algebraic laws.
Perhaps surprisingly, the mere existence of any compatible function is enough to show correctness of a queue implementation, and the approach generalizes to essentially all other data structures cast as abstract data types.

Here is how we revise our type signature for queues.
\begin{eqnarray*}
  \mt{t}(\alpha) &:& \mt{Set} \\
  \mt{empty} &:& \mt{t}(\alpha) \\
  \mt{enqueue} &:& \mt{t}(\alpha) \times \alpha \to \mt{t}(\alpha) \\
  \mt{dequeue} &:& \mt{t}(\alpha) \rightharpoonup \mt{t}(\alpha) \times \alpha \\
  \mt{rep} &:& \mt{t}(\alpha) \to \mt{list}(\alpha)
\end{eqnarray*}

And here are the revised axioms.

$$\infer{\mt{rep}(\mt{empty}) = []}{}
\quad \infer{\mt{rep}(\mt{enqueue}(q, x)) = \concat{[x]}{\mt{rep}(q)}}{}$$

$$\infer{\mt{dequeue}(q) = \cdot}{\mt{rep}(q) = []}
\quad \infer{\exists q'. \; \mt{dequeue}(q) = (q', x) \land \mt{rep}(q') = \ell}{\mt{rep}(q) = \concat{\ell}{[x]}}$$

Notice that this specification style can also be viewed as \emph{giving a reference implementation\index{reference implementations of data types} of the data type}, where $\mt{rep}$ shows how to convert back to the reference implementation at any point.

\section{Fixing Parameter Types for Abstract Data Types}

Here's another classic abstract data type: finite sets\index{finite sets}, where we write $\mathbb B$ for the set of Booleans.
\begin{eqnarray*}
  \mt{t}(\alpha) &:& \mt{Set} \\
  \mt{empty} &:& \mt{t}(\alpha) \\
  \mt{add} &:& \mt{t}(\alpha) \times \alpha \to \mt{t}(\alpha) \\
  \mt{member} &:& \mt{t}(\alpha) \times \alpha \to \mathbb B
\end{eqnarray*}

A few laws characterize expected behavior, with $\top$ and $\bot$ the respective elements ``true'' and ``false'' of $\mathbb B$.

$$\infer{\mt{member}(\mt{empty}, k) = \bot}{}
\quad \infer{\mt{member}(\mt{add}(s, k), k) = \top}{}
\quad \infer{\mt{member}(\mt{add}(s, k_1), k_2) = \mt{member}(s, k_2)}{k_1 \neq k_2}$$

There is a simple generic implementation of this data type with unsorted lists.
\begin{eqnarray*}
  \mt{t} &=& \mt{list} \\
  \mt{empty} &=& [] \\
  \mt{add}(s, k) &=& \concat{[k]}{s} \\
  \mt{member}([], k) &=& \bot \\
  \mt{member}(\concat{[k']}{s}, k) &=& k = k' \lor \mt{member}(s, k)
\end{eqnarray*}

However, we can build specialized finite sets for particular element types and usage patterns.
For instance, assume we are working with sets of natural numbers, where we know that most sets contain consecutive numbers.
In those cases, it suffices to store just the lowest and highest elements of sets, and all the set operations run in constant time.
Assume a fallback implementation of finite sets, with type $t_0$ and operations $\mt{empty}_0$, $\mt{add}_0$, and $\mt{member}_0$.
We implement our optimized set type like so, assuming an operation $\mt{fromRange} : \mathbb N \times \mathbb N \to \mt{t}_0$ to turn a range into an ad-hoc set.
\begin{eqnarray*}
  \mt{t} &=& \mt{Empty} \mid \mt{Range}(\mathbb N \times \mathbb N) \mid \mt{AdHoc}(\mt{t}_0) \\
  \mt{empty} &=& \mt{Empty} \\
  \mt{add}(\mt{Empty}, k) &=& \mt{Range}(k, k) \\
  \mt{add}(\mt{Range}(n_1, n_2), k) &=& \mt{Range}(n_1, n_2)\textrm{, when $n_1 \leq k \leq n_2$} \\
  \mt{add}(\mt{Range}(n_1, n_2), n_1-1) &=& \mt{Range}(n_1-1, n_2)\textrm{, when $n_1 \leq n_2$} \\
  \mt{add}(\mt{Range}(n_1, n_2), n_2+1) &=& \mt{Range}(n_1, n_2+1)\textrm{, when $n_1 \leq n_2$} \\
  \mt{add}(\mt{Range}(n_1, n_2), k) &=& \mt{AdHoc}(\mt{add}_0(\mt{fromRange}(n_1, n_2), k))\textrm{, otherwise} \\
  \mt{add}(\mt{AdHoc}(s), k) &=& \mt{AdHoc}(\mt{add}_0(s, k)) \\
  \mt{member}(\mt{Empty}, k) &=& \bot \\
  \mt{member}(\mt{Range}(n_1, n_2), k) &=& n_1 \leq k \leq n_2 \\
  \mt{member}(\mt{AdHoc}(s), k) &=& \mt{member}_0(s, k)
\end{eqnarray*}

This implementation can be proven to satisfy the finite-set spec, assuming that the baseline ad-hoc implementation does, too.
For workloads that only build sets of consecutive numbers, this implementation can be much faster than the generic list-based implementation, converting quadratic-time algorithms into linear-time.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{\label{interpreters}Semantics via Interpreters}

That's enough about what programs \emph{look like}.
Let's shift our attention to what programs \emph{mean}.

\section{Semantics for Arithmetic Expressions via Finite Maps}

\newcommand{\mempty}[0]{\bullet}
\newcommand{\msel}[2]{#1(#2)}
\newcommand{\mupd}[3]{#1[#2 \mapsto #3]}

To explain the meaning of one of Chapter \ref{syntax}'s arithmetic expressions, we need a way to indicate the value of each variable.
\encoding
A theory of \emph{finite maps}\index{finite map} is helpful here.
We apply the following notations throughout the book: \\

\begin{tabular}{rl}
  $\mempty$ & empty map, with $\emptyset$ as its domain \\
  $\msel{m}{k}$ & mapping of key $k$ in map $m$ \\
  $\mupd{m}{k}{v}$ & extension of map $m$ to also map key $k$ to value $v$
\end{tabular} \\

As the name advertises, finite maps are functions with finite domains, where the domain may be expanded by each extension operation.
Two axioms explain the essential interactions of the basic operators.

$$\infer{\msel{\mupd{m}{k}{v}}{k} = v}{}
\quad
\infer{\msel{\mupd{m}{k_1}{v}}{k_2} = m(k_2)}{
  k_1 \neq k_2
}$$

\newcommand{\denote}[1]{{\left \llbracket #1 \right \rrbracket}}

With these operators in hand, we can write a semantics for arithmetic expressions.
This is a recursive function that \emph{maps variable valuations to numbers}.
We write $\denote{e}$ for the meaning of $e$; this notation is often referred to as \emph{Oxford brackets}\index{Oxford brackets}.
Recall that we allow notations like this as syntactic sugar for arbitrary functions, even when giving the equations that define those functions.
We write $v$ for a valuation (finite map).
\encoding
\begin{eqnarray*}
  \denote{n}v &=& n \\
  \denote{x}v &=& v(x) \\
  \denote{e_1 + e_2}v &=& \denote{e_1}v + \denote{e_2}v \\
  \denote{e_1 \times e_2}v &=& \denote{e_1}v \times \denote{e_2}v
\end{eqnarray*}

Note how parts of the definition feel a little bit like cheating, as we just ``push notations inside the brackets.''
It's important to remember that plus \emph{inside} the brackets is syntax, while plus \emph{outside} the brackets is the normal addition of math!

\newcommand{\subst}[3]{[#3/#2]#1}

To test our semantics, we define a \emph{variable substitution} function\index{substitution}.
A substitution $\subst{e}{x}{e'}$ stands for the result of running through the syntax of $e$,
replacing every occurrence of variable $x$ with expression $e'$.

\begin{eqnarray*}
  \subst{n}{x}{e} &=& n \\
  \subst{x}{x}{e} &=& e \\
  \subst{y}{x}{e} &=& y \textrm{, when $y \neq x$} \\
  \subst{(e_1 + e_2)}{x}{e} &=& \subst{e_1}{x}{e} + \subst{e_2}{x}{e} \\
  \subst{(e_1 \times e_2)}{x}{e} &=& \subst{e_1}{x}{e} \times \subst{e_2}{x}{e}
\end{eqnarray*}

We can prove a key compatibility property of these two recursive functions.

\begin{theorem}
  For all $e$, $e'$, $x$, and $v$, $\denote{\subst{e}{x}{e'}}{v} = \denote{e}{(\mupd{v}{x}{\denote{e'}{v}})}$.
\end{theorem}

That is, in some sense, the operations of interpretation and substitution \emph{commute} with each other.
That intuition gives rise to the common notion of a \emph{commuting diagram}\index{commuting diagram}, like the one below for this particular example.

\[
\begin{tikzcd}
(e, v) \arrow{r}{\subst{\ldots}{x}{e'}} \arrow{d}{\mupd{\ldots}{x}{\denote{e'}v}} & (\subst{e}{x}{e'}, v) \arrow{d}{\denote{\ldots}} \\
(e, \mupd{v}{x}{\denote{e'}v}) \arrow{r}{\denote{\ldots}} & \denote{\subst{e}{x}{e'}}v
\end{tikzcd}
\]

We start at the top left, with a given expression $e$ and valuation $v$.
The diagram shows the equivalence of \emph{two different paths} to the bottom right.
Each individual arrow is labeled with some description of the transformation it performs, to get from the term at its source to the term at its destination.
The right-then-down path is based on substituting and then interpreting, while the down-then-right path is based on extending the valuation and then interpreting.
Since both paths wind up at the same spot, the diagram indicates an equality between the corresponding terms.

It's a matter of taste whether the theorem statement or the diagram expresses the property more clearly!

\section{A Stack Machine}

As an example of a very different language, consider a \emph{stack machine}\index{stack machine}, similar at some level to, for instance, the Forth\index{Forth} programming language, or to various postfix\index{postfix} calculators.
\encoding
$$\begin{array}{rrcl}
  \textrm{Instructions} & i &::=& \mathsf{PushConst}(n) \mid \mathsf{PushVar}(x) \mid \mathsf{Add} \mid \mathsf{Multiply} \\
  \textrm{Programs} & \overline{i} &::=& \cdot \mid i; \overline{i}
\end{array}$$

Though here we defined an explicit grammar for programs, which are just sequences of instructions, in general we'll use the notation $\overline{X}$ to stand for sequences of $X$'s, and the associated concrete syntax won't be so important.
We also freely use single instructions to stand for programs, writing just $i$ in place of $i; \cdot$.

\newcommand{\push}[2]{#1 \rhd #2}

Each instruction of this language transforms a \emph{stack}\index{stack}, a last-in-first-out list of numbers.
Rather than spend more words on it, here is an interpreter that makes everything precise.
Here and elsewhere, we overload the Oxford brackets $\denote{\ldots}$ shamelessly, where context makes clear which language or interpreter we are dealing with.
We write $s$ for stacks, and we write $\push{n}{s}$ for pushing number $n$ onto the top of stack $s$.

\encoding
\begin{eqnarray*}
  \denote{\mathsf{PushConst}(n)}(v,s) &=& \push{n}{s} \\
  \denote{\mathsf{PushVar}(x)}(v,s) &=& \push{\msel{v}{x}}{s} \\
  \denote{\mathsf{Add}}(v,\push{n_2}{\push{n_1}{s}}) &=& \push{(n_1 + n_2)}{s} \\
  \denote{\mathsf{Multiply}}(v,\push{n_2}{\push{n_1}{s}}) &=& \push{(n_1 \times n_2)}{s}
\end{eqnarray*}

The last two cases require the stack have at least a certain height.
Here we'll ignore what happens when the stack is too short, though it suffices, for our purposes, to add pretty much any default behavior for the missing cases.
We overload $\denote{\overline{i}}$ to refer to the \emph{composition} of the interpretations of the different instructions within $\overline{i}$, in order.

Next, we give our first example of what might be called a \emph{compiler}\index{compiler}, or a translation from one language to another.
Let's compile arithmetic expressions into stack programs, which then become easy to map onto the instructions of common assembly languages.
In that sense, with this translation, we make progress toward efficient implementation on commodity hardware.

\newcommand{\compile}[1]{{\left \lfloor #1 \right \rfloor}}

Throughout this book, we will use notation $\compile{\ldots}$ for compilation, where the floor-based notation suggests \emph{moving downward} to a lower abstraction level.
Here is the compiler that concerns us now, where we write $\concat{\overline{i_1}}{\overline{i_2}}$ for concatenation of two instruction sequences $\overline{i_1}$ and $\overline{i_2}$.
\encoding
\begin{eqnarray*}
  \compile{n} &=& \mathsf{PushConst}(n) \\
  \compile{x} &=& \mathsf{PushVar}(x) \\
  \compile{e_1 + e_2} &=& \concat{\compile{e_1}}{\concat{\compile{e_2}}{\mathsf{Add}}} \\
  \compile{e_1 \times e_2} &=& \concat{\compile{e_1}}{\concat{\compile{e_2}}{\mathsf{Multiply}}}
\end{eqnarray*}

The first two cases are straightforward: their compilations just push the obvious values onto the stack.
The binary operators are just slightly more tricky.
Each first evaluates its operands in order, where each operand leaves its final result on the stack.
With both of them in place, we run the instruction to pop them, combine them, and push the result back onto the stack.

The correctness theorem for compilation must refer to both of our interpreters.
From here on, we consider that all unaccounted-for variables in a theorem statement are quantified universally.

\begin{theorem}
  $\denote{\compile{e}}(v, \cdot) = \denote{e}v$.
\end{theorem}

Here's a restatement as a commuting diagram.

\[
\begin{tikzcd}
e \arrow{r}{\compile{\ldots}} \arrow{dr}{\denote{\ldots}} & \compile{e} \arrow{d}{\denote{\ldots}} \\
& \denote{e}
\end{tikzcd}
\]

As usual, we leave proof details for the associated Coq code, but the key insight of the proof is to strengthen the induction hypothesis via a lemma.

\begin{lemma}
  $\denote{\concat{\compile{e}}{\overline{i}}}(v, s) = \denote{\overline{i}}(v, \push{\denote{e}v}{s})$.
\end{lemma}

We strengthen the statement by considering both an arbitrary initial stack $s$ and a sequence of extra instructions $\overline{i}$ to be run after $e$.

\section{A Simple Higher-Level Imperative Language}

\newcommand{\repet}[2]{\mathsf{repeat} \; #1 \; \mathsf{do} \; #2 \; \mathsf{done}}

The interpreter approach to semantics is usually the most convenient one, when it applies.
Coq requires that all programs terminate, and that requirement is effectively also present in informal math, though it is seldom called out with the same terms.
Instead, with math, we worry about whether recursive systems of equations are well-founded, in appropriate senses.
From either perspective, extra encoding tricks are required to write a well-formed interpreter for a Turing-complete\index{Turing-completeness} language.
We will dodge those complexities for now by defining a simple imperative language with bounded loops, where termination is easy to prove.
We take the arithmetic expression language as a base.
\encoding
$$\begin{array}{rrcl}
  \textrm{Command} & c &::=& \mathsf{skip} \mid x \leftarrow e \mid c; c \mid \repet{e}{c}
\end{array}$$

Now the implicit state, read and written by a command, is a variable valuation, as we used in the interpreter for expressions.
A $\mathsf{skip}$ command does nothing, while $x \leftarrow e$ extends the valuation to map $x$ to the value of expression $e$.
We have simple command sequencing $c_1; c_2$, in addition to the bounded loop $\repet{e}{c}$, which executes $c$ a number of times equal to the value of $e$.

\newcommand{\id}[0]{\mathsf{id}}

To give the semantics, we need a few commonplace notations that are worth reviewing.
We write $\id$ for the identity function\index{identity function}, where $\id(x) = x$; and we write $f \circ g$ for composition of functions\index{composition of functions} $f$ and $g$, where $(f \circ g)(x) = f(g(x))$.
We also have iterated self-composition\index{self-composition}, written like \emph{exponentiation} of functions\index{exponentiation of functions} $f^n$, defined as follows.
\begin{eqnarray*}
  f^0 &=& \id \\
  f^{n+1} &=& f^n \circ f
\end{eqnarray*}

From here, $\denote{\ldots}$ is easy to define yet again, as a transformer over variable valuations.
\encoding
\begin{eqnarray*}
  \denote{\mathsf{skip}}v &=& v \\
  \denote{x \leftarrow e}v &=& \mupd{v}{x}{\denote{e}v} \\
  \denote{c_1; c_2}v &=& \denote{c_2}(\denote{c_1}v) \\
  \denote{\repet{e}{c}}v &=& \denote{c}^{\denote{e}v}(v)
\end{eqnarray*}

To put this semantics through a workout, let's consider a simple \emph{optimization}\index{optimization}, a transformation whose input and output programs are in the same language.
There's an additional, fuzzier criterion for an optimization, which is that it should improve the program somehow, usually in terms of running time, memory usage, etc.
The optimization we choose here may be a bit dubious in that respect, though it is related to an optimization found in every serious C\index{C programming language} compiler.

In particular, let's tackle \emph{loop unrolling}\index{loop unrolling}.
When the iteration count of a loop is a constant $n$, we can replace the loop with $n$ sequenced copies of its body.
C compilers need to work harder to find the iteration count of a loop, but luckily our language includes loops with very explicit iteration counts!
To define the transformation, we'll want a recursive function and notation for sequencing of $n$ copies of a command $c$, written $^nc$.
\begin{eqnarray*}
  ^0c &=& \mathsf{skip} \\
  ^{n+1}c &=& c; {^nc}
\end{eqnarray*}

\newcommand{\opt}[1]{{\left | #1 \right |}}

Now the optimization itself is easy to define.
We'll write $\opt{\ldots}$ for this and other optimizations, which move neither down nor up a tower of program abstraction levels.
\encoding
\begin{eqnarray*}
  \opt{\mathsf{skip}} &=& \mathsf{skip} \\
  \opt{x \leftarrow e} &=& x \leftarrow e \\
  \opt{c_1; c_2} &=& \opt{c_1}; \opt{c_2} \\
  \opt{\repet{n}{c}} &=& ^n\opt{c} \\
  \opt{\repet{e}{c}} &=& \repet{e}{\opt{c}}
\end{eqnarray*}

Note that, when multiple defining equations apply to some function input, by convention we apply the \emph{earliest} equation that matches.

Let's prove that this optimization preserves program behavior; that is, we prove that it is \emph{semantics preserving}\index{semantics preservation}.

\begin{theorem}\label{unroll}
  $\denote{\opt{c}}v = \denote{c}v$.
\end{theorem}

It all looks so straightforward from that statement, doesn't it?
Indeed, there actually isn't so much work to do to prove this theorem.
We can also present it as a commuting diagram much like the prior one.

\[
\begin{tikzcd}
c \arrow{r}{\opt{\ldots}} \arrow{dr}{\denote{\ldots}} & \opt{c} \arrow{d}{\denote{\ldots}} \\
& \denote{c}
\end{tikzcd}
\]

The statement of Theorem \ref{unroll} happens to already be in the right form to do induction directly, but we need a helper lemma, capturing the interaction of $^nc$ and the semantics.

\begin{lemma}
  $\denote{^nc} = \denote{c}^n$.
\end{lemma}

Let us end the chapter with the commuting-diagram version of the lemma statement.

\[
\begin{tikzcd}
c \arrow{r}{^n\ldots} \arrow{d}{\denote{\ldots}} & ^nc \arrow{d}{\denote{\ldots}} \\
\denote{c} \arrow{r}{\ldots^n} & \denote{c}^n
\end{tikzcd}
\]


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Inductive Relations and Rule Induction}\label{rule_induction}

We should pause here to consider another crucial mathematical tool that is not in common use outside the study of semantics but which will be essential for almost all language semantics we define from here on.
That tool is similar to the inductive \emph{set} or \emph{type} definitions we met in Chapter \ref{syntax}.
However, now we define \emph{relations}\index{inductive relations} (and \emph{predicates}\index{inductive predicates}, the colloquial name for single-argument relations) inductively.
Let us take some time to work through simple examples before moving on to cases more relevant to semantics.

\section{Finite Sets as Inductive Predicates}

\newcommand{\favs}[1]{\mathsf{favorites}(#1)}

Any finite set is easy to define as a predicate, with a set of inference rules that include no premises.
For instance, say my favorite numbers are 17, 23, and 42.
We can define a predicate $\mathsf{favorites}$ as follows.
$$\infer{\favs{17}}{}
\quad \infer{\favs{23}}{}
\quad \infer{\favs{42}}{}$$

As we defined inductive sets as the \emph{smallest} sets satisfying given inference rules, we now define inductive predicates as the \emph{least} predicates satisfying the rules.
The rules given here require acceptance of 17, 23, and 42 and no more, so those are exactly the values accepted by the predicate.

Any inductive predicate definition has an associated induction principle\index{inductive principle}, which we can derive much as we derived induction principles in Chapter \ref{syntax}.
Specifically, when we defined $P$ inductively and want to conclude $Q$ from it, we want to prove $\forall x. \; P(x) \Rightarrow Q(x)$.
We transform each rule into one obligation within the induction, by replacing $P$ with $Q$ in the conclusion, in addition to taking each premise $P(e)$ and pairing it with a new premise $Q(e)$ (an \emph{induction hypothesis}\index{induction hypothesis}).

Our example of $\mathsf{favorites}$ is a degenerate inductive definition whose principle requires no induction hypotheses.  Thus, to prove $\forall x. \; \favs{x} \Rightarrow Q(x)$, we must establish the following.
$$\infer{Q(17)}{}
\quad \infer{Q(23)}{}
\quad \infer{Q(42)}{}$$

That is, to prove that a predicate holds of all elements of a finite set, it suffices to check the predicate for each element.
In general, induction on proofs of relations is called \emph{rule induction}\index{rule induction}.
A simple example:

\begin{theorem}
  All of my favorite numbers are below 50, i.e. $\forall x. \; \favs{x} \Rightarrow x < 50$.
\end{theorem}
\begin{proof}
  By induction on the proof of $\favs{x}$, i.e. with $Q(x) = x < 50$.
\end{proof}

Note how it is quite natural to see rule induction as induction on proof trees, as if they were just any other tree data structure!
Indeed, data and proofs are unified in Coq's mathematical foundation.

\section{Transitive Closure of Relations}

Let $R$ be some binary relation.  We define its \emph{transitive closure}\index{transitive closure} $R^+$ by:
$$\infer{x \; R^+ \; y}{
  x \; R \; y
}
\quad \infer{x \; R^+ z}{
  x \; R^+ \; y
  & y \; R^+ \; z
}$$

That is, $R^+$ is the \emph{least} relation satisfying these rules.
It should accept argument pairs only if the rules force them to be accepted.
This relation happens to be the least that both contains $R$ and is transitive.
The second rule forces $R^+$ to be transitive quite explicitly.

We apply our recipe (changing rule conclusions and adding new inductive hypotheses) to find the induction principle for $R^+$.
To prove $\forall x, y. \; x \; R^+ \; y \Rightarrow Q(x, y)$, we must show:
$$\infer{Q(x, y)}{
  x \; R \; y
}
\quad \infer{Q(x, z)}{
  x \; R^+ \; y
  & Q(x, y)
  & y \; R^+ \; z
  & Q(y, z)
}$$

As an example with transitive closure, consider an alternative definition of ``less-than'' for natural numbers.
Let $\prec$ be the relation defined to accept $x$ and $y$ only when $x + 1 = y$.
Now $\prec^+$ means ``less-than,'' i.e. the second argument is reachable from the first by some finite number of increments.
We can make the connection formal.

\begin{theorem}
  If $x \prec^+ y$, then $x < y$.
\end{theorem}
\begin{proof}
  By induction on the proof of $x \prec^+ y$, e.g. with $Q(x, y) = x < y$, considering cases for the two rules defining transitive closure.
\end{proof}

\begin{lemma}\label{lt_lt''}
  For any $n$ and $k$, $n \prec^+ (1+k) + n$.
\end{lemma}
\begin{proof}
  By induction on $k$, using the first rule of transitive closure in the base case and discharging the inductive case using transitivity via $k + n$.
\end{proof}

\begin{theorem}
  If $n < m$, then $n \prec^+ m$.
\end{theorem}
\begin{proof}
  By Lemma \ref{lt_lt''}, with $k = m - n - 1$.
\end{proof}

Another manageable exercise is showing logical equivalence of $R^+$ and $R^{++}$ for any $R$, which requires rule induction in both directions of the proof.

\section{Permutations}

It may not be the most intuitively obvious formulation, but we can use an inductive relation to explain when one list is a permutation\index{permutations} of another, written here as infix relation $\sim$.
$$\infer{[] \sim []}{}
\quad \infer{\push{x}{\ell_1} \sim \push{x}{\ell_2}}{
  \ell_1 \sim \ell_2
}
\quad \infer{\push{y}{\push{x}{\ell}} \sim \push{x}{\push{y}{\ell}}}{}
\quad \infer{\ell \sim \ell''}{
  \ell \sim \ell'
  & \ell' \sim \ell''
}$$

We apply the usual recipe to derive its induction principle, showing $\forall \ell, \ell'. \; \ell \sim \ell' \Rightarrow Q(\ell, \ell')$:
$$\infer{Q([], [])}{}
\quad \infer{Q(\push{x}{\ell_1}, \push{x}{\ell_2})}{
  \ell_1 \sim \ell_2
  & Q(\ell_1, \ell_2)
}
\quad \infer{Q(\push{y}{\push{x}{\ell}}, \push{x}{\push{y}{\ell}})}{}
\quad \infer{Q(\ell, \ell'')}{
  \ell \sim \ell'
  & Q(\ell, \ell')
  & \ell' \sim \ell''
  & Q(\ell', \ell'')
}$$

A number of sensible algebraic properties now follow.

\begin{lemma}\label{Permutation_to_front}
  $\push{a}{\ell} \sim \concat{\ell}{[a]}$.
\end{lemma}
\begin{proof}
  By induction on $\ell$, with semi-intricate little combinations of the rules of $\sim$ in each case.
\end{proof}

\begin{theorem}
  $\ell \sim \mathsf{reverse}(\ell)$.
\end{theorem}
\begin{proof}
  By induction on $\ell$, with appeal to Lemma \ref{Permutation_to_front} in the inductive case.
\end{proof}

\begin{theorem}
  If $\ell_1 \sim \ell_2$, then $|\ell_1| = |\ell_2|$.
\end{theorem}
\begin{proof}
  By induction on the proof of $\ell_1 \sim \ell_2$, with each case falling neatly into the decidable theory of equality with uninterpreted functions.
\end{proof}

\begin{lemma}\label{Permutation_refl}
  $\ell \sim \ell$.
\end{lemma}
\begin{proof}
  By induction on $\ell$.
\end{proof}
  
\begin{lemma}\label{Permutation_app1}
  If $\ell_1 \sim \ell_2$, then $\concat{\ell_1}{\ell} \sim \concat{\ell_2}{\ell}$.
\end{lemma}
\begin{proof}
  By induction on the proof of $\ell_1 \sim \ell_2$, appealing to Lemma \ref{Permutation_refl} in the first case.
\end{proof}

\begin{lemma}\label{Permutation_app2}
  If $\ell_1 \sim \ell_2$, then $\concat{\ell}{\ell_1} \sim \concat{\ell}{\ell_2}$.
\end{lemma}
\begin{proof}
  By induction on $\ell$.
\end{proof}

\begin{theorem}
  If $\ell_1 \sim \ell'_1$ and $\ell_2 \sim \ell'_2$, then $\concat{\ell_1}{\ell_2} \sim \concat{\ell'_1}{\ell'_2}$.
\end{theorem}
\begin{proof}
  Combine Lemmas \ref{Permutation_app1} and \ref{Permutation_app2} using the transitivity rule of $\sim$.
\end{proof}

\section{A Minimal Propositional Logic}

Though we are mostly concerned with programming languages in this book, the same techniques are also often applied to formal languages of logical formulas.
In fact, there are strong connections between programming languages and logics, with many dual techniques across the sides.
We won't really dip our toes into those connections, but we can use the example of propositional logic\index{propositional logic} here, to get a taste of formalizing logics, at the same time as we practice with inductive relations.

As a warmup for reasoning about syntax trees of logical formulas, consider this basic language. 
$$\begin{array}{rrcl}
  \textrm{Formula} & \phi &::=& \top \mid \bot \mid \phi \land \phi \mid \phi \lor \phi
\end{array}$$

The constructors are, respectively: truth, falsehood, conjunction (``and''), and disjunction (``or'').

A simple inductive definition of a predicate $\vdash$ (for ``proves'') suffices to explain the meanings of the connectives.
$$\infer{\vdash \top}{}
\quad \infer{\vdash \phi_1 \land \phi_2}{
  \vdash \phi_1
  & \vdash \phi_2
}
\quad \infer{\vdash \phi_1 \lor \phi_2}{
  \vdash \phi_1
}
\quad \infer{\vdash \phi_1 \lor \phi_2}{
  \vdash \phi_2
}$$

That is, truth is obviously true, falsehood can't be proved, proving a conjunction requires proving both conjuncts, and proving a disjunction requires proving one disjunct or the other (expressed with two separate rules).

A simple interpreter also does the trick to explain this language.
\begin{eqnarray*}
  \denote{\top} &=& \top \\
  \denote{\bot} &=& \bot \\
  \denote{\phi_1 \land \phi_2} &=& \denote{\phi_1} \land \denote{\phi_2} \\
  \denote{\phi_1 \lor \phi_2} &=& \denote{\phi_1} \lor \denote{\phi_2}
\end{eqnarray*}
Each syntactic connective is explained in terms of the usual semantic connective in our metalanguage.
In the formal study of logic, this style of semantics is often associated with \emph{model theory}\index{model theory} and a definition via an inductive relation with \emph{proof theory}\index{proof theory}.

It is good to establish that the two formulations agree, and the two directions of logical equivalence have traditional names.

\begin{theorem}[Soundness of the inductive predicate]\index{soundness of a logic}
  If $\vdash p$, then $\denote{p}$.
\end{theorem}
\begin{proof}
  By induction on the proof of $\vdash p$, where each case then follows by the rules of propositional logic in the metalanguage (a decidable theory).
\end{proof}

\begin{theorem}[Completeness of the inductive predicate]\index{completeness of a logic}
  If $\denote{p}$, then $\vdash p$.
\end{theorem}
\begin{proof}
  By induction on $p$, combining the rules of $\vdash$ with propositional logic in the metalanguage.
\end{proof}

\section{Propositional Logic with Implication}

Extending to the rest of traditional propositional logic requires us to add a \emph{hypothesis context} to our judgment, mirroring a pattern we will see later in studying type systems (Chapter \ref{types}).
Our formula language is extended as follows.
$$\begin{array}{rrcl}
  \underline{\textrm{Variables}} & p \\
  \textrm{Formula} & \phi &::=& \top \mid \bot \mid \phi \land \phi \mid \phi \lor \phi \mid \underline{p \mid \phi \Rightarrow \phi}
\end{array}$$
Note how we add propositional variables $p$, which stand for unknown truth values.

What is fundamentally harder about modeling implication?
We need to perform \emph{hypothetical reasoning}\index{hypothetical reasoning}, trying to prove a formula with other formulas available as known facts.
It is natural to build up a list of hypotheses, as an input to the $\vdash$ predicate.
By convention, we use metavariable $\Gamma$ (Greek capital gamma) for such a list and call it a \emph{context}.
First, the context is threaded through the rules we already presented, which are called \emph{introduction rules}\index{introduction rules} because each explains how to prove a fact using a specific connective (introducing that connective to the proof).
$$\infer{\Gamma \vdash \top}{}
\quad \infer{\Gamma \vdash \phi_1 \land \phi_2}{
  \Gamma \vdash \phi_1
  & \Gamma \vdash \phi_2
}
\quad \infer{\Gamma \vdash \phi_1 \lor \phi_2}{
  \Gamma \vdash \phi_1
}
\quad \infer{\Gamma \vdash \phi_1 \lor \phi_2}{
  \Gamma \vdash \phi_2
}$$

Next, we have \emph{modus ponens}\index{mous ponens}, the classic rule for applying an implication, an example of an \emph{elimination rule}\index{elimination rules}, which explains how to take advantage of a fact using a specific connective.
$$\infer{\Gamma \vdash \phi_2}{
  \Gamma \vdash \phi_1 \Rightarrow \phi_2
  & \Gamma \vdash \phi_1
}$$

A \emph{hypothesis rule} gives the fundamental way of taking advantage of $\Gamma$'s contents.
$$\infer{\Gamma \vdash \phi}{
  \phi \in \Gamma
}$$

The introduction rule for implication is interesting for adding a new hypothesis to the context.
$$\infer{\Gamma \vdash \phi_1 \Rightarrow \phi_2}{
  \push{\phi_1}{\Gamma} \vdash \phi_2
}$$

Most of the remaining connectives have elimination rules, too.
The simplest case is conjunction, with rules pulling out the conjuncts.
$$\infer{\Gamma \vdash \phi_1}{
  \Gamma \vdash \phi_1 \land \phi_2
}
\quad \infer{\Gamma \vdash \phi_2}{
  \Gamma \vdash \phi_1 \land \phi_2
}$$

We eliminate a disjunction using reasoning by cases, extending the context appropriately in each case.
$$\infer{\Gamma \vdash \phi}{
  \Gamma \vdash \phi_1 \lor \phi_2
  & \push{\phi_1}{\Gamma} \vdash \phi
  & \push{\phi_2}{\Gamma} \vdash \phi
}$$

If we manage to prove falsehood, we have a contradiction, and any conclusion follows.
$$\infer{\Gamma \vdash \phi}{
  \Gamma \vdash \bot
}$$

Finally, the somewhat-controversial law of the excluded middle\index{law of the excluded middle}, which actually does not hold in general in Coq, though many developments postulate it as an extra axiom (which we take advantage of in our own Coq proof here).
We write negation $\neg \phi$ as shorthand for $\phi \Rightarrow \bot$.
$$\infer{\Gamma \vdash \phi \lor \neg \phi}{}$$

This style of inductive relation definition is called \emph{natural deduction}\index{natural deduction}.
We write $\vdash \phi$ as shorthand for $\cdot \vdash \phi$, for an empty context.

Note the fundamental new twist introduced compared to last section's language: it is no longer the case that the top-level connective of the goal formula gives us a small set of connective-specific rules that are the only we need to consider applying.
Instead, we may need to combine the hypothesis rule with elimination rules, taking advantage of assumptions.
The power of inductive relation definitions is clearer here, since we couldn't simply use a recursive function over formulas to express that kind of pattern explicitly.

A simple interpreter sets the stage for proving soundness and completeness.
The most important extension to the interpreter is that it now takes in a valuation $v$, just like in the previous chapter, though now the valuation maps variables to truth values, not numbers.
\begin{eqnarray*}
  \denote{p}v &=& \msel{v}{p} \\
  \denote{\top}v &=& \top \\
  \denote{\bot}v &=& \bot \\
  \denote{\phi_1 \land \phi_2}v &=& \denote{\phi_1}v \land \denote{\phi_2}v \\
  \denote{\phi_1 \lor \phi_2}v &=& \denote{\phi_1}v \lor \denote{\phi_2}v \\
  \denote{\phi_1 \Rightarrow \phi_2}v &=& \denote{\phi_1}v \Rightarrow \denote{\phi_2}v
\end{eqnarray*}

To indicate that $\phi$ is a \emph{tautology}\index{tautology} (that is, true under any values of the variables), we write $\denote{\phi}$, as a kind of abuse of notation expanding to $\forall v. \; \denote{\phi}v$.

\begin{theorem}[Soundness]
  If $\vdash \phi$, then $\denote{\phi}$.
\end{theorem}
\begin{proof}
  By appeal to Lemma \ref{valid_interp'}.
\end{proof}

\begin{lemma}\label{valid_interp'}
  If $\Gamma \vdash \phi$, and if we have $\denote{\phi'}v$ for every $\phi' \in \Gamma$, then $\denote{\phi}v$.
\end{lemma}
\begin{proof}
  By induction on the proof of $\Gamma \vdash \phi$, using propositional logic in the metalanguage to plumb together the case proofs.
\end{proof}

The other direction, completeness, is quite a bit more involved, and indeed its Coq proof strays outside the range of what is reasonable to ask students to construct at this point in the book, but it makes for an interesting exercise.
The basic idea is to do a proof by exhaustive case analysis over the truth values of all propositional variables $p$ that appear in a formula.

\begin{theorem}[Completeness]
  If $\denote{\phi}$, then $\vdash \phi$.
\end{theorem}
\begin{proof}
  By appeal to Lemma \ref{interp_valid'}.
\end{proof}

Say that a context $\Gamma$ and a valuation $v$ are \emph{compatible} if they agree on the truth of any variable included in both.
That is, when $p \in \Gamma$, we have $v(p) = \top$; and when $\neg p \in \Gamma$, we have $v(p) = \bot$.

\begin{lemma}\label{interp_valid'}
  Given context $\Gamma$ and formula $\phi$, if
  \begin{itemize}
  \item there is no variable $p$ such that both $p \in \Gamma$ and $\neg p \in \Gamma$, and
  \item for any valuation $v$ compatible with $\Gamma$, we have $\denote{\phi}v$,
  \end{itemize}
  then $\Gamma \vdash \phi$.
\end{lemma}
\begin{proof}
  By induction on the set of variables $p$ appearing in $\phi$ such that neither $p \in \Gamma$ nor $\neg p \in \Gamma$.
  If that set is empty, we appeal directly to Lemma \ref{interp_valid''}.
  Otherwise, choose some variable $p$ in $\phi$ that hasn't yet been assigned a truth value in $\Gamma$.
  Combine excluded middle on $p$ with the $\lor$ elimination rule of $\vdash$ to do a case split, so that it now suffices to prove both $\push{p}{\Gamma} \vdash \phi$ and $\push{\neg p}{\Gamma} \vdash \phi$.
  Each case can be proved by direct appeal to the induction hypothesis, since assigning $p$ a truth value in $\Gamma$ shrinks the set we induct over.
\end{proof}

We write $\denote{\phi}\Gamma$ to denote interpreting $\phi$ in a valuation that assigns $\top$ to exactly those variables that appear directly in $\Gamma$.

\begin{lemma}\label{interp_valid''}
  Given context $\Gamma$ and formula $\phi$, if
  \begin{itemize}
  \item for every variable $p$ appearing in $\phi$, we have either $p \in \Gamma$ or $\neg p \in \Gamma$; and
  \item there is no variable $p$ such that both $p \in \Gamma$ and $\neg p \in \Gamma$
  \end{itemize}
  then if $\denote{\phi}\Gamma$, then $\Gamma \vdash \phi$, otherwise $\Gamma \vdash \neg \phi$.
\end{lemma}
\begin{proof}
  By induction on $\phi$, with tedious combination of propositional logic in the metalanguage and in the rules of $\vdash$.
  Inductive cases make several appeals to Lemma \ref{valid_weaken}, and it is important that the base case for variables $p$ is able to assume that either $p$ or $\neg p$ appears in $\Gamma$.
\end{proof}

\begin{lemma}[Weakening]\label{valid_weaken}\index{weakening}
  If $\Gamma \vdash \phi$, and $\Gamma'$ includes a superset of the formulas from $\Gamma$, then $\Gamma' \vdash \phi$.
\end{lemma}
\begin{proof}
  By induction on the proof of $\Gamma \vdash \phi$.
\end{proof}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Transition Systems and Invariants}

For simple programming languages where programs always terminate, it is often most convenient to formalize them using interpreters, as in Chapter \ref{interpreters}.
However, many important languages don't fall into that category, and for them we need different techniques.
Nontermination isn't always a bug; for instance, we expect a network server to run indefinitely.
We still need to be able to talk about the correct behavior of programs that run forever, by design.
For that reason, in this chapter and in most of the rest of the book, we model programs using relations, in much the same way that may be familiar from automata theory\index{automata theory}.
An important difference, though, is that, while undergraduate automata-theory classes generally study \emph{finite-state machines}\index{finite-state machines}, for general program reasoning we want to allow infinite sets of states, otherwise referred to as \emph{infinite-state systems}\index{infinite-state systems}.

Let's start with an example that almost seems too mundane to be associated with such terms.

\section{Factorial as a State Machine}

We're familiar with the factorial operation, implemented as an imperative program with a loop.
\begin{verbatim}
factorial(n) {
  a = 1;
  while (n > 0) {
    a = a * n;
    n = n - 1;
  }
  return a;
}
\end{verbatim}

In the analysis to follow, consider some value $n_0 \in \mathbb N$ fixed, as the input passed to this operation.
A state machine is lurking within the surface syntax of the program.
\encoding
In fact, we have a variety of choices in modeling it as a state machine.
Here is the set of states that we choose to use here:
$$\begin{array}{rrcl}
  \textrm{Natural numbers} & n &\in& \mathbb N \\
  \textrm{States} & s &::=& \mathsf{AnswerIs}(n) \mid \mathsf{WithAccumulator}(n, n)
\end{array}$$

There are two types of states.
An $\mathsf{AnswerIs}(a)$ state corresponds to the \texttt{return} statement.
It records the final result $a$ of the factorial operation.
A $\mathsf{WithAccumulator}(n, a)$ records an intermediate state, giving the values of the two local variables, just before a loop iteration begins.

Following the more familiar parts of automata theory, let's define a set of \emph{initial states}\index{initial state} for this machine.
$$\infer{\mathsf{WithAccumulator}(n_0, 1) \in \mathcal F_0}{}$$
For consistency with the notation we will be using later, we define the set $\mathcal F_0$ using an inference rule.
Equivalently, we could just write $\mathcal F_0 = \{\mathsf{WithAccumulator}(n_0, 1)\}$, essentially reading off the initial variable values from the first lines of the code above.

Similarly, we also define a set of \emph{final states}\index{final state}.
$$\infer{\mathsf{AnswerIs}(a) \in \mathcal F_\omega}{}$$
Equivalently: $\mathcal F_\omega = \{\mathsf{AnswerIs}(a) \mid a \in \mathbb N\}$.
Note that this definition only captures when the program is \emph{done}, not when it \emph{returns the right answer}.
It follows from the last line of the code.

The last and most important ingredient of our state machine is its \emph{transition relation}, where we write $s \to s'$ to indicate that state $s$ advances to state $s'$ in one step, following the semantics of the program.
Here inference rules are more obviously a good fit.
$$\infer{\mathsf{WithAccumulator}(0, a) \to \mathsf{AnswerIs}(a)}{}$$
$$\infer{\mathsf{WithAccumulator}(n+1, a) \to \mathsf{WithAccumulator}(n, a \times (n+1))}{}$$
The first rule corresponds to the case where the program ends, because the loop test has failed and we now know the final answer.
The second rule corresponds to going once around the loop, following directly from the code in the loop body.

We can fit these ingredients into the general concept of a \emph{transition system}\index{transition system}, the term we will use throughout this book for this sort of state machine.
Actually, the words ``state machine'' suggest to many people that the state set must be finite, hence our preference for ``transition system,'' which is also used fairly frequently in semantics.

\newcommand{\angled}[1]{{\left \langle #1 \right \rangle}}

\begin{definition}
  A \emph{transition system} is a triple $\angled{S, S_0, \to}$, with $S$ a set of states, $S_0 \subseteq S$ a set of initial states, and $\to \; \subseteq S \times S$ a transition relation.
\end{definition}

For an arbitrary transition relation $\to$, not just the one defined above for factorial, we define its \emph{transitive-reflexive closure}\index{transitive-reflexive closure} $\to^*$ with two inference rules:
$$\infer{s \to^* s}{}
\quad \infer{s \to^* s''}{
  s \to s'
  & s' \to^* s''
}$$
That is, a formal claim $s \to^* s'$ corresponds exactly to the informal claim that ``starting from state $s$, we can reach state $s'$.''

\begin{definition}
  For transition system $\angled{S, S_0, \to}$, we say that a state $s$ is \emph{reachable} if and only if there exists $s_0 \in S_0$ such that $s_0 \to^* s$.
\end{definition}

Building on these notations, here is one way to state the correctness of our factorial program, which, defining $S$ according to the state grammar above, we model as $\mathcal F = \angled{S, \mathcal F_0, \to}$.

\begin{theorem}\label{factorial_ok}
  For any state $s$ reachable in $\mathcal F$, if $s \in \mathcal F_\omega$, then $s = \mathsf{AnswerIs}(n_0!)$.
\end{theorem}

That is, whenever the program finishes, it returns the right answer.
(Recall that $n_0$ is the initial value of the input variable.)

We could prove this theorem now in a relatively ad-hoc way.
Instead, let's develop the general machinery of \emph{invariants}.


\section{Invariants}

The concept of ``invariant'' may be familiar from such relatively informal notions as ``loop invariant''\index{loop invariant} in introductory programming classes.
Intuitively, an invariant is a property of program state that \emph{starts true and stays true}, but let's make that idea a bit more formal, as applied to our transition-system formalism.

\newcommand{\invariants}[0]{\marginpar{\fbox{\textbf{Invariants}}}}

\invariants
\begin{definition}
  An \emph{invariant} of a transition system is a property that is always true, in all of the system's reachable states.  That is, for transition system $\angled{S, S_0, \to}$, where $R$ is the set of all its reachable states, some $I \subseteq S$ is an invariant iff $R \subseteq I$.  (Note that here we adopt the mathematical convention that ``properties'' of states and ``sets'' of states are synonymous, so that in each case we can use what terminology seems most natural.  The ``property'' holds of exactly those states that belong to the ``set.'')
\end{definition}

At first look, the definition may appear a bit silly.
Why not always just take the reachable states $R$ as the invariant, instead of scrambling to invent something new?
The reason is the same as for strengthening induction hypotheses to make proofs easier.
Often it is easier to characterize an invariant that isn't fully precise, admitting some states that the system can never actually reach.
Additionally, it can be easier to prove existence of an approximate invariant by induction, by the method that the next key theorem formalizes.

\begin{theorem}\label{invariant_induction}
  Consider a transition system $\angled{S, S_0, \to}$ and its candidate invariant $I$.  The candidate is truly an invariant if (1) $S_0 \subseteq I$ and (2) for every $s \in I$ where $s \to s'$, we also have $s' \in I$.
\end{theorem}

That's enough generalities for now.
Let's define a suitable invariant for factorial.
\invariants
\begin{eqnarray*}
  I(\mathsf{AnswerIs}(a)) &=& n_0! = a \\
  I(\mathsf{WithAccumulator}(n, a)) &=& n_0! = n! \times a
\end{eqnarray*}

It is an almost-routine exercise to prove that $I$ really is an invariant, using Theorem \ref{invariant_induction}.
The key new ingredient we need is \emph{inversion}, a principle for deducing which inference rules may have been used to prove a fact.

For instance, at one point in the proof, we need to draw a conclusion from a premise $s \in \mathcal F_0$, meaning that $s$ is an initial state.
By inversion, because set $\mathcal F_0$ is defined by a single inference rule, that rule must have been used to conclude the premise, so it must be that $s = \mathsf{WithAccumulator}(n_0, 1)$.

Similarly, at another point in the proof, we must reason from a premise $s \to s'$.
The relation $\to$ is defined by two inference rules, so inversion leads us to two cases to consider.
In the first case, corresponding to the first rule, $s = \mathsf{WithAccumulator}(0, a)$ and $s' = \mathsf{AnswerIs}(a)$.
In the second case, corresponding to the second rule, $s = \mathsf{WithAccumulator}(n+1, a)$ and $s' = \mathsf{WithAccumulator}(n, a \times (n+1))$.
It's worth checking that these values of $s$ and $s'$ are read off directly from the rules.

Though a completely formal and exhaustive treatment of inversion is beyond the scope of this text, generally it follows standard intuitions about ``reverse-engineering'' a set of rules that could have been used to derive some premise.

Another important property of invariants formalizes the connection with weakening an induction hypothesis.

\begin{theorem}\label{invariant_weaken}
  If $I$ is an invariant of a transition system, then $I' \supseteq I$ (a superset of the original) is also an invariant of the same system.
\end{theorem}

Note that the larger $I'$ above may not be suitable to use in an inductive proof by Theorem \ref{invariant_induction}!
For instance, for factorial, we might define $I' = \mathcal \{\mathsf{AnswerIs}(n_0!)\} \cup \{\mathsf{WithAccumulator}(n, a) \mid n, a \in \mathbb N\}$, clearly a superset of $I$.
However, by forgetting everything that we know about intermediate $\mathsf{WithAccumulator}$ states, we will get stuck on the inductive step of the proof.
Thus, what we call invariants here needn't also be \emph{inductive invariants}\index{inductive invariants}, and there may be slight terminology mismatches with other sources.

Combining Theorems \ref{invariant_induction} and \ref{invariant_weaken}, it is now easy to prove Theorem \ref{factorial_ok}, establishing the correctness of our particular factorial system $\mathcal F$.
First, we use Theorem \ref{invariant_induction} to deduce that $I$ is an invariant of $\mathcal F$.
Then, we choose the very same $I'$ that we warned above is not an inductive invariant, but which is fairly easily shown to be a superset of $I$.
Therefore, by Theorem \ref{invariant_weaken}, $I'$ is also an invariant of $\mathcal F$, and Theorem \ref{factorial_ok} follows quite directly from that fact, as $I'$ is essentially a restatement of Theorem \ref{factorial_ok}.

\section{Rule Induction Applied}

Recall the technique of rule induction\index{rule induction} from last chapter.
We made stealthy use of it in the elided proof of Theorem \ref{invariant_induction}, and looking more into the details can give us more practice with rule induction.
Consider again the definition of transitive-reflexive closure by inference rules (which is very close to the definition of transitive closure from last chapter).
$$\infer{s \to^* s}{}
\quad \infer{s \to^* s''}{
  s \to s'
  & s' \to^* s''
}$$

The relation $\to^*$ is a subset of $S \times S$.
Imagine that we want to prove that some relation $P$ holds of all pairs of states, where the first can reach the second.
That is, we want to prove $\forall s, s'. \; (s \to^* s') \Rightarrow P(s, s')$, where $\Rightarrow$ is logical implication.
Following last chapter's recipe, we can derive a suitable induction principle.
We modify each defining rule of $\to^*$, replacing its conclusion with a use of $P$ and adding a $P$ induction hypothesis for each recursive premise.
$$\infer{P(s, s)}{}
\quad \infer{P(s, s'')}{
  s \to s'
  & s' \to^* s''
  & P(s', s'')
}$$
As before, where the defining rules of $\to^*$ show us how to \emph{conclude} facts, the two new rules here are \emph{proof obligations}.
To apply rule induction and establish $P$ for all reachability pairs, we must prove that each new rule is correct, as a kind of quantified implication.

As a simpler example than the invariant-induction theorem, consider transitivity for reachability.

\begin{theorem}
  If $s \to^* s'$ and $s' \to^* s''$, then $s \to^* s''$.
\end{theorem}
\begin{proof}
  By rule induction on the derivation of $s \to^* s'$, taking $P(s_1, s_2)$ to be that, if $s_2 = s'$, then $s_1 \to^* s''$.  We consider variables $s'$ and $s''$ fixed throughout the induction, along with their associated premise $s' \to^* s''$.

  \emph{Base case:} We must show $P(s, s)$ for an arbitrary $s$.  Given that (based on the definition of $P$) we may assume $s = s'$, our premise $s' \to^* s''$ precisely matches the desired conclusion $s \to^* s''$.

  \emph{Induction step:} Assume $s \to s_1$, $s_1 \to^* s'$, and $P(s_1, s')$.  We may apply the second rule defining $\to^*$, whose two premises become $s \to s_1$ and $s_1 \to^* s''$.  The first is one of the available premises of the induction step.  The second follows by the induction hypothesis about $P$.
\end{proof}

This sort of proof really is easier to follow in Coq code, so we especially encourage the reader to consult the mechanized version here!


\section{An Example with a Concurrent Program}

Imagine that we want to verify a multithreaded\index{multithreaded programs}, shared-memory program\index{shared-memory programming} where multiple threads run this code at once.
\begin{verbatim}
f() {
  lock();
  local = global;
  global = local + 1;
  unlock();
}
\end{verbatim}

Consider \texttt{global} as a variable shared across all threads, while each thread has its own version of variable \texttt{local}.
The meaning of \texttt{lock()} and \texttt{unlock()} is as usual\index{locks}, where at most one thread can hold the lock at once, claiming it via \texttt{lock()} and relinquishing it via \texttt{unlock()}.
When variable \texttt{global} is initialized to 0 and $n$ threads run this code at once and all terminate, we expect that \texttt{global} finishes with value $n$.
Of course, bugs in this program, like forgetting to include the locking, could lead to all sorts of wrong answers, with any value between 1 and $n$ possible with the right demonic thread interleaving.

\encoding
To prove that we got the program right, let's formalize it as a transition system.  First, our state set:
$$\begin{array}{rrcl}
  \textrm{States} & P &::=& \mathsf{Lock} \mid \mathsf{Read} \mid \mathsf{Write}(n) \mid \mathsf{Unlock} \mid \mathsf{Done}
\end{array}$$

Compared to the last example, here we see more clearly that kinds of states correspond to \emph{program counters}\index{program counters} in the imperative code.
The first four state kinds respectively mean that the program counter is right before the matching line in the program's code.
The last state kind means the program counter is past the end of the function.
Only $\mathsf{Write}$ states carry extra information, in this case the value of variable \texttt{local}.
At every other program counter, we can prove that the value of variable \texttt{local} has no effect on further transitions, so we don't bother to store it.
We will account for the value of variable \texttt{global} separately, in a way to be described shortly.

In particular, we will define a transition system for a single thread as $\mathcal L = \angled{(\mathbb N \times \mathbb B) \times P, \mathcal L_0, \to_{\mathcal L}}$.
We define the state to include not only the thread-local state $P$ but also the value of \texttt{global} (in $\mathbb N$) and whether the lock is currently taken (in $\mathbb B$, the Booleans, with values $\top$ [true] and $\bot$ [false]).
There is one designated initial state.

$$\infer{((0, \bot), \mathsf{Lock}) \in \mathcal L_0}{}$$

Four inference rules explain the four transitions between program counters that a single thread can make, reading and writing shared state as needed.

$$\infer{((g, \bot), \mathsf{Lock}) \to_{\mathcal L} ((g, \top), \mathsf{Read})}{}
\quad \infer{((g, \ell), \mathsf{Read}) \to_{\mathcal L} ((g, \ell), \mathsf{Write}(g))}{}$$
$$\infer{((g, \ell), \mathsf{Write}(n)) \to_{\mathcal L} ((n+1, \ell), \mathsf{Unlock})}{}
\quad \infer{((g, \ell), \mathsf{Unlock}) \to_{\mathcal L} ((g, \bot), \mathsf{Done})}{}$$

\smallskip

Note that these rules will allow a thread to read and write the shared state even without holding the lock.
The rules also allow any thread to unlock the lock, with no consideration for whether that thread must be the current lock holder.
We must use an invariant-based proof to show that there are, in fact, no lurking violations of the lock-based concurrency discipline.

Of course, with just a single thread running, there aren't any interesting violations!
However, we have been careful to describe system $\mathcal L$ in a generic way, with its state a pair of shared and private components.
We can define a generic notion of a multithreaded system, with two systems that share some state and maintain their own private state.

\encoding
\begin{definition}
  Let $T^1 = \angled{S \times P^1, S_0 \times P^1_0, \to^1}$ and $T^2 = \angled{S \times P^2, S_0 \times P^2_0, \to^2}$ be two transition systems, with a shared-state type $S$ in common between their state sets, also agreeing on the initial values $S_0$ for that shared state.  We define the \emph{parallel composition} $T^1 \mid T^2$ as $\angled{S \times (P^1 \times P^2), S_0 \times (P^1_0 \times P^2_0), \to}$, defining new transition relation $\to$ with the following inference rules, which capture the usual notion of thread interleaving.
  $$\infer{(s, (p_1, p_2)) \to (s', (p'_1, p_2))}{
    (s, p_1) \to^1 (s', p'_1)
  }
  \quad \infer{(s, (p_1, p_2)) \to (s', (p_1, p'_2))}{
    (s, p_2) \to^2 (s', p'_2)
  }$$
\end{definition}

Note that the operator $\mid$ is carefully defined so that its output is suitable as input to a further instance of itself.
As a result, while $\mathcal L \mid \mathcal L$ is a transition system modeling two threads running the code from above, we also have $\mathcal L \mid (\mathcal L \mid \mathcal L)$ as a three-thread system based on that code, $(\mathcal L \mid \mathcal L) \mid (\mathcal L \mid \mathcal L)$ as a four-thread system based on that code, etc.

Also note that $\mid$ constructs transition systems with our first examples of \emph{nondeterminism}\index{nondeterminism} in transition relations.
That is, given a particular starting state, there are multiple different places it may wind up after a given number of execution steps.
In general, with thread-interleaving concurrency, the set of possible final states grows exponentially in the number of steps, a fact that torments concurrent-software testers to no end!
Rather than consider all possible runs of the program, we will use an invariant to tame the complexity.

First, we should be clear on what we mean to prove about this program.
Let's also restrict our attention to the two-thread case for the rest of this section; the $n$-thread case is left as an exercise for the reader!
\begin{theorem}
  For any reachable state $((g, \ell), (p^1, p^2))$ of $\mathcal L \mid \mathcal L$, if $p^1 = p^2 = \mathsf{Done}$, then $g = 2$.
\end{theorem}
That is, when both threads terminate, \texttt{global} equals 2.

As a first step toward an invariant, define function $\mathcal C$ from private states to numbers, capturing the \emph{contribution of} a thread with that state, summarizing how much that thread has added to \texttt{globals}.
\begin{eqnarray*}
  \mathcal C(p) &=& \begin{cases}
    1 & p \in \{\mathsf{Unlock}, \mathsf{Done}\} \\
    0 & \mathrm{otherwise}
  \end{cases}
\end{eqnarray*}

Next, we define a function that, given a thread's private state, determines whether that thread \emph{holds the lock}.
\begin{eqnarray*}
  \mathcal H(p) &=& \begin{cases}
    \bot & p \in \{\mathsf{Lock}, \mathsf{Done}\} \\
    \top & \mathrm{otherwise}
  \end{cases}
\end{eqnarray*}

Now, the main insight: we can reconstruct the shared state uniquely from the two private states!
Function $\mathcal S$ does exactly that.
\begin{eqnarray*}
  \mathcal S(p^1, p^2) &=& (\mathcal H(p^1) \lor \mathcal H(p^2), \mathcal C(p^1) + \mathcal C(p^2))
\end{eqnarray*}

One last ingredient will help us write the invariant: a predicate $\mathcal O(p, p')$ capturing when, given the state $p$ of one thread, the state $p'$ is compatible with all of the implications of $p$'s state, primarily in terms of mutual exclusion\index{mutual exclusion} for the lock.
\begin{eqnarray*}
  \mathcal O(p, p') &=& \begin{cases}
    \top & p \in \{\mathsf{Lock}, \mathsf{Done}\} \\
    \neg \mathcal H(p') & p \in \{\mathsf{Read}, \mathsf{Unlock}\} \\
    \neg \mathcal H(p') \land n = \mathcal C(p') & p = \mathsf{Write}(n)
  \end{cases}
\end{eqnarray*}

Finally, we can write the invariant.
\invariants
\begin{eqnarray*}
  I(s, (p^1, p^2)) &=& \mathcal O(p^1, p^2) \land \mathcal O(p^2, p^1) \land s = \mathcal S(p^1, p^2)
\end{eqnarray*}

As is often the case, defining the invariant is the hard part of the proof, and the rest follows by the standard methodology that we used for factorial.
To recap that method, first we use Theorem \ref{invariant_induction} to show that $I$ really is an invariant of $\mathcal L \mid \mathcal L$.
Next, we use Theorem \ref{invariant_weaken} to show that $I$ implies the original property of interest, that finished program states have value 2 for \texttt{global}.
Most of the action is in the first step, where we must work through fussy details of all the different steps that could happen from a state within the invariant, using arithmetic reasoning in each case to either derive a contradiction (that step couldn't happen from this starting state) or show that a specific new state also belongs to the invariant.
We leave those details to the Coq code, as usual.

The reader may be worried at this point that coming up with invariants can be rather tedious!
In the next chapter, we meet a technique for finding invariants automatically, in some limited but important circumstances.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{\label{model_checking}Model Checking}

Our analyses so far have been tedious for at least two different reasons.
First, we've hand-crafted definitions of transition systems, rather than just writing programs in conventional programming languages.
The next chapter will clear that obstacle, by introducing operational semantics, for building transition systems automatically from programs.
The other inconvenience we've faced is defining invariants manually.
There isn't a silver bullet to get us out of this duty, when working with Turing-complete languages\index{Turing-completeness}, where almost all interesting questions, this one included, are undecidable.
However, when we can phrase problems in terms of transition systems with \emph{finitely many reachable states}, we can construct invariants automatically by \emph{exhaustive exploration of the state space}, an approach otherwise known as \emph{model checking}\index{model checking}.
Surprisingly many real programs can be reduced to finite state spaces, using the techniques introduced in this chapter.
First, though, let's formalize our intuitions about exhaustive state-space exploration as a sound way to find invariants.

\section{Exhaustive Exploration}

For an arbitrary binary relation $R$, we write $R^n$ for the $n$-times self-composition of $R$\index{self-composition of relations}.
Formally, where $\mathsf{id}$ is the identity relation\index{identity relation} that only relates values to themselves, we have:
\begin{eqnarray*}
  R^0 &=& \mathsf{id} \\
  R^{n+1} &=& R \circ R^n
\end{eqnarray*}

For some set $S$ and binary relation $R$, we also write $R(S)$ for the composition of $R$ and $S$\index{composition of a relation and a set}, namely $\{x \mid \exists y \in S. \; y \; R \; x\}$.

\newcommand{\ns}[0]{\hspace{-.05in}}

Which states of transition system $\angled{S, S_0, \to}$ are reachable after 0 steps?
That would be precisely the initial states $S_0$, which we can also write as $\to^0\ns(S_0)$.

Which states are reachable after exactly 1 step?
That is $\to\ns(S_0)$, or $\to^1\ns(S_0)$.

How about 2, 3, and 4 steps?
There we have $\to^2\ns(S_0)$, $\to^3\ns(S_0)$, and $\to^4\ns(S_0$).

It follows that the set of states reachable after $n$ steps is:
\begin{eqnarray*}
  \mathsf{reach}(n) &=& \bigcup_{i \leq n} \to^i\ns(S_0)
\end{eqnarray*}

This iteration process is not obviously executable yet, because, a priori, we seem to need to consider all possible $n$ values, to characterize the state space fully.
However, a crucial property allows us to terminate our search soundly under some conditions.

\begin{theorem}
  \invariants
  If $\mathsf{reach}(n+1) = \mathsf{reach}(n)$ for some $n$, then $\mathsf{reach}(n)$ is an invariant of the system.
\end{theorem}

Here we call $\mathsf{reach}(n)$ a \emph{fixed point}\index{fixed point} of the transition system, because it is closed under further exploration.
To find a fixed point with a concrete system, we start with $S_0$.
We repeatedly take the \emph{single-step closure}\index{single-step closure} corresponding to composition with $\to$.
At each step, we check whether the expanded set is actually equal to the previous set.
If so, our process of \emph{multi-step closure}\index{multi-step closure} has terminated, and we have an invariant, by construction.
Again, keep in mind that multi-step closure will not terminate for most transition systems, and there is an art to phrasing a problem in terms of systems where it \emph{will} terminate.


\section{\label{trs_simulation}Abstracting a Transition System}

When analyzing an infinite-state system, it is not necessary to give up hope for model checking.
For instance, consider this program.
\begin{verbatim}
int global = 0;

thread() {
 int local;

 while (true) {
   local = global;
   global = local + 2;
 }
}
\end{verbatim}

If we assume infinite-precision integers, then the state space is infinite.
Considering just the global variable, every even number is reachable, even if we only run a single thread.
However, there is a high degree of regularity across this state space.
In particular, those values really are all even.
Consider this other program, which is hauntingly similar to the last one, in a way that we will make precise shortly.
\begin{verbatim}
bool global = true;

thread() {
  bool local;

  while (true) {
    local = global;
    global = local;
  }
}
\end{verbatim}

We replaced every use of an integer with \emph{a Boolean that is true iff the integer is even}.
Notice that now the program has a finite state space, and model checking applies easily!
We can formalize such a transformation via the general principle of \emph{abstraction of a transition system}\index{abstraction}.

\newcommand{\simulate}[0]{\prec}

The key idea is that every state of the concrete system (with relatively many states) can be associated to one or more states of the abstract system (with relatively few states).
We formalize this association via a \emph{simulation relation}\index{simulation relation} $R$, and we define what makes a choice of $R$ sound, via a notion of \emph{simulation}\index{simulation} via a binary operator $\simulate$, subscripted by $R$.

$$\infer{\angled{S, S_0, \to} \simulate_R \angled{S', S'_0, \to'}}{
  (\forall s \in S_0. \; \exists s' \in S'_0. \; s \; R \; s')
  & (\forall s, s', s_1. \; s \; R \; s' \land s \to s_1 \Rightarrow \exists s'_1. \; s' \to' s'_1 \land s_1 \; R \; s'_1)
}$$

The simpler condition is that every concrete initial state must be related to at least one abstract initial state.
The second, more complex condition essentially says that every step in the concrete world must be matchable by some related step in the abstract world.
A commuting diagram may express the second condition more clearly.
\[
\begin{tikzcd}
s \arrow{r}{\to} \arrow{d}{R} & s_1 \arrow{d}{R} \\
s' \arrow{r}{\exists \to'} & s'_1
\end{tikzcd}
\]

At an even higher intuitive level, what simulation says is that every execution of the concrete system may be matched, step for step, by an execution of the abstract system.
The relation $R$ explains the rules for which states match across systems.
For our purposes, the key pay-off from this connection is that we may translate any invariant of the abstract system into an invariant of the concrete system.

\newcommand{\abstraction}[0]{\marginpar{\fbox{\textbf{Abstraction}}}}

\begin{theorem}\label{abstract_simulation}
  \abstraction
  If $\angled{S, S_0, \to} \simulate_R \angled{S', S'_0, \to'}$, and if $I$ is an invariant of $\angled{S', S'_0, \to'}$, then $R^{-1}(I)$ is an invariant of $\angled{S, S_0, \to}$.
\end{theorem}

We can apply this theorem to the two example programs from earlier in the section, now imagining that we run two parallel-thread copies of each program, using last chapter's approach to modeling threads with transition systems.
The concrete system can be represented with thread-local states $\{\mathsf{Read}\} \cup \{\mathsf{Write}(n) \mid n \in \mathbb N\}$ and the abstract system with $\{\mathsf{BRead}\} \cup \{\mathsf{BWrite}(b) \mid b \in \mathbb B\}$, for the Booleans $\mathbb B$.
We define compatibility between local states.

$$\infer{\mathsf{Read} \sim \mathsf{BRead}}{}
\quad \infer{\mathsf{Write}(n) \sim \mathsf{BWrite}(b)}{
  n \; \textrm{even} \Leftrightarrow b = \mathsf{true}
}$$

We also define the overall state simulation relation $R$, which also covers state shared by threads.

$$\infer{(n, (\ell_1, \ell_2)) \; R \; (b, (\ell'_1, \ell'_2))}{
  (n \; \textrm{even} \Leftrightarrow b = \mathsf{true})
  & \ell_1 \sim \ell'_1
  & \ell_2 \sim \ell'_2
}$$

By proving that $R$ is truly a simulation relation, we reduce the problem to finding an invariant for the abstract system, which is easy to do with model checking.

One crucial consequence of abstraction-by-simulation deserves mentioning:
We show that every concrete execution is matched abstractly, but there may also be additional abstract executions that don't match any concrete ones.
In model checking the abstract system, we may do extra work to handle these ``useless'' paths!
If we do manage to handle them all, then Theorem \ref{abstract_simulation} applies perfectly well.
However, we should be careful, in our choices of abstractions, to bias our designs toward those that don't introduce extra complexities.


\section{Modular Decomposition of Invariant-Finding}

Many transition systems are straightforward to abstract into others, as single global steps.
Other times, the right way to tame a complex system is to decompose it into others and analyze them separately for invariants.
In such cases, the key is a proof principle to combine the invariants of the component systems into an invariant of the overall system.
We will refer to this style of proof decomposition as \emph{modularity}\index{modularity}, and this section gives our first example of modularity, for multithreaded systems.

Imagine that we have a system consisting of $n$ different copies of a transition system $\angled{S, S_0, \to}$ running as concurrent threads, modeled in the way introduced in the previous chapter.
It's not obvious that we can analyze each thread separately, since, during that thread's execution, the other threads are constantly interrupting and modifying global state.
To make matters worse, we can only understand their patterns of state modification by analyzing their thread-local state.
The situation seems inherently unmodular.

However, consider the following construction on transition systems.
Given a transition relation $\to$ and an invariant $I$ on the global state shared by all threads, we define a new transition relation $\to^I$ as follows.
$$\infer{s \to^I s'}{
  s \to s'
}
\quad \infer{(g, \ell) \to^I (g', \ell)}{
  I(g')
}$$

The first rule says that any step of the original relation is also a step of the new relation.
However, the second rule adds a new kind of step: the global state may change \emph{arbitrarily}, so long as the new value satisfies invariant $I$.
We lift this operation to full transition systems, defining $\angled{S, S_0, \to}^I = \angled{S, S_0, \to^I}$.

This construction trivially acts as an abstraction.
\begin{theorem}\label{shared_invariant_abstract}
  \abstraction
  $\mathbb S \simulate_\mathsf{id} {\mathbb S}^I$, for any system $\mathbb S$ and property $I$.
\end{theorem}

\newcommand{\modularity}[0]{\marginpar{\fbox{\textbf{Modularity}}}}

However, we wouldn't want to make this abstraction step in a proof about a single thread.
We needlessly complicate our model checking by forcing ourselves to consider all modifications of the global state that obey $I$.
The payoff comes in analyzing multithreaded systems.
\begin{theorem}\label{shared_invariant_modular}
  \modularity
  Where $I$ is an invariant over only the shared state of a multithreaded system, let $I' = \{(g, \ell) \mid I(g)\}$ be the lifting of $I$ to cover full states, local parts included.  If $I'$ is an invariant for both ${\mathbb S}_1^I$ and ${\mathbb S}_2^I$, then $I'$ is also an invariant for $({\mathbb S}_1 \mid {\mathbb S}_2)^I$.
\end{theorem}

This theorem gives us a way to analyze the threads in a system separately.
As an example, consider this program, where multiple threads will run \texttt{f()} simultaneously.
\begin{verbatim}
int global = 0;

f() {
  int local = 0;

  while (true) {
    local = global;
    local = 3 + local;
    local = 7 + local;
    global = local;
  }
}
\end{verbatim}

Call the transition-system encoding of this code $\mathbb S$.
We can apply the Boolean-for-evenness abstraction to model a single thread with finite state, but we are left needing to account for interference by other threads.
However, we can apply Theorem \ref{shared_invariant_modular} to analyze threads separately.

For instance, we want to show that ``\texttt{global} is always even'' is an invariant of ${\mathbb S} \mid {\mathbb S}$.
By Theorem \ref{shared_invariant_abstract}, we can switch to analyzing system $({\mathbb S} \mid {\mathbb S})^I$, where $I$ is the evenness invariant.
By Theorem \ref{shared_invariant_modular}, we can switch to proving the same invariant separately for systems ${\mathbb S}^I$ and ${\mathbb S}^I$, which are, of course, the same system in this case.
We apply the Boolean-for-evenness abstraction to this system, to get one with a finite state space, so we can check the invariant automatically by model checking.
Following the chain of reasoning backward, we have proved the invariant for ${\mathbb S} \mid {\mathbb S}$.

Even better, that last proof includes the hardest steps that carry over to the proof for an arbitrary number of threads.
Define an exponentially growing system of threads ${\mathbb S}^n$ by:
\begin{eqnarray*}
  {\mathbb S}^0 &=& \mathbb S \\
  {\mathbb S}^{n+1} &=& {\mathbb S}^n \mid {\mathbb S}^n
\end{eqnarray*}

\begin{theorem}
  For any $n$, it is an invariant of ${\mathbb S}^n$ that the global variable is always even.
\end{theorem}

\begin{proof}
  By induction on $n$, repeatedly using Theorem \ref{shared_invariant_modular} to push the obligation down to the leaves of the tree of concurrent compositions, after applying Theorem \ref{shared_invariant_abstract} at the start to introduce the use of $\ldots^I$.
  Every leaf is the same system $\mathbb S$, for which we abstract and apply model checking, appealing to the step above where we ran the same analysis.
\end{proof}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{\label{operational_semantics}Operational Semantics}

It gets tedious to define a relation from first principles, to explain the behaviors of any concrete program.
We do more things with programs than just reason about them.
For instance, we compile them into other languages.
To get the most mileage out of our correctness proofs, we should connect them to the same program syntax that we pass to compilers.
\emph{Operational semantics}\index{operational semantics} is a family of techniques for automatically defining a transition system, or other relational characterization, from program syntax.

\newcommand{\assign}[2]{#1 \leftarrow #2}
\newcommand{\skipe}[0]{\mathsf{skip}}
\newcommand{\ifte}[3]{\mathsf{if} \; #1 \; \mathsf{then} \; #2 \; \mathsf{else} \; #3}
\newcommand{\while}[2]{\mathsf{while} \; #1 \; \mathsf{do} \; #2}

Throughout this chapter, we will demonstrate the different operational-semantics techniques on a single source language, defined like so.
$$\begin{array}{rrcl}
  \textrm{Numbers} & n &\in& \mathbb N \\
  \textrm{Variables} & x &\in& \mathsf{Strings} \\
  \textrm{Expressions} & e &::=& n \mid x \mid e + e \mid e - e \mid e \times e \\
  \textrm{Commands} & c &::=& \skipe \mid \assign{x}{e} \mid c; c \mid \ifte{e}{c}{c} \mid \while{e}{c}
\end{array}$$


\section{Big-Step Semantics}

\newcommand{\bigstep}[2]{#1 \Downarrow #2}

\emph{Big-step operational semantics}\index{big-step operational semantics} explains what it means to run a program to completion.
For our example language, we define a relation written $\bigstep{(v, c)}{v'}$, for ``command $c$, run with variable valuation $v$, terminates, modifying the valuation to $v'$.''

This relation is fairly straightforward to define with inference rules.
\encoding
$$\infer{\bigstep{(v, \skipe)}{v}}{}
\quad \infer{\bigstep{(v, \assign{x}{e})}{\mupd{v}{x}{\denote{e}v}}}{}
\quad \infer{\bigstep{(v, c_1; c_2)}{v_2}}{
  \bigstep{(v, c_1)}{v_1}
  & \bigstep{(v_1, c_2)}{v_2}
}$$
$$\infer{\bigstep{(v, \ifte{e}{c_1}{c_2})}{v'}}{
  \denote{e}{v} \neq 0
  & \bigstep{(v, c_1)}{v'}
}
\quad \infer{\bigstep{(v, \ifte{e}{c_1}{c_2})}{v'}}{
  \denote{e}{v} = 0
  & \bigstep{(v, c_2)}{v'}
}$$
$$\infer{\bigstep{(v, \while{e}{c_1})}{v_2}}{
  \denote{e}{v} \neq 0
  & \bigstep{(v, c_1)}{v_1}
  & \bigstep{(v_1, \while{e}{c_1})}{v_2}
}
\quad \infer{\bigstep{(v, \while{e}{c_1})}{v}}{
  \denote{e}{v} = 0
}$$

Notice how the definition is quite similar to a recursive interpreter\index{interpreters} written in a high-level programming language, though we write with the language of relations instead of functional programming.
For instance, consider the simple case of the rule for sequencing, ``;''.
We first ``call the interpreter'' on the first subcommand $c_1$ with the original valuation $v$.
The result of the ``recursive call'' is a new valuation $v_1$, which we then feed into another ``recursive call'' on $c_2$, whose result becomes the overall result.

Why write this interpreter relationally instead of as a functional program?
The most relevant answer applies to situations like ours as users of Coq or even of informal mathematics, where we must be very careful that all of our recursive definitions are well-founded.
The recursive version of this relation is clearly not well-founded, as it would run forever on a nonterminating $\mathsf{while}$ loop.
It is also easier to incorporate \emph{nondeterminism}\index{nondeterminism} in the relational style, a possibility that we will return to at the end of the chapter.

The big-step semantics is easy to apply to concrete programs.
For instance, define $\mathtt{factorial}$ as the program $\assign{\mathtt{output}}{1}; \while{\mathtt{input}}{(\assign{\mathtt{output}}{\mathtt{output} \times \mathtt{input}}; \assign{\mathtt{input}}{\mathtt{input} - 1})}$.

\begin{theorem}
  There exists $v$ such that $\bigstep{(\mupd{\mempty}{\mathtt{input}}{2}, \mathtt{factorial})}{v}$ and $\msel{v}{\mathtt{output}} = 2$.
\end{theorem}

\begin{proof}
  By repeated application of the big-step inference rules.
\end{proof}

We can even prove that $\mathtt{factorial}$ behaves correctly on all inputs, by way of a lemma about $\mathtt{factorial\_loop}$ defined as $\while{\mathtt{input}}{(\assign{\mathtt{output}}{\mathtt{output} \times \mathtt{input}}; \assign{\mathtt{input}}{\mathtt{input} - 1})}$.

\begin{lemma}\label{factorial_loop}
  If $\msel{v}{\mathtt{input}} = n$ and $\msel{v}{\mathtt{output}} = o$, then there exists $v'$ such that $\bigstep{(v, \mathtt{factorial\_loop})}{v'}$ and $\msel{v'}{\mathtt{output}} = n! \times o$.
\end{lemma}

\begin{proof}
  By induction on $n$.
\end{proof}

\begin{lemma}
  If $\msel{v}{\mathtt{input}} = n$, then there exists $v'$ such that $\bigstep{(v, \mathtt{factorial})}{v'}$ and $\msel{v'}{\mathtt{output}} = n!$.
\end{lemma}

\begin{proof}
  Largely by direct appeal to Lemma \ref{factorial_loop}.
\end{proof}

Most of our program proofs in this book establish \emph{safety properties}\index{safety properties}, or invariants of transition systems.
However, these last two examples with big-step semantics also establish program termination, taking us a few steps into the world of \emph{liveness properties}\index{liveness properties}.


\section{Small-Step Semantics}

Often it is convenient to break a system's execution into small sequential steps, rather than executing a whole program in one go.
Perhaps the most compelling example comes from concurrency, where it is difficult to give a big-step semantics directly.
Nonterminating programs are the other standard example.
We want to be able to establish invariants for those programs, all the same, and we need a semantics to help us state what it means to be an invariant.

\newcommand{\smallstep}[2]{#1 \to #2}

The canonical solution is \emph{small-step operational semantics}\index{small-step operational semantics}, probably the most common approach to formal program semantics in contemporary research.
Now we define a single-step relation $\smallstep{(v, c)}{(v', c')}$, meaning that one execution step transforms the first state into the second state.
Each state is a valuation $v$ and a current command $c$.

These inference rules give the details.
\encoding
$$\infer{\smallstep{(v, \assign{x}{e})}{(\mupd{v}{x}{\denote{e}v}, \skipe)}}{}
\quad \infer{\smallstep{(v, c_1; c_2)}{(v', c'_1; c_2)}}{
    \smallstep{(v, c_1)}{(v', c'_1)}
}
\quad \infer{\smallstep{(v, \skipe; c_2)}{(v, c_2)}}{}$$
$$\infer{\smallstep{(v, \ifte{e}{c_1}{c_2})}{(v, c_1)}}{
  \denote{e}v \neq 0
}
\quad \infer{\smallstep{(v, \ifte{e}{c_1}{c_2})}{(v, c_2)}}{
  \denote{e}v = 0
}$$
$$\infer{\smallstep{(v, \while{e}{c_1})}{(v, c_1; \while{e}{c_1})}}{
  \denote{e}v \neq 0
}
\quad \infer{\smallstep{(v, \while{e}{c_1})}{(v, \skipe)}}{
  \denote{e}v = 0
}$$

The intuition behind the rules may come best from working out an example.

\newcommand{\smallsteps}[2]{#1 \to^* #2}

\begin{theorem}
  There exists a valuation $v$ such that $\smallsteps{(\mupd{\mempty}{\mathtt{input}}{2}, \mathtt{factorial})}{(v, \skipe)}$ and $\msel{v}{\mathtt{output}} = 2$.
\end{theorem}

\begin{proof}
  Here is a step-by-step (literally!) derivation that finds $v$.
  $$\begin{array}{cl}
    & (\mupd{\mempty}{\mathtt{input}}{2}, \assign{\mathtt{output}}{1}; \mathtt{factorial\_loop}) \\
    \to & (\mupd{\mupd{\mempty}{\mathtt{input}}{2}}{\mathtt{output}}{1}, \skipe; \mathtt{factorial\_loop}) \\
    \to & (\mupd{\mupd{\mempty}{\mathtt{input}}{2}}{\mathtt{output}}{1}, \mathtt{factorial\_loop}) \\
    \to & (\mupd{\mupd{\mempty}{\mathtt{input}}{2}}{\mathtt{output}}{1}, (\assign{\mathtt{output}}{\mathtt{output} \times \mathtt{input}}; \assign{\mathtt{input}}{\mathtt{input} - 1}); \mathtt{factorial\_loop}) \\
    \to & (\mupd{\mupd{\mempty}{\mathtt{input}}{2}}{\mathtt{output}}{2}, (\skipe; \assign{\mathtt{input}}{\mathtt{input} - 1}); \mathtt{factorial\_loop}) \\
    \to & (\mupd{\mupd{\mempty}{\mathtt{input}}{2}}{\mathtt{output}}{2}, \assign{\mathtt{input}}{\mathtt{input} - 1}; \mathtt{factorial\_loop}) \\
    \to & (\mupd{\mupd{\mempty}{\mathtt{input}}{1}}{\mathtt{output}}{2}, \skipe; \mathtt{factorial\_loop}) \\
    \to & (\mupd{\mupd{\mempty}{\mathtt{input}}{1}}{\mathtt{output}}{2}, \mathtt{factorial\_loop}) \\
    \to & (\mupd{\mupd{\mempty}{\mathtt{input}}{1}}{\mathtt{output}}{2}, (\assign{\mathtt{output}}{\mathtt{output} \times \mathtt{input}}; \assign{\mathtt{input}}{\mathtt{input} - 1}); \mathtt{factorial\_loop}) \\
    \to & (\mupd{\mupd{\mempty}{\mathtt{input}}{1}}{\mathtt{output}}{2}, (\skipe; \assign{\mathtt{input}}{\mathtt{input} - 1}); \mathtt{factorial\_loop}) \\
    \to & (\mupd{\mupd{\mempty}{\mathtt{input}}{1}}{\mathtt{output}}{2}, \assign{\mathtt{input}}{\mathtt{input} - 1}; \mathtt{factorial\_loop}) \\
    \to & (\mupd{\mupd{\mempty}{\mathtt{input}}{0}}{\mathtt{output}}{2}, \skipe; \mathtt{factorial\_loop}) \\
    \to & (\mupd{\mupd{\mempty}{\mathtt{input}}{0}}{\mathtt{output}}{2}, \mathtt{factorial\_loop}) \\
    \to & (\mupd{\mupd{\mempty}{\mathtt{input}}{0}}{\mathtt{output}}{2}, \skipe)
  \end{array}$$

  Clearly the final valuation assigns $\mathtt{output}$ to 2.
\end{proof}

\subsection{Equivalence of Big-Step and Small-Step}

Different theorems are easier to prove with different semantics, so it is helpful to establish formally the intuitive connection between big and small steps.

\begin{lemma}
  If $\smallsteps{(v, c_1)}{(v', c'_1)}$, then $\smallsteps{(v, c_1; c_2)}{(v', c'_1; c_2)}$,
\end{lemma}

\begin{proof}
  By induction on the derivation of $\smallsteps{(v, c_1)}{(v', c'_1)}$.
\end{proof}

\begin{theorem}
  If $\bigstep{(v, c)}{v'}$, then $\smallsteps{(v, c)}{(v', \skipe)}$.
\end{theorem}

\begin{proof}
  By induction on the derivation of $\bigstep{(v, c)}{v'}$, appealing to the last lemma at two points.
\end{proof}

\begin{lemma}
  If $\smallstep{(v, c)}{(v', c')}$ and $\bigstep{(v', c')}{v''}$, then $\bigstep{(v, c)}{v''}$.  In other words, we can add a small step to the beginning of any big-step derivation.
\end{lemma}

\begin{proof}
  By induction on the derivation of $\smallstep{(v, c)}{(v', c')}$.
\end{proof}

\begin{lemma}
  If $\smallsteps{(v, c)}{(v', c')}$ and $\bigstep{(v', c')}{v''}$, then $\bigstep{(v, c)}{v''}$.  In other words, we can add any number of small steps to the beginning of any big-step derivation.
\end{lemma}

\begin{proof}
  By induction on the derivation of $\smallsteps{(v, c)}{(v', c')}$, appealing to the last lemma.
\end{proof}

\begin{theorem}
  If $\smallsteps{(v, c)}{(v', \skipe)}$, then $\bigstep{(v, c)}{v'}$.
\end{theorem}

\begin{proof}
  Largely by appeal to the last lemma, considering that $\bigstep{(v', \skipe)}{v'}$.
\end{proof}

\subsection{Transition Systems from Small-Step Semantics}

The small-step semantics is a natural fit with our working definition of transition systems.
We can define a transition system from any valuation and command, where $\mathbb V$ is the set of valuations and $\mathbb C$ the set of commands, by $\mathbb T(v, c) = \angled{\mathbb V \times \mathbb C, \{(v, c)\}, \to}$.
Now we bring to bear all of our machinery about invariants and their proof methods.

For instance, consider program $P = \while{\mathtt{n}}{\assign{\mathtt{a}}{\mathtt{a} + \mathtt{n}}; \assign{\mathtt{n}}{\mathtt{n} - 2}}$.

\invariants
\begin{theorem}
  Given even $n$, for $\mathbb T(\mupd{\mupd{\mempty}{\mathtt{n}}{n}}{\mathtt{a}}{0}, P)$, it is an invariant that the valuation maps variable $\mathtt{a}$ to an even number.
\end{theorem}

\begin{proof}
  First, we strengthen the invariant.
  We compute the set $\overline{P}$ of all commands that can be reached from $P$ by stepping the small-step semantics.
  This set is finite, even though the set of \emph{reachable valuations} is infinite, considering all potential $n$ values.
  Our strengthened invariant is $I(v, c) = c \in \overline{P} \land (\exists n. \; \msel{v}{\mathtt{n}} = n \land \textrm{even}(n)) \land (\exists a. \; \msel{v}{\mathtt{a}} = a \land \textrm{even}(a))$.
  In other words, we strengthen by adding the constraints that (1) we do not stray from the expected set of reachable commands and (2) variable \texttt{n} also remains even.

  The strengthened invariant is straightforward to prove by invariant induction, using repeated inversion on $\to$ facts.
\end{proof}


\section{Contextual Small-Step Semantics}

The reader may have noticed some tedium in certain rules of the small-step semantics, like this one.
$$\infer{\smallstep{(v, c_1; c_2)}{(v', c'_1; c_2)}}{
    \smallstep{(v, c_1)}{(v', c'_1)}
}$$
This rule is an example of a \emph{congruence rule}\index{congruence rule}, which shows how to take a step and \emph{lift} it into a step within a larger command, whose other subcommands are unaffected.
Complex languages can require many congruence rules, and yet we feel like we should be able to avoid repeating all this boilerplate logic somehow.
A common way to do so is switching to \emph{contextual small-step semantics}\index{contextual small-step semantics}.

We illustrate with our running example language.
The first step is to define a set of \emph{evaluation contexts}\index{evaluation contexts}, which formalize the spots within a larger command where steps are enabled.
\encoding
$$\begin{array}{rrcl}
  \textrm{Evaluation contexts} & C &::=& \Box \mid C; c
\end{array}$$

\newcommand{\plug}[2]{#1[#2]}
We define the operator of \emph{plugging}\index{plugging evaluation contexts} an evaluation context in the natural way.
\begin{eqnarray*}
  \plug{\Box}{c} &=& c \\
  \plug{(C; c_2)}{c} &=& \plug{C}{c}; c_2
\end{eqnarray*}

For this language, the only interesting case of evaluation contexts is the one that allows us to \emph{descend into the left subcommand}, because the old congruence rule invoked the step relation recursively for that position.

\newcommand{\smallstepo}[2]{#1 \to_0 #2}

The next ingredient is a reduced set of basic step rules, where we have dropped the congruence rule.
$$\infer{\smallstepo{(v, \assign{x}{e})}{(\mupd{v}{x}{\denote{e}v}, \skipe)}}{}
\quad \infer{\smallstepo{(v, \skipe; c_2)}{(v, c_2)}}{}$$
$$\infer{\smallstepo{(v, \ifte{e}{c_1}{c_2})}{(v, c_1)}}{
  \denote{e}v \neq 0
}
\quad \infer{\smallstepo{(v, \ifte{e}{c_1}{c_2})}{(v, c_2)}}{
  \denote{e}v = 0
}$$
$$\infer{\smallstepo{(v, \while{e}{c_1})}{(v, c_1; \while{e}{c_1})}}{
  \denote{e}v \neq 0
}
\quad \infer{\smallstepo{(v, \while{e}{c_1})}{(v, \skipe)}}{
  \denote{e}v = 0
}$$

\newcommand{\smallstepc}[2]{#1 \to_\mathsf{c} #2}

We regain the full coverage of the original rules with a new relation $\to_\mathsf{c}$, saying that we may apply $\to_0$ at the active subcommand within a larger command.
$$\infer{\smallstepc{(v, C[c])}{(v', C[c'])}}{
  \smallstepo{(v, c)}{(v', c')}
}$$

Let's revisit last section's example, to see contextual semantics in action, especially to demonstrate how to express an arbitrary command as an evaluation context plugged with another command.

\newcommand{\smallstepcs}[2]{#1 \to^*_\mathsf{c} #2}

\begin{theorem}
  There exists valuation $v$ such that $\smallstepcs{(\mupd{\mempty}{\mathtt{input}}{2}, \mathtt{factorial})}{(v, \skipe)}$ and $\msel{v}{\mathtt{output}} = 2$.
\end{theorem}

\begin{proof}
  $$\begin{array}{cl}
    & (\mupd{\mempty}{\mathtt{input}}{2}, \assign{\mathtt{output}}{1}; \mathtt{factorial\_loop}) \\
    = & (\mupd{\mempty}{\mathtt{input}}{2}, \plug{(\Box; \mathtt{factorial\_loop})}{\assign{\mathtt{output}}{1}}) \\
    \to_\mathsf{c} & (\mupd{\mupd{\mempty}{\mathtt{input}}{2}}{\mathtt{output}}{1}, \skipe; \mathtt{factorial\_loop}) \\
    = & (\mupd{\mupd{\mempty}{\mathtt{input}}{2}}{\mathtt{output}}{1}, \plug{\Box}{\skipe; \mathtt{factorial\_loop}}) \\
    \to_\mathsf{c} & (\mupd{\mupd{\mempty}{\mathtt{input}}{2}}{\mathtt{output}}{1}, \mathtt{factorial\_loop}) \\
    = & (\mupd{\mupd{\mempty}{\mathtt{input}}{2}}{\mathtt{output}}{1}, \plug{\Box}{\mathtt{factorial\_loop}}) \\
    \to_\mathsf{c} & (\mupd{\mupd{\mempty}{\mathtt{input}}{2}}{\mathtt{output}}{1}, (\assign{\mathtt{output}}{\mathtt{output} \times \mathtt{input}}; \assign{\mathtt{input}}{\mathtt{input} - 1}); \mathtt{factorial\_loop}) \\
    = & (\mupd{\mupd{\mempty}{\mathtt{input}}{2}}{\mathtt{output}}{1}, \plug{((\Box; \assign{\mathtt{input}}{\mathtt{input} - 1}); \mathtt{factorial\_loop})}{\assign{\mathtt{output}}{\mathtt{output} \times \mathtt{input}}}) \\
    \to_\mathsf{c} & (\mupd{\mupd{\mempty}{\mathtt{input}}{2}}{\mathtt{output}}{2}, (\skipe; \assign{\mathtt{input}}{\mathtt{input} - 1}); \mathtt{factorial\_loop}) \\
    = & (\mupd{\mupd{\mempty}{\mathtt{input}}{2}}{\mathtt{output}}{2}, \plug{(\Box; \mathtt{factorial\_loop})}{\skipe; \assign{\mathtt{input}}{\mathtt{input} - 1})} \\
    \to_\mathsf{c} & (\mupd{\mupd{\mempty}{\mathtt{input}}{2}}{\mathtt{output}}{2}, \assign{\mathtt{input}}{\mathtt{input} - 1}; \mathtt{factorial\_loop}) \\
    = & (\mupd{\mupd{\mempty}{\mathtt{input}}{2}}{\mathtt{output}}{2}, \plug{(\Box; \mathtt{factorial\_loop})}{\assign{\mathtt{input}}{\mathtt{input} - 1}}) \\
    \to_\mathsf{c} & (\mupd{\mupd{\mempty}{\mathtt{input}}{1}}{\mathtt{output}}{2}, \skipe; \mathtt{factorial\_loop}) \\
    = & (\mupd{\mupd{\mempty}{\mathtt{input}}{1}}{\mathtt{output}}{2}, \plug{\Box}{\skipe; \mathtt{factorial\_loop}}) \\
    \to^*_\mathsf{c} & \ldots \\
    \to_\mathsf{c} & (\mupd{\mupd{\mempty}{\mathtt{input}}{0}}{\mathtt{output}}{2}, \skipe)
  \end{array}$$

  Clearly the final valuation assigns $\mathtt{output}$ to 2.
\end{proof}

\subsection{Equivalence of Small-Step, With and Without Evaluation Contexts}

This new semantics formulation is equivalent to the other two, as we establish now.

\begin{theorem}
  If $\smallstep{(v, c)}{(v', c')}$, then $\smallstepc{(v, c)}{(v', c')}$.
\end{theorem}

\begin{proof}
  By induction on the derivation of $\smallstep{(v, c)}{(v', c')}$.
\end{proof}

\begin{lemma}
  If $\smallstepo{(v, c)}{(v', c')}$, then $\smallstep{(v, c)}{(v', c')}$.
\end{lemma}

\begin{proof}
  By cases on the derivation of $\smallstepo{(v, c)}{(v', c')}$.
\end{proof}

\begin{lemma}
  If $\smallstepo{(v, c)}{(v', c')}$, then $\smallstep{(v, C[c])}{(v', C[c'])}$.
\end{lemma}

\begin{proof}
  By induction on the structure of evaluation context $C$, appealing to the last lemma.
\end{proof}

\begin{theorem}
  If $\smallstepc{(v, c)}{(v', c')}$, then $\smallstep{(v, c)}{(v', c')}$.
\end{theorem}

\begin{proof}
  By inversion on the derivation of $\smallstepc{(v, c)}{(v', c')}$, followed by an appeal to the last lemma.
\end{proof}

\subsection{\label{eval_contexts}Evaluation Contexts Pay Off: Adding Concurrency}

To showcase the convenience of contextual semantics, let's extend our example language with a simple construct for running two commands in parallel\index{parallel composition of threads}, implicitly extending the definition of plugging accordingly.
$$\begin{array}{rrcl}
  \textrm{Commands} & c &::=& \ldots \mid c || c
\end{array}$$

To capture the idea that \emph{either} command in a parallel construct is allowed to step next, we extend evaluation contexts like so:
\encoding
$$\begin{array}{rrcl}
  \textrm{Evaluation contexts} & C &::=& \ldots \mid C || c \mid c || C
\end{array}$$

We need one more basic step rule, to ``garbage-collect'' threads that have finished.
$$\infer{\smallstepo{(v, \skipe || c)}{(v, c)}}{}$$

And that's it!
The new system faithfully captures our usual idea of threads executing in parallel.
All of the theorems proved previously about contextual steps continue to hold.
In fact, in the accompanying Coq code, literally the same proof scripts establish the new versions of the theorems, with no new human proof effort.
It's not often that concurrency comes for free in a rigorous proof!


\section{Determinism}

Our last extension with parallelism introduced intentional nondeterminism in the semantics: a single starting state can step to multiple different next states.
However, the three semantics for the original language are deterministic, and we can prove it.

\begin{theorem}
  If $\bigstep{(v, c)}{v_1}$ and $\bigstep{(v, c)}{v_2}$, then $v_1 = v_2$.
\end{theorem}

\begin{proof}
  By induction on the derivation of $\bigstep{(v, c)}{v_1}$ and inversion on the derivation of $\bigstep{(v, c)}{v_2}$.
\end{proof}

\begin{theorem}
  If $\smallstep{(v, c)}{(v_1, c_1)}$ and $\smallstep{(v, c)}{(v_2, c_2)}$, then $v_1 = v_2$ and $c_1 = c_2$.
\end{theorem}

\begin{proof}
  By induction on the derivation of $\smallstep{(v, c)}{(v_1, c_1)}$ and inversion on the derivation of $\smallstep{(v, c)}{(v_2, c_2)}$.
\end{proof}

\begin{theorem}
  If $\smallstepc{(v, c)}{(v_1, c_1)}$ and $\smallstepc{(v, c)}{(v_2, c_2)}$, then $v_1 = v_2$ and $c_1 = c_2$.
\end{theorem}

\begin{proof}
  Follows from the last theorem and the equivalence we proved between $\to$ and $\to_\mathsf{c}$.
\end{proof}

We'll stop, for now, in our tour of useful properties of operational semantics.
All of the rest of the book is based on small-step semantics, with or without evaluation contexts.
As we study new kinds of programming languages, we will see how to model them operationally.
Almost every new proof technique is phrased as an approach to establishing invariants of transition systems based on small-step semantics.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Abstract Interpretation and Dataflow Analysis}

The last two chapters showed us both how to build a transition system from a program automatically and how to find an invariant for a transition system automatically.
Let's now combine these ideas to find invariants for programs automatically, in a particular way associated with the technique of \emph{dataflow analysis}\index{dataflow analysis} used to drive many compiler optimizations.
Throughout, we'll stick with the example of the small imperative language whose semantics we studied in the last chapter.
We'll confine our attention to its basic small-step semantics via the $\to$ relation.

Model checking builds up increasingly larger finite sets of reachable states in a system.
A state $(v, c)$ of our imperative language combines \emph{control state}\index{control state} $c$ (the next command to execute) with \emph{data state} $v$ (the values of the variables), and so model checking will find invariants that restrict both components.
We say that model checking is \emph{path-sensitive}\index{path-sensitive analysis} because its invariants can distinguish between the different data states that can be associated with the same control state, reached along different paths in the program's executions.
Path-sensitive analyses tend to be much more computationally expensive than \emph{path-insensitive}\index{path-insensitive analysis} analyses, whose invariants collapse together all ways of reaching the same control state.
Dataflow analysis is one such path-insensitive approach, and its underlying theory is \emph{abstract interpretation}\index{abstract interpretation}.


\section{Definition of an Abstract Interpretation}

An abstract interpretation is a particular sort of abstraction, of the kind we met in studying model checking.
In that more general setting, we can represent concrete states with any sorts of abstract states.
In abstract interpretation, we most commonly associate each variable with an independent abstract description.
One example, which we'll formalize in more detail shortly, would be to label each variable as ``even,'' ``odd,'' or ``either.''

\newcommand{\join}[0]{\sqcup}

\begin{definition}
  An \emph{abstract interpretation} (for our example imperative language) is a tuple $\angled{\mathbb D, \top, \mathcal C, \hat{+}, \hat{-}, \hat{\times}, \join, \sim}$, where $\mathbb D$ is a set (the domain of the analysis); $\top \in \mathbb D$; $\mathcal C : \mathbb N \to \mathbb D$; $\hat{+}, \hat{-}, \hat{\times}, \join : \mathbb D \times \mathbb D \to \mathbb D$; and $\sim \; \subseteq \mathbb N \times \mathbb D$.
  The idea is that:
  \begin{itemize}
  \item Abstract versions of numbers are $\mathbb D$ values.
  \item $\top$ (``top'')\index{top element of an abstract interpretation} is the least specific abstract value, representing any concrete value.
  \item $\mathcal C$ maps any constant to its most precise abstraction.
  \item $\hat{+}$, $\hat{-}$, and $\hat{\times}$ push abstraction through arithmetic operators, calculating their most precise abstractions.
  \item $\join$ (``join'')\index{join operation of an abstract interpretation} computes the \emph{least upper bound}\index{least upper bound} of two abstract values: the most specific value that represents any value associated with either input.
  \item $\sim$ formalizes the idea of which concrete values are covered by which abstract values.
  \end{itemize}

  For $a, b \in \mathbb D$, define $a \sqsubseteq b$ to mean $\forall n \in \mathbb N. \; (n \sim a) \Rightarrow (n \sim b)$.  That is, $b$ is at least as general as $a$.
  An abstract interpretation must satisfy the following algebraic laws:
  \begin{itemize}
  \item $\forall a \in \mathbb D. \; a \sqsubseteq \top$
  \item $\forall n \in \mathbb N. \; n \sim \mathcal C(n)$
  \item $\forall n, m \in \mathbb N. \; \forall a, b \in \mathbb D. \; n \sim a \land m \sim b \Rightarrow (n + m) \sim (a \hat{+} b)$
  \item $\forall n, m \in \mathbb N. \; \forall a, b \in \mathbb D. \; n \sim a \land m \sim b \Rightarrow (n - m) \sim (a \hat{-} b)$
  \item $\forall n, m \in \mathbb N. \; \forall a, b \in \mathbb D. \; n \sim a \land m \sim b \Rightarrow (n \times m) \sim (a \hat{\times} b)$
  \item $\forall a, b, a', b' \in \mathbb D. \; a \sqsubseteq a' \land b \sqsubseteq b' \Rightarrow (a \hat{+} b) \sqsubseteq (a' \hat{+} b')$
  \item $\forall a, b, a', b' \in \mathbb D. \; a \sqsubseteq a' \land b \sqsubseteq b' \Rightarrow (a \hat{-} b) \sqsubseteq (a' \hat{-} b')$
  \item $\forall a, b, a', b' \in \mathbb D. \; a \sqsubseteq a' \land b \sqsubseteq b' \Rightarrow (a \hat{\times} b) \sqsubseteq (a' \hat{\times} b')$
  \item $\forall a, b \in \mathbb D. \; a \sqsubseteq (a \join b)$
  \item $\forall a, b \in \mathbb D. \; b \sqsubseteq (a \join b)$
  \end{itemize}
\end{definition}

\newcommand{\E}[0]{\mathsf{E}}
\renewcommand{\O}[0]{\mathsf{O}}

As an example, consider this formalization of even-odd analysis, whose proof of soundness is left as an exercise for the reader.
(While the treatment of subtraction may seem gratuitously imprecise, recall that we are working here with natural numbers and not integers, such that subtraction ``sticks'' at zero when the result would otherwise be negative.)
\begin{eqnarray*}
  \mathbb D &=& \{\E, \O, \top\} \\
  \mathcal C(n) &=& \textrm{$\E$ or $\O$, depending on parity of $n$} \\
  \E \; \hat{+} \; \E &=& \E \\
  \E \; \hat{+} \; \O &=& \O \\
  \O \; \hat{+} \; \E &=& \O \\
  \O \; \hat{+} \; \O &=& \E \\
  \_ \; \hat{+} \; \_ &=& \top \\
  \E \; \hat{-} \; \E &=& \E \\
  \O \; \hat{-} \; \O &=& \E \\
  \_ \; \hat{-} \; \_ &=& \top \\
  \E \; \hat{\times} \; \_ &=& \E \\
  \_ \; \hat{\times} \; \E &=& \E \\
  \O \; \hat{\times} \; \O &=& \O \\
  \_ \; \hat{\times} \; \_ &=& \top \\
  \E \join \E &=& \E \\
  \O \join \O &=& \O \\
  \_ \join \_ &=& \top \\
  n \sim \E &=& \textrm{$n$ is even} \\
  n \sim \O &=& \textrm{$n$ is odd} \\
  n \sim \top &=& \textrm{always}
\end{eqnarray*}

We generally think of an abstract interpretation as forming a \emph{lattice}\index{lattice} (actually a semilattice\index{semilattice}), which is roughly the algebraic structure characterized by operations like $\join$, when $\join$ truly returns the \emph{most specific} or \emph{least} upper bound of its two arguments.  We visualize the even-odd lattice like so.

\begin{center}\begin{tikzpicture}[node distance=1.5cm]
\node(top)                      {$\top$};
\node(E)   [below left of=top]  {$\E$};
\node(O)   [below right of=top] {$\O$};

\draw(top) -- (E);
\draw(top) -- (O);
\end{tikzpicture}\end{center}

The idea is that taking the join of two elements moves us \emph{up} the lattice to their lowest common ancestor.

An edge going up from $a$ to $b$ indicates that $a \sqsubseteq b$.
As another example, consider a lattice tracking prime factors of numbers, up to 5.
Then the picture version might go like so:

\begin{center}\begin{tikzpicture}[node distance=1.5cm]
\node(top)                              {$\{\}$};
\node(two)        [below left of=top]   {$\{2\}$};
\node(three)      [below of=top]        {$\{3\}$};
\node(five)       [below right of=top]  {$\{5\}$};
\node(twothree)   [below left of=two]   {$\{2, 3\}$};
\node(twofive)    [below of=three]      {$\{2, 5\}$};
\node(threefive)  [below right of=five] {$\{3, 5\}$};
\node(bot)        [below of=twofive]    {$\{2, 3, 5\}$};

\draw(top)       -- (two);
\draw(top)       -- (three);
\draw(top)       -- (five);
\draw(two)       -- (twothree);
\draw(two)       -- (twofive);
\draw(three)     -- (twothree);
\draw(three)     -- (threefive);
\draw(five)      -- (twofive);
\draw(five)      -- (threefive);
\draw(twothree)  -- (bot);
\draw(twofive)   -- (bot);
\draw(threefive) -- (bot);
\end{tikzpicture}\end{center}

Since $\sqsubseteq$ is clearly transitive, upward-moving paths across multiple nodes also imply $\sqsubseteq$ relationships between their endpoints.
It's worth verifying quickly that any two nodes in this graph have a unique lowest common ancestor, which is the proper result of the $\join$ operation on those nodes.

Another worthwhile exercise for the reader is to work out the proper definitions of $\hat{+}$, $\hat{-}$, and $\hat{\times}$ for this domain.


\section{Flow-Insensitive Analysis}

We now give our first recipe for building a program abstraction from an abstract interpretation.
We apply a \emph{flow-insensitive} abstraction, which means we find an invariant that doesn't depend at all on the control part $c$ of a full state $(v, c)$.
Alternatively, the invariant depends only on the data part $v$.
Concretely, with $\mathbb V$ the set of variables, we work with states $s \in \mathbb V \to \mathbb D$, taking the domain $\mathbb D$ of our chosen abstract interpretation.
An abstract state $s$ for a concrete valuation $v$ assigns to each $x$ an abstract value $s(x)$ such that $v(x) \sim s(x)$.
We overload the operator $\sim$ to denote this compatibility via $v \sim s$.

\newcommand{\absexp}[1]{[#1]}

As a preliminary, we define the abstract interpretation of an expression like so:
\begin{eqnarray*}
  \absexp{n}s &=& \mathcal C(n) \\
  \absexp{x}s &=& s(x) \\
  \absexp{e_1 + e_2}s &=& \absexp{e_1}s \hat{+} \absexp{e_2}s \\
  \absexp{e_1 - e_2}s &=& \absexp{e_1}s \hat{-} \absexp{e_2}s \\
  \absexp{e_1 \times e_2}s &=& \absexp{e_1}s \hat{\times} \absexp{e_2}s
\end{eqnarray*}

\begin{theorem}
  If $v \sim s$, then $\denote{e}v \sim \absexp{e}s$.
\end{theorem}

\newcommand{\asgns}[1]{\mathcal A(#1)}

Next, we model the possible effects of commands.
We already said that our flow-insensitive analysis will forget about control flow in a command, but what does that mean formally?
States of this language, without control flow taken into account, are just variable valuations, and the only way a command can affect a valuation is through executing assignments.
Therefore, forgetting the control flow of a command amounts to just \emph{recording which assignments it contains syntactically}, losing all context about which Boolean tests would need to pass to reach each assignment.
This simple syntactic extraction process can be formalized with an assignments-of function $\mathcal A$ for commands.
\begin{eqnarray*}
  \asgns{\skipe} &=& \{\} \\
  \asgns{\assign{x}{e}} &=& \{(x, e)\} \\
  \asgns{c_1; c_2} &=& \asgns{c_1} \cup \asgns{c_2} \\
  \asgns{\ifte{e}{c_1}{c_2}} &=& \asgns{c_1} \cup \asgns{c_2} \\
  \asgns{\while{e}{c_1}} &=& \asgns{c_1}
\end{eqnarray*}

As a final preliminary ingredient, for abstract states $s_1$ and $s_2$, define $s_1 \join s_2$ by $(s_1 \join s_2)(x) = s_1(x) \join s_2(x)$.

Now we define the flow-insensitive step relation, over abstract states alone, as:
$$\infer{s \to^c_\mathsf{FI} s}{}
\quad \infer{s \to^c_\mathsf{FI} s \join \mupd{s}{x}{\absexp{e}s}}{
  (x, e) \in \asgns{c}
}$$

We can establish formally how forgetting about the order of assignments is a valid abstraction technique.

\begin{theorem}\label{flow_insensitive_abstraction}
  \abstraction
  Given command $c$, initial valuation $v$, and initial abstract state $s$ such that $v \sim s$.  The transition system with initial state $s$ and step relation $\to^c_\mathsf{FI}$ simulates the system with initial state $(v, c)$ and step relation $\to$, according to a simulation relation enforcing $\sim$ between the valuation and abstract state.
\end{theorem}

Now a simple procedure can find an invariant for the abstracted system.
In particular:

\begin{enumerate}
\item Initialize $s$ with the abstract state from the theorem statement.
\item \label{flow_insensitive_loop}Compute $s' = s \join \bigsqcup_{(x, e) \in \asgns{c}} \mupd{s}{x}{\absexp{e}s}$.
\item If $s' \sqsubseteq s$, then we're done; $s$ is the invariant.
\item Otherwise, assign $s = s'$ and return to \ref{flow_insensitive_loop}.
\end{enumerate}

Every step in this outline is computable, since the abstract states will always be finite maps.

\begin{theorem}\label{flow_insensitive_iteration}
  \invariants
  If the outline above terminates, then it is an invariant of the flow-insensitive abstracted system that $s$ (its final value from the loop above) is an upper bound for every reachable state.  That is, for every reachable $s'$, $s' \sqsubseteq s$.
\end{theorem}

To check a concrete program, we first abstract it to a flow-insensitive version with Theorem \ref{flow_insensitive_abstraction}, then we find a guaranteed invariant with Theorem \ref{flow_insensitive_iteration}.
One wrinkle here is that it is not obvious that our informal loop above always terminates.
However, it always terminates if our abstract domain has \emph{finite height}\index{finite height of abstract domain}, meaning that there is no infinite ascending chain of distinct elements $a_i$ such that $a_i \sqsubseteq a_{i+1}$ for all $i$.
Our even-odd example trivially has that property, since it contains only finitely many distinct elements.

It is worth emphasizing that, when those conditions are met, our invariant-finding procedure is guaranteed to terminate, even though the underlying language is Turing-complete, so that most interesting analysis problems are uncomputable!
The catch is that it is always possible that the invariant found is a trivial one, where the abstract state maps every variable to $\top$.

Here is an example of a program where flow-insensitive even-odd analysis gives the most precise answer (relative to its simplifying assumption that we must assign the same description to a variable at every step of execution).
$$\assign{n}{10}; \assign{x}{0}; \while{n > 0}{\assign{x}{x + 2 \times n}; \assign{n}{n - 1}}$$

The abstract state we wind up with is $\mupd{\mupd{\mempty}{n}{\top}}{x}{\E}$.

\section{Flow-Sensitive Analysis}

We can only go so far with flow-insensitive invariants, which don't let us record different facts about the variables for different lines of the program code.
Such an analysis will get tripped up even by straightline code where parities of variables change as we go.
Here is a trivial example program where the flow-insensitive analysis returns the useless answer $\mupd{\mempty}{x}{\top}$, when the most precise answer (about program state after execution) would be $\mupd{\mempty}{x}{\O}$.
$$\assign{x}{0}; \assign{x}{1}$$

The solution to this problem can be to go to \emph{flow-sensitive}\index{flow-sensitive analysis} analysis, where an abstract state $S$ is a finite map from commands (all the intermediate ``program counters'' of an original command) to the abstract states of the previous section.

\newcommand{\absstep}[3]{\mathcal S(#1, #2, #3)}
\newcommand{\absstepo}[2]{\mathcal S(#1, #2)}

We define a function $\absstep{s}{c}{f}$ to compute all of the states of the form $(s', c')$ reachable in a single step from $(s, c)$.
Actually, for each $(s', c')$ covered by that informal description, this function returns a map from keys $f(c')$ to values $s'$.
The idea is that function $f$ wraps the step in any additional command context that isn't participating directly in this step.
See how $f$ is modified in the sequencing case below, for something of an intuition for its purpose.
\begin{eqnarray*}
  \absstep{s}{\skipe}{f} &=& \mempty \\
  \absstep{s}{\assign{x}{e}}{f} &=& \mupd{\mempty}{f(\skipe)}{\mupd{s}{x}{\absexp{e}s}} \\
  \absstep{s}{\skipe; c_2}{f} &=& \mupd{\mempty}{f(c_2)}{s} \\
  \absstep{s}{c_1; c_2}{f} &=& \absstep{s}{c_1}{\lambda c. \; f(c; c_2)} \\
  \absstep{s}{\ifte{e}{c_1}{c_2}}{f} &=& \mupd{\mupd{\mempty}{f(c_1)}{s}}{f(c_2)}{s} \\
  \absstep{s}{\while{e}{c_1}}{f} &=& \mupd{\mupd{\mempty}{f(\skipe)}{s}}{f(c_1; \while{e}{c_1})}{s}
\end{eqnarray*}

Note that the last two cases, for conditional control flow, ignore the test expression entirely, which is certainly sound, though it may lead to imprecision in the analysis.
This approximation is known as \emph{path insensitivity}\index{path-insensitive analysis}.
Define $\absstepo{s}{c}$ as shorthand for $\absstep{s}{c}{\lambda c_1. \; c_1}$.

Now we can define a new abstract step relation.
$$\infer{(s, c) \to_\mathsf{FS} (s', c')}{
  \absstepo{s}{c}(c') = s'
}$$

That is, we step from $(s, c)$ to $(s', c')$ precisely when, if we look up $c'$ in the result of running $c$ abstractly in $s$, we find $s'$.

Now we can follow an analogous path to the one we did in the last section.

\begin{theorem}\label{flow_sensitive_abstraction}
  \abstraction
  Given command $c$ and initial valuation $v$.  The transition system with initial state $(s, c)$ and step relation $\to_\mathsf{FS}$ simulates the system with initial state $(v, c)$ and step relation $\to$, according to a simulation relation enforcing equality of the commands, as well as $\sim$ between the valuation and abstract state.
\end{theorem}

Now another simple procedure can find an invariant for the abstracted system.
We write $S \join S'$ for joining of two flow-sensitive abstract states.
When $c$ is in the domain of exactly one of $S$ or $S'$, $S \join S'$ agrees with the corresponding mapping.
When $c$ is in neither domain, it isn't in the domain of $S \join S'$ either.
Finally, when $c$ is in both domains, we have $(S \join S')(c) = S(c) \join S'(c)$.

Also define $S \sqsubseteq S'$ to mean that, whenever $S(c) = s$, there exists $s'$ such that $S'(c) = s'$ and $s \sqsubseteq s'$.

Now our procedure works as follows.

\begin{enumerate}
\item Initialize $S = \mupd{\mempty}{c}{\lambda x. \; \top}$.
\item \label{flow_sensitive_loop}Compute $S' = S \join \bigsqcup_{S(c) = s} \absstepo{s}{c}$.
\item If $S' \sqsubseteq S$, then we're done; $S$ is the invariant.
\item Otherwise, assign $S = S'$ and return to \ref{flow_sensitive_loop}.
\end{enumerate}

Again, every step in this outline is computable, for the same reason as in the prior section.

\begin{theorem}\label{flow_sensitive_iteration}
  \invariants
  If the outline above terminates, then it is an invariant of the flow-sensitive abstracted system that, for reachable $(s, c)$, we have $S(c) = s'$ for some $s'$ with $s \sqsubseteq s'$.
\end{theorem}

Again, the last two theorems together give us a recipe for computing an invariant automatically, when the loop terminates.
The flow-sensitive procedure is guaranteed to give an invariant at least as strong as what the flow-insensitive procedure would come up with, and often it's much stronger.
However, flow-sensitive analysis is often much more computationally expensive (in time and memory), so there is a trade-off.


\section{Widening}

Consider an abstract interpretation of \emph{intervals}\index{interval analysis}, where each elements of the domain is either $[a, b]$ or $[a, \infty)$, for $a, b \in \mathbb N$.
Restricting our attention to $a$ and $b$ values between 0 and 1 for illustration purposes, we have this diagram of the domain, where the bottom element represents an empty set.

\begin{center}\begin{tikzpicture}[node distance=1.5cm]
\node(top)                             {$[0, \infty)$};
\node(zeroone)    [below left of=top]  {$[0, 1]$};
\node(oneinf)     [below right of=top] {$[1, \infty)$};
\node(zero)       [below of=zeroone]   {$[0, 0]$};
\node(one)        [below of=oneinf]    {$[1, 1]$};
\node(emp)        [below right of=zero]{$[1, 0]$};

\draw(top)       -- (zeroone);
\draw(top)       -- (oneinf);
\draw(zeroone)   -- (zero);
\draw(zeroone)   -- (one);
\draw(oneinf)    -- (one);
\draw(zero)      -- (emp);
\draw(one)       -- (emp);
\end{tikzpicture}\end{center}

The abstract operators have intuitive and simple definitions, like, flattening the different kinds of intervals into a common notation, defining $(a_1, b_1) \join (a_2, b_2) = (\min(a_1, a_2), \max(b_1, b_2))$ and $(a_1, b_1) \hat{+} (a_2, b_2) = (a_1 + a_2, b_1 + b_2)$, with usual conventions about what it means to do arithmetic with $\infty$.

Again, the lattice diagram above was simplified to cover only 0 and 1 as legal constant values.
We can define the interval lattice to draw from the full, infinite set of natural numbers.
In that case, we can quickly run into trouble with abstract interpretation.
For instance, consider this infinite-looping program:
$$\assign{\mathsf{a}}{7}; \while{\mathsf{a}}{\assign{\mathsf{a}}{\mathsf{a} + 3}}$$

One (flow-insensitive) invariant is that $\mathsf{a} \geq 7$, represented as the abstract state $\mupd{\mempty}{\mathsf{a}}{[7, \infty)}$.
However, even the flow-sensitive analysis will keep growing the range of $\mathsf{a}$, as it traverses the loop over and over!
We see $\mathsf{a}$ initialized to $[7, 7]$, then grown to $[7, 10]$ after one loop iteration, then to $[7, 13]$ after another, and so on indefinitely.

Notice that we wrote before that termination is guaranteed when the lattice has finite height, which we have just demonstrated is not true for general intervals, as our example program generates an infinite ascending chain of distinct intervals.

\newcommand{\widen}[0]{\triangledown}

The canonical solution to this problem is to employ a \emph{widening}\index{widening} operator $\widen$.
This operator has the same soundness requirements as $\join$, but we do not require that it gives the \emph{least} upper bound of its two operands.
It merely needs to give some upper bound.
In fact, we don't want it to give least upper bounds; we want it to \emph{skip ahead} in that ordering as necessary to promote termination.
In general, we don't want to replace all uses of $\join$ with $\widen$, though it is sound to do so.
We might apply $\widen$ in place of $\join$ only for commands that are the beginnings of loops, for instance, to guarantee that no infinite path in the program avoids infinitely many encounters with $\widen$ to tame infinite ascending chains.

For intervals, when we are working with programs that we fear will keep increasing variables indefinitely through loops, a simple form of widening is defined as follows.
Set $(a_1, b_1) \widen (a_2, b_2) = (a_1, b_1) \join (a_2, b_2)$ when $b_2 \leq b_1$, that is, when the upper bound of the interval hasn't increased since the last iteration.
Otherwise, set $(a_1, b_1) \widen (a_2, b_2) = (\min(a_1, a_2), \infty)$.
In other words, when an interval expands to include higher values, fast-forward its upper bound to $\infty$.

With this modification, analysis of our tricky example successfully finds the invariant $\mathsf{a} \geq 7$.
In fact, flow-insensitive and flow-sensitive interval analysis with this widening operator applied at loop starts are guaranteed to terminate, for any input programs.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{\label{compiler_correctness}Compiler Correctness via Simulation Arguments}

\newcommand{\outp}[1]{\mathsf{out}(#1)}

A good application of operational semantics is correctness of compiler transformations\index{compilers}.
A compiler is composed of a series of \emph{phases}\index{compiler phase}, each of which translates programs in some \emph{source} language\index{source language} into some \emph{target} language\index{target language}.
Usually, in most phases of a compiler, the source and target languages are the same, and such phases are often viewed as \emph{optimizations}\index{optimization}\index{compiler optimization}, which tend to improve performance of most programs in practice.
The verification problem is plenty hard enough when the source and target languages are the same, so we will confine our attention in this chapter to a single language.
It's almost the same as the imperative language from the last two chapters, but we add one new syntactic construction, underlined below.
$$\begin{array}{rrcl}
  \textrm{Numbers} & n &\in& \mathbb N \\
  \textrm{Variables} & x &\in& \mathsf{Strings} \\
  \textrm{Expressions} & e &::=& n \mid x \mid e + e \mid e - e \mid e \times e \\
  \textrm{Commands} & c &::=& \skipe \mid \assign{x}{e} \mid c; c \mid \ifte{e}{c}{c} \mid \while{e}{c} \mid \underline{\outp{e}}
\end{array}$$

A command $\outp{e}$ outputs\index{output} the value of expression $e$, say by writing it to a terminal window.
What's interesting about adding output is that now \emph{different nonterminating\index{nontermination} programs have interestingly different behavior}: they may produce different output sequences, finite or infinite.
Any compiler phase should leave output behavior intact.
It's worth noticing that our workhorse technique of invariants can't help us here directly.
Output equivalence can only be judged by watching full runs of programs.
A nonterminating program that has behaved itself up to some point, satisfying the invariant of our choice, may still fail to follow through later on.
While invariants are complete for \emph{safety} properties\index{safety properties}, here we have our first systematic study of a class of \emph{liveness} properties\index{liveness properties}.
We must also delve into establishing \emph{relational} properties\index{relational properties} of programs, meaning that we reason about connections between executions of two different programs.
In our case, such a pair will include the program fed as input into a phase, plus the program that the phase generates.

\newcommand{\silent}[0]{\epsilon}
\newcommand{\smallstepol}[3]{#1 \stackrel{#2}{\to_0} #3}
\newcommand{\smallstepcl}[3]{#1 \stackrel{#2}{\to_\mathsf{c}} #3}

To get started phrasing the correctness condition formally, we need to modify our operational semantics to track output.
We do so by adopting a \emph{labeled transition system}\index{labeled transition system}, where step arrows are annotated with \emph{labels} that explain interactions with the world.
For this language, the only interaction kind is an output, which we will write as a number.
We also have \emph{silent}\index{silent steps} labels $\silent$, for when no output takes place.
For completeness, here are the full rules of the extended language, where the definitions of contexts and plugging are inherited unchanged.

$$\infer{\smallstepol{(v, \outp{e})}{\denote{e}v}{(v, \skipe)}}{}$$
$$\infer{\smallstepol{(v, \assign{x}{e})}{\silent}{(\mupd{v}{x}{\denote{e}v}, \skipe)}}{}
\quad \infer{\smallstepol{(v, \skipe; c_2)}{\silent}{(v, c_2)}}{}$$
$$\infer{\smallstepol{(v, \ifte{e}{c_1}{c_2})}{\silent}{(v, c_1)}}{
  \denote{e}v \neq 0
}
\quad \infer{\smallstepol{(v, \ifte{e}{c_1}{c_2})}{\silent}{(v, c_2)}}{
  \denote{e}v = 0
}$$
$$\infer{\smallstepol{(v, \while{e}{c_1})}{\silent}{(v, c_1; \while{e}{c_1})}}{
  \denote{e}v \neq 0
}
\quad \infer{\smallstepol{(v, \while{e}{c_1})}{\silent}{(v, \skipe)}}{
  \denote{e}v = 0
}$$

$$\infer{\smallstepcl{(v, C[c])}{\ell}{(v', C[c'])}}{
  \smallstepol{(v, c)}{\ell}{(v', c')}
}$$

\newcommand{\Tr}[1]{\mathsf{Tr}(#1)}
\newcommand{\terminate}[0]{\mathsf{terminate}}

To reason about infinite executions, we need a new abstraction, compared to what has worked in our invariant-based proofs so far.
That abstraction will be \emph{traces}\index{traces}, sequences of outputs (and termination events) that a program might be observed to generate.
We define a command's trace set inductively.
Recall that $\cdot$ is the empty list, while $\bowtie$ does list concatenation.
$$\infer{\cdot \in \Tr{s}}{}
\quad \infer{\terminate \in \Tr{(v, \skipe)}}{}
\quad \infer{t \in \Tr{s}}{
  \smallstepcl{s}{\silent}{s'}
  & t \in \Tr{s'}
}
\quad \infer{\concat{\outp{n}}{t} \in \Tr{s}}{
  \smallstepcl{s}{n}{s'}
  & t \in \Tr{s'}
}$$

Notice that a trace is allowed to end at any point, even if the program under inspection hasn't terminated yet.
Also, since our language is deterministic\index{determinism}, for any two traces of one command, one trace is a prefix of the other.
Many parts of the machinery we develop here will, however, work well for nondeterministic systems, as we will see with labeled transition systems for concurrency in Chapter \ref{process_algebra}.

\newcommand{\trinc}[2]{#1 \preceq #2}
\newcommand{\treq}[2]{#1 \simeq #2}

\begin{definition}[Trace inclusion]
  \index{trace inclusion}For commands $c_1$ and $c_2$, let $\trinc{c_1}{c_2}$ iff $\Tr{c_1} \subseteq \Tr{c_2}$.
\end{definition}

\begin{definition}[Trace equivalence]
  \index{trace equivalence}For commands $c_1$ and $c_2$, let $\treq{c_1}{c_2}$ iff $\Tr{c_1} = \Tr{c_2}$.
\end{definition}

We will enforce that a correct compiler phase respects trace equivalence.
That is, the output program has the same traces as the input program.
For nondeterministic languages, subtler conditions are called for, but we're happy to stay within the safe confines of determinism for this chapter.


\section{Basic Simulation Arguments and Optimizing Expressions}

\newcommand{\cfold}[1]{\mathsf{cfold}_1(#1)}

As our first example compiler phase, we consider a limited form of \emph{constant folding}\index{constant folding}, where expressions with statically known values are replaced by constants.
The whole of the optimization is (1) finding all maximal program subexpressions that don't contain variables and (2) replacing each such subexpression with its known constant value.
We write $\cfold{c}$ for the result of applying this optimization on command $c$.
(For the program transformations in this chapter, we stick to informal descriptions of how they operate, leaving the details to the accompanying Coq code.)

A program optimized in this way proceeds in a very regular manner, compared to executions of the original, unoptimized program.
The small steps line up one-to-one.
Therefore, a very regular kind of \emph{simulation relation} connects them.
(This notion is very similar to the one from Section \ref{trs_simulation}, though now it incorporates labels.)

\begin{definition}[Simulation relation]
  We say that binary relation $R$ over states of our object language is a \emph{simulation relation} iff:
  \begin{enumerate}
    \item Whenever $(v_1, \skipe) \; R \; (v_2, c_2)$, it follows that $c_2 = \skipe$.
    \item Whenever $s_1 \; R \; s_2$ and $\smallstepcl{s_1}{\ell}{s'_1}$, there exists $s'_2$ such that $\smallstepcl{s_2}{\ell}{s'_2}$ and $s'_1 \; R \; s'_2$. 
  \end{enumerate}
\end{definition}

The crucial second condition can be drawn like this.
\[
\begin{tikzcd}
s_1 \arrow{r}{R} \arrow{d}{\forall \stackrel{\ell}{\to_{\mathsf{c}}}} & s_2 \arrow{d}{\exists \stackrel{\ell}{\to_{\mathsf{c}}}} \\
s'_1 & s'_2 \arrow{l}{R^{-1}}
\end{tikzcd}
\]

\invariants
As usual, the diagram tells us that when a path along the left exists, a matching roundabout path exists, too.
That is, any step on the left can be matched by a step on the right.
Notice the similarity to the invariant-induction principle that we have mostly relied on so far.
Instead of showing that every step preserves a one-state predicate, we show that every step preserves a two-state predicate in a particular way.
The simulation approach is as general for relating programs as the invariant approach is for verifying individual programs.

\begin{theorem}
  \label{simulation_ok}If there exists a simulation $R$ such that $s_1 \; R \; s_2$, then $\treq{s_1}{s_2}$.
\end{theorem}
\begin{proof}
  We prove the two trace-inclusion directions separately.
  The left-to-right direction proceeds by induction over the definition of traces on the left, while the right-to-left direction proceeds by similar induction on the right.
  While most of the proof is generic in details of the labeled transition system, for the right-to-left direction we do rely on proofs of two important properties of this object language.
  First, the semantics is \emph{total}, in the sense that any state whose command isn't $\skipe$ can take a step.
  Second, the semantics is \emph{deterministic}, in that there can be at most one label/state pair reachable in one step from a particular starting state.

  In the inductive step of the right-to-left inclusion proof, we know that the righthand system has taken a step.
  The lefthand system might already be a $\skipe$, in which case, by the definition of simulations, the righthand system is already a $\skipe$, contradicting the assumption that the righthand side stepped.
  Otherwise, by totality, the lefthand system can take a step.
  By the definition of simulation, there exists a matching step on the righthand side.
  By determinism, the matching step is the same as the one we were already aware of.
  Therefore, we have a new $R$ relationship to connect to that step and apply the induction hypothesis.
\end{proof}

We can apply this very general principle to constant folding.

\begin{theorem}
  \label{cfold_ok}For any $v$ and $c$, $\treq{(v, c)}{(v, \cfold{c})}$.
\end{theorem}
\begin{proof}
  By a simulation argument using this relation:
  \begin{eqnarray*}
    (v_1, c_1) \; R \; (v_2, c_2) &=& v_1 = v_2 \land c_2 = \cfold{c_1}
  \end{eqnarray*}
  What we have done is translate the original theorem statement into the language of binary relations, as this simple case needs no equivalent of strengthening the induction hypothesis.
  Internally to the proof, we need to define constant folding of evaluation contexts $C$, and we need to prove that primitive steps $\to_0$ may be lifted to apply over constant-folded states, this second proof by case analysis on $\to_0$ derivations.
  Another more obvious workhorse is a lemma showing that constant folding of expressions preserves interpretation results.
\end{proof}


\section{Simulations That Allow Skipping Steps}

\newcommand{\cfoldt}[1]{\mathsf{cfold}_2(#1)}

Consider an evolution of our constant-folding optimization to take advantage of known values of $\mathsf{if}$ test expressions.
Depending on whether the value is zero, we can replace the whole $\mathsf{if}$ with one of its two cases.
We will write $\cfoldt{c}$ for this expanded optimization and work up to proving it sound, too.
However, we can no longer use last section's definition of simulation!
The reason is that optimizations intentionally cut down on steps that a program needs to execute.
Some steps of the source program now have no matching steps of the target program, say when we are stepping an $\mathsf{if}$ whose test expression had a known value.

Let's take a first crack at making simulation more flexible.

\begin{definition}[Simulation relation with skipping (\emph{faulty} version!)]
  We say that binary relation $R$ over states of our object language is a \emph{simulation relation with skipping} iff:
  \begin{enumerate}
    \item Whenever $(v_1, \skipe) \; R \; (v_2, c_2)$, it follows that $c_2 = \skipe$.
    \item Whenever $s_1 \; R \; s_2$ and $\smallstepcl{s_1}{\ell}{s'_1}$, then either:
      \begin{enumerate}
        \item there exists $s'_2$ such that $\smallstepcl{s_2}{\ell}{s'_2}$ and $s'_1 \; R \; s'_2$,
        \item or $\ell = \silent$ and $s'_1 \; R \; s_2$.
      \end{enumerate}
  \end{enumerate}
\end{definition}

In other words, to match a silent step, it suffices to do nothing, so long as $R$ still holds afterward.

\newcommand{\addad}[1]{\mathsf{withAds}(#1)}

We didn't mark the definition as \emph{faulty} for nothing.
It actually does not imply trace equivalence.
Consider a questionable ``optimization'' defined as $\addad{\while{1}{\skipe}} = \while{1}{\outp{0}}$, and $\addad{c} = c$ for all other $c$.
It adds a little extra advertisement into a particular infinite loop.
Now we define a candidate simulation relation.
\begin{eqnarray*}
  (v_1, c_1) \; R \; (v_2, c_2) &=& c_1 \in \{\while{1}{\skipe}, (\skipe; \while{1}{\skipe})\}
\end{eqnarray*}
This suspicious relation records nothing about $c_2$.
The $\skipe$ condition of simulations is handled trivially, as we can see by inspection that $R$ does not allow $c_1$ to be $\skipe$.
Checking the execution-matching condition of simulations, $c_1$ is either $\while{1}{\skipe}$ or $(\skipe; \while{1}{\skipe})$, each of which steps silently to the other.
We may match either step by keeping $c_2$ in place, as $R$ does not constrain $c_2$ at all.
Thus, $R$ is a simulation relation with skipping, and, for $c = \while{1}{\skipe}$, it relates $c$ to $\addad{c}$.

From here we expect to conclude trace equivalence.
However, clearly $\mathsf{withAds}$ can turn a program that never outputs into a program that outputs infinitely often!

Let's patch our definition.

\begin{definition}[Simulation relation with skipping]
  We say that an $\mathbb N$-indexed family of binary relations $R_n$ over states of our object language is a \emph{simulation relation with skipping} iff:
  \begin{enumerate}
    \item Whenever $(v_1, \skipe) \; R_n \; (v_2, c_2)$, it follows that $c_2 = \skipe$.
    \item Whenever $s_1 \; R_n \; s_2$ and $\smallstepcl{s_1}{\ell}{s'_1}$, then either:
      \begin{enumerate}
        \item there exist $n'$ and $s'_2$ such that $\smallstepcl{s_2}{\ell}{s'_2}$ and $s'_1 \; R_{n'} \; s'_2$,
        \item or $n > 0$, $\ell = \silent$, and $s'_1 \; R_{n-1} \; s_2$.
      \end{enumerate}
  \end{enumerate}
\end{definition}

This new version imposes a finite limit $n$ at any point, on how many times the righthand side may match lefthand steps without stepping itself.
Our bad counterexample fails to satisfy the conditions, because eventually the starting step count $n$ will be used up, and the incorrect ``optimized'' program will be forced to reveal itself by taking a step that outputs.

\begin{theorem}
  If there exists a simulation with skipping $R$ such that $s_1 \; R_n \; s_2$, then $\treq{s_1}{s_2}$.
\end{theorem}
\begin{proof}
  The proof is fairly similar to that of Theorem \ref{simulation_ok}.
  To show termination preservation in the backward direction, we find ourselves proving a lemma by induction on $n$.
\end{proof}

\newcommand{\countIfs}[1]{\mathsf{countIfs}(#1)}

\begin{theorem}
  For any $v$ and $c$, $\treq{(v, c)}{(v, \cfoldt{c})}$.
\end{theorem}
\begin{proof}
  By a simulation argument (with skipping) using this relation:
  \begin{eqnarray*}
    (v_1, c_1) \; R_n \; (v_2, c_2) &=& v_1 = v_2 \land c_2 = \cfoldt{c_1} \land \countIfs{c_1} < n
  \end{eqnarray*}
  We rely on a simple helper function $\countIfs{c}$ to count how many $\mathsf{If}$ nodes appear in the syntax of $c$.
  This notion turns out to be a conservative upper bound on how many times in a row we will need to let lefthand steps go unmatched on the right.
  The rest of the proof proceeds essentially the same way as in Theorem \ref{cfold_ok}.
\end{proof}


\section{Simulations That Allow Taking Multiple Matching Steps}

\newcommand{\flatten}[1]{\mathsf{flatten}(#1)}
\newcommand{\smallstepcls}[3]{#1 \stackrel{#2}{\to_\mathsf{c}}^* #3}

Consider our final example compiler phase: flattening\index{flattening} expressions into sequences of assignments to temporaries, using only noncompound subexpressions, where the arguments to every binary operator are variables or constants.
Now a single step at the source level must be matched by many steps at the target level.
We write $\flatten{c}$ for the flattening of command $c$.
How can we prove that this transformation is correct?

\begin{definition}[Simulation relation with multiple matching steps]
  We say that a binary relation $R$ over states of our object language is a \emph{simulation relation with multiple matching steps} iff:
  \begin{enumerate}
    \item Whenever $(v_1, \skipe) \; R \; (v_2, c_2)$, it follows that $c_2 = \skipe$.
    \item Whenever $s_1 \; R \; s_2$ and $\smallstepcl{s_1}{\ell}{s'_1}$, there exists $s'_2$ such that $\smallstepcls{s_2}{\ell}{s'_2}$ and $s'_1 \; R \; s'_2$.
  \end{enumerate}
\end{definition}

We write $\smallstepcls{s}{\ell}{s'}$ to indicate that $s$ steps to $s'$ via zero or more silent steps and then one step with label $\ell$ (which might also be silent).

\begin{theorem}
  If there exists a simulation with multiple matching steps $R$ such that $s_1 \; R \; s_2$, then $\treq{s_1}{s_2}$.
\end{theorem}
\begin{proof}
  The backward direction is the interesting part of this proof.
  The key lemma proceeds by strong induction on the number of steps needed to generate the trace on the right.
\end{proof}

\begin{theorem}
  For any $v$ and $c$ where $c$ doesn't use any names that are reserved for temporaries, $\treq{(v, c)}{(v, \flatten{c})}$.
\end{theorem}
\begin{proof}
  By a simulation argument (with multiple matching steps) using this relation:
  \begin{eqnarray*}
    (v_1, c_1) \; R \; (v_2, c_2) &=& \textrm{$c_1$ doesn't use any names reserved for temporaries} \\
    && \land \; v_1 \cong v_2 \land c_2 = \flatten{c_1}
  \end{eqnarray*}
  The heart of this relation is a subrelation $\cong$ over valuations, capturing when they agree on all variables that are not reserved for temporaries, since the flattened program will feel free to scribble all over the temporaries.
  The details of $\cong$ are especially important to the key lemma, showing that flattening of expressions is sound, both taking in a $\cong$ premise and drawing a related $\cong$ conclusion.
  The overall proof is not short, with quite a few lemmas, found in the Coq code.
\end{proof}

\medskip

It might not be clear why we bothered to define simulation with multiple matching steps, when we already had simulation with skipping.
After all, we use simulation to conclude completely symmetric facts about two commands, so why not just verify this section's example by applying simulation with skipping, with the operand order reversed?

Consider the heart of the proof approach that we \emph{did} adopt.
We need to show that any step of $c$ can be matched suitably by $\flatten{c}$.
The proof is divided into cases by inversion on a premise $\smallstepcl{(v, c)}{\ell}{(v', c')}$.
Each case naturally fixes the top-level structure of $c$, from which we can apply straightforward algebraic simplification to find the top-level structure of $\flatten{c}$ and therefore the step rules that apply to it.

Now consider applying simulation with skipping, with the commands passed as operands in the reverse order.
The crucial inversion is on $\smallstepcl{(v, \flatten{c})}{\ell}{(v', c')}$.
Unfortunately, the top-level structure of $\flatten{c}$ does not imply the top-level structure of $c$, but we need to show that $c$ can take a matching step.
We need to prove a whole set of bothersome special-case inversion lemmas by induction, essentially to invert the action of what is, in the general case, an arbitrarily complex compiler.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Lambda Calculus and Simple Type Safety}\label{types}

We'll now take a break from the imperative language we've been studying for the last three chapters, instead looking at a classic sort of small language that distills the essence of \emph{functional} programming\index{functional programming}.
That's the language paradigm that we've been using throughout this book, as we coded executable versions of algorithms.
Its distinctive characteristics are first, a computation style based on simplifying terms instead of running step-by-step instructions that modify state; and second, use of functions as first-class values.
Functional programming went mainstream in the early 21st century, influencing widely adopted languages from JavaScript\index{JavaScript}, where first-class functions are routinely used as callbacks in asynchronous event processing; to Scala\index{Scala}, a hybrid language that melds functional-programming ideas with object-oriented programming for the Java platform; to Haskell\index{Haskell}, a purely functional language that has become popular with programming hobbyists and is seeing increasing adoption in industry.

The heart of functional programming persists even in \emph{$\lambda$-calculus}\index{$\lambda$-calculus} (or lambda calculus\index{lambda calculus}), the simplest version of which contains just three syntactic forms, but which provides probably the simplest of the widely known Turing-complete languages that is (nearly!) pleasant to program in directly.


\section{Untyped Lambda Calculus}

Here is the syntax of the original $\lambda$-calculus.
$$\begin{array}{rrcl}
  \textrm{Variables} & x &\in& \mathsf{Strings} \\
  \textrm{Expressions} & e &::=& x \mid \lambda x. \; e \mid e \; e
\end{array}$$

An expression $\lambda x. \; e$\index{$\lambda$ expression} is a first-class, anonymous function, also called a \emph{function abstraction}\index{function abstraction} or \emph{$\lambda$-abstraction}\index{$\lambda$-abstraction}.
When called, it replaces its formal-argument variable $x$ with the actual argument within $e$ and continues evaluating.
The third syntactic form $e \; e$ uses \emph{juxtaposition}\index{juxtaposition}, or writing one term after another, for function application.

A simple example of an expression is $\lambda x. \; x$, for an identity function.
When we apply it to itself, like $(\lambda x. \; x) \; (\lambda x. \; x)$, it reduces again to itself.

\newcommand{\fv}[1]{\textsf{FV}(#1)}

We can give a simple big-step operational semantics to $\lambda$-terms.
The key auxiliary operation is \emph{substitution}\index{substitution}, where we write $\subst{e}{x}{e'}$ for replacing all \emph{free} occurrences of $x$ in $e$ with $e'$.
Here we refer to a notion of \emph{free variables}\index{free variables}, which we should define first, as a recursive function.
\begin{eqnarray*}
  \fv{x} &=& \{x\} \\
  \fv{\lambda x. \; e} &=& \fv{e} - \{x\} \\
  \fv{e_1 \; e_2} &=& \fv{e_1} \cup \fv{e_2}
\end{eqnarray*}
Intuitively, a variable is free in an expression iff it doesn't occur inside the scope of a $\lambda$ binding the same variable.

Next we define substitution.
\begin{eqnarray*}
  \subst{x}{x}{e'} &=& e' \\
  \subst{y}{x}{e'} &=& y\textrm{, if $y \neq x$} \\
  \subst{\lambda x. \; e}{x}{e'} &=& \lambda x. \; e \\
  \subst{\lambda y. \; e}{x}{e'} &=& \lambda y. \; \subst{e}{x}{e'}\textrm{, if $y \neq x$} \\
  \subst{e_1 \; e_2}{x}{e'} &=& \subst{e_1}{x}{e'} \; \subst{e_2}{x}{e'}
\end{eqnarray*}

Notice a peculiar property of this definition when we work with \emph{open} terms\index{open terms}, whose free-variable sets are nonempty.
According to the definition $\subst{\lambda x. \; y}{y}{x} = \lambda x. \; x$.
In this example, we say that $\lambda$-bound variable $x$ has been \emph{captured}\index{variable capture} unintentionally, where substitution created a reference to that $\lambda$ where none existed before.
Such a problem can only arise when replacing a variable with an open term.
In this case, that term is $x$, where $\fv{x} = \{x\} \neq \emptyset$.

More general investigations into $\lambda$-calculus will define a more involved notion of \emph{capture-avoiding} substitution\index{capture-avoiding substitution}.
Instead, in this book, we carefully steer clear of the $\lambda$-calculus applications that require substituting open terms for variables, letting us stick with the simpler definition.
When it comes to formal encoding of this style of syntax in proof assistants, surprisingly many complications arise, leading to what is still an active research area in encodings of language syntax with local variable binding\index{variable binding}.
Since we aim more for broad than deep coverage of the field of formal program reasoning, we are happy to avoid those complexities.

With substitution in hand, a big-step semantics\index{big-step semantics} is easy to define.
We use the syntactic shorthand $v$ for a \emph{value}\index{value}, or term that needs no further evaluation, which in this case includes just the $\lambda$-abstractions.
\encoding
$$\infer{\bigstep{\lambda x. \; e}{\lambda x. \; e}}{}
\quad \infer{\bigstep{e_1 \; e_2}{v'}}{
  \bigstep{e_1}{\lambda x. \; e}
  & \bigstep{e_2}{v}
  & \bigstep{\subst{e}{x}{v}}{v'}
}$$

A value evaluates to itself.
To evaluate an application, evaluate both the function and the argument.
The function value must be some $\lambda$-abstraction.
Substitute the argument value in the body of the abstraction, evaluate the result, and return that value as the overall value.
Note that we only ever need to evaluate \emph{closed} terms\index{closed terms}, meaning terms that are not open, so we obey the restriction on substitution sketched above.

It may be surprising that these two rules are enough to define the full semantics of a Turing-complete language!
Indeed, $\lambda$-calculus is Turing-complete, and we must be able to find nonterminating programs.
Here is one example.
\begin{eqnarray*}
  \Omega &=& (\lambda x. \; x \; x) \; (\lambda x. \; x \; x) \\
\end{eqnarray*}
\begin{theorem}
  $\Omega$ does not evaluate to anything.  In other words, $\bigstep{\Omega}{v}$ implies a contradiction.
\end{theorem}
\begin{proof}
  By induction on the derivation of $\bigstep{\Omega}{v}$.
\end{proof}


\section{A Quick Case Study in Program Verification: Church Numerals}

\newcommand{\church}[1]{\underline{#1}}

Since $\lambda$-calculus is Turing-complete, it must be able to represent numbers and all the usual arithmetic operations.
The classic representation is \emph{Church numerals}\index{Church numerals}, where every natural number $n$ is represented as a particular $\lambda$-term $\church{n}$ that, when passed a function $f$ as input, returns $f^n$, the $n$-way self-composition of $f$.
In some sense, repeating a process is the fundamental use of a natural number, and it turns out that we can recover all of the usual operations atop this primitive.

\newcommand{\lc}[1]{\mathsf{#1}}

Two $\lambda$-calculus functions are sufficient to build up all the naturals as Church numerals.
\begin{eqnarray*}
  \lc{zero} &=& \lambda f. \; \lambda x. \; x \\
  \lc{plus1} &=& \lambda n. \; \lambda f. \; \lambda x. \; f \; (n \; f \; x)
\end{eqnarray*}
Our representation of 0 returns an identity function, no matter which $f$ it is passed.
Our successor operation takes in a number $n$ and returns a new one that first runs $n$ and then applies $f$ one extra time.
Now we have $\church{0} = \lc{zero}$, $\church{1} = \lc{plus1} \; \lc{zero}$, $\church{2} = \lc{plus1} \; (\lc{plus1} \; \lc{zero})$, and so on.

\newcommand{\prechurch}[1]{\left \lfloor #1 \right \rfloor}

These Church numerals are not values yet.
Let us formalize which values they evaluate to and tweak the encoding to use the values instead.
We write $\prechurch{n}$ for the body of a $\lambda$-abstraction that we are building to represent $n$, where variables $f$ and $x$ are in scope.
\begin{eqnarray*}
  \prechurch{0} &=& x \\
  \prechurch{n+1} &=& f \; ((\lambda f. \; \lambda x. \; \prechurch{n}) \; f \; x)
\end{eqnarray*}
The $n+1$ case may seem wastefully large, but, in fact, this is the precise form of the values produced by evaluating repeated applications of $\lc{plus1}$ to $\lc{zero}$, as the reader can verify using the big-step semantics.
We define $\church{n} = \lambda f. \; \lambda x. \; \prechurch{n}$, giving a canonical encoding for each number.

Now we notate correctness of an encoding $e$ for number $n$ by $e \sim n$, defining it as $\bigstep{e}{\church{n}}$, meaning that $e$ evaluates to the Church encoding of $n$.
Two first easy results show that our primitive constructors are correct.

\begin{theorem}
  $\lc{zero} \sim 0$.
\end{theorem}

\begin{theorem}
  If $e_n \sim n$, then $\lc{plus1} \; e_n \sim n+1$.
\end{theorem}

Things get more interesting as we start to code up the arithmetic operations.
\begin{eqnarray*}
  \lc{add} &=& \lambda n. \; \lambda m. \; n \; \lc{plus1} \; m
\end{eqnarray*}

That is, addition of $n$ to $m$ is calculated by applying $n$ $\lc{plus1}$ operations to $m$.

\begin{theorem}\label{church_add}
  If $e_n \sim n$ and $e_m \sim m$, then $\lc{add} \; e_n \; e_m \sim n + m$.
\end{theorem}
\begin{proof}
  After a few steps applying the big-step rules directly, we finish by induction on $n$.
  A silly-seeming but necessary lemma proves that $\subst{\prechurch{n}}{m}{e} = \prechurch{n}$, since $\prechurch{n}$ does not contain free occurrences of $m$.
\end{proof}

Multiplication proceeds in much the same way.
\begin{eqnarray*}
  \lc{mult} &=& \lambda n. \; \lambda m. \; n \; (\lc{add} \; m) \; \lc{zero}
\end{eqnarray*}

\begin{theorem}
  If $e_n \sim n$ and $e_m \sim m$, then $\lc{mult} \; e_n \; e_m \sim n \times m$.
\end{theorem}
\begin{proof}
  After a few steps applying the big-step rules directly, we finish by induction on $n$, within which we appeal to Theorem \ref{church_add}.
\end{proof}

An enjoyable (though not entirely trivial) exercise for the reader is to generalize the methods of Church encoding to encoding of other inductive datatypes, including the syntax of $\lambda$-calculus itself.
A hallmark of a Turing-complete language is that it can host an interpreter for itself, and $\lambda$-calculus is no exception!


\section{Small-Step Semantics}

$\lambda$-calculus is also straightforward to formalize with a small-step semantics\index{small-step operational semantics}.
One might argue that the technique is even simpler for $\lambda$-calculus, since we must deal only with expressions, not also imperative variable valuations.

\encoding
The crucial rule is for \emph{$\beta$-reduction}\index{$\beta$-reduction}, which explains the behavior of function applications in terms of substitution.
$$\infer{\smallstep{(\lambda x. \; e) \; v}{\subst{e}{x}{v}}}{}$$
Note the one subtlety: the function argument is required to be a \emph{value}.
This innocuous-looking restriction helps enforce \emph{call-by-value evaluation order}\index{call-by-value}, where, upon encountering a function application, we must first evaluate the function, then evaluate the argument, and only then call the function.

Two more rules complete the semantics and the characterization of call-by-value.
$$\infer{\smallstep{e_1 \; e_2}{e'_1 \; e_2}}{
  \smallstep{e_1}{e'_1}
}
\quad \infer{\smallstep{v \; e_2}{v \; e'_2}}{
  \smallstep{e_2}{e'_2}
}$$
Note again from the second rule that we are only allowed to move on to evaluating the function argument after the function is fully evaluated.

Following a very similar outline to what we used in Chapter \ref{operational_semantics}, we establish equivalence between the two semantics for $\lambda$-calculus.

\begin{theorem}
  If $\smallsteps{e}{v}$, then $\bigstep{e}{v}$.
\end{theorem}

\begin{theorem}
  If $\bigstep{e}{v}$, then $\smallsteps{e}{v}$.
\end{theorem}

There are a few proof subtleties beyond what we encountered before, and the Coq formalization may be worth reading, to see those details.

Again as before, we have a natural way to build a transition system from any $\lambda$-term $e$, where $\mathcal L$ is the set of $\lambda$-terms.
We define $\mathbb T(e) = \angled{\mathcal L, \{e\}, \to}$.
The next section gives probably the most celebrated $\lambda$-calculus result based on the transition-system perspective.


\section{Simple Types and Their Soundness}

Let's spruce up the language with some more constructs.
$$\begin{array}{rrcl}
  \textrm{Variables} & x &\in& \mathsf{Strings} \\
  \textrm{Numbers} & n &\in& \mathbb N \\
  \textrm{Expressions} & e &::=& n \mid e + e \mid x \mid \lambda x. \; e \mid e \; e \\
  \textrm{Values} & v &::=& n \mid \lambda x. \; e
\end{array}$$
We've added natural numbers as a primitive feature, supported via constants and addition.
Numbers may be intermixed with functions, and we may, for instance, write first-class functions that take numbers as input or return numbers.

We add two new bureaucratic rules for addition, mirroring those for function application.
$$\infer{\smallstep{e_1 + e_2}{e'_1 + e_2}}{
  \smallstep{e_1}{e'_1}
}
\quad \infer{\smallstep{v + e_2}{v + e'_2}}{
  \smallstep{e_2}{e'_2}
}$$

One more rule for addition is called for.
Here we face a classic nuisance in writing rules that combine explicit syntax with standard mathematical operators, and we write $+$ for the syntactic construct and $\textbf{+}$ for the mathematical addition operator.
$$\infer{\smallstep{n + m}{n \textbf{+} m}}{}$$

What would be a useful property to prove about our new expressions?
For one thing, we don't want them to ``crash,'' as in the expression $(\lambda x. \; x) + 7$ that tries to add a function and a number.
No rule of the semantics knows what to do with that case, but it also isn't a value, so we shouldn't consider it as finished with evaluation.
Define an expression as \emph{stuck}\index{stuck term} when it is not a value and it cannot take a small step.
For ``reasonable'' expressions $e$, we should be able to prove that it is an invariant of $\mathbb T(e)$ that no expression is ever stuck.

To define ``reasonable,'' we formalize the popular idea of a static type system.
Every expression will be assigned a type, capturing which sorts of contexts it may legally be dropped into.
Our language of types is simple.
\abstraction
$$\begin{array}{rrcl}
  \textrm{Types} & \tau &::=& \mathbb N \mid \tau \to \tau
\end{array}$$
We have trees of function-space constructors, where all the leaves are instances of the natural-number type $\mathbb N$.
Note that, with type assignment, we have yet another case of \emph{abstraction}, approximating a potentially complex expression with a type that only records enough information to rule out crashes.

\newcommand{\hasty}[3]{#1 \vdash #2 : #3}

To assign types to closed terms, we must recursively define what it means for an open term to have a type.
To that end, we use \emph{typing contexts}\index{typing context} $\Gamma$, finite maps from variables to types.
To mimic standard notation, we write $\Gamma, x : \tau$ as shorthand for $\mupd{\Gamma}{x}{\tau}$, overriding of key $x$ with value $\tau$ in $\Gamma$.
Now we define typing as a three-place relation, written $\hasty{\Gamma}{e}{\tau}$, to indicate that, assuming $\Gamma$ as an assignment of types to $e$'s free variables, we conclude that $e$ has type $\tau$.

We define the relation inductively, with one case per syntactic construct.
\modularity
$$\infer{\hasty{\Gamma}{x}{\tau}}{
  \msel{\Gamma}{x} = \tau
}
\quad \infer{\hasty{\Gamma}{n}{\mathbb N}}{}
\quad \infer{\hasty{\Gamma}{e_1 + e_2}{\mathbb N}}{
    \hasty{\Gamma}{e_1}{\mathbb N}
    & \hasty{\Gamma}{e_2}{\mathbb N}
}$$
$$\infer{\hasty{\Gamma}{\lambda x. \; e}{\tau_1 \to \tau_2}}{
  \hasty{\Gamma, x : \tau_1}{e}{\tau_2}
}
\quad \infer{\hasty{\Gamma}{e_1 \; e_2}{\tau_2}}{
  \hasty{\Gamma}{e_1}{\tau_1 \to \tau_2}
  & \hasty{\Gamma}{e_2}{\tau_1}
}$$

We write $\hasty{}{e}{\tau}$ as shorthand for $\hasty{\mempty}{e}{\tau}$, meaning that closed term $e$ has type $\tau$, with no typing context required.
Note that this style of typing rules provides another instance of \emph{modularity}, since we can separately type-check different subexpressions of a large expression, using just their types to coordinate expectations among subexpressions.

It should be an invariant of $\mathbb T(e)$ that every reachable expression has the same type as the original, so long as the original was well-typed.
This observation is the key to proving that it is also an invariant that no reachable expression is stuck, using a proof technique called \emph{the syntactic approach to type soundness}\index{syntactic approach to type soundness}, which turns out to be just another instance of our general toolbox for invariant proofs.

We work our way through a suite of standard lemmas to support that invariant proof.

\begin{lemma}[Progress]\label{progress}
  If $\hasty{}{e}{\tau}$, then $e$ isn't stuck.
\end{lemma}
\begin{proof}
  By induction on the derivation of $\hasty{}{e}{\tau}$.
\end{proof}

\begin{lemma}[Weakening]\label{weakening}
  If $\hasty{\Gamma}{e}{\tau}$ and every mapping in $\Gamma$ is also included in $\Gamma'$, then $\hasty{\Gamma'}{e}{\tau}$.
\end{lemma}
\begin{proof}
  By induction on the derivation of $\hasty{\Gamma}{e}{\tau}$.
\end{proof}

\begin{lemma}[Substitution]\label{substitution}
  If $\hasty{\Gamma, x : \tau'}{e}{\tau}$ and $\hasty{}{e'}{\tau'}$, then $\hasty{\Gamma}{\subst{e}{x}{e'}}{\tau}$.
\end{lemma}
\begin{proof}
  By induction on the derivation of $\hasty{\Gamma, x: \tau'}{e}{\tau}$, with appeal to Lemma \ref{weakening}.
\end{proof}

\begin{lemma}[Preservation]\label{preservation}
  If $\smallstep{e_1}{e_2}$ and $\hasty{}{e_1}{\tau}$, then $\hasty{}{e_2}{\tau}$.
\end{lemma}
\begin{proof}
  By induction on the derivation of $\smallstep{e_1}{e_2}$.
\end{proof}

\invariants
\begin{theorem}[Type Soundness]
  If $\hasty{}{e}{\tau}$, then $\neg \textrm{stuck}$ is an invariant of $\mathbb T(e)$.
\end{theorem}
\begin{proof}
  First, we strengthen the invariant to $I(e) = \; \hasty{}{e}{\tau}$, justifying the implication by Lemma \ref{progress}, Progress.
  Then we apply invariant induction, where the base case is trivial.
  The induction step is a direct match for Lemma \ref{preservation}, Preservation.
\end{proof}

The syntactic approach to type soundness is often presented as a proof technique in isolation, but what we see here is that it follows very directly from our general invariant proof technique.
Usually syntactic type soundness is presented as fundamentally about proving Progress and Preservation conditions.
The Progress condition maps to invariant strengthening, and the Preservation condition maps to invariant induction, which we have used in almost every invariant proof so far.
Since the basic proof structure matches our standard one, the main insight is the usual one: a good choice of a strengthened invariant.
In this case, invariant $I(e) = \; \hasty{}{e}{\tau}$ is that crucial insight, including the original design of the set of types and the typing relation.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{More on Evaluation Contexts}\label{ectx}

In Chapter \ref{operational_semantics}, we met \emph{evaluation contexts}\index{evaluation contexts}, a strange little way of formalizing the idea ``the next place in a program where a step will happen.''
Though we showed off how easy they make it to add concurrency to a language, the payoff from this style may still not have been clear.
In this chapter, we continue the theme of showing how evaluation contexts make it easy to \emph{add new features to a formal semantics with very concise descriptions}.
That is, the game will be to continue extending a language definition \emph{modularly}, leaving unchanged as much of our old definitions and proofs as possible.
Many of the concise conventions we adopt here do require explicit expansion in the associated Coq code, though fancier Coq frameworks would obviate that requirement as well.

\section{Last Chapter's Example Revisited}

Let's begin by recasting last chapter's typed $\lambda$-calculus with numbers, using evaluation contexts.
We introduce this grammar of contexts.
$$\begin{array}{rrcl}
  \textrm{Evaluation contexts} & C &::=& \Box \mid C \; e \mid v \; C \mid C + e \mid v + C
\end{array}$$
Note the one subtlety, same as we encountered last chapter in a different place: the third and fifth forms of evaluation context require the first operand to be a \emph{value}.
Again we enforce \emph{call-by-value evaluation order}\index{call-by-value}.
Tweaks to the definition of $C$ produce other evaluation orders, like \emph{call-by-name}\index{call-by-name}, but we will say no more about those alternatives.

Two rules explain the primitive steps.
They should look familiar from last chapter's more direct small-step semantics.
$$\infer{\smallstepo{(\lambda x. \; e) \; v}{\subst{e}{x}{v}}}{}
\quad \infer{\smallstepo{n + m}{n \textbf{+} m}}{}$$

\encoding
As we will throughout this chapter, we assume a standard definition of what it means to plug an expression into the hole in a context, and now we can give the top-level small-step evaluation rule.
$$\infer{\smallstep{\plug{C}{e}}{\plug{C}{e'}}}{
  \smallstepo{e}{e'}
}$$

Last chapter's type-system definition may be reused unchanged.
We just need to make a small modification to the sequence of results leading to type soundness.

\begin{lemma}\label{preservation0}
  If $\smallstepo{e_1}{e_2}$ and $\hasty{}{e_1}{\tau}$, then $\hasty{}{e_2}{\tau}$.
\end{lemma}
\begin{proof}
  By inversion on the derivation of $\smallstepo{e_1}{e_2}$.
\end{proof}

\begin{lemma}\label{preservation_prime}
  If $\smallstepo{e_1}{e_2}$ and $\hasty{}{C[e_1]}{\tau}$, then $\hasty{}{C[e_2]}{\tau}$.
\end{lemma}
\begin{proof}
  By induction on the structure of $C$, with appeal to Lemma \ref{preservation0}.
\end{proof}

\begin{lemma}[Preservation]\label{preservation}
  If $\smallstep{e_1}{e_2}$ and $\hasty{}{e_1}{\tau}$, then $\hasty{}{e_2}{\tau}$.
\end{lemma}
\begin{proof}
  By inversion on the derivation of $\smallstep{e_1}{e_2}$, with appeal to Lemma \ref{preservation_prime}.
\end{proof}

\section{Product Types}

Let's see some examples of how easy it is to add new features to our language, starting with \emph{product types}\index{product types}, or pair types\index{pair types}.
The name ``product types'' comes from ``Cartesian product.''\index{Cartesian product}
We indicate extension of existing grammars by beginning definitions with $\ldots$, and we indicate extension of existing inductive predicates just by listing new rules.
$$\begin{array}{rrcl}
  \textrm{Expressions} & e &::=& \ldots \mid (e, e) \mid \pi_1(e) \mid \pi_2(e) \\
  \textrm{Values} & v &::=& \ldots \mid (v, v) \\
  \textrm{Contexts} & C &::=& \ldots \mid (C, e) \mid (v, C) \mid \pi_1(C) \mid \pi_2(C) \\
  \textrm{Types} & \tau &::=& \ldots \mid \tau \times \tau
\end{array}$$

Operator $\pi_i$ is for projecting\index{projection (from pairs)} the $i$th element of a pair.
Two new small-step rules finish the explanation of projection behavior.
$$\infer{\smallstepo{\pi_1((v_1, v_2))}{v_1}}{}
\quad \infer{\smallstepo{\pi_2((v_1, v_2))}{v_2}}{}$$

Finally, we can extend our type system with one new rule per expression kind.
$$\infer{\hasty{\Gamma}{(e_1, e_2)}{\tau_1 \times \tau_2}}{
  \hasty{\Gamma}{e_1}{\tau_1}
  & \hasty{\Gamma}{e_2}{\tau_2}
}
\quad \infer{\hasty{\Gamma}{\pi_1(e)}{\tau_1}}{
  \hasty{\Gamma}{e}{\tau_1 \times \tau_2}
}
\quad \infer{\hasty{\Gamma}{\pi_2(e)}{\tau_2}}{
  \hasty{\Gamma}{e}{\tau_1 \times \tau_2}
}$$

And that's the complete, unambiguous specification of this new feature!
The type-soundness proof adapts very naturally.
In fact, in the Coq code, almost exactly the same proof script as before keeps doing the job, a theme we will see continue throughout the chapter.

\section{Sum Types}

\newcommand{\inl}[1]{\mathsf{inl}(#1)}
\newcommand{\inr}[1]{\mathsf{inr}(#1)}
\newcommand{\match}[5]{(\mathsf{match} \; #1 \; \mathsf{with} \; \inl{#2} \Rightarrow #3 \mid \inr{#4} \Rightarrow #5)}

Next on our tour is \emph{sum types}\index{sum types}, otherwise known as \emph{variants}\index{variants} or \emph{disjoint unions}\index{disjoint unions}.
An element of the sum type $\tau_1 + \tau_2$ is either a $\tau_1$ or a $\tau_2$, and we indicate which with constructor functions $\mathsf{inl}$ and $\mathsf{inr}$.
$$\begin{array}{rrcl}
  \textrm{Expressions} & e &::=& \ldots \mid \inl{e} \mid \inr{e} \mid \match{e}{x}{e}{x}{e} \\
  \textrm{Values} & v &::=& \ldots \mid \inl{v} \mid \inr{v} \\
  \textrm{Contexts} & C &::=& \ldots \mid \match{C}{x}{e}{x}{e} \\
  \textrm{Types} & \tau &::=& \ldots \mid \tau + \tau
\end{array}$$

The $\mathsf{match}$ form, following pattern-matching in Coq and other languages, accounts for most of the syntactic complexity.
Two new small-step rules explain its behavior.
$$\infer{\smallstepo{\match{\inl{v}}{x_1}{e_1}{x_2}{e_2}}{\subst{e_1}{x_1}{v}}}{}$$
$$\infer{\smallstepo{\match{\inr{v}}{x_1}{e_1}{x_2}{e_2}}{\subst{e_2}{x_2}{v}}}{}$$

And the typing rules:
$$\infer{\hasty{\Gamma}{\inl{e}}{\tau_1 + \tau_2}}{
  \hasty{\Gamma}{e}{\tau_1}
}
\quad \infer{\hasty{\Gamma}{\inr{e}}{\tau_1 + \tau_2}}{
  \hasty{\Gamma}{e}{\tau_2}
}
\quad \infer{\hasty{\Gamma}{\match{e}{x_1}{e_1}{x_2}{e_2}}{\tau}}{
  \hasty{\Gamma}{e}{\tau_1 + \tau_2}
  & \hasty{\Gamma, x_1 : \tau_1}{e_1}{\tau}
  & \hasty{\Gamma, x_2 : \tau_2}{e_2}{\tau}
}$$

Again, the automated type-soundness proof adapts with minimal modification.

\section{Exceptions}

\newcommand{\throw}[1]{\mathsf{throw}(#1)}
\newcommand{\catch}[3]{(\mathsf{try} \; #1 \; \mathsf{catch} \; #2 \Rightarrow #3)}

Next, let's see how to model \emph{exceptions}\index{exceptions}, a good representative of control-flow-heavy language features.
For simplicity, we will encode exception values themselves as natural numbers.
The action is in how exceptions are thrown and caught.
$$\begin{array}{rrcl}
  \textrm{Expressions} & e &::=& \ldots \mid \throw{e} \mid \catch{e}{x}{e} \\
  \textrm{Contexts} & C &::=& \ldots \mid \throw{C} \mid \catch{C}{x}{e} \\
\end{array}$$

We also introduce metavariable $C^-$ to stand for an evaluation context that does not use the constructor for $\mathsf{catch}$ and is not just a hole $\Box$ (though it must contain a $\Box$).
It is handy to express the idea of exceptions \emph{bubbling up} to the nearest enclosing $\mathsf{catch}$ constructs.
Specifically, here are three rules to define exception behavior.
$$\infer{\smallstepo{\catch{v}{x}{e}}{v}}{}
\quad \infer{\smallstepo{\catch{\throw{v}}{x}{e}}{\subst{e}{x}{v}}}{}
\quad \infer{\smallstepo{C^-[\throw{v}]}{\throw{v}}}{}$$

And the typing rules, where the biggest twist is that a $\mathsf{throw}$ expression can have any type, since it never terminates normally:
$$\infer{\hasty{\Gamma}{\throw{e}}{\tau}}{
  \hasty{\Gamma}{e}{\mathbb N}
}
\quad \infer{\hasty{\Gamma}{\catch{e}{x_1}{e_1}}{\tau}}{
  \hasty{\Gamma}{e}{\tau}
  & \hasty{\Gamma, x_1 : \mathbb N}{e_1}{\tau}
}$$

Most of the automated type-soundness proof adapts, but we do need to make one nontrivial change: extending the primary safety invariant, since now there are well-typed states that are neither values nor able to step.
An \emph{uncaught exception}\index{uncaught exceptions} is the new potential outcome of an execution.
Therefore, the invariant we prove (with related fix-ups to a lemma statement or two) is:
$$I(e) = \textrm{$e$ is a value} \lor (\exists n : \mathbb N. \; e = \throw{n}) \lor (\exists e'. \; \smallstep{e}{e'})$$

It is worth pausing here to reflect on what we did \emph{not} need to do.
Though we added a new kind of side effect, we did not need to modify a single rule dealing with preexisting features.
The open-ended abstraction of evaluation contexts helped us plan ahead for side effects without foreseeing them precisely.
For instance, it was critical that we could refer to a restricted context $C^-$ to consider exception bubbling past \emph{any} of the prior features for which we defined order-of-evaluation rules.

\section{Mutable Variables\label{mutable_variables}}

Let's now consider another side effect and how we can add it without having to modify existing rules.
This one we will build on the lambda calculus with products and sums, not trying to harmonize it with exceptions.
We also keep the existing immutable variables and add new syntactic features for mutable variables.
$$\begin{array}{rrcl}
  \textrm{Mutable variables} & X \\
  \textrm{Expressions} & e &::=& \ldots \mid X \mid X \leftarrow e \\
  \textrm{Contexts} & C &::=& \ldots \mid X \leftarrow C
\end{array}$$

\newcommand{\smallstepw}[2]{#1 \to_1 #2}

We keep the $\rightarrow_0$ relation defined previously and define on top of it $\rightarrow_1$ that works on states that contain variable valuations $\sigma$ (as we have used before to model imperative languages).
$$\infer{\smallstepw{(\sigma, e_1)}{(\sigma, e_2)}}{
  \smallstepo{e_1}{e_2}
}
\quad \infer{\smallstepw{(\sigma, X)}{(\sigma, v)}}{
  \sigma(X) = v
}
\quad \infer{\smallstepw{(\sigma, X \leftarrow v)}{(\mupd{\sigma}{X}{v}, v)}}{
}$$

We define an alternative top-level small-step relation:
$$\infer{\smallstep{(\sigma, \plug{C}{e})}{(\sigma', \plug{C}{e'})}}{
  \smallstepw{(\sigma, e)}{(\sigma', e')}
}$$
Note how planning ahead and modularizing our order-of-operations rules via context plugging allows us to reuse the same rules, even in the presence of mutable-variable valuations.

\newcommand{\rhasty}[4]{#1; #2 \vdash #3 : #4}

We ``cheat'' a bit and extend our typing judgment to take an extra argument, so a typing fact looks like $\rhasty{\Delta}{\Gamma}{e}{\tau}$.
The new context $\Delta$ assigns types to mutable variables, and it stays fixed throughout a program's execution.
We implicitly edit each prior typing rule to thread $\Delta$ through unchanged, and we add the two crucial new ones:
$$\infer{\rhasty{\Delta}{\Gamma}{X}{\tau}}{
  \Delta(X) = \tau
}
\quad \infer{\rhasty{\Delta}{\Gamma}{X \leftarrow e}{\tau}}{
  \Delta(X) = \tau
  & \rhasty{\Delta}{\Gamma}{e}{\tau}
}$$

We have to make moderate edits to the central lemma statements.
One frequent ingredient is a compatibility relation $\Delta \simeq \sigma$, indicating that any variable given a type $\tau$ in $\Delta$ is also assigned a $\tau$-typed value in $\sigma$.
Then we can adapt the type-preservation proof as follows.
(The progress proof works essentially the same as before.)

\begin{lemma}\label{mutable_preservation0}
  If $\smallstepo{e_1}{e_2}$ and $\hasty{}{e_1}{\tau}$, then $\hasty{}{e_2}{\tau}$.
\end{lemma}
\begin{proof}
  By inversion on the derivation of $\smallstepo{e_1}{e_2}$.
\end{proof}

\begin{lemma}\label{preservation1}
  If $\smallstepw{(\sigma_1, e_1)}{(\sigma_2, e_2)}$, $\rhasty{\Delta}{\mempty}{e_1}{\tau}$, and $\Delta \simeq \sigma_1$, then $\rhasty{\Delta}{\mempty}{e_2}{\tau}$ and $\Delta \simeq \sigma_2$.
\end{lemma}
\begin{proof}
  By inversion on the derivation of $\smallstepw{(\sigma_1, e_1)}{(\sigma_2, e_2)}$, with appeal to Lemma \ref{mutable_preservation0}.
\end{proof}

\begin{lemma}\label{mutable_preservation_prime}
  If $\smallstepw{(\sigma_1, e_1)}{(\sigma_2, e_2)}$, $\rhasty{\Delta}{\mempty}{C[e_1]}{\tau}$, and $\Delta \simeq \sigma_1$, then $\rhasty{\Delta}{}{C[e_2]}{\tau}$ and $\Delta \simeq \sigma_2$.
\end{lemma}
\begin{proof}
  By induction on the structure of $C$, with appeal to Lemma \ref{preservation1}.
\end{proof}

\begin{lemma}[Preservation]\label{mutable_preservation}
  If $\smallstep{(\sigma_1, e_1)}{(\sigma_2, e_2)}$, $\rhasty{\Delta}{}{e_1}{\tau}$, and $\Delta \simeq \sigma_1$, then $\rhasty{\Delta}{\mempty}{e_2}{\tau}$ and $\Delta \simeq \sigma_2$.
\end{lemma}
\begin{proof}
  By inversion on the derivation of $\smallstep{(\sigma_1, e_1)}{(\sigma_2, e_2)}$, with appeal to Lemma \ref{mutable_preservation_prime}.
\end{proof}

Everything can be brought together in type-safety proof with a strengthened invariant that also asserts the $\simeq$ relation, just as in the various statements of preservation.

\section{Concurrency Again}

Mutable variables provide a means for communication between threads, so we can bring concurrency to our language with remarkably little effort.
We choose a \emph{concurrent pairing}\index{concurrent pairing} operator, which builds a pair through simultaneous evaluation of the expressions for the two elements.
$$\begin{array}{rrcl}
  \textrm{Expressions} & e &::=& \ldots \mid e || e \\
  \textrm{Contexts} & C &::=& \ldots \mid C || e \mid e || C
\end{array}$$

As when we looked at adding concurrency to an imperative language, note that we introduce nondeterminism by allowing evaluation to proceed on \emph{either} side of the parallel composition $||$, on any given step.
One new small-step rule explains what to do when both sides are fully evaluated.
$$\infer{\smallstepo{v_1 || v_2}{(v_1, v_2)}}{}$$

The typing rule is straightforward:
$$\infer{\rhasty{\Delta}{\Gamma}{e_1 || e_2}{\tau_1 \times \tau_2}}{
  \rhasty{\Delta}{\Gamma}{e_1}{\tau_1}
  & \rhasty{\Delta}{\Gamma}{e_2}{\tau_2}
}$$

After this change, literally the same Coq proof script as before establishes type soundness.
That's a pretty strong demonstration of modularity in semantics.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Types and Mutation}

As we glimpsed last chapter, the syntactic approach to type soundness continues to apply to \emph{impure} functional languages, which combine imperative side effects with first-class functions.
We'll study the general domain through its most common exemplar: $\lambda$-calculus with \emph{mutable references}\index{mutable references}\index{references}, which generalize the mutable variables that we modeled in Section \ref{mutable_variables}.

\section{Simply Typed Lambda Calculus with Mutable References}

\newcommand{\newref}[1]{\mathsf{new}(#1)}
\newcommand{\readref}[1]{!#1}
\newcommand{\writeref}[2]{#1 := #2}

Here is an extension of the lambda-calculus syntax from Chapter \ref{types}, with additions underlined.
$$\begin{array}{rrcl}
  \textrm{Variables} & x &\in& \mathsf{Strings} \\
  \textrm{Numbers} & n &\in& \mathbb N \\
  \textrm{Expressions} & e &::=& n \mid e + e \mid x \mid \lambda x. \; e \mid e \; e \mid \underline{\newref{e} \mid \; \readref{e} \mid \writeref{e}{e}}
\end{array}$$

\newcommand{\elet}[3]{\mathsf{let} \; #1 = #2 \; \mathsf{in} \; #3}

The three new expression forms deal with \emph{references}, which act like, for instance, Java\index{Java} objects that only have single public fields.
We write $\newref{e}$ to allocate a new reference initialized with value $e$, we write $\readref{e}$ for reading the value stored in reference $e$, and we write $\writeref{e_1}{e_2}$ for overwriting the value of reference $e_1$ with $e_2$.
An example is worth a thousand words, so let's consider a concrete program.
We'll use two notational shorthands:
\begin{eqnarray*}
  \elet{x}{e_1}{e_2} &\triangleq& (\lambda x. \; e_2) \; e_1 \\
  e_1; e_2 &\triangleq& \elet{\_}{e_1}{e_2} \textrm{ (for $\_$ a variable not used anywhere else)}
\end{eqnarray*}

Here is a simple program that uses references.
$$\elet{r}{\newref{0}}{\writeref{r}{\; \readref{r} + 1}; \readref{r}}$$

This program (1) allocates a new reference $r$ storing the value 0; (2) increments $r$'s value by 1; and (3) returns the new $r$ value, which is 1.

To be more formal about the meanings of all programs, we extend the operational semantics from the start of last chapter.
First, we add some new kinds of evaluation contexts.
$$\begin{array}{rrcl}
  \textrm{Evaluation contexts} & C &::=& \Box \mid C \; e \mid v \; C \mid C + e \mid v + C \\
  &&& \mid \; \underline{\newref{C} \mid \; \readref{C} \mid \writeref{C}{e} \mid \writeref{v}{C}}
\end{array}$$

Next we define the basic reduction steps of the language.
Similarly to in Section \ref{mutable_variables}, we work with states that include not just expressions but also \emph{heaps}\index{heaps} $h$, partial functions from references to their current stored values.
We begin by copying over the two basic-step rules from last chapter, threading through the heap $h$ unchanged.
$$\infer{\smallstepo{(h, (\lambda x. \; e) \; v)}{(h, \subst{e}{x}{v})}}{}
\quad \infer{\smallstepo{(h, n + m)}{(h, n \textbf{+} m)}}{}$$

To write out the rules that are specific to references, it's helpful to extend our language syntax with a form that will never appear in original programs, but which does show up at intermediate execution steps.
In particular, let's add an expression form for \emph{locations}\index{locations}, the runtime values of references, and let's say that locations also count as values.
$$\begin{array}{rrcl}
  \textrm{Locations} & \ell &\in& \mathbb N \\
  \textrm{Expressions} & e &::=& n \mid e + e \mid x \mid \lambda x. \; e \mid e \; e \mid \newref{e} \mid \; \readref{e} \mid \writeref{e}{e} \mid \underline{\ell} \\
  \textrm{Values} & v &::=& n \mid \lambda x. \; e \mid \underline{\ell}
\end{array}$$

\newcommand{\dom}[1]{\mathsf{dom}(#1)}
Now we can write the rules for the three reference primitives.
$$\infer{\smallstepo{(h, \newref{v})}{(\mupd{h}{\ell}{v}, \ell)}}{
  \ell \notin \dom{h}
}
\quad \infer{\smallstepo{(h, \readref{\ell})}{(h, v)}}{
  \msel{h}{\ell} = v
}
\quad \infer{\smallstepo{(h, \writeref{\ell}{v'})}{(\mupd{h}{\ell}{v'}, v')}}{
  \msel{h}{\ell} = v
}$$

To evaluate a reference allocation $\newref{e}$, we nondeterministically\index{nondeterminism} pick some unused location $\ell$ and initialize it with the requested value.
To read from a reference in $\readref{e}$, we just look up the location in the heap; the program will be \emph{stuck} if the location is not already included in $h$.
Finally, to write to a reference with $\writeref{e_1}{e_2}$, we check that the requested location is already in the heap (we're stuck if not), then we overwrite its value with the new one.

Here is the overall step rule, which looks just like the one for basic $\lambda$-calculus, with a heap wrapped around everything.
$$\infer{\smallstep{(h, \plug{C}{e})}{(h', \plug{C}{e'})}}{
  \smallstepo{(h, e)}{(h', e')}
}$$

As a small exercise for the reader, it may be worth using this judgment to derive that our example program from before always returns 1.
Even fixing the empty heap in the starting state, there is some nondeterminism in which final heap it returns: the possibilities are all the single-location heaps, mapping their single locations to value 1.
It is natural to allow this nondeterminism in allocation, since typical memory allocators in real systems don't give promises about predictability in the addresses that they return.
However, we will be able to prove that, for instance, any program returning a number \emph{gives the same answer, independently of nondeterministic choices made by the allocator}.
That property is not true in programming languages like C\index{C programming language} that are not \emph{memory safe}\index{memory safety}, as they allow arithmetic and comparisons on pointers\index{pointers}, the closest C equivalent of our references.


\section{Type Soundness}

\newcommand{\reft}[1]{#1 \; \mathsf{ref}}

For $\lambda$-calculus with references, we can prove a similar type-soundness theorem to what we proved last chapter, though the proof has a twist or two.
To start with, we should define our extended type system, with one new case for references.
$$\begin{array}{rrcl}
  \textrm{Types} & \tau &::=& \mathbb N \mid \tau \to \tau \mid \underline{\reft{\tau}}
\end{array}$$

Here are the rules from last chapter's basic $\lambda$-calculus, which we can keep unchanged.
$$\infer{\hasty{\Gamma}{x}{\tau}}{
  \msel{\Gamma}{x} = \tau
}
\quad \infer{\hasty{\Gamma}{n}{\mathbb N}}{}
\quad \infer{\hasty{\Gamma}{e_1 + e_2}{\mathbb N}}{
    \hasty{\Gamma}{e_1}{\mathbb N}
    & \hasty{\Gamma}{e_2}{\mathbb N}
}$$
$$\infer{\hasty{\Gamma}{\lambda x. \; e}{\tau_1 \to \tau_2}}{
  \hasty{\Gamma, x : \tau_1}{e}{\tau_2}
}
\quad \infer{\hasty{\Gamma}{e_1 \; e_2}{\tau_2}}{
  \hasty{\Gamma}{e_1}{\tau_1 \to \tau_2}
  & \hasty{\Gamma}{e_2}{\tau_1}
}$$

We also need a rule for each of the reference primitives.
$$\infer{\hasty{\Gamma}{\newref{e}}{\reft{\tau}}}{
  \hasty{\Gamma}{e}{\tau}
}
\quad \infer{\hasty{\Gamma}{\; \readref{e}}{\tau}}{
    \hasty{\Gamma}{e}{\reft{\tau}}
}
\quad \infer{\hasty{\Gamma}{\writeref{e_1}{e_2}}{\tau}}{
  \hasty{\Gamma}{e_1}{\reft{\tau}}
  & \hasty{\Gamma}{e_2}{\tau}
}$$

That's enough notation to let us state type soundness, which is indeed provable.

\begin{theorem}[Type Soundness]
  If $\hasty{}{e}{\tau}$, then $\neg \textrm{stuck}$ is an invariant of $\mathbb T(e)$.
\end{theorem}

However, we will need to develop some more machinery to let us state the strengthened invariant that makes the proof go through.

The trouble with our typing rules is that they disallow location constants, but those constants \emph{will} arise in intermediate states of program execution.
To prepare for them, we introduce \emph{heap typings}\index{heap typings} $\Sigma$, partial functions from locations to types.
The idea is that a heap typing $\Sigma$ models a heap $h$ by giving the intended type for each of its locations.
We define an expanded typing judgment of the form $\rhasty{\Sigma}{\Gamma}{e}{\tau}$, with a new parameter included solely to enable the following rule.
$$\infer{\rhasty{\Sigma}{\Gamma}{\ell}{\tau}}{
  \msel{\Sigma}{\ell} = \tau
}$$

We must also extend every typing rule we gave before, adding an extra ``$\Sigma;$'' prefix, threaded mindlessly through everything.
We never extend $\Sigma$ as we recurse into subexpressions, and we only examine it in leaves of derivation trees, corresponding to $\ell$ expressions.

We have made some progress toward stating an inductive invariant for the type-soundness theorem.
The essential idea of the proof is found in the invariant choice $I(h, e) = \exists \Sigma. \; \rhasty{\Sigma}{\mempty}{e}{\tau}$.
However, we can tell that something is suspicious with this invariant, since it does not mention $h$.
We should also somehow characterize the relationship between $\Sigma$ and $h$.

\newcommand{\heapty}[2]{#1 \vdash #2}

Here is a first cut at defining a relation $\heapty{\Sigma}{h}$.
$$\infer{\heapty{\Sigma}{h}}{
  \forall \ell, \tau. \; \msel{\Sigma}{\ell} = \tau \Rightarrow \exists v. \; \msel{h}{\ell} = v \land \rhasty{\Sigma}{\mempty}{v}{\tau}
}$$

In other words, whenever $\Sigma$ announces the existence of location $\ell$ meant to store values of type $\tau$, the heap $h$ actually stores some value $v$ for $\ell$, and that value has the right type.
Note the tricky recursion inherent in typing $v$ with respect to the very same $\Sigma$.

This rule as stated is not \emph{quite} sufficient to make the invariant inductive.
We could get stuck on a $\newref{e}$ expression if the heap $h$ becomes \emph{infinite}, with no free addresses left to allocate.
Of course, we know that finite executions, started in the empty heap, only produce finite intermediate heaps.
Let's remember that fact with another condition in the $\heapty{\Sigma}{h}$ relation.
$$\infer{\heapty{\Sigma}{h}}{
  (\forall \ell, \tau. \; \msel{\Sigma}{\ell} = \tau \Rightarrow \exists v. \; \msel{h}{\ell} = v \land \rhasty{\Sigma}{\mempty}{v}{\tau})
  & (\exists \; \mathsf{bound}. \; \forall \ell \geq \mathsf{bound}. \; \ell \notin \dom{h})
}$$

The rule requires the existence of some upper bound $\mathsf{bound}$ on the already-allocated locations.
By construction, whenever we need to allocate a fresh location, we may choose $\mathsf{bound}$, or indeed any location greater than it.

We now have the right machinery to define an inductive invariant, namely:
\invariants
$$I(h, e) = \exists \Sigma. \; \rhasty{\Sigma}{\mempty}{e}{\tau} \land \heapty{\Sigma}{h}$$

We prove variants of all of the lemmas behind last chapter's type-safety proof, with a few new ones and twists on the originals.
Here we give some highlights.

\begin{lemma}[Heap Weakening]
  If $\rhasty{\Sigma}{\Gamma}{e}{\tau}$ and every mapping in $\Sigma$ is also included in $\Sigma'$, then $\rhasty{\Sigma'}{\Gamma}{e}{\tau}$.
\end{lemma}

\begin{lemma}
  If $\smallstepo{(h, e)}{(h', e')}$, $\rhasty{\Sigma}{\mempty}{e}{\tau}$, and $\heapty{\Sigma}{h}$, then there exists $\Sigma'$ such that $\rhasty{\Sigma'}{\mempty}{e'}{\tau}$, $\heapty{\Sigma'}{h'}$, and $\Sigma'$ preserves all mappings from $\Sigma$.
\end{lemma}

\begin{lemma}
  If $\rhasty{\Sigma}{\mempty}{\plug{C}{e_1}}{\tau}$, then there exists $\tau_0$ such that $\rhasty{\Sigma}{\mempty}{e_1}{\tau_0}$ and, for all $e_2$ and $\Sigma'$, if $\rhasty{\Sigma'}{\mempty}{e_2}{\tau_0}$ and $\Sigma'$ preserves mappings from $\Sigma$, then $\rhasty{\Sigma'}{\mempty}{\plug{C}{e_2}}{\tau}$.
\end{lemma}

\begin{lemma}[Preservation]
  If $\smallstep{(h, e)}{(h', e')}$, $\rhasty{\Sigma}{\mempty}{e}{\tau}$, and $\heapty{\Sigma}{h}$, then there exists $\Sigma'$ such that $\rhasty{\Sigma'}{\mempty}{e'}{\tau}$ and $\heapty{\Sigma'}{h'}$.
\end{lemma}


\section{Garbage Collection}

Functional languages like ML\index{ML} and Haskell\index{Haskell} include features very similar to the mutable references that we study in this chapter.
However, their execution models depart in an important way from the operational semantics we just defined: they use \emph{garbage collection}\index{garbage collection} to deallocate unused references, whereas our last semantics allows references to accumulate forever in the heap, even if it is clear that some of them will never be needed again.
Worry not!
We can model garbage collection with one new rule of the operational semantics, and then our type-safety proof adapts and shows that we still avoid stuckness, when the garbage collector can snatch \emph{unreachable} locations away from us at any moment.

\newcommand{\freeloc}[1]{\mathsf{freeloc}(#1)}

To define \emph{unreachable}, we start with a way to compute the \emph{free locations} of an expression.
\begin{eqnarray*}
  \freeloc{x} &=& \emptyset \\
  \freeloc{n} &=& \emptyset \\
  \freeloc{e_1 + e_2} &=& \freeloc{e_1} \cup \freeloc{e_2} \\
  \freeloc{\lambda x. \; e_1} &=& \freeloc{e_1} \\
  \freeloc{e_1 \; e_2} &=& \freeloc{e_1} \cup \freeloc{e_2} \\
  \freeloc{\newref{e_1}} &=& \freeloc{e_1} \\
  \freeloc{\readref{e_1}} &=& \freeloc{e_1} \\
  \freeloc{\writeref{e_1}{e_2}} &=& \freeloc{e_1} \cup \freeloc{e_2} \\
  \freeloc{\ell} &=& \{\ell\}
\end{eqnarray*}

\newcommand{\reach}[2]{\mathcal R_{#1}(#2)}

Next, we define a relation to capture \emph{which locations are reachable from some starting expression, relative to a particular heap?}
For each expression $e$ and heap $h$, we define $\reach{h}{e}$ as the set of locations reachable from $e$ via $h$.
$$\infer{\ell \in \reach{h}{\ell}}{}
\quad \infer{\ell' \in \reach{h}{\ell}}{
  \msel{h}{\ell} = v
  & \ell' \in \reach{h}{v}
}
\quad \infer{\ell' \in \reach{h}{e}}{
  \ell \in \freeloc{e}
  & \ell' \in \reach{h}{\ell}
}$$

In order, the rules say: any location reaches itself; any location reaches anywhere reachable from the value assigned to it by $h$; and any expression reaches anywhere reachable from any of its free locations.

Now we add one new top-level rule to the operational semantics, saying \emph{unreachable locations may be removed at any time}.
$$\infer{\smallstep{(h, e)}{(h', e)}}{
  \begin{array}{c}
    \forall \ell, v. \; \ell \in \reach{h}{e} \land \msel{h}{\ell} = v \Rightarrow \msel{h'}{\ell} = v \\
    \forall \ell, v. \; \msel{h'}{\ell} = v \Rightarrow \msel{h}{\ell} = v \\
    h' \neq h
  \end{array}
}$$

Let us explain each premise in more detail.
The first premise says that, going from the old heap $h$ to the new heap $h'$, \emph{the value of every reachable reference is preserved}.
The second premise says that \emph{the new heap is a subheap of the original, not spontaneously adding any new mappings}.
The final premise says that we have actually done some useful work: the new heap isn't just the same as the old one.

It may not be clear why we must include the last premise.
The reason has to do with our formulation of type safety, by saying that programs never get \emph{stuck}.
We defined that $e$ is \emph{stuck} if it is not a value but it also can't take a step.
If we omitted from the garbage-collection rule the premise $h' \neq h$, then this rule would \emph{always} apply, for any term, simply by setting $h' = h$.
That is, \emph{no} term would ever be stuck, and type safety would be meaningless!
Since the rule also requires that $h'$ be \emph{no larger than} $h$ (with the second premise), additionally requiring $h' \neq h$ forces $h'$ to \emph{shrink}, garbage-collecting at least one location.
Thus, in any execution state, we can ``kill time'' by running garbage collection only finitely many times before we need to find some ``real'' step to run.
More precisely, the limit on how many times we can run garbage collection in a row, starting from heap $h$, is $|\dom{h}|$, the number of locations in $h$.

The type-safety proof is fairly straightforward to update.
We prove progress by \emph{ignoring} the garbage-collection rule, since the existing rules were already enough to find a step for every nonvalue.
A bit more work is needed to update the proof of preservation; its cases for the existing rules follow the same way as before, while we must prove a few lemmas on the way to handling the new rule.

\begin{lemma}[Transitivity for reachability]
  If $\freeloc{e_1} \subseteq \freeloc{e_2}$, then $\reach{h}{e_1} \subseteq \reach{h}{e_2}$.
\end{lemma}

\begin{lemma}[Irrelevance of unreachable locations for typing]
  If $\heapty{\Sigma}{h}$, $\rhasty{\Sigma}{\Gamma}{e}{\tau}$, then $\rhasty{\Sigma'}{\Gamma}{e}{\tau}$, if we also know that, for all $\ell$ and $\tau'$, when $\ell \in \reach{h}{e}$ and $\msel{\Sigma}{\ell} = \tau'$, it follows that $\msel{\Sigma'}{\ell} = \tau'$.
\end{lemma}

\begin{lemma}[Reachability sandwich]
  If $\ell \in \reach{h}{e}$, $\msel{h}{\ell} = v$, and $\ell' \in \reach{h}{v}$, then $\ell' \in \reach{h}{e}$.
\end{lemma}

To extend the proof of preservation, we need to show that the strengthened invariant still holds after garbage collection.
A key element is choosing the new heap typing.
We pick \emph{the restriction of the old heap typing $\Sigma$ to the domain of the new heap $h'$}.
That is, we drop from the heap typing all locations that have been garbage collected, preserving the types of the survivors.
Some work is required to show that this strategy is sound, given the definition of reachability, but the lemmas above work out the details, leaving just a bit of bookkeeping in the preservation proof.
The final safety proof then proceeds in exactly the same way as before.

Our proof here hasn't quite covered all the varieties of garbage collectors that exist.
In particular, \emph{copying collectors}\index{copying garbage collectors} may \emph{move references to different locations}, while we only allow collectors to delete some references.
It may be an edifying exercise for the reader to extend our proof in a way that also supports reference relocation.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Hoare Logic: Verifying Imperative Programs}

We now take a step away from the last chapters in two dimensions: we switch back from functional to imperative programs, and we return to proofs of deep correctness properties, rather than mere absence of type-related crashes.
Nonetheless, the essential proof structure winds up being the same, as we once again prove invariants of transition systems!


\section{An Imperative Language with Memory}

\newcommand{\assert}[1]{\mathsf{assert}(#1)}
\newcommand{\readfrom}[1]{{*}[#1]}
\newcommand{\writeto}[2]{\readfrom{#1} \leftarrow #2}

To provide us with an interesting enough playground for program verification, let's begin by defining an imperative language with an infinite mutable heap.
For reasons that will become clear shortly, we do a strange bit of mixing of syntax and semantics.
In certain parts of the syntax, we include \emph{assertions}\index{assertions} $a$, which are arbitrary mathematical predicates over program state, split between heaps $h$ and variable valuations $v$.

$$\begin{array}{rrcl}
  \textrm{Numbers} & n &\in& \mathbb N \\
  \textrm{Variables} & x &\in& \mathsf{Strings} \\
  \textrm{Expressions} & e &::=& n \mid x \mid e + e \mid e - e \mid e \times e \mid \readfrom{e} \\
  \textrm{Boolean expressions} & b &::=& e = e \mid e < e \\
  \textrm{Commands} & c &::=& \skipe \mid \assign{x}{e} \mid \writeto{e}{e} \mid c; c \\
  &&& \mid \ifte{b}{c}{c} \mid \{a\}\while{b}{c} \mid \assert{a}
\end{array}$$

Beside assertions, we also have memory-read operations $\readfrom{e}$ and memory-write operations $\writeto{e_1}{e_2}$, which are written suggestively, as if the memory were a global array named $*$.
Loops have sprouted an extra assertion in their syntax, which we will actually ignore in the language semantics, but which becomes important as part of the proof technique we will learn, especially in automating it.

Expressions have a standard recursive semantics.
\begin{eqnarray*}
  \denote{n}(h, v) &=& n \\
  \denote{x}(h, v) &=& \msel{v}{x} \\
  \denote{e_1 + e_2}(h, v) &=& \denote{e_1}(h, v) + \denote{e_2}(h, v) \\
  \denote{e_1 - e_2}(h, v) &=& \denote{e_1}(h, v) - \denote{e_2}(h, v) \\
  \denote{e_1 \times e_2}(h, v) &=& \denote{e_1}(h, v) \times \denote{e_2}(h, v) \\
  \denote{\readfrom{e}}(h, v) &=& \msel{h}{\denote{e}(h, v)} \\
  \denote{e_1 = e_2}(h, v) &=& \denote{e_1}(h, v) = \denote{e_2}(h, v) \\
  \denote{e_1 < e_2}(h, v) &=& \denote{e_1}(h, v) < \denote{e_2}(h, v)
\end{eqnarray*}

We finish up with a big-step semantics in the style of those we've seen before, with the added complication of threading a heap through.
\encoding
$$\infer{\bigstep{(h, v, \skipe)}{(h, v)}}{}
\quad \infer{\bigstep{(h, v, \assign{x}{e})}{(h, \mupd{v}{x}{\denote{e}(h, v)})}}{}$$

$$\infer{\bigstep{(h, v, \writeto{e_1}{e_2})}{(\mupd{h}{\denote{e_1}(h, v)}{\denote{e_2}(h, v)}, v)}}{}$$

$$\infer{\bigstep{(h, v, c_1; c_2)}{(h_2, v_2)}}{
  \bigstep{(h, v, c_1)}{(h_1, v_1)}
  & \bigstep{(h_1, v_1, c_2)}{(h_2, v_2)}
}$$

$$\infer{\bigstep{(h, v, \ifte{b}{c_1}{c_2})}{(h', v')}}{
  \denote{b}(h, v)
  & \bigstep{(h, v, c_1)}{(h', v')}
}
\quad \infer{\bigstep{(h, v, \ifte{b}{c_1}{c_2})}{(h', v')}}{
  \neg \denote{b}(h, v)
  & \bigstep{(h, v, c_2)}{(h', v')}
}$$

$$\infer{\bigstep{(h, v, \{I\} \while{b}{c})}{(h', v')}}{
  \denote{b}(h, v)
  & \bigstep{(h, v, c; \{I\} \while{b}{c})}{(h', v')}
}
\quad \infer{\bigstep{(h, v, \{I\} \while{b}{c})}{(h, v)}}{
  \neg \denote{b}(h, v)
}$$

$$\infer{\bigstep{(h, v, \assert{a})}{(h, v)}}{
  a(h, v)
}$$

Reasoning directly about operational semantics can get tedious, so let's develop some machinery for proving program correctness automatically.


\section{Hoare Triples}

\newcommand{\hoare}[3]{\{#1\} #2 \{#3\}}

Much as we did with type systems, we define a syntactic predicate and prove it sound once and for all.
Afterward, we can automatically show that particular programs and their specifications inhabit the predicate.
This time, predicate instances will be written like $\hoare{P}{c}{Q}$, with $c$ the command being verified, $P$ its \emph{precondition}\index{precondition} (assumption about the program state before we start running $c$), and $Q$ its \emph{postcondition} (obligation about the program state after $c$ finishes).
We call any such fact a \emph{Hoare triple}\index{Hoare triple}, and the overall predicate is an instance of \emph{Hoare logic}\index{Hoare logic}.

\encoding
A first rule for $\skipe$ is easy: anything that was true before is also true after.
$$\infer{\hoare{P}{\skipe}{P}}{}$$

A rule for assignment is slightly more involved: to state what we know is true after, we recall that there existed a prestate satisfying the precondition, which then evolved into the poststate in the expected way.
$$\infer{\hoare{P}{\assign{x}{e}}{\lambda (h, v). \; \exists v'. \; P(h, v') \land v = \mupd{v'}{x}{\denote{e}(h, v')}}}{}$$

The memory-write command is treated symmetrically.
$$\infer{\hoare{P}{\writeto{e_1}{e_2}}{\lambda (h, v). \; \exists h'. \; P(h', v) \land h = \mupd{h'}{\denote{e_1}(h', v)}{\denote{e_2}(h', v)}}}{}$$

To model sequencing, we thread predicates through in an intuitive way.
$$\infer{\hoare{P}{c_1; c_2}{R}}{
  \hoare{P}{c_1}{Q}
  & \hoare{Q}{c_2}{R}
}$$

For conditional statements, we start from the basic approach of sequencing, adding two twists.
First, since the two subcommands run after different outcomes of the test expression, we extend their preconditions.
Second, since we may reach the end of the command after running either subcommand, we take the disjunction of their postconditions.
$$\infer{\hoare{P}{\ifte{b}{c_1}{c_2}}{\lambda s. \; Q_1(s) \lor Q_2(s)}}{
  \hoare{\lambda s. \; P(s) \land \denote{b}(s)}{c_1}{Q_1}
  & \hoare{\lambda s. \; P(s) \land \neg \denote{b}(s)}{c_2}{Q_2}
}$$

Coming to loops, we at last have a purpose for the assertion annotated on each one.
\invariants
We call those assertions \emph{loop invariants}\index{loop invariants}; one of these is meant to be true every time a loop iteration begins.
We will try to avoid confusion with the more fundamental concept of invariant for transition systems, though in fact the two are closely related formally, which we will see in the last section of this chapter.
Essentially, the loop invariant gives the \emph{induction hypothesis} that makes the program correctness proof go through.
We encapsulate the induction reasoning once and for all, in the proof of soundness for Hoare triples.
To verify an individual program, it is only necessary to prove the premises of the rule, which we give now.
$$\infer{\hoare{P}{\{I\} \while{b}{c}}{\lambda s. \; I(s) \land \neg \denote{b}(s)}}{
  (\forall s. \; P(s) \Rightarrow I(s))
  & \hoare{\lambda s. \; I(s) \land \denote{b}(s)}{c}{I}
}$$
In words: the loop invariant is true when we begin the loop, and every iteration preserves the invariant, given the extra knowledge that the loop test succeeded.
If the loop finishes, we know that the invariant is still true, but now the test is false.

The final command-specific rule, for assertions, is a bit anticlimactic.
The precondition is carried over as postcondition, if it is strong enough to prove the assertion.
$$\infer{\hoare{P}{\assert{I}}{P}}{
  \forall s. \; P(s) \Rightarrow I(s)
}$$

One more essential rule remains, this time not specific to any command form.
The rules we've given deduce specific kinds of precondition-postcondition pairs.
For instance, the $\skipe$ rule forces the precondition and postcondition to match.
However, we expect to be able to prove $\hoare{\lambda (h, v). \; \msel{v}{x} > 0}{\skipe}{\lambda (h, v). \; \msel{v}{x} \geq 0}$, because the postcondition is \emph{weaker}\index{weaker predicate} than the precondition, meaning the precondition implies the postcondition.
Alternatively, the precondition is \emph{stronger}\index{stronger predicate} than the postcondition, because the precondition keeps all restrictions from the postcondition while adding new ones.
Hoare Logic's \emph{rule of consequence}\index{rule of consequence} allows us to build a new Hoare triple from an old one by \emph{strengthening the precondition}\index{strengthening the precondition} and \emph{weakening the postcondition}\index{weakening the postcondition}.
$$\infer{\hoare{P'}{c}{Q'}}{
  \hoare{P}{c}{Q}
  & (\forall s. \; P'(s) \Rightarrow P(s))
  & (\forall s. \; Q(s) \Rightarrow Q'(s))
}$$

These rules together are \emph{complete}\index{completeness of Hoare logic}, in the sense that any intuitively correct precondition-postcondition pair for a command is provable.
Here we only go into detail on a proof of the dual property, \emph{soundness}\index{soundness of Hoare logic}.

\begin{lemma}\label{hoare_while}
  Assume the following fact: Together, $\bigstep{(h, v, c)}{(h', v')}$, $I(h, v)$, and $\denote{b}(h, v)$ imply $I(h', v')$.
  Then, given $\bigstep{(h, v, \{I\} \while{b}{c})}{(h', v')}$, it follows that $I(h', v')$ and $\neg \denote{b}(h', v')$.
\end{lemma}
\begin{proof}
  By induction on the derivation of $\bigstep{(h, v, \{I\} \while{b}{c})}{(h', v')}$.
\end{proof}

That lemma encapsulates once and for all the use of induction in reasoning about the many iterations of loops.

\begin{theorem}[Soundness of Hoare logic]
  If $\hoare{P}{c}{Q}$, $\bigstep{(h, v, c)}{(h', v')}$, and $P(h, v)$, then $Q(h', v')$.
\end{theorem}
\begin{proof}
  By induction on the derivation of $\hoare{P}{c}{Q}$ and inversion on the derivation of $\bigstep{(h, v, c)}{(h', v')}$, appealing to Lemma \ref{hoare_while} in the appropriate case.
\end{proof}

We leave concrete example derivations to the accompanying Coq code, as that level of fiddly detail deserves to be machine-checked.
Note that there is a rather effective automated proof procedure lurking behind the rules introduced in this section:
To prove a Hoare triple, first try applying the rule associated with its top-level syntax-tree constructor (e.g., assignment or loop rule).
If the conclusion of that rule does unify with the goal, apply the rule and proceed recursively on its premises.
Otherwise, apply the rule of consequence to replace the postcondition with one matching that from the matching rule; note that all rules accept arbitrarily shaped preconditions, so we don't actually need to do work to massage the precondition.
After a step like this one, it is guaranteed that the ``fundamental'' rule now applies.

This process creates a pile of side conditions to be proved by other means, corresponding to the assertion implications generated by the rules for loops, assertions, and consequence.
Many real-world tools based on Hoare logic discharge such goals using solvers for satisfiability modulo theories\index{satisfiability modulo theories}, otherwise known as SMT solvers\index{SMT solvers}.
The accompanying Coq code just uses a modest Coq automation tactic definition building on the proof steps we have been using all along.
It is not complete by any means, but it does surprisingly well in the examples we step through, of low to moderate complexity.

Before closing our discussion of the basics of Hoare logic, let's consider how it brings to bear some more of the general principles that we have met before.
\abstraction
A command's precondition and postcondition serve as an \emph{abstraction} of the command: it is safe to model a command with its specification, if it has been proved using a Hoare triple.
\modularity
Furthermore, the Hoare rules themselves take advantage of \emph{modularity} to analyze subcommands separately, mediating between them using only the specifications.
The implementation details of a subcommand don't matter for any other subcommands in the program, so long as that subcommand has been connected to a specification that preserves enough information about its behavior.
It is an art to choose the right specification for each piece of a program.
Detailed specifications minimize the chance that some other part of the program winds up unprovable, despite its correctness, but more detailed specifications also tend to be harder to prove in the first place.


\section{Small-Step Semantics}

Last section's soundness theorem only lets us draw conclusions about programs that terminate.
We call such guarantees \emph{partial correctness}\index{partial correctness}.
Other forms of Hoare triples guarantee \emph{total correctness}\index{total correctness}, which includes termination.
However, sometimes programs aren't meant to terminate, yet we still want to gain confidence about their behavior.
To that end, we first give a small-step semantics for the same programming language.
Then we prove a different soundness theorem for the same Hoare-triple predicate, showing that it also implies a useful invariant for programs as transition systems.

The small-step relation is quite similar to the one from last chapter, though now our states are triples $(h, v, c)$, of heap $h$, variable valuation $v$, and command $c$.

\encoding
$$\infer{\smallstep{(h, v, \assign{x}{e})}{(h, \mupd{v}{x}{\denote{e}(h, v)}, \skipe)}}{}$$

$$\infer{\smallstep{(h, v, \writeto{e_1}{e_2})}{(\mupd{h}{\denote{e_1}(h, v)}{\denote{e_2}(h, v)}, v, \skipe)}}{}$$

$$\infer{\smallstep{(h, v, \skipe; c_2)}{(h, v, c_2)}}{}
\quad \infer{\smallstep{(h, v, c_1; c_2)}{(h', v', c'_1; c_2)}}{
  \smallstep{(h, v, c_1)}{(h', v', c'_1)}
}$$

$$\infer{\smallstep{(h, v, \ifte{b}{c_1}{c_2})}{(h, v, c_1)}}{
  \denote{b}(h, v)
}
\quad \infer{\smallstep{(h, v, \ifte{b}{c_1}{c_2})}{(h, v, c_2)}}{
  \neg \denote{b}(h, v)
}$$

$$\infer{\smallstep{(h, v, \{I\} \while{b}{c})}{(h, v, c; \{I\} \while{b}{c})}}{
  \denote{b}(h, v)
}
\quad \infer{\smallstep{(h, v, \{I\} \while{b}{c})}{(h, v, \skipe)}}{
  \neg \denote{b}(h, v)
}$$

$$\infer{\smallstep{(h, v, \assert{a})}{(h, v, \skipe)}}{
  a(h, v)
}$$


\section{Transition-System Invariants from Hoare Triples}

Even an infinite-looping program must satisfy its $\mathsf{assert}$ commands, every time it passes one of them.
For that reason, it's interesting to consider how to show that a command never gets stuck on a false assertion.
We work up to that result with a few intermediate ones.
First, we define \emph{stuck} much the same way as in the last three chapters: a state $(h, v, c)$ is stuck if $c$ is not $\skipe$, but there is also nowhere to step to from this state.
An example of a stuck state would be one beginning with an $\mathsf{assert}$ of an assertion that does not hold on $h$ and $v$.
In fact, we can prove that any other state is unstuck, though we won't bother here.

\begin{lemma}[Progress]\label{hoare_progress}
  If $\hoare{P}{c}{Q}$ and $P(h, v)$, then $(h, v, c)$ is unstuck.
\end{lemma}
\begin{proof}
  By induction on the derivation of $\hoare{P}{c}{Q}$.
\end{proof}

\begin{lemma}\label{hoare_skip}
  If $\hoare{P}{\skipe}{Q}$, then $\forall s. \; P(s) \Rightarrow Q(s)$.
\end{lemma}
\begin{proof}
  By induction on the derivation of $\hoare{P}{\skipe}{Q}$.
\end{proof}

\begin{lemma}[Preservation]\label{hoare_preservation}
  If $\hoare{P}{c}{Q}$, $\smallstep{(h, v, c)}{(h', v', c')}$, and $P(h, v)$, then $\hoare{\lambda s. \; s = (h', v')}{c'}{Q}$.
\end{lemma}
\begin{proof}
  By induction on the derivation of $\hoare{P}{c}{Q}$, appealing to Lemma \ref{hoare_skip} in one case.  Note how we conclude a very specific precondition, forcing exact state equality with the one we have stepped to.
\end{proof}

\begin{theorem}[Invariant Safety]
\invariants
  If $\hoare{P}{c}{Q}$ and $P(h, v)$, then unstuckness is an invariant for the small-step transition system starting at $(h, v, c)$.
\end{theorem}
\begin{proof}
  First we weaken the invariant to $I(h, v, c) = \hoare{\lambda s. \; s = (h, v)}{c}{\lambda \_. \; \top}$.
  That is, we focus in on the most specific applicable precondition, and we forget everything that the postcondition was recording for us.
  Note that postconditions are still an essential part of Hoare triples for this proof, but we have already done our detailed analysis of them in the earlier lemmas.
  Lemma \ref{hoare_progress} gives the needed implication from the new invariant to the old.
  
  Next, we apply invariant induction, whose base case follows trivially.
  The induction step follows by Lemma \ref{hoare_preservation}.
\end{proof}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Deep Embeddings, Shallow Embeddings, and Options in Between}
\label{embeddings}

So far, in this book, we have followed the typographic conventions of ordinary mathematics and logic, as they would be worked out on whiteboards.
In parallel, we have mechanized all of the definitions and proofs in Coq.
Often little tidbits of encoding challenge show up in mechanizing the proofs.
As formal languages get more complex, it becomes more and more important to choose the right encoding.
For instance, in the previous chapter, we repeatedly jumped through hoops to track the local variables of programs, threading variable valuations $v$ throughout everything.
Coq already has built into it a respectable notion of variables; can we somehow reuse that mechanism, rather than roll our own new one?
This chapter gives a ``yes'' answer, working toward redefining last chapter's Hoare logic in a lighter-weight manner, along the way introducing some key terminology that is used to classify encoding choices.

Since whiteboard math doesn't usually bother with encoding details, here we must break with our convention of using only standard notation in the book.
Instead, we will use notation closer to literal Coq code, and, in fact, more of the technical action than usual is only found in the accompanying Coq source file.

\section{The Basics}\label{mixed}

Recall some terminology introduced in Section \ref{metalanguage}: every formal proof is carried out in some \emph{metalanguage}\index{metalanguage}, which, in our case, is Coq's logic and programming language called Gallina\index{Gallina}.
A syntactic language that we formalize is called an \emph{object language}\index{object language}.
Often it is convenient to do reasoning without any particular object language, as in this simple arithmetic function that can be defined directly in Gallina.
\begin{eqnarray*}
  \mt{foo} &=& \lambda(x, y). \; \elet{u}{x + y}{\elet{v}{u \times y}{u + v}}
\end{eqnarray*}

However, it is difficult to prove some important facts about terms encoded directly in the metalanguage.
For instance, we can't easily do induction over the syntax of all such terms.
To allow that kind of induction, we can define an object language inductively.
\encoding
\begin{eqnarray*}
  \mt{Const} &:& \mathbb N \to \mt{exp} \\
  \mt{Var} &:& \mathbb V \to \mt{exp} \\
  \mt{Plus} &:& \mt{exp} \to \mt{exp} \to \mt{exp} \\
  \mt{Times} &:& \mt{exp} \to \mt{exp} \to \mt{exp} \\
  \mt{Let} &:& \mathbb V \to \mt{exp} \to \mt{exp} \to \mt{exp}
\end{eqnarray*}

That last example program, with implicit \emph{free variables}\index{free variables} $x$ and $y$, may now be redefined in the $\mt{exp}$ type.
\newcommand{\var}[1]{\mt{Var} \; \textrm{``#1''}}
\begin{eqnarray*}
  \mt{foo'} &=& \mt{Let} \; (\var{u}) \; (\mt{Plus} \; (\var{x}) \; (\var{y})) \; (\mt{Let} \; (\var{v}) \\
  && \hspace{.1in} (\mt{Times} \; (\var{u}) \; (\var{y})) \; (\mt{Plus} \; (\var{u}) \; (\var{v})))
\end{eqnarray*}

As in Chapter \ref{interpreters}, we can define a recursive interpreter, mapping $\mt{exp}$ programs and variable valuations to numbers.
Using that interpreter, we can prove equivalence of $\mt{foo}$ and $\mt{foo'}$.

We say that $\mt{foo}$ uses a \emph{shallow embedding}\index{shallow embedding}, because it is coded directly in the metalanguage, with no extra layer of syntax.
Conversely, $\mt{foo'}$ uses a \emph{deep embedding}\index{deep embedding}, since it goes via the inductively defined $\mt{exp}$ type.

These extremes are not our only options.
In higher-order logics like Coq's, we may also choose what might be called \emph{mixed embeddings}\index{mixed embedding}, which define syntax-tree types that allow some use of general functions from the metalanguage.
Here's an example, as an alternative definition of $\mt{exp}$.
\encoding
\begin{eqnarray*}
  \mt{Const} &:& \mathbb N \to \mt{exp} \\
  \mt{Var} &:& \mathbb V \to \mt{exp} \\
  \mt{Plus} &:& \mt{exp} \to \mt{exp} \to \mt{exp} \\
  \mt{Times} &:& \mt{exp} \to \mt{exp} \to \mt{exp} \\
  \mt{Let} &:& \mt{exp} \to (\mathbb N \to \mt{exp}) \to \mt{exp}
\end{eqnarray*}

The one change is in the type of the $\mt{Let}$ constructor, where now no variable name is given, and instead \emph{the body of the ``let'' is represented as a Gallina function from numbers to expressions}.
The intent is that the body is called on the number that results from evaluating the first expression.
This style is called \emph{higher-order abstract syntax}\index{higher-order abstract syntax}.
Though that term is often applied to a more specific instance of the technique, which is not exactly the one used here, we will not be so picky.

As an illustration of the technique in action, here's our third encoding of the simple example program.
\begin{eqnarray*}
  \mt{foo''} &=& \mt{Let} \; (\mt{Plus} \; (\var{x}) \; (\var{y})) \; (\lambda u. \\
  && \hspace{.1in} \mt{Let} \; (\mt{Times} \; (\mt{Const} \; u) \; (\var{y})) \; (\lambda v. \\
  && \hspace{.2in} \mt{Plus} \; (\mt{Const} \; u) \; (\mt{Const} \; v)))
\end{eqnarray*}

With a bit of subtlety, we can define an interpreter for this language, too.
\begin{eqnarray*}
  \denote{\mt{Const} \; n}v &=& n \\
  \denote{\mt{Var} \; x}v &=& \msel{v}{x} \\
  \denote{\mt{Plus} \; e_1 \; e_2}v &=& \denote{e_1}v + \denote{e_2}v \\
  \denote{\mt{Times} \; e_1 \; e_2}v &=& \denote{e_1}v \times \denote{e_2}v \\
  \denote{\mt{Let} \; e_1 \; e_2}v &=& \denote{e_2(\denote{e_1}v)}v
\end{eqnarray*}

Note how, in the $\mt{Let}$ case, since the body $e_2$ is a function, before evaluating it, we call it on the result of evaluating $e_1$.
This language would actually be sufficient even if we removed the $\mt{Var}$ constructor and the $v$ argument of the interpreter.
Coq's normal variable binding is enough to let us model interesting programs and prove things about them by induction on syntax.

It is important here that Coq's induction principles give us useful induction hypotheses, for constructors whose recursive arguments are functions.
The second argument of $\mt{Let}$ above is an example.
When we do induction on expression syntax to establish $\forall e. \; P(e)$, the case for $\mt{Let} \; e_1 \; e_2$ includes two induction hypotheses.
The first one is standard: $P(e_1)$.
The second one is more interesting: $\forall n : \mathbb N. \; P(e_2(n))$.
That is, the theorem holds on all results of applying body $e_2$ to arguments.


\section{A Mixed Embedding for Hoare Logic}

This general strategy also applies to modeling imperative languages like the one from last chapter.
We can define a polymorphic type family $\mt{cmd}$ of commands, indexed by the type of value that a command is meant to return.
\encoding
\begin{eqnarray*}
  \mt{Return} &:& \forall \alpha. \; \alpha \to \mt{cmd} \; \alpha \\
  \mt{Bind} &:& \forall \alpha, \beta. \; \mt{cmd} \; \beta \to (\beta \to \mt{cmd} \; \alpha) \to \mt{cmd} \; \alpha \\
  \mt{Read} &:& \mathbb N \to \mt{cmd} \; \mathbb N \\
  \mt{Write} &:& \mathbb N \to \mathbb N \to \mt{cmd} \; \mt{unit}
\end{eqnarray*}

We use notation $x \leftarrow c_1; c_2$ as shorthand for $\mt{Bind} \; c_1 \; (\lambda x. \; c_2)$, making it possible to write some very natural-looking programs in this type.
Here are two examples.
\begin{eqnarray*}
  \mt{array\_max}(0, a) &=& \mt{Return} \; a \\
  \mt{array\_max}(i+1, a) &=& v \leftarrow \mt{Read} \; i; \mt{array\_max} \; i \; (\max(v, a)) \\
  \\
  \mt{increment\_all}(0) &=& \mt{Return} \; () \\
  \mt{increment\_all}(i+1) &=& v \leftarrow \mt{Read} \; i; \_ \leftarrow \mt{Write} \; i \; (v+1); \mt{increment\_all} \; i
\end{eqnarray*}

Function $\mt{array\_max}$ computes the highest value found in the first $i$ slots of memory, using an accumulator $a$.
Function $\mt{increment\_all}$ adds 1 to every one of the first $i$ memory slots.

Note that we are not writing programs directly as syntax trees, but rather working with recursive functions that \emph{compute syntax trees}.
We are able to do so despite the fact that we built no support for recursion into the $\mt{cmd}$ type family.
Likewise, we didn't need to build in any support for $\max$, addition, or any of the other operations that are easy to code up in Gallina.

It is straightforward to implement an interpreter for this object language, where each command's interpretation maps input heaps to pairs of output heaps and results.
Note that we have no need for an explicit variable valuation.
\begin{eqnarray*}
  \denote{\mt{Return} \; v}h &=& (h, v) \\
  \denote{\mt{Bind} \; c_1 \; c_2}h &=& \elet{(h', v)}{\denote{c_1}h}{\denote{c_2(v)}h'} \\
  \denote{\mt{Read} \; a}h &=& (h, \msel{h}{a}) \\
  \denote{\mt{Write} \; a \; v}h &=& (\mupd{h}{a}{v}, ())
\end{eqnarray*}

We can also define a syntactic Hoare-logic relation for this type, where preconditions are predicates over initial heaps, and postconditions are predicates over \emph{result values} and final heaps.
$$\infer{\hoare{P}{\mt{Return} \; v}{\lambda r, h. \; P(h) \land r = v}}{}
\quad \infer{\hoare{P}{\mt{Bind} \; c_1 \; c_2}{R}}{
  \hoare{P}{c_1}{Q}
  & \forall r. \; \hoare{Q(r)}{c_2(r)}{R}
}$$
$$\infer{\hoare{P}{\mt{Read} \; a}{\lambda r, h. \; P(h) \land r = \msel{h}{a}}}{}
\quad \infer{\hoare{P}{\mt{Write} \; a \; v}{\lambda r, h. \; \exists h'. \; P(h') \land h = \mupd{h'}{a}{v}}}{}$$
$$\infer{\hoare{P'}{c}{Q'}}{
  \hoare{P}{c}{Q}
  & (\forall h. \; P'(h) \Rightarrow P(h))
  & (\forall r, h. \; Q(r, h) \Rightarrow Q'(r, h))
}$$

Much of the details are the same as last chapter, including in a rule of consequence at the end.
The most interesting new wrinkle is in the rule for $\mt{Bind}$, where the premise about the body command $c_2$ starts with universal quantification over all possible results $r$ of executing $c_1$.
That result is passed off, via function application, both to the body $c_2$ and to $Q$, which serves as the postcondition of $c_1$ and the precondition of $c_2$.

This Hoare logic can be used to verify the two example programs from earlier in this section; see the accompanying Coq code for details.
We also have a standard soundness theorem.
\begin{theorem}
  If $\hoare{P}{c}{Q}$ and $P(h)$ for some heap $h$, then let $(h', r) = \denote{c}h$.  It follows that $Q(r, h')$.
\end{theorem}


\section{Adding More Effects}

We can continue to enhance our object language with different kinds of side effects that are not supported natively by Gallina.
First, we add \emph{nontermination}, in the form of unbounded loops.
For a type $\alpha$, we define $\mathbb O(\alpha)$ as the type of \emph{loop-body outcomes}, either $\mt{Done}(a)$ to indicate that the loop should terminate or $\mt{Again}(a)$ to indicate that the loop should keep running.
Our loops are functional, maintaining accumulators as they run, and the $a$ argument gives the latest accumulator value in each case.
So we add this constructor:
\begin{eqnarray*}
  \mt{Loop} &:& \forall \alpha. \; \alpha \to (\alpha \to \mt{cmd} \; (\mathbb O(\alpha))) \to \mt{cmd} \; \alpha
\end{eqnarray*}

Here's an example of looping in action, in a program that returns the address of the first occurrence of a value in memory, or loops forever if that value is not found in the whole infinite memory.
\begin{eqnarray*}
  \mt{index\_of}(n) &=& \mt{Loop} \; 0 \; (\lambda i. \; v \leftarrow \mt{Read} \; i; \mt{if} \; v = n \; \mt{then} \; \mt{Return} \; (\mt{Done}(i)) \; \mt{else} \; \mt{Return} \; (\mt{Again}(i+1)))
\end{eqnarray*}

With the addition of nontermination, it's no longer straightforward to write an interpreter for the language.
Instead, we implement a small-step operational semantics $\to$; see the accompanying Coq code for details.
We build an extended Hoare logic, keeping all the rules from last section and adding this new one.
\invariants
Like before, it is parameterized on a loop invariant, but now the loop invariant takes a loop-body outcome as parameter.
$$\infer{\hoare{I(\mt{Again}(i))}{\mt{Loop} \; i \; c}{\lambda r. \; I(\mt{Done}(r)))}}{
  \forall a. \; \hoare{I(\mt{Again}(a))}{c(a)}{I}
}$$

This new Hoare logic is usable to verify the example program from above and many more, and we can also prove a soundness theorem.
The operational semantics gives us the standard way of interpreting our programs as transition systems, with states $(c, h)$.

\invariants
\begin{theorem}
  If $\hoare{P}{c}{Q}$ and $P(h)$ for some heap $h$, then it is an invariant of $(c, h)$ that, if the command ever becomes $\mt{Return} \; r$ in a heap $h'$, then $Q(r, h')$.  That is, if the program terminates, the postcondition is satisfied.
\end{theorem}

We can add a further side effect to the language: \emph{exceptions}\index{exceptions}.
Actually, we stick to a simple variant of this classic side effect, where there is just one exception, and it cannot be caught.
We associate this exception with \emph{program failure}, and the Hoare logic will ensure that programs never actually fail.

The extension to program syntax is easy:
\begin{eqnarray*}
  \mt{Fail} &:& \forall \alpha. \; \mt{cmd} \; \alpha
\end{eqnarray*}
That is, a failing program can be considered to return any result type, since it will never actually return normally, instead throwing an uncatchable exception.

The operational semantics is also easily extended to signal failures, with a new special system state called $\mt{Failed}$.
We also add this Hoare-logic rule.
$$\infer{\hoare{\lambda \_. \; \bot}{\mt{Fail}}{\lambda \_, \_. \; \bot}}{}$$
That is, failure can only be verified against an unsatisfiable precondition, so that we know that the failure is unreachable.

With this extension, we can prove a soundness-theorem variant, capturing the impossibility of failure.

\invariants
\begin{theorem}
  If $\hoare{P}{c}{Q}$ and $P(h)$ for some heap $h$, then it is an invariant of $(c, h)$ that the state never becomes $\mt{Failed}$.
\end{theorem}

Note that this version of the theorem still tells us interesting things about programs that run forever.
It is easy to implement runtime assertion checking with code that performs some test and runs $\mt{Fail}$ if the test does not pass.
An infinite-looping program may perform such tests infinitely often, and we learn that none of the tests ever fail.

The accompanying Coq code demonstrates another advantage of this mixed-embedding style: we can extract\index{extraction} our programs to OCaml\index{OCaml} and run them efficiently.
That is, rather than using functional programming to implement our three kinds of side effects, we implement them directly with OCaml's mutable heap, unbounded recursion, and exceptions, respectively.
As a result, our extracted programs achieve the asymptotic performance that we would expect, thinking of them as C-like code, where interpreters in a pure functional language like Gallina would necessarily add at least an extra logarithmic factor in the modeling of unboundedly growing heaps.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Separation Logic}\label{seplog}

In our Hoare-logic examples so far, we have intentionally tread lightly when it comes to the potential aliasing\index{aliasing} of pointer variables in a program.
Generally, we have only worked with, for instance, a single array at a time.
Reasoning about multi-array programs usually depends on the fact that the arrays don't overlap in memory at all.
Things are even more complicated with linked data structures, like linked lists and trees, which we haven't even attempted up to now.
However, by using \emph{separation logic}\index{separation logic}, a popular variant of Hoare logic, we will find it quite pleasant to prove programs that use linked structures, with no need for explicit reasoning about aliasing, assuming that we keep all of our data structures disjoint from each other through simple coding patterns.


\section{An Object Language with Dynamic Memory Allocation}

Before we get into proofs, let's fix a mixed-embedding object language.
$$\begin{array}{rrcl}
  \textrm{Commands} & c &::=& \mt{Return} \; v \mid x \leftarrow c; c \mid \mt{Loop} \; i \; f \mid \mt{Fail} \\
  &&& \mid \mt{Read} \; n \mid \mt{Write} \; n \; n \mid \mt{Alloc} \; n \mid \mt{Free} \; n \; n
\end{array}$$

A small-step operational semantics explains what these commands mean.

$$\infer{\smallstep{(h, x \leftarrow c_1; c_2(x))}{(h', x \leftarrow c'_1; c_2(x))}}{
  \smallstep{(h, c_1)}{(h', c'_1)}
}
\quad \infer{\smallstep{(h, x \leftarrow \mt{Return} \; v; c(x))}{(h, c(v))}}{}$$

$$\infer{\smallstep{(h, \mt{Loop} \; i \; f)}{(h, x \leftarrow f(i); \mt{match} \; x \; \mt{with} \; \mt{Done}(a) \Rightarrow \mt{Return} \; a \mid \mt{Again}(a) \Rightarrow \mt{Loop} \; a \; f)}}{}$$

$$\infer{\smallstep{(h, \mt{Read} \; a)}{(h, \mt{Return} \; v)}}{
  \msel{h}{a} = v
}
\quad \infer{\smallstep{(h, \mt{Write} \; a \; v')}{(\mupd{h}{a}{v'}, \mt{Return} \; ())}}{
  \msel{h}{a} = v
}$$

$$\infer{\smallstep{(h, \mt{Alloc} \; n)}{(\mupd{h}{a}{0^n}, \mt{Return} \; a)}}{
  \dom{h} \cap [a, a+n) = \emptyset
}
\quad \infer{\smallstep{(h, \mt{Free} \; a \; n)}{(h - [a, a+n), \mt{Return} \; ())}}{
}$$

A few remarks about the last four rules:
The basic $\mt{Read}$ and $\mt{Write}$ operations now get \emph{stuck} when accessing unmapped addresses.
The premise of the rule for $\mt{Alloc}$ enforces that address $a$ denotes a currently unmapped memory region of size $n$.
We use a variety of convenient notations that we won't define in detail here, referring instead to the accompanying Coq code.
Another notation uses $0^n$ to refer informally to a sequence of $n$ zeroes to write into memory.
Similarly, the conclusion of the $\mt{Free}$ rule unmaps a whole size-$n$ region, starting at $a$.
We could also have chosen to enforce in this rule that the region starts out as mapped into $h$.

\section{Assertion Logic}

Separation logic is based on two big ideas.
The first one has to do with the \emph{assertion logic}\index{assertion logic}, which we use to write invariants; while the second one has to do with the \emph{program logic}\index{program logic}, which we use to prove that programs satisfy specifications.
The assertion logic is based on predicates over \emph{partial memories}\index{partial memories}, or finite maps from addresses to stored values.
Because they are finite, they omit infinitely many addresses, and it is crucial that we are able to describe heaps that intentionally leave addresses out of their domains.
Informally, a predicate \emph{claims ownership}\index{ownership} of addresses in the domains of matching heaps.

\newcommand{\emp}[0]{\mt{emp}}
\newcommand{\lift}[1]{[#1]}
\newcommand{\ptsto}[2]{#1 \mapsto #2}

We can describe the connectives of separation logic in terms of the sets of partial heaps that they accept.
\begin{eqnarray*}
  \emp &=& \{\mempty\} \\
  \ptsto{p}{v} &=& \{\mupd{\mempty}{p}{v}\} \\
  \lift{\phi} &=& \{h \mid \phi \land h = \mempty\} \\
  \exists x. \; P(x) &=& \{h \mid \exists x. \; h \in P(x)\} \\
  P * Q &=& \{h_1 \uplus h_2 \mid h_1 \in P \land h_2 \in Q\}
\end{eqnarray*}

The formula $\emp$ accepts only the empty heap, while formula $\ptsto{p}{v}$ accepts only the heap whose only address is $p$, mapped to value $v$.
We overload the $\mapsto$ operator in that second line above, to denote ``points-to'' on the lefthand side of the equality and finite-map overriding on the righthand side.
Notation $\lift{\phi}$ is \emph{lifting}\index{lifting pure propositions} a \emph{pure} (i.e., regular old mathematical) proposition $\phi$ into an assertion, enforcing both that the heap is empty and that $\phi$ is true.
We also adapt the normal existential quantifier to this setting.

The essential definition is the last one, of the \emph{separating conjunction}\index{separating conjunction} $*$.
We use the notation $h_1 \uplus h_2$ for \emph{disjoint union} of heaps $h_1$ and $h_2$, implicitly enforcing $\dom{h_1} \cap \dom{h_2} = \emptyset$.
The intuition of separating conjunction is that we \emph{partition} the overall heap into two subheaps, each of which matches one of the respective conjuncts $P$ and $Q$.
This connective implicitly enforces \emph{lack of aliasing}, leading to separation logic's famous conciseness of specifications that combine data structures.

We can also define natural comparison operators between assertions, overloading the usual notations for equivalence and implication of propositions.
\begin{eqnarray*}
  P \Leftrightarrow Q &=& \forall h. \; h \in P \Leftrightarrow h \in Q \\
  P \Rightarrow Q &=& \forall h. \; h \in P \Rightarrow h \in Q
\end{eqnarray*}

The core connectives satisfy a number of handy algebraic laws.
Here is a sampling (where we note that the first one takes advantage of the overloading of implication $\Rightarrow$ in two different senses).

$$\infer{P * \lift{\phi} \Rightarrow Q}{
  \phi \Rightarrow (P \Rightarrow Q)
}
\quad \infer{P \Rightarrow Q * \lift{\phi}}{
  \phi
  & P \Rightarrow Q
}
\quad \infer{P \Leftrightarrow \lift{\phi} * P}{
  \phi
}$$

$$\infer{P * Q \Leftrightarrow Q * P}{}
\quad \infer{P * (Q * R) \Leftrightarrow (P * Q) * R}{}
\quad \infer{P_1 * Q_1 \Rightarrow P_2 * Q_2}{
  P_1 \Rightarrow P_2
  & Q_1 \Rightarrow Q_2
}$$

$$\infer{(P * \exists x. \; Q(x)) \Leftrightarrow \exists x. \; P * Q(x)}{}
\quad \infer{(\exists x. \; P(x)) \Rightarrow Q}{
  \forall x. \; P(x) \Rightarrow Q
}
\quad \infer{P \Rightarrow \exists x. \; Q(x)}{
  P \Rightarrow Q(v)
}$$

This set of algebraic laws has a very special consequence: it supports automated proof of implications by \emph{cancellation}\index{cancellation}, where we repeatedly ``cross out'' matching subformulas on the two sides of the arrow.
Consider this example formula that we might want to prove.
$$(\exists q. \; \ptsto{p}{q} * \exists r. \; \ptsto{q}{r} * \ptsto{r}{0}) \Rightarrow (\exists a. \; \exists b. \; \exists c. \; \ptsto{b}{c} * \ptsto{p}{a} * \ptsto{a}{b})$$

First, the laws above allow us to bubble all quantifiers to the fronts of formulas.
$$(\exists q, r. \; \ptsto{p}{q} * \ptsto{q}{r} * \ptsto{r}{0}) \Rightarrow (\exists a, b, c. \; \ptsto{b}{c} * \ptsto{p}{a} * \ptsto{a}{b})$$

Next, all $\exists$ to the left can be replaced with fresh free variables, while all $\exists$ to the right can be replaced with fresh \emph{unification variables}\index{unification variables}, whose values, in terms of the free-variable values, we can deduce in the course of the proof.
$$\ptsto{p}{q} * \ptsto{q}{r} * \ptsto{r}{0} \Rightarrow \; \ptsto{?b}{?c} \; * \; \ptsto{p}{?a} \; * \; \ptsto{?a}{?b}$$

Next, we find matching subformulas to \emph{cancel}.
We start by matching $\ptsto{p}{q}$ with $\ptsto{p}{?a}$, learning that $?a = q$ and reducing to the following formula.
This crucial step relies on the three key properties of $*$, given in the second row of rules above: commutativity, associativity, and cancellativity\index{cancellativity}.
$$\ptsto{q}{r} * \ptsto{r}{0} \Rightarrow \; \ptsto{?b}{?c} \; * \; \ptsto{q}{?b}$$

We run another cancellation step of $\ptsto{q}{r}$ against $\ptsto{q}{?b}$, learning $?b = r$.
$$\ptsto{r}{0} \Rightarrow \; \ptsto{r}{?c}$$

Now we can finish the proof by reflexivity of $\Rightarrow$, learning $?c = 0$.

\section{Program Logic}

We use our automatic cancellation procedure to discharge some of the premises from the rules of the program logic, which we present now.
First, here are the rules that are (almost) exactly the same as from last chapter.

$$\infer{\hoare{P}{\mt{Return} \; v}{\lambda r. \; P * \lift{r = v}}}{}
\quad \infer{\hoare{P}{x \leftarrow c_1; c_2(x)}{R}}{
  \hoare{P}{c_1}{Q}
  & (\forall r. \; \hoare{Q(r)}{c_2(r)}{R})
}$$

$$\infer{\hoare{I(\mt{Again}(i))}{\mt{Loop} \; i \; f}{\lambda r. \; I(\mt{Done}(r))}}{
  \forall a. \; \hoare{I(\mt{Again}(a))}{f(a)}{I}
}
\quad \infer{\hoare{\lift{\bot}}{\mt{Fail}}{\lambda \_. \; \lift{\bot}}}{}$$

$$\infer{\hoare{P'}{c}{Q'}}{
  \hoare{P}{c}{Q}
  & P' \Rightarrow P
  & \forall r. \; Q(r) \Rightarrow Q'(r)
}$$

More interesting are the rules for primitive memory operations.
First, we have the rule for $\mt{Read}$.

$$\infer{\hoare{\exists v. \; \ptsto{a}{v} * R(v)}{\mt{Read} \; a}{\lambda r. \; \ptsto{a}{r} * R(r)}}{}$$

In words: before reading from address $a$, it must be the case that $a$ points to some value $v$, and predicate $R(v)$ records what else we know about the memory at that point.
Afterward, we know that $a$ points to the result $r$ of the read operation, and $R$ is still present.
We call $R$ a \emph{frame predicate}\index{frame predicate}, recording what we know about parts of memory that the command does not touch directly.
We might also say that the \emph{footprint} of this command is the singleton set $\{a\}$.
In general, frame predicates record preserved facts about addresses outside a command's footprint.
The next few rules don't have frame predicates baked in; we finish with a rule that adds them back, in a generic way for arbitrary Hoare triples.

$$\infer{\hoare{\exists v. \; \ptsto{a}{v}}{\mt{Write} \; a \; v'}{\lambda \_. \; \ptsto{a}{v'}}}{}$$

This last rule, for $\mt{Write}$, is even simpler.
We see a straightforward illustration of overwriting $a$'s old value $v$ with the new value $v$'.

$$\infer{\hoare{\emp}{\mt{Alloc} \; n}{\lambda r. \; \ptsto{r}{0^n}}}{}
\quad \infer{\hoare{\ptsto{a}{\; ?^n}}{\mt{Free} \; a \; n}{\lambda \_. \; \emp}}{}$$

The rules for allocation and deallocation deploy a few notations that we don't explain in detail here, with $0^n$ for sequences of $n$ zeroes and $?^n$ for sequences of $n$ arbitrary values.

The next rule, the \emph{frame rule}\index{frame rule}, gives the second key idea of separation logic, supporting the \emph{small-footprint}\index{small-footprint style} reasoning style.

$$\infer{\hoare{P * R}{c}{\lambda r. \; Q(r) * R}}{
  \hoare{P}{c}{Q}
}$$

In other words, any Hoare triple can be extended by conjoining an arbitrary predicate $R$ in both precondition and postcondition.
Even more intuitively, when a program satisfies a spec, it also satisfies an extended spec that records the state of some other part of memory that is untouched (i.e., is outside the command's footprint).

For the pragmatics of proving particular programs, we defer to the accompanying Coq code.
\modularity
However, for modular proofs, the frame rule has such an important role that we want to emphasize it here.
It is possible to define (recursively) a predicate $\mt{llist}(\ell, p)$, capturing the idea that the heap contains exactly an imperative linked list, rooted at pointer $p$, representing functional linked list $\ell$.
We can also prove a general specification for a list-reversal function:
$$\forall \ell, p. \; \hoare{\mt{llist}(\ell, p)}{\texttt{reverse}(p)}{\lambda r. \; \mt{llist}(\mt{rev}(\ell), r)}$$

Now consider that we have the roots $p_1$ and $p_2$ of two disjoint lists, respectively representing $\ell_1$ and $\ell_2$.
It is easy to instantiate the general theorem and get $\hoare{\mt{llist}(\ell_1, p_1)}{\texttt{reverse}(p_1)}{\lambda r. \; \mt{llist}(\mt{rev}(\ell_1), r)}$ and $\hoare{\mt{llist}(\ell_2, p_2)}{\texttt{reverse}(p_2)}{\lambda r. \; \mt{llist}(\mt{rev}(\ell_2), r)}$.
Applying the frame rule to the former theorem, with $R = \mt{llist}(\ell_2, p_2)$, we get:
$$\hoare{\mt{llist}(\ell_1, p_1) * \mt{llist}(\ell_2, p_2)}{\texttt{reverse}(p_1)}{\lambda r. \; \mt{llist}(\mt{rev}(\ell_1), r) * \mt{llist}(\ell_2, p_2)}$$
Similarly, applying the frame rule to the latter, with $R = \mt{llist}(\mt{rev}(\ell_1), r)$, we get:
$$\hoare{\mt{llist}(\ell_2, p_2) * \mt{llist}(\mt{rev}(\ell_1), r)}{\texttt{reverse}(p_2)}{\lambda r'. \; \mt{llist}(\mt{rev}(\ell_2), r') * \mt{llist}(\mt{rev}(\ell_1), r)}$$
Now it is routine to derive the following spec for a larger program:
$$\begin{array}{l}
  \{\mt{llist}(\ell_1, p_1) * \mt{llist}(\ell_2, p_2)\} \\
  \hspace{.2in} r \leftarrow \texttt{reverse}(p_1); r' \leftarrow \texttt{reverse}(p_2); \; \mt{Return}(r, r') \\
  \{\lambda (r, r'). \; \mt{llist}(\mt{rev}(\ell_1), r) * \mt{llist}(\mt{rev}(\ell_2), r')\}
\end{array}$$

Note that this specification would be incorrect if the two input lists could share any memory cells!
The separating conjunction $*$ in the precondition implicitly formalizes our expectation of nonaliasing.
The proof internals require only the basic rules for $\mt{Return}$ and sequencing, in addition to the rule of consequence, whose side conditions we discharge using the cancellation approach sketched in the previous section.

Note also that this highly automatable proof style works just as well when calling functions associated with several different data structures in memory.
The frame rule provides a way to show that any function, in any library, preserves arbitrary memory state outside its footprint.

\section{Soundness Proof}

Our Hoare logic is sound with respect to the object language's operational semantics.

\invariants
\begin{theorem}
  If $\hoare{P}{c}{Q}$ and $P(\mempty)$, then it is an invariant of the transition system starting from $(\mempty, c)$ that either the command has become a $\mt{Return}$ or another execution step is possible.
\end{theorem}

As usual, the key to the proof is to find a stronger invariant that can be proved by invariant induction.
In this case, we use the invariant $\lambda (h, c). \; \hoare{\{h\}}{c}{Q}$.
That is, assert a Hoare triple where the precondition enforces exact heap equality with the current heap.
The postcondition can remain the same throughout execution.

A few key lemmas are interesting enough to mention here; we leave other details to the Coq code.

First, we need to prove that this fancier invariant implies the one from the theorem statement, and the most direct statement needs to be strengthened, to get the induction to go through.

\begin{lemma}[Progress]
  If $\hoare{P}{c}{Q}$ and $P(h_1)$, then either $c$ is a $\mt{Return}$ or it is possible to take a step from $(h_1 \uplus h_2, c)$, for any disjoint $h_2$.
\end{lemma}
\begin{proof}
  By induction on the derivation of $\hoare{P}{c}{Q}$.
\end{proof}

Note the essential inclusion of a disjoint union with the auxiliary heap $h_2$.
Without this strengthening of the obvious property, we would get stuck in the case of the proof for the frame rule.

\begin{lemma}[Preservation]
  If $\smallstep{(h, c)}{(h', c')}$ and $\hoare{\{h\}}{c}{Q}$, then $\hoare{\{h'\}}{c'}{Q}$.
\end{lemma}
\begin{proof}
  By induction on the derivation of $\smallstep{(h, c)}{(h', c')}$.
\end{proof}

The different cases of the proof depend on some not-entirely-obvious inversion lemmas.
For instance, here is the one we prove for $\mt{Write}$.

\begin{lemma}
  If $\hoare{P}{\mt{Write} \; a \; v'}{Q}$, then there exists $R$ such that:
  \begin{itemize}
  \item $P \Rightarrow \exists v. \; \ptsto{a}{v} * R$
  \item $\ptsto{a}{v'} * R \Rightarrow Q(())$
  \end{itemize}
\end{lemma}
\begin{proof}
  By induction on the derivation of $\hoare{P}{\mt{Write} \; a \; v'}{Q}$.
\end{proof}

Again, without the introduction of the $R$ variable, we would get stuck proving the case for the frame rule.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Connecting to Real-World Programming Languages}

Our exercises so far have been confined to proofs of programs in idealized programming languages.
How can proof assistants be used to derive results about full-scale languages?
Developers are already used to coordinating build processes\index{build processes} that plumb together multiple languages and tools.
A proof assistant like Coq can become one tool in such a build flow.
However, the interesting wrinkle that arises is the chance to do more than just get tools cooperating on building executable code.
We can also get tools cooperating on generating proofs of parts of the system.

It is worth emphasizing that this whole subject is a very active area of research.
There are many competing approaches, and it is not clear which will be most practical in the long run.
With this chapter, we survey the interesting design dimensions and approaches to them that are known today.
We also go into more detail on one avant-garde approach, of verified compilation of shallowly embedded programs to deeply embedded programs in Coq.

\section{Where Does the Buck Stop?}

For any system where we really care about correctness (or its special case of security\index{security}), it is common to delineate a \emph{trusted code base (TCB)}\index{trusted code base}\index{TCB}: the parts of the system where bugs could invalidate correctness.
By implication, system components outside the TCB can be arbitrarily buggy without endangering correctness.

When we use Coq, the Coq proof checker itself is in the TCB.
Implicitly, all the infrastructure below the proof checker is also trusted.
That includes the OCaml\index{OCaml} compiler (since Coq is implemented in OCaml), the operating-system kernel\index{operating systems}, the processor beneath\index{processors}, and so on.
Interestingly, most of Coq is \emph{not} trusted.
For instance, the tactic engine outputs proof terms\index{proof terms} in a core language, and we only need to trust the (relatively small) checker for that language.

The point is, we always draw the trust boundary somewhere, and we always reason informally starting at some layer of a system.
Our real-world-connecting examples of prior chapters have stopped at extracting OCaml code from Coq developments.
We could avoid trusting extraction (and the OCaml compiler) by instead proving properties of C abstract syntax trees.
However, then we are still trusting the C compiler!
So we could verify that compiler and even the operating system its outputs run on.
However, then we are still trusting the computer processor!
So we could verify the processor, even down to the level of circuits\index{circuits} with analog dynamics modeled using differential equations\index{differential equations}.
However, then we are still trusting our characterization of the laws of physics!

In summary, it is naive to suggest that some formal-methods developments ``prove the whole system'' while others do not.

\section{Modes of Connecting to Real Artifacts}

\encoding
Any strategy in this space involves some connection between a proved component and an unproved component.
Oversimplifying a bit, we are considering what is the ``last level'' of a proof that spans multiple abstraction layer\index{abstraction layers}.
(The ``first level'' will be the top-level specification of an application or whatever other piece has a proof that is not used as a lemma for some other proof.)

\subsection{Modes Based on Extraction}

The easiest ``last level'' connection is via \emph{extraction}\index{extraction}, which translates Gallina programs into functional programs in languages like OCaml\index{OCaml}, Haskell\index{Haskell}, and Scheme\index{Scheme}.
Other proof assistants than Coq tend to include similar extraction facilities.
When the formal object of study is already a purely functional program, extraction is a very natural fit.
Its main downsides are TCB size and performance.
On the TCB front, a variety of compilers, including the one for extraction itself, remain trusted.
On the performance front, it is often possible to make a functional program run much faster or use much less memory by translating it to, say, a C program that doesn't rely on garbage collection.

Extraction can be part of other, less-obvious connections, where we bring certain kinds of \emph{side effects}\index{side effects} and real-world interactions into scope.
For instance, a common class of applications is \emph{reactive systems}\index{reactive systems}.
The system is viewed as an object, in the sense of object-oriented programming\index{object-oriented programming}, with encapsulated private state and public methods that are allowed to read and modify that state.
Some methods are designated as \emph{event handlers}\index{event handlers}: they are called when various input events take place in the real world, like when a packet is received from a network.
An event handler in turn signals output actions to take in response.
This model is actually quite easy to implement using extraction: a reactive system is a choice of private state type plus a set of pure functions, each taking in the state and returning the new modified state.
The pure functions are the methods of the object.
Expressive power in this style is very high, though the TCB-size and performance objections remain.

In Chapter \ref{embeddings}, we already saw another approach to adding side effects\index{side effects} atop extracted code: extract syntax trees, perhaps in a mixed embedding\index{mixed embedding}, run with an interpreter\index{interpreter} coded directly in the target language of extraction.
That interpreter is free to use whatever side effects the host language supports.
If anything, the TCB-size and performance objections increase with this approach, given the additional level of indirection that comes from using syntax trees and interpretation.
However, the flexibility can be very appealing, with a straightforward means of allowing most any side effect.

\subsection{Modes Based on Explicit Rendering}

Another ``last level'' strategy is to do explicit translation between abstract syntax trees and concrete, textual code formats.
These translations usually have nothing proved about them, meaning they belong to TCBs, but often the translators are simple enough that it is relatively unworrying to trust them.

In one direction, we implement a tool that takes in, say, the textual syntax of C code, outputting Coq source code for corresponding explicit syntax trees.
Now we can reason about these trees like any other mathematical objects coded manually in Coq.
A downside of this approach is that it is relatively complex to parse mainstream programming languages, yet we are trusting just such a parser.
However, this style often supports the smoothest integration with legacy code bases.

In the other direction, we write a Coq function from syntax trees to strings of concrete code in some widely used language.
This function is still trusted, but it tends to be much shorter and worthy of trust than its inverse.
Coq can be run as part of a build process, printing to the screen the string that has been computed from a syntax tree.
A scripting language can be used to extract the string from Coq's output, write it to a file, and call a conventional compiler.
A major challenge of this approach is that only deeply embedded languages have straightforward printing to concrete syntax, in practice, while shallowly embedded languages tend to be easier to do proofs about.


\section{The Importance of Modularity}

\modularity
Our discussion so far leaves out an important dimension.
Often significant projects are divided into libraries\index{libraries}, and we want to be able to prove libraries independently of each other.
We have a few choices for facing this reality of large-scale development.

The easiest approach is to use Coq pipelines to generate single libraries, which are linked outside of Coq.
Our libraries may even be linked\index{linking} with modules written in other languages or that would otherwise resist whatever proof methods we used.
These connections across languages may enlarge the TCB significantly, but we can boost performance by linking with crucial but unverified code.

We may also want to give modules first-class status in Coq and prove the correctness of linking mechanisms.
In concert with verified compilers\index{compiler verification}, possibilities open up to do linking across languages without expanding the TCB.
All languages can be compiled to some common format, like assembly language, and the compiled version of each library can be given a specification in a common logical format, like a particular Hoare logic.
With all the pieces in place, we wind up with the semantics of the common language in the TCB, but the compilers and all aspects of their source languages stay outside the TCB.


\section{Case Study: From a Mixed Embedding to a Deep Embedding}

This chapter's associated Coq code works out a case study of verified compilation from a mixed embedding to a deep embedding, realizing one of the ``last level'' options above that keeps the best of both worlds: straightforward program proof with a mixed embedding, but the smaller TCB and improved performance that comes from outputting concrete C\index{C programming language} syntax from Coq.

\subsection{Source and Target Languages}

Our mixed-embedding source language will be a simplification of last chapter's language.
$$\begin{array}{rrcl}
  \textrm{Commands} & c &::=& \mt{Return} \; v \mid x \leftarrow c; c \mid \mt{Loop} \; i \; f \mid \mt{Read} \; n \mid \mt{Write} \; n \; n
\end{array}$$

Our deep-embedding target language will expose essentially the same features but more in the traditional style of C and related languages.
$$\begin{array}{rrcl}
  \textrm{Expressions} & e &::=& x \mid n \mid e + e \mid \readfrom{e} \\
  \textrm{Statements} &s &::=& \skipe \mid \assign{x}{e} \mid \writeto{e}{e} \mid s; s \mid \ifte{e}{s}{s} \mid \while{e}{s}
\end{array}$$

We assume standard small-step semantics for both languages, referring to the Coq code for details.
A state of the source language takes the form $(h, c)$ for a heap $h$, while a state of the target language adds a variable valuation $v$, for triples $(h, v, c)$.

\subsection{Formal Compilation}

It is not at all obvious how to translate from a mixed embedding\index{mixed embedding} to a deep embedding\index{deep embedding}.
We can't just write a compiler as a Gallina function, because the $\mt{Bind}$ construct is encoded using functions of the metalanguage.
As a result, there is no way to ``recurse under a binder''!

\newcommand{\dscomp}[3]{#1 \vdash #2 \hookrightarrow #3}

However, inductive predicate definitions\index{inductive predicates} give us all the power we need, when we mix in the power of logical quantifiers\index{quantifiers}.
We will define a judgment $\dscomp{v}{c}{s}$, indicating that mixed-embedded command $c$ can be compiled to statement $s$, assuming that we start running $s$ when the valuation includes every mapping from $v$.

A good warmup is defining a related judgment $\dscomp{v}{n}{e}$, compiling normal Gallina numeric expressions $n$ into syntactic expressions $e$.

$$\infer{\dscomp{v}{n}{x}}{
  v(x) = n
}
\quad \infer{\dscomp{v}{n_1 + n_2}{e_1 + e_2}}{
  \dscomp{v}{n_1}{e_1}
  & \dscomp{v}{n_2}{e_2}
}$$

So far, the logic is straightforward.
When we want to mention a number, it suffices to find a variable that has already been assigned that number.
Translation recurses through addition in a natural way.
(Note that, in the addition rule, the ``+'' on the left of the arrow is normal Gallina addition, while the ``+'' on the right is syntactic ``plus'' of the deeply embedded language!)

Another rule may appear at first to be overly powerful.
$$\infer{\dscomp{v}{n}{n}}{}$$

That is, any numeric expression may be injected into the deep embedding \emph{as a constant}.
How can we hope to embed all Gallina expressions in C?
The details of the command-compilation rules reveal why we are safe, so let us turn to those rules.

The rules we show here are simplified from the full set in the Coq development, supporting an even smaller subset of the source language, to make the presentation easier to understand.
The rule for lone $\mt{Return}$ commands is simple, delegating most of the work to expression compilation, using a designated variable \texttt{result} to hold the final answer of a command.

$$\infer{\dscomp{v}{\mt{Return} \; n}{\assign{\texttt{result}}{e}}}{
  \dscomp{v}{n}{e}
}$$

The most interesting rules cover uses of $\mt{Bind}$ on various primitive operations directly.
Here is the simplest such rule, where the primitive is simple $\mt{Return}$.

$$\infer{\dscomp{v}{x \leftarrow \mt{Return} \; n; c(x)}{\assign{y}{e}; s}}{
    y \notin \dom{v}
    & \dscomp{v}{n}{e}
    & (\forall w. \; \dscomp{\mupd{v}{y}{w}}{c(w)}{s})
}$$

This rule selects a deep-embedding variable name $y$ to correspond to $x$ in the source program.
(Actually, the source-program $\mt{Bind}$ is encoded as a call to a higher-order function, so no choice of a particular $x$ is present.)
The name $y$ must not already be present in the valuation $v$ -- it must not have been chosen already for a command executed earlier.
Next, we find an expression $e$ that represents the value $n$ being bound.
Finally, we reach the trickiest part.
The statement $s$ needs to represent the $\mt{Bind}$ body $c$, \emph{for any value $w$ that $n$ might evaluate to}.
For every such choice $w$, we extend $v$ to record that $w$ was assigned to $y$.
In parallel, we pass $w$ into the body $c$.
As a result, the expression-translation rule for variables can pick up on this connection and compile mentions of $w$ into uses of $y$!

Here is where we see why it wasn't problematic earlier to include a rule that translates any number $n$ into a constant in the deep embedding.
Most numeric expressions in a source program will depend on results of earlier $\mt{Bind}$ operations.
However, \emph{the quantified premise of the last rule enforces that the output statement $s$ is not allowed to depend on $w$ directly}!
The values of introduced variables can only be accessed through deeply embedded variable names.
In other words, if we tried to use the constant rule directly to prove the quantified premise of that last rule, when the body involves a $\mt{Return}$ of a complex expression that mentions the bound variable, we would generate $s$ that includes $w$, which is not allowed.

Similar rules are present for $\mt{Read}$ and $\mt{Write}$, which interestingly are compiled the same way as $\mt{Return}$.
From a syntactic, compilation standpoint, they do not behave differently from pure computation.
More involved and raising new complications is the rule for loops; see the Coq code for details.

\subsection{Soundness Proof}

\begin{theorem}\label{dscompsim}
  If $\dscomp{v}{c}{s}$, then the source-language transition system starting at $(h, c)$ simulates\index{simulation} the target-language transition system starting at $(h, v, s)$.
\end{theorem}
\begin{proof}
  In other words, every execution of the target-language system can be mimicked by one of the source-language system, where the states are connected throughout by some relation that we choose.
  A good choice is the relation $\sim$ defined by this inference rule.
  $$\infer{(h, v \uplus v', s) \sim (h, c)}{
    \dscomp{v}{c}{s}
  }$$

  Note that the only departure from the theorem statement itself is the allowance for compiled program $s$ to run in a \emph{larger} valuation than we compiled it against.
  It is safe to provide extra variables $v'$ (merged into $v$ with disjoint union), so long as the program never reads them, which our translation judgment enforces.
  We need to allow for extra variables because loop bodies run multiple times, with later iterations technically being exposed to temporary-variable values set by earlier iterations.

  Actually, we need to modify our translation judgment so that it also applies to ``silly'' intermediate states of execution in the target language.
  For instance, we wind up with $\skipe$s that are quickly stepped away, yet those configurations must be related to source configurations by $\sim$.
  Here is one example of the extra rules that we need to add to make our induction hypothsis strong enough.
  $$\infer{\dscomp{v}{x \leftarrow \mt{Return} \; n; c(x)}{\mt{skip}; s}}{
    v(y) = n
    & \dscomp{v}{c(n)}{s}
  }$$

  The premises encode our expectation that an assignment of $n$ to $y$ ``just ran.''
\end{proof}

This result can be composed with soundness of any Hoare logic for the source language.
The associated Coq code defines one, essentially following our separation logic\index{separation logic} from last chapter.

\begin{theorem}
  If $\hoare{P}{c}{Q}$, $P(h)$, $\dscomp{v}{c}{s}$, and $\texttt{result} \notin \dom{v}$, then it is an invariant of the transition system starting in $(h, v, s)$ that execution never gets stuck.
\end{theorem}
\begin{proof}
  First, we switch to proving an invariant of the system $(h, c)$ using the simulation from Theorem \ref{dscompsim}.
  Next, we use the soundness theorem of the Hoare logic to weaken the invariant proved in that way into the one we want in the end.
\end{proof}

At this point, we can verify the high-level program conveniently while arriving at a low-level program automatically.
That low-level program is easy to print as a string of concrete C code, as the associated Coq code demonstrates.
We only trust the simple printing process, not the compiler that got us from a mixed embedding to a C-like syntax tree.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Deriving Programs from Specifications}\label{deriving}

We have generally focused so far on proving that programs meet specifications.
What if we could generate programs from their specifications, in ways that guarantee correctness?
Let's explore that direction, in the tradition of \emph{program derivation}\index{program derivation} via \emph{stepwise refinement}\index{stepwise refinement}.

\section{Sets as Computations}

The heart of stepwise refinement is to start with a specification and gradually transform it until it deserves to be called an implementation.
It will help to use a common program format for specifications, implementations, and intermediate states on the path from the former to the latter.
One convenient choice is \emph{sets of allowable answers}.
A specification is naturally considered as a relation $R$ between inputs and outputs, where the set-based version of the specification for inputs $x$ is $\mt{spec}(x) = \{y \mid x \; R \; y\}$.
An implementation is naturally considered as a function $f$ from an input to an output, which can be modeled with singleton sets as $\mt{impl}(x) = \{f(x)\}$.
Intermediate terms in our derivations may still be sets with multiple elements, but we aim to winnow down to single choices eventually.

Computations of this kind form a \emph{monad}\index{monad} with respect to two particular operators.
Monads are an abstraction of sequential computation, popular in functional programming.
They require definitions of ``return'' and ``bind'' operators, which we give here, writing $\mathcal P$ for the ``powerset''\index{powerset} operator that lifts types into sets.
\begin{eqnarray*}
  \mt{ret} &:& \forall \alpha. \; \alpha \to \mathcal P(\alpha) \\
  \mt{ret} &=& \lambda x. \; \{x\} \\
  \mt{bind} &:& \forall \alpha, \beta. \; \mathcal P(\alpha) \to (\alpha \to \mathcal P(\beta)) \to \mathcal P(\beta) \\
  \mt{bind} &=& \lambda c_1. \; \lambda c_2. \; \bigcup_{x \in c_1} c_2(x)
\end{eqnarray*}

We write $x \leftarrow c_1; c_2(x)$ as shorthand for $\mt{bind} \; c_1 \; c_2$.

A valid monad must also satisfy three algebraic laws.
We will state just one of those laws here, with respect to the superset relation $\supseteq$, which we read as ``refines into.''\index{refinement}
That is, the lefthand operand is a more specification-like computation, which we want to replace with the righthand operand, which should be more concrete.
In other words, any legal answer for the new computation is also legal for the old one.
However, we may decide that the new computation rules out some answers that were previously under consideration.
If we rule out all the possible answers, then we will be stuck, if we ever want to refine to a singleton set!

With our notion of refinement in place, we can state three key properties, the first of which is one of the monad laws.

\begin{theorem}\label{bindret}
  $\mt{bind} \; (\mt{ret} \; v) \; c \supseteq c(v)$.
\end{theorem}

\begin{theorem}\label{refine1}
  If $c_1 \supseteq c'_1$, then $\mt{bind} \; c_1 \; c_2 \supseteq \mt{bind} \; c'_1 \; c_2$.
\end{theorem}

\begin{theorem}\label{refine2}
  If $\forall x. \; c_2(x) \supseteq c'_2(x)$, then $\mt{bind} \; c_1 \; c_2 \supseteq \mt{bind} \; c_1 \; c'_2$.
\end{theorem}

Together with the well-known reflexivity and transitivity of $\supseteq$, these laws set us up for convenient \emph{equational reasoning}\index{equational reasoning}.
That is, we start from a specification and repeatedly \emph{rewrite}\index{rewriting} in it using $\supseteq$ facts, until we arrive at an acceptable implementation (singleton set whose element reads as an efficient computation).
Rewriting requires us to descend inside the structure of a term to find a match for the lefthand side of a $\supseteq$ fact.
When we descend into the first argument or second argument of a $\mt{bind}$, we appeal to Theorem \ref{refine1} or \ref{refine2}, respectively.
We also use transitivity of $\supseteq$ to chain together multiple rewritings.
Finally, we use Theorem \ref{bindret} whenever we have reduced a prefix of a computation into deterministic code.

The associated Coq code contains an example of this kind of refinement in action.
There are enough details that mechanized assistance is especially worthwhile.

\section{Refinement for Abstract Data Types}

Abstract data types (ADTs)\index{abstract data type} are an important program-encapsulation feature that we studied in Chapter \ref{adt}.
Recall that they package private state together with public methods that can manipulate it, somewhat in the style of object-oriented programming\index{object-oriented programming}.
Let us now study how to start from an ADT specification and refine it gradually into an efficient implementation, in a way that leaves a ``proof trail'' justifying correctness.

For simplicity, we will force all methods to take $\mathbb N$ as input and return $\mathbb N$ as output, in addition to the implicit threading-through of an object's private state.
The whole theory generalizes to methods of varying type.

\begin{definition}
  An \emph{abstract data type (ADT)}\index{abstract data type} over a set $M$ of methods is a triple $\angled{\mathcal S, \mathcal C, \mathcal M_{m \in M}}$, where $\mathcal S$ is the set of private states, $\mathcal C : \mathcal P(\mathcal S)$ is a \emph{constructor}\index{constructor} that initializes the state, and each $\mathcal M_m : \mathcal S \times \mathbb N \to \mathcal P(\mathcal S \times \mathbb N)$ is a method.
\end{definition}

Note that constructor and method bodies live in the computation monad, allowing them to be nondeterminstic and to mix program-style and specification-style code.

\begin{definition}
  Consider two ADTs $\mathcal T^1 = \angled{\mathcal S^1, \mathcal C^1, \mathcal M^1_{m \in M}}$ and $\mathcal T^2 = \angled{\mathcal S^2, \mathcal C^2, \mathcal M^2_{m \in M}}$ over the same methods $M$.
  We say that $\mathcal T^2$ refines $\mathcal T^1$ (written, with overloaded notation, as $\mathcal T^1 \supseteq \mathcal T^2$) when there exists binary relation $R$ on $\mathcal S^1$ and $\mathcal S^2$ such that:
  \begin{enumerate}
  \item $\forall s_2 \in \mathcal C^2. \; \exists s_1 \in \mathcal C^1. \; s_1 \; R \; s_2$
  \item \begin{tabular}{l}
    $\forall m, s_1, s_2. \; s_1 \; R \; s_2 \Rightarrow \forall x, y, s'_2. \; (s'_2, y) \in \mathcal M^2_m(s_2, x)$ \\
    $\hspace{.1in} \Rightarrow \exists s'_1. \; (s'_1, y) \in \mathcal M^1_m(s_1, x) \land s'_1 \; R \; s'_2$
  \end{tabular}
  \end{enumerate}
\end{definition}

In fact, the relation $R$ here is a \emph{simulation}\index{simulation}, in the sense of Chapter \ref{compiler_correctness}!
Intuitively, any sequence of method calls on $\mathcal T^2$ can be \emph{simulated} with the same sequence of method calls on $\mathcal T^1$ yielding the same answers.
The private states in the two worlds needn't be precisely equal, but at each step they must remain related by $R$.

A number of very handy refinement principles apply.

\begin{theorem}[Reflexivity]\label{adtrefl}
  $\mathcal T \supseteq \mathcal T$.
\end{theorem}
\begin{proof}
  Justified by choosing the simulation relation to be equality.
\end{proof}

\begin{theorem}[Transitivity]
  If $\mathcal T_1 \supseteq \mathcal T_2$ and $\mathcal T_2 \supseteq \mathcal T_3$, then $\mathcal T_1 \supseteq \mathcal T_3$.
\end{theorem}
\begin{proof}
  Justified by choosing the simulation relation for the conclusion to be the composition of the relations for the premises.
\end{proof}

\begin{theorem}[Focusing on a constructor]
  If $\mathcal C^1 \supseteq \mathcal C^2$, then $\angled{\mathcal S, \mathcal C^1, \mathcal M_{m \in M}} \supseteq \angled{\mathcal S, \mathcal C^2, \mathcal M_{m \in M}}$.
\end{theorem}
\begin{proof}
  Justified by choosing the simulation relation to be equality.
\end{proof}

\begin{theorem}[Focusing on a method]\label{refinemethod}
  Let $m$ be one of the methods for $\mathcal T$, and let the body of that method be $c$.
  Let $\mathcal T'$ be the result of replacing $m$'s body in $\mathcal T$ with a new function $c'$.
  If $\forall s, x. \; c(s, x) \supseteq c'(s, x)$, then $\mathcal T \supseteq \mathcal T'$.
\end{theorem}
\begin{proof}
  Justified by choosing the simulation relation to be equality.
\end{proof}

The next simulation principle is one of the most powerful.

\begin{theorem}[Change of representation]\label{repchange}
  Let $\mathcal T = \angled{\mathcal S, \mathcal C, \mathcal M_{m \in M}}$ be an ADT, and pick $A : \mathcal S' \to \mathcal S$ (for some new state set $\mathcal S'$) as an \emph{abstraction function}\index{abstraction function}.
  Now define $\mathcal T' = \angled{\mathcal S', \mathcal C', \mathcal M'_{m \in M}}$, where:
  \begin{enumerate}
  \item $\mathcal C' = s \leftarrow \mathcal C; \{s' \mid A(s') = s\}$
  \item $\mathcal M'_m = \lambda s'_0, x. \; \mathcal (s, y) \leftarrow M_m(A(s'_0), x); s' \leftarrow \{s' \mid A(s') = s\}; \mt{ret} \; (s', y)$
  \end{enumerate}
  Then $\mathcal T \supseteq \mathcal T'$.
\end{theorem}
\begin{proof}
  Justified by choosing the simulation relation $\{(A(s), s) \mid s \in \mathcal S'\}$.
\end{proof}

The intuition of representation change is that we choose $\mathcal S'$ as some clever new data structure.
We are responsible for a formal characterization of how it relates to the original, more obvious data structure.
Abstraction function $A$ shows how to ``undo our cleverness,'' computing the old version of a state.
It would generally be inefficient to run this conversion on every method call.
Luckily, the new method bodies generated by this rule can be subjected to further optimization!
For instance, we can use Theorem \ref{refinemethod} to rewrite method bodies further.
We will especially want to do so to replace subcomputations of the form $\{s' \mid A(s') = s\}$, which stand for calling $A^{-1}$ on particular values.
Of course not every function has an inverse as a total relation, let alone a total function, so there is no purely mechanical way to rewrite inverse function calls into executable code.

See the associated Coq code for some examples of these rules in action for concrete program derivations.
It turns out that Theorems \ref{adtrefl} through \ref{repchange} are \emph{complete}: any correct refinement fact on ADTs can be proved using them alone.

\section{Another Example Refinement Principle: Adding a Cache}

Still, it can be helpful to formulate additional ADT-refinement principles, capturing common optimization strategies.
As an example, we formalize the idea of \emph{adding a cache to a data structure}\index{caching}, which is also known as \emph{finite differencing}\index{finite differencing} in the literature.

\begin{theorem}[Adding a cache]
  Let $\mathcal T = \angled{\mathcal S, \mathcal C, \mathcal M_{m \in M}}$ be an ADT, and pick a method $m$ such that $\mathcal M_m = \lambda s, x. \; \mt{ret} \; (s, f(s))$ for some pure function $f : \mathcal S \to \mathbb N$.
  Now define $\mathcal T' = \angled{\mathcal S \times \mathbb N, \mathcal C', \mathcal M'_{m \in M}}$, where:
  \begin{enumerate}
  \item $\mathcal C' = s \leftarrow \mathcal C; \mt{ret} \; (s, f(s))$
  \item $\mathcal M'_m = \lambda (s, c), x. \; \mt{ret} \; ((s, c), c)$
  \item For $m' \neq m$, $\mathcal M'_{m'} = \lambda (s, c), x. \; \mathcal (s', y) \leftarrow M_m(s, x); c' \leftarrow \{c' \mid f(s) = c \Rightarrow f(s') = c'\}; \mt{ret} \; ((s', c'), y)$
  \end{enumerate}
  Then $\mathcal T \supseteq \mathcal T'$.
\end{theorem}
\begin{proof}
  Justified by choosing the simulation relation $\{(s, (s, f(s))) \mid s \in \mathcal S\}$.
\end{proof}

Intuitively, method $m$ is a pure \emph{observer}, not changing the state, only returning some pure function $f$ of it.
We change the state set from $\mathcal S$ to $\mathcal S \times \mathbb N$, so that the second component of a state \emph{caches} the output of $f$ on the first component.
Like in a change of representation, method bodies are all rewritten automatically, but pick-from-set operations are inserted, and we must refine them away to arrive at a final implementation.

Here the crucial such pattern is $\{c' \mid f(s) = c \Rightarrow f(s') = c'\}$.
Intuitively, we are asked to choose a cache value $c'$ that is correct for the new state $s'$, while we are allowed to \emph{assume} that the prior cache value $c$ was accurate for the old state $s$.
Therefore, it is natural to give an efficient formula for computing $c'$ in terms of $c$.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Introduction to Reasoning About Shared-Memory Concurrency}\label{sharedmem}

Separation logic~\index{separation logic} tames sharing of a mutable memory across libraries and data structures.
We will need some additional techniques when we add concurrency to the mix, resulting in the \emph{shared-memory}\index{shared-memory concurrency} style of concurrency.
This chapter introduces a basic style of operational semantics for shared memory, also studying its use in model checking, including with an important optimization called partial-order reduction.
The next chapter shows how to prove deeper properties of fancier programs, by extending the Hoare-logic approach to shared-memory concurrency.
Then the chapter after that shows how to formalize and reason about a different style of concurrency, message passing.

\section{An Object Language with Shared-Memory Concurrency}

For the next two chapters, we work with this object language.
$$\begin{array}{rrcl}
  \textrm{Commands} & c &::=& \mt{Fail} \mid \mt{Return} \; v \mid x \leftarrow c; c \mid \mt{Read} \; a \mid \mt{Write} \; a \; v \mid \mt{Lock} \; a \mid \mt{Unlock} \; a \mid c || c
\end{array}$$

In addition to the basic structure of the languages from the last two chapters, we have three features specific to concurrency.
We follow the common ``threads and locks''\index{locks} style of synchronization, with commands $\mt{Lock} \; a$ and $\mt{Unlock} \; a$ for acquiring and releasing locks, respectively.
We also have $c_1 || c_2$ for running commands $c_1$ and $c_2$ in parallel, giving a scheduler free reign to interleave their atomic steps.

The operational semantics is small-step\index{small-step operational semantics}, especially because big-step semantics\index{big-step operational semantics} is notoriously awkward for concurrency.
Each state of the system is a triple $(h, l, c)$, with $h$ and $c$ the heap and current command from our usual semantics.
New component $l$ is a \emph{lockset}\index{lockset}, recording which locks are currently held, without distinguishing between different threads that might have taken them.

$$\infer{\smallstep{(h, l, x \leftarrow c_1; c_2(x))}{(h', l', x \leftarrow c'_1; c_2(x))}}{
  \smallstep{(h, l, c_1)}{(h', l', c'_1)}
}
\quad \infer{\smallstep{(h, l, x \leftarrow \mt{Return} \; v; c_2(x))}{(h, l, c_2(v))}}{}$$

$$\infer{\smallstep{(h, l, \mt{Read} \; a)}{(h, l, \mt{Return} \; \msel{h}{a})}}{}
\quad \infer{\smallstep{(h, l, \mt{Write} \; a \; v)}{(\mupd{h}{a}{v}, l, \mt{Return} \; 0)}}{}$$

$$\infer{\smallstep{(h, l, \mt{Lock} \; a)}{(h, l \cup \{a\}, \mt{Return} \; 0)}}{
  a \notin l
}
\quad \infer{\smallstep{(h, l, \mt{Unlock} \; a)}{(h, l \setminus \{a\}, \mt{Return} \; 0)}}{
  a \in l
}$$

$$\infer{\smallstep{(h, l, c_1 || c_2)}{(h', l', c'_1 || c_2)}}{
  \smallstep{(h, l, c_1)}{(h', l', c'_1)}
}
\quad \infer{\smallstep{(h, l, c_1 || c_2)}{(h', l', c_1 || c'_2)}}{
  \smallstep{(h, l, c_2)}{(h', l', c'_2)}
}$$

Note that the last two rules are the only source of \emph{nondeterminism}\index{nondeterminism} in this semantics, where a single state can step to multiple different next states.
This nondeterminism corresponds to the freedom we give to a scheduler\index{scheduler} that may pick which thread runs next.
Though this kind of concurrent programming is very expressive and often achieves very high performance, it comes at a cost in reasoning, as there may be \emph{exponentially many different schedules} for a single program, measured with respect to the textual length of the program.
A popular name for this pitfall is \emph{the state-explosion problem}\index{state-explosion problem}.

Note also that we have omitted any looping constructs from this object language, so all programs terminate.
The Coq formalization uses the mixed-embedding\index{mixed embedding} style, making it not entirely obvious that all programs really do terminate.
In any case, if we must tame the state-explosion problem, we already have our work cut out for us, even when the state space rooted at any concrete state is finite!


\section{Shrinking the State Space via Local Actions}

\newcommand{\natf}[1]{\mt{natf}(#1)}

Recall our study of \emph{model checking}\index{model checking} in Chapter \ref{model_checking}.
With a little cleverness, many problems in program verification can be reduced to exploration of finite state spaces of transition systems.
In particular, we looked at \emph{safety properties}, which can be expressed as invariants of transition systems.
One simply follows all the edges in the graph determined by a transition system, accepting the program if that process terminates without finding a state that violates the invariant.
For our object language in this chapter, a good safety property is that commands are \emph{not about to fail}, formalized as:
\begin{eqnarray*}
  \natf{\mt{Fail}} &=& \bot \\
  \natf{x \leftarrow c_1; c_2(x)} &=& \natf{c_1} \\
  \natf{c_1 || c_2} &=& \natf{c_1} \land \natf{c_2} \\
  \natf{\_} &=& \top
\end{eqnarray*}

Here is an example of a program execution that avoids failures.
\begin{eqnarray*}
  (\mupd{\mempty}{0}{1}, \emptyset, n \leftarrow \mt{Read} \; 0; \mt{Write} \; 0 \; (n+1))
  &\rightarrow& (\mupd{\mempty}{0}{1}, \emptyset, n \leftarrow \mt{Return} \; 1; \mt{Write} \; 0 \; (n+1)) \\
  &\rightarrow& (\mupd{\mempty}{0}{1}, \emptyset, \mt{Write} \; 0 \; (1+1)) \\
  &\rightarrow& (\mupd{\mempty}{0}{2}, \emptyset, \mt{Return} \; 0)
\end{eqnarray*}

\newcommand{\rl}[1]{{\left \lfloor #1 \right \rfloor}}

When exploring the state space of this program, a n\"aive model checker will generate each of these states explicitly, even the ``silly'' second one that reduces to the third without reading or writing the shared state.
We can short-circuit those extra states by writing a simple function that makes all appropriate purely local reductions, everywhere within a command.
\begin{eqnarray*}
  \rl{x \leftarrow c_1; c_2(x)} &=& \rl{c_2(v)}\textrm{, when $\rl{c_1} = \mt{Return} \; v$} \\
  \rl{x \leftarrow c_1; c_2(x)} &=& x \leftarrow \rl{c_1}; \rl{c_2(x)}\textrm{, when $\rl{c_1}$ is not $\mt{Return}$} \\
  \rl{c_1 || c_2} &=& \rl{c_1} || \rl{c_2} \\
  \rl{c} &=& c
\end{eqnarray*}

\newcommand{\smallstepL}[2]{#1 \to_L #2}

Using this relation, we can define an alternative step relation that short-circuits local steps.
$$\infer{\smallstepL{(h, l, c)}{(h', l', \rl{c'})}}{
  \smallstep{(h, l, c)}{(h', l', c')}
}$$

The base semantics can be used to define transition systems in the usual way, with $\mathbb T(h, l, c) = \angled{\{(h, l, c)\}, \to}$.
We can also define short-circuiting transition systems with $\mathbb T_L(h, l, c) = \angled{\{(h, l, \rl{c})\}, \to_L}$.
A theorem shows that the latter overapproximates the former.

\abstraction
\begin{theorem}\label{local}
  If $\mt{natf}$ is an invariant of $\mathbb T_L(h, l, c)$, then it is also an invariant of $\mathbb T(h, l, c)$.
\end{theorem}
\begin{proof}
  By induction on a trace $\smallsteps{(h, l, c)}{(h', l', c')}$, matching each original step with zero or one alternative steps.
  We appeal to a number of lemmas, some of which are summarized below.
\end{proof}

\begin{lemma}\label{rl_idem}
  For all $c$, $\rl{\rl{c}} = \rl{c}$.
\end{lemma}
\begin{proof}
  By induction on the structure of $c$.
\end{proof}

\begin{lemma}
  If $\smallstep{(h, l, c)}{(h', l', c')}$, then either $(h', l') = (h, l)$ and $\rl{c'} = \rl{c}$ (the step was local), or there exists $c''$ where $\smallstep{(h, l, \rl{c})}{(h', l', c'')}$ and $\rl{c''} = \rl{c'}$ (the step was not local).
\end{lemma}
\begin{proof}
  By induction on the derivation of $\smallstep{(h, l, c)}{(h', l', c')}$, appealing in places to to Lemma \ref{rl_idem}.
\end{proof}

\begin{lemma}
  If $\natf{\rl{c}}$, then $\natf{c}$.
\end{lemma}
\begin{proof}
  By induction on the structure of $c$.
\end{proof}


\section{Basic Partial-Order Reduction}

What made the reduction in Theorem \ref{local} sound?
It was that local actions \emph{commute}\index{commute}\index{commutativity} with all actions in other threads.
A particular run of a system in the base semantics might indeed choose to run a nonlocal action before a local action that is enabled.
However, we can \emph{reorder} any such action to instead come after every enabled local action, without affecting the final state.
This reordering is an example of commutativity in action.

By recognizing and exploiting other varieties of commutativity, we can shrink state spaces even further, even reducing the spaces of certain interesting program families from exponential size to linear size.
A popular technique of this kind is \emph{partial-order reduction}\index{partial-order reduction}.
We formalize a simple variant of it in this section (and in the accompanying Coq code), then sketch a less formal generalization in the chapter's final section.

\newcommand{\summ}[2]{\mt{summarize}(#1, #2)}

To check commutativity more flexibly, we must use more than just the fact that a local action commutes with any action in another thread.
For instance, we should take advantage of the fact that any two $\mt{Read}$ actions commute.
We will do some \emph{static analysis}\index{static analysis} of programs to overapproximate which kinds of atomic actions they might perform.
Such an analysis is designed to be trivially computable.
Here's an example of one analysis, formulated as a relation $\summ{c}{(r, w, \ell)}$, which asserts that the only globally visible actions that could be performed by thread $c$ are reads to addresses in $r$, writes to addresses in $w$, and acquires or releases of locks in $\ell$.

$$\infer{\summ{\mt{Return} \; r}{s}}{}
\quad \infer{\summ{\mt{Fail}}{s}}{}
\quad \infer{\summ{x \leftarrow c_1; c_2(x)}{s}}{
    \summ{c_1}{s}
    & \forall r. \; \summ{c_2(r)}{s}
}$$

$$\infer{\summ{\mt{Read} \; a}{(r, w, \ell)}}{
  a \in r
}
\quad \infer{\summ{\mt{Write} \; a \; v}{(r, w, \ell)}}{
  a \in w
}$$

$$\infer{\summ{\mt{Lock} \; a}{(r, w, \ell)}}{
  a \in \ell
}
\quad \infer{\summ{\mt{Unlock} \; a}{(r, w, \ell)}}{
  a \in \ell
}$$

$$\infer{\summ{c_1 || c_2}{s}}{
  \summ{c_1}{s}
  & \summ{c_2}{s}
}$$

\newcommand{\na}[1]{\mt{nextAction}(#1)}

Those relations do all we need to do to record which actions a thread might not commute with.
The other key ingredient is an extractor for the next atomic action in a thread, written as a partial function.
\begin{eqnarray*}
  \na{\mt{Return} \; r} &=& \mt{Return} \; r \\
  \na{\mt{Fail}} &=& \mt{Fail} \\
  \na{\mt{Read} \; a} &=& \mt{Read} \; a \\
  \na{\mt{Write} \; a \; v} &=& \mt{Write} \; a \; v \\
  \na{\mt{Lock} \; a} &=& \mt{Lock} \; a \\
  \na{\mt{Unlock} \; a} &=& \mt{Unlock} \; a \\
  \na{x \leftarrow c_1; c_2(x)} &=& \na{c_1}
\end{eqnarray*}

Given a next atomic action and a summary of another thread, it is now easy to define commutativity of the two.
\newcommand{\commu}[2]{\mt{commutes}(#1, #2)}
\begin{eqnarray*}
  \commu{\mt{Return} \; \_}{\_} &=& \top \\
  \commu{\mt{Fail}}{\_} &=& \top \\
  \commu{\mt{Read} \; a}{(\_, w, \_)} &=& a \notin w \\
  \commu{\mt{Write} \; a \; \_}{(r, w, \_)} &=& a \notin r \cup w \\
  \commu{\mt{Lock} \; a}{(\_, \_, \ell)} &=& a \notin \ell \\
  \commu{\mt{Unlock} \; a}{(\_, \_, \ell)} &=& a \notin \ell \\
  \commu{\_}{\_} &=& \bot
\end{eqnarray*}

\newcommand{\pors}[1]{\mt{porSafe}(#1)}

With these ingredients, we can define a predicate $\mt{porSafe}$ that figures out when a state is eligible for the partial-order reduction optimization, which is to force the first thread to run next, ignoring the other threads for now.
In working out the formal details, we will confine ourselves to commands $c_1 || c_2$ with distinguished ``first threads'' $c_1$, though everything can be generalized to other settings (and doing that generalization could be a worthwhile exercise for the reader, though it requires a lot of logical bookkeeping).
This optimization is only safe when the first thread can take a step and when that step commutes with any action that other threads (combined into $c_2$) might perform.
Formally, we define $\pors{h, l, c_1, c_2, s}$ as follows, where $s$ should be a valid summary of $c_2$.
\begin{itemize}
\item There is some $c_0$ where $\na{c_1} = c_0$.  That is, thread $c_1$ has some uniquely determined atomic action lined up to run next.
\item There exist $h'$, $l'$, and $c'_1$ such that $\smallstep{(h, l, c_1)}{(h', l', c'_1)}$.  That is, thread $c_1$ is actually able to take a step, which might not be possible if e.g. trying to take a lock that is already held.
\item And the crucial compatibility condition: $\commu{c_0}{s}$.  That is, all actions that other threads might perform commute with $c_0$, the first action of $c_1$.
\end{itemize}

\newcommand{\smallstepC}[3]{#1 \to_C^{#2} #3}

With the applicability condition defined, it is now straightforward to define an optimized step relation, parameterized on an accurate summary $s$ for $c_2$.

$$\infer{\smallstepC{(h, l, c_1 || c_2)}{s}{(h', l', c'_1 || c_2)}}{
  \smallstep{(h, l, c_1)}{(h', l', c'_1)}
}$$

$$\infer{\smallstepC{(h, l, c_1 || c_2)}{s}{(h', l', c_1 || c'_2)}}{
  \neg \pors{h, l, c_1, c_2, s}
  & \smallstep{(h, l, c_2)}{(h', l', c'_2)}
}$$

The whole thing is wrapped up into transition systems as $\mathbb T_C(h, l, c_1, c_2, s) = \angled{\{(h, l, c_1 || c_2)\}, \to_C^s}$.

\newcommand{\tof}[2]{\mt{timeOf}(#1, #2)}

Our proof of soundness for this reduction will depend on having some constant upper bound on program execution time.
This relation computes a conservative overapproximation.
$$\infer{\tof{\mt{Return} \; r}{n}}{}
\quad \infer{\tof{\mt{Fail}}{n}}{}
\quad \infer{\tof{\mt{Read} \; a}{n+1}}{}
\quad \infer{\tof{\mt{Write} \; a \; v}{n+1}}{}$$

$$\infer{\tof{\mt{Lock} \; a}{n+1}}{}
\quad \infer{\tof{\mt{Unlock} \; a}{n+1}}{}$$

$$\infer{\tof{x \leftarrow c_1; c_2(x)}{n_1 + n_2 + 1}}{
  \tof{c_1}{n_1}
  & \forall r. \; \tof{c_2(r)}{n_2}
}
\quad \infer{\tof{c_1 || c_2}{n_1 + n_2 + 1}}{
  \tof{c_1}{n_1}
  & \tof{c_2}{n_2}
}$$

It may be surprising that, in our formal mixed embedding, there exist commands with no provable upper bounds, according to this relation.
We leave it as an exercise to the reader to find a concrete example.
(Actually, the Coq code includes an example and its proof of unboundedness.)

One last ingredient is to work with a relation $\to^i$, which is the $i$-way self-composition of $\to$.
It is easy to show that whenever $x \to^* y$, there exists $i$ such that $x \to^i y$.

\newcommand{\smallstepsC}[3]{#1 \to_C^{#2*} #3}

With these ingredients, we can state the reduction theorem.
\abstraction
\begin{theorem}
  If $\summ{c_2}{s}$ and $\tof{c_1 || c_2}{n}$, then to prove $\mt{natf}$ as an invariant of $\mathbb T(h, l, c_1 || c_2)$, it suffices to prove $\mt{natf}$ as an invariant of $\mathbb T_C(h, l, c_1, c_2, s)$.
\end{theorem}
\begin{proof}
  Setting $c = c_1 || c_2$, we assume for the sake of contradiction that there exists some derivation $\smallsteps{(h, l, c)}{(h', l', c')}$, where $\neg \natf{h', l', c'}$.
  First, since $c$ runs in bounded time, by Lemma \ref{completion}, we can \emph{complete} that execution to continue running to some $(h'', l'', c'')$, which is a stuck state.
  By Lemma \ref{stillFailing}, $\neg \natf{h'', l'', c''}$.
  Next, we conclude that there exists $i$ such that $(h, l, c) \to^i (h'', l'', c'')$.
  By Lemma \ref{translate_trace}, there exist $h'''$, $l'''$, and $c'''$ where $\smallstepsC{(h, l, c_1 || c_2)}{s}{(h''', l''', c''')}$ and $\neg \natf{c'''}$.
  These facts contradict our assumption that $\mt{natf}$ is an invariant of $\mathbb T_C(h, l, c_1, c_2, s)$.
\end{proof}

\begin{lemma}\label{completion}
  If $\tof{c}{n}$, then there exist $h'$, $l'$, and $c'$ where $\smallsteps{(h, l, c)}{(h', l', c')}$, such that $(h', l', c')$ is a stuck state.
\end{lemma}
\begin{proof}
  By strong induction\index{strong induction} on $n$.
\end{proof}

\begin{lemma}\label{stillFailing}
  If $\smallsteps{(h, l, c)}{(h', l', c')}$ and $\neg \natf{c}$, then $\neg \natf{c'}$.
\end{lemma}
\begin{proof}
  By induction on the derivation of $\smallsteps{(h, l, c)}{(h', l', c')}$, with a nested induction on the derivations of individual steps.
\end{proof}

\begin{lemma}\label{translate_trace}
  If $(h, l, c_1 || c_2) \to^i (h', l', c')$, and $(h', l', c')$ is stuck and about to fail, and $\summ{c_2}{s}$, then there exist $h''$, $l''$, and $c'''$ such that $\smallstepsC{(h, l, c_1 || c_2)}{s}{(h'', l'', c''')}$ and $\neg \natf{c'''}$.
\end{lemma}
\begin{proof}
  By induction on $i$.
  Note that induction on the structure of a derivation $\smallsteps{(h, l, c_1 || c_2)}{(h', l', c')}$ would \emph{not} be sufficient here, as we will see in the proof sketch below that we sometimes invoke the induction hypothesis on an execution trace that is not just the tail of the one we started with.

  If $i = 0$, then $(h, l, c_1 || c_2)$ is already about to fail, and the conclusion follows trivially.

  Otherwise, $i = i' + 1$ for some $i'$.
  We proceed by cases on the truth of $\pors{h, l, c_1, c_2, s}$.

  If $\neg \pors{h, l, c_1, c_2, s}$, then we invert the derivation $(h, l, c_1 || c_2) \to^i (h', l', c')$ to conclude $\smallstep{(h, l, c_1 || c_2)}{(h'', l'', c'')}$ and $(h'', l'', c'') \to^{i'} (h', l', c')$ for some intermediate state.
  $\to_C$ is easily able to match that first step, as the optimization is disabled, and the rest follows directly by appeal to the induction hypothesis.

  Otherwise, $\pors{h, l, c_1, c_2, s}$, and the key optimization is enabled, so that $\to_C$ only allows the first thread to run.
  The next deduction is not immediate, because the first original step $\smallstep{(h, l, c_1 || c_2)}{(h'', l'', c'')}$ may have chosen a thread beside $c_1$.
  However, the trace $(h, l, c_1 || c_2) \to^{i'+1} (h', l', c')$ \emph{must} eventually pick the first thread to run, and we apply Lemma \ref{translate_trace_commute} to \emph{commute} that eventual step to the front of the derivation, showing its equivalence to one that runs the first thread and then takes $i'$ additional steps to $(h', l', c')$.
  At this point the induction hypothesis applies to those $i'$ steps, to finish the proof.
\end{proof}

\begin{lemma}\label{translate_trace_commute}
  If $(h, l, c_1 || c_2) \to^{i+1} (h', l', c')$, where that last state is stuck, and if $\summ{c_2}{s}$, $\na{c_1} = x$, $\commu{x}{s}$, and $\smallstep{(h, l, c_1)}{(h_0, l_0, c'_1)}$, then $(h_0, l_0, c'_1 || c_2) \to^i (h', l', c')$.
\end{lemma}
\begin{proof}
  By induction on the derivation of $(h, l, c_1 || c_2) \to^{i+1} (h', l', c')$, appealing to a few crucial lemmas, such as single-step determinism of any state in $\mt{nextAction}$'s domain, plus the soundness of $\mt{commutes}$ with respect to single steps of pairs of commands, plus the fact that single steps preserve the accuracy of summaries.
\end{proof}

\section{Partial-Order Reduction More Generally}

The key insights of the prior section can be adapted to prove soundness of a whole family of optimizations by partial-order reduction.
In general, we apply the optimization to remove edges from a state-space graph, whose nodes are states and whose edges are labeled with \emph{actions} $\alpha$.
In our setting, $\alpha$ is the identifier of the thread scheduled to run next.
To do model checking, the graph need not be materialized in memory in one go.
Instead, as an optimization, the graph tends to be constructed on-the-fly, during state-space exploration.

The proofs from the last two sections only apply to check the invariant that no thread is about to fail.
However, the results easily generalize to arbitrary \emph{safety properties}\index{safety properties}, which can be expressed as decidable invariants on states.
Another important class of specifications is \emph{liveness properties}\index{liveness properties}, the most canonical example of which is \emph{termination}, phrased in terms of reachability of some subsets of states designated as \emph{finished}.
There are many other useful liveness properties.
Another example applies to a producer-consumer system\index{producer-consumer systems}, where one thread continually enqueues new work into a queue, and another thread continually dequeues work items and does something with them.
A good liveness property for that system could be that, whenever the producer enqueues an item, the consumer eventually dequeues it, and from there the consumer eventually takes some visible action based on the value of the item.
Our general treatment of partial-order reduction is parameterized on some property $\phi$ over states, and it may be safety, liveness, or a combination of the two.

Every state $s$ of the transition system has an associated set $\mathcal E(s)$ of identifiers for threads that are enabled to run in $s$.
The partial-order reduction optimization conceptually is based on picking a function $\mathcal A$, mapping each state $s$ to an \emph{ample set}\index{ample sets} $\mathcal A(s)$ of threads to consider in state-space exploration.
A few eligibility criteria apply, for every state $s$.

\begin{description}
  \item[Readiness] \index{readiness}$\mathcal A(s) \subseteq \mathcal E(s)$.  That is, we do not select any threads that are not actually ready to run.
  \item[Progress] \index{progress (partial-order reduction)}If $\mathcal A(s) = \emptyset$, then $\mathcal E(s) = \emptyset$.  That is, so long as any thread at all can step, we select at least one thread.
  \item[Commutativity] \index{commutativity (partial-order reduction)}Consider all executions starting at $s$ and taking steps only with the threads \emph{not} in $\mathcal A(s)$.  These executions only include actions that commute with the next actions of the threads in $\mathcal A(s)$.  As a consequence, any actions that run before elements of the ample set can be reordered to follow the execution of any ample-set element.
  \item[Invisibility] \index{invisibility}If $\mathcal A(s) \neq \mathcal E(s)$, then no action in $\mathcal A(s)$ modifies the truth of $\phi$.
\end{description}

Any ample-set algorithm leads to a different variant of $\to_C$ from the prior section, and it is possible to prove that any such transition system is a sound abstraction of the original.

As an example of a different heuristic, consider a weakness of the one from the prior section: when we pick a thread $c$ as the only one to consider running next, $c$'s first action must commute with \emph{any action that any other thread might ever run, for the entire rest of the execution}.
However, imagine that thread $c$ holds some lock $a$.
We might formalize that notion by saying that (1) $a$ is in the lockset, and (2) the other threads, running independently, will never manage to run an $\mt{Unlock} \; a$ command.
A computable static analysis can verify this statement for many nontrivial programs.
Now consider which summaries of the other threads we can get away with comparing against $c$ for commutativity.
We only need to collect the actions that other threads can run \emph{before each one reaches its first occurrence of $\mt{Lock} \; a$}.
The reason is that, if $c$ holds lock $a$ and hasn't run yet, no other thread can progress past its first $\mt{Lock} \; a$.
Now threads may share addresses for read and write access, yet still take advantage of the optimization, when accesses are properly protected by locks.

The conditions above are only sufficient because we left unbounded loops out of our object language.
What happens if we add them back in?
Consider this program:
$$(\mt{while} \; (\mt{true}) \; \{ \; \mt{Write} \; 0 \; 0 \; \}) \; || \; (n \leftarrow \mt{Read} \; 1; \mt{Fail})$$
An optimization in the spirit of our original from the prior section would happily decree that it is safe always to pick the first thread to run.
This reduced state-transition system never gets around to running the second thread, so exploring the state space never finds the failure!
To plug this soundness hole, we add a final condition on the ample sets.

\begin{description}
  \item[Fairness] \index{fairness}If there is a cycle in the finite state space where $\alpha$ is enabled at some point, then $\alpha \in \mathcal A(s)$ for some $s$ in the cycle.
\end{description}

This condition effectively forces the ample set for the example program above to include the second thread.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Concurrent Separation Logic}

Chapters \ref{seplog} and \ref{sharedmem} respectively introduced techniques for reasoning about two tricky aspects of programs: heap-allocated linked data structures\index{linked data structures} and shared-memory concurrency\index{shared-memory concurrency}.
When we add concurrency to the mix for a program-reasoning problem, we are often surprised at how much more complex it becomes.
This chapter introduces a pleasant exception to the rule, \emph{concurrent separation logic}\index{concurrent separation logic}, a rather small addition to separation logic\index{separation logic} that supports invariant-based reasoning about threads-and-locks shared-memory programs.

\section{Object Language: Loops and Locks}

Here's the object language we adopt, which should be old hat by now, just mixing together features of the object languages from Chapters \ref{seplog} and \ref{sharedmem}.

$$\begin{array}{rrcl}
  \textrm{Commands} & c &::=& \mt{Fail} \mid \mt{Return} \; v \mid x \leftarrow c; c \mid \mt{Loop} \; i \; f \\
  &&& \mid \mt{Read} \; a \mid \mt{Write} \; a \; v \mid \mt{Alloc} \; n \mid \mt{Free} \; a \; n \mid \mt{Lock} \; a \mid \mt{Unlock} \; a \mid c || c
\end{array}$$

$$\infer{\smallstep{(h, l, x \leftarrow c_1; c_2(x))}{(h', l', x \leftarrow c'_1; c_2(x))}}{
  \smallstep{(h, l, c_1)}{(h', l', c'_1)}
}
\quad \infer{\smallstep{(h, l, x \leftarrow \mt{Return} \; v; c_2(x))}{(h, l, c_2(v))}}{}$$

$$\infer{\smallstep{(h, l, \mt{Loop} \; i \; f)}{(h, l, x \leftarrow f(i); \mt{match} \; x \; \mt{with} \; \mt{Done}(a) \Rightarrow \mt{Return} \; a \mid \mt{Again}(a) \Rightarrow \mt{Loop} \; a \; f)}}{}$$

$$\infer{\smallstep{(h, l, \mt{Read} \; a)}{(h, l, \mt{Return} \; v)}}{
  \msel{h}{a} = v
}
\quad \infer{\smallstep{(h, l, \mt{Write} \; a \; v')}{(\mupd{h}{a}{v'}, l, \mt{Return} \; ())}}{
  \msel{h}{a} = v
}$$

$$\infer{\smallstep{(h, l, \mt{Alloc} \; n)}{(\mupd{h}{a}{0^n}, l, \mt{Return} \; a)}}{
  \dom{h} \cap [a, a+n) = \emptyset
}
\quad \infer{\smallstep{(h, l, \mt{Free} \; a \; n)}{(h - [a, a+n), l, \mt{Return} \; ())}}{
}$$

$$\infer{\smallstep{(h, l, \mt{Lock} \; a)}{(h, l \cup \{a\}, \mt{Return} \; ())}}{
  a \notin l
}
\quad \infer{\smallstep{(h, l, \mt{Unlock} \; a)}{(h, l \setminus \{a\}, \mt{Return} \; ())}}{
  a \in l
}$$

$$\infer{\smallstep{(h, l, c_1 || c_2)}{(h', l', c'_1 || c_2)}}{
  \smallstep{(h, l, c_1)}{(h', l', c'_1)}
}
\quad \infer{\smallstep{(h, l, c_1 || c_2)}{(h', l', c_1 || c'_2)}}{
  \smallstep{(h, l, c_2)}{(h', l', c'_2)}
}$$


\section{The Program Logic}

We will build on basic separation logic, using the same kind of assertions and even adopting all of the original rules unchanged.
Here they are again, for easy reference.

$$\infer{\hoare{\emp}{\mt{Return} \; v}{\lambda r. \; \lift{r = v}}}{}
\quad \infer{\hoare{P}{x \leftarrow c_1; c_2(x)}{R}}{
  \hoare{P}{c_1}{Q}
  & (\forall r. \; \hoare{Q(r)}{c_2(r)}{R})
}$$

$$\infer{\hoare{I(\mt{Again}(i))}{\mt{Loop} \; i \; f}{\lambda r. \; I(\mt{Done}(r))}}{
  \forall a. \; \hoare{I(\mt{Again}(a))}{f(a)}{I}
}
\quad \infer{\hoare{\lift{\bot}}{\mt{Fail}}{\lambda \_. \; \lift{\bot}}}{}$$

$$\infer{\hoare{\exists v. \; \ptsto{a}{v} * R(v)}{\mt{Read} \; a}{\lambda r. \; \ptsto{a}{r} * R(r)}}{}
\quad \infer{\hoare{\exists v. \; \ptsto{a}{v}}{\mt{Write} \; a \; v'}{\lambda \_. \; \ptsto{a}{v'}}}{}$$

$$\infer{\hoare{\emp}{\mt{Alloc} \; n}{\lambda r. \; \ptsto{r}{0^n}}}{}
\quad \infer{\hoare{\ptsto{a}{\; ?^n}}{\mt{Free} \; a \; n}{\lambda \_. \; \emp}}{}$$

$$\infer{\hoare{P'}{c}{Q'}}{
  \hoare{P}{c}{Q}
  & P' \Rightarrow P
  & \forall r. \; Q(r) \Rightarrow Q'(r)
}
\quad \infer{\hoare{P * R}{c}{\lambda r. \; Q(r) * R}}{
  \hoare{P}{c}{Q}
}$$

\modularity
When two threads use disjoint regions of memory, it is trivial to apply this rule of Concurrent Separation Logic to verify the threads independently.
$$\infer{\hoare{P_1 * P_2}{c_1 || c_2}{\lambda \_. \; \lift{\bot}]}}{
  \hoare{P_1}{c_1}{Q_1}
  & \hoare{P_2}{c_2}{Q_2}
}$$
The separating conjunction $*$ turned out to be just the right way to express the idea of ``splitting the heap into a part for the first thread and a part for the second thread.''
Because $c_1$ and $c_2$ touch disjoint memory regions, all of their memory operations commute\index{commutativity}, so that we need not worry about the state-explosion problem, in all the ways that the scheduler might interleave their steps.
Note that, since for simplicity our running example family of concurrent object languages includes no way for parallel compositions to terminate, it makes sense to assign a contradictory overall postcondition.

However, with realistic shared-memory programs, we don't get off as easy as the parallel-composition rule suggests.
Threads \emph{do} share memory regions, using \emph{synchronization}\index{synchronization} to tame the state-explosion problem.
Our object language includes locks as its example of synchronization, and Concurrent Separation Logic is specialized to locks.
We may keep the simplistic-seeming rule for parallel composition and implicitly enrich its power by adding a twist, in the form of some other rules.

The big twist is that we parameterize everything over some finite set $L$ of locks that may be used.
\invariants
Furthermore, another parameter is a function $\mathcal I$ that maps locks to invariants, which have the same type as preconditions.
The idea is this: when no one holds a lock, \emph{the lock owns a chunk of memory that satisfies its invariant}.
When a thread holds the lock, the lock doesn't own any memory; it is waiting for the thread to unlock it and \emph{donate back} a chunk of memory satisfying the invariant.
We now think of the precondition of a Hoare triple as only describing the \emph{local memory} of a thread, which no other thread may access; while locks and their invariants coordinate the \emph{shared memory} regions of an application.
The proof rules will coordinate dynamic motion of memory regions between the shared regions and local regions.
This motion is only part of a proof technique; it has no runtime content reflected in the operational semantics!

With all of that set-up, the final two rules may seem surprisingly simple.
$$\infer{\hoare{\emp}{\mt{Lock} \; a}{\lambda \_. \; \mathcal I(a)}}{
  a \in L
}
\quad \infer{\hoare{\mathcal I(a)}{\mt{Unlock} \; a}{\lambda \_. \; \emp}}{
  a \in L
}$$

When a thread takes a lock, it appears as if \emph{a memory chunk satisfying that lock's invariant materializes in the local memory space}.
Conversely, when a thread releases a lock, it appears as if \emph{the lock grabs a memory chunk satisfying the invariant out of the local memory space}.
The rules are coordinating conceptual ownership transfers between local memory and the global lock memory.

The accompanying Coq code shows a few example verifications of interesting programs.

\section{Soundness Proof}

\newcommand{\guarded}[2]{#1 \longrightarrow #2}

We can adapt the separation-logic soundness proof to concurrency, with just a few new ideas.
First, we will appreciate some new connectives for writing assertions.
One simple one is a guarded predicate, defined like so, for pure proposition $\phi$ (the guard) and separation-logic assertion $P$.
\begin{eqnarray*}
  \guarded{\phi}{P} &=& \mt{if} \; \phi \; \mt{then} \; P \; \mt{else} \; \emp
\end{eqnarray*}

\renewcommand{\bigstar}[3]{\Asterisk_{#1 \in #2} #3}
\newcommand{\bigstarp}[3]{\Asterisk_{#1 \in #2} {\left ( #3 \right )}}

The other key addition will be the ``big star,'' \emph{iterated separating conjunction}, with quantification over finite sets, written like $\bigstar{x}{S}{P(x)}$.
The definition is:
\begin{eqnarray*}
  \bigstar{x}{\{v_1, \ldots, v_n\}}{P(x)} &=& P(v_1) * \ldots * P(v_n)
\end{eqnarray*}

The reader may be worried about the inherently unordered nature of sets.
For each ordering of a set, we get a syntactically distinct formula on the righthand side of the defining equation.
Luckily, separating conjunction $*$ is associative and commutative, so all orders lead to logically equivalent formulas.

With those preliminaries out of the way, we can state the soundness theorem, referring again to the \emph{not-about-to-fail} predicate $\mathsf{natf}$ from last chapter, extended appropriately to say that loops are not about to fail.

\invariants
\begin{theorem}[Soundness]
  If $\hoare{P}{c}{Q}$, and if a heap $h$ satisfies the predicate $(P * \bigstar{\ell}{L}{\mathcal I(\ell)})$, then $\mathsf{natf}$ is an invariant of the system starting at state $(h, \emptyset, c)$.
\end{theorem}

The theorem lays out restrictions on the starting heap.
It must have a segment to serve as the root thread's local heap, matching precondition $P$.
Then, for each lock $\ell \in L$, there must be an associated memory region satisfying $\mathcal I(\ell)$.
Our use of separating conjunction forces each of these regions to occupy disjoint memory from all the others.

Some key lemmas support the proof.
Here are the highlights.
The first is representative of a family of lemmas that we prove, one for each syntactic construct of the object language.

\begin{lemma}
  If $\hoare{P}{\mt{Read} \; a}{Q}$, then there exists $R$ such that $P \Rightarrow \exists v. \; \ptsto{a}{v} * R(v)$ and, for all $r$, $\ptsto{a}{r} * R(r) \Rightarrow Q(r)$.
\end{lemma}
\begin{proof}
  By induction on the derivation of $\hoare{P}{\mt{Read} \; a}{Q}$.
\end{proof}

As another example incorporating more of the complexities of concurrency, we have this lemma.

\begin{lemma}
  If $\hoare{P}{c_1 || c_2}{Q}$, then there exist $P_1$, $P_2$, $Q_1$, and $Q_2$ such that $\hoare{P_1}{c_1}{Q_1}$, $\hoare{P_2}{c_2}{Q_2}$, and $P \Rightarrow P_1 * P_2$.
\end{lemma}
\begin{proof}
  By induction on the derivation of $\hoare{P}{c_1 || c_2}{Q}$.
  One somewhat surprising case is when the frame rule begins the derivation.
  We have some predicate $R$ that is added to both the precondition and postcondition.
  In picking $P_1$, $P_2$, $Q_1$, and $Q_2$, we have a choice as to where we incorporate $R$.
  The two threads together leave $R$ alone, so clearly either thread individually does, too.
  Therefore, we arbitrarily incorporate $R$ in $P_1$ and $Q_1$.
\end{proof}

Two lemmas express crucial techniques to isolate elements within iterated conjunction.

\begin{lemma}\label{chunkslock}
  If $v \in S$, then $\bigstar{x}{S}{P(x)} \Rightarrow P(v) * \bigstar{x}{S \setminus \{v\}}{P(x)}$.
\end{lemma}
\begin{proof}
  By induction on the cardinality of $S$.
\end{proof}

\begin{lemma}\label{chunksunlock}
  If $v \notin S$, then $P(v) * \bigstar{x}{S}{P(x)} \Rightarrow \bigstar{x}{S \cup \{v\}}{P(x)}$.
\end{lemma}
\begin{proof}
  By induction on the cardinality of $S$.
\end{proof}

\begin{lemma}[Preservation]\label{cslpreservation}
  If $\smallstep{(h, l, c)}{(h', l', c')}$, $\hoare{P}{c}{Q}$, and $h$ satisfies $(P * R * \bigstarp{\ell}{L}{\guarded{\ell \notin l}{\mathcal I(\ell)}})$, then there exists $P'$ such that $\hoare{P'}{c'}{Q}$, where $h'$ satisfies $(P' * R * \bigstarp{\ell}{L}{\guarded{\ell \notin l'}{\mathcal I(\ell)}})$.
\end{lemma}
\begin{proof}
  By induction on the derivation of $\smallstep{(h, l, c)}{(h', l', c')}$.
  The cases for lock and unlock respectively use Lemmas \ref{chunkslock} and \ref{chunksunlock}.
  Note that we include the parameter $R$ solely to get a strong enough induction hypothesis for steps of commands $c_1 || c_2$.
  We need to know that a step by one thread does not change the private heap of the other thread.
  To draw that conclusion, in appealing to the induction hypothesis, we extend $R$ with precisely that private state.
\end{proof}

\begin{lemma}\label{nonelocked}
  $\bigstar{\ell}{L}{\mathcal I(\ell)} \Rightarrow \bigstarp{\ell}{L}{\guarded{\ell \notin \emptyset}{\mathcal I(\ell)}}$.
\end{lemma}
\begin{proof}
  By induction on the cardinality of $L$.
\end{proof}

\begin{lemma}\label{cslinvariant}
  If $\hoare{P}{c}{Q}$, and if a heap $h$ satisfies the predicate $(P * \bigstar{\ell}{L}{\mathcal I(\ell)})$, then an invariant of the system starting at state $(h, \emptyset, c)$ is: for reachable state $(h', l', c')$, there exists $P'$ where $\hoare{P'}{c'}{Q}$, such that $h'$ satisfies $(P' * \bigstarp{\ell}{L}{\guarded{\ell \notin l'}{\mathcal I(\ell)}})$.
\end{lemma}
\begin{proof}
  By invariant induction\index{invariant induction}, using Lemma \ref{nonelocked} for the base case and Lemma \ref{cslpreservation} for the induction step, the latter with $R = \emp$.
\end{proof}

\begin{lemma}[Progress]\label{cslprogress}
  If $\hoare{P}{c}{Q}$ and $c$ is about to fail, then $P$ is unsatisfiable.
\end{lemma}
\begin{proof}
  By induction on the derivation of $\hoare{P}{c}{Q}$.
\end{proof}

The overall soundness proof proceeds by invariant weakening\index{invariant weakening} with the invariant established by Lemma \ref{cslinvariant}.
We prove the inclusion of new invariant in old by Lemma \ref{cslprogress}.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{\label{process_algebra}Process Algebra and Refinement}

The last two chapters dealt with the most popular sort of concurrent programming, the threads-and-locks\index{threads and locks} shared-memory\index{shared-memory concurrency} style.
It's a fundamentally imperative style, with side effects coordinating synchronization across threads.
Another well-established (and increasingly popular) style is \emph{message passing}\index{message-passing concurrency}, which is closer in spirit to functional programming.
In that world, there is, in fact, no memory at all, let alone shared memory.
Instead, state is incorporated into the text of thread code, and information passes from thread to thread by sending \emph{messages} over \emph{channels}\index{channel}.
There are two main kinds of message passing.
In the \emph{asynchronous}\index{asynchronous message passing} or \emph{mailbox}\index{mailbox} style, a thread can deposit a message in a channel, even when no one is ready to receive the message immediately.
Later, a thread can come along and effectively dequeue the message from the channel.
In the \emph{synchronous}\index{synchronous message passing} or \emph{rendezvous}\index{rendezvous} style, a message send only executes when a matching receive, on the same channel, is available immediately.
The threads of the two complementary operations \emph{rendezvous} and pass the message in one atomic step.

Packages of semantics and proof techniques for such languages are often called \emph{process algebras}\index{process algebra}, as they support an algebraic style of reasoning about the source code of message-passing programs.
That is, we prove laws very similar to the familiar equations of algebra and use those laws to ``rewrite'' inside larger processes, by replacing their subprocesses with others we have shown suitably equivalent.
It's a powerful technique for highly modular proofs, which we develop in the rest of this chapter for one concrete synchronous language.
Well-known process algebras include the $\pi$-calculus\index{$\pi$-calculus} and the Calculus of Communicating Systems\index{Calculus of Communicating Systems}; the one we focus on is idiosyncratic and designed partly to make the Coq proofs manageable.


\section{An Object Language with Synchronous Message Passing}

\newcommand{\newp}[3]{\nu[#1](#2); #3}
\newcommand{\block}[2]{\mt{block}(#1); #2}
\newcommand{\send}[3]{!#1(#2); #3}
\newcommand{\recv}[3]{?#1(#2); #3}
\newcommand{\parl}[2]{#1 || #2}
\newcommand{\dup}[1]{\mt{dup}(#1)}
\newcommand{\done}[0]{\mt{done}}

$$\begin{array}{rrcl}
  \textrm{Channels} & c \\
  \textrm{Processes} & p &::=& \newp{\vec{c}}{x}{p(x)} \mid \block{c}{p} \mid \; \send{c}{v}{p} \mid \; \recv{c}{x}{p(x)} \mid \parl{p}{p} \mid \dup{p} \mid \done
\end{array}$$

Here's the intuitive explanation of each syntax construction.
\begin{itemize}
  \item \textbf{Fresh channel generation}\index{fresh channel generation} $\newp{\vec{c}}{x}{p(x)}$ creates a new \emph{private} channel to be used by the body process $p(x)$, where we replace $x$ with the channel that is chosen.  Following tradition, we use the Greek letter $\nu$\index{$\nu$}\index{nu} (nu) for this purpose.  Each generation operation takes a parameter $\vec{c}$, which we call the \emph{support}\index{support} of the operation.  It gives a list of channels already in use for other purposes, so that the fresh channel must not equal any of them.  (We assume an infinite domain of channels, so that, for any specific list, it is always possible to find a channel not in that list.)

  \item \textbf{Abstraction boundaries} $\block{c}{p}$ prevent ``the outside world'' from sending $p$ any messages on channel $c$ or receiving any messages from $p$ via $c$.  That is, $c$ is treated as a local channel for $p$.
\abstraction

  \item \textbf{Sends} $\send{c}{v}{p}$ and \textbf{receives} $\recv{c}{x}{p(x)}$, where we use an exclamation mark to suggest ``telling something'' and a question mark to suggest ``asking something.''  Processes of these kinds can rendezvous when they agree on the channel.  When $\send{c}{v}{p_1}$ and $\recv{c}{x}{p_2(x)}$ rendezvous, they respectively evolve to $p_1$ and $p_2(v)$.

  \item \textbf{Parallel compositions}\index{duplication} $\parl{p_1}{p_2}$ work as we're used to by now.

  \item \textbf{Duplications}\index{duplication} $\dup{p}$ act just like infinitely many copies of $p$ composed in parallel.  We use them to implement nonterminating ``server'' processes that are prepared to respond to many requests over particular channels.  In traditional process algebra, duplication fills the role that loops and recursion fill in conventional programming.

  \item \textbf{The inert process}\index{inert process} $\done$ is incapable of doing anything at all.  It stands for a finished program.
\end{itemize}

\newcommand{\readl}[2]{?#1(#2)}
\newcommand{\writel}[2]{!#1(#2)}
\newcommand{\lts}[3]{#1 \stackrel{#2}{\longrightarrow} #3}
\newcommand{\ltsS}[3]{#1 \stackrel{#2}{\longrightarrow}^* #3}

\medskip

We give an operational semantics in the form of a \emph{labeled transition system}\index{labeled transition system}, as we did to formalize output instructions for compiler correctness in Chapter \ref{compiler_correctness}.
That is, we not only express how a step takes us from one state to another, but we also associate each step with a \emph{label}\index{label} that summarizes what happened.
Our labels will include the \emph{silent} label $\silent$, read labels $\readl{c}{v}$, and write labels $\writel{c}{v}$.
The latter two indicate that a thread has read a value from or written a value to channel $c$, respectively, and the parameter $v$ indicates which value was read or written.
We write $\lts{p_1}{l}{p_2}$ to say that process $p_1$ steps to $p_2$ by performing label $l$.
We use $\lts{p_1}{}{p_2}$ as an abbreviation for $\lts{p_1}{\silent}{p_2}$.

We start with the rules for sends and receives.
$$\infer{\lts{\send{c}{v}{p}}{\writel{c}{v}}{p}}{}
\quad \infer{\lts{\recv{c}{x}{p(x)}}{\readl{c}{v}}{p(v)}}{}$$
They record the action in the obvious way, but there is already an interesting wrinkle: the rule for receives \emph{picks a value $v$ nondeterministically}.
This nondeterminism is resolved by the next two rules, the rendezvous rules, which force a read label to match a write label precisely.

$$\infer{\lts{\parl{p_1}{p_2}}{}{\parl{p'_1}{p'_2}}}{
  \lts{p_1}{\writel{c}{v}}{p'_1}
  & \lts{p_2}{\readl{c}{v}}{p'_2}
}
\quad \infer{\lts{\parl{p_1}{p_2}}{}{\parl{p'_1}{p'_2}}}{
  \lts{p_1}{\readl{c}{v}}{p'_1}
  & \lts{p_2}{\writel{c}{v}}{p'_2}
}$$

A fresh channel generation can step according to any valid choice of channel.
$$\infer{\lts{\newp{\vec c}{x}{p(x)}}{}{\block{c}{p(c)}}}{
  c \notin \vec c
}$$

An abstraction boundary prevents steps with labels that mention the protected channel.
(We overload notation $c \in l$ to indicate that channel $c$ appears in the send/receive position of label $l$.)
$$\infer{\lts{\block{c}{p}}{l}{\block{c}{p'}}}{
  \lts{p}{l}{p'}
  & c \notin l
}$$

Any step can be lifted up out of a parallel composition.
$$\infer{\lts{\parl{p_1}{p_2}}{l}{\parl{p'_1}{p_2}}}{
  \lts{p_1}{l}{p'_1}
}
\quad \infer{\lts{\parl{p_1}{p_2}}{l}{\parl{p_1}{p'_2}}}{
  \lts{p_2}{l}{p'_2}
}$$

Finally, a duplication can spawn a new copy (``thread'') at any time.
$$\infer{\lts{\dup{p}}{}{\parl{\dup{p}}{p}}}{}$$

The labeled-transition-system approach may seem a bit unwieldy for just explaining the behavior of programs.
Where it really pays off is in supporting a modular, algebraic reasoning style about processes, which we turn to next.


\section{Refinement Between Processes}

What sorts of correctness theorems should we prove about processes?
The classic choice is to show that a more complex \emph{implementation} process is a \emph{safe substitute} for a simpler \emph{specification} process.
We will say that the implementation $p$ \emph{refines}\index{refinement} the specification $p'$.
Intuitively, such a claim means that any trace of labels that $p$ could generate may also be generated by $p'$, so that $p$ has \emph{no more behaviors} than $p'$ has, though it may have fewer behaviors.
(There is a formal connection lurking here to the notion of refinement from Chapter \ref{deriving}, where method calls there are analogous to channel operations here.)
Crucially, in building traces of process executions, we ignore silent labels, only collecting the send and receive labels.

This condition is called \emph{trace inclusion}\index{trace inclusion}, and, though it is intuitive, it is not strong enough to support all of the composition properties that we will want.
Instead, we formalize refinement via \emph{simulation}, very similarly to how we formalized compiler correctness in Chapter \ref{compiler_correctness} and data abstraction in Chapter \ref{deriving}.

\abstraction
\begin{definition}
  Binary relation $R$ between processes is a \emph{simulation} when these two conditions hold.
  \begin{itemize}
  \item \textbf{Silent steps match up}: when $p_1 \; R \; p_2$ and $\lts{p_1}{}{p'_1}$, there always exists $p'_2$ such that $\ltsS{p_2}{}{p'_2}$ and $p'_1 \; R \; p'_2$.
  \item \textbf{Communication steps match up}: when $p_1 \; R \; p_2$ and $\lts{p_1}{l}{p'_1}$ for $l \neq \silent$, there always exist $p''_2$ and $p'_2$ such that $\ltsS{p_2}{}{p''_2}$, $\lts{p''_2}{l}{p'_2}$, and $p'_1 \; R \; p'_2$.
  \end{itemize}
\end{definition}

Intuitively, $R$ is a simulation when, starting in a pair of related processes, any step on the left can be matched by a step on the right, taking us back into $R$.
The conditions are naturally illustrated with commuting diagrams\index{commuting diagram}.
\[
\begin{tikzcd}
p_1 \arrow{r}{R} \arrow{d}{\forall \longrightarrow} & p_2 \arrow{d}{\exists \longrightarrow^*} \\
p'_1 & p'_2 \arrow{l}{R^{-1}}
\end{tikzcd}
\quad \begin{tikzcd}
p_1 \arrow{r}{R} \arrow{d}{\forall \stackrel{l}{\longrightarrow}} & p_2 \arrow{d}{\exists \longrightarrow^* \stackrel{l}{\longrightarrow}} \\
p'_1 & p'_2 \arrow{l}{R^{-1}}
\end{tikzcd}
\]

\newcommand{\refines}[2]{#1 \leq #2}

\invariants
Simulations have quite a lot in common with our well-worn concept of invariants of transition systems.
Simulation can be seen as a kind of natural generalization of invariants, which are predicates over single states, into relations that apply to states of two different transition systems that need to evolve in (approximate) lock-step.

We define \emph{refinement} $\refines{p_1}{p_2}$ to indicate that there exists a simulation $R$ such that $p_1 \; R \; p_2$.
Luckily, this somewhat involved definition is easily related back to our intuitions.

\begin{theorem}
  If $\refines{p_1}{p_2}$, then every trace generated by $p_1$ is also generated by $p_2$.
\end{theorem}
\begin{proof}
  By induction on executions of $p_1$.
\end{proof}

Refinement is also a preorder\index{preorder}.

\begin{theorem}[Reflexivity]
  For all $p$, $\refines{p}{p}$.
\end{theorem}
\begin{proof}
  Choose equality as the simulation relation.
\end{proof}

\begin{theorem}[Transitivity]
  If $\refines{p_1}{p_2}$ and $\refines{p_2}{p_3}$, then $\refines{p_1}{p_3}$.
\end{theorem}
\begin{proof}
  The two premises respectively imply the existence of simulations $R_1$ and $R_2$.
  Set the new simulation relation as $R_1 \circ R_2$, defined to contain a pair $(p, q)$ iff there exists $r$ with $p \; R_1 \; r$ and $r \; R_2 \; q$.
\end{proof}

The accompanying Coq code includes several examples of verifying moderately complex processes, by manual tailoring of simulation relations.
We leave those details to the code, turning now instead to further algebraic properties that allow us to \emph{compose} laborious manual proofs about components, in a black-box way.


\section{The Algebra of Refinement}

We finish the chapter with a tour through some algebraic properties of refinement that are proved in the Coq source.
We usually omit proof details here, though we work out one interesting example in more detail.

Perhaps the greatest pay-off from the refinement approach is that \emph{refinement is a congruence for parallel composition}\index{congruence}.
\begin{theorem}
  If $\refines{p_1}{p'_1}$ and $\refines{p_2}{p'_2}$, then $\refines{\parl{p_1}{p_2}}{\parl{p'_1}{p'_2}}$.
\end{theorem}

\modularity
This deceptively simple theorem statement packs a strong modularity punch!
We can verify a component in isolation and then connect to an arbitrary additional component, immediately concluding that the composition behaves properly.
The secret sauce, implicit in our formulation of the object language and refinement, is the labeled-transition-system style, where processes may generate receive labels nondeterministically.
In this way, we can reason about a process implicitly in terms of \emph{every value that some other process might send to it when they are composed}, without needing to quantify explicitly over all other eligible processes.

A similar congruence property holds for duplication, and we'll take this opportunity to explain a bit of the proof, in the form of choosing a good simulation relation.
\begin{theorem}
  If $\refines{p}{p'}$, then $\refines{\dup{p}}{\dup{p'}}$.
\end{theorem}
\begin{proof}
  The premise implies the existence of a simulation $R$.
  We define a derived relation $R^D$ with these inference rules.
  $$\infer{p \; R^D \; p'}{
    p \; R \; p'
  }
  \quad \infer{\dup{p} \; R^D \; \dup{p'}}{
    p \; R \; p'
  }
  \quad \infer{\parl{p_1}{p_2} \; R^D \; \parl{p'_1}{p'_2}}{
    p_1 \; R^D \; p'_1
    & p_2 \; R^D \; p'_2
  }$$
  $R^D$ is precisely the relation we need to finish the current proof.
  Intuitively, the challenge is that $\dup{p}$ includes infinitely many copies of $p$, each of which may evolve in a different way.
  It is even possible for different copies to interact with each other through shared channels.
  However, comparing intermediate states of $\dup{p}$ and $\dup{p'}$, we expect to see a shared backbone, where corresponding threads are related by the original simulation $R$.
  The definition of $R^D$ formalizes that intuition of a shared backbone with $R$ connecting corresponding leaves.
\end{proof}

\newcommand{\neverUses}[2]{\mt{neverUses}(#1, #2)}

We wrap up the chapter with a few more algebraic properties, which the Coq code puts to good use in larger examples.
We sometimes rely on a predicate $\neverUses{c}{p}$, to express that, no matter how other threads interact with it, process $p$ will never perform a send or receive operation on channel $c$.

\begin{theorem}
  If $\refines{p}{p'}$, then $\refines{\block{c}{p}}{\block{c}{p'}}$.
\end{theorem}

\begin{theorem}
  $\refines{\block{c_1}{\block{c_2}{p}}}{\block{c_2}{\block{c_1}{p}}}$
\end{theorem}

\begin{theorem}
  If $\neverUses{c}{p_2}$, then $\refines{(\block{c}{\parl{p_1}{p_2}})}{\parl{(\block{c}{p_1})}{p_2}}$.
\end{theorem}

\begin{theorem}[Handoff]
  If $\neverUses{c}{p(v)}$, then $\refines{(\block{c}{\parl{(\send{c}{v}{\done})}{\dup{\recv{c}{x}{p(x)}}}})}{p(v)}$.
\end{theorem}

That last theorem is notable for how it prunes down the space of possibilities given an infinitely duplicated server, where each thread is trying to receive from a channel.
If server threads never touch that channel after their initial receives, then most server threads will remain inert.
The one send $\send{c}{v}{\done}$ is the only possible source of interaction with server threads, thanks to the abstraction barrier on $c$, and that one send can only awaken one server thread.
Thus, the whole composition behaves just like a single server thread, instantiated with the right input value.

A concrete example of the Handoff theorem in action is a refinement like this one, applying to a kind of forwarding chain between channels:
$$\begin{array}{l}
  p = \block{c_1}{\block{c_2}{\parl{\send{c_1}{v}{\done}}{\parl{\dup{\recv{c_1}{x}{\send{c_2}{x}{\done}}}}{\dup{\recv{c_2}{y}{\send{c_3}{y}{\done}}}}}}} \\
  \refines{p}{\; \send{c_3}{v}{\done}}
\end{array}$$

Note that, without the abstraction boundaries at the start, this fact would not be derivable.
We would need to worry about meddlesome threads in our environment interacting directly with $c_1$ or $c_2$, spoiling the protocol and forcing us to add extra cases to the righthand side of the refinement.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\newcommand{\compl}[1]{\overline{#1}}
\newcommand{\channels}[0]{\mathcal C}
\newcommand{\mpty}[4]{#1 :_{#2,#3} #4}
\newcommand{\mptys}[3]{#1 :_{#2} #3}

\chapter{Session Types}

Process algebra, as we met it last chapter, can be helpful for modeling network protocols\index{network protocols}.
Here, multiple \emph{parties} step through a script of exchanging messages and making decisions based on message contents.
A buggy party might introduce a \emph{deadlock}\index{deadlock}, where, say, party A is blocked waiting for a message from party B, while B is also waiting for A.
\emph{Session types} are a style of static type system\index{type system} that rule out deadlock while allowing convenient separate checking of each party, given a shared protocol type.

There is an almost unlimited variation of different versions of session types.
We still step through a progression of three variants here, and even by the end there will be obvious protocols that don't fit the framework.
Still, we aim to convey the core ideas of the approach.

\section{Basic Two-Party Session Types}

Each of our type systems will apply to the object language from the prior chapter.
Assume for now that a protocol involves exactly two parties.
Here is a simple type system, explaining a protocol's ``script'' from the perspective of one party.
$$\begin{array}{rrcl}
  \textrm{Base types} & \sigma \\
  \textrm{Session types} & \tau &::=& \send{c}{\sigma}{\tau} \mid \; \recv{c}{\sigma}{\tau} \mid \done
\end{array}$$

\abstraction
We model simple parties with no internal duplication or parallelism.
A session type looks like an abstracted version of a process, remembering only the \emph{types} of messages exchanged on channels, rather than their \emph{values}.
A simple set of typing rules makes the connection.
$$\infer{\send{c}{v}{p} : \; \send{c}{\sigma}{\tau}}{
  v : \sigma
  & p : \tau
}
\quad \infer{\recv{c}{x}{p(x)} : \; \recv{c}{\sigma}{\tau}}{
  \forall v : \sigma. \; p(v) : \tau
}
\quad \infer{\done : \done}{}$$

The only wrinkle in these rules is the use of universal quantification for the receive rule, to force the body to type-check under any well-typed value read from the channel.
Actually, such proof obligations may be nontrivial when we encode this object language in the mixed-embedding\index{mixed embeddings} style of Section \ref{mixed}, where the body $p$ in the rule could include arbitrary metalanguage computation, to choose a body based on the value $v$ read from the channel.

The associated Coq code demonstrates tactics to deal with that complication, for automatic type-checking of concrete programs.
That code is also where we keep all of our concrete examples of object-language programs.

For the rest of this chapter, we will interpret last chapter's object language as a transition system with one small change: we only allow silent steps\index{silent steps}.
That is, we only model whole programs, with no communication with ``the environment.''
As a result, we consider self-contained protocols.

A satisfying soundness theorem applies to our type system.  To state it, we first need the crucial operation of \emph{complementing}\index{complement (of a session type)} a session type.
\begin{eqnarray*}
  \compl{\send{c}{\sigma}{\tau}} &=& \recv{c}{\sigma}{\compl{\tau}} \\
  \compl{\recv{c}{\sigma}{\tau}} &=& \send{c}{\sigma}{\compl{\tau}} \\
  \compl{\done} &=& \done
\end{eqnarray*}

\modularity
It is apparent that complementation just swaps the sends and receives.
When the original session type tells one party what to do, the complement type tells the other party what to do.
The power of this approach is that we can write one global protocol description (the session type) and then check two parties' code against it separately.
A new version of one party can be dropped in without rechecking the other party's code.

Using complementation, we can give succinct conditions for deadlock freedom of a pair of parties.

\begin{theorem}\label{stsound1}
  If $p_1 : \tau$ and $p_2 : \compl{\tau}$, then it is an invariant of $\parl{p_1}{p_2}$ that an intermediate process is either $\parl{\done}{\done}$ or can take a step.
\end{theorem}
\begin{proof}
  By invariant induction, after strengthening the invariant to say that any intermediate process takes the form $\parl{p'_1}{p'_2}$, where, for some type $\tau'$, we have $p'_1 : \tau'$ and $p'_2 : \compl{\tau'}$.
  The inductive case of the proof proceeds by simple inversion on the derivation of $p'_1 : \tau'$, where by the definition of complement it is apparent that any communication $p'_1$ performs has a matching action at the start of $p'_2$.
  The choice of $\tau'$ changes during such a step, to the ``tail'' of the old $\tau'$.
\end{proof}

\section{Dependent Two-Party Session Types}

It is a boring protocol that follows such a regular communication pattern as our first type system accepts.
Rather, it tends to be crucial to change up the expected protocol steps, based on \emph{values} sent over channels.
It is natural to switch to a \emph{dependent}\index{dependent types} type system to strengthen our expressiveness.
That is, a communication type will allow its body type to depend on the value sent or received.
$$\begin{array}{rrcl}
  \textrm{Session types} & \tau &::=& \send{c}{x : \sigma}{\tau(x)} \mid \; \recv{c}{x : \sigma}{\tau(x)} \mid \done
\end{array}$$

Each nontrivial construct does more than give the base type that should be sent or received on or from a channel.
We also bind a variable $x$, to stand for the value sent or received.
It may be unintuitive that we must introduce a binder even for sends, when the sender is in control of which value will be sent.
The reason is that we must allow the sender to then continue with different subprotocols for different values that might be sent.
We should not force the sender's hand by fixing a value in advance, when that value might depend on arbitrary program logic.

Very little change is needed in the typing rules.
$$\infer{\send{c}{v}{p} : \; \send{c}{x : \sigma}{\tau(x)}}{
  v : \sigma
  & p : \tau(v)
}
\quad \infer{\recv{c}{x}{p(x)} : \; \recv{c}{x : \sigma}{\tau(x)}}{
  \forall v : \sigma. \; p(v) : \tau(v)
}
\quad \infer{\done : \done}{}$$

Our deadlock-freedom property is easy to reestablish.

\begin{theorem}
  If $p_1 : \tau$ and $p_2 : \compl{\tau}$, then it is an invariant of $\parl{p_1}{p_2}$ that an intermediate process is either $\parl{\done}{\done}$ or can take a step.
\end{theorem}
\begin{proof}
  Literally the same Coq proof script as for Theorem \ref{stsound1}!
\end{proof}

\section{Multiparty Session Types}\index{multiparty session types}

New complications arise when more than two parties are communicating in a protocol.
The Coq code demonstrates a case of an online merchant, a customer sending it orders, and a warehouse being queried by the merchant to be sure a product is in stock.
Many other such examples appear in the real world.

Now it is no longer possible to start from one party's view of a protocol and compute any other party's view.
The reason is that each message only involves two parties.
Any other party will not see that message in its own session type, making it impossible to preserve that message in a complement-like operation.

Instead, we define one global session type that includes only ``send'' operations.
However, we name the parties and parameterize on a mapping $\channels$ from channels to unique parties that own their send and receive ends.
That is, for any given channel and operation on it (send and receive), precisely one party is given permission to perform the operation -- and indeed, when the time comes, that party is \emph{obligated} to perform the operation, to avoid deadlock.

With that view in mind, our type language gets even simpler.
$$\begin{array}{rrcl}
  \textrm{Session types} & \tau &::=& \send{c}{x : \sigma}{\tau(x)} \mid \done
\end{array}$$

We redefine the typing judgment as $\mpty{p}{\alpha}{b}{\tau}$.
Here $\alpha$ is the identifier of the party running $p$, and $b$ is a Boolean that, when set, enforces that $p$'s next action (if any) is a receive.
$$\infer{\mpty{\send{c}{v}{p}}{\alpha}{\bot}{\send{c}{x : \sigma}{\tau(x)}}}{
  v : \sigma
  & \channels(c) = (\alpha, \beta)
  & \beta \neq \alpha
  & \mpty{p}{\alpha}{\bot}{\tau(v)}
}$$

$$\infer{\mpty{\recv{c}{x}{p(x)}}{\alpha}{b}{\send{c}{x : \sigma}{\tau(x)}}}{
  \channels(c) = (\beta, \alpha)
  & \beta \neq \alpha
  & \forall v : \sigma. \; \mpty{p(v)}{\alpha}{\bot}{\tau(v)}
}$$

$$\infer{\mpty{p}{\alpha}{b}{\send{c}{x : \sigma}{\tau(x)}}}{
  \channels(c) = (\beta, \gamma)
  & \beta \neq \alpha
  & \gamma \neq \alpha
  & \forall v : \sigma. \; \mpty{p}{\alpha}{\top}{\tau(v)}
}$$

$$\infer{\mpty{\done}{\alpha}{b}{\done}}{}$$

The first two rules encode the simple cases where the current party $\alpha$ is one of the two designated to step next in the protocol, as we verify by looking up the channel in $\channels$.
It is important that the send and receive ends of the channel are owned by different parties, or we would clearly have a deadlock, as that party would either wait forever for a message from itself or try futilely to send itself a message!
The $\neq$ premises enforce that condition.
Also, the Boolean subscript enforces that we cannot be running a send operation if we have been instructed to run a receive next.
That flag is reset to false in the recursive premises, since we only use the flag to express an obligation for the very next command.

The third rule is crucial: it applies to a process that is not participating in the next step of the protocol.
That is, we look up the owners of the channel that comes next, and we verify that neither owner is $\alpha$.
In this case, we merely proceed to the next protocol step, leaving the process unchanged.
Crucially, we must be prepared for any value that might be exchanged in this skipped step, even though we do not see it ourselves.

Why does the last premise of the third rule set the Boolean flag, forcing the next action to be a receive?
Otherwise, at some point in the protocol, we could have multiple parties trying to send messages.
In such a scenario, there might not be a unique step that the composed parties can take.
The proofs are easier if we can assume deterministic execution within a protocol, which is why we introduced this static restriction.

To amend our theorem statement, we need to characterize when a process implements a set of parties correctly.
We use the judgment $\mptys{p}{\vec{\alpha}}{\tau}$ to that end, where $p$ is the process, $\vec{\alpha}$ is a list of all the involved parties, and $\tau$ is the type they must follow collectively.
$$\infer{\mptys{\done}{[]}{\tau}}{}
\quad \infer{\mptys{\parl{p_1}{p_2}}{\concat{\alpha}{\vec{\beta}}}{\tau}}{
  \mpty{p_1}{\alpha}{\bot}{\tau}
  & \mptys{p_2}{\vec{\beta}}{\tau}
}$$

The heart of the proof is demonstrating the existence of a unique sequence of steps to a point where all parties are done.
Here is a sketch of the key lemmas.

\begin{lemma}\label{forever_done}
  If $\mptys{p}{\vec{\alpha}}{\done}$, then $p$ can't take any silent step.
\end{lemma}
\begin{proof}
  By induction on any derivation of a silent step, followed by inversion on $\mptys{p}{\vec{\alpha}}{\done}$.
\end{proof}

\begin{lemma}\label{comm_stuck}
  If $\mptys{p}{\vec{\alpha}}{\; \send{c}{x : \sigma}{\tau(x)}}$ and at least one of sender or receiver of channel $c$ is missing from $\vec{\alpha}$, then $p$ can't take any silent step.
\end{lemma}
\begin{proof}
  By induction on any derivation of a silent step, followed by inversion on $\mptys{p}{\vec{\alpha}}{\; \send{c}{x : \sigma}{\tau(x)}}$.
\end{proof}

\begin{lemma}\label{preserve_unused}
  Assume that $\vec{\alpha}$ is a duplicate-free list of parties excluding both sender and receiver of channel $c$.
  If $\mptys{p}{\vec{\alpha}}{\; \send{c}{x : \sigma}{\tau(x)}}$, then for any $v : \sigma$, we have $\mptys{p}{\vec{\alpha}}{\tau(v)}$.
  In other words, when we have well-typed code for a set of parties that do not participate in the first step of a protocol, that code remains well-typed when we advance to the next protocol step.
\end{lemma}
\begin{proof}
  By induction on the derivation of $\mptys{p}{\vec{\alpha}}{\; \send{c}{x : \sigma}{\tau(x)}}$.
\end{proof}

\begin{lemma}\label{find_sender}
  Assume that $\vec{\alpha}$ is a duplicate-free list of parties, at least comprehensive enough to include the sender of channel $c$.
  However, $\vec{\alpha}$ should \emph{exclude} the receiver of $c$.
  If $\mptys{p}{\vec{\alpha}}{\; \send{c}{x : \sigma}{\tau(x)}}$ and $\lts{p}{\writel{c}{v}}{p'}$, then $\mptys{p'}{\vec{\alpha}}{\tau(v)}$.
\end{lemma}
\begin{proof}
  By induction on steps followed by inversion on multiparty typing.
  As we step through elements of $\vec{\alpha}$, we expect to ``pass'' parties that do not participate in the current protocol step.
  Lemma \ref{preserve_unused} lets us justify those passings.
\end{proof}

\begin{theorem}
  Assume that $\vec{\alpha}$ is a duplicate-free list of \emph{all} parties for a protocol.
  If $\mptys{p}{\vec{\alpha}}{\tau}$, then it is an invariant of $p$ that an intermediate process is either inert (made up only of $\done$s and parallel compositions) or can take a step.
\end{theorem}
\begin{proof}
  By invariant induction, after strengthening the invariant to say that any intermediate process $p'$ satisfies $\mptys{p'}{\vec{\alpha}}{\tau'}$ for some $\tau'$.
  The inductive case uses Lemma \ref{forever_done} to rule out steps by finished protocols, and it uses Lemma \ref{comm_stuck} to rule out cases that are impossible because parties that are scheduled to go next are not present in $\vec{\alpha}$.
  Interesting cases are where we find that one of the active parties is at the head of $\vec{\alpha}$.
  That party either sends or receives.
  In the first case, we appeal to Lemma \ref{find_sender} to find a receiver among the remaining parties.
  In the second case, we appeal to an analogous lemma (not stated here) to find a sender.

  The other crucial case of the proof is showing that existence of a multiparty typing implies that, if a process is not inert, it can take a step.
  The reasoning is quite similar to in the inductive case, but where instead of showing that any possible step preserves typing, we demonstrate that a particular step exists.
  The head of the session type telegraphs what step it is: for the communication at the head of the type, the assigned sending party sends to the assigned receiving party.
\end{proof}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\appendix

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{\label{coqref}The Coq Proof Assistant}

Coq\index{Coq} is a proof-assistant software package developed as open source, primarily by Inria\index{Inria}, the French national computer-science lab.

\section{Installation and Basic Use}

The project home page is:
\begin{center}
  \url{https://coq.inria.fr/}
\end{center}
The code associated with this book is designed to work with Coq versions 8.9 and higher.
The project Web site makes a number of versions available, and versions are also available in popular OS package distributions, along with binaries for platforms where open-source package systems are less common.
We assume that readers have installed Coq by one of those means or another.
It will also be almost essential to use some graphical interface for Coq editing.
The author prefers Proof General\index{Proof General}, an Emacs\index{Emacs} mode:
\begin{center}
  \url{http://proofgeneral.inf.ed.ac.uk/}
\end{center}
It should be possible to follow along using CoqIDE\index{CoqIDE}, a standalone tool distributed with Coq itself, but we will not give any CoqIDE-specific instructions.

The Proof General instructions are simple: after installing, within a regular Emacs session, open a file with the Coq extension \texttt{.v}.
Move the point (cursor) to a position where you would like to examine the current state of a proof, etc.
Then press C-C C-RET (``control-C, control-enter'') to run Coq up to that point.
Several display panes will open, showing different aspects of Coq's state, any error messages it wants to report, etc.
This feature is the main workhorse of Proof General.
It can be used both to move \emph{forward}, checking that Coq accepts a command; and to move \emph{backward}, to undo commands processed previously.

Proof General has plenty of other bells and whistles, but we won't go into them here.

\section{Tactic Reference}

\emph{Tactics} are the commands run in Coq to advance the state of a proof, corresponding to deduction steps at different granularities.
Here we collect all of the short explanations of tactics that appear in Coq source files associated with the chapters included in this document.
Note that many of these are specific to the \texttt{Frap} library distributed with this book, where built-in tactics often do quite similar things, but in a way that the author judges to be more of a hassle for beginners.

\begin{description}
  \item[\texttt{apply} $H$] For $H$ a hypothesis or previously proved theorem, establishing some fact that matches the structure of the current conclusion, switch to proving $H$'s own hypotheses.  This is \emph{backwards reasoning} via a known fact.
  \item[\texttt{apply} $H$ \texttt{with} \texttt{(}$x_1$\texttt{ := }$e_1$\texttt{) ... (}$x_n$\texttt{ := }$e_n$\texttt{)}] Like the last one, supplying values for quantified variables in $H$'s statement, especially for those variables whose values aren't immediately implied by the current goal.
  \item[\texttt{apply} $H_1$ \texttt{in} $H_2$] Like \texttt{apply} $H_1$, but used in a \emph{forward} direction rather than \emph{backward}.  For instance, if $H_1$ proves $P \Rightarrow Q$ and $H_2$ proves $P$, then the effect is to change $H_2$ to $Q$.
  \item[\texttt{assert} $P$] First prove proposition $P$, then continue with it as a new hypothesis.
  \item[\texttt{assumption}] Prove a conclusion that matches a hypothesis exactly.
  \item[\texttt{cases} $e$] Break the proof into one case for each constructor that might have been used to build the value of expression $e$.  In the special case where $e$ essentially has a Boolean type, we consider whether $e$ is true or false.
  \item[\texttt{constructor}] When proving an instance of an inductive predicate, \texttt{apply} the first matching rule of that predicate.
  \item[\texttt{eapply} $H$] Like \texttt{apply} but will work even when some quantified variables from $H$ do not have their values determined immediately by the form of the goal.  Instead, \emph{existential variables} (with names starting with question marks) are introduced for those values.
  \item[\texttt{eassumption}] Like \texttt{assumption} but will figure out values of existential variables.
  \item[\texttt{econstructor}] When proving an instance of an inductive predicate, \texttt{eapply} the first matching rule of that predicate.
  \item[\texttt{eexists}] To prove $\exists x. \; P(x)$, switch to proving $P(?y)$, for a new existential variable $?y$.
  \item[\texttt{equality}] A complete decision procedure for the theory of equality and uninterpreted functions.  That is, the goal must follow from only reflexivity, symmetry, transitivity, and congruence of equality, including that functions really do behave as functions.  See Section \ref{decidable}.
  \item[\texttt{exfalso}] From any proof state, switch to proving \texttt{False}.  In other words, indicate a switch to a proof by contradiction.
  \item[\texttt{exists} $e$] Prove $\exists x. \; P(x)$ by proving $P(e)$.
  \item[\texttt{first\_order}] Simplify a goal into zero or more new goals, based on the rules of first-order logic alone.  \emph{Warning:} this tactic is especially likely to run forever, on complex enough goals!  (While entailment for propositional logic is decidable, entailment for first-order logic isn't.)
  \item[\texttt{f\_equal}] When the goal is an equality between two applications of the same function, switch to proving that the function arguments are pairwise equal.
  \item[\texttt{induct} $x$] Where $x$ is a variable in the theorem statement, structure the proof by induction on the structure of $x$.  You will get one generated subgoal per constructor in the inductive definition of $x$.  (Indeed, it is required that $x$'s type was introduced with \texttt{Inductive}.)
  \item[\texttt{invert} $H$] Replace hypothesis $H$ with other facts that can be deduced from the structure of $H$'s statement.  More detail to be added here soon!
  \item[\texttt{linear\_arithmetic}] A complete decision procedure for linear arithmetic.  Relevant formulas are essentially those built up from variables and constant natural numbers and integers using only addition and subtraction, with equality and inequality comparisons on top.  (Multiplication by constants is supported, as a shorthand for repeated addition.) See Section \ref{decidable}.  Also note that this tactic goes a bit beyond that theory, by (1) converting multivariable terms into a standard polynomial form and then (2) treating each different product of powers of variables as one variable in a linear-arithmetic problem.  So, for instance, \texttt{linear\_arithmetic} can prove $x \times y = y \times x$ simply by deciding that a new variable $z = x \times y$, rewriting the goal to $z = z$ after putting polynomials in canonical form (in this case, commuting argument order in products to make it consistent).
  \item[\texttt{left}] Prove a disjunction by proving its left side.
  \item[\texttt{maps\_equal}] Prove that two finite maps are equal by considering all the relevant cases for mappings of different keys.
  \item[\texttt{propositional}] Simplify a goal into zero or more new goals, based on the rules of propositional logic alone.
  \item[\texttt{replace} $e_1$ \texttt{with} $e_2$ \texttt{by} \texttt{tac}] Replace occurrences of $e_1$ with $e_2$, proving $e_2 = e_1$ with tactic \texttt{tac}.
  \item[\texttt{rewrite} $H$] Where $H$ is a hypothesis or previously proved theorem, establishing \texttt{forall x1 .. xN, e1 = e2}, find a subterm of the goal that equals \texttt{e1}, given the right choices of \texttt{xi} values, and replace that subterm with \texttt{e2}.
  \item[\texttt{rewrite} $H_1$ \texttt{in} $H_2$] Like \texttt{rewrite} $H_1$ but performs the rewrite in hypothesis $H_2$ instead of in the conclusion.
  \item[\texttt{right}] Prove a disjunction by proving its right side.
  \item[\texttt{ring}] Prove goals that are equalities over some registered ring or semiring, in the sense of algebra, where the goal follows solely from the axioms of that algebraic structure.  See Section \ref{decidable}.
  \item[\texttt{simplify}] Simplify throughout the goal, applying the definitions of recursive functions directly.  That is, when a subterm matches one of the \texttt{match} cases in a defining \texttt{Fixpoint}, replace with the body of that case, then repeat.
  \item[\texttt{subst}] Remove all hypotheses like $x = e$ for variables $x$, simply replacing all uses of $x$ by $e$.
  \item[\texttt{symmetry}] When proving $X = Y$, switch to proving $Y = X$.
  \item[\texttt{transitivity} $X$] When proving $Y = Z$, switch to proving $Y = X$ and $X = Z$.
  \item[\texttt{trivial}] Coq maintains a database of simple proof steps, such as proving a fact by direct appeal to a matching hypothesis.  \texttt{trivial} asks to try all such simple steps.
  \item[\texttt{unfold} $X$] Replace $X$ by its definition.
  \item[\texttt{unfold} $X$ \texttt{in} \texttt{*}] Like the last one, but unfolds in hypotheses as well as conclusion.
\end{description}

\section{Further Reading}

For more Coq information, we recommend a few books (beyond the Coq reference manual).  Some focus purely on introducing Coq:

\begin{itemize}
  \item Adam Chlipala, \emph{Certified Programming with Dependent Types}, MIT Press, \url{http://adam.chlipala.net/cpdt/}
  \item Yves Bertot and Pierre Cast\'eran, \emph{Interactive Theorem Proving and Program Development: Coq'Art: The Calculus of Inductive Constructions}, Springer, \url{https://www.labri.fr/perso/casteran/CoqArt/}
\end{itemize}

The first of these two, especially, goes in-depth on the automated proof-scripting principles showcased from time to time in the Coq example code associated with the present book.

There are also other sources that introduce program-reasoning principles at the same time, including:

\begin{itemize}
  \item Benjamin C. Pierce et al., \emph{Software Foundations}, \url{http://www.cis.upenn.edu/~bcpierce/sf/}
\end{itemize}

\emph{Software Foundations} generally proceeds at a slower pace than this book does.

\backmatter
%    Bibliography styles amsplain or harvard are also acceptable.
%% \bibliographystyle{amsalpha}
%% \bibliography{}
%    See note above about multiple indexes.
\printindex

\end{document}