-
Notifications
You must be signed in to change notification settings - Fork 0
/
intro.tex
195 lines (155 loc) · 17.4 KB
/
intro.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
Our modern world depends on software in many aspects.
In the morning, software on our mobile phone can wake us up.
Then we often travel to work either by car or by public transport.
In either case, there is probably software in the vehicle, including life-critical software in engine control and safety systems such as ABS (brakes anting-blocking system).
In work, many people use computers to do their job and to communicate with coworkers and the rest of the world.
In the meanwhile, our mobile phone and the Internet accompanies us on almost every step.
We also use much infrastructure which can have important software components in its control systems -- including energy grids, smart traffic signs, and flight control.
Any of this plethora of software system can contain bugs.
These bugs can range from minor ones that cause inconvenience only, to safety-critical problems that can cause deaths of many people.
For this reason, ensuring software correctness is a desirable goal pursued by software developers and testers, as well as researchers.
Many techniques provide assistance to both developers and testers, with a wide range of capabilities and ease of use.
At the basic level, developers can leverage type system, especially if they use strongly statically typed programming language, and they can use compiler warnings and linters\mnote{Lightweight analysis tools that try to find likely bugs. They rely on approximation and usually perform more thorough analysis then compilers do for warnings, but still prefer speed and a low number of spurious error reports to the discovery of hard-to-find bugs.} to check for mistakes and potential problems in the code.
Code is also routinely tested with a large variety of test types, starting from unit tests that test small parts of the program in isolation and continuing to tests of functionality of the whole system.
These testing and code analysis techniques are wide-spread, can discover significant amounts of bugs, and are often reasonably easy to use for developers.
However, these techniques cannot prove the absence of bugs, and certain types of bugs are notoriously hard to discover by these testing techniques.
For example, testing parallel software is inherently hard due to thread interactions which can cause different executions of the same program (or function) to produce different results.
The program can run right in almost all cases, but can occasionally fail, maybe once in a few minutes, maybe once in a few months.
A conventional approach to this problem is using stress testing.
For example, a thread-safe data structure might be stress tested by performing millions of operations with it from many threads.
Such stress tests are designed to increase the likelihood of triggering a bug in the test, but they still cannot show its absence and can miss truly rare bugs.
Even worse, tests of parallel software are often nondeterministic on buggy software, i.e., the same test of the same software version can sometimes succeed and sometimes fail due to different interleaving of threads in different executions.
This behaviour might make bug fixing extremely difficult.
For example, a test may fail in a nightly automatic build, and therefore the developers know there is a bug in the program, but they are unable to reproduce the bug, and the test result might not be sufficient to find what the bug is without reproducing it.
Similarly, it is hard to ensure by testing that a (parallel) program terminates in all cases, or that it works correctly even on hardware with relaxed memory (which can, for example, delay stores to the main memory).\begin{marginnote}%
Consider this code:
\medskip
\begin{cppcode}
int x = 0;
int y = 0;
void t0() {
y = 1;
int a = x;
}
void t1() {
x = 1;
int b = y;
}
\end{cppcode}
Where \texttt{t0} and \texttt{t1} are executed by diffrent threads.
Intuitivelly we could expect that $\texttt{a} = 0 \land \texttt{b} = 0$ is not a possible outcome.
However, under the common x86 architecture, this outcome can happen due to delayed stores.
\end{marginnote}
To make the discovery of hard-to-find bugs easier, many techniques were proposed.
These include both testing-based methods, such as race detectors and thread sanitisers or record and replay debuggers, and methods based on \emph{formal verification} including theorem proving, symbolic execution and various variants of model checking.\mnote{The distinction between testing and formal verification is not clear and many techniques derived from formal verification are unable to prove correctness.
This blurring is further increased methods like stateless model checking that is also presented under the name systematic concurrency testing.}
In this work, we will focus on methods with a more formal basis that are capable of the discovery of hard-to-find bugs.
However, for these methods to be useful to programmers, they must not only be theoretically capable of bug discovery, but they must also be reasonably convenient to use.
For example, it is highly desirable that the program analysis tool can analyse the program directly, without the need to create a model of the program that is to be analysed (many, especially older, techniques for analysis of parallel programs require models).
For this reason, we will focus on analysis tools which work with \emph{realistic programs}, i.e., programs written in high-level programming languages which can be both executed on the desired platform and analysed without a separate modelling step.
\section{Problem Statements and Contributions}
In this work, we will describe several techniques that improve analysis of realistic parallel programs in the DIVINE model checker \mcite{DIVINEToolPaper2017, DIVINEWeb}, and we will compare them to many other methods that aim at the discovery of hard-to-find bugs in parallel programs.
In our analysis, we will focus primarily on programs written in C and C++.
Nevertheless, analysis of other real-world high-level imperative programming
languages (such as Java, C\#, Rust and many other) should follow similar
principles.
The work presented in this thesis is also accompanied by an open-source implementation in DIVINE.
At its core, DIVINE is an explicit-state model checker extended with data abstraction capabilities.
It can be used to discover large classes of bugs, including assertion violations, use of uninitialised memory, bad memory accesses, memory leaks, and concurrency related bugs such as deadlocks.
\subsection{Analysis of Realistic Programs}\label{sec:intro:programs}
Analysis of realistic (parallel) programs is a hard task because these programs have many features that are complex to handle by an analysis tool.
First, the syntax of the programming language is often complex -- modern programming languages such as C++, C\#, Java or Python are designed to be expressive and well usable by programmers and are usually not easy to parse and process.
Furthermore, the language can have many features which are hard to analyse even when the code is understood.
For example, dynamic memory complicates program analysis because the amount of used memory and identifiers of memory locations are not known beforehand.
Pointer arithmetic and memory unsafety of programming languages such as C and C++ further complicate the handling of memory.
Even procedure calls and recursion, which are available in virtually any general-purpose programming language, make some kinds of program analysis harder.
Realistic programming languages are also defined not only by the language syntax and semantics but also by their libraries.
The programming language usually comes with a standard library that provides basic abstractions and operations that most of the programmers will expect to work.
Program analysis tools must also understand these features if the programmers are to use them.
Even relatively minimalistic standard libraries, such as the C standard library, contain features like memory allocation and deallocation, basic string manipulations, math operations, basic input and output (including basic filesystem access), and since C11 also threads and their synchronisation.
Most standard libraries are even larger and more complex -- they contain various data structures, algorithms for working with them, and often abstractions over filesystem and network access.
For a program analysis tool to be useful to programmers, it should be able to process the code they are creating with minimal extra effort.
This means it must be able to handle complex language features and libraries.
For example, tools which understand basics of the C language, but do not allow dynamic memory allocation, are not very useful for programmers, as they will probably not be able to use them to analyse their usual programs.
On the other hand, if the language support is good enough, it enables the programmer to use the analysis tool directly on their program or its fragments, both during the development and for older code.
Apart from the ease of use, program analysis without modelling or simplification also increases the reliability of the analysis results --
if the analysed program differs from the program which is actually executed, the difference can hide problems or introduce new, spurious bugs.
\paragraph{Our Contribution}
In \autoref{chap:lang} we present a wider look at the problem of analysis of realistic
programs from the point of features of programming languages and their libraries.
We show that library reuse can be a viable approach for program analysers with
good support for language features and that it can significantly simplify
support for languages with complex libraries such as C++.
We also identify functionality which is a good candidate for
verification-specific modelling instead of reuse.
Finally, we focus on the particular case of exception support
for C++ in DIVINE and how it can be achieved by a combination of library reuse
and creation of verification-specific libraries.
Other works focusing on language support in program analysis tools include, for example, \mcite{Ramalho2013,Garzella2020}.
Exceptions are also supported in many Java tools and some tools with C++ support \mcite{Visser2003,Kroening2014,Cordeiro2019}.
However, to the best of our knowledge, our support of C++ exceptions is the only complete support for C++ exceptions in a comparable tool.
Furthermore, our method of implementation of exceptions is rather generic and leverages existing language-specific exception-handling code from C++ standard library implementation.
Therefore, it should be possible to adapt it to other programming languages with exception support with reasonable effort.
This contribution was first presented in \mcite{SRB2017}. \autoref{chap:lang} is further extended with unpublished content about language support in general.
\subsection{Parallelism and Relaxed Memory}
To fully utilise the capabilities of modern hardware, programmers are encouraged to write parallel software.
However, parallelism adds more complexity both for the author of the code and for the analysis tool as it has to be able to handle representation of threads and synchronisation in the given programming language and the semantics of parallel execution of the program.
Furthermore, relaxed memory can come into play with parallelism.
Modern processors use cache memories and out-of-order execution to improve their speed and hide speed difference between the processor's cores and the main memory.
For efficiency reasons, these mechanisms are often not transparent to the programmer and can be observed by multi-threaded programs, yielding possible executions that violate ordering of actions in their threads (for example, a write to a memory can be delayed past an independent read).
When this behaviour is observable on a hardware platform, we say that the given hardware platform has relaxed memory.
Relaxed memory adds substantial complexity to the program analysis --
an analysis tool has to be able to understand the particular relaxed memory model (different platforms exhibit different behaviour) and the amount of resources required to run the verification is also often significantly increased.
We believe that analysis under relaxed memory models is an important topic,
in particular in the context of lock-free programs.
\paragraph{Our Contribution}
\autoref{chap:mm} shows how efficient support for the memory model of the common Intel and AMD x86-64 processors was added to the DIVINE verifier.
In our work, we leverage the LLVM infrastructure to implement support for
relaxed memory as a preprocessing step, without the need to modify the verifier
itself (provided it has support for nondeterministic choice).
Furthermore, we show that the amount of nondeterminism in the resulting program
can be limited by careful design of data structures used to simulate the
behaviour of \xtso store buffers.
The decreased nondeterminism directly translates to smaller state space and therefore, faster and less memory-hungry program analysis.
Analysis of programs under various relaxed memory models is an active research area with a lot of related techniques, including \mcite{Alglave2013,Zhang2015,Abdulla2017tso,Kokologiannakis2017,Abdulla2018}.
Our work, which was originally presented in \mcite{SB2018x86tso}, presents a significant improvement over existing explicit-state techniques and complements well techniques based on bounded or stateless model checking.
\subsection{Local Nontermination of Parallel Programs}
An important part of the correctness of programs is that they eventually do the work they are supposed to do.
This implies the program should terminate or, often in the case of parallel or event-driven programs, that it should handle each request within a finite amount of time.
For example, a server is often running an infinite loop that spawns request handlers, and each of these request handlers must terminate.
Similarly, a critical section of a program is usually required to terminate to ensure the program does not get stuck.
Lock-free programs can sometimes rollback actions and it is important to check that the action will eventually finish.
Therefore, termination checking (and more generally checking of liveness properties) is an important companion to safety checking\mnote{Safety checking aims to show that the program does not perform any forbidden action. Examples include assertion safety (absence of violation of \emph{assertions} -- programmer specified properties that should hold at the given point in the code), memory safety (which includes correct access to arrays, correct handling of dynamic memory, and absence of stack overflows), or absence of use of undefined values (in programming languages such as C and C++ that can work with uninitialised memory).} in the pursuit of correct parallel programs.
Furthermore, it is often not sufficient to check that the whole program terminates.
It is more desirable to check that some designated part of the program terminates (e.g., critical section, event handler).
\paragraph{Our Contribution}
In \autoref{chap:lnterm}, we introduce a generic method for
detection of nontermination caused by communication between threads.
It uses lightweight annotations in the code to mark parts of the program that
must terminate, together with a set of pre-defined parts of programs that must
terminate which allow it to be used with common synchronisation primitives out
of the box.
Our method can also be applied to deliberately nonterminating programs (i.e.,
daemons, services) in which it can detect parts that should terminate but do
not terminate.
% Termination analysis is mostly considered in the context of sequential programs \mcite{Cook2006,Berdine2006,Dietsch2015,Brockschmidt2016,Hensel2017,Giesl2017}.
Termination of parallel programs is less explored than for the sequential case.
Nevertheless, there are many existing works, including specialized tools like \mcite{CC14,agarwal2010detection,bensalem2005scalable} (which are specialized to given synchronisation primitives) and \mcite{Chaki2005,Demartini99} (which can detect global deadlocks).
More general tools include those based on modular proofs \mcite{Cook2007thr,Popeea2012} or loop detection \mcite{Atig2012term}.
Our work, originally presented in \mcite{SB2019}, is based on state-space exploration and focuses primarily on nontermination caused by thread communication -- it can handle arbitrary synchronization primitives but does not handle symbolic data so far.
Furthermore, one of its main novelties is that it can detect nonterminating parts of programs with user-defined granularity (e.g., a specified function or its part).
\section{Supplementary Materials}
Supplementary materials for this work can be found on \href{http://vstill.eu/phd}{vstill.eu/phd}.
This web page contains links to the relevant publications this work is based on and to their respective supplementary pages.
\section{Thesis Structure}
After this introduction, \autoref{chap:preliminaries} gives the preliminaries
need for the rest of the work, including an introduction to program analysis, parallelism and relaxed memory models, C++, and DIVINE.
\autoref{chap:stateoftheart} presents state of the art in the program analysis
of realistic parallel programs.
The next three chapters present the main contributions of this work: analysis of
realistic programs and C++ exceptions in \autoref{chap:lang}, support for the
\xtso memory model in \autoref{chap:mm}, and nontermination detection in
\autoref{chap:lnterm}.
\autoref{chap:conclusion} then concludes the main body of this work.
The list of my published results can be found in \autoref{chap:published}.
% vim: colorcolumn=80 expandtab sw=4 ts=4 spell spelllang=en