11
11
This article is intended as an overview of debugging techniques and motivation for
12
12
uniform execution representation and setup to efficiently mix and match the
13
13
appropriate technique for system-level debugging, with a focus on statically
14
- optimizing compiler languages to keep complexity and scope limited. The author
15
- accepts the irony of such statements by "C having no ABI"/many systems in
14
+ optimizing compiler languages to keep complexity and scope limited.
15
+ The reader may notice several documented deficits in documentation or
16
+ functionality across platforms and tooling, which will be improved.
17
+ The author accepts the irony of such statements by "C having no ABI"/many systems in
16
18
practice having no ABI, but in this text reality is simplified for brevity and
17
19
sanity.
18
20
21
+ Section 1 (theory) feels complete, but is planned to become denser to
22
+ serve as an appropriate definition of bug, debugging and the debugging process.
23
+ Section 2 (practical) is tailored towards non-microkernels, which are based
24
+ on the process abstraction, but it currently lacks content and scalability numbers
25
+ for tooling.
26
+ The idea is to provide the understanding and numbers needed to estimate, for system design,
27
+ 1 whether a formal proof of correctness is feasible and for which parts,
28
+ 2 which problems and methods are applicable for dynamic program analysis.
29
+ Follow-up sections will cover speculative and more advanced ideas, which
30
+ should be feasible based on the numbers.
31
+
19
32
- 1.[Theory of debugging](#theory)
20
33
- 2.[Practical methods with trade-offs](#practice)
21
34
- 3.[Uniform execution representation](#uniform_execution_representation)
@@ -46,8 +59,7 @@ The process of debugging means to use static and dynamic program analysis
46
59
and its automation and adaption to speed up bug (classes) elimination for the
47
60
(classes of) target systems.
48
61
49
- One can generally categorize methods into the following list (**asoul**)
50
- **a**utomate, **s**implify, **o**bserve, **u**nderstand, **l**earn)
62
+ One can generally categorize methods into the following list [**a**utomate, **s**implify, **o**bserve, **u**nderstand, **l**earn] (**asoul**)
51
63
- **a**utomate the process to minimize errors/oversights during debugging,
52
64
against probabilistic errors, document the process etc
53
65
- **s**implify and isolate system components and changes over time
@@ -58,7 +70,7 @@ One can generally categorize methods into the following list (**asoul**)
58
70
for example user-space processes, kernel, build system, compiler, source code, linker,
59
71
object code, assembly, hardware etc
60
72
61
- with the fundamental constrains being (**feel**)
73
+ with the fundamental constraints being [**f**inding, **ee**nsuring, **l**imited] (**feel**)
62
74
- **f**inding out correct system components semantics
63
75
- **ee**nsuring deterministic reproducibility of the problem
64
76
- **l**imited time and effort
@@ -88,10 +100,10 @@ semantics are then typically a mix of
88
100
- **Virtualisation** as **isolation or simplification** of a hardware- or software
89
101
subsystem to reduce system complexity.
90
102
91
- Isolation and simplification are typically applied on all potential
103
+ Further, isolation and simplification are typically applied to all potential
92
104
sub-components including, but not limited to hardware, code versioning
93
105
including dependencies, source system, compiler framework and target system.
94
- Typical methods are
106
+ Methods are usually
95
107
- **Bisection** via git or the actual binaries.
96
108
- **Reduction** via removal of system parts or trying to reproduce with
97
109
(a minimal) example.
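The bisection and reduction methods listed above can be sketched in a few lines. This is a minimal illustration, not the workflow of any particular tool: the names `first_bad`, `reduce_input`, `is_bad` and `still_fails` are hypothetical, and the sketch assumes the failure is deterministic, the first version is good, the last one is bad, and every version after the first bad one is also bad.

```python
from typing import Callable, List, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad version.

    Assumes versions[0] is good, versions[-1] is bad and the history
    switches from good to bad exactly once.
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


def reduce_input(parts: List[str],
                 still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop one part at a time while the failure still reproduces."""
    i = 0
    while i < len(parts):
        candidate = parts[:i] + parts[i + 1:]
        if still_fails(candidate):
            parts = candidate  # part i was not needed to reproduce the failure
        else:
            i += 1  # part i is required, keep it and move on
    return parts
```

Bisection needs a number of test runs logarithmic in the number of versions, whereas this greedy reduction needs roughly one run per part, so the cost of a single reproduction run decides which method to apply first.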
@@ -101,22 +113,27 @@ Typical methods are
101
113
**Debugging** is domain- and design-specific and **relies on** core component(s)
102
114
of **the system being debugged to provide the necessary debug functionality**.
103
115
For example, software-based hardware debugging relies on interfaces to
104
- the hardware like JTAG, Kernel debugging on Kernel compilation or
116
+ the hardware like JTAG, kernel debugging on kernel compilation or
105
117
configuration and elevated (user) permissions, user-space debugging on process and
106
118
user permissions, system configuration or a child process to be debugged
107
119
on POSIX systems via `ptrace`.
108
120
109
- It depends on many factors, for example bug classes and target systems, to what degree the process of
110
- debugging can and should be automated or optimized.
121
+ Without costly hardware tracing devices and physical access to the computing unit
122
+ for exact recording of the system behavior including time information,
123
+ dynamic program analysis (running the system) requires trade-offs on which
124
+ program parts and aspects to inspect and collect data from.
125
+ Therefore, it depends on many factors, for example bug classes and target
126
+ systems, to what degree the process of debugging can and should be automated or
127
+ optimized.
111
128
112
129
[]($section.id("practice"))
113
- ### Practical methods with tradeoffs
130
+ ### Practical methods with trade-offs
114
131
115
132
Usually semantics are not "set into stone" inclusive or do not offer
116
- sufficient tradeoffs , so formal verification is rarely an option aside of
133
+ sufficient trade-offs, so formal verification is rarely an option aside from
117
134
the use of models as a design and planning tool or for fail-safe program functionality.
118
135
Depending on the domain and environment, problematic behavior of hardware
119
- or software components must be more or less 1. avoided and 2. traceable
136
+ or software components must be more or less 1 avoided and 2 traceable
120
137
and there exist various (domain) metrics as decision aids.
121
138
Very well-designed systems explain to users how to debug bugs regarding
122
139
**functional behavior**, **time behavior** with **internal and
@@ -148,34 +165,55 @@ Memory and slowdown numbers are only reported for LLVM sanitizers. Zig does not
148
165
report its own numbers yet (2025-01-11). Slowdown for dynamic sanitizer versions
149
166
increases by a factor of 10 compared to the listed static usage costs.
150
167
The leak sanitizer only checks for memory leaks, not for leaks of other system resources.
151
- Besides various Kernel specific tools to track system resources,
168
+ Besides various kernel-specific tools to track system resources,
152
169
Valgrind can be used on POSIX systems for non-memory resources and
153
170
Application Verifier for Windows.
154
171
Address and thread sanitizers cannot be combined in Clang and combined usage
155
172
of the Zig implementation is limited by virtual memory usage.
156
173
In Zig, aliasing can currently not be sanitized against, whereas in Clang only
157
174
type-based aliasing can be sanitized, without any numbers reported by LLVM yet.
158
175
159
- [TODO: requirements on system design for formal verification vs debugging.]::
160
- [no surprise rule: core system enabling debugging (in any form) must be correct]::
161
- [to the degree necessary.]::
162
- [TODO: good argumentation on ignoring linker speak, language footguns etc.]::
163
- [1.Bugs related to functional behavior.]::
164
- [2.Bugs related to time behavior.]::
165
- [3.Internal and external system resources.]::
176
+ Besides adjusting source code semantics via 1 sanitizers, one can 2 make one's own dynamic
177
+ source code adjustments or 3 use tooling that uses kernel APIs to trace and optionally
178
+ 3.1 check the traced information at run-time or 3.2 check kernel APIs together with their underlying state at run-time.
179
+ Kernels may further simplify access to information; for example, the `proc` file
180
+ system simplifies access to process information.
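As a small illustration of how the `proc` file system exposes process information, the following sketch parses `/proc/self/status` on Linux. The helper names `parse_status` and `own_status` are hypothetical, and the code assumes the Linux layout of one `Key: value` pair per line; on systems without a Linux-style procfs it simply returns an empty result.

```python
from pathlib import Path
from typing import Dict


def parse_status(text: str) -> Dict[str, str]:
    """Parse the 'Key: value' lines of a /proc/<pid>/status file."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a key/value separator
            fields[key.strip()] = value.strip()
    return fields


def own_status() -> Dict[str, str]:
    """Status of the current process, or {} where procfs is unavailable."""
    path = Path("/proc/self/status")
    if not path.exists():
        return {}
    return parse_status(path.read_text())


if __name__ == "__main__":
    status = own_status()
    # Name and VmRSS are fields of the Linux status file; absent elsewhere.
    print(status.get("Name"), status.get("VmRSS"))
```

Reading plain text files needs no debugger privileges beyond file permissions, which is why such interfaces are convenient for lightweight observation, at the cost of being kernel-specific.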
181
+
182
+ TODO list standard Kernel tracing tooling, focus on dtrace
183
+ and drawback of no "works for all kernels" "trace processes"
184
+
185
+ TODO list standard Kernel tooling for tracing
186
+ TODO 3.1 list standard tooling for checking traced information
187
+
188
+ The following is a list of typical problems with simple solution tactics.
189
+ For simplicity, no virtual machine/emulator approaches are listed, since they
190
+ also affect performance and run-time behavior, (likely) leading to more complex
191
+ dynamic program analysis.
166
192
167
193
[]($section.id("uniform_execution_representation"))
168
194
### Uniform execution representation
169
195
170
196
As shown before, modern languages simplify detection or elimination of
171
197
memory problems and runtime detectable undefined behavior. So far undetectable
172
- undefined behavior may be detected, if backend optimizers are redesignede with
173
- according APIs. Detecting miscompilations requires strict formal reasoning of
174
- executing the source code semantics or formal verification of the compiler
175
- itself, which shall not be discussed here. This leaves hardware problems,
176
- kernel problems, resource leaks, freezes, performance problems and logic
177
- problems. TODO: what they have in common + motivation TODO: Uniform execution
178
- representation and queries over program execution.
198
+ undefined behavior may be automatically reduced if backend optimizers are
199
+ redesigned with corresponding reduction APIs.
200
+ Detecting miscompilations requires strict formal reasoning about the execution of the
201
+ source code semantics or formal verification of the compiler itself,
202
+ which shall not be discussed here.
203
+ This leaves hardware problems, kernel problems, resource leaks, freezes,
204
+ performance problems and logic problems.
205
+
206
+ 1. Hardware problems are left out for simplicity.
207
+ 2. Resource leaks are a special case of platform problems, because the platform
208
+ provides the resources.
209
+ Automatically tracking resource leaks requires Valgrind-like logic over all
210
+ memory operations; reduction requires (limited) kernel object tracing.
211
+ Platform tracing solutions will always have trade-offs.
212
+ A complete solution for tracing user processes and related kernel logic is only
213
+ available as dtrace, with non-optimal performance.
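A very limited stand-in for kernel object tracing is to snapshot a resource table before and after an action and compare the two. The sketch below does this for file descriptors only. It is an illustration under assumptions, not a replacement for Valgrind or dtrace: the names `open_fds` and `leaked_fds` are hypothetical, only descriptors below an arbitrary limit of 256 are probed, and resources released by other threads in between would distort the result.

```python
import os
from typing import Callable, Set

FD_LIMIT = 256  # assumption: probing a small descriptor range is enough here


def open_fds() -> Set[int]:
    """File descriptors currently open in this process, found by probing."""
    fds = set()
    for fd in range(FD_LIMIT):
        try:
            os.fstat(fd)  # raises OSError if fd is not an open descriptor
        except OSError:
            continue
        fds.add(fd)
    return fds


def leaked_fds(action: Callable[[], object]) -> Set[int]:
    """Descriptors that are open after `action` but were not open before."""
    before = open_fds()
    result = action()  # keep the result alive until the second snapshot
    after = open_fds()
    del result
    return after - before
```

The snapshot only says that a descriptor stayed open, not where it was acquired, which is the information a tracing solution would have to add.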
214
+
215
+ TODO: (currently unused) what they have in common + motivation
216
+ TODO: Uniform execution representation and queries over program execution.
179
217
180
218
[]($section.id("abstraction_problems"))
181
219
### Abstraction problems during problem isolation
@@ -185,6 +223,7 @@ TODO: origin detection, isolation and abstraction
185
223
[]($section.id("possible_implementations"))
186
224
### Possible implementations
187
225
188
- TODO: (query system data vs modify the system vs other) to validate approaches;
226
+ TODO: (currently unused)
227
+ query system data vs modify the system vs other to validate approaches;
189
228
Program modification and validation language, query language and alternatives.
190
229