Commit 6eb424d

Update debugging article.
This takes a lot of time to write, but is very interesting on applicable methods.
1 parent 5f35aaf commit 6eb424d

2 files changed: +143 -47 lines changed

content/articles/optimal_debugging.smd

Lines changed: 68 additions & 29 deletions
@@ -11,11 +11,24 @@
 This article is intended as overview of debugging techniques and motivation for
 uniform execution representation and setup to efficiently mix and match the
 appropriate technique for system level debugging with focus on statically
-optimizing compiler languages to keep complexity and scope limited. The author
-accepts the irony of such statements by "C having no ABI"/many systems in
+optimizing compiler languages to keep complexity and scope limited.
+The reader may notice that there are several documented deficits
+across platforms and tooling on documentation or functionality, which will be improved.
+The author accepts the irony of such statements by "C having no ABI"/many systems in
 practice having no ABI, but reality is in this text simplified for brevity and
 sanity.

+Section 1 (theory) feels complete, but are planned to be more dense to
+become an appropriate definition for bug, debugging and debugging process.
+Section 2 (practical) is tailored towards non micro Kernels, which are based
+on process abstraction, but is currently missing content and scalability numbers
+for tooling.
+The idea is to provide understanding and numbers to estimate for system design,
+1 if formal proof of correctness is feasible and on what parts,
+2 problems and methods applicable for dynamic program analysis.
+Followup sections will be on speculative and more advanced ideas, which
+should be feasible based on numbers.
+
 - 1.[Theory of debugging](#theory)
 - 2.[Practical methods with trade-offs](#practice)
 - 3.[Uniform execution representation](#uniform_execution_representation)
@@ -46,8 +59,7 @@ The process of debugging means to use static and dynamic program analysis
 and its automation and adaption to speed up bug (classes) elimination for the
 (classes of) target systems.

-One can generally categorize methods into the following list (**asoul**)
-**a**utomate, **s**implify, **o**bserve, **u**nderstand, **l**earn)
+One can generally categorize methods into the following list [**a**utomate, **s**implify, **o**bserve, **u**nderstand, **l**earn] (**asoul**)
 - **a**utomate the process to minimize errors/oversights during debugging,
 against probabilistic errors, document the process etc
 - **s**implify and isolate system components and changes over time
@@ -58,7 +70,7 @@ One can generally categorize methods into the following list (**asoul**)
 for example user-space processes, kernel, build system, compiler, source code, linker,
 object code, assembly, hardware etc

-with the fundamental constrains being (**feel**)
+with the fundamental constrains being [**f**inding, **ee**nsuring, **l**imited] (**feel**)
 - **f**inding out correct system components semantics
 - **ee**nsuring deterministic reproducibility of the problem
 - **l**imited time and effort
@@ -88,10 +100,10 @@ semantics are then typically a mix of
 - **Virtualisation** as **isolation or simplification** of a hardware- or software
 subsystem to reduce system complexity.

-Isolation and simplification are typically applied on all potential
+Further, isolation and simplification are typically applied on all potential
 sub-components including, but not limited to hardware, code versioning
 including dependencies, source system, compiler framework and target system.
-Typical methods are
+Methods are usually
 - **Bisection** via git or the actual binaries.
 - **Reduction** via removal of system parts or trying to reproduce with
 (a minimal) example.
@@ -101,22 +113,27 @@ Typical methods are
 **Debugging** is domain- and design-specific and **relies on** core component(s)
 of **the to be debugged system to provide necessary debug functionality**.
 For example, software based hardware debugging relies on interfaces to
-the hardware like JTAG, Kernel debugging on Kernel compilation or
+the hardware like JTAG, kernel debugging on kernel compilation or
 configuration and elevated (user), user-space debugging on process and
 user permissions, system configuration or a child process to be debugged
 on Posix systems via `ptrace`.

-It depends on many factors, for example bug classes and target systems, to what degree the process of
-debugging can and should be automated or optimized.
+Without costly hardware devices to trace and physical access to the computing unit
+for exact recording of the system behavior including time information,
+dynamic program analysis (to run the system) requires trade-offs on what
+program parts and aspects to inspect and collect data from.
+Therefore, it depends on many factors, for example bug classes and target
+systems, to what degree the process of debugging can and should be automated or
+optimized.

 []($section.id("practice"))
-### Practical methods with tradeoffs
+### Practical methods with trade-offs

 Usually semantics are not "set into stone" inclusive or do not offer
-sufficient tradeoffs, so formal verification is rarely an option aside of
+sufficient trade-offs, so formal verification is rarely an option aside of
 usage of models as design and planning tool or for fail-safe program functionality.
 Depending on the domain and environment, problematic behavior of hardware
-or software components must be more or less 1. avoided and 2. traceable
+or software components must be more or less 1 avoided and 2 traceable
 and there exist various (domain) metrics as decision helper.
 Very well designed systems explain users how to debug bugs regarding to
 **functional behavior**, **time behavior** with **internal and
@@ -148,34 +165,55 @@ Memory and slowdown numbers are only reported for LLVM sanitizers. Zig does not
 report own numbers yet (2025-01-11). Slowdown for dynamic sanitizer versions
 increases by a factor of 10x in contrast to the listed static usage costs.
 The leak sanitizer does only check for memory leaks, not other system resources.
-Besides various Kernel specific tools to track system resources,
+Besides various kernel specific tools to track system resources,
 Valgrind can be used on Posix systems for non-memory resources and
 Application Verifier for Windows.
 Address and thread sanitizers can not be combined in Clang and combined usage
 of the Zig implementation is limited by virtual memory usage.
 In Zig, aliasing can currently not be sanitized against, whereas in Clang only
 typed based aliasing can be sanitized without any numbers reported by LLVM yet.

-[TODO: requirements on system design for formal verification vs debugging.]::
-[no surprise rule: core system enabling debugging (in any form) must be correct]::
-[to the degree necessary.]::
-[TODO: good argumentation on ignoring linker speak, language footguns etc.]::
-[1.Bugs related to functional behavior.]::
-[2.Bugs related to time behavior.]::
-[3.Internal and external system resources.]::
+Besides adjusting source code semantics via 1 sanitizers, one can do 2 own dynamic
+source code adjustments or use 3 tooling that use kernel APIs to trace and optionally
+3.1 run-time check information or 3.2 run-time check kernel APIs and with underlying state.
+Kernels further may simplify access to information, for example the `proc` file
+system simplifies access to process information.
+
+TODO list standard Kernel tracing tooling, focus on dtrace
+and drawback of no "works for all kernels" "trace processes"
+
+TODO list standard Kernel tooling for tracing
+TODO 3.1 list standard tooling for checking traced information
+
+The following is a list of typical problems with simple solution tactics.
+For simplicity no virtual machine/emulator approaches are listed, since they
+also affect performance and run-time behavior leading (likely) to more complex
+dynamic program analysis.

 []($section.id("uniform_execution_representation"))
 ### Uniform execution representation

 As it was shown before, modern languages simplify detection or elimination of
 memory problems and runtime detectable undefined behavior. So far undetectable
-undefined behavior may be detected, if backend optimizers are redesignede with
-according APIs. Detecting miscompilations requires strict formal reasoning of
-executing the source code semantics or formal verification of the compiler
-itself, which shall not be discussed here. This leaves hardware problems,
-kernel problems, resource leaks, freezes, performance problems and logic
-problems. TODO: what they have in common + motivation TODO: Uniform execution
-representation and queries over program execution.
+undefined behavior may be automatically reduced, if backend optimizers are
+redesigned with according reduction APIs.
+Detecting miscompilations requires strict formal reasoning of executing the
+source code semantics or formal verification of the compiler itself,
+which shall not be discussed here.
+This leaves hardware problems, kernel problems, resource leaks, freezes,
+performance problems and logic problems.
+
+1. leave hardware problems out for simplicity.
+2. resource leaks are a special case of platform problems, because platform
+provides resources.
+Automatically tracking resource leaks requires Valgrind logic over all
+memory operations, reduction requires (limited) kernel object tracing.
+Tracing platform solutions will always have trade-offs.
+Complete solution tracing user process and related kernel logic is only
+available as dtrace with non-optimal performance.
+
+TODO: (currently unused) what they have in common + motivation
+TODO: Uniform execution representation and queries over program execution.

 []($section.id("abstraction_problems"))
 ### Abstraction problems during problem isolation
@@ -185,6 +223,7 @@ TODO: origin detection, isolation and abstraction
 []($section.id("possible_implementations"))
 ### Possible implementations

-TODO: (query system data vs modify the system vs other) to validate approaches;
+TODO: (currently unused)
+query system data vs modify the system vs other to validate approaches;
 Program modification and validation language, query language and alternatives.

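Illustrative sketch, not part of this commit: the **Bisection** method named in the `@@ -88,10 +100,10 @@` hunk can be read as a binary search over an ordered list of revisions. The sketch below assumes that the revisions are ordered oldest to newest, that the first one is known good and the last one known bad, and that every revision after the first bad one is also bad. The names `first_bad` and `is_bad` are hypothetical and do not come from the article.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(revisions: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the earliest revision for which is_bad() is true.

    Assumptions (not checked): revisions[0] is good, revisions[-1] is bad,
    and the good/bad property flips exactly once along the sequence.
    """
    if len(revisions) < 2:
        raise ValueError("need at least one known-good and one known-bad revision")
    lo, hi = 0, len(revisions) - 1  # lo: known good index, hi: known bad index
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # the first bad revision is at mid or earlier
        else:
            lo = mid  # the first bad revision is after mid
    return revisions[hi]
```

`git bisect` automates the same search over commit history; with `git bisect run <script>` the script's exit status plays the role of `is_bad`. The number of probes grows logarithmically with the number of revisions, which is why the method stays usable for long histories as long as the problem reproduces deterministically.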