Defective C++
{}
This page summarizes the major defects of the C++ programming language
(listing all minor quirks would take an eternity). To be fair, some of the items /by themselves/
could be design choices, not bugs. For example, a programming language doesn't have to provide garbage collection.
It's the /combination/ of the things that makes them /all/ problematic.
For example, the lack of garbage collection makes C++ exceptions and operator overloading inherently defective. Therefore, the problems are not listed in the order of "importance"
(which is subjective anyway - different people are hit the hardest by different problems).
Instead, most defects are followed by one of their complementary defects, so that when a defect causes
a problem, the next defect in the list makes it worse.
`<!-- h2toc -->`
`<h2>No compile time encapsulation</h2>`
In naturally written C++ code, changing the private members of a class requires [7.4 recompilation] of
the code using the class. When the class is used to instantiate member objects of other classes,
the rule is of course applied recursively.
This makes C++ interfaces very unstable - a change
invisible at the interface level still requires rebuilding the calling code, which can be
very problematic when that code is not controlled by whoever makes the change.
So shipping C++ interfaces to customers can be [6.3 a bad idea].
Well, at least when all relevant code is controlled by the same team of people, the only problem is the frequent
rebuilds of large parts of it. This wouldn't be too bad by itself with almost any language,
but C++ has...
`<h2>Outstandingly complicated grammar</h2>`
"Outstandingly" should be interpreted literally, because /all popular languages/ have [http://en.wikipedia.org/wiki/Context-free_grammar context-free]
(or "nearly" context-free) grammars, while C++ has [http://en.wikipedia.org/wiki/Undecidability undecidable] grammar.
If you like compilers
and parsers, you probably know what this means. If you're not into this kind of thing, there's a
[10.19 simple example] showing the problem with parsing C++: is |AA BB(CC);| an object definition
or a function declaration? It turns out that the answer depends heavily on the code /before/ the statement -
the "context". This shows (on an intuitive level) that the C++ grammar is quite context-sensitive.
In practice, this means three things. First, C++ compiles slowly
(the complexity takes time to deal with). Second, when it doesn't compile,
the error messages are frequently incomprehensible
(the smallest error which a human reader wouldn't notice completely confuses the compiler).
And third, parsing C++ right is very hard, so different compilers will interpret
it differently, and tools like debuggers and IDEs periodically get awfully confused.
And /slow compilation/ interacts badly with /frequent recompilation/. The latter is caused
by the lack of encapsulation mentioned above, and the problem is amplified by the fact
that C++ has...
`<h2>No way to locate definitions</h2>`
OK, so before we can parse |AA BB(CC);|, we need to find out whether |CC| is defined as an object or a type.
So let's locate the definition of |CC| and move on, right?
This would work in most modern languages, in which |CC| is either defined in the same module
(so we've already compiled it), or it is imported from another module (so either we've already compiled it, too,
or this must be the first time we bump into that module - so let's compile it now, /once/, but of course /not/
the next time we'll need it). So to compile a program, we need to compile each module, once,
no matter how many times each module is used.
In C++, things are [23.10 different] - there are no modules. There are /files/, each of which can contain many
different definitions or just small parts of definitions, and there's /no way/ to tell in which
files |CC| is defined, or which files must be parsed in order to "understand" its definition.
So who is responsible for arranging all those files into a sensible string of C++ code?
/You/, of course! In each compiled file, you |#include| a bunch of header files
(which themselves include other files); the |#include| directive basically issues
a copy-and-paste operation to the C preprocessor, inherited by C++ without changes.
The compiler then parses the result of all those copy-and-paste operations. So to compile a program,
we need to compile each file /the number of times it is used in other files/.
This causes two problems. First, it multiplies the long time it takes to compile C++ code
by the number of times it's used in a program. Second, the only way to figure out what /should/ be recompiled after a change to the code is to
check which of the |#include| files have been changed since the last build. The set of files to rebuild generated by this
inspection is usually a superset of the files that /really/ must be recompiled according to the
C++ rules of dependencies between definitions. That's because most files |#include| definitions
they don't really need, since people can't spend all their time removing redundant inclusions.
Some compilers support "precompiled headers" - saving the result of the parsing of "popular"
header files to some binary file and quickly loading it instead of recompiling from scratch.
However, this only works well with definitions that almost never change, typically third-party libraries.
And now that you've waited all that time until your code base recompiles, it's time to run and test the program,
which is when the next problem kicks in.
`<h2>No run time encapsulation</h2>`
Programming languages have rules defining "valid" programs - for example, a valid program shouldn't
divide by zero or access the 7th element of an array of length 5. A valid program isn't necessarily correct
(for example, it can delete a file when all you asked was to move it). However, an invalid program is necessarily
incorrect (there is no 7th element in the 5-element array). The question is, what happens when an
invalid program demonstrates its invalidity by performing a meaningless operation?
If the answer is something like "an exception is raised", your program runs in a managed environment.
If the answer is "anything can happen", your program runs somewhere else. In particular, C and C++
are not designed to run in managed environments (think about pointer casts), and while in theory
they could run there, in practice all of them run elsewhere.
So what happens in a C++ program with the 5-element array? Most frequently, you access something
at the address that /would/ contain the 7th element, but since there isn't any, it contains something
else, which just happens to be located there. Sometimes you can tell from the source code what that is,
and sometimes you can't. Anyway, you're /really/ lucky if the program crashes; because if it keeps
running, you'll have a hard time understanding why it ends up crashing or misbehaving /later/.
If it doesn't scare you (you've debugged a couple of buffer overflows and feel confident), wait until you get to many megabytes
of machine code and many months of execution time. That's when the real fun starts.
Now, the ability of a piece of code to modify a random object when in fact it tries to access
an unrelated array indicates that [7.2 C++ has no run time encapsulation]. Since it doesn't
have compile time encapsulation, either, one can wonder why it calls itself object-oriented. Two
possible answers are warped perspective and marketing
(these aren't mutually exclusive).
But if we leave the claims about being object-oriented aside, the fact that a language runs
in unmanaged environments can't really be called a "bug". That's because managed environments
check things at run time to prevent illegal operations, which translates to a certain (though frequently overestimated)
performance penalty. So when performance isn't that important, a managed environment is the way to go.
But when it's critical, you just have to deal
with the difficulties in debugging. However, C++ (compared to C, for example) makes /that/ much harder than it already has to be, because there are...
`<h2>No binary implementation rules</h2>`
When an invalid program finally crashes (or enters an infinite loop, or goes to sleep forever), what you're left
with is basically the binary snapshot of its state (a common name for it is a "core dump"). You have to make sense of it in order to find the
bug. Sometimes a debugger will show you the call stack at the point of crash; frequently that information is
overwritten by garbage. Other things which can help the debugger figure things out may
be overwritten, too.
Now, figuring out the meaning of partially corrupted memory snapshots is definitely not the most pleasant way to spend one's
time. But with unmanaged environments you /have/ to do it and it /can/ be done, /if/ you know how your
source code maps to binary objects and code. Too bad that with C++, there's a ton of these rules and
each compiler uses different ones. Think about exception handling or various kinds of inheritance or
virtual functions or the layout of standard library containers. In C, there are no standard binary
implementation rules, either, but the language is an order of magnitude simpler and in practice compilers
use the same rules. Another thing making C++ code hard to debug is the above-mentioned complicated grammar,
since debuggers frequently can't deal with many language features
(place breakpoints in templates, parse pointer casting commands in data display windows, etc.).
The lack of a standard ABI (application binary interface) has another consequence - it makes shipping
C++ interfaces to other teams \/ customers impractical since the user code won't work unless it's compiled
with the same tools and build options. We've already seen another source of this problem - the instability
of binary interfaces due to the lack of compile time encapsulation.
The two problems - with debugging C++ code and with using C++ interfaces - don't show up until your project
grows complicated in terms of code and \/ or human interactions, that is, until it's too late.
But wait, couldn't you deal with both problems programmatically? You could generate C or other
wrappers for C++ interfaces /and/ write programs automatically shoveling through core dumps and deciphering
the non-corrupted parts, using something called reflection.
Well, actually, you couldn't, not in a reasonable amount of time - there's...
`<h2>No reflection</h2>`
It is impossible to programmatically iterate over the methods or the attributes or the base classes of a class in a
portable way defined by the C++ standard. Likewise, it is impossible to programmatically determine the type of an object
(for dynamically allocated objects, this can be justified to an extent by performance penalties of RTTI, but
not for statically allocated globals, and if you could /start/ at the globals, you could decipher lots
of memory pointed to by them). Features of this sort - when a program can access the structure of programs,
in particular its own structure - are collectively called reflection, and C++ doesn't have it.
As mentioned above, this makes generating wrappers for C++ classes and shoveling through memory snapshots a pain,
but that's a small fraction of the things C++ programmers are missing due to this single issue.
Wrappers can be useful not only to work around the problem of shipping C++ interfaces - you could automatically handle things
like remote procedure calls, logging method invocations, etc. A very common application of reflection is
[15.14 serialization] - converting objects to byte sequences and vice versa. With reflection, you can
handle it for all types of objects with the same code - you just iterate over the attributes of compound objects,
and only need special cases for the basic types. In C++, you must maintain serialization-related code
and\/or data structures for every class involved.
But perhaps we could deal with /this/ problem programmatically then? After all, debuggers do manage
to display objects somehow - the debug information, emitted in the format supported by your tool chain,
describes the members of classes and their offsets from the object base pointer and all that sort of meta-data.
If we're stuck with C++, perhaps we could parse this information and thus have non-standard, but working reflection?
Several things make this pretty hard - not all compilers can produce debug information /and/
optimize the program aggressively enough for a release build,
not all debug information formats are documented, and then in C++, we have a...
`<h2>Very complicated type system</h2>`
In C++, we have standard and compiler-specific built-in types, structures, enumerations, unions, classes with single,
multiple, virtual and non-virtual inheritance, |const| and |volatile| qualifiers, pointers, references and arrays,
|typedef|s, global and member functions and function pointers, and /templates/, which can have specializations on (again) /types/ (or integral constants),
and you can "partially specialize" templates by /pattern matching their type structure/
(for example, have a specialization for |std::vector<MySillyTemplate<T> >| for arbitrary values of |T|), and each template can have base classes
(in particular, it can be /derived from its own instantiations recursively/, which is a /well-known practice documented
in books/), and inner |typedef|s, and... We have lots of kinds of types.
Naturally, representing the types used in a C++ program, say, in debug information, is not an easy task.
A trivial yet annoying manifestation of this problem is the expansion of |typedef|s done by debuggers when they show
objects (and compilers when they produce error messages - another reason why these are so cryptic). You may think it's a
|StringToStringMap|, but only until the tools enlighten you - it's actually more of a...
@
\/\/ don't read this, it's impossible. just count the lines
std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >,
std::basic_string<char, std::char_traits<char>, std::allocator<char> >,
std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> >
>, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>,
std::allocator<char> > const, std::basic_string<char, std::char_traits<char>,
std::allocator<char> > > > >
@
But wait, there's more! C++ supports a wide variety of explicit and implicit /type conversions/, so now we
have a nice set of rules describing the Cartesian product of all those types, specifically, how conversion should
be handled for each pair of types. For example, if your function accepts |const std::vector<const char*>&|
(which is supposed to mean "a reference to an immutable vector of pointers to immutable built-in strings"), and I have
a |std::vector<char*>| object ("a mutable vector of mutable built-in strings"), then
[18.1 I can't pass it to your function] because the types aren't convertible.
You /have/ to admit that it /doesn't make any sense/, because your function guarantees
that it won't change anything, and I guarantee that I don't even mind having anything changed,
and still the C++ type system gets in the way and the only sane workaround is to /copy the vector/.
And this is an /extremely simple/ example - no [25.13 virtual inheritance], no user-defined conversion operators, etc.
But conversion rules by themselves are still not the worst problem with the complicated type system. The worst problem is the...
`<h2>Very complicated type-based binding rules</h2>`
Types lie at the core of the C++ /binding rules/. "Binding" means "finding the program entity
corresponding to a name mentioned in the code".
When the C++ compiler compiles something like |f(a,b)| (or even |a+b|), it relies on the
argument types to figure out which version of |f| (or |operator+|) to call. This includes
overload resolution (is it |f(int,int)| or |f(int,double)|?), the handling of function template specializations
(is it |template<class T> void f(vector<T>&,int)| or |template<class T> void f(T,double)|?),
and the argument-dependent lookup (ADL) in order to figure out the namespace
(is it |A::f| or |B::f|?).
When the compiler "succeeds" (translates source code to object code), it doesn't mean that /you/ are equally successful
(that is, you think |a+b| called what the compiler thought it called). When the compiler "fails"
(translates source code to error messages), most humans also fail
(to understand these error messages; multiple screens listing all available overloads
of things like |operator<<| are [15.1 less than helpful]). By the way, the C++ FAQ
has very few items related to the [35.11 unbelievably complicated
static binding], like
overload resolution or ADL or template specialization. Presumably people get too depressed to ask any questions
and silently give up.
In short, the complicated type system interacts very badly with /overloading/ - having multiple
functions with the same name and having the compiler figure out which of them to use
based on the argument types
(don't confuse it with /overriding/ - |virtual| functions, though very far from perfect, do follow rules
[20.1 quite sane] by C++ standards). And probably the worst kind of overloading is...
`<h2>Defective operator overloading</h2>`
C++ [13.1 operator overloading] has all the problems of C++ function overloading (incomprehensible overload resolution rules),
and then some. For example, overloaded operators have to return their results by value - naively returning references
to objects allocated with |new| would cause temporary objects to "leak" when code like |a+b+c| is evaluated. That's
because C++ doesn't have [16.1 garbage collection], since that, folks, is inefficient. Much better to have
your code copy massive temporary objects and hope to have them optimized out by our friend the clever compiler.
Which, of course,
[10.9 won't happen] any time soon.
Like several other features in C++, operator overloading is not necessarily a bad thing /by itself/ - it just happens to interact really badly
with other things in C++. The lack of automatic memory management is one thing making operator overloading less than useful. Another such thing is...
`<h2>Defective exceptions</h2>`
Consider error handling in an overloaded operator or a [17.2 constructor]. You can't use the return value, and setting\/reading error
flags may be quite cumbersome. How about throwing an exception?
This could be a good idea in some cases if [17.1 C++ exceptions] were any good. They aren't, and can't be - as usual, because of another
C++ "feature", the oh-so-efficient manual memory management. If we use exceptions, we have to write exception-safe code
- code which frees all resources when the control is transferred from the point of failure (|throw|) to the point
where explicit error handling is done (|catch|). And the vast majority of "resources" happens to be /memory/,
which is managed manually in C++. To solve this, you are supposed to use [17.4 RAII], meaning that all pointers have to be "smart"
(be wrapped in classes freeing the memory in the destructor, and then you have to design their copying semantics, and...).
Exception safe C++ code is almost infeasible to achieve in a non-trivial program.
Of course, C++ exceptions have other flaws, following from /still other/ C++ misfeatures. For example, the above-mentioned lack
of reflection in the special case of exceptions means that when you catch an exception, you
[17.7 can't] get the call stack
describing the context where it was thrown. This means that debugging illegal pointer dereferencing may be easier
than figuring out why an exception was thrown, since a debugger /will/ list the call stack in many cases of the former.
At the bottom line, |throw\/catch| are about as useful as |longjmp\/setjmp|
(BTW, the former typically runs faster, but its mere /existence/ makes /the rest of the code/ run slower, which is almost [17.1 never] acknowledged by C++ aficionados).
So we have two features, each with its own flaws, and no interoperability between them. This is true
for the vast majority of C++ features - most are...
`<h2>Duplicate facilities</h2>`
If you need an [10.5 array] in C++, you can use a C-like |T arr[]| or a C++ |std::vector<T>| or any of the array classes written
before |std::vector| appeared in the C++ standard. If you need a [13.6 string], use |char*| or |std::string| or
any of the pre-standard string classes. If you need to take [8.6 the address of an object],
you can use a C-like pointer, |T*|, or a C++ reference, |T&|. If you need to [10.12 initialize] an object,
use C-like aggregate initialization or C++ constructors. If you need to [15.1 print] something, you can
use a C-like |printf| call or a C++ |iostream| call. If you need to [35.1 generate] many similar definitions
with some parameters specifying the differences between them, you can use C-like macros or C++ templates. And so on.
Of course you can do the same thing in many ways in almost any language. But the C++ feature duplication is quite special.
First, the many ways to do the same thing are usually not purely syntactic options directly supported by the compiler -
you can compute |a+b| with |a-b*-1|, but that's different from having |T*| and |T&| in the same language. Second,
you probably noticed a pattern - C++ adds features duplicating functionality already in C. This is bad by itself,
because the features don't interoperate well
(you can't |printf| to an |iostream| and vice versa, code mixing |std::string| and |char*| is [13.3 littered] with casts
and calls to |std::string::c_str|, etc.). This is made even worse by the /pretty amazing/ fact that the new C++
features are actually /inferior/ to the old C ones in many aspects.
And the best part is that C++ devotees /dare/ to refer to the C features as
[6.15 evil], and frequently will actually resort to finger pointing and name calling when someone uses them in C++ code
(not to mention using plain C)! And /at the same time/ they ([6.11 falsely]) claim that C++ is compatible with C and
it's one of its strengths (why, if C is so evil?). The [6.2 real] reason to leave the C syntax in C++ was
of course marketing - there's absolutely NO technical reason to /parse C-like syntax/ in order to
/work with existing C code/ since that code can be compiled separately. For example, mixing C and [http://www.digitalmars.com/d/ the
D programming language] isn't harder than [32.1 mixing C and C++]. D is a good example since its stated goals are similar
to those of C++, but almost all other popular languages have ways to work with C code.
So IMO all that old syntax was kept for strictly commercial purposes - to market the language to non-technical
managers or programmers who should have known better and didn't understand the difference between "syntax" and "compatibility with existing code" and
simply asked whether the old code will compile with this new compiler. Or maybe they thought it would be easier
to learn a pile of new syntax when you also have the (smaller) pile of old syntax than when you have just the new syntax.
Either way, C++ became widespread by exploiting misconceptions.
Well, it doesn't matter anymore why they kept the old stuff. What matters is that the new stuff isn't really new, either - it's obsessively
built in ways exposing the C infrastructure underneath it. And /that/ is purely a wrong design decision, made
without an axe to grind. For example, in C++ there's...
`<h2>No high-level built-in types</h2>`
C is a pretty low-level language. Its atomic types are supposed to fit into machine registers
(usually one, sometimes two of them). The compound types are designed to occupy a flat chunk of memory
of a size known at compile time.
This design has its virtues. It makes it relatively easy to estimate the performance & resource consumption of code.
And when you have hard-to-catch low-level bugs, which sooner or later happens in unmanaged environments, having
a relatively simple correspondence between source code definitions and machine memory helps to debug the problem.
However, in a high-level language, which is supposed to be used when the development-time-cost \/ execution-time-cost ratio
is high, you need things like resizable arrays, key-value mappings, integers that don't overflow and other
such gadgets. Emulating these in a low-level language is possible, but is invariably painful since the tools
don't understand the core types of your program.
C++ doesn't add any built-in types to C /[corr.2 (correction)]/. All higher-level types must be implemented as user-defined classes and templates,
and this is when the defects of C++ classes and templates manifest themselves in their full glory.
The lack of syntactic support for higher-level types
(you can't initialize |std::vector| with |{1,2,3}| or initialize an |std::map| with something like |{"a":1,"b":2}|
or have large integer constants like |3453485348545459347376|) is only a small part of the problem.
Cryptic multi-line or /multi-screen/ [35.17 compiler error messages],
debuggers that can't display the standard C++ types and [35.12 slow build times] unheard of anywhere outside of the C++ world
are the larger part of the problem. For example, here's a simple piece of code using the C++ standard library followed by an error message
produced from it by gcc 4.2.0. Quiz: what's the problem?
@
\/\/ the code
typedef std::map<std::string,std::string> StringToStringMap;
void print(const StringToStringMap& dict) {
for(StringToStringMap::iterator p=dict.begin(); p!=dict.end(); ++p) {
std::cout << p->first << " -> " << p->second << std::endl;
}
}
\/\/ the error message
test.cpp: In function 'void print(const StringToStringMap&)':
test.cpp:8: error: conversion from
'std::_Rb_tree_const_iterator<std::pair<const std::basic_string<char,
std::char_traits<char>, std::allocator<char> >, std::basic_string<char,
std::char_traits<char>, std::allocator<char> > > >' to non-scalar type
'std::_Rb_tree_iterator<std::pair<const std::basic_string<char,
std::char_traits<char>, std::allocator<char> >, std::basic_string<char,
std::char_traits<char>, std::allocator<char> > > >' requested
@
The decision to avoid new built-in types yields other problems, such as the [17.6 ability] to throw anything, but
without the ability to /catch/ it later. |class Exception|, a built-in base class for all exception classes treated specially by the compiler,
could solve this problem with C++ exceptions (but not others). However, the most costly problem with having no
new high-level built-in types is probably the lack of easy-to-use containers. But to have those, we need more
than just new built-in types and syntax in the C++ compiler. Complicated data structures can't be manipulated
easily when you only have...
`<h2>Manual memory management</h2>`
Similarly to low-level built-in types, C++ manual memory management is inherited from C without changes
(but with the mandatory addition of duplicate syntax - |new\/delete|, which normally call |malloc\/free|
but don't [16.5 have to] do that, and of course can be [11.10 overloaded]).
Similarly to the case with low-level built-in types, what makes sense for a low-level language doesn't work
when you add higher-level features. Manual memory management is incompatible with features such as [17.1 exceptions] & [13.1 operator overloading],
and makes working with non-trivial data structures very hard, since you have to worry about the life cycles
of objects so they won't leak or die while someone still needs them.
The most common solution is [11.1 copying] -
since it's dangerous to point to an object which can die before we're done with it, make yourself a copy
and become an "owner" of that copy to control its life cycle. An "owner" is a C++ concept not represented
in its syntax; an "owner" is the object responsible for deallocating a dynamically allocated chunk
of memory or some other resource. The standard practice in C++ is to assign each "resource" (a fancy name for memory, most of the time)
to an owner object, which is supposed to prevent resource leaks.
What it doesn't prevent is access to dead objects; we have copying for that.
Which is slow and doesn't work when you need many pointers to /the same/ object
(for example, when you want other modules to see your modifications to the object).
An alternative solution to copying is using "smart" pointer classes, which could [16.26 emulate] automatic memory management
by maintaining [16.22 reference counts] or what-not. To implement the pointer classes for the many different types
in your program, you're encouraged to use...
`<h2>Defective metaprogramming facilities</h2>`
There are roughly two kinds of metaprogramming: code that generates other code and code that processes other code.
The second kind is practically impossible to do with C++ code - you can't reliably process source code due to the extremely
complicated grammar and you can't portably process compiled code because there's no reflection. So this section
is about the first kind - code generation.
You can generate C++ code from within a C++ program using C macros and C++ templates. If you use macros, you risk
getting [6.15 clubbed to death] by C++ fanatics. Their irrational behavior left aside, these people do have a point -
C macros are pretty lame. Too bad templates are probably even worse. They are
[35.1 limited in ways macros aren't] (however, the opposite is also true). They [35.12 compile forever].
Being the only way to do metaprogramming, they are routinely [35.2 abused] to do things they weren't designed for.
And they are a
[35.16 rats' nest] of bizarre syntactic problems.
That wouldn't necessarily be so bad if C++ didn't /rely/ on metaprogramming for doing essential programming tasks.
One reason C++ has to do so is that in C++, the common practice is to use static binding (overload resolution, etc.)
to implement polymorphism, not dynamic binding. So you can't take an arbitrary object at run time and print it,
but in many programs you /can/ take an arbitrary /type/ at compile time and print objects of this type. Here's
one common (and broken) application of metaprogramming - the ultimate purpose is to be able to print arbitrary
objects /at run time/:
@
\/\/ an abstract base class wrapping objects of arbitrary types.
\/\/ there can be several such classes in one large project
struct Obj {
virtual void print(std::ostream&) const = 0;
};
template<class T> struct ObjImpl : Obj {
T wrapped;
virtual void print(std::ostream& out) const { out << wrapped; }
};
\/\/ now we can wrap int objects with ObjImpl<int> and string objects
\/\/ with ObjImpl<std::string>, store them in the same collection of Obj*
\/\/ and print the entire collection using dynamic polymorphism:
void print_them(const std::vector<Obj*>& objects) {
for(int i=0; i<(int)objects.size(); ++i) {
objects[i]->print(std::cout); \/\/ prints wrapped ints, strings, etc.
std::cout << std::endl;
}
}
@
Typically there are 10 more layers of syntax involved, but you get the idea. This sort of code doesn't really work
because it requires all relevant overloads of |operator<<| to be visible /before/ the point where |ObjImpl|
is defined, and that doesn't happen unless you routinely sort your |#include| directives according to that rule.
Some compilers will compile the code correctly with the rule violated, some will complain, some will silently generate
wrong code.
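For completeness, here's how the pattern is used when the include order does work out. This is a self-contained sketch repeating the declarations above; the constructor, the virtual destructor, and printing into a string stream (so the output can be inspected) are additions for the sake of the example:

```cpp
#include <sstream>
#include <string>
#include <vector>

// the wrapper pattern from above, made self-contained
struct Obj {
    virtual void print(std::ostream&) const = 0;
    virtual ~Obj() {}
};

template<class T> struct ObjImpl : Obj {
    T wrapped;
    ObjImpl(const T& w) : wrapped(w) {}
    virtual void print(std::ostream& out) const { out << wrapped; }
};

// prints wrapped ints, strings, etc. using dynamic polymorphism
std::string print_them(const std::vector<Obj*>& objects) {
    std::ostringstream out;
    for (int i = 0; i < (int)objects.size(); ++i) {
        objects[i]->print(out);
        out << '\n';
    }
    return out.str();
}
```

The whole point is that |print_them| compiles once, while |ObjImpl<T>::print| is generated per type - and each such generation requires the right |operator<<| to be visible at that point.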
But the most basic reason to rely on the poor C++ metaprogramming features for everyday tasks
is the above-mentioned ideological decision to avoid adding high-level built-in types. For example, templates
are at the core of the...
`<h2>Unhelpful standard library</h2>`
Most things defined by the C++ standard library are templates, and relatively sophisticated ones, causing
the users to deal with quite sophisticated manifestations of the problems with templates, discussed above.
In particular, a special program called [http://www.bdsoft.com/tools/stlfilt.html STLFilt] exists
for /decrypting the error messages/ related to the C++ standard library.
Too bad it doesn't patch the debug information in a similar way.
Another problem with the standard library is all the functionality that's not there.
A large part of the library duplicates the functionality of the C standard library (which is itself available to C++ programs).
The main new thing is containers
("algorithms" like |max| and |adjacent_difference| don't count as [7.3 "functionality"] in my book). The standard library
doesn't support listing directories, opening GUI windows or network sockets. You may think that's because these things
are non-portable. Well, the standard library doesn't have matrices or regular expressions, either.
And when you use the standard library in your code, one reason it compiles slowly to a large binary image is
that the library extensively uses the...
`<h2>Defective inlining</h2>`
First, let's define the terms.
"Inlining" in the context of compilers refers to a technique for /implementing/ function calls
(instead of generating a sequence calling the implementation of the function, the compiler integrates that implementation
at the point where the call is made). [9.1 "Inlining" in the context of C++] refers to a way to /define/ functions in order to /enable/
(as opposed to "force") such implementation of the calls to the function (the decision whether to actually use the opportunity is made by the compiler).
Now, the major problem with this C++ way to enable inlining is that you have to place the definition of the function
in header files, and have it recompiled over and over again from source. This doesn't have to be that way - the
recompilation from source can be avoided by having higher-level object file formats
(the way it's done in [http://llvm.org LLVM] and [http://gcc.gnu.org gcc starting from version 4]). This approach -
link-time inlining - is one aspect of "whole program optimization" supported by modern compilers. But the recompilation
from source could also be avoided in simpler ways if C++ had a way to locate definitions instead of recompiling them, which,
as we've seen, it hasn't.
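Concretely, the "enabling" mechanism looks like this (|clamp| is a made-up example; the two "files" are shown in one listing):

```cpp
// --- util.h (a header included by many .cpp files) ---
// the full definition, not just a declaration, must be visible at
// every call site for the compiler to be able to inline the call:
inline int clamp(int x, int lo, int hi) {
    return x < lo ? lo : (x > hi ? hi : x);
}
// every translation unit including util.h recompiles this body from
// source; a link-time inliner working on a higher-level object file
// format could do the same job from already-compiled code instead
```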
The crude support for inlining, designed with a traditional implementation of a C tool chain in mind,
wouldn't be as bad if it wasn't /used all the time/. People define large functions inline for two reasons.
[10.9 Some] of them "care" (emotionally) about performance, but never actually measure it, and someone told them
that inlining speeds things up, and forgot to [9.3 tell] them how it can slow them down. Another reason is that
it's simply /annoying/ to define functions non-inline, since that way, you place the full function definition
in a |.cpp| file and its prototype in a |.h| file. So you write the prototype twice, /with small changes/
(for example, if a class method returns an object of a type itself defined in the class, you'll need an extra namespace
qualification in the |.cpp| file since you're now /outside of the namespace of the class/). Much easier
to just have the body written right in the |.h| file, making the code compile more slowly and recompile
more frequently (changing the function body will trigger a recompilation).
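The nested-return-type annoyance mentioned above looks like this (|Parser| and |Token| are made-up names; both "files" are shown in one listing):

```cpp
// --- parser.h ---
class Parser {
public:
    struct Token { int kind; };
    Token next();            // prototype only; the body is in the .cpp
};

// --- parser.cpp ---
// outside the class, the return type needs the extra qualification -
// Parser::Token, not just Token - because here we're no longer
// inside the scope of the class:
Parser::Token Parser::next() {
    Token t = { 7 };         // inside the body, the scope is back
    return t;
}
```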
And you don't even need to actually /write/ any inline functions to get most of their benefits! A large subset
of the inline functions of a program are...
`<h2>`Implicitly called & generated functions`</h2>`
Here's a common "design pattern" in C++ code. You have a huge class.
Sometimes there's a single pseudo-global object of this class.
In that case, you get all the /drawbacks/ of global variables because everybody has a pointer to the
thing and modifies it and expects others to see the changes. But you get no /benefits/ of global variables since
the thing is allocated on the stack and when your program crashes with a buffer overflow, you can't
find the object in a debugger. And at other times there are many of these objects,
typically kept in a pseudo-global collection.
Anyway, this huge class has no constructors, no destructor and no |operator=|. Of course people create
and destroy the objects, and sometimes even assign to them. How is this handled by the compiler?
This is handled by the compiler by generating a /gigantic/ pile of code at the point where it would
call the user-defined functions with magic names (such as |operator=|) if there were any. When you /crash/ somewhere
at that point, you get to see /kilobytes/ of assembly code in the debugger, all generated
from /the same source code line/. You can then try and figure out which variable didn't like
being assigned to, by guessing where the class member offsets are in the assembly listing and looking
for symbolic names of the members corresponding to them. Or you can try and guess who forgot
all about the fact that these objects were assigned to using the "default" |operator=|
and added something like built-in pointer members to the class. Because that wouldn't work,
and could have caused the problem.
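Here's a sketch of how the "built-in pointer member" scenario bites; |Config| and its members are made-up names:

```cpp
#include <cstring>

// a class that grew a raw pointer member while still being copied
// with the compiler-generated operator= (no user-defined copy
// functions anywhere in sight):
struct Config {
    int id;
    char* name;  // the pointer member somebody added later
    Config(const char* n) : id(0), name(new char[std::strlen(n) + 1]) {
        std::strcpy(name, n);
    }
    ~Config() { delete[] name; }
};

bool default_assignment_aliases() {
    Config* a = new Config("first");
    Config* b = new Config("second");
    *b = *a;  // the generated operator= copies the pointer, not the buffer
    bool aliased = (a->name == b->name);  // both now share one buffer
    // b's old buffer has leaked, and destroying both objects would free
    // the shared buffer twice - so this sketch deliberately leaks a and b
    return aliased;
}
```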
Implicit generation of functions is problematic because it slows compilation down, inflates
the program binaries and gets in the way when you debug. But the problem with /implicitly calling/ functions
(whether or not they were /implicitly generated/) is arguably even worse.
When you see code like |a=f(b,c)| (or even |a=b+c|, thanks to operator overloading), you don't
know whether the objects are passed by reference or by value (see [8.6 "information hiding"]). In the latter case, the objects are
copied with implicitly called functions; in the former case, that's possible, too, if implicit type
conversions were involved. Which means that you don't really understand what the program does
unless you know the relevant information about the relevant overloads and types.
And by the way, the fact that you can't see whether the object is passed by reference or by value
at the point of call is /another/ example of implicit stuff happening in C++.
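The point about call sites can be demonstrated with a copy-counting class (all names here are made up):

```cpp
// a class that counts how many times it gets copied
static int copies = 0;

struct Tracer {
    Tracer() {}
    Tracer(const Tracer&) { ++copies; }
};

void by_value(Tracer) {}             // the argument is copied
void by_reference(const Tracer&) {}  // it isn't

int copies_made_by_two_identical_looking_calls() {
    Tracer t;
    copies = 0;
    by_value(t);      // this call looks exactly like...
    by_reference(t);  // ...this one, but only the first copies t
    return copies;
}
```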
One more problem with automatically generated functions (such as constructors and destructors)
is that they must be /regenerated/ when you add private members to a class, so changing
the private parts of a class triggers recompilation... Which brings us back to square 1.