This repository has been archived by the owner on Sep 27, 2023. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathlog02.txt
534 lines (403 loc) · 14.1 KB
/
log02.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
So log01.txt concluded with a nice little demonstration
of programatically inlining machine code at runtime to
"compile" a program and run it.
The next step is to start to turn this into an actual
language by creating headers for words (I've decided
I'll use the Forth term "word" to refer to the functions
we create in this language).
[x] Look up word length from header so it doesn't
have to be manually created and sent to the
inline function.
[x] Look up word by stored ASCII name in header at
runtime. That'll be exciting. I'll practically
have a programming language at that point.
I think I'll use a linked list of words like many
traditional Forths, since that's what I learned how to
implement in my JONESFORTH port, nasmjf.
Note: I added design-notes.txt to this repo because I
have been having some on-going thoughts about how
to implement this program as a whole, but they're
not things I can act upon right away and I don't
want to have to come back here searching in these
logs to find them (or worse, forget about them
entirely!)
Okay, now I've got #1 from above list working. Instead
of a "header", I've got "tails" at the end of my words.
Ha ha, cats have tails. So this just keeps getting
better.
I did it that way because then it becomes trivial to get
the length of the machine code. Here's the definition of
the exit word now, with its tail:
exit:
mov ebx, 0 ; exit with happy 0
mov eax, SYS_EXIT
int 0x80
exit_tail:
dd 0 ; null link is end of linked list
dd (exit_tail - exit) ; len of machine code
db "exit", 0 ; name, null-terminated
So now I don't have to give the length of the word's
machine code to inline anymore, just the tail address.
inline gets the stored length and does all the rest!
Here's the new inline:
; inline function!
; input: esi - tail of the word to inline
inline:
mov edi, [here] ; destination
mov ecx, [esi + 4] ; get len into ecx
sub esi, ecx ; sub len from esi (start of code)
rep movsb ; movsb copies from esi to esi+ecx into edi
add edi, ecx ; update here pointer...
mov [here], edi ; ...and store it
ret
Still not too complicated. And I think this might even
be its final form?
Let's see if this works...
Program received signal SIGSEGV, Segmentation fault.
inline () at meow5.asm:67
67 rep movsb
Darn it.
Oh, wait! It was inlining the meows just fine, it was
doing exit that failed. I simply hadn't updated it to
point to the tail yet. Simple mistake:
; inline exit
mov esi, exit <---- oops!
call inline
needs to be:
; inline exit
mov esi, exit_tail
call inline
How about now...
$ mrun
Meow.
Meow.
Meow.
Meow.
Meow.
Awesome! Guess I can start making it find words by ASCII
name in the tails, searching by linked list. Very
exciting progress tonight!
I've got two more todos:
[x] Add tails to anything that should be a word
[ ] Make all words take params from the stack, not
from pre-defined registers. Yes, we're losing
some speed by going to main memory, but I have
a feeling the stack is surely in CPU cache most
of the time? I should look that up someday...
So I'm going to call my word that looks up other words
by string name by searching through a linked list of
words 'find', just like in Forth. (Well, except there
it's FIND, of course.)
Two nights later: I've written the 'find' word and added
tails to all of my words so far. But I've got a
segfault:
dave@cygnus~/meow5$ mr
./build.sh: line 33: 1966 Segmentation fault ./$F
So it's GDB time:
dave@cygnus~/meow5$ mb
Reading symbols from meow5...
...
143 push temp_meow_name ; the name string to find
144 call find ; answer will be in eax
81 pop eax ; first param from stack!
84 mov ecx, [last]
86 cmp ecx, 0 ; a null pointer (0) is end of list
87 je .not_found
93 lea edx, [ecx + 8] ; set dictionary name pointer
94 mov ebx, eax ; (re)set name to find pointer
Okay, so here's where I'm comparing the search string to
be found against the first (well, last) word's name in
the linked list ("dictionary"). So let's see if I got
the name from the dictionary entry's "tail" correctly.
Oh, and here's my comment block from 'find' explaining
the register use:
; input:
; stack -> eax
; register use:
; eax - start of null-terminated name to find
; ebx - name to find byte pointer
; ecx - dictionary list pointer
; edx - dictionary name byte pointer
The first thing in the tail should be a link to the next
word in the dictionary. The ecx register should have that
link:
(gdb) x/a $ecx
0x804908d <find_tail>: 0x8049052 <inline_tail>
Yup! That's right. The next word is 'inline'.
The next thing is the length of the word's machine code:
(gdb) x/dw $ecx+4
0x8049091: 39
39 bytes seems reasonable. Okay, the next thing should be
the null-terminated string of the word name:
(gdb) x/s $ecx+8
0x8049095: "find"
Yes!
And have I correctly pointed to the first byte of this
string in the edx register?
(gdb) x/s $edx
0x8049095: "find"
Wow, also yes!
Okay, so the next thing to confirm is that I have the
address of the string to match in register eax:
(gdb) x/a $eax
0x80490c1 <inline_a_meow+10>: 0x74e8308b
Oops! That's not right. That's an address somewhere in my
loop that inlines meow five times...
I see it now!
143 push temp_meow_name ; the name string to find
144 call find ; answer will be in eax
81 pop eax ; first param from stack!
I forgot that 'call' will push the return address onto
the stack. Which is why I can't just pop my parameter.
I need to use the stack pointer and an offset to get the
value...
I use arrays as stacks all the time in higher level
languages, so a PUSH and POP are second nature to me.
But I must confess that in an assembly language context,
I get super confused by terms like "top", "bottom" and
"low" and "high".
So I prefer to make all of this SUPER CONCRETE. Here's
my own personal explanation:
push eax ; containing 0xAAA
push ebx ; containing 0xBBB
push ecx ; containing 0xCCC
push edx ; containing 0xDDD
pop edx
pop ecx
The Stack:
----------
0xAAA <-- esp + 4
0xBBB <-- esp
0xCCC <-- esp - 4
0xDDD <-- esp - 8
Heck, I'm gonna verify that for myself right now with
all of you watching:
(gdb) s
125 mov eax, 0xAAA
126 mov ebx, 0xBBB
127 mov ecx, 0xCCC
128 mov edx, 0xDDD
129 push eax
130 push ebx
131 push ecx
132 push edx
133 pop edx
134 pop ecx
(gdb) x $esp + 4
0xffffd77c: 0x00000aaa
(gdb) x $esp
0xffffd778: 0x00000bbb
(gdb) x $esp - 4
0xffffd774: 0x00000ccc
(gdb) x $esp - 8
0xffffd770: 0x00000ddd
Whew! At least I've got that much right. :-)
So my fix is this:
mov eax, [esp + 4] ; first param from stack!
And now let's see what we've got in eax:
(gdb) x/a $eax
0x804a006 <temp_meow_name>: 0x776f656d
Perfect. And $ebx should be the same to begin with:
(gdb) x/a $ebx
0x804a006 <temp_meow_name>: 0x776f656d
Yup. Good so far.
...wait. This next line isn't right:
96 cmp edx, ebx
What am I doing? I'm comparing the two addresses here,
not the characters they point to. Even worse, I can't
compare two pointed-to *values* at the same time. I need
to actually store at least one of the two *values* to
compare in a register!
Sheesh. Lemme fix this up. Okay, so here's the new
register use, which I'm trying to make as conventional
as I know how...
; register use:
; al - to-find name character being checked
; ebx - start of dict word's name string
; ecx - byte offset counter (each string character)
; edx - dictionary list pointer
; ebp - start of to-find name string
And the code has changed quite a bit, so I'm gonna step
through it again:
(gdb) s
146 push temp_meow_name ; the name string to find
147 call find ; answer will be in eax
find () at meow5.asm:80
80 mov ebp, [esp + 4] ; first param from stack!
83 mov edx, [last]
find.test_word () at meow5.asm:85
85 cmp edx, 0 ; a null pointer (0) is end of list
86 je .not_found
92 lea ebx, [edx + 8] ; set dict. word name pointer
93 mov ecx, 0 ; reset byte offset counter
Okay, first the ebx register should now point to the
current dictionary word's name that we're gonna test:
(gdb) x/s $ebx
0x804909f: "find"
Good.
And the ebp register should point to the to-find name:
(gdb) x/s $ebp
0x804a006 <temp_meow_name>: "meow"
Good.
find.compare_names_loop () at meow5.asm:95
95 mov al, [ebp + ecx] ; get next to-find name byte
96 cmp al, [ebx + ecx] ; compare with next dict word byte
Now the character in byte register al should be the first
one from the to-find name "meow":
(gdb) p/c $al
$2 = 109 'm'
Good.
And the character pointed to by ebx+ecx should be the
first one from the dict word "find":
(gdb) x/c $ebx+$ecx
0x804909f: 102 'f'
Good.
And since these don't match, the jump should take us to
the next word...
97 jne .try_next_word ; found a mismatch!
find.try_next_word () at meow5.asm:102
102 mov ecx, [ecx] ; follow the tail! (linked list)
Program received signal SIGSEGV, Segmentation fault.
Oh, right. Silly me. I'm storing the dictionary word
links in the edx register now, not ecx! I missed this
one...
Okay, how about now?
find.try_next_word () at meow5.asm:103
103 mov edx, [edx] ; follow the tail! (linked list)
(gdb) x/a $edx
0x8049097 <find_tail>: 0x8049052 <inline_tail>
(gdb) s
104 jmp .test_word
That's better. Let's see if we're testing "meow" vs
"inline" now (well, 'm' vs 'i'):
(gdb) p/c $al
$1 = 109 'm'
(gdb) x/c $ebx+$ecx
0x804905a: 105 'i'
Good!
And the next word should be "meow", so 'm' vs 'm':
(gdb) p/c $al
$2 = 109 'm'
(gdb) x/c $ebx+$ecx
0x8049037: 109 'm'
98 jne .try_next_word ; found a mismatch!
99 cmp al, 0 ; both hit 0 terminator at same time
100 je .found_it
find.try_next_word () at meow5.asm:103
103 mov edx, [edx] ; follow the tail! (linked list)
What?
Oh. <facepalm> It just dropped through. I forgot the
jmp .compare_names_loop
at the end of my loop...
I'll spare you the second go where I had an infinite loop
because I had *also* forgotten to increment the ecx
register to check the next letter in the strings...
Okay, and now?
Reading symbols from meow5...
(gdb) break 97
Breakpoint 1 at 0x8049081: file meow5.asm, line 97.
1: /c $al = <error: No registers.>
(gdb) r
Starting program: /home/dave/meow5/meow5
Breakpoint 1, find.compare_names_loop () at meow5.asm:97
97 cmp al, [ebx + ecx] ; compare with next dict word byte
(gdb) display /c *($ebx + $ecx)
(gdb) display /c $al
1: /c $al = 109 'm'
2: /c *($ebx + $ecx) = 102 'f'
(gdb) c
Continuing.
Breakpoint 1, find.compare_names_loop () at meow5.asm:97
97 cmp al, [ebx + ecx] ; compare with next dict word byte
1: /c $al = 109 'm'
2: /c *($ebx + $ecx) = 105 'i'
...
1: /c $al = 109 'm'
2: /c *($ebx + $ecx) = 109 'm'
...
1: /c $al = 101 'e'
2: /c *($ebx + $ecx) = 101 'e'
...
1: /c $al = 111 'o'
2: /c *($ebx + $ecx) = 111 'o'
...
1: /c $al = 119 'w'
2: /c *($ebx + $ecx) = 119 'w'
...
1: /c $al = 0 '\000'
2: /c *($ebx + $ecx) = 0 '\000'
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
inline_a_meow () at meow5.asm:152
152 mov esi, [eax] ; putting directly in reg for now
Yay! (Not the segfault, but the apparent correct matching
of the strings.)
Now let's see what's happening once we get a match,
because clearly eax is not getting returned with a valid
word tail address...
(gdb) break find.found_it
...
Breakpoint 1, find.found_it () at meow5.asm:113
113 mov eax, ecx ; pointer to tail of dictionary word
Gah! I see it. Another ecx that should be an edx. I
could have sworn I searched for these...
Reading symbols from meow5...
(gdb) break find.found_it
Breakpoint 1 at 0x8049097: file meow5.asm, line 113.
(gdb) r
Starting program: /home/dave/meow5/meow5
Breakpoint 1, find.found_it () at meow5.asm:113
113 mov eax, edx ; pointer to tail of dictionary word
(gdb) p/a $edx
$1 = 0x804902f <meow_tail>
That's better. So yeah, we definitely found the meow
word by string. Very cool. Let's see what happens next...
(gdb) s
114 ret ; (using call/ret for now)
(gdb)
inline_a_meow () at meow5.asm:152
152 mov esi, [eax] ; putting directly in reg for now
(gdb)
153 call inline
(gdb)
inline () at meow5.asm:62
62 mov edi, [here] ; destination
Yes, very nice...
Breakpoint 1, find.found_it () at meow5.asm:113
Breakpoint 1, find.found_it () at meow5.asm:113
Breakpoint 1, find.found_it () at meow5.asm:113
Breakpoint 1, find.found_it () at meow5.asm:113
That's four more 'meow's getting inlined...
Breakpoint 1, find.found_it () at meow5.asm:113
That's the 'exit'...
113 mov eax, edx ; pointer to tail of dictionary word
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
inline () at meow5.asm:63
63 mov ecx, [esi + 4] ; get len into ecx
Wait, how did esi get the wrong value?
Oh jeez, I have these brackets around eax here:
mov esi, [eax] ; putting directly in reg for now
But I want the address in eax, not the value it's pointing
to. Yet another easy fix:
mov esi, eax ; putting directly in reg for now
You know what? I feel like this should be good now.
Let's do this:
dave@cygnus~/meow5$ mr
Meow.
Meow.
Meow.
Meow.
Meow.
Yes!
I'm now able to find words by string name in the
dictionary and "compile" them into memory and run them.
The only TODO "checkbox" I didn't check in this log was
this one:
[ ] Make all words take params from the stack, not
from pre-defined registers.
Which should be no problem. That'll be a nice easy way
to start the next log, so I'll see you in log03.txt
with that!