This repository has been archived by the owner on Sep 27, 2023. It is now read-only.
log03.txt
I left myself a nice easy one to start this log:
[x] Make all words take params from the stack, not
from pre-defined registers.
Which ought to be simple: just push the values
I need before calling the function. Then have
the function pop the values into the registers
and off we go.
+----------------------------------------------------+
| NOTE: I'm still using call/ret to use the          |
| 'find' and 'inline' words when the program         |
| initially runs. I've got a bit of a                |
| chicken-and-egg problem here because without a     |
| return, these won't seamlessly move on to the      |
| next instructions when they're done and I          |
| can't find or inline them because THEY are         |
| 'find' and 'inline'!                               |
|                                                    |
| I feel like that will be solved when I've got      |
| more of the interpreter or REPL in place. If       |
| not, I've got a puzzle on my hands. For the        |
| moment, things are just a bit...messy.             |
+----------------------------------------------------+
Well, I thought that was going to be easy. I mean, it is
pretty easy. But it has a few snags I hadn't yet
considered.
It turns out, using the same stack for call/ret return
address storage AND for passing values between functions
in a truly concatenative manner gets real complicated
real quick. And since I was using call/ret temporarily
anyway, I have zero desire to do anything fancy to make
it work.
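A rough sketch of the snag, in Python rather than assembly. The function name and the 0x00E8 address are made up for illustration and are not taken from meow5.asm. It only models the ordering problem: `call` pushes the return address on top of the parameter, so the word's first pop gets the wrong thing.

```python
# Hypothetical model of one stack shared by `call` and by word parameters.
# The names and the 0x00E8 address are illustrative only.

def call_then_pop(param, return_addr):
    stack = []
    stack.append(param)        # caller pushes the parameter for the word
    stack.append(return_addr)  # x86 `call` then pushes the return address
    first_pop = stack.pop()    # the word's first pop sees the return address...
    second_pop = stack.pop()   # ...and the real parameter is underneath it
    return first_pop, second_pop

first, second = call_then_pop("temp_meow_name", 0x00E8)
print(hex(first), second)  # prints: 0xe8 temp_meow_name
```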
So I'm going to basically do my own return by storing a
return address and jumping to it at the end of both
'inline' and 'find'.
I'm making a new variable in BSS to hold my return
address. I only need one, not a stack, because I'm not
making any nested calls.
temp_return_addr: resb 4
I'll put in a mockup of the code to get the assembled
instruction lengths right (I hope) so I can figure out
the address we just jump back to as a "return". (I'm
pretty sure I can't just store the instruction pointer
register because that'll be a point before the "call"
jump and then I'll have an infinite loop, right?)
I'll use NASM's listing feature for that. It comes out
super wide (well, compared to the 60 columns I give
myself on my little split screen setup!), so I'll see if
I can reformat it enough to fit here:
    152 00DA 68[0600]         push temp_meow_name
    153 00DD 66C706[0904]-    mov dword [temp_return_addr], $
    153 00E2 [DD000000]
    154 00E6 EB8A             jmp find
    155 00E8 6650             push eax
The listing is so fun to look at and I find it almost
fetishistically beautiful. I mean, I've had all of these
_questions_ about how all of this actually works and
here, if you can read them, are all of the _answers_. I
mean, I know the CPU still has secrets down below even
this machine code layer. But for the application
programmer, this is _it_. This is the bedrock upon which
we lay all of our hopes and dreams. In the hex column on
the left are the real instructions, no longer hidden by
mnemonics or symbols.
Anyway, where was I?
Oh, yeah, so '$' is NASM for "the address at the
beginning of this line". Which is very handy. And that's
exactly what's gonna be put into temp_return_addr:
    0904     is little-endian for temp_return_addr at 0409
             (which I could see further up my listing)
    DD000000 is the address returned by $
             (again, in little-endian)
And assuming the assembled code won't change, it looks
like I want my return address to be an additional...
E8 - DD
...bytes. Which, uh, I'll ask my cat to subtract for me.
Hmm. No, he purred, but was not forthcoming with the
answer. Okay, how about dc?
$ dc
16 i
e8 dd - p
dc: 'e' (0145) unimplemented
E8 DD - p
11
Okay, so dc hissed at me once for not entering the hex
values in upper case. So score one point for my cat. But
then it gave me the correct answer after that, so score
one point for dc. Looks like this match is even.
So I wanna add 11 bytes to my return addresses.
Here's the new listing:
    153 00DD 66C706[0904]-    mov dword [temp_return_addr], ($ + 11)
    153 00E2 [E8000000]
    154 00E6 EB8A             jmp find
    155 00E8 6650             push eax
Looks right to me, we want to jump ("ret") back to 00E8
after the jump ("call") to find.
Of course, this seems super fragile, but it's also super
temporary. Let's just see if it works...
Okay, dang it, a segfault. My changes have required
another change and now the addresses are a little
different, but the 11 bytes should still be the same:
    154 000000DF 66C706[0904]-    mov dword [temp_return_addr], ($ + 11)
    154 000000E4 [EA000000]
    155 000000E8 EB88             jmp find
    156 000000EA 6650             push eax
Let's try it now:
(gdb) break find.found_it
Breakpoint 1, find.found_it () at meow5.asm:116
116 mov eax, edx ; pointer to tail of dictionary word
(gdb) p/a $edx
$1 = 0x8049030 <meow_tail>
So far so good. Now the return jump?
(gdb) s
117 jmp [temp_return_addr]
(gdb) p/a (int)temp_return_addr
$2 = 0x80490df <inline_a_meow+16>
And just where might that be, exactly?
0x080490d4 <+5>: c7 05 19 a4 04 08 df 90 04 08 movl
$0x80490df, 0x804a419
0x080490de <+15>: eb 88 jmp 0x8049068 <find>
0x080490e0 <+17>: 50 push %eax
0x080490e1 <+18>: e8 57 ff ff ff call 0x804903d <inline>
Hmmm...looks off by 1. 0x80490df points to the second
byte of the jmp find instruction...
Program received signal SIGSEGV, Segmentation fault.
0x080490df in inline_a_meow () at meow5.asm:155
155 jmp find ; answer will be in eax
Yeah. So... 12 bytes?
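Counting the bytes printed in the two dumps above gives both numbers. This Python snippet only tallies what the listing and the gdb disassembly show; it does not establish why the two encodings differ. A plausible but unverified reading is that the listing was generated with different assembler settings than the final build.

```python
# Byte strings copied from the two dumps in this log.
listing_mov = bytes.fromhex("66C706" "0904" "DD000000")  # NASM listing
gdb_mov = bytes.fromhex("c705" "19a40408" "df900408")     # gdb disassembly
jmp_len = 2  # "EB8A" / "eb 88": a two-byte short jump in both dumps

listing_offset = len(listing_mov) + jmp_len
gdb_offset = len(gdb_mov) + jmp_len

print(listing_offset, 0xE8 - 0xDD)            # prints: 11 11
print(gdb_offset, 0x80490E0 - 0x80490D4)      # prints: 12 12
```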
And to think, I waxed all poetic about the NASM listing.
I don't know how to explain the byte discrepancy. Let's
see if this works:
(gdb) break find.found_it
Breakpoint 1, find.found_it () at meow5.asm:116
116 mov eax, edx ; pointer to tail of dictionary word
(gdb) s
117 jmp [temp_return_addr]
(gdb) s
inline_a_meow () at meow5.asm:156
156 push eax ; put it on the stack for inline
Yes! But that was evidently even _more_ fragile than I'd
expected. So I'll just bite the bullet and hold my nose
and use some temporary labels. It's still quite
compact, so I'll just paste it here:
        push temp_meow_name ; the name string to find
        mov dword [temp_return_addr], t1
        jmp find            ; answer will be in eax
    t1: push eax            ; put it on the stack for inline
        mov dword [temp_return_addr], t2
        jmp inline
    t2: dec byte [meow_counter]
        jnz inline_a_meow

        ; inline exit
        push temp_exit_name ; the name string to find
        mov dword [temp_return_addr], t3
        jmp find            ; answer will be in eax
    t3: push eax            ; put it on the stack for inline
        mov dword [temp_return_addr], t4
        jmp inline
    t4:
        ; Run!
        push 0              ; push exit code to stack for exit
        jmp data_segment    ; jump to the "compiled" program
Does it work?
dave@cygnus~/meow5$ mr
Meow.
Meow.
Meow.
Meow.
Meow.
Yes!
One last thing, now - 'find' is still leaving its answer
in the eax register. If I have it push the answer to the
stack instead, 'inline' will pop it and have what it
needs - no need for that "push eax" between the two
functions/words (at labels t1 and t3 above).
Now find.not_found and find.found_it push their return
values on the stack:
    .not_found:
        push 0   ; return 0 to indicate not found
        jmp [temp_return_addr]
    .found_it:
        push edx ; return pointer to tail of dictionary word
        jmp [temp_return_addr]
And the calls simply flow one after the other without
any explicit data passing:
        jmp find
    t1: mov dword [temp_return_addr], t2
        jmp inline
    t2: ...
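The same flow can be modelled as a toy state machine in Python. Everything here (the dictionary contents, the label strings, the loop itself) is a hypothetical illustration; meow5 does this with real jmp instructions. It shows one saved return address being reused, and the data stack carrying the result from 'find' to 'inline' with nothing in between.

```python
# Toy model of "jump, then jump back through one saved address".
# DICTIONARY and the label names are made up for illustration.

DICTIONARY = {"meow": "<meow code>", "exit": "<exit code>"}

def run(name):
    stack = [name]            # push the name string to find
    compiled = []             # stands in for the data segment
    temp_return_addr = None   # one slot is enough: calls never nest
    pc = "start"
    while pc != "done":
        if pc == "start":
            temp_return_addr = "t1"   # mov dword [temp_return_addr], t1
            pc = "find"               # jmp find
        elif pc == "find":
            wanted = stack.pop()                     # param comes off the stack
            stack.append(DICTIONARY.get(wanted, 0))  # result goes back on it
            pc = temp_return_addr                    # jmp [temp_return_addr]
        elif pc == "t1":
            temp_return_addr = "t2"
            pc = "inline"
        elif pc == "inline":
            compiled.append(stack.pop())  # consumes what find pushed
            pc = temp_return_addr
        elif pc == "t2":
            pc = "done"
    return compiled, stack

print(run("meow"))  # prints: (['<meow code>'], [])
```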
And does _that_ work?
dave@cygnus~/meow5$ mr
Meow.
Meow.
Meow.
Meow.
Meow.
Yes, and now I can check that little box at the top of
this log. We're doing pure stack-based concatenative
programming now.
Next step:
[x] Parse the string "meow meow meow meow meow exit"
as a program (pretend we're already in "compile
mode" and we're gathering word tokens and
compiling them) and execute it.
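As a high-level picture of that goal, here is a small Python sketch. `WORDS`, `compile_program`, and `run_program` are made-up names, the "machine code" is just strings, and raising an error for an unknown word is an assumption of this sketch rather than anything meow5 does.

```python
# Hypothetical sketch of "compile mode": each token is looked up and the
# word's body is copied (inlined) into one flat program, which is then run.

WORDS = {
    "meow": ["print Meow."],
    "exit": ["exit 0"],
}

def compile_program(source):
    program = []
    for token in source.split():   # stand-in for a real tokenizer
        body = WORDS.get(token)    # 'find'
        if body is None:
            raise KeyError(f"unknown word: {token}")
        program.extend(body)       # 'inline': copy the body, no call left behind
    return program

def run_program(program):
    output = []
    for op in program:
        if op == "print Meow.":
            output.append("Meow.")
        elif op == "exit 0":
            break
    return output

print("\n".join(run_program(compile_program("meow meow meow meow meow exit"))))
```

Because each word's body is copied into the program, the compiled result contains no calls at all, which is the property the 'inline' word is after.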
It begins! Here's the string in the .data segment:
input_buffer_start:
db 'meow meow meow meow meow exit', 0
input_buffer_end:
And here's the .bss segment "variables":
token_buffer: resb 32 ; For get_token
input_buffer_pos: resb 4 ; Save position of read tokens
Yup, just 32 chars for token names (well, 31 because I'm
null-terminating the string). Hey, it's my language. Ha
ha, I can always bump this up later. But 31 is actually
quite long, you know?
abcdefghijklmnopqrstuvwxyz01234
I've created a word called 'get_token' which will do the
job of both 'WORD' and 'KEY' in Forth. And I was just
about to 'call' it to test it, but I can't bear to put
in another manual temporary label.
So, it's macro time!
    %macro CALLWORD 1
        mov dword [return_addr], %%return_to
        jmp %1
        %%return_to:
    %endmacro
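The `%%` prefix makes a label local to each expansion of the macro, so every CALLWORD gets its own return target. A rough Python imitation of that expansion follows. The `return_to_N` naming scheme and the `expand_callword` helper are invented for this sketch and are not what NASM actually generates.

```python
import itertools

_counter = itertools.count(1)  # hypothetical per-expansion counter

def expand_callword(word):
    """Return the three lines one CALLWORD expansion would stand for."""
    label = f"return_to_{next(_counter)}"  # made-up unique label name
    return [
        f"mov dword [return_addr], {label}",
        f"jmp {word}",
        f"{label}:",
    ]

for line in expand_callword("find") + expand_callword("inline"):
    print(line)
```

Each expansion ends with a different label, which is what lets two CALLWORD lines sit back to back without a label clash.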
And it should be super easy to use. First, I'll test my
temporary "manual" 'meow' and 'exit' inlines to make
sure it works. They're about to go away, but they'll
make a good test.
Look at how clean the 'exit' one is:
push temp_exit_name ; the name string to find
CALLWORD find
CALLWORD inline
But does it work?
dave@cygnus~/meow5$ mr
Meow.
Meow.
Meow.
Meow.
Meow.
First try! No way. I mean, of _course_ it worked first
try and I never doubted it would.
Okay, now let's get into this get_token function:
(gdb) break get_next_token
Breakpoint 1 at 0x804912e: file meow5.asm, line 189.
(gdb) r
Starting program: /home/dave/meow5/meow5
Breakpoint 1, get_next_token () at meow5.asm:189
189 mov dword [return_addr], %%return_to ; CALLWORD
190 jmp %1 ; CALLWORD
150 mov ebx, [input_buffer_pos] ; set input read addr
151 mov edx, token_buffer ; set output write addr
152 mov ecx, 0 ; position index
154 mov al, [ebx + ecx] ; input addr + position index
155 cmp al, 0 ; end of input?
(gdb) p/c $al
$2 = 109 'm'
Nice! So the 'm' from the first 'meow' has been
collected so far. Now the rest of the token...
(gdb) break 155
Breakpoint 2 at 0x80490c7: file meow5.asm, line 155.
(gdb) c
155 cmp al, 0 ; end of input?
(gdb) p/c $al
$3 = 101 'e'
...
$4 = 111 'o'
$5 = 119 'w'
$6 = 32 ' '
We have 'meow' and the space should be our token
separator.
155 cmp al, 0 ; end of input?
156 je .end_of_input ; yes
157 cmp al, ' ' ; token separator? (space)
158 je .return_token ; yes
170 add [input_buffer_pos], ecx ; save input position
171 mov [edx + ecx], byte 0 ; terminate str null
Looks good. Did we collect 4 characters as expected?
(gdb) p $ecx
$7 = 4
Yup. Then get_token will "return" the token string
address so 'find' can use it to find the 'meow' word:
172 push DWORD token_buffer ; return str address
173 jmp [return_addr]
219 cmp DWORD [esp], 0 ; check return without popping
220 je run_it ; all out of tokens!
189 mov dword [return_addr], %%return_to ; CALLWORD
190 jmp %1 ; CALLWORD
96 pop ebp ; first param from stack!
find () at meow5.asm:99
99 mov edx, [last]
Okay, the execution looks right. And did we pass the
address correctly on the stack?
(gdb) p $ebp
$9 = (void *) 0x804a43d <token_buffer>
Yup! And does it contain the expected 'meow' token?
(gdb) x/s $ebp
0x804a43d <token_buffer>: "meow"
Nice!
I'm going to assume 'find' and 'inline' will work
correctly. Let's see if we can get the next token from
the input string:
(gdb) c
Continuing.
Breakpoint 2, get_token.get_char () at meow5.asm:155
155 cmp al, 0 ; end of input?
Alright, we're back in get_token. This should be the
first character of the second 'meow' token:
(gdb) p/c $al
$10 = 32 ' '
Uh oh. That doesn't look right. I'll continue anyway...
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
inline () at meow5.asm:76
76 mov ecx, [esi + 4] ; get len into ecx
Yeah, that makes sense. 'find' will have failed to find
the '' (empty) token and then 'inline' crashes when
trying to read from address 0 (the null pointer return
value from 'find').
The best way to handle this is probably to ignore any
leading spaces - that will not only be useful later, it
will take care of this current character problem.
In a higher-level language, I might choose to do this
with nested logic, like so:
    if (char === ' ')
        if (token.len === 0)
            'eat' space (move to next input char)
        else
            return the token
        end
    end
end
But in assembly, this all gets flattened. It's a
surprisingly interesting exercise to formulate the logic
in terms of jumps. (At least at first. I'm sure the
novelty wears off after a while.)
Anyway, here's my solution:
        cmp al, ' '         ; token separator? (space)
        jne .add_char       ; nope! get char
        cmp ecx, 0          ; yup! do we have a token yet?
        je .eat_space       ; no
        jmp .return_token   ; yes, return it
    .eat_space:
        inc ebx             ; 'eat' space by advancing input
        jmp .get_char
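For comparison, here is the same decision logic written out in Python. This is an illustrative sketch and not a translation of meow5.asm: returning `None` at end of input, and returning a final token that has no trailing space, are assumptions made here.

```python
def get_token(text, pos):
    """Return (token, new_pos), or (None, new_pos) when input is exhausted."""
    token = ""
    while pos < len(text):
        ch = text[pos]
        if ch != " ":        # jne .add_char
            token += ch
            pos += 1
        elif token == "":    # je .eat_space: no token yet, skip the space
            pos += 1
        else:                # jmp .return_token
            return token, pos
    return (token or None), pos

def tokens(text):
    """Collect every token by calling get_token until it reports None."""
    pos, out = 0, []
    while True:
        token, pos = get_token(text, pos)
        if token is None:
            return out
        out.append(token)

print(tokens(" meow  meow exit"))  # prints: ['meow', 'meow', 'exit']
```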
I'll make sure that works in GDB. I changed the input
string to:
    db ' meow  meow meow meow meow exit', 0
with a leading space and two spaces before the second
meow token. That'll make it easy to test:
155 cmp al, 0 ; end of input?
(gdb) p/c $al
$1 = 32 ' '
157 cmp al, ' ' ; token separator? (space)
158 jne .add_char ; nope! get char
162 cmp ecx, 0
163 je .eat_space
171 inc ebx
172 jmp .get_char
Yup! That line 171 is my "eat the leading space" action
and now we should get the 'm' in "meow" and store it:
154 mov al, [ebx + ecx] ; input addr + position index
155 cmp al, 0 ; end of input?
156 je .end_of_input ; yes
(gdb) p/c $al
$2 = 109 'm'
157 cmp al, ' ' ; token separator? (space)
158 jne .add_char ; nope! get char
175 mov [edx + ecx], al ; write character
Yeah. This is looking good.
You know what? I'm just gonna go for it:
dave@cygnus~/meow5$ mr
Meow.
Meow.
Meow.
Meow.
Meow.
Yes! So that's another item checked off!
This is really coming along.
At nearly 500 lines, this log is complete. I'll see you
in the next one, log04.txt. :-)