This repository has been archived by the owner on Sep 27, 2023. It is now read-only.
log04.txt
The last update was very exciting. Now I'm actually
reading a single word's definition from a string,
inlining all of the code into memory, and executing it.
To put it in concrete terms, this 'meow5' definition:
"meow meow meow meow meow exit"
Was turned into this in memory:
<meow word machine code>
<meow word machine code>
<meow word machine code>
<meow word machine code>
<exit word machine code>
The 'exit' word even pops the exit status code from the
stack. Between that and all of the meowing, we're
getting extremely "conCATenative" here. Sorry.
So I need to figure out what step comes next. I need to:
1. Get user input from STDIN
2. Figure out how immediate mode will work
(currently, I start in compile mode and when
that's done, I execute whatever was compiled!)
3. Create the colon ':' and semicolon ';' words to
toggle compile mode (and create word definitions!)
I would also like to have introspection and diagnostics
and visualizations as early in this project as possible!
But for now, I'm gonna stay the course towards an
absolutely minimal proof of concept. I want to be able
to type this:
: meow5 meow meow meow meow meow exit ;
meow5
And see (something like) this:
Meow.
Meow.
Meow.
Meow.
Meow.
BYE!
$
So how about #2 and/or #3 from the list above - how
simple can the colon command be?
So I've updated the input string:
db 'meow : meow5 meow meow meow meow meow ;
meow5 exit', 0
(ignore the newline)
Which reads as:
1. call meow right now in "immediate" mode
2. : switches to compile mode and
3. store "meow5" as name
4. inline 5 meow words
5. ; writes tail (including saved name) and
6. switches back to immediate mode
7. call new meow5 word
8. exit
I have also created a mode var and added imm/comp flags
to tails. Todo:
[ ] colon word store name somewhere
[ ] find should also match mode flag (use &)
[ ] semicolon should write tail
[ ] immediate mode should find and exec words...somehow
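To make the plan above concrete, here is a rough sketch in Python rather than meow5's assembly. It models each word as a flat list of output steps instead of machine code, and the names `run`, `execute`, and the `words` table are made up for illustration. It only shows the mode switching: ':' enters compile mode and takes the next token as the new name, ';' stores the new word and returns to immediate mode.

```python
IMMEDIATE, COMPILE = "immediate", "compile"

def execute(steps, output):
    """Run a word's steps; return False once 'exit' has run."""
    for step in steps:
        output.append(step)
        if step == "BYE!":
            return False
    return True

def run(source):
    # Hypothetical model: a word is a flat list of steps standing in
    # for its machine code.
    words = {"meow": ["Meow."], "exit": ["BYE!"]}
    output = []
    mode = IMMEDIATE
    new_name, new_body = None, []
    tokens = iter(source.split())
    for token in tokens:
        if token == ":":
            mode = COMPILE
            new_name = next(tokens)        # token after ':' names the word
            new_body = []
        elif token == ";":
            words[new_name] = new_body     # "write the tail": store the word
            mode = IMMEDIATE
        elif mode == COMPILE:
            new_body.extend(words[token])  # inline a copy of the word
        elif not execute(words[token], output):
            break                          # 'exit' ran in immediate mode
    return output

print("\n".join(run("meow : meow5 meow meow meow meow meow ; meow5 exit")))
```

With that input the sketch prints one immediate meow, the five meows from meow5, and then BYE!.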
Next two nights: Hmm...okay, so adding more words that
will execute as they're entered ("immediate" words) is
forcing me to deal with how they should return execution
to whatever called them.
To recap:
* Compiled code in meow5 will be concatenated
together, so there is no such thing as "return"
_within_ a compiled word - execution truly just
flows from the end of one word to the beginning of
the next.
* Many words (':' or 'colon' is an example) must be
able to operate outside of a compiled word because
they are needed to do the compiling!
* Some words can execute _both_ ways in a single
definition. 'exit' is my only example currently -
it's simple because no part of the program needs
to execute after it's done, of course.
* A select few words will even need to be executed
from within the meow5 binary itself (in assembly)
to make the initial functionality of the
interpreter available. 'find' and 'inline' are two
such fundamental words.
* I've slowly been converting all of the traditional
procedure calls in this prototype into simple
jumps and manually keeping track of a single level
of return address.
Now the ':' command forces me to implement a return
stack for immediate execution, at the very least,
because it will need to call, for instance, 'get_token',
to get the name of the word being defined:
: meow5 ... ;
Here 'meow5' is the name of the new word.
Anyway, after sleeping on it, I think I'll solve this by
having macros to start and end a word in assembly. In
addition to taking care of the housekeeping duties of
creating the tail metadata, they'll also set up return
jumping and stack poppin'. The length of the word in the
tail will NOT include the return stuff so it won't be
included when the word is inlined.
Anyway, it makes sense in my head.
The basic word-making macros are easy enough:
%macro DEFWORD 1 ; takes name of word to make
    %1:
%endmacro
%macro ENDWORD 3 ; takes label name, name string, flags
    end_%1:
        ; todo: immediate "return" goes here
    tail_%1:
        dd LAST_WORD_TAIL ; linked list
        %define LAST_WORD_TAIL tail_%1
        dd (tail_%1 - %1) ; length of word
        dd %3 ; flags
        db %2, 0 ; name as string
%endmacro
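For anyone who finds the tail layout easier to read outside of NASM, here is a small Python model of what these macros lay down in memory: the word's code, followed by a tail holding a link to the previous tail, a length, flags, and a NUL-terminated name. It is only a sketch. The `NO_WORD` end-of-list marker, the flag value, and the helper names `add_word` and `list_words` are assumptions for illustration, not anything from meow5.asm.

```python
import struct

NO_WORD = 0xFFFFFFFF  # assumed end-of-list marker (hypothetical)
IMMEDIATE = 1         # assumed flag value (hypothetical)

def add_word(memory, last_tail, code, name, flags):
    """Append code followed by its tail; return the new tail's address."""
    start = len(memory)
    memory.extend(code)
    tail = len(memory)
    # link to previous tail, length of word, flags (little-endian dwords)
    memory.extend(struct.pack("<III", last_tail, tail - start, flags))
    memory.extend(name.encode("ascii") + b"\x00")
    return tail

def list_words(memory, last_tail):
    """Follow the links from the newest tail back to the oldest."""
    names = []
    tail = last_tail
    while tail != NO_WORD:
        link, length, flags = struct.unpack_from("<III", memory, tail)
        name_end = memory.index(0, tail + 12)  # find the NUL terminator
        names.append(memory[tail + 12:name_end].decode("ascii"))
        tail = link
    return names

memory = bytearray()
last = NO_WORD
last = add_word(memory, last, b"\x90", "meow", IMMEDIATE)
last = add_word(memory, last, b"\x90\x90", "exit", IMMEDIATE)
print(list_words(memory, last))
```

Because each tail links to the one defined before it, walking the list visits the newest word first.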
I tested this and I'll spare you the GDB walkthrough. It
works and I was able to execute this word from my input
string.
DEFWORD foo
mov eax, 42
ENDWORD foo, "foo", IMMEDIATE
So I'll test a call/return action with this foo, then
convert them all.
It worked. Now converting...
Worked out some bugs.
Silly little mistakes.
Here's the thing: it's getting pretty annoying to have
to bust out GDB, guess where to set a break point, step
through the code, try to remember the C-dominated syntax
to print stuff, etc., only to find out that I forgot to
add a line or I put the wrong thing in a string data
declaration.
Don't get me wrong, I'm grateful for GDB. It's been a
good tool and I know I should probably re-learn some of
its customization options.
But what I really want is better debugging in my program
itself.
So I've added "word not found" handling in the main
routine, so it goes like this:
get_next_token:
CALLWORD get_token
if all done, jump to .run_it
CALLWORD find
if not found, jump to .token_not_found
CALLWORD inline
jmp get_next_token
.run_it:
jmp data_segment
.token_not_found:
print first part of error message
print token name
print last part of error message
I'll test it out:
input_buffer_start:
db 'honk meow meow meow meow meow exit', 0
$ mr
Could not find word "honk"
Excellent, that'll save me untold minutes of debugging
right there.
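The loop above, including the new error path, can be sketched in a few lines of Python. This is a stand-in rather than the assembly: the dictionary is a plain dict, "inlining" just appends to a list, and the function name `compile_tokens` is invented for the example.

```python
def compile_tokens(source, dictionary):
    """Inline every token, or stop at the first unknown word.

    Returns (compiled, error); exactly one of the two is None.
    """
    compiled = []
    for token in source.split():          # get_token
        if token not in dictionary:       # find failed
            return None, 'Could not find word "' + token + '"'
        compiled.append(dictionary[token])  # inline
    return compiled, None

words = {"meow": "<meow code>", "exit": "<exit code>"}
compiled, error = compile_tokens("honk meow meow meow meow meow exit", words)
print(error)
```

As in the real loop, nothing is run if any token is missing from the dictionary.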
Now let's see if I've converted everything to my new
macros DEFWORD ... ENDWORD properly:
$ mr
Meow!
Meow!
Meow!
Meow!
Meow!
Meow!
Meow!
...
Oh no! I've got an infinite loop somehow.
Even though I'm putting in some of the "infrastructure"
for it, I'm not doing any immediate mode execution yet,
so it's nothing like that.
Nothing for it but to debug with GDB...
(gdb) break get_next_token.run_it
Breakpoint 1 at 0x80491c2: file meow5.asm, line 272.
...
273 jmp data_segment ; jump to the "compiled" program
0x0804a054 in data_segment ()
(gdb)
Single stepping until exit from function data_segment,
which has no line number information.
Oh, right. There's no debugger info for the machine code
I've inlined into memory and executed.
All the more reason to have debugging tools built into
my program itself. But I don't have those yet, so at
least GDB can give me a disassembly:
(gdb) disas &data_segment,&here
Dump of assembler code from 0x804a054 to 0x804a454:
0x0804a054 <data_segment+0>: mov $0x1,%ebx
=> 0x0804a059: mov $0x804a006,%ecx
0x0804a05e: mov $0x6,%edx
0x0804a063: mov $0x4,%eax
0x0804a068: int $0x80
0x0804a06a: jmp *0x804a459
0x0804a070: mov $0x1,%ebx
0x0804a075: mov $0x804a006,%ecx
0x0804a07a: mov $0x6,%edx
0x0804a07f: mov $0x4,%eax
0x0804a084: int $0x80
0x0804a086: jmp *0x804a459
... repeats three more times...
0x0804a0e0: pop %ebx
0x0804a0e1: mov $0x1,%eax
0x0804a0e6: int $0x80
0x0804a0e8: jmp *0x804a459
0x0804a0ee: add %al,(%eax)
0x0804a0f0: add %al,(%eax)
So the nice thing about 5 "meows" in a row is that the
repetition is really easy to spot.
The weird thing is that they all end with a jump back to
the exact same place near the beginning (but not exactly
at the beginning) of the inlined code.
Where is that jump coming from?
Oh, ha ha, I found it almost immediately. It's the
"return" that I put in my ENDWORD macro. That's not
supposed to be inlined with the "compiled" version of
words and it's due to a silly mistake.
The last line here:
end_%1:
jmp [return_addr]
tail_%1:
dd LAST_WORD_TAIL
dd (tail_%1 - %1)
Should be:
dd (end_%1 - %1)
So the jmp [return_addr] doesn't get inlined!
I'll fix that.
And now?
(gdb) disas &data_segment,&here
Dump of assembler code from 0x804a054 to 0x804a454:
0x0804a054 <data_segment+0>: push %es
0x0804a055: mov 0x6ba0804,%al
0x0804a05a: add %al,(%eax)
0x0804a05c: add %bh,0x4(%eax)
0x0804a062: int $0x80
0x0804a064: jmp *0x804a459
0x0804a06a: push %es
0x0804a06b: mov 0x6ba0804,%al
0x0804a070: add %al,(%eax)
0x0804a072: add %bh,0x4(%eax)
0x0804a078: int $0x80
0x0804a07a: jmp *0x804a459
0x0804a080: push %es
...
What on earth? That ain't right.
Next night: ohhhh...crud. Yeah, the problem is due to
the "return" code at the end of each word. My
dirt-simple inline is going to need an additional
length: there's a distance from the tail to the
beginning of the machine code and a separate length of
the machine code. (They used to be the same thing.)
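The garbage in that second disassembly can be reproduced on paper. The Python sketch below uses the bytes of 'meow' as they appear in the first disassembly, and it assumes the old inline found the start of the code by subtracting the length from the tail address. The helper `copy_code` is hypothetical. With the return jump sitting between the code and the tail, that subtraction lands 6 bytes too late.

```python
# Machine code of 'meow', byte for byte from the first disassembly.
MEOW_CODE = bytes.fromhex(
    "bb01000000"  # mov ebx, 1
    "b906a00408"  # mov ecx, 0x804a006
    "ba06000000"  # mov edx, 6
    "b804000000"  # mov eax, 4
    "cd80"        # int 0x80
)
RETURN_JMP = bytes.fromhex("ff2559a40408")  # jmp [0x804a459]

# Layout in memory: code, then the return jump, then the tail.
word = MEOW_CODE + RETURN_JMP
tail = len(word)            # tail_meow, as an offset from 'meow'
code_len = len(MEOW_CODE)   # end_meow - meow

def copy_code(word, tail, code_len, code_offset):
    """Copy code_len bytes starting code_offset bytes before the tail."""
    start = tail - code_offset
    return word[start:start + code_len]

# Assumed old behaviour: the length doubles as the offset.
wrong = copy_code(word, tail, code_len, code_offset=code_len)
# Separate offset: tail_meow - meow.
right = copy_code(word, tail, code_len, code_offset=tail)

print(wrong[:6].hex())     # 06a00408ba06
print(right == MEOW_CODE)  # True
```

The first six bytes of the bad copy, 06 and then a0 04 08 ba 06, are the "push %es" and "mov 0x6ba0804,%al" that GDB showed, and the bad copy still ends with the return jump.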
The DEFWORD and ENDWORD macros together produce this for "meow":
meow:
...
end_meow:
jmp [return_addr]
tail_meow:
...
dd (end_meow - meow)
dd (tail_meow - meow) <-- need to add this
And any other code that reads the tail (I guess that's
just 'find' right now?) will also need to be updated. I
wonder if I should be storing these "tail offsets" in
NASM macros as constants so I don't have to hunt them
down if they change in the future?
Yeah, I'll do that too. In addition to making changes
painless, it will make my intent clearer in the code
than bare offset numbers and a comment ever could.
; Memory offsets for each item in tail:
%define T_CODE_LEN 4
%define T_CODE_OFFSET 8
%define T_FLAGS 12
%define T_NAME 16
Inline is re-worked to use the length and offset of the
machine code in relation to the tail address:
DEFWORD inline
    pop esi                        ; param1: tail of word to inline
    mov edi, [here]                ; destination
    mov eax, [esi + T_CODE_LEN]    ; get len of code
    mov ebx, [esi + T_CODE_OFFSET] ; get offset from tail back to start of code
    sub esi, ebx                   ; esi = start of code (source for movsb)
    mov ecx, eax                   ; set len of code for movsb
    rep movsb                      ; copy [esi]...[esi+ecx] into [edi]
    add [here], eax                ; advance 'here' past the copied code
ENDWORD inline, "inline", (IMMEDIATE)
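Here is the same logic as a Python model, in case it helps to see it without registers. The T_ offsets are the ones defined above. Everything else is an assumption for the sake of a runnable sketch: the `make_word` helper, the placeholder byte strings, and passing `here` in and out as a plain integer.

```python
import struct

# Memory offsets for each item in tail (same values as the %defines).
T_CODE_LEN = 4
T_CODE_OFFSET = 8
T_FLAGS = 12
T_NAME = 16

def make_word(memory, code, return_stub, link, flags, name):
    """Lay out code, return stub, then tail; return the tail address."""
    start = len(memory)
    memory.extend(code)
    end = len(memory)
    memory.extend(return_stub)
    tail = len(memory)
    memory.extend(struct.pack("<IIII", link, end - start, tail - start, flags))
    memory.extend(name.encode("ascii") + b"\x00")
    return tail

def inline(memory, tail, dest, here):
    """Copy a word's code (without its return stub) to dest[here:]."""
    (code_len,) = struct.unpack_from("<I", memory, tail + T_CODE_LEN)
    (code_offset,) = struct.unpack_from("<I", memory, tail + T_CODE_OFFSET)
    start = tail - code_offset                  # sub esi, ebx
    dest[here:here + code_len] = memory[start:start + code_len]  # rep movsb
    return here + code_len                      # add [here], eax

memory = bytearray()
meow = make_word(memory, b"MEOW", b"RET", 0, 1, "meow")
exit_ = make_word(memory, b"EXIT!", b"RET", meow, 1, "exit")

dest = bytearray(32)
here = 0
for tail in (meow, meow, exit_):
    here = inline(memory, tail, dest, here)
print(bytes(dest[:here]))
```

The return stubs never reach the destination, because the copy starts at tail minus the offset and runs only for the code length.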
Crossing fingers...
$ mr
Meow.
Meow.
Meow.
Meow.
Meow.
Yay, working again!
Now I can try to do something _new_ with these changes:
find immediate mode and compile mode words.
And to _really_ do this right, I'll use the FORTH colon
word ':' as my immediate/compile mode separator.
Here's my new "input buffer" string:
db 'meow meow : meow meow meow exit', 0
For now the definition of ':' will _just_ set the mode:
DEFWORD colon
mov dword [mode], COMPILE
ENDWORD colon, ":", (IMMEDIATE)
And I've got two different definitions of 'meow' all
ready to go. They're both called "meow" in the
dictionary, but one of them has an IMMEDIATE flag and
the other has the COMPILE flag to specify which mode
they should match. The only difference is that they
print different strings.
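A minimal sketch of that lookup in Python, assuming the flags are single bits so that one '&' against the current mode decides the match. The bit values, the list-of-tuples dictionary, and giving 'exit' both flags are guesses for illustration, not the actual meow5 data.

```python
IMMEDIATE = 0b01  # assumed bit values (hypothetical)
COMPILE = 0b10

# Newest first, the order a walk along the tail links would visit them.
DICTIONARY = [
    ("meow", COMPILE, "Meow."),
    ("meow", IMMEDIATE, "Immediate Meow!"),
    (":", IMMEDIATE, None),
    ("exit", IMMEDIATE | COMPILE, "exit"),
]

def find(name, mode):
    """Return the first entry whose name matches and whose flags allow mode."""
    for entry in DICTIONARY:
        word_name, flags, _ = entry
        if word_name == name and flags & mode:
            return entry
    return None

print(find("meow", IMMEDIATE)[2])
print(find("meow", COMPILE)[2])
```

Two words can share a name this way, and a word that should work in both modes only needs both bits set.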
If all goes well, the "input buffer" string I set above
should print two immediate meows and then compile three
compile meows and an exit and then run that...
$ mr
Immediate Meow!
Immediate Meow!
Meow.
Meow.
Meow.
Wow!
So I guess I've done two of the four TODOs I set at the
start of this log above:
[ ] colon word store name somewhere
[x] find should also match mode flag (use &)
[ ] semicolon should write tail
[x] immediate mode should find and exec words...somehow
The colon word isn't storing the word name and there's
no semicolon yet, so I'm not adding the new words to the
dictionary yet, but I also made progress in other areas.
I'll start a new log now with the other two TODOs.
See you in log05.txt!