Thanks! #1
Comments
Hey that's great! I'd love having updates. Also, if you have any
questions about the generated code, feel free to ask.
The structure of the compiler was driven extensively by the need to
originally fit in a PDP-11 with 56KB of memory. The sources as you see
them don't support the PDP-11 (we forked that one off so we could work on
improved optimizations and the like), but they did support other environments
with somewhat limited memory resources. Hence all of the overlays, the
various ways of accessing semantic trees and symbol tables, and the like.
Looking it over brought back a lot of memories, mostly nightmares :). I'll
take 56GB over 56KB any day.
On Thu, Mar 23, 2023 at 12:48 AM Poul-Henning Kamp wrote:
I just wanted to thank you a LOT for posting these sources!
I am working on a software emulation of the Rational R1000/s400 in
datamuseum.dk, where all the programs on the 68K IO processor are compiled
using this compiler.
Being able to study the internal logic of the compiler is a great aid to
reverse-compiling those IO-programs.
Thanks a LOT!
--
Don Baccus
|
I did some consulting work for Rational in the early 1990s, oddly enough.
But I didn't even know of the Rational 1000, as I was never a fan of Ada,
and our company had no interest in writing a compiler for the language (we
did front ends for Modula-2 and C/C++ instead). I was actually working
for a competitor of Rational, Verdix, whose Ada compiler software was used
to implement the F-16 software.
I poked a bit at a couple of disassembly files, but I think they were for
the Ada machine, not the 68020 I/O machine, assuming the disassembler
outputs reasonably standard M68K assembly.
On Mon, Mar 27, 2023 at 5:53 AM Poul-Henning Kamp wrote:
If you want to follow along in my disassembly effort, or just want some
example binaries compiled with your compiler:
https://datamuseum.dk/aa//r1k_dfs/M200.html
--
Don Baccus
|
OK, found some M68K disassembly ...
--
Don Baccus
|
It appears that the code has been compiled with at least array bounds
checking enabled. I can't tell whether range checks for variable assignments
were enabled. If they were, the compiler could usually remove a good amount
of array bounds checking when the variables used to index an array shared
the array's range declaration, e.g. "i: 1..10" used to access an array
declared "array [1..10]". The compiler also computed the ranges of variable
values as best it could, which could to some degree suppress both range and
array bounds checks.
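A minimal sketch of the subrange reasoning described above, written in Python rather than the compiler's own Pascal. The `Range` class and `needs_bounds_check` function are hypothetical names invented for illustration; the compiler's actual data structures are not shown in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Inclusive range of values a variable or array index may take."""
    lo: int
    hi: int

    def within(self, other: "Range") -> bool:
        # True when every value in self also lies in other.
        return other.lo <= self.lo and self.hi <= other.hi


def needs_bounds_check(index: Range, bounds: Range) -> bool:
    """A runtime check can be dropped only if the index can never leave the bounds."""
    return not index.within(bounds)


array_bounds = Range(1, 10)              # array [1..10]
same_subrange = Range(1, 10)             # i: 1..10
plain_integer = Range(-32768, 32767)     # an undeclared-range 16-bit integer

check_for_subrange = needs_bounds_check(same_subrange, array_bounds)
check_for_integer = needs_bounds_check(plain_integer, array_bounds)
```

With matching declarations the check is provably redundant, which is why range checking on assignments makes array bounds checking cheaper: the assignment check already guarantees the index stays inside its declared subrange.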
It's cool that the code was compiled with any checking left enabled at
all. Most customers weren't willing to pay the space and time penalty to
do so, despite the fact that the overhead was quite a bit lower than
existing competitor compilers at the time.
--
Don Baccus
|
Apologies for not giving more detailed directions.
All the files in the link above are 68K binaries, but the top part of each page is the output from my "un-pascal'er" code, and the unadulterated assembly follows below that.
The runtime "library", also known as "FS", is here: https://datamuseum.dk/aa//r1k_dfs/44/442c504ed.html - also written (mostly) in PASCAL.
That in turn runs on top of their "KERNEL", which is here: https://datamuseum.dk/aa//r1k_dfs/f1/f1bf0e801.html - hand-written assembler.
The "un-pascal'er" code does not exploit the information in the limit-checks yet; I'm still trying to model the stack content correctly, but once I get to value tracking, they will provide valuable information.
The next step is to classify FP-relative data according to use, hoping to build first-order function prototypes from that, and then use those prototypes to identify the actual arguments to the calls. (All the prototypes you see now are created manually.)
With that in place it will be time for type propagation via calls to local variables to calls to other functions and so on.
The thing which confuses me most about the compiled PASCAL code is string literals: first the compiler outputs instructions to copy the string literal from the code "segment" onto the stack, and then it calls a function (FS@0x10ddc) which copies it from there to dynamically allocated memory. Why the detour over the stack? Is that an artifact of the Rational adaptation of the runtime, or is there a deeper reason?
PS: R1000 assembly looks like this: https://datamuseum.dk/aa//r1k_dfs/SEG.html and it is really weird: the R1000 instructions are Ada primitives. Also, the machine is bit-oriented, so types needing 13 bits only allocate 13 bits.
PPS: Based on the similarity of the generated code, and the use of A5 as "origin" pointer, I hypothesize that Hewlett-Packard used your C compiler for the HP3458A and HP3245A products?
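The plan above to classify FP-relative data could start out roughly like the following Python sketch. It is not the actual "un-pascal'er" code. It assumes a conventional 68K `LINK`-style frame (saved frame pointer at offset 0, return address at offset 4, stack-passed parameters from offset 8 upward, locals at negative offsets), which is not confirmed in this thread for this compiler. The function name and the input format are hypothetical.

```python
def classify_frame_accesses(accesses):
    """Group FP-relative accesses into parameter and local slots.

    `accesses` is an iterable of (fp_offset, size_in_bytes) pairs, one per
    instruction operand that addresses memory relative to the frame pointer.
    """
    widest = {}
    for offset, size in accesses:
        # Remember the widest access seen at each offset as a size estimate.
        widest[offset] = max(size, widest.get(offset, 0))

    # Assumed layout: 0(FP) = saved FP, 4(FP) = return address,
    # 8(FP) and up = parameters, negative offsets = locals.
    params = [(off, widest[off]) for off in sorted(widest) if off >= 8]
    local_vars = [
        (off, widest[off]) for off in sorted(widest, reverse=True) if off < 0
    ]
    return {"params": params, "locals": local_vars}


observed = [(8, 4), (12, 2), (-4, 4), (-6, 2), (8, 4), (-4, 2)]
prototype = classify_frame_accesses(observed)
```

The parameter list gives a first guess at a prototype (slot count and widths); types would still have to come from later value tracking.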
I think Rational may've implemented their own string package, but I will look
more closely. You can look at the runtime library sources here in the
libsrc directory; much of it is written in Pascal. The compiler supports
standard Pascal array-of-char strings, and towards the end Turbo Pascal
strings were added (stored as length+data, which you'll see in the data
section).
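For recognizing such strings in a data section, here is a small Python sketch of a length+data layout. It assumes a single leading length byte, as in Turbo Pascal itself; whether this compiler used exactly that width is an assumption, and the function names are made up for illustration.

```python
def encode_pstring(text: str) -> bytes:
    """Encode text as one length byte followed by the character data."""
    data = text.encode("ascii")
    if len(data) > 255:
        raise ValueError("a one-byte length prefix holds at most 255 characters")
    return bytes([len(data)]) + data


def decode_pstring(blob: bytes) -> str:
    """Read a length-prefixed string from the start of blob."""
    length = blob[0]
    return blob[1:1 + length].decode("ascii")


encoded = encode_pstring("PASCAL")
decoded = decode_pstring(encoded + b"\x00\x00")  # trailing padding is ignored
```

By contrast, a standard Pascal array-of-char string has a fixed size known from its type, so no length is stored alongside the characters.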
Some comments:
1. The compiler will allocate a small number of variables to D and A
registers (and floating-point variables to FP registers if you have the FP
processor). They'll never be stored to memory.
2. The compiler is able to put for-loop variables into registers in some
cases if #1 hasn't done so.
3. Values computed in registers may be stored on the stack if the
register is clobbered before all uses have been executed. This generally
happens with common subexpressions (CSEs). The value will be retrieved from
the stack as needed by other expressions/vars using the value. Hoisting can
lead to additional CSEs, and things like multidimensional array indexing may
only be partially hoisted, so reconstructing the source is not always going
to be possible.
4. When a constant expression is folded and assigned to a var, in many
cases the constant will be used rather than the variable.
5. After a procedure/function is compiled, the compiler does several
peephole optimizations. If there are registers that haven't been used, the
compiler will look for things to stuff into them, keeping in mind the cost
of storing/restoring registers at proc entry/exit. This can include things
like constants, variable addresses, etc. Variable addresses because the
code generator in general doesn't know if there's a reference to the
in-memory value (unlike those variables allocated by the optimizer to
registers, which it knows aren't referenced outside the proc/function).
6. As you've seen, all parameters are passed on the stack.
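To illustrate comment 4, here is a minimal Python sketch of constant folding and propagation over straight-line assignments. It is a generic textbook technique on a made-up tuple representation, not the compiler's actual algorithm or intermediate form; every name in it is hypothetical.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr, known):
    """Substitute known constants for variables and fold constant subexpressions."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):              # a variable name
        return known.get(expr, expr)
    op, left, right = expr
    left, right = fold(left, known), fold(right, known)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)
    return (op, left, right)


def propagate(statements):
    """Process (target, expr) assignments in order, tracking constant-valued variables."""
    known, out = {}, []
    for target, expr in statements:
        value = fold(expr, known)
        if isinstance(value, int):
            known[target] = value          # later uses see the constant
        else:
            known.pop(target, None)        # no longer a known constant
        out.append((target, value))
    return out


program = [
    ("size", ("*", 4, 16)),
    ("limit", ("-", "size", 1)),
    ("last", ("+", "base", "limit")),
]
optimized = propagate(program)
```

After propagation the last statement refers to the constant 63 rather than to `limit`, which is the effect a decompiler sees: the variable's name never appears at the point of use.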
--
Don Baccus
|
The string is copied to the stack because it is being passed as a value
parameter. Pascal allows value parameters to be modified, so the compiler
can't pass these by reference. Pascal doesn't allow one to pass a
constant as a var parameter value either, of course. So the value is
passed on the stack. Nowadays one would look to see if a value parameter
is actually modified and allow pass-by-reference if it can be determined
that it is not. But our compiler series was designed to run on machines
with much less memory than we see today, so the compiler operated on one
procedure/function at a time with some very, very limited information to be
used when it is called, which did not extend to information on individual
parameters. There's a lot of optimizing not done by our compilers due to
memory constraints.
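The reason for the stack copy can be modelled in a few lines of Python. This is only an analogy, with a `bytearray` standing in for the string literal in the code segment and hypothetical function names; it does not reproduce the generated 68K code.

```python
def uppercase_first(param: bytearray) -> bytes:
    """Stand-in for a Pascal routine that modifies its value parameter."""
    param[0:1] = param[0:1].upper()
    return bytes(param)


def call_by_value(literal: bytearray) -> bytes:
    # What the compiler does: push a private copy, so the callee may scribble on it.
    stack_copy = bytearray(literal)
    return uppercase_first(stack_copy)


def call_by_reference(literal: bytearray) -> bytes:
    # What the compiler cannot safely do without knowing the callee leaves
    # its parameter untouched.
    return uppercase_first(literal)


literal = bytearray(b"hello")
by_value = call_by_value(literal)
after_value_call = bytes(literal)
by_reference = call_by_reference(literal)
after_reference_call = bytes(literal)
```

Both calls return the same result, but only the by-value call leaves the literal intact, and a one-procedure-at-a-time compiler has no way to know which callees would be safe to call the other way.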
The compiler passes all strings by reference to the built-in library.
There is also a string package in the utility directory that similarly
implements string operations, but in standard Pascal (so one can avoid the
Turbo Pascal extension).
However in the disassembled code I see calls to things like "StringCat2"
and "StringDup". These do not appear in either the standard runtime
library (whose names begin with "p_" anyway), or the string package we
provided. Therefore I conclude they wrote their own.
Regarding various optimizations making the code hard to reverse-engineer,
so far as I've looked (not much) their code looks pretty simple, meaning
limited opportunities for the compiler to optimize stuff away.
--
Don Baccus
|
Oh, and you have probably figured this out, but the compiler allows
variables to be assigned to absolute memory locations, so memory-mapped
I/O can be done directly in Pascal.
So on the PDP-11, something like:
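    var
      { place the record at absolute address 1770b (octal) }
      device origin 1770b: record status: 0..65535; data: char end;
    begin
      { spin until bit 200b (octal) of the status word is set }
      while (200b and device.status) = 0 do;
      device.data := 'a';
    end.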
Of course they might've done this low-level stuff in assembly.
One can also declare a procedure an interrupt procedure to handle
interrupts.
--
Don Baccus
|
Do not let the names I have used for various functions confuse you: they are my best guesses, many from before I found out that this was PASCAL code.
Ah, OK, that makes more sense. How did you find out it was Pascal code?
--
Don Baccus
|
I found error messages of the form "PASCAL error #" :-)
Recently I got hold of Wayne Meretsky and he told me they used "Oregon Software Pascal-1 or Pascal-2 (I don't recall) running on RSX-11 on a PDP-11 development system." and a bit of searching brought me here :-)
Excellent! These are the only sources remaining of what was once a large
compiler system.
Front-ends: Pascal, Modula-2, C/C++
Back-ends: PDP-11, M68K, NS32K, i386, VAX, SPARC, some obscure Honeywell
mini that we did under contract.
and a whole bunch of operating systems.
All designed and about 1/2 written by me.
So an old customer bought sources to the VAX/VMS->M68K cross compiler and
library and someone who had worked there had an old 9-track VAX/VMS backup
format tape of them and found me and sent them to me.
For which I'm extremely grateful because I thought all of that work from my
past had disappeared forever.
Glad you're finding the sources useful.
--
Don Baccus