Skip to content

Compiler

Thomas Castleman edited this page Dec 27, 2021 · 6 revisions

The compiler translates from the higher-level source language to the lower-level assembly language that the machine actually implements. The main goal of the compiler, when doing so, is to generate assembly programs that are as small as possible (due to the size constraints on programs that can be actually loaded onto the Stew 3000). To meet this end, the compiler applies a few optimizations for shrinking code size.

Source Language

The source language supported by the compiler could be described as a subset of C (dubbed "3000 C"), supporting basic features:

  • signed/unsigned integers, ASCII characters, and pointer types (all 8 bits in size)
  • arrays and strings
  • basic arithmetic / logical operators: add, subtract, multiply, divide, modulus, and, or, not, and bitwise operators
  • control flow with conditionals and loops
  • functions
  • a simple preprocessor for replacing names with expressions

For examples of programs in 3000 C, checking out the programs/source/ directory.

Runtime

The runtime for 3000 C contains implementations of functionality that is commonly needed, but big enough to keep one copy of (and not emit for each usage of that feature). For instance, the hardware does not support multiplication or division natively, so these operators are provided through runtime implementations.

However, including the entire runtime for every program (even those that do not even use the provided features) would be very wasteful. Instead, the compiler conditionally includes only the parts of the runtime that are actually depended upon by the program being compiled, and thus you incur no cost for not using these features.

If you want to check out the implementations for runtime functionality, check out the lang/compiler/runtime/ directory.

Compiler Optimizations

Currently, the compiler applies the following optimizations to shrink the size of compiled binaries.

Constant folding

Constant folding processes the AST before it is handed off to the compiler. It takes expressions made up of constants in the source program and evaluates them at compile time. As a simple example, the program

void main() {
  print(1 + 2);
}

would be effectively converted into

void main() {
  print(3);
}

before the compiler generates any code for it, so there is no longer any need to do the addition at runtime.

Dead code elimination

The compiler checks for code that is unreachable and removes it from the program before generating code (so as not to waste space with code that will never run). For example:

void main() {
  exit(0);
  print(100); // this is unreachable
}

The exit() built-in halts the program always, so there is no way for execution to reach the print statement on the next line. Thus, the compiler gives the following warning:

Warning: dead code
 --> dead-code.3000.c:4
  3 |   exit(0);
  4 |   print(100);
  5 | }
  = help: this code is unreachable, consider removing it

and leaves out code for the print statement.

Unused code elimination

Variables which are not used by the program are eliminated and thus do not take up any stack space at runtime. This applies also to variables which are used only by statements which are ultimately eliminated.

Similarly, functions which are not called anywhere in the program are left out from the generated code.

No-effect statement elimination

Statements which have no "effect" on the program (an effect being something that produces output on one of the displays, halts the program, or causes an infinite loop) are eliminated by the compiler. For instance, in the following program:

void main() {
  int x = 1;
  2 + 2;      // This addition gets eliminated
  print(x);
}

Peephole optimizations

After the compiler has generated assembly code, several "peephole" optimizations are applied to eliminate useless instructions (such as adding or subtracting 0) or to simplify certain patterns that might have been produced by the compiler.

Also, the peephole optimization stage replaces any instructions that have equivalents that take up less space. For instance, the instruction addi 1, b takes up two bytes and can be replaced with inr b which takes up only 1 byte.

The Stew 3000 features a special "zero" register, which always holds the value 0 and cannot be overwritten (in reality it is not an actual register but can be used as though it were). Many instructions which use 0 as an immediate can be made smaller by instead referencing the zero register. For instance, cmpi a, 0 takes up two bytes (the opcode and the immediate 0), where cmp a, z only takes up one (the opcode).