This is an informal specification for a computer architecture I'm designing. Details can be found in the various .md
files provided. This architecture is designed to incorporate a bunch of novel ideas not included in popular architectures. Not all will necessarily pan out as good ideas, but a designer only really has one chance with these things, as proven by the x86 architecture which is still stuck with long-outdated design philosophies and unused instructions. This repository is a work in progress and is subject to modification at any time. Do not consider it to be complete.
The first relatively novel idea is to ditch registers entirely and use a stack based design instead. This idea certainly isn't completely novel, but it does depart dramatically from traditional register architectures and the implementation details also depart partially from traditional stack based architectures as well. The rationale here is that a compiler cannot efficiently allocate registers without detailed knowledge of the micro-architecture of the CPU that the code will run on, as well as more macroscopic details like how many registers even exist at all, because typical code will be run on a wide variety of machines with different capabilities, so the compiler must optimize for the least common denominator. Additionally, the number of exposed registers is also a hard-coded part of the instruction set and cannot be trivially extended, leading to fewer programmer accessible registers than optimal for the hardware. This leads to sophisticated workarounds like register renaming to bring register allocation into the hands of the CPU, regardless of what the compiler outputted. Such strategies are necessary anyway due to things like out-of-order and speculative execution. As such, the ideal solution would seem to be one where there are no general purpose registers and it is fully up to the CPU to allocate as it sees fit, leading to a highly flexible approach that automatically adapts to the particular model of CPU.
Instead of registers, instructions operate on data values on the stack. These instructions can either operate on the values on top of the stack like usual stack architectures or they can specify how deep into the stack the values should be read from/written to, departing from traditional stack architectures. This offers most of the flexibility of traditional register architectures while eliminating the registers. The CPU will then utilize the registers almost as a sort of micro-cache for the stack, and most stack operations will not actually propagate to the physical RAM. For accessing the heap, a load/store style architecture is used, where data must be moved to the stack to be operated on and then returned to the heap. An efficient CPU will translate this internally into register load and store micro-operations.
The address space will be divided into three segments: stack, heap, and code. Most instructions operate on the stack segment. Load/store instructions additionally address the heap segment. Jump/label and call/return instructions additionally address the code segment. The code segment cannot be read from or written to, simply because all data manipulation instructions implicitly operate on the stack segment and/or heap segment, but never the code segment. The stack and heap segments cannot be executed because the instruction pointer always implicitly refers to the code segment. The kernel must be utilized to transfer data between the heap and code segments for applications like a JIT. Note that these segments are not like x86 segments. They do not index into different portions of the same global address space. Rather, they are three independent address spaces allocated onto RAM via paging. As such, the same address can refer to three different things depending on which segment is implied.
Control flow is also non-standard. The usual direct jump, indirect jump, direct call, indirect call, and return instructions all exist, with direct and indirect jumps existing in both relative and absolute form, and both conditional and unconditional form. Call instructions also exist in relative and absolute form. However, there are restrictions on where control flow is allowed to jump to, with violations resulting in a CPU fault. Three additional instructions are introduced to facilitate this mechanism: direct labels, indirect labels, and function labels. Direct and indirect jumps must jump to the location of a direct or indirect label. Note that the type of jump and type of label are not connected, i.e. direct jumps can jump to both direct and indirect labels, and same for indirect jumps. A direct label specifies (either relatively or absolutely) the address of the jump instruction that is allowed to jump to it. Jumping to the label from any other location is forbidden. An indirect label will allow jumps from any location. Function labels may only receive jumps originating from call instructions. Likewise, call instructions may only jump to function labels. Return instructions are slightly more complicated and come in two varieties. The first variety is an indirect return and permits returning to the location of any call instruction. The second variety is a direct return which specifies the location (either relatively or absolutely) of the function label corresponding to the current function (although it does not verify that such a label exists at that address) and only permits returning to direct calls that call that address or any indirect call. When labels are executed they behave as a no-op, except for call instructions which aren't just labels for return instructions to jump to but also have additional functionality.
The instruction format is also non-traditional. To mitigate improper code execution, eliminate the accidental inclusion of labels in the byte-sequences composing instructions, and enhance debugging, the most significant bit of each instruction byte specifies whether or not it is the first byte of the instruction. A 1
specifies that it is the first byte, and a 0
is used otherwise. This does reduce spatial efficiency of the instruction encoding to only 87.5%, but this is considered acceptable. Additionally, a special variable-length encoding is used for the op-codes. If the most significant of the remaining 7 bits is a 0
, it is the final op-code byte. Otherwise, additional op-code bytes follow. Op-codes are permitted to be up to 4 bytes. There are therefore 64 one byte op-codes, 4,096 two byte op-codes, 262,144 three byte op-codes, and 16,777,216 four byte op-codes, giving a total instruction set space of 17,043,520 possible instructions. Instructions with op-codes exceeding four bytes are considered invalid. Because the first and remaining bytes of instructions are distinguished from each other, this allows variable length instructions to exist without unique op-codes for each possible instruction length. Every instruction will have a list of valid lengths and any invalid length will throw a fault, as well as any length longer than the maximum valid length.
Most CPU architectures use a flag based system, but it is awkward because many operations then produce hidden side-effects that are most of the time quite unnecessary. This architecture does not utilize flags and instead uses dedicated comparison operations which work similar to their counterparts in high-level languages such as C. A boolean false is represented by the byte 0x00
and a boolean true is represented by the byte 0x01
. When executing conditional instructions, only the least significant bit is inspected to determine the value of a boolean, meaning that bitwise operations always have the expected effect on a boolean even if some or all of the upper bits are set.