This is of course a work in progress.
I had a lot of fun working on my 16 bit softcore processor (https://github.com/aslak3/cpu), and thought it would be interesting to extend the design to a 32 bit processor.
- I'm one of those odd programmers who enjoys writing code in assembly. I want to produce an ISA which is pleasant to program in assembly, even if this means it does not perform as well in other environments, such as when it is the target of a C compiler.
- That said, it would be terrific to look at producing an LLVM target for this design. With that in mind, it should have the ISA features needed to run C code reasonably efficiently, provided this doesn't compromise the fun of writing code in assembly.
- Sitting between RISC and CISC is a nice place to be.
- Stack operations that transfer multiple registers in one instruction are not very RISC-like at all, and would certainly hinder a future pipelined design. Nonetheless they are a big programmer convenience.
- On the other hand, being a load/store based processor has obvious benefits.
- I'm happy to borrow ideas from other designs.
- An eventual goal is to look at introducing a pipeline, though this may entail a partial or even complete redesign of the ISA and the scrapping of most of this implementation.
- This project is also a good place to explore my interest in processor design. For instance, it would be interesting to look at switching to a microcoded control unit, just for the experience of doing so.
- This also seems like a nice logic block with which to explore other areas of computer systems design, such as SDRAM memory controllers.
- 32 bit address and data buses
- 32 bit instruction word
- 16 x 32 bit general purpose registers
- 32 bit Program Counter
- No microcode: a coded state machine is used
- CustomASM (https://github.com/hlorenzi/customasm) is the current assembler
- Long, Word and Byte size memory accesses, with signed/unsigned extension on byte and word reads
- Bus error signal on unaligned word transfers
- Memory currently must be 32 bits wide
- Some opcodes (like LOADLI, JUMPs, BRANCHes, ALUMI and CALLs) are followed by one immediate value or address longword
- Load an immediate 32 bit quantity into a register, the value being taken from the following longword in the instruction stream
- Load a 16 bit quantity, held in the lower half of the instruction longword, into a register using a single instruction longword; the value is sign extended to 32 bits
- Load and store instructions operate either through a register, a register with an immediate displacement, the program counter with an immediate displacement, or an immediate memory address. Displacements may either be found in the following longword or an integrated (termed "quick" in the ISA) 12 bit quantity, which is sign extended.
- Clear instruction as an assembler nicety, implemented as a quick load of zero
- Simple status bits: zero, negative, carry and overflow
- ALU operations including
- add, add with carry, subtract, subtract with carry, signed and unsigned 16 bit to 32 bit multiply, and, or, xor, not, shift left, shift right, copy, negation, sign extensions, etc.
- ALU operations are of the form DEST <= OPERAND1 op OPERAND2, or DEST <= op OPERAND
- ALUMI operates with an immediate longword operand extracted from the instruction stream, eg. add r0,r1,#123
- ALUMQ operates with a sign extended 12 bit quantity embedded inside the instruction word, eg. addq r0,r1,#2
- Assembler provides shorthand versions, eg: add r0,#123 which is the same as: add r0,r0,#123
- Flow control, including calling subroutines and return: borrows the 15 conditions from ARM
- Jump and call subroutine through register
- Branch either with a 32 bit displacement or with a quick 12 bit displacement
- Return can also be conditional
- Flags (currently just the four condition codes) can be manually ORed/ANDed
- Nop and Halt instructions
- Register to register copy
- Push and pop a single register, eg: push (r15),r0 pushes r0 onto the stack pointed to by r15
- Push and pop multiple registers, eg: pushmulti (r15),R1|R3|R5 pushes r1, r3 and r5 in sequence onto the stack pointed to by r15, decrementing r15 by 12
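Several of the instructions above choose between a 12 bit "quick" field and a following extension longword. As a sketch of how an assembler might pick between the two forms (the helper names here are hypothetical and not part of the CustomASM CPU definition), the choice reduces to a signed range check:

```python
# Sketch of an assembler-side check (hypothetical helpers, not part of the
# CustomASM CPU definition): can a value use the 12 bit "quick" form, or does
# it need an extension longword after the instruction?

QUICK_BITS = 12  # width of the sign extended quick field


def fits_signed(value: int, bits: int) -> bool:
    """True if value can be stored in a `bits` wide two's complement field."""
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    return low <= value <= high


def instruction_longwords(value: int) -> int:
    """Longwords needed for a quick-capable instruction carrying `value`:
    1 if it fits the quick field, otherwise 2 (instruction + extension)."""
    return 1 if fits_signed(value, QUICK_BITS) else 2
```

So `addq r2,#4` fits in a single longword, while an operand such as 123456 (outside -2048..2047) needs the extension longword form. Calling `fits_signed(value, 16)` gives the equivalent check for the sign extended 16 bit quick load.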
- Register File, Program Counter, Instruction Register
- ALU
- Bus Interface
- Control Unit (no testbench as yet)
- DataPath and external entity
- Simulation environment
- Expose condition code register to allow it to be stacked/transferred to a register
- Test bench for control unit
- Integration into FPGA environment
- Interrupts
- Support for IO/memory ports narrower than 32 bits
- Start thinking about supervisor level access
- ...
- 31 downto 24 : opcode (NOP, HALT, ORFLAGS, ANDFLAGS)
- 15 downto 0 : quick value to OR/AND with the flags (ORFLAGS, ANDFLAGS)
- 31 downto 24 : opcode (LOADLI, LOADWSQ)
- 23 downto 20 : destination register
- 15 downto 0 : what to load (LOADWSQ)
- 31 downto 24 : opcode (LOADR, STORER, LOADM, STOREM, LOADRD, STORERD, LOADPCD, STOREPCD, LOADRDQ, STORERDQ, LOADPCDQ, STOREPCDQ)
- 23 downto 20 : register
- 19 downto 16 : address register (not LOADM, STOREM, LOADPCD*, STOREPCD*)
- 15 downto 13 : transfer type
- 11 downto 0 : quick displacement (Q only)
- 31 downto 24 : opcode (JUMP, BRANCH, BRANCHQ, JUMPR, CALLJUMP, CALLBRANCH, CALLBRANCHQ, CALLJUMPR, RETURN)
- 23 downto 20 : new program counter register (JUMPR, CALLJUMPR)
- 19 downto 16 : stack register (for CALL*, RETURN)
- 15 downto 12 : condition
- 11 downto 0 : quick displacement (BRANCHQ, CALLBRANCHQ only)
- 31 downto 24 : opcode (ALUM, ALUMI, ALUS)
- 23 downto 20 : destination register
- 19 downto 16 : operand register2
- 15 downto 12 : operation code
- 11 downto 8 : operand register3 (ALUM only)
- 31 downto 24 : opcode (ALUMQ)
- 23 downto 20 : destination register
- 19 downto 16 : operand register2
- 15 downto 12 : operation code
- 11 downto 0 : quick immediate value
- 31 downto 24 : opcode (PUSH, POP, PUSHMULTI, POPMULTI)
- 23 downto 20 : what to push/pop (PUSH, POP)
- 19 downto 16 : stack register
- 15 downto 0 : register mask (PUSHMULTI, POPMULTI)
- 31 downto 24 : opcode (COPY)
- 23 downto 20 : destination
- 19 downto 16 : source
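To make the field layout concrete, the ALUMQ format can be packed and unpacked as below. This is a sketch under stated assumptions: it takes the opcode (0x50) and ALU operation codes from the tables in this README, and assumes that `addq r0,r1,#2` maps onto ALUMQ with the Add operation, which the text implies but does not spell out. The function names are hypothetical.

```python
# Sketch: packing and unpacking the ALUMQ instruction word using the bit
# fields listed above. Register numbers and ALU codes are plain integers.

OPCODE_ALUMQ = 0x50   # from the opcode table
ALU_ADD = 0b0000      # from the ALU operation table
ALU_SUB = 0b0010


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def encode_alumq(rd, rop2, op, quick):
    """31..24 opcode, 23..20 rD, 19..16 rOP2, 15..12 operation, 11..0 quick."""
    if not (0 <= rd < 16 and 0 <= rop2 < 16 and 0 <= op < 16):
        raise ValueError("register or operation code out of range")
    if not -2048 <= quick <= 2047:
        raise ValueError("quick operand must fit in 12 signed bits")
    return ((OPCODE_ALUMQ << 24) | (rd << 20) | (rop2 << 16)
            | (op << 12) | (quick & 0xFFF))


def decode_alumq(word):
    """Return (opcode, rd, rop2, op, quick) with the quick field sign extended."""
    return ((word >> 24) & 0xFF, (word >> 20) & 0xF, (word >> 16) & 0xF,
            (word >> 12) & 0xF, sign_extend(word, 12))
```

Under these assumptions `addq r0,r1,#2` encodes as 0x50010002, and the `subq r0,#1` in the example program (shorthand for `subq r0,r0,#1`) as 0x50002001.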
Opcode | VHDL code | Extension longword | Processor cycles | Description |
---|---|---|---|---|
0x01 | NOP | - | 3 | Does nothing for one instruction |
0x02 | HALT | - | 3 + forever | Stops the processor and asserts HALT signal |
0x03 | ORFLAGS | - | 3 | Flags := Flags OR quick value |
0x04 | ANDFLAGS | - | 3 | Flags := Flags AND quick value |
0x10 | LOADLI | Long value | 3 | rN := Long value |
0x11 | LOADQWS | - | 3 | rN := sign extended quick word |
0x20 | LOADR | - | 3 | rN := (rA) |
0x21 | STORER | - | 3 | (rA) := rN |
0x22 | LOADM | Memory address | 4 | rN := (Memory address) |
0x23 | STOREM | Memory address | 4 | (Memory address) := rN |
0x24 | LOADRD | Memory displacement | 4 | rN := (rA + Memory displacement) |
0x25 | STORERD | Memory displacement | 4 | (rA + Memory displacement) := rN |
0x26 | LOADRDQ | - | 4 | rN := (rA + quick memory displacement) |
0x27 | STORERDQ | - | 4 | (rA + quick memory displacement) := rN |
0x28 | LOADPCD | Memory displacement | 4 | rN := (PC + Memory displacement) |
0x29 | STOREPCD | Memory displacement | 4 | (PC + Memory displacement) := rN |
0x2a | LOADPCDQ | - | 4 | rN := (PC + quick memory displacement) |
0x2b | STOREPCDQ | - | 4 | (PC + quick memory displacement) := rN |
0x30 | JUMP | Memory address | 3 | If condition -> PC := Memory address |
0x31 | BRANCH | Memory displacement | 4 | If condition -> PC := PC + Memory displacement |
0x32 | BRANCHQ | - | 4 | If condition -> PC := PC + quick memory displacement |
0x33 | CALLJUMP | Memory address | 5 | If condition -> rSP := rSP - 4 ; (rSP) := PC ; PC := Memory address |
0x34 | CALLBRANCH | Memory displacement | 5 | If condition -> rSP := rSP - 4 ; (rSP) := PC ; PC := PC + Memory displacement |
0x35 | CALLBRANCHQ | - | 5 | If condition -> rSP := rSP - 4 ; (rSP) := PC ; PC := PC + Quick memory displacement |
0x36 | JUMPR | - | 3 | If condition -> PC := rN |
0x37 | CALLJUMPR | - | 5 | If condition -> rSP := rSP - 4 ; (rSP) := PC ; PC := rN |
0x38 | RETURN | - | 3 | If condition -> PC := (rSP) ; rSP := rSP + 4 |
0x40 | ALUM | - | 3 | rD := rOP2 operation rOP3 |
0x42 | ALUMI | Operand | 3 | rD := rOP2 operation operand |
0x49 | ALUMS | - | 3 | rD := operation rOP2 |
0x50 | ALUMQ | - | 3 | rD := rOP2 operation Quick operand |
0x60 | PUSH | - | 4 | rSP := rSP - 4 ; (rSP) := rN |
0x61 | POP | - | 3 | rN := (rSP) ; rSP := rSP + 4 |
0x62 | PUSHMULTI | - | 3 + rN count * 2 | for each rN set do: rSP := rSP - 4 ; (rSP) := rN |
0x63 | POPMULTI | - | 3 + rN count * 2 | for each rN set do: rN := (rSP) ; rSP := rSP + 4 |
0x70 | COPY | - | 3 | rD := rS |
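The PUSHMULTI and POPMULTI rows can be modelled in a few lines. This is a sketch, not the control unit: the README only says registers are pushed "in sequence", so pushing the lowest numbered register first, and popping in the reverse order so that the same mask restores what was saved, are assumptions. Memory is a plain dictionary keyed by byte address, and the function names are hypothetical.

```python
# Sketch of PUSHMULTI / POPMULTI semantics from the opcode table.
# Assumed ordering: push lowest numbered register first, pop highest first.

MASK32 = 0xFFFFFFFF


def pushmulti(regs, mem, sp_reg, mask):
    """For each set bit in mask: rSP := rSP - 4 ; (rSP) := rN."""
    if mask & (1 << sp_reg):
        raise ValueError("the stack register itself should not be in the mask")
    for n in range(16):
        if mask & (1 << n):
            regs[sp_reg] = (regs[sp_reg] - 4) & MASK32
            mem[regs[sp_reg]] = regs[n]


def popmulti(regs, mem, sp_reg, mask):
    """For each set bit in mask (reverse order): rN := (rSP) ; rSP := rSP + 4."""
    if mask & (1 << sp_reg):
        raise ValueError("the stack register itself should not be in the mask")
    for n in reversed(range(16)):
        if mask & (1 << n):
            regs[n] = mem[regs[sp_reg]]
            regs[sp_reg] = (regs[sp_reg] + 4) & MASK32


def multi_cycles(mask):
    """Cycle count from the table: 3 + 2 per register transferred."""
    return 3 + 2 * bin(mask & 0xFFFF).count("1")
```

With r15 at 0x200, pushing R1|R3|R5 leaves r15 at 0x1F4 (decremented by 12, matching the feature list) and takes 9 cycles.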
3 | 2 | 1 | 0 |
---|---|---|---|
V: Overflow | C: Carry | Z: Zero | N: Negative |
Hex value | Assembly postfix | Description | Meaning |
---|---|---|---|
1 | eq AKA zs | Equal / equals zero | Z |
2 | ne AKA zc | Not equal | !Z |
3 | cs | Carry set | C |
4 | cc | Carry clear | !C |
5 | mi | Minus | N |
6 | pl | Plus | !N |
7 | vs | Overflow | V |
8 | vc | No overflow | !V |
9 | hi | Unsigned higher | !C and !Z |
A | ls | Unsigned lower or same | C or Z |
B | ge | Signed greater than or equal | N == V |
C | lt | Signed less than | N != V |
D | gt | Signed greater than | !Z and (N == V) |
E | le | Signed less than or equal | Z or (N != V) |
0, F | al | Always | any |
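The condition table maps directly onto a small lookup. The sketch below uses the flag bit positions from the flags table (V in bit 3, C in bit 2, Z in bit 1, N in bit 0); the function names are hypothetical and it is not taken from the VHDL.

```python
# Condition code evaluation, following the flag bit positions
# (V=3, C=2, Z=1, N=0) and the condition table above.


def split_flags(ccr):
    """Return (n, z, c, v) from the four bit condition code value."""
    return (bool(ccr & 0b0001), bool(ccr & 0b0010),
            bool(ccr & 0b0100), bool(ccr & 0b1000))


def condition_true(cond, ccr):
    """True if the 4 bit condition field `cond` passes for flags `ccr`."""
    n, z, c, v = split_flags(ccr)
    checks = {
        0x1: z,                   # eq / zs
        0x2: not z,               # ne / zc
        0x3: c,                   # cs
        0x4: not c,               # cc
        0x5: n,                   # mi
        0x6: not n,               # pl
        0x7: v,                   # vs
        0x8: not v,               # vc
        0x9: not c and not z,     # hi
        0xA: c or z,              # ls
        0xB: n == v,              # ge
        0xC: n != v,              # lt
        0xD: not z and n == v,    # gt
        0xE: z or n != v,         # le
    }
    return checks.get(cond & 0xF, True)  # 0x0 and 0xF: always
```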
Value | Register |
---|---|
0b0000 | r0 |
0b0001 | r1 |
... | ... |
0b1110 | r14 |
0b1111 | r15 |
Value | Transfer size and extension mode (loads only) |
---|---|
0b000 | Byte unsigned |
0b001 | Word unsigned |
0b010 | Long unsigned |
0b011 | Reserved |
0b100 | Byte signed |
0b101 | Word signed |
0b110 | Long |
0b111 | Reserved |
Value | Two operand operation (ALUM, ALUMI, ALUMQ) |
---|---|
0b0000 | Add |
0b0001 | Add with carry |
0b0010 | Subtract |
0b0011 | Subtract with borrow |
0b0100 | Bitwise AND |
0b0101 | Bitwise OR |
0b0110 | Bitwise XOR |
0b0111 | Copy (does not update flags) |
0b1000 | Compare |
0b1001 | Bitwise test |
0b1010 | Unsigned 16 bit to 32 bit multiply |
0b1011 | Signed 16 bit to 32 bit multiply |
0b1100-0b1111 | Unused |
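The flag outputs of add and subtract can be sketched as follows. This is a model, not the VHDL ALU: it assumes carry acts as a borrow after subtraction, which is what the hi (!C and !Z) and ls (C or Z) conditions imply, and that compare behaves like a subtract whose result is discarded. Each function returns the result followed by the N, Z, C and V flags.

```python
# Sketch of 32 bit add/subtract with N, Z, C, V flags.
# Assumption: C is a borrow flag after subtraction.

MASK32 = 0xFFFFFFFF


def nz(result):
    """Negative and zero flags for a 32 bit result."""
    return (result >> 31) & 1, int(result == 0)


def alu_add(a, b, carry_in=0):
    a &= MASK32
    b &= MASK32
    full = a + b + carry_in
    result = full & MASK32
    n, z = nz(result)
    c = int(full > MASK32)                              # carry out of bit 31
    v = (((a ^ result) & (b ^ result)) >> 31) & 1       # same-sign inputs, sign flipped
    return result, n, z, c, v


def alu_sub(a, b, borrow_in=0):
    a &= MASK32
    b &= MASK32
    full = a - b - borrow_in
    result = full & MASK32
    n, z = nz(result)
    c = int(full < 0)                                   # borrow out
    v = (((a ^ b) & (a ^ result)) >> 31) & 1            # signs differ, result sign wrong
    return result, n, z, c, v
```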
Value | Single operand operation (ALUMS) |
---|---|
0b0000 | Increment |
0b0001 | Decrement |
0b0010 | Bitwise NOT |
0b0011 | Logic shift left |
0b0100 | Logic shift right |
0b0101 | Arithmetic shift left |
0b0110 | Arithmetic shift right |
0b0111 | Negation |
0b1000 | Byte swap |
0b1001 | Compare with zero |
0b1010 | Sign extend word |
0b1011 | Sign extend byte |
0b1100-0b1111 | Unused |
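A few of the single operand operations are easy to pin down with 32 bit masking. This sketch assumes each shift moves by exactly one bit (the table does not give a shift count) and that the sign extensions take the low byte or word of the operand; the function names are hypothetical.

```python
# Sketch of some single operand ALU operations on 32 bit values.
# Assumptions: shifts are by one bit; sign extend uses the low byte/word.

MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Sign extend the low `bits` bits to a 32 bit unsigned pattern."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & MASK32


def byte_swap(x):
    """Reverse the order of the four bytes."""
    return int.from_bytes((x & MASK32).to_bytes(4, "big"), "little")


def lsr1(x):
    """Logical shift right by one: a zero enters bit 31."""
    return (x & MASK32) >> 1


def asr1(x):
    """Arithmetic shift right by one: bit 31 is replicated."""
    return ((x & MASK32) >> 1) | (x & 0x80000000)


def negate(x):
    """Two's complement negation."""
    return (-x) & MASK32
```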
The CustomASM CPU definition currently in use makes it possible to write very presentable assembly by, for example, combining LOADLI, LOADM, LOADR and LOADRD into a single "load" mnemonic, with the width given by .l, .ws, .wu, .bs or .bu. ALU operations are represented similarly.
```
#d32 start                          ; reset vector

start:      load.l r15,#0x200       ; set up the stack pointer
            loadq.ws r0,#1          ; initial factorial
            loadq.ws r3,#9          ; stop once r0 reaches this (computes 1! to 8!)
            load.l r2,#table        ; output pointer
loop:       calljump factorial      ; get the factorial for r0 in r1
            store.l (r2),r1         ; save it in the table
            addq r2,#4              ; move to the next row
            addq r0,#1              ; inc the number we are calculating
            compare r0,r3           ; got all the factorials?
            branchqne loop          ; no? loop again
            halt                    ; stop the proc

factorial:  push (r15),r0           ; save the param, we will use it
            copy r1,r0              ; start from this value
l:          subq r0,#1              ; loop counter
            branchzs factorialo     ; done?
            mulu r1,r0,r1           ; multiply running total by previous
            branch l                ; get the next one
factorialo: pop r0,(r15)            ; restore the original param
            return                  ; done

#d32 -1                             ; start of table marker
table:                              ; table of output goes here
```
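To check what a simulation run should leave at `table`, the example program can be mirrored in Python. This is a reference model under one stated assumption: `mulu` multiplies the low 16 bits of each operand into a 32 bit product, matching the "Unsigned 16 bit to 32 bit multiply" entry. That is also why the loop stops before 9!: part way through computing it the running total no longer fits in 16 bits. The function names are hypothetical.

```python
# Reference model of the factorial example program.
# Assumption: mulu uses the low 16 bits of each operand (16 x 16 -> 32 bit).


def mulu16(a, b):
    """Assumed model of mulu: unsigned 16 bit x 16 bit -> 32 bit product."""
    return (a & 0xFFFF) * (b & 0xFFFF)


def factorial_table(first=1, stop=9):
    """r0 starts at `first`; the loop exits once r0 equals `stop` (r3),
    so stop! itself is not computed."""
    table = []
    r0 = first
    while True:
        r1 = r0                      # copy r1,r0
        counter = r0                 # r0 inside the subroutine (restored by pop)
        while True:
            counter -= 1             # subq r0,#1
            if counter == 0:         # branchzs factorialo
                break
            r1 = mulu16(counter, r1) # mulu r1,r0,r1
        table.append(r1)             # store.l (r2),r1
        r0 += 1                      # addq r0,#1
        if r0 == stop:               # compare r0,r3 / branchqne loop
            break
    return table
```

With the default arguments this gives the eight longwords the program should write after the -1 marker: 1, 2, 6, 24, 120, 720, 5040 and 40320.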