32 bit processor in VHDL

This is of course a work in progress.

Rationale and motivation

I had a lot of fun working on my 16 bit softcore processor (https://github.com/aslak3/cpu), and thought it would be interesting to extend the design to a 32 bit processor.

  • I'm one of those odd programmers who enjoys writing code in assembly. I want to produce an ISA which is pleasant to program in assembly, even if this means it does not perform as well in other environments, such as when it is the target of a C compiler.
  • That said, it would be terrific to look at producing an LLVM target for this design. With that in mind, it should have the necessary ISA features to make running C code reasonably efficient, provided this doesn't compromise the fun of writing code in assembly.
  • Sitting between RISC and CISC is a nice place to be.
    • Stacking operations with multiple registers in one instruction is not very RISC-like at all, and would certainly hinder a future pipelined design. Nonetheless it's a big programmer convenience.
    • On the other hand, being a load/store based processor has obvious benefits.
  • I'm happy to borrow ideas from other designs.
  • An eventual goal is to look at introducing a pipeline, though this may entail a partial or even complete redesign of the ISA and the scrapping of most of this implementation.
    • This project is also a good place to explore my interest in processor design. For instance, it would be interesting to look at switching to a microcoded control unit, just for the experience of doing so.
  • This seems like a nice logic block to use to explore other areas of computer systems design, such as memory controllers for SDRAM etc.

Summary of features so far implemented

General

  • 32 bit address and data buses
  • 32 bit instruction word
  • 16 x 32 bit general purpose registers
  • 32 bit Program Counter
  • No microcode: a coded state machine is used
  • CustomASM (https://github.com/hlorenzi/customasm) is the current assembler

Memory

  • Long, Word and Byte size memory accesses, with signed/unsigned extension on byte and word reads
    • Bus error signal on unaligned word transfers
  • Memory currently must be 32 bits wide

Instructions

  • Some opcodes (like LOADI, JUMPs, BRANCHes, ALUMI, CALLs) have one following immediate value or address
  • Load an immediate 32 bit quantity into a register, the value being found in the longword that follows the instruction in the instruction stream
  • Load a 16 bit value into a register using a single instruction longword; the value is sign extended to 32 bits
  • Load and store instructions operate either through a register, a register with an immediate displacement, the program counter with an immediate displacement, or an immediate memory address. Displacements may either be found in the following longword or an integrated (termed "quick" in the ISA) 12 bit quantity, which is sign extended.
  • Clear instruction as assembler nicety, which uses a quick load of zero
  • Simple status bits: zero, negative, carry and overflow
  • ALU operations including
    • add, add with carry, subtract, subtract with carry, signed and unsigned 16 bit to 32 bit multiply, and, or, xor, not, shift left, shift right, copy, negation, sign extensions, etc
  • ALU operations are of the form DEST <= OPERAND1 op OPERAND2, or DEST <= op OPERAND
    • ALUMI operates with an immediate longword operand extracted from the instruction stream, eg. add r0,r1,#123
    • ALUMQ operates with an embedded sign extended 12 bit quantity inside the instruction word, eg. addq r0,r1,#2
    • Assembler provides shorthand versions, eg: add r0,#123 which is the same as: add r0,r0,#123
  • Flow control, including calling subroutines and return: borrows the 15 conditions from ARM
    • Jump and call subroutine through register
    • Branch either with a 32 bit displacement or with a quick 12 bit displacement
    • Return can also be conditional
  • Flags (currently just the four condition codes) can be manually ORed/ANDed
  • Nop and Halt instructions
  • Register to register copy
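
The quick forms above embed a 12 bit or 16 bit value in the instruction word and sign extend it to 32 bits. As a minimal sketch of that extension (the function name `sign_extend` is made up for illustration and is not taken from the VHDL sources), it can be modelled as:

```python
def sign_extend(value: int, bits: int) -> int:
    """Sign extend the low `bits` bits of `value` to a 32 bit word."""
    value &= (1 << bits) - 1              # keep only the encoded field
    if value & (1 << (bits - 1)):         # top bit of the field set: negative
        value -= 1 << bits
    return value & 0xFFFFFFFF             # wrap to 32 bit two's complement


# A 12 bit quick displacement of -4 and a 16 bit quick load of -1.
quick_minus_four = sign_extend(0xFFC, 12)
word_minus_one = sign_extend(0xFFFF, 16)
```

Positive fields pass through unchanged, so a quick value of 2 stays 2 after extension.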

Stack

  • Push and pop a single register, eg: push (r15),r0 pushes r0 onto the stack addressed by r15
  • Push and pop multiple registers, eg: pushmulti (r15),R1|R3|R5 pushes r1, r3 and r5 onto the stack addressed by r15 in sequence, decrementing r15 by 12
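
To illustrate the effect on the stack pointer, here is a rough behavioural model of a multiple push. It assumes each register is stored with a pre-decrement of 4, as the opcode table describes for PUSHMULTI; the order in which the registers are written is an assumption (lowest number first), and the `memory` dictionary and function name are purely illustrative.

```python
def pushmulti(sp: int, memory: dict, registers: list, to_push: list) -> int:
    """Model pushing several registers; returns the new stack pointer."""
    for n in to_push:                     # ASSUMPTION: pushed in the order given
        sp = (sp - 4) & 0xFFFFFFFF        # rSP := rSP - 4
        memory[sp] = registers[n]         # (rSP) := rN
    return sp


registers = [0] * 16
registers[1], registers[3], registers[5] = 0x11, 0x33, 0x55
memory = {}
new_sp = pushmulti(0x200, memory, registers, [1, 3, 5])
```

Starting from 0x200, three registers leave the stack pointer at 0x1F4, which is the decrement of 12 mentioned above.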

Started

  • Register File, Program Counter, Instruction Register
  • ALU
  • Bus Interface
  • Control Unit (no testbench as yet)
  • DataPath and external entity
  • Simulation environment

TODO

  • Expose condition code register to allow it to be stacked/transferred to a register
  • Test bench for control unit
  • Integration into FPGA environment
  • Interrupts
  • Support for narrower than 32 bit IO/memory ports
  • Start thinking about supervisor level access
  • ...

Instruction formats

Base - Prefix 0x0

  • 31 downto 24 : opcode (NOP, HALT, ORFLAGS, ANDFLAGS)
  • 15 downto 0 : what to load (ORFLAGS, ANDFLAGS)

Load Immediate Long, Word quick - Prefix 0x1

  • 31 downto 24 : opcode (LOADLI, LOADWSQ)
  • 23 downto 20 : destination register
  • 15 downto 0 : what to load (LOADWSQ)

Other Load and Stores - Prefix 0x2

  • 31 downto 24 : opcode (LOADR, STORER, LOADM, STOREM, LOADRD, STORERD, LOADPCD, STOREPCD, LOADRDQ, STORERDQ, LOADPCDQ, STOREPCDQ)
  • 23 downto 20 : register
  • 19 downto 16 : address register (not LOADM, STOREM, LOADPCD*, STOREPCD*)
  • 15 downto 13 : transfer type
  • 11 downto 0 : quick displacement (Q only)

Flow control - Prefix 0x3

  • 31 downto 24 : opcode (JUMP, BRANCH, BRANCHQ, JUMPR, CALLJUMP, CALLBRANCH, CALLBRANCHQ, CALLJUMPR, RETURN)
  • 23 downto 20 : new program counter register (JUMPR, CALLJUMPR)
  • 19 downto 16 : stack register (for CALL*, RETURN)
  • 15 downto 12 : condition
  • 11 downto 0 : quick displacement (BRANCHQ, CALLBRANCHQ only)
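
As a sketch of how the fields above could be pulled out of a flow control instruction word, the following uses plain shifts and masks. The dictionary keys and the function name are illustrative only; the example word is hand-built from the BRANCHQ opcode (0x32) and the "ne" condition (2) listed later in this document.

```python
def decode_flow_control(word: int) -> dict:
    """Split a 32 bit flow control instruction word into its fields."""
    quick = word & 0xFFF                  # bits 11 downto 0
    if quick & 0x800:                     # sign extend the 12 bit displacement
        quick -= 0x1000
    return {
        "opcode": (word >> 24) & 0xFF,          # bits 31 downto 24
        "pc_register": (word >> 20) & 0xF,      # bits 23 downto 20
        "stack_register": (word >> 16) & 0xF,   # bits 19 downto 16
        "condition": (word >> 12) & 0xF,        # bits 15 downto 12
        "quick_displacement": quick,
    }


# BRANCHQ, condition "ne", quick displacement of -24.
fields = decode_flow_control(0x32002FE8)
```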

ALU operations - Prefix 0x4

  • 31 downto 24 : opcode (ALUM, ALUMI, ALUS)
  • 23 downto 20 : destination register
  • 19 downto 16 : operand register2
  • 15 downto 12 : operation code
  • 11 downto 8 : operand register3 (ALUM only)

ALUQ operations - Prefix 0x5

  • 31 downto 24 : opcode (ALUMQ)
  • 23 downto 20 : destination register
  • 19 downto 16 : operand register2
  • 15 downto 12 : operation code
  • 11 downto 0 : quick immediate value
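
Going the other way, an ALUMQ word can be assembled by hand from the fields above. This is a minimal sketch, not the CustomASM definition: the helper name is invented, the opcode value 0x50 comes from the opcode table below, and the Add operation code (0b0000) from the ALU multi operations list.

```python
ALUMQ = 0x50      # opcode from the opcode table
ADD = 0b0000      # operation code from the ALU multi operations list


def encode_alumq(dest: int, operand: int, operation: int, quick: int) -> int:
    """Build the instruction word for: rD := rOP2 operation quick."""
    if not -2048 <= quick <= 2047:
        raise ValueError("quick value must fit in 12 signed bits")
    return (
        (ALUMQ << 24)
        | (dest << 20)            # bits 23 downto 20
        | (operand << 16)         # bits 19 downto 16
        | (operation << 12)       # bits 15 downto 12
        | (quick & 0xFFF)         # bits 11 downto 0
    )


# addq r0,r1,#2
word = encode_alumq(0, 1, ADD, 2)
```

Under these assumptions `addq r0,r1,#2` encodes as 0x50010002, and a quick value of -1 fills the low 12 bits with 0xFFF.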

Push and Pop including Multiple - Prefix 0x6

  • 31 downto 24 : opcode (PUSH, POP, PUSHMULTI, POPMULTI)
  • 23 downto 20 : what to push/pop (PUSH, POP)
  • 19 downto 16 : stack register
  • 15 downto 0 : register mask (PUSHMULTI, POPMULTI)
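
The register mask can be sketched as one bit per register. The mapping of bit n to rN is an assumption made here for illustration; the document does not state the bit order. The cycle count formula is the one given for PUSHMULTI in the opcode table, and all function names are hypothetical.

```python
PUSHMULTI = 0x62  # opcode from the opcode table


def register_mask(registers: list) -> int:
    """Build the 16 bit mask. ASSUMPTION: bit n selects rN."""
    mask = 0
    for n in registers:
        if not 0 <= n <= 15:
            raise ValueError("register numbers run from 0 to 15")
        mask |= 1 << n
    return mask


def encode_pushmulti(stack_register: int, registers: list) -> int:
    """Build the instruction word for pushmulti (rSP),mask."""
    return (PUSHMULTI << 24) | (stack_register << 16) | register_mask(registers)


def pushmulti_cycles(mask: int) -> int:
    """Cycles per the opcode table: 3 + register count * 2."""
    return 3 + bin(mask & 0xFFFF).count("1") * 2


# pushmulti (r15),R1|R3|R5
word = encode_pushmulti(15, [1, 3, 5])
cycles = pushmulti_cycles(word & 0xFFFF)
```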

Copy registers - Prefix 0x7

  • 31 downto 24 : opcode (COPY)
  • 23 downto 20 : destination
  • 19 downto 16 : source

Opcode details

| Opcode | VHDL code | Extension longword | Processor cycles | Description |
|---|---|---|---|---|
| 0x01 | NOP | - | 3 | Does nothing for one instruction |
| 0x02 | HALT | - | 3 + forever | Stops the processor and asserts HALT signal |
| 0x03 | ORFLAGS | - | 3 | Flags := Flags OR quick value |
| 0x04 | ANDFLAGS | - | 3 | Flags := Flags AND quick value |
| 0x10 | LOADLI | Long value | 3 | rN := Long value |
| 0x11 | LOADQWS | - | 3 | rN := sign extended quick word |
| 0x20 | LOADR | - | 3 | rN := (rA) |
| 0x21 | STORER | - | 3 | (rA) := rN |
| 0x22 | LOADM | Memory address | 4 | rN := (Memory address) |
| 0x23 | STOREM | Memory address | 4 | (Memory address) := rN |
| 0x24 | LOADRD | Memory displacement | 4 | rN := (rA + Memory displacement) |
| 0x25 | STORERD | Memory displacement | 4 | (rA + Memory displacement) := rN |
| 0x26 | LOADRDQ | - | 4 | rN := (rA + quick memory displacement) |
| 0x27 | STORERDQ | - | 4 | (rA + quick memory displacement) := rN |
| 0x28 | LOADPCD | Memory displacement | 4 | rN := (PC + Memory displacement) |
| 0x29 | STOREPCD | Memory displacement | 4 | (PC + Memory displacement) := rN |
| 0x2a | LOADPCDQ | - | 4 | rN := (PC + quick memory displacement) |
| 0x2b | STOREPCDQ | - | 4 | (PC + quick memory displacement) := rN |
| 0x30 | JUMP | Memory address | 3 | If condition -> PC := Memory address |
| 0x31 | BRANCH | Memory displacement | 4 | If condition -> PC := PC + Memory displacement |
| 0x32 | BRANCHQ | - | 4 | If condition -> PC := PC + quick memory displacement |
| 0x33 | CALLJUMP | Memory address | 5 | If condition -> rSP := rSP - 4 ; (rSP) := PC ; PC := Memory address |
| 0x34 | CALLBRANCH | Memory displacement | 5 | If condition -> rSP := rSP - 4 ; (rSP) := PC ; PC := PC + Memory displacement |
| 0x35 | CALLBRANCHQ | - | 5 | If condition -> rSP := rSP - 4 ; (rSP) := PC ; PC := PC + quick memory displacement |
| 0x36 | JUMPR | - | 3 | If condition -> PC := rN |
| 0x37 | CALLJUMPR | - | 5 | If condition -> rSP := rSP - 4 ; (rSP) := PC ; PC := rN |
| 0x38 | RETURN | - | 3 | If condition -> PC := (rSP) ; rSP := rSP + 4 |
| 0x40 | ALUM | - | 3 | rD := rOP2 operation rOP3 |
| 0x42 | ALUMI | Operand | 3 | rD := rOP2 operation operand |
| 0x49 | ALUMS | - | 3 | rD := operation rOP2 |
| 0x50 | ALUMQ | - | 3 | rD := rOP2 operation quick operand |
| 0x60 | PUSH | - | 4 | rSP := rSP - 4 ; (rSP) := rN |
| 0x61 | POP | - | 3 | rN := (rSP) ; rSP := rSP + 4 |
| 0x62 | PUSHMULTI | - | 3 + rN count * 2 | for each rN set do: rSP := rSP - 4 ; (rSP) := rN |
| 0x63 | POPMULTI | - | 3 + rN count * 2 | for each rN set do: rN := (rSP) ; rSP := rSP + 4 |
| 0x70 | COPY | - | 3 | rD := rS |

Condition flags

| 3 | 2 | 1 | 0 |
|---|---|---|---|
| V: Overflow | C: Carry | Z: Zero | N: Negative |

Conditions (jumps, branches, and return)

| Hex value | Assembly postfix | Description | Meaning |
|---|---|---|---|
| 1 | eq AKA zs | Equal / equals zero | Z |
| 2 | ne AKA zc | Not equal | !Z |
| 3 | cs | Carry set | C |
| 4 | cc | Carry clear | !C |
| 5 | mi | Minus | N |
| 6 | pl | Plus | !N |
| 7 | vs | Overflow | V |
| 8 | vc | No overflow | !V |
| 9 | hi | Unsigned higher | !C and !Z |
| A | ls | Unsigned lower or same | C or Z |
| B | ge | Signed greater than or equal | N == V |
| C | lt | Signed less than | N != V |
| D | gt | Signed greater than | !Z and (N == V) |
| E | le | Signed less than or equal | Z or (N != V) |
| 0, F | al | Always | any |
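
The condition table maps directly onto a small lookup. This sketch follows the flag bit positions given under "Condition flags" (V is bit 3, C is bit 2, Z is bit 1, N is bit 0) and the Meaning column above; the function name is illustrative and it is a software model, not the VHDL implementation.

```python
V, C, Z, N = 0b1000, 0b0100, 0b0010, 0b0001   # flag bit positions


def condition_met(condition: int, flags: int) -> bool:
    """Evaluate a 4 bit condition code against the four flags."""
    v, c, z, n = (bool(flags & bit) for bit in (V, C, Z, N))
    outcomes = {
        0x0: True,                 # al
        0x1: z,                    # eq / zs
        0x2: not z,                # ne / zc
        0x3: c,                    # cs
        0x4: not c,                # cc
        0x5: n,                    # mi
        0x6: not n,                # pl
        0x7: v,                    # vs
        0x8: not v,                # vc
        0x9: not c and not z,      # hi
        0xA: c or z,               # ls
        0xB: n == v,               # ge
        0xC: n != v,               # lt
        0xD: not z and n == v,     # gt
        0xE: z or n != v,          # le
        0xF: True,                 # al
    }
    return outcomes[condition & 0xF]


taken = condition_met(0x2, 0)      # "ne" with Z clear
```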

Registers

0b0000 r0
0b0001 r1
... ...
0b1110 r14
0b1111 r15

Transfer types

| Value | Transfer size and extension mode (loads only) |
|---|---|
| 0b000 | Byte unsigned |
| 0b001 | Word unsigned |
| 0b010 | Long unsigned |
| 0b011 | Reserved |
| 0b100 | Byte signed |
| 0b101 | Word signed |
| 0b110 | Long |
| 0b111 | Reserved |
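
As an illustration of what the transfer type means for a load, the sketch below applies the size and extension mode to a raw value read from memory. It assumes 0b100 selects a signed byte (the counterpart of the signed word at 0b101) and treats both Long encodings identically; the table and function names are hypothetical.

```python
# transfer type -> (width in bits, sign extend on load)
TRANSFER_TYPES = {
    0b000: (8, False),
    0b001: (16, False),
    0b010: (32, False),
    0b100: (8, True),      # ASSUMPTION: byte signed
    0b101: (16, True),
    0b110: (32, True),
}


def extend_loaded(raw: int, transfer_type: int) -> int:
    """Return the 32 bit register value for a loaded byte, word or long."""
    if transfer_type not in TRANSFER_TYPES:
        raise ValueError("reserved transfer type")
    bits, signed = TRANSFER_TYPES[transfer_type]
    field_mask = (1 << bits) - 1
    value = raw & field_mask
    if signed and value & (1 << (bits - 1)):
        value |= 0xFFFFFFFF & ~field_mask     # fill the upper bits with ones
    return value


signed_byte = extend_loaded(0x80, 0b100)
unsigned_byte = extend_loaded(0x80, 0b000)
```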

ALU multi (destination and operand) operations

0b0000 Add
0b0001 Add with carry
0b0010 Subtract
0b0011 Subtract with borrow
0b0100 Bitwise AND
0b0101 Bitwise OR
0b0110 Bitwise XOR
0b0111 Copy (does not update flags)
0b1000 Compare
0b1001 Bitwise test
0b1010 Unsigned 16 bit to 32 bit multiply
0b1011 Signed 16 bit to 32 bit multiply
0b1100-0b1111 Unused
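
The difference between the two multiply operations can be shown with a small model. It assumes the multiplier takes the lower 16 bits of each operand register, which this document does not state explicitly, so treat it as a sketch of the idea and not a description of the ALU.

```python
def mulu(a: int, b: int) -> int:
    """Unsigned 16 bit to 32 bit multiply (ASSUMPTION: low halves are used)."""
    return ((a & 0xFFFF) * (b & 0xFFFF)) & 0xFFFFFFFF


def muls(a: int, b: int) -> int:
    """Signed 16 bit to 32 bit multiply (ASSUMPTION: low halves are used)."""
    def as_signed16(x: int) -> int:
        x &= 0xFFFF
        return x - 0x10000 if x & 0x8000 else x

    return (as_signed16(a) * as_signed16(b)) & 0xFFFFFFFF


unsigned_product = mulu(0xFFFF, 2)   # 65535 * 2
signed_product = muls(0xFFFF, 2)     # -1 * 2
```

The same bit pattern 0xFFFF is 65535 to the unsigned multiply and -1 to the signed one, so the two results differ in the upper half of the 32 bit product.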

ALU single (destination only) operations

0b0000 Increment
0b0001 Decrement
0b0010 Bitwise NOT
0b0011 Logic shift left
0b0100 Logic shift right
0b0101 Arithmetic shift left
0b0110 Arithmetic shift right
0b0111 Negation
0b1000 Byte swap
0b1001 Compare with zero
0b1010 Sign extend word
0b1011 Sign extend byte
0b1100-0b1111 Unused

Sample code

The currently used CustomASM CPU definition makes it possible to write very presentable assembly by, for example, combining LOADI, LOADM, LOADR and LOADRD into a single "load" mnemonic with the width represented by .l, .ws, .wu, .bs or .bu. ALU operations are similarly represented.

            #d32 start                    ; reset vector

start:      load.l r15,#0x200             ; setup the stack pointer
            loadq.ws r0,#1                ; initial factorial
            loadq.ws r3,#9                ; stop once we reach this number
            load.l r2,#table              ; output pointer
loop:       calljump factorial            ; get the factorial for r0 in r1
            store.l (r2),r1               ; save it in the table
            addq r2,#4                    ; move to the next row
            addq r0,#1                    ; inc the number we are calculating
            compare r0,r3                 ; got all the factorials?
            branchqne loop                ; no? loop again
            halt                          ; stop the proc

factorial:  push (r15),r0                 ; save the param, we will use it
            copy r1,r0                    ; start from this value
l:          subq r0,#1                    ; loop counter
            branchzs factorialo           ; done?
            mulu r1,r0,r1                 ; multiply running total by previous
            branch l                      ; get the next one
factorialo: pop r0,(r15)                  ; restore the original param
            return                        ; done

            #d32 -1                        ; start of table marker
table:                                     ; table of output goes here
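
For checking a simulation run of the sample, the table contents can be predicted with a short software model that mirrors the loop structure of the assembly. This is only a sketch: the function names are invented, and the multiply is modelled as an unsigned 16 bit by 16 bit operation on the low halves of the operands, which is an assumption about mulu.

```python
def factorial_routine(n: int) -> int:
    """Mirror of the 'factorial' subroutine: r0 is n, r1 is the result."""
    total = n                      # copy r1,r0
    counter = n
    while True:
        counter -= 1               # subq r0,#1
        if counter == 0:           # branchzs factorialo
            return total
        # mulu r1,r0,r1 (ASSUMPTION: low 16 bits of each operand)
        total = ((counter & 0xFFFF) * (total & 0xFFFF)) & 0xFFFFFFFF


def expected_table(first: int = 1, limit: int = 9) -> list:
    """Longwords the main loop stores, one per value of r0 below r3."""
    return [factorial_routine(n) for n in range(first, limit)]


table = expected_table()
```

With r0 starting at 1 and r3 set to 9, the loop stores eight longwords, the factorials of 1 through 8.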