GameBrian - Attempt at GBA Flash Cartridge

This project was generated from the Chisel Template and uses Chisel3.

Supported targets are:

  • iCE40 breakout board
  • ECP5 evaluation board

The top module is dev.meirl.gamebrian.Top, which instantiates all of the components and their connections. The real top modules for each FPGA are contained in their own packages, ice and ecp. This keeps FPGA-specific primitives out of the core design. Those packages also hold the Chisel wrappers for primitives such as PLLs and IO buffers.

For synthesis and implementation, yosys and nextpnr are used. Builds for iCE40 and ECP5 devices are supported by Project IceStorm and Project Trellis, respectively, as mentioned in the nextpnr readme.

Building

iCE40

To build and flash the iCE40 breakout board:

 $ TARGET=ice make

To just build and not flash:

 $ TARGET=ice make build

For timing analysis:

$ TARGET=ice make time

ECP5

To build and configure the ECP5 evaluation board:

 $ TARGET=ecp make

To flash the SPI flash:

$ make ecp_flash

To just build:

 $ TARGET=ecp make build

There is no explicit timing analysis tool for ECP5 (I think)

Arty A7

This project was originally designed to run on the Arty A7 board, but I'm not sure where that code went. Maybe in the arty branch?

Elaboration

Sometimes, especially for debugging and simulation, just the elaborated Verilog files are desired. There are 4 targets so far, listed in src/main/scala/main.scala: ice, ecp, gba, and logic.

The main.ice target produces generated_output/ICETop.v which can be run through yosys and nextpnr.

$ sbt "runMain main.ice" # similar things can be run; e.g. "runMain main.ecp"

Simulation

There are some simulation testbenches for some modules. They can be found in the sim/ directory. The resulting VCD files should also be produced in that directory.

GBA

$ make sim

Logic

Simulating the logic analyzer

$ sbt "runMain main.logic" # elaborate verilog
$ iverilog -g2012 -o build/Logic.out generated_output/Logic.v \
    sim/Logic_tb.v ext/uart/rtl/uart_tx.v # build simulation with testbench
$ build/Logic.out # run simulation

Module Descriptions

Attempts were made to explain my design choices...

Board Modules

For each supported board, there are two classes/modules; for example, ECPTop and ECPBoard. The board module contains parameters for that board, such as the clock speed and the input/output ports. These port widths and names should match those in the constraints file. The top module extends the board module and contains the actual logic. When using a PLL output as the clock, the parameter boardClockFrequency should be updated to the PLL output frequency so that the modules instantiated inside it get the correct frequency. Modules such as blinky and UART require correct frequency parameters.
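To see why a stale boardClockFrequency matters, here is a rough Python sketch of a divider-style UART. The 12 MHz board clock, 48 MHz PLL output, and 115200 baud figures are made-up illustration values, and the function names are hypothetical; none of this is taken from the Chisel source.

```python
def clocks_per_bit(clock_hz, baud):
    """Clock cycles a divider-style UART counts for one bit period."""
    return round(clock_hz / baud)


def actual_baud(real_clock_hz, assumed_clock_hz, baud):
    """Baud rate seen on the wire when the divider was computed from
    assumed_clock_hz but the logic really runs at real_clock_hz."""
    return real_clock_hz / clocks_per_bit(assumed_clock_hz, baud)


if __name__ == "__main__":
    # Hypothetical numbers: the parameter still says 12 MHz, the PLL gives 48 MHz.
    print(clocks_per_bit(12_000_000, 115200))
    print(actual_baud(48_000_000, 12_000_000, 115200))
    # With the parameter updated to the PLL frequency the rate is close again.
    print(actual_baud(48_000_000, 48_000_000, 115200))
```

With these numbers the mismatched case runs the UART roughly four times too fast, which is the kind of failure an out-of-date frequency parameter produces.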

GBA

This is the main interface with the GameBoy Advance device. It contains the logic for handling the ROM reads, as well as RAM reads/writes. The GBA logic is a little annoying; the 16-bit AD lines are used as lower 16 bits of the ROM address input, as well as the ROM data output. The 8-bit A lines are used as the upper 8 bits of the address. The AD lines are also used for RAM address and the RAM data is output on the A lines.

When reading from ROM, the GameBoy provides the address on the AD and A lines, then pulls the nCS line LOW. Then the nRD line is pulled LOW to signal a read. The cartridge must then put the data on the AD lines. When the nRD line is pulsed (HIGH then back to LOW), the cartridge is expected to put the data for the next address on the AD lines; the GameBoy does not send the address again. Therefore, the original lower 16 bits of the address must be latched on the falling edge of nCS and also incremented on the rising edge of nRD.
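The latch-and-increment behaviour described above can be sketched as a small behavioural model in Python. This is not the Chisel implementation; the class name and the step() interface are hypothetical. It also assumes that the upper 8 bits from the A lines are latched together with the lower 16, and that the increment wraps within the lower 16 bits.

```python
class RomAddressLatch:
    """Behavioural model of the ROM address handling (illustration only)."""

    def __init__(self):
        self.ncs = 1   # previous nCS level
        self.nrd = 1   # previous nRD level
        self.addr = 0  # current 24-bit ROM address

    def step(self, ncs, nrd, ad, a):
        """Feed one sample of the bus pins and return the current address."""
        if self.ncs == 1 and ncs == 0:
            # Falling edge of nCS: latch A (upper 8 bits) and AD (lower 16 bits).
            self.addr = ((a & 0xFF) << 16) | (ad & 0xFFFF)
        elif ncs == 0 and self.nrd == 0 and nrd == 1:
            # Rising edge of nRD while selected: increment the lower 16 bits.
            low = (self.addr + 1) & 0xFFFF
            self.addr = (self.addr & 0xFF0000) | low
        self.ncs, self.nrd = ncs, nrd
        return self.addr


if __name__ == "__main__":
    latch = RomAddressLatch()
    latch.step(ncs=1, nrd=1, ad=0x0000, a=0x00)              # bus idle
    print(hex(latch.step(ncs=0, nrd=1, ad=0x1234, a=0x05)))  # 0x51234
    latch.step(ncs=0, nrd=0, ad=0x0000, a=0x05)              # nRD LOW: read
    print(hex(latch.step(ncs=0, nrd=1, ad=0x0000, a=0x05)))  # 0x51235
```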

The GBA module communicates with other modules through a simple memory interface. From the GameBoy's perspective, this is memory-mapped IO. This is provided by the interconnect.

Interconnect

The interconnect maps the reads and writes from the GameBoy to the cartridge's various modules. Most addresses go to the RAM, but some are mapped to registers in other modules. The SD card module is one such module.

SDCard

The name says SDCard, but it's just an SPI interface, since it was easier to implement than the SD protocol. The interconnect maps the SDCard module's SPI_DATA and SPI_CONTROL registers to specific RAM addresses (currently 0xE00F000 and 0xE00F001, respectively). The control register:

  • bit 0: (R/W) set to start a read/write operation. Stays set while operation is in progress and is cleared when done.
  • bit 1: (R/W) controls the chip select HIGH or LOW
  • bit 2-6: unused
  • bit 7: (R) 1 if a card is physically present in the slot

To SPI transfer data:

  1. Write data to SPI_DATA (0xE00F000)
  2. Set bit 0 on SPI_CONTROL (0xE00F001)

The data is shifted out from and into the SPI_DATA register. The software counterpart can be found in gba/first_stage in files source/gba_sd.c, source/gba_spi_asm.s, include/gba_sd.h, include/gba_spi.h.
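As an illustration of the register protocol above, here is a Python sketch of one SPI byte exchange. It is not the code in gba/first_stage. The read8/write8 callbacks and the FakeCart class are hypothetical stand-ins for bus access. Polling bit 0 until it clears and then reading SPI_DATA back follow from the register description, but the exact sequence used by the real software is an assumption.

```python
SPI_DATA = 0xE00F000
SPI_CONTROL = 0xE00F001

CTRL_BUSY = 0x01          # bit 0: set to start, cleared when done
CTRL_CS = 0x02            # bit 1: chip select level
CTRL_CARD_PRESENT = 0x80  # bit 7: read-only card detect


def spi_transfer(read8, write8, tx_byte):
    """Exchange one byte over SPI using the two memory-mapped registers."""
    write8(SPI_DATA, tx_byte & 0xFF)
    ctrl = read8(SPI_CONTROL)
    # Keep the current chip select level and set the start bit.
    write8(SPI_CONTROL, (ctrl & CTRL_CS) | CTRL_BUSY)
    while read8(SPI_CONTROL) & CTRL_BUSY:
        pass  # wait for the hardware to clear bit 0
    return read8(SPI_DATA)


class FakeCart:
    """Hypothetical stand-in: finishes instantly and answers with ~tx."""

    def __init__(self):
        self.data = 0
        self.ctrl = CTRL_CARD_PRESENT

    def read8(self, addr):
        return self.data if addr == SPI_DATA else self.ctrl

    def write8(self, addr, value):
        if addr == SPI_DATA:
            self.data = value & 0xFF
        elif addr == SPI_CONTROL:
            # Only the chip select bit is stored; bit 0 clears immediately.
            self.ctrl = (self.ctrl & CTRL_CARD_PRESENT) | (value & CTRL_CS)
            if value & CTRL_BUSY:
                self.data ^= 0xFF


if __name__ == "__main__":
    cart = FakeCart()
    print(hex(spi_transfer(cart.read8, cart.write8, 0xA5)))
```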

RAM

The interconnect maps all other addresses to a simple memory region.
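The mapping can be pictured with a few lines of Python. This is only a sketch of the decode rule as described in this README (two register addresses, everything else to RAM); the function name and the returned labels are hypothetical and do not mirror the Chisel interconnect.

```python
SPI_DATA_ADDR = 0xE00F000
SPI_CONTROL_ADDR = 0xE00F001


def decode(addr):
    """Return which target a GameBoy RAM-region access is routed to."""
    if addr == SPI_DATA_ADDR:
        return "SPI_DATA"
    if addr == SPI_CONTROL_ADDR:
        return "SPI_CONTROL"
    return "RAM"


if __name__ == "__main__":
    for addr in (0xE000000, 0xE00F000, 0xE00F001, 0xE00F002):
        print(hex(addr), decode(addr))
```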

Blinky

Just a simple counter blinky for sanity tests. Also a great example of boardClockFrequency being used.

FPGA-Specific Primitives

In each of the supported boards' packages, there are some other modules that wrap their FPGA-specific primitives in Chisel modules.

Logic Analyzer

Couldn't figure out why some things weren't working, so I decided to build an on-chip logic analyzer. It has 3 important parameters (WIDTH, ADDR_WIDTH, BAUD) and 3 important pins/ports (trigger, signals, uart_out):

  • WIDTH: how many signals you wanna capture. (a 'word')
  • ADDR_WIDTH: how many samples to store. This is the width of the address line, so the actual memory is 2^ADDR_WIDTH 'words'.
  • BAUD: baud rate of UART
  • trigger: triggers capture and UART transfer on positive edge
  • signals: the actual signals/wires desired to be sampled
  • uart_out: output pin for UART transfer

So, connect signals to the data you want to capture, then pulse the trigger. Data is sent out via UART. There is a script in tools/logic.py to receive that data stream and output a VCD file.

The data stream is as follows:

  • 4 bytes for identification/simple "synchronization". These are the bytes "\x4C\x4F\x47\x43" (aka "LOGC")
  • 1 byte for width
  • 4 bytes for length (big endian?)
  • length*n bytes for the actual data stream (see below)
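A receiver could locate and decode this header roughly as follows. This is a sketch, not the code in tools/logic.py. The function name is hypothetical, and since the byte order of the length field is uncertain, it is left as a parameter that defaults to big endian.

```python
MAGIC = b"LOGC"  # "\x4C\x4F\x47\x43"


def parse_header(stream, byteorder="big"):
    """Find the magic bytes and return (width, length, payload_offset).

    byteorder is an assumption; pass "little" if the length turns out
    to be sent the other way around.
    """
    idx = stream.find(MAGIC)
    if idx < 0:
        raise ValueError("magic bytes not found")
    header = stream[idx + 4:idx + 9]
    if len(header) < 5:
        raise ValueError("stream ends inside the header")
    width = header[0]
    length = int.from_bytes(header[1:5], byteorder)
    return width, length, idx + 9


if __name__ == "__main__":
    # Leading noise before the magic bytes is skipped.
    raw = b"\x00\xff" + MAGIC + bytes([9]) + (4).to_bytes(4, "big") + bytes(8)
    print(parse_header(raw))
```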

When sending captures wider than 8 bits, the data is sent in groups of 8 bits. For example, when capturing 20 bits:

  • length bytes are sent representing the lower 8 signals (7 to 0)
  • another length bytes are sent representing the second 8 signals (15 to 8)
  • finally, another length bytes are sent where the 4 least significant bits represent the last 4 signals (19 to 16)

In other words, capturing 4 signals looks like this (one row per signal, one column per sample):

0: 0 1 1 0
1: 1 1 0 0
2: 0 0 0 1
3: 1 1 0 1

The captures get saved as \x0A\x0B\x01\x0C (read each column vertically, bottom to top) and are transferred as is. When capturing 9 signals:

0: 0 1 1 0
1: 1 1 0 0
2: 0 0 0 1
3: 1 1 0 1
4: 0 1 1 0
5: 1 1 0 0
6: 0 0 0 1
7: 1 1 0 1
8: 1 1 0 1

This capture gets saved as: \x1AA\x1BB\x011\x1CC (this is not a valid hex representation in Python, I'm just using it to show 3 hex digits at a time). This capture gets sent as: \xAA\xBB\x11\xCC\x01\x01\x00\x01 (the lower bytes are sent first, then the rest)
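The byte ordering in the two examples can be reproduced with a short Python sketch. It is not taken from tools/logic.py; pack_planes and unpack_planes are hypothetical names, and each sample is represented as an integer with signal 0 in the least significant bit.

```python
def pack_planes(samples, width):
    """Serialize samples the way the stream is described: all low bytes
    first, then the next 8 signals of every sample, and so on."""
    planes = (width + 7) // 8
    out = bytearray()
    for p in range(planes):
        for s in samples:
            out.append((s >> (8 * p)) & 0xFF)
    return bytes(out)


def unpack_planes(payload, width, length):
    """Inverse of pack_planes: rebuild `length` samples of `width` bits."""
    planes = (width + 7) // 8
    if len(payload) != planes * length:
        raise ValueError("payload size does not match width and length")
    samples = [0] * length
    for p in range(planes):
        for i in range(length):
            samples[i] |= payload[p * length + i] << (8 * p)
    mask = (1 << width) - 1
    return [s & mask for s in samples]


if __name__ == "__main__":
    four = [0xA, 0xB, 0x1, 0xC]
    nine = [0x1AA, 0x1BB, 0x011, 0x1CC]
    print(pack_planes(four, 4).hex())  # 0a0b010c
    print(pack_planes(nine, 9).hex())  # aabb11cc01010001
    print(unpack_planes(pack_planes(nine, 9), 9, 4) == nine)  # True
```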

UART

The UART module is simple: data is put on the data lines, the send line is pulsed, and busy is asserted HIGH until the send is finished. The actual module is from ben-marshall/uart.

The Chisel module UART_T wraps that verilog code as a BlackBox module and renames the ports, as well as wiring up the implicit clock and reset.