This project was generated from the Chisel Template and uses Chisel3.
Supported targets are the iCE40 breakout board (TARGET=ice) and the ECP5 evaluation board (TARGET=ecp).
The top module is dev.meirl.gamebrian.Top, which instantiates all of the different components and their connections.
The real top modules for the FPGAs are contained in their own packages, ice and ecp.
This was done to keep FPGA-specific primitives out of the core design.
Those packages also contain the Chisel wrappers for primitives such as PLLs and I/O buffers.
For synthesis and implementation, yosys and nextpnr are used. Builds for iCE40 and ECP5 devices are supported by Project IceStorm and Project Trellis, respectively, as mentioned in the nextpnr readme.
To build and flash the iCE40 breakout board:
$ TARGET=ice make
To just build and not flash:
$ TARGET=ice make build
For timing analysis:
$ TARGET=ice make time
To build and configure the ECP5 evaluation board:
$ TARGET=ecp make
To write the bitstream to the SPI flash:
$ make ecp_flash
To just build:
$ TARGET=ecp make build
There is no explicit timing analysis tool for ECP5 (I think).
This project was originally designed to run on the Arty A7 board, but I'm not sure where that code went.
Maybe in the arty branch?
Sometimes, especially for debugging and simulation, only the elaborated Verilog files are needed.
There are four targets so far, listed in src/main/scala/main.scala: ice, ecp, gba, and logic.
The main.ice target produces generated_output/ICETop.v, which can be run through yosys and nextpnr.
$ sbt "runMain main.ice" # similar things can be run; e.g. "runMain main.ecp"
Some modules have simulation testbenches, which can be found in the sim/ directory.
The resulting VCD files should also be produced in that directory.
$ make sim
To simulate the logic analyzer:
$ sbt "runMain main.logic" # elaborate verilog
$ iverilog -g2012 -o build/Logic.out generated_output/Logic.v \
sim/Logic_tb.v ext/uart/rtl/uart_tx.v # build simulation with testbench
$ build/Logic.out # run simulation
Attempts were made to explain my design choices...
For each supported board, there are two classes/modules; for example, ECPTop and ECPBoard.
The board module contains parameters for that board, such as the clock frequency and its input/output ports.
These port widths and names should match those in the constraints file.
The top module extends the board module and contains the actual logic.
When a PLL output is used as the clock, the boardClockFrequency parameter must be updated to match, so that the modules instantiated inside the top module get the correct frequency.
Modules such as blinky and UART require correct frequency parameters.
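To illustrate why a stale frequency parameter breaks things, here is a small Python sketch of how a blink counter limit and a UART cycles-per-bit value might be derived from a clock frequency. The helper names are hypothetical (this is not the project's Scala code), and the 12 MHz board clock and 48 MHz PLL output are assumed example values.

```python
# Hypothetical helpers: how frequency-dependent parameters are typically derived.

def blink_half_period_cycles(clock_hz, blink_hz=1.0):
    """Cycles a counter must count before toggling an LED at blink_hz."""
    return int(clock_hz / (2 * blink_hz))


def uart_cycles_per_bit(clock_hz, baud):
    """Clock cycles per UART bit, rounded to the nearest integer."""
    return round(clock_hz / baud)


board_hz = 12_000_000  # assumed board oscillator frequency
pll_hz = 48_000_000    # assumed PLL output actually clocking the logic

# Parameter computed from the board oscillator, but the logic runs on the PLL:
wrong = uart_cycles_per_bit(board_hz, 115_200)
right = uart_cycles_per_bit(pll_hz, 115_200)
actual_baud = pll_hz / wrong  # the rate the UART line really runs at

print(wrong, right, round(actual_baud))  # 104 417 461538
```

With these assumed numbers, forgetting to update the parameter makes the UART run at roughly four times the intended baud rate.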
This is the main interface with the GameBoy Advance. It contains the logic for handling ROM reads as well as RAM reads/writes. The GBA bus is a little annoying: the 16-bit AD lines carry both the lower 16 bits of the ROM address and the ROM data output. The 8-bit A lines carry the upper 8 bits of the address. For RAM accesses, the AD lines carry the RAM address and the RAM data goes over the A lines.
When reading from ROM, the GameBoy provides the address on the AD and A lines, then pulls nCS LOW. Next, nRD is pulled LOW to signal a read, and the cartridge must put the data on the AD lines. Each time nRD is pulsed (HIGH, then back to LOW), the cartridge must put the data for the next address on the AD lines. Therefore, the original lower 16 bits of the address must be latched on the falling edge of nCS and incremented on each rising edge of nRD.
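The latching rule above can be sketched as a Python behavioral model. It assumes nCS and nRD are sampled once per system clock with edges detected against the previous sample, and that the lower 16 bits wrap around at 16 bits; the class and method names are hypothetical, and this is not the project's Chisel implementation.

```python
class RomAddressLatch:
    """Behavioral model of the ROM address counter (not the Chisel source)."""

    def __init__(self):
        self.low = 0       # latched, then incremented, lower 16 address bits
        self.prev_ncs = 1
        self.prev_nrd = 1

    def step(self, ncs, nrd, ad, a):
        """Sample the bus for one clock and return the current ROM address."""
        if self.prev_ncs == 1 and ncs == 0:
            # Falling edge of nCS: latch the lower 16 address bits from AD.
            self.low = ad & 0xFFFF
        elif ncs == 0 and self.prev_nrd == 0 and nrd == 1:
            # Rising edge of nRD while selected: advance to the next address.
            self.low = (self.low + 1) & 0xFFFF
        self.prev_ncs, self.prev_nrd = ncs, nrd
        # The upper 8 bits come straight from the A lines.
        return ((a & 0xFF) << 16) | self.low


latch = RomAddressLatch()
latch.step(ncs=1, nrd=1, ad=0x1234, a=0x08)         # address driven, nCS HIGH
latch.step(ncs=0, nrd=1, ad=0x1234, a=0x08)         # nCS falls: latch 0x1234
latch.step(ncs=0, nrd=0, ad=0x0000, a=0x08)         # first read
latch.step(ncs=0, nrd=1, ad=0x0000, a=0x08)         # nRD rises: increment
addr = latch.step(ncs=0, nrd=0, ad=0x0000, a=0x08)  # second read
print(hex(addr))  # 0x81235
```

Note that once nCS is LOW, the model ignores the AD input entirely, since during reads those lines are driven by the cartridge rather than the GameBoy.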
The GBA module communicates with the other modules through a simple memory interface provided by the interconnect; from the GameBoy's perspective, this is memory-mapped I/O.
The interconnect maps the GameBoy's reads and writes to the cartridge's various modules. Most addresses go to the RAM, but some are mapped to registers in other modules, such as the SD card module.
The name says SDCard, but it's just an SPI interface, since that was easier to implement than the SD protocol. The interconnect maps the SDCard module's SPI_DATA and SPI_CONTROL registers to specific RAM addresses (currently 0xE00F000 and 0xE00F001, respectively). The control register:
- bit 0: (R/W) set to start a read/write operation; stays set while the operation is in progress and is cleared when done
- bit 1: (R/W) controls whether the chip select line is HIGH or LOW
- bits 2-6: unused
- bit 7: (R) 1 if a card is physically present in the slot
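A small Python sketch of decoding SPI_CONTROL using the bit positions listed above. The constant and function names are hypothetical, and it assumes that a 1 in bit 1 means the chip select line is driven HIGH.

```python
# Bit positions from the SPI_CONTROL description; names are hypothetical.
SPI_CTRL_BUSY = 1 << 0  # write 1 to start; reads 1 until the transfer is done
SPI_CTRL_CS = 1 << 1    # chip select level (assumed: 1 = HIGH)
SPI_CTRL_CARD = 1 << 7  # read-only: card physically present


def decode_spi_control(value):
    """Turn a raw SPI_CONTROL byte into named flags."""
    return {
        "busy": bool(value & SPI_CTRL_BUSY),
        "cs_high": bool(value & SPI_CTRL_CS),
        "card_present": bool(value & SPI_CTRL_CARD),
    }


print(decode_spi_control(0x83))
# {'busy': True, 'cs_high': True, 'card_present': True}
```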
To transfer data over SPI:
- Write the data to SPI_DATA (0xE00F000)
- Set bit 0 of SPI_CONTROL (0xE00F001)
The outgoing data is shifted out of the SPI_DATA register while the incoming data is shifted into it.
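The transfer steps can be sketched in Python against a stand-in for the registers. FakeSpiRegisters, write8, read8 and spi_transfer are hypothetical names; the real software is C/assembly in gba/first_stage. The sketch assumes SPI_CONTROL is updated with a read-modify-write (so the chip select bit is kept) and that the received byte is in SPI_DATA once bit 0 clears.

```python
# Register addresses from the text above; everything else is hypothetical.
SPI_DATA = 0xE00F000
SPI_CONTROL = 0xE00F001
CTRL_START = 0x01  # bit 0: start / busy


class FakeSpiRegisters:
    """Stand-in for the memory-mapped registers. A transfer finishes
    instantly and leaves the card's reply in SPI_DATA."""

    def __init__(self, card_reply):
        self.regs = {SPI_DATA: 0x00, SPI_CONTROL: 0x80}  # bit 7: card present
        self.card_reply = card_reply
        self.sent = []

    def write8(self, addr, value):
        self.regs[addr] = value & 0xFF
        if addr == SPI_CONTROL and value & CTRL_START:
            out = self.regs[SPI_DATA]
            self.sent.append(out)
            self.regs[SPI_DATA] = self.card_reply(out) & 0xFF  # byte shifted in
            self.regs[SPI_CONTROL] &= ~CTRL_START & 0xFF       # transfer done

    def read8(self, addr):
        return self.regs[addr]


def spi_transfer(bus, out_byte):
    bus.write8(SPI_DATA, out_byte)              # 1. load the byte to send
    ctrl = bus.read8(SPI_CONTROL)               # keep the chip select bit
    bus.write8(SPI_CONTROL, ctrl | CTRL_START)  # 2. set bit 0 to start
    while bus.read8(SPI_CONTROL) & CTRL_START:  # bit 0 clears when done
        pass
    return bus.read8(SPI_DATA)                  # byte received from the card


bus = FakeSpiRegisters(card_reply=lambda b: b ^ 0xFF)
print(hex(spi_transfer(bus, 0x40)))  # 0xbf
```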
The software counterpart can be found in gba/first_stage in the files source/gba_sd.c, source/gba_spi_asm.s, include/gba_sd.h, and include/gba_spi.h.
The interconnect maps all other addresses to a simple memory region.
Just a simple counter blinky for sanity tests.
It is also a great example of boardClockFrequency being used.
In each of the supported boards' packages, there are some other modules that wrap their FPGA-specific primitives in Chisel modules.
Couldn't figure out why some things weren't working, so I decided to build an on-chip logic analyzer. It has three main parameters and three main pins/ports:
- WIDTH: how many signals to capture (one 'word')
- ADDR_WIDTH: how many samples to store. This is the width of the address lines, so the memory holds 2^ADDR_WIDTH words.
- BAUD: baud rate of the UART
- trigger: a positive edge starts the capture and the UART transfer
- signals: the actual signals/wires to be sampled
- uart_out: output pin for the UART transfer
So, connect signals to the data you want to capture, then pulse the trigger.
The data is sent out via UART.
There is a script in tools/logic.py that receives this data stream and outputs a VCD file.
The data stream is as follows:
- 4 bytes for identification/simple "synchronization": the bytes "\x4C\x4F\x47\x43" (aka "LOGC")
- 1 byte for the width
- 4 bytes for the length (big endian?)
- length*n bytes for the actual data, where n is the number of 8-bit groups needed to hold the width (see below)
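A Python sketch of splitting a received stream into its header fields and payload. It assumes the length is big-endian, which the format description marks as uncertain; parse_header is a hypothetical name and not necessarily how tools/logic.py does it. A real receiver would also need to scan for the LOGC bytes to resynchronize.

```python
import struct

MAGIC = b"LOGC"  # b"\x4C\x4F\x47\x43"


def parse_header(stream):
    """Return (width, length, payload) from one capture stream."""
    if stream[:4] != MAGIC:
        raise ValueError("missing LOGC sync bytes")
    width = stream[4]
    (length,) = struct.unpack(">I", stream[5:9])  # assumed big-endian
    n = (width + 7) // 8                          # 8-bit groups per sample
    payload = stream[9:9 + length * n]
    if len(payload) != length * n:
        raise ValueError("stream truncated")
    return width, length, payload


capture = MAGIC + bytes([4]) + struct.pack(">I", 4) + b"\x0a\x0b\x01\x0c"
print(parse_header(capture))  # (4, 4, b'\n\x0b\x01\x0c')
```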
Captures wider than 8 bits are sent in groups of 8 bits.
For example, when capturing 20 bits: length bytes representing the lower 8 signals (7 to 0), then another length bytes representing the next 8 signals (15 to 8), and finally another length bytes where the 4 least significant bits represent the last 4 signals (19 to 16).
For example, a capture of 4 signals over 4 samples looks like this (one row per signal, one column per sample):
0: 0 1 1 0
1: 1 1 0 0
2: 0 0 0 1
3: 1 1 0 1
The samples get saved as \x0A\x0B\x01\x0C (read each column vertically, bottom to top) and are transferred as they are.
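The packing can be checked with a short Python sketch (pack_samples is a hypothetical helper): signal i becomes bit i of each sample word, which reproduces the bytes above.

```python
def pack_samples(rows):
    """rows[i][t] is signal i at sample t; returns one integer word per
    sample, with signal i stored in bit i."""
    words = []
    for t in range(len(rows[0])):
        word = 0
        for i, row in enumerate(rows):
            word |= (row[t] & 1) << i
        words.append(word)
    return words


rows = [
    [0, 1, 1, 0],  # signal 0
    [1, 1, 0, 0],  # signal 1
    [0, 0, 0, 1],  # signal 2
    [1, 1, 0, 1],  # signal 3
]
print([hex(w) for w in pack_samples(rows)])  # ['0xa', '0xb', '0x1', '0xc']
```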
When capturing 9 signals:
0: 0 1 1 0
1: 1 1 0 0
2: 0 0 0 1
3: 1 1 0 1
4: 0 1 1 0
5: 1 1 0 0
6: 0 0 0 1
7: 1 1 0 1
8: 1 1 0 1
This capture gets saved as: \x1AA\x1BB\x011\x1CC
(Python would not read this notation as intended; I'm just using it to show 3 hex digits per sample).
This capture gets sent as: \xAA\xBB\x11\xCC\x01\x01\x00\x01
(the lower bytes of all samples are sent first, then the remaining bits).
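A Python sketch of this byte-group ordering, plus the matching decoder a receiver would need. serialize_capture and deserialize_capture are hypothetical names, not functions from tools/logic.py.

```python
def serialize_capture(words, width):
    """Emit one byte per sample for bits 7..0, then one per sample for
    bits 15..8, and so on: lowest 8-bit group first."""
    groups = (width + 7) // 8
    out = bytearray()
    for g in range(groups):
        for w in words:
            out.append((w >> (8 * g)) & 0xFF)
    return bytes(out)


def deserialize_capture(data, width, length):
    """Inverse of serialize_capture: rebuild one word per sample."""
    groups = (width + 7) // 8
    words = [0] * length
    for g in range(groups):
        for t in range(length):
            words[t] |= data[g * length + t] << (8 * g)
    return words


words = [0x1AA, 0x1BB, 0x011, 0x1CC]  # the 9-signal example above
sent = serialize_capture(words, width=9)
print(sent.hex())  # aabb11cc01010001
```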
The UART module is simple: data is put on the data lines, the send line is pulsed, and busy is asserted HIGH until the send is finished.
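For a feel of the timing, here is a Python sketch of one UART frame and roughly how long busy stays HIGH. It assumes standard 8N1 framing (start bit, 8 data bits LSB first, one stop bit), which this document does not confirm; check the ben-marshall/uart parameters for the actual configuration. The function names are hypothetical.

```python
def uart_frame(byte, data_bits=8, stop_bits=1):
    """Line levels for one frame: start bit (0), data bits LSB first,
    then stop bit(s) (1)."""
    bits = [0]
    bits += [(byte >> i) & 1 for i in range(data_bits)]
    bits += [1] * stop_bits
    return bits


def busy_cycles(clock_hz, baud, data_bits=8, stop_bits=1):
    """Approximate clock cycles busy stays HIGH while one frame is sent."""
    frame_bits = 1 + data_bits + stop_bits
    return frame_bits * round(clock_hz / baud)


print(uart_frame(0x4C))                  # [0, 0, 0, 1, 1, 0, 0, 1, 0, 1]
print(busy_cycles(12_000_000, 115_200))  # 1040
```

0x4C is the "L" of the LOGC sync bytes, so this is the first frame the logic analyzer would send.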
The actual module is from ben-marshall/uart.
The Chisel module UART_T wraps that Verilog code as a BlackBox module, renames the ports, and wires up the implicit clock and reset.