More work on docs
alex96295 committed Jan 17, 2024
1 parent 56fd70f commit c870d29
Showing 1 changed file with 47 additions and 23 deletions.
70 changes: 47 additions & 23 deletions docs/um/arch.md
@@ -7,26 +7,31 @@ different purposes in terms of functional safety and reliability, security, and
 capabilities.

 Carfield relies on Cheshire as its host domain, and extends its minimal SoC with additional
-interconnect ports and interrupts.
+interconnect ports and interrupts.

 The above block diagram depicts a fully-featured Carfield SoC, which currently provides:

 - **Computing Domain**:
 - *Host domain* (Cheshire), a Linux-capable RV64 system based on dual-core CVA6 processors with
 self-invalidation coherency mechanism
-- *Safe domain*, a Triple-Core-Lockstep (TCLS) RV32 microcontroller system based on CV32E40P, with fast interrupt
-handling through the RISC-V CLIC
-- *Secure domain*, a Dual-Core-Lockstep (DCLS) RV32 Hardware Root of Trust (HW RoT) systems that ensures the secure boot for the whole platform, serves as secure monitor for the entire system, and provides crypto acceleration services through various crypto-accelerators
+- *Safe domain*, a Triple-Core-Lockstep (TCLS) RV32 microcontroller system based on CV32E40P,
+with fast interrupt handling through the RISC-V CLIC
+- *Secure domain*, a Dual-Core-Lockstep (DCLS) RV32 Hardware Root of Trust (HW RoT) systems that
+ensures the secure boot for the whole platform, serves as secure monitor for the entire
+system, and provides crypto acceleration services through various crypto-accelerators
 - *Accelerator domain*, comprises two programmable multi-core accelerators (PMCAs), an 12-cores
-integer cluster with Hybrid Modular Redundancy (HMR) capabilities oriented to compute intensive integer workloads such as AI, and a vectorial cluster with floating point vector processing capabilities to accelerate intensive control tasks
+integer cluster with Hybrid Modular Redundancy (HMR) capabilities oriented to compute
+intensive integer workloads such as AI, and a vectorial cluster with floating point vector
+processing capabilities to accelerate intensive control tasks

 - **Memory Domain**:
 - *Dynamic SPM*: dynamically configurable scratchpad memory (SPM) for *interleaved* or
-*contiguous* accesses aiming at reducing systematic bus conflicts to improve the time-predictability of the on-chip communication
+*contiguous* accesses aiming at reducing systematic bus conflicts to improve the
+time-predictability of the on-chip communication
 - *LLC SPM*: the last-level cache (*host domain*) can be configured as SPM at runtime, as
 described in Cheshire's [Architecture](https://pulp-platform.github.io/cheshire/um/arch/)
 - *External DRAM*: off-chip HyperRAM (Infineon) interfaced with in-house, open-source AXI4
-Hyberbus memory controller and digital PHY.
+Hyberbus memory controller and digital PHY

 - **Mailbox unit**
 - Main communication vehicle among domains, based on an interrupt notification mechanism
@@ -406,29 +411,50 @@ Compared to vanilla OpenTitan, the secure domain integrated in Carfield is modif

 #### Accelerator domain

-To augment computational capabilities, Carfield incorporates two general-purpose accelerators
+To augment computational capabilities, Carfield incorporates two PMCAs, described below. Both PMCAs
+integrate DMA engines to independently fetch data from the on-chip SPM or external DRAM.

 ##### [HMR integer PMCA](https://github.com/pulp-platform/pulp_cluster/tree/yt/rapidrecovery)

-The [hybrid modular redundancy (HMR) *integer PMCA*](https://arxiv.org/abs/2303.08706)
-is specialized in executing reliable boosted Quantized Neural Network (QNN) operations, exploiting
-the HMR technique for rapid fault recovery and integer arithmetic support in the ISA of the RISC-V
-cores from 32-bit down to 2-bit and mixed-precision formats.
-
-The HMR integer PMCA is configured as follows:
-
-TODO
+The [hybrid modular redundancy (HMR) *integer PMCA*](https://arxiv.org/abs/2303.08706) is
+specialized in accelerating the inference of Deep Learning and Machine Learning models. The
+multicore accelerator is built around 12 32-bit RISC-V cores empowered with ISA extensions, enabling
+integer arithmetic from 32-bit down to 2-bit precision.
+
+The integer PMCA does not integrate a fully-fledged FPU co-processor. Nevertheless, it features a
+highly specialized domain specific architecture (DSA),
+[RedMulE](https://www.sciencedirect.com/science/article/pii/S0167739X23002546), which enables fast
+and energy-efficient floating-point GEMM on 16-bit and 8-bit data formats. This makes the PMCA
+capable of on-chip training of generalized Deep Learning models.
+
+As part of a MCS, the integer PMCA's general-purpose cores can be reconfigured for *redundant
+execution*. A [Hybrid Modular Redundancy (HMR)](https://doi.org/10.1145/3635161) unit allows the
+split/lock of the available cores in different redundant configurations during runtime, trading off
+the computing performance and the fault resilience capability according to the criticality of the
+application.
+
+The PMCA can be configured in multiple redundant modes:
+* **Independent:** All cores act independently with no redundancy mechanism. This configuration allows
+higher performance but has no reliability.
+* **Dual Modular Redundancy (DMR)**: The cores are grouped in lock-stepped pairs and rely on a
+specialized hardware extension for fast fault recovery in less than 30 clock cycles in case of
+fault detection. The PMCA provides the best trade-off between performance and fault recovery in
+this configuration.
+* **Triple Modular Redundancy (TMR)**: The cores are grouped in lock-stepped triplets and rely on
+either hardware extension or software mechanisms to recover from incurring faults. The PMCA
+provides the highest fault resilience in this configuration, at the cost of reduced performance.

 ##### [Vectorial PMCA](https://github.com/pulp-platform/spatz)

 The [*vectorial PMCA*, or Spatz PMCA](https://dl.acm.org/doi/abs/10.1145/3508352.3549367) handles
-vectorizable multi-format floating-point workloads.
+vectorizable multi-format floating-point workloads.

-It acts as a coprocessor of the [Snitch core](https://github.com/pulp-platform/snitch_cluster), a
-tiny 64-bit scalar core which decodes and forwards vector instructions to the vector unit. Together
-they are referred to as *Complex Cores (CCs)*.
+A Spatz vector unit acts as a coprocessor of the [Snitch
+core](https://github.com/pulp-platform/snitch_cluster), a tiny RV32IMA core which decodes and
+forwards vector instructions to the vector unit.

-The vectorial PMCA is composed by **two CCs**, each with the following configurations:
+A Snitch core and a Spatz vector unit are together referred to as *Core Complex (CC)*. The vectorial
+PMCA is composed by two CCs, each with the following configuration:

 * 2 KiB of latch-based VRF
 * 4 transprecision FPUs
@@ -517,8 +543,6 @@ Assuming each mailbox is identified with id `i`, the register file map reads:
 The above register map can be found in the dedicated
 [repository](https://github.com/pulp-platform/mailbox_uni) and is reported here for convenience.

-TODO @alex96295: Add figure
-
 ## Platform control registers

 PCRs provide basic system information, and control clock, reset and other functionalities of
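
The redundant modes listed in the HMR integer PMCA hunk above can be illustrated with a small model of how the 12 cores might be partitioned. This is a minimal sketch, not Carfield's implementation: the names `RedundancyMode` and `group_cores` are hypothetical, and grouping cores with consecutive IDs is an assumption, since the diff does not state which physical cores are locked together.

```python
from enum import Enum

# The diff states that the integer PMCA is built around 12 RISC-V cores.
NUM_CORES = 12


class RedundancyMode(Enum):
    """Hypothetical labels; each value is the assumed lock-step group size."""
    INDEPENDENT = 1  # every core runs on its own, no redundancy
    DMR = 2          # cores grouped in lock-stepped pairs
    TMR = 3          # cores grouped in lock-stepped triplets


def group_cores(mode, num_cores=NUM_CORES):
    """Partition core IDs into lock-step groups for the given mode.

    Assumption: cores with consecutive IDs are grouped together and every
    core belongs to exactly one group.
    """
    size = mode.value
    if num_cores % size != 0:
        raise ValueError(f"{num_cores} cores cannot be split into groups of {size}")
    return [tuple(range(start, start + size))
            for start in range(0, num_cores, size)]


if __name__ == "__main__":
    for mode in RedundancyMode:
        groups = group_cores(mode)
        print(f"{mode.name}: {len(groups)} logical cores, first group {groups[0]}")
```

Under these assumptions, 12 cores yield 12 independent cores, 6 DMR pairs, or 4 TMR triplets, which mirrors the trade-off between computing performance and fault resilience that the added text describes.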