More work on docs

pulp-platform · Jan 17, 2024 · c870d29
1 parent 56fd70f
Showing 1 changed file with 47 additions and 23 deletions.
diff --git a/docs/um/arch.md b/docs/um/arch.md
@@ -7,26 +7,31 @@ different purposes in terms of functional safety and reliability, security, and
 capabilities.
 
 Carfield relies on Cheshire as its host domain, and extends its minimal SoC with additional
-interconnect ports and interrupts. 
+interconnect ports and interrupts.
 
 The above block diagram depicts a fully-featured Carfield SoC, which currently provides:
 
 - **Computing Domain**:
 	- *Host domain* (Cheshire), a Linux-capable RV64 system based on dual-core CVA6 processors with
 	  self-invalidation coherency mechanism
-	- *Safe domain*, a Triple-Core-Lockstep (TCLS) RV32 microcontroller system based on CV32E40P, with fast interrupt
-	  handling through the RISC-V CLIC
-	- *Secure domain*, a Dual-Core-Lockstep (DCLS) RV32 Hardware Root of Trust (HW RoT) systems that ensures the secure boot for the whole platform, serves as secure monitor for the entire system, and provides crypto acceleration services through various crypto-accelerators
+	- *Safe domain*, a Triple-Core-Lockstep (TCLS) RV32 microcontroller system based on CV32E40P,
+	  with fast interrupt handling through the RISC-V CLIC
+	- *Secure domain*, a Dual-Core-Lockstep (DCLS) RV32 Hardware Root of Trust (HW RoT) system that
+	  ensures the secure boot for the whole platform, serves as secure monitor for the entire
+	  system, and provides crypto acceleration services through various crypto-accelerators
 	- *Accelerator domain*, comprises two programmable multi-core accelerators (PMCAs), a 12-core
-	  integer cluster with Hybrid Modular Redundancy (HMR) capabilities oriented to compute intensive integer workloads such as AI,  and a vectorial cluster with floating point vector processing capabilities to accelerate intensive control tasks
+	  integer cluster with Hybrid Modular Redundancy (HMR) capabilities oriented to compute
+	  intensive integer workloads such as AI, and a vectorial cluster with floating point vector
+	  processing capabilities to accelerate intensive control tasks
 
 - **Memory Domain**:
 	- *Dynamic SPM*: dynamically configurable scratchpad memory (SPM) for *interleaved* or
-	  *contiguous* accesses aiming at reducing systematic bus conflicts to improve the time-predictability of the on-chip communication
+	  *contiguous* accesses aiming at reducing systematic bus conflicts to improve the
+	  time-predictability of the on-chip communication
 	- *LLC SPM*: the last-level cache (*host domain*) can be configured as SPM at runtime, as
 	  described in Cheshire's [Architecture](https://pulp-platform.github.io/cheshire/um/arch/)
 	- *External DRAM*: off-chip HyperRAM (Infineon) interfaced with in-house, open-source AXI4
-	  Hyberbus memory controller and digital PHY.
+	  HyperBus memory controller and digital PHY
 
 - **Mailbox unit**
 	- Main communication vehicle among domains, based on an interrupt notification mechanism
@@ -406,29 +411,50 @@ Compared to vanilla OpenTitan, the secure domain integrated in Carfield is modif
 
 #### Accelerator domain
 
-To augment computational capabilities, Carfield incorporates two general-purpose accelerators
+To augment computational capabilities, Carfield incorporates two PMCAs, described below. Both PMCAs
+integrate DMA engines to independently fetch data from the on-chip SPM or external DRAM.
 
 ##### [HMR integer PMCA](https://github.com/pulp-platform/pulp_cluster/tree/yt/rapidrecovery)
 
-The [hybrid modular redundancy (HMR) *integer PMCA*](https://arxiv.org/abs/2303.08706)
-is specialized in executing reliable boosted Quantized Neural Network (QNN) operations, exploiting
-the HMR technique for rapid fault recovery and integer arithmetic support in the ISA of the RISC-V
-cores from 32-bit down to 2-bit and mixed-precision formats.
-
-The HMR integer PMCA is configured as follows:
-
-TODO
+The [hybrid modular redundancy (HMR) *integer PMCA*](https://arxiv.org/abs/2303.08706) is
+specialized in accelerating the inference of Deep Learning and Machine Learning models. The
+multicore accelerator is built around twelve 32-bit RISC-V cores whose ISA extensions enable
+integer arithmetic from 32-bit down to 2-bit precision.
+
+The integer PMCA does not integrate a fully-fledged FPU co-processor. Nevertheless, it features a
+highly specialized domain-specific architecture (DSA),
+[RedMulE](https://www.sciencedirect.com/science/article/pii/S0167739X23002546), which enables fast
+and energy-efficient floating-point GEMM on 16-bit and 8-bit data formats. This makes the PMCA
+capable of on-chip training of generalized Deep Learning models.
+
+As part of a mixed-criticality system (MCS), the integer PMCA's general-purpose cores can be
+reconfigured for *redundant execution*. A [Hybrid Modular Redundancy
+(HMR)](https://doi.org/10.1145/3635161) unit can split or lock the available cores into different
+redundant configurations at runtime, trading off computing performance against fault resilience
+according to the criticality of the application.
+
+The PMCA can be configured in one of the following modes:
+* **Independent**: All cores act independently with no redundancy mechanism. This configuration
+  offers the highest performance but no fault tolerance.
+* **Dual Modular Redundancy (DMR)**: The cores are grouped in lock-stepped pairs and rely on a
+  specialized hardware extension for fast fault recovery, which takes less than 30 clock cycles
+  once a fault is detected. This configuration provides the best trade-off between performance
+  and fault recovery.
+* **Triple Modular Redundancy (TMR)**: The cores are grouped in lock-stepped triplets and rely on
+  either a hardware extension or software mechanisms to recover from faults. This configuration
+  provides the highest fault resilience, at the cost of reduced performance.
 
 ##### [Vectorial PMCA](https://github.com/pulp-platform/spatz)
 
 The [*vectorial PMCA*, or Spatz PMCA](https://dl.acm.org/doi/abs/10.1145/3508352.3549367) handles
-vectorizable multi-format floating-point workloads.
+vectorizable multi-format floating-point workloads. 
 
-It acts as a coprocessor of the [Snitch core](https://github.com/pulp-platform/snitch_cluster), a
-tiny 64-bit scalar core which decodes and forwards vector instructions to the vector unit. Together
-they are referred to as *Complex Cores (CCs)*.
+A Spatz vector unit acts as a coprocessor of the [Snitch
+core](https://github.com/pulp-platform/snitch_cluster), a tiny RV32IMA core which decodes and
+forwards vector instructions to the vector unit.
 
-The vectorial PMCA is composed by **two CCs**, each with the following configurations:
+A Snitch core and a Spatz vector unit are together referred to as a *Core Complex (CC)*. The
+vectorial PMCA is composed of two CCs, each with the following configuration:
 
 * 2 KiB of latch-based VRF
 * 4 transprecision FPUs
@@ -517,8 +543,6 @@ Assuming each mailbox is identified with id `i`, the register file map reads:
 The above register map can be found in the dedicated
 [repository](https://github.com/pulp-platform/mailbox_uni) and is reported here for convenience.
 
-TODO @alex96295: Add figure
-
 ## Platform control registers
 
 PCRs provide basic system information, and control clock, reset and other functionalities of