
Commit

Deployed 8464418 with MkDocs version: 1.5.3
alex96295 committed Jan 6, 2024
1 parent b1daabb commit a8d64bd
Showing 4 changed files with 169 additions and 77 deletions.
2 changes: 1 addition & 1 deletion search/search_index.json

Large diffs are not rendered by default.

Binary file modified sitemap.xml.gz
Binary file not shown.
121 changes: 82 additions & 39 deletions um/arch/index.html
@@ -623,27 +623,36 @@
</li>

<li class="md-nav__item">
<a href="#dynamic-scratchpad-memory-spm" class="md-nav__link">
<a href="#interconnect" class="md-nav__link">
<span class="md-ellipsis">
Dynamic scratchpad memory (SPM)
Interconnect
</span>
</a>

</li>

<li class="md-nav__item">
<a href="#interconnect" class="md-nav__link">
<a href="#dram" class="md-nav__link">
<span class="md-ellipsis">
Interconnect
DRAM
</span>
</a>

</li>

<li class="md-nav__item">
<a href="#dram" class="md-nav__link">
<a href="#dynamic-scratchpad-memory-spm" class="md-nav__link">
<span class="md-ellipsis">
DRAM
Dynamic scratchpad memory (SPM)
</span>
</a>

</li>

<li class="md-nav__item">
<a href="#mailbox-unit" class="md-nav__link">
<span class="md-ellipsis">
Mailbox unit
</span>
</a>

@@ -842,27 +851,36 @@
</li>

<li class="md-nav__item">
<a href="#dynamic-scratchpad-memory-spm" class="md-nav__link">
<a href="#interconnect" class="md-nav__link">
<span class="md-ellipsis">
Dynamic scratchpad memory (SPM)
Interconnect
</span>
</a>

</li>

<li class="md-nav__item">
<a href="#interconnect" class="md-nav__link">
<a href="#dram" class="md-nav__link">
<span class="md-ellipsis">
Interconnect
DRAM
</span>
</a>

</li>

<li class="md-nav__item">
<a href="#dram" class="md-nav__link">
<a href="#dynamic-scratchpad-memory-spm" class="md-nav__link">
<span class="md-ellipsis">
DRAM
Dynamic scratchpad memory (SPM)
</span>
</a>

</li>

<li class="md-nav__item">
<a href="#mailbox-unit" class="md-nav__link">
<span class="md-ellipsis">
Mailbox unit
</span>
</a>

@@ -958,7 +976,26 @@ <h1 id="architecture">Architecture</h1>
</ul>
</li>
<li>
<p><strong>Peripherals</strong></p>
<p><strong>Dynamic SPM</strong>:</p>
<ul>
<li>Dynamically configurable scratchpad memory
for <em>interleaved</em> or <em>contiguous</em> accesses</li>
</ul>
</li>
<li>
<p><strong>DRAM</strong>:</p>
<ul>
<li>TODO</li>
</ul>
</li>
<li>
<p><strong>Mailbox unit</strong>:</p>
<ul>
<li>Main communication vehicle among domains, based on an interrupt notification mechanism</li>
</ul>
</li>
<li>
<p><strong>Peripherals</strong>:</p>
<ul>
<li>Generic timers</li>
<li>PWM timers</li>
@@ -2764,17 +2801,6 @@ <h4 id="vectorial-pmca"><a href="https://github.com/pulp-platform/spatz">Vectorial PMCA</a></h4>
vectorizable multi-format floating-point workloads (down to FP8).</p>
<p>The Spatz PMCA is configured as follows:</p>
<p>TODO</p>
<h2 id="dynamic-scratchpad-memory-spm"><a href="https://github.com/pulp-platform/dyn_spm">Dynamic scratchpad memory (SPM)</a></h2>
<p>The dynamic SPM features dynamically switching address mapping policy. It manages the following
features:</p>
<ul>
<li>Two AXI subordinate ports</li>
<li>Two address mapping modes: <em>interleaved</em> and <em>contiguous</em></li>
<li>4 address spaces, 2 for each port. The address space is used to select the AXI port to use, and
the mapping mode</li>
<li>Every address space points to the same physical SRAM through a low-latency matrix crossbar</li>
<li>ECC-equipped memory banks</li>
</ul>
<h2 id="interconnect">Interconnect</h2>
<p>The interconnect is composed of a main <a href="https://github.com/pulp-platform/axi">AXI4</a> matrix (or
crossbar) with AXI5 atomic operations (ATOPs) support. The crossbar extends Cheshire's with
@@ -2788,6 +2814,19 @@ <h2 id="dram">DRAM</h2>
link</a> to connect to external HyperRAM modules. The HyperBus interface can attach to a
configurable number of physical HyperRAM chips, and supports chips with different densities (from
8MiB to 64MiB per chip).</p>
<h2 id="dynamic-scratchpad-memory-spm"><a href="https://github.com/pulp-platform/dyn_spm">Dynamic scratchpad memory (SPM)</a></h2>
<p>The dynamic SPM features a dynamically switching address mapping policy. It provides the following
features, illustrated by the sketch after the list:</p>
<ul>
<li>Two AXI subordinate ports</li>
<li>Two address mapping modes: <em>interleaved</em> and <em>contiguous</em></li>
<li>Four address spaces, two per port; the address space selects both the AXI port to use and the
mapping mode</li>
<li>Every address space points to the same physical SRAM through a low-latency matrix crossbar</li>
<li>ECC-equipped memory banks</li>
</ul>
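<p>As a minimal illustration, the C sketch below selects the mapping mode purely by choosing which
address-space alias to access; both base addresses are hypothetical placeholders, not Carfield's
actual memory map:</p>
<pre><code>#include &lt;stdint.h&gt;

/* Hypothetical aliases of the same physical SRAM: in the interleaved alias,
 * consecutive words are striped across banks, while in the contiguous alias
 * they stay within one bank. */
#define SPM_INTERLEAVED_BASE 0x78000000UL /* hypothetical */
#define SPM_CONTIGUOUS_BASE  0x78100000UL /* hypothetical */

void spm_mapping_demo(void) {
    volatile uint32_t *spm_il = (volatile uint32_t *)SPM_INTERLEAVED_BASE;
    volatile uint32_t *spm_ct = (volatile uint32_t *)SPM_CONTIGUOUS_BASE;

    /* Bank-striped accesses: suited to parallel, vector-like traffic. */
    for (int i = 0; i &lt; 8; i++)
        spm_il[i] = i;

    /* Contiguous accesses: suited to sequential, single-bank buffers. */
    for (int i = 0; i &lt; 8; i++)
        spm_ct[256 + i] = i;
}
</code></pre>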
<h2 id="mailbox-unit">Mailbox unit</h2>
<p>TODO</p>
<h2 id="peripherals">Peripherals</h2>
<p>Carfield enhances Cheshire's peripheral subsystem with additional capabilities.</p>
<p>An external AXI manager port is attached to the matrix crossbar. The 64-bit data, 48-bit address AXI
@@ -2817,21 +2856,25 @@ <h3 id="generic-and-advanced-timer">Generic and advanced timer</h3>
</ul>
<p>For more information, read the dedicated
<a href="https://github.com/pulp-platform/timer_unit/blob/master/doc/TIMER_UNIT_reference.xlsx">documentation</a>.</p>
<p>The <a href="https://github.com/pulp-platform/apb_adv_timer"><em>advanced timer</em></a> manages the following features:
* 4 timers with 4 output signal channels each.
* PWM generation functionality
* Multiple trigger input sources:
- output signal channels of all timers
- 32 GPIOs
- Real-time clock (RTC) at crystal frequency (32kHz) or higher
- FLL/PLL clock
In Carfield, we rely on a RTC.
* Configurable input trigger modes
* Configurable prescaler for each timer
* Configurable counting mode for each timer
* Configurable channel threshold action for each timer
* 4 configurable output events
* Configurable clock gating of each timer</p>
<p>The <a href="https://github.com/pulp-platform/apb_adv_timer"><em>advanced
timer</em></a>
manages the following features:</p>
<ul>
<li>4 timers with 4 output signal channels each</li>
<li>PWM generation functionality</li>
<li>Multiple trigger input sources:</li>
<li>output signal channels of all timers</li>
<li>32 GPIOs</li>
<li>Real-time clock (RTC) at crystal frequency (32kHz) or higher</li>
<li>FLL/PLL clock
In Carfield, we rely on a RTC.</li>
<li>Configurable input trigger modes</li>
<li>Configurable prescaler for each timer</li>
<li>Configurable counting mode for each timer</li>
<li>Configurable channel threshold action for each timer</li>
<li>4 configurable output events</li>
<li>Configurable clock gating of each timer</li>
</ul>
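<p>The sketch below shows the general shape of configuring a PWM channel through memory-mapped
registers; all offsets, names and field encodings are hypothetical placeholders, not the actual
apb_adv_timer register map (see the reference documentation linked below for the real layout):</p>
<pre><code>#include &lt;stdint.h&gt;

#define ADV_TIMER_BASE 0x40000000UL /* hypothetical base address */
#define T0_CONFIG      0x00         /* hypothetical: clock source, prescaler */
#define T0_THRESHOLD   0x04         /* hypothetical: channel 0 threshold     */
#define T0_CMD         0x08         /* hypothetical: start/stop command      */

static inline void mmio_w(uintptr_t addr, uint32_t val) {
    *(volatile uint32_t *)addr = val;
}

void pwm_demo(void) {
    mmio_w(ADV_TIMER_BASE + T0_CONFIG, 0x0701u);    /* RTC source, prescale by 8 (hypothetical encoding) */
    mmio_w(ADV_TIMER_BASE + T0_THRESHOLD, 0x8000u); /* roughly 50% duty cycle on channel 0 */
    mmio_w(ADV_TIMER_BASE + T0_CMD, 1u);            /* start timer 0 */
}
</code></pre>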
<p>For more information, read the dedicated
<a href="https://github.com/pulp-platform/apb_adv_timer/blob/master/doc/APB_ADV_TIMER_reference.xlsx">documentation</a>.</p>
<h3 id="watchdog-timer">Watchdog timer</h3>
123 changes: 86 additions & 37 deletions um/sw/index.html
@@ -587,13 +587,26 @@

<h1 id="software-stack">Software Stack</h1>
<p>Carfield's Software Stack is provided in the <code>sw/</code> folder, organized as follows:</p>
<pre><code>TODO add tree
<pre><code>sw
├── boot
├── include
├── lib
├── link
├── sw.mk
└── tests
    ├── bare-metal
    │   ├── hostd
    │   ├── pulpd
    │   ├── safed
    │   ├── secd
    │   └── spatzd
    └── linux
</code></pre>
<p>Employing Cheshire as <em>host domain</em>, Carfield's software stack is largely based on, and built on top
of, <a href="https://pulp-platform.github.io/cheshire/um/sw/">Cheshire's</a>.</p>
<p>This means that it shares the same:</p>
<ul>
<li>Baremetal programs (BMPs) build flow</li>
<li>Baremetal programs (BMPs) build flow and structure</li>
<li>Boot Flow (<em>passive</em> and <em>autonomous</em> boot)</li>
<li>Boot ROM</li>
<li>Zero-Stage Loader</li>
@@ -629,57 +642,93 @@ <h2 id="baremetal-programs-bmps">Baremetal programs (BMPs)</h2>
<pre><code>make car-sw-build
</code></pre>
<p>builds program binaries in ELF format for each domain, which can be used with the simulation methods
supported by the platform, as descrbied in <a href="../../tg/sim/">Simulation</a>.
As in Cheshire, Carfield programs can be created to be executed from several memory locations:</p>
supported by the platform, as described in <a href="../../tg/sim/">Simulation</a>, or on FPGA, as described in
<a href="../../tg/xilinx/">Xilinx FPGAs</a>.</p>
<hr />
<p>As in Cheshire, Carfield programs can be created to be executed from several memory locations:</p>
<ul>
<li>Dynamic SPM</li>
<li>LLC SPM, when the LLC is configured as such (runtime configurable)</li>
<li>DRAM, namely the HyperRAM</li>
<li>Dynamic SPM (<code>l2</code>): the linker script is provided in Carfield's <code>sw/link/</code> folder, since the dynamic
SPM is not integrated in minimal Cheshire</li>
<li>LLC SPM (<code>spm</code>): valid when the LLC is configured as such. In Carfield, half of the LLC is
configured as SPM from the boot ROM during system bringup, as this is the default behavior in
Cheshire.</li>
<li>DRAM (<code>dram</code>): the HyperRAM</li>
</ul>
<p>For example, to build a specific BMP (here <code>sw/tests/bare-metal/hostd/helloworld.c</code> to be run on
Cheshire) executing from the Dynamic SPM, run:</p>
<pre><code>make sw/tests/bare-metal/hostd/helloworld.car.l2.elf
</code></pre>
<p>To create the same program executing from DRAM, <code>sw/tests/bare-metal/hostd/helloworld.car.dram.elf</code>
can instead be built from the same source. Depending on their assumptions and behavior, not all
programs may be built to execute from all locations.</p>
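<p>Assuming a program supports all three locations, the corresponding build targets follow the same
naming pattern (the <code>spm</code> variant below is inferred from that pattern):</p>
<pre><code>make sw/tests/bare-metal/hostd/helloworld.car.l2.elf
make sw/tests/bare-metal/hostd/helloworld.car.spm.elf
make sw/tests/bare-metal/hostd/helloworld.car.dram.elf
</code></pre>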
<h2 id="linux-programs">Linux programs</h2>
<p>When executing <em>host domain</em> programs in Linux (on FPGA/ASIC targets) that require access to memory
mapped components of other domains, manual intervention is needed to map virtual to physical
addresses in SW, since domains different than the host <em>currently</em> lack support for HW-based virtual
memory translation.</p>
<p>Support for this is provided in the current SW runtime, hence transparent to the user. Test programs
targeting Linux are are found under <code>sw/tests/linux/&lt;domain&gt;</code>.</p>
mapped components of other domains, SW intervention is needed to map virtual to physical addresses,
since domains other than the host <em>currently</em> lack support for HW-based virtual memory
translation.</p>
<p>In the current SW stack, this mapping is already provided and hence transparent to the user. Test
programs targeting Linux that require it are located in a dedicated folder, <code>sw/tests/linux/&lt;domain&gt;</code>.</p>
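<p>For illustration, the sketch below shows the kind of mapping such a runtime performs under the
hood on Linux, using <code>/dev/mem</code> and <code>mmap()</code>; the physical address is a
hypothetical placeholder for a memory-mapped register of another domain, and the actual runtime may
differ:</p>
<pre><code>#include &lt;fcntl.h&gt;
#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;
#include &lt;sys/mman.h&gt;
#include &lt;unistd.h&gt;

#define DOMAIN_REG_PHYS 0x51000000UL /* hypothetical physical address */

int main(void) {
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd &lt; 0) { perror("open"); return 1; }

    /* Map the page containing the target register into our address space. */
    long pg = sysconf(_SC_PAGESIZE);
    off_t base = DOMAIN_REG_PHYS &amp; ~((off_t)pg - 1);
    void *map = mmap(NULL, (size_t)pg, PROT_READ | PROT_WRITE, MAP_SHARED, fd, base);
    if (map == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    volatile uint32_t *reg =
        (volatile uint32_t *)((uintptr_t)map + (DOMAIN_REG_PHYS - base));
    printf("reg = 0x%08x\n", *reg);

    munmap(map, (size_t)pg);
    close(fd);
    return 0;
}
</code></pre>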
<h1 id="inter-domain-offload">Inter-domain offload</h1>
<p>Offload of programs to Carfield domains involves:</p>
<ul>
<li>An <em>offloader</em>, typically one of the two controllers, namely the <em>host</em> or <em>safe</em> domains</li>
<li>A <em>target device</em>, namely the <em>accelerator domain</em>. The <em>safe domain</em> can play the role of target
device when offloaded RTOS payloads from the <em>host domain</em>.</li>
<li>An <em>offloader</em>, typically one of the two controllers, i.e., the <em>host</em> or <em>safe</em> domain</li>
<li>A <em>target device</em>, typically the <em>accelerator domain</em>. The <em>safe domain</em> can also play the role of
target device when RTOS payloads are offloaded to it from the <em>host domain</em>.</li>
</ul>
<p>Programs can be offloaded with:</p>
<ul>
<li>Simple baremetal runtime, recommended for regression tests use cases that are simple enough to be
executed with cycle-accurate RTL simulations. For instance, this can be the case of dynamic timing
analysis (DTA) carried out during an ASIC development cycle.</li>
<li>The <a href="https://www.openmp.org/">OpenMP</a> API, recommended when running SW on a FPGA or, eventually,
ASIC version of Carfield, because of the ready-to-use OS support (currently, Linux). Usage of the
OpenMP API with bare-metal (non OS-directed) SW can be supported, but is mostly suited for
heterogeneous embedded systems with highly constrained resources</li>
<li>
<p><strong>Simple baremetal offload (BMO)</strong>, recommended for regression-test use cases that are simple
enough to be executed with cycle-accurate RTL simulations. For instance, this can be the case of
dynamic timing analysis (DTA) carried out during an ASIC development cycle.</p>
</li>
<li>
<p><strong>The <a href="https://www.openmp.org/">OpenMP</a> API</strong>, recommended when developing SW for Carfield on a
FPGA or, eventually, ASIC implementing Carfield, because of the ready-to-use OS support
(currently, Linux). Usage of the OpenMP API with non OS-directed (baremetal) SW can be supported,
but is mostly suited for heterogeneous embedded systems with highly constrained resources</p>
</li>
</ul>
<p>In the following, we briefly describe both.</p>
<h2 id="baremetal-offload-bmo">Baremetal offload (BMO)</h2>
<p>The ELF of a <em>target device</em> is embedded into the <em>offloader</em> ELF as a header file. The latter
contains the memory preload process of the target device in terms of R/W sequence from the host
core, or as a DMA memcopy.</p>
<p>In addition, the offloader takes care of initializing and launching the <em>target device</em> execution in
SW.</p>
<p>The <code>sw.mk</code> make fragment automatically converts a target device compiled ELF into a header file via
an util script, and generates an offloader ELF embedding the header file of the domain. This is
iteratively done for each test present <em>in situ</em> within each target device repository.</p>
<p>For instance, assume the <em>host domain</em> as offloader and <em>integer PMCA</em> as target device:</p>
<h2 id="baremetal-offload">Baremetal offload</h2>
<p>For BMO, the offloader takes care of loading the target device's ELF into the correct memory
location, initializing the target and launching its execution through a simple ELF Loader. The ELF
Loader source code is located in the offloader's SW directory, and follows a naming convention:</p>
<pre><code>&lt;target_device&gt;_offloader_&lt;blocking | non_blocking&gt;.c
</code></pre>
<p>The target device's ELF is included in the offloader's ELF Loader as a <em>header file</em>. The target
device's ELF sections are first pre-processed offline to extract instruction addresses. The resulting
header file encodes the ELF loading process at the selected memory location. The loading process
can be carried out by the offloader as R/W sequences, or deferred to a DMA-driven memcopy. In
addition, the offloader takes care of bootstrapping the target device, i.e., initializing it and
launching its execution.</p>
<p>Upon target device completion, the offloader:</p>
<ul>
<li>Is asynchronously notified of the event via a mailbox interrupt; BMOs of this kind are called
<em>non-blocking</em></li>
<li>Synchronously polls a specific register to detect completion; BMOs of this kind are called
<em>blocking</em></li>
</ul>
<p>Currently, <em>blocking BMO</em> is implemented.</p>
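<p>A minimal sketch of such a <em>blocking</em> wait loop is shown below; the completion-register
address and done value are hypothetical placeholders, not Carfield's actual register map:</p>
<pre><code>#include &lt;stdint.h&gt;

#define TARGET_EOC_REG 0x40001000UL /* hypothetical end-of-computation register */
#define TARGET_DONE    1u

/* Busy-wait until the target device reports completion (blocking BMO). */
static void wait_target_done(void) {
    while (*(volatile uint32_t *)TARGET_EOC_REG != TARGET_DONE)
        ;
}
</code></pre>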
<hr />
<p>As an example, assume the <em>host domain</em> as offloader and the <em>integer PMCA</em> as target device.</p>
<ol>
<li>First, a header file is generated out of each test available in the integer PMCA repository</li>
<li>Second, the offloader source code is compiled by subsequently including each header file from
each integer PMCA test</li>
<li>The host domain ELF Loader is included in <code>sw/tests/bare-metal/hostd</code></li>
<li>A header file is generated out of each regression test available in the integer PMCA repository
(see the sketch after this list). For this example, the resulting header files are included in <code>sw/tests/bare-metal/pulpd</code></li>
<li>The final ELF executed by the offloader is created by successively including each header file
from each integer PMCA regression test</li>
</ol>
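<p>For illustration, a generated header could take the shape sketched below; all names and values
are hypothetical, not the util script's actual output:</p>
<pre><code>#include &lt;stdint.h&gt;

/* One entry per loadable ELF section of the target device's test. */
typedef struct {
    uint64_t addr;        /* load address extracted from the ELF   */
    uint64_t num_words;   /* payload size in 32-bit words          */
    const uint32_t *data; /* payload, copied by the CPU or the DMA */
} elf_chunk_t;

static const uint32_t chunk0_data[] = { 0x00000013u /* nop */ };

static const elf_chunk_t pulpd_test_chunks[] = {
    { 0x78000000ULL /* hypothetical */, 1u, chunk0_data },
};
</code></pre>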
<p>The resulting offloader ELF's name reads:</p>
<pre><code>&lt;target_device&gt;_offloader_&lt;blocking | non_blocking&gt;.&lt;target_device_test_name&gt;.car.&lt;l2 | spm | dram&gt;.elf
</code></pre>
<p>The suffix reflects the memory location from which the BMP will be executed.</p>
<p>The final offloader ELF can be preloaded with simulation methods described in the
<a href="../../tg/sim/">Simulation</a> section.</p>
<a href="../../tg/sim/">Simulation</a> section, and can be built again as explained above.</p>
<hr />
<p><em>Note for the reader</em></p>
<p>BMO is in general not recommended for developing SW for Carfield, as it was introduced during ASIC
development cycle and can be an effective litmus test to find and fix HW bugs.</p>
development cycle, where it can serve as an effective litmus test to find and fix HW bugs, e.g. during DTA.</p>
<p>For SW development on Carfield and in particular domain-driven offload, it is recommended to use
OpenMP offload on FPGA/ASIC, described below.</p>
<h2 id="openmp-offload-recommended-use-on-fpgaasic">OpenMP offload (recommended: use on FPGA/ASIC)</h2>

