Commit 7155f62

docs: add data caching and sharding
Signed-off-by: kazutoiris <[email protected]>
1 parent 959353c commit 7155f62

File tree

4 files changed

+109
-0
lines changed

en/modules/ROOT/images/fifo-read.svg

Lines changed: 3 additions & 0 deletions

en/modules/ROOT/images/fifo-write.svg

Lines changed: 3 additions & 0 deletions

en/modules/ROOT/nav.adoc

Lines changed: 2 additions & 0 deletions
@@ -2,3 +2,5 @@
22
* xref::[Why do we ... ?]
33
** xref:why/use-antora.adoc[use Antora]
44
** xref:why/support-unaligned-narrow-transfer.adoc[support unaligned/narrow transfer]
5+
* xref::[How do we ... ?]
6+
** xref:how/implement-data-caching-and-sharding.adoc[implement data caching and sharding]
Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
1+
= How do we implement data caching and sharding
2+
3+
:toc:
4+
:icons: font
5+
:author: kazutoiris
6+
7+
== Background
8+
The AXI protocol supports **narrow transfers** (8/16/32-bit) on data buses of different widthsfootnote:[AXI uses `AxSIZE[2:0\]` to indicate the number of bytes in each transfer.], while SDRAM operates with a fixed 16-bit physical interface. This mismatch creates two key challenges:
9+
10+
* **Alignment Overhead**: SDRAM requires 16-bit aligned addresses, whereas the AXI protocol allows unaligned transfers. Because of this mismatch, data must be reordered and buffered so that it reaches the SDRAM, or returns to the master, in the correct order.
11+
12+
* **Transfer Suspension**: AXI transfers can be suspended when the `ready` or `valid` signal is deasserted. SDRAM, however, cannot suspend a burst operation: once a burst transaction is initiated, it proceeds through the entire burst without pausing. The controller therefore has to account for this behavior.
13+
14+
== Analysis
15+
16+
=== Read Operation
17+
18+
In this section, we assume that the AXI data bus width is 32-bit, which means we only need to consider 8-bit, 16-bit, and 32-bit read operations.
19+
20+
. 8-bit
21+
+
22+
For 8-bit reads, in the best case, a single SDRAM read can satisfy 2 AXI transfersfootnote:aligned[If the address is aligned to 16 bits, assuming the address pattern is like `?0`.]. However, in the worst case, a single SDRAM read can only satisfy 1 AXI transfer.
23+
24+
. 16-bit
25+
+
26+
For 16-bit reads, in the best case, a single SDRAM read can satisfy 1 AXI transferfootnote:aligned[]. However, in the worst case, 2 SDRAM reads are required to satisfy 1 AXI transfer.
27+
28+
. 32-bit
29+
+
30+
For 32-bit reads, in the best case, a single SDRAM read can satisfy 1 AXI transferfootnote:[Assuming the address pattern is like `1?`.]. However, in the worst case, 2 SDRAM reads are required to satisfy 1 AXI transfer.
31+
32+
Since we can suspend the AXI read response by deasserting the `valid` signal, **as long as** we can cache the SDRAM read responses, implementing a read cache becomes straightforward.
33+
34+
[CAUTION]
35+
====
36+
Caching the SDRAM read responses means caching *all* responses, including those that arrive during the CAS latencyfootnote:[For convenience, we assume the CAS latency is 3 cycles.].
37+
38+
Therefore, we should stop the SDRAM burst read transaction when the AXI master `ready` signal is deasserted, and reserve enough buffer space for the data that still arrives during the CAS latency.
39+
====
40+
41+
=== Write Operation
42+
43+
Unlike read operations, write operations do not require consideration of CAS latency. As a result, it is possible to transfer both the address and data at the same time.
44+
45+
Similar to read operations, this section only considers 8-bit, 16-bit, and 32-bit write operations.
46+
47+
Because SDRAM`'s CKE signal cannot remain deasserted indefinitely, we must restart the transaction when the master suspends a transfer. Restarting a burst transfer requires re-sending the `BURST WRITE` command with data and address, which makes it similar to a single write operation. Therefore, this section uses the single write operation instead.
48+
49+
. 8-bit
50+
+
51+
For 8-bit writes, 2 AXI transfers are typically required to complete one write operation. In the best casefootnote:unaligned[If the address is unaligned to 16 bits, assuming the address pattern is like `?1`.], 1 transfer is sufficient.
52+
53+
. 16-bit
54+
+
55+
For 16-bit writes, 1 transfer can always accommodate a single write.
56+
+
57+
[TIP]
58+
====
59+
Let`'s consider two scenarios.
60+
61+
Aligned Address::
62+
The address is like `?0`. In this scenario, the data is 16-bit, and the address is aligned, allowing for a straightforward write operation.
63+
64+
Unaligned Address::
65+
The address is like `?1`. In this scenario, only 8 bits of data are transferred, so a single write is sufficient. The next address is then aligned.
66+
67+
Therefore, regardless of whether the address is aligned or not, only a single write is needed.
68+
====
69+
70+
. 32-bit
71+
+
72+
For 32-bit writes, 1 transfer normally corresponds to 2 writes. In special casesfootnote:unaligned[], it corresponds to only a single write.
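The mapping from one AXI transfer to 16-bit SDRAM words can be sketched in a few lines. This is only an illustrative model, not the controller's logic: it assumes the AXI rule that an unaligned first transfer carries only the bytes from the start address up to the next size boundary. The helper names `transfer_bytes` and `sdram_words` are hypothetical.

```python
SDRAM_WORD_BYTES = 2  # 16-bit SDRAM interface, as in the text


def transfer_bytes(addr, size_bytes):
    """Byte addresses carried by the first transfer of an AXI burst.

    Assumption: an unaligned start address only covers the bytes up to
    the next size-aligned boundary.
    """
    boundary = (addr // size_bytes + 1) * size_bytes
    return range(addr, boundary)


def sdram_words(addr, size_bytes):
    """Number of distinct 16-bit SDRAM words touched by that transfer."""
    touched = {byte // SDRAM_WORD_BYTES for byte in transfer_bytes(addr, size_bytes)}
    return len(touched)


if __name__ == "__main__":
    for size in (1, 2, 4):
        for offset in (0, 2):
            print(size * 8, "bit, offset", offset, "->", sdram_words(0x10 + offset, size))
```

Under these assumptions, a 16-bit transfer touches one SDRAM word whether its address looks like `?0` or `?1`, while an aligned 32-bit transfer touches two.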
73+
74+
[NOTE]
75+
====
76+
Since a `READ` operation is equivalent to a `STOP BURST` followed by a `READ`, and a `WRITE` can be considered equivalent to a `STOP BURST` followed by a `WRITE`, we can directly send a `WRITE` to start a new burst operation without sending a `STOP BURST` first.
77+
78+
However, if the master`'s `valid` signal is deasserted and we do not have enough buffered write data, we need to manually send a `STOP BURST`. In the worst-case scenario, this may result in an additional idle cycle.
79+
80+
Therefore, we still use a single write operation.
81+
====
82+
83+
== Implementation
84+
85+
Due to the CAS latency of SDRAM, we choose to separate the command and data channels. The command channel is controlled by the state machine. The RFIFO will only connect to `DQi`, and the WFIFO will only connect to `DQo`.
86+
87+
=== Read Operation
88+
89+
We use two FIFOs to implement the read cache. RFIFO1 is a 1-depth 32-bit Bypass FIFO that is directly connected to the `R` response channel of AXI. RFIFO2 is a 4-depth 16-bit Bypass FIFO that is directly connected to the `DQi` port of SDRAM.
90+
91+
image::ROOT:fifo-read.svg[RFIFO, opts=inline]
92+
93+
Every time two 16-bit words are available in RFIFO2, they are cached into RFIFO1. When RFIFO1 is full, the state machine automatically switches to the `STOP` state, forcing the SDRAM to stop the burst operation. Due to the CAS latency of SDRAM, it will still send 3 cycles of read data, and RFIFO2 caches all of this data. When RFIFO1 is empty, the burst operation is restarted.
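The interaction between the two read FIFOs can be checked with a small cycle-level model. This is a simplified sketch, not the actual RTL. It assumes a CAS latency of 3, one 16-bit word per cycle while bursting, and that the low half-word arrives first. The function name `simulate_read` and the per-cycle ordering of events are assumptions made for illustration.

```python
from collections import deque

CAS_LATENCY = 3   # assumed, as in the text
RFIFO2_DEPTH = 4  # 16-bit entries


def simulate_read(ready_pattern, cycles):
    """Return (delivered 32-bit beats, peak RFIFO2 occupancy)."""
    pipeline = deque([None] * CAS_LATENCY)  # words in flight inside the SDRAM
    rfifo2 = deque()                        # 16-bit words captured from DQi
    rfifo1 = None                           # single 32-bit entry for the R channel
    delivered = []
    next_word = 0
    peak = 0
    for cycle in range(cycles):
        ready = ready_pattern[cycle % len(ready_pattern)]
        # AXI R channel: hand over the beat when the master is ready.
        if rfifo1 is not None and ready:
            delivered.append(rfifo1)
            rfifo1 = None
        # Pack two 16-bit words into RFIFO1 once it is empty.
        if rfifo1 is None and len(rfifo2) >= 2:
            low = rfifo2.popleft()
            high = rfifo2.popleft()
            rfifo1 = (high << 16) | low
        # A word requested CAS_LATENCY cycles ago appears on DQi now.
        arrived = pipeline.popleft()
        if arrived is not None:
            rfifo2.append(arrived)
        peak = max(peak, len(rfifo2))
        # State machine: burst while RFIFO1 is empty, otherwise STOP.
        if rfifo1 is None:
            pipeline.append(next_word)
            next_word += 1
        else:
            pipeline.append(None)
    return delivered, peak


if __name__ == "__main__":
    beats, peak = simulate_read([True, False, False], 200)
    print(len(beats), "beats delivered, peak RFIFO2 occupancy:", peak)
```

In this model, a master that never asserts `ready` leaves exactly the CAS-latency worth of words in RFIFO2, which fits within the 4-entry depth.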
94+
95+
=== Write Operation
96+
97+
Since the AXI protocol supports back-pressure, we can use a single FIFO to implement the write cache. The WFIFO is a 4-depth 32-bit Bypass FIFO that is directly connected to the `DQo` port of SDRAM. However, we still need to manually control the back-pressure on the `B` channel. Therefore, we connect two FIFOs to the `B` channel (to simulate the `almost_empty` signal).
98+
99+
image::ROOT:fifo-write.svg[WFIFO, opts=inline]
100+
101+
When WFIFO is empty or BFIFO1 is full, the state machine will automatically switch to the `STOP` state, forcing the SDRAM to stop the burst operation. When WFIFO is full, the AXI write request will be suspended through back-pressure.
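The two conditions above can be written as a small decision function. This is a hedged sketch of the rule as stated, not the actual state machine: the function name `write_control` and the `WRITE` state label are hypothetical, and only the `STOP` state is named in the text.

```python
WFIFO_DEPTH = 4  # 32-bit entries, as in the text


def write_control(wfifo_count, bfifo1_full):
    """Return (state, accept_write) for one cycle of the write path.

    state        -- "STOP" when WFIFO is empty or BFIFO1 is full, else "WRITE"
    accept_write -- False when WFIFO is full, i.e. back-pressure on the AXI write
    """
    if not 0 <= wfifo_count <= WFIFO_DEPTH:
        raise ValueError("wfifo_count out of range")
    stop = wfifo_count == 0 or bfifo1_full
    state = "STOP" if stop else "WRITE"
    accept_write = wfifo_count < WFIFO_DEPTH
    return state, accept_write


if __name__ == "__main__":
    for count in range(WFIFO_DEPTH + 1):
        print(count, write_control(count, bfifo1_full=False))
```

Keeping the stop condition and the back-pressure condition separate mirrors the text: the first depends on the SDRAM side running out of data or response space, the second only on WFIFO occupancy.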
