forked from riscv-non-isa/riscv-trace-spec
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathingressPort.tex
executable file
·752 lines (659 loc) · 39.2 KB
/
ingressPort.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
\chapter{Hart to encoder interface} \label{Interface}
\section{Instruction Trace Interface requirements}\label{sec:InstructionInterfaceRequirements}
This section describes in general terms the information which must be
passed from the RISC-V hart to the trace encoder for the purposes of
Instruction Trace, and distinguishes between what is mandatory, and
what is optional.
The following information is mandatory:
\begin{itemize}
\item The number of instructions that are being retired;
\item Whether there has been an exception or interrupt, and if so the cause
(from the \textbf{\textit{ucause/scause/vscause/mcause}} CSR)
and trap value (from the \textbf{\textit{utval/stval/vstval/mtval}} CSR).
The register set to output should be the set that is updated as a result of the exception (i.e.
the set associated with the privilege level immediately following the exception);
\item The current privilege level of the RISC-V hart;
\item The \textit{instruction\_type} of retired instructions for:
\begin{itemize}
\item Jumps with a target that cannot be inferred from the source code;
\item Taken and nontaken branches;
\item Return from exception or interrupt (\textbf{\textit{*ret}} instructions).
\end{itemize}
\item The \textit{instruction\_address} for:
\begin{itemize}
\item Jumps with a target that \textit{cannot} be inferred from the source code;
\item The instruction retired immediately after a jump with a target that \textit{cannot} be inferred
from the source code (also referred to as the target or destination of the jump);
\item Taken and nontaken branches;
\item The last instruction retired before an exception or interrupt;
\item The first instruction retired following an exception or interrupt;
\item The last instruction retired before a privilege change;
\item The first instruction retired following a privilege change.
\end{itemize}
\end{itemize}
The following information is optional:
\begin{itemize}
\item Context or Time information:
\begin{itemize}
\item The context and/or hart ID and/or time;
\item The type of action to take when context or time data changes.
\end{itemize}
\item The \textit{instruction\_type} of instructions for:
\begin{itemize}
\item Calls with a target that \textit{cannot} be inferred from the source code;
\item Calls with a target that \textit{can} be inferred from the source code;
\item Tail-calls with a target that \textit{cannot} be inferred from the source code;
\item Tail-calls with a target that \textit{can} be inferred from the source code;
\item Returns with a target that \textit{cannot} be inferred from the source code;
\item Returns with a target that \textit{can} be inferred from the source code;
\item Co-routine swap;
\item Jumps which don't fit any of the above classifications with a target that \textit{cannot} be inferred from the source code;
\item Jumps which don't fit any of the above classifications with a target that \textit{can} be inferred from the source code.
\end{itemize}
\item If context or time is supported then the \textit{instruction\_address} for:
\begin{itemize}
\item The last instruction retired before a context or a time change;
\item The first instruction retired following a context or time change.
\end{itemize}
\item Whether jump targets are sequentially inferable or not.
\end{itemize}
The mandatory information is the bare-minimum required to implement the branch trace algorithm outlined in Chapter~\ref{Algorithm}.
The optional information facilitates alternative or improved trace algorithms:
\begin{itemize}
\item Implicit return mode (see Section~\ref{sec:implicit-return}) requires the encoder to keep track of the number of nested
function calls, and to do this it must be aware of all calls and returns regardless of whether the target can be inferred or not;
\item A simpler algorithm useful for basic code profiling would only report function calls and returns, again
regardless of whether the target can be inferred or not;
\item Branch prediction techniques can be used to further improve the encoder efficiency, particularly for loops
(see Section~\ref{sec:branch-prediction}). This requires the encoder to be aware of the address of all branches, whether
they are taken or not.
\item Uninferable jumps can be treated as inferable (which don't need to be reported in the trace output) if both the jump
and the preceding instruction which loads the target into a register have been traced.
\end{itemize}
\subsection{Jump classification and target inference} \label{Jump Classes}
Jumps are classified as \textit{inferable}, or \textit{uninferable}. An \textit{inferable} jump has a target which can be
deduced from the binary executable or representation thereof (e.g. ELF). For the purposes of this specification, the following
strict definition applies:
If the target of a jump is supplied via a constant embedded within the jump opcode, it is classified as \textit{inferable}.
Jumps which are not \textit{inferable} are by definition \textit{uninferable}.
However, there are some jump targets which can still be deduced from the binary executable by considering pairs of instructions
even though by the above definition they are classified as uninferable. Specifically, jump targets that are supplied via
\begin{itemize}
\item an \textbf{\textit{lui}} or \textbf{\textit{c.lui}} (a register which contains a constant), or
\item an \textbf{\textit{auipc}} (a register which contains a constant offset from the PC).
\end{itemize}
Such jump targets are classified as \textit{sequentially inferable} if the pair of instructions are retired consecutively
(i.e. the \textbf{\textit{auipc}}, \textbf{\textit{lui}} or \textbf{\textit{c.lui}} immediately precedes the jump). Note:
the restriction that the instructions are retired consecutively is necessary in order to minimize the additional signalling
needed between the hart and the encoder, and should have a minimal impact on trace efficiency as it is anticipated that
consecutive execution will be the norm. Support for sequentially inferable jumps is optional.
Jumps may optionally be further classified according to the recommended calling convention:
\begin{itemize}
\item \textit{Calls}:
\begin{itemize}
\item \textbf{\textit{jal}} x1;
\item \textbf{\textit{jal}} x5;
\item \textbf{\textit{jalr}} x1, rs where rs != x5;
\item \textbf{\textit{jalr}} x5, rs where rs != x1;
\item \textbf{\textit{c.jalr}} rs1 where rs1 != x5;
\item \textbf{\textit{c.jal}}.
\end{itemize}
\item \textit{Tail-calls}:
\begin{itemize}
\item \textbf{\textit{jal}} x0;
\item \textbf{\textit{c.j}};
\item \textbf{\textit{jalr}} x0, rs where rs != x1 and rs != x5;
\item \textbf{\textit{c.jr}} rs1 where rs1 != x1 and rs1 != x5.
\end{itemize}
\item \textit{Returns}:
\begin{itemize}
\item \textbf{\textit{jalr}} rd, rs where (rs == x1 or rs == x5) and rd != x1 and rd != x5;
\item \textbf{\textit{c.jr}} rs1 where rs1 == x1 or rs1 == x5.
\end{itemize}
\item \textit{Co-routine swap}:
\begin{itemize}
\item \textbf{\textit{jalr}} x1, x5;
\item \textbf{\textit{jalr}} x5, x1;
\item \textbf{\textit{c.jalr}} x5.
\end{itemize}
\item \textit{Other}:
\begin{itemize}
\item \textbf{\textit{jal}} rd where rd != x1 and rd != x5;
\item \textbf{\textit{jalr}} rd, rs where rs != x1 and rs != x5 and rd != x0 and rd != x1 and rd != x5.
\end{itemize}
\end{itemize}
\subsection{Relationship between RISC-V core and the encoder} \label{sec:relationship}
The encoder is intended to encode the instructions executed on a single hart.
It is however commonplace for a RISC-V core to contain multiple harts. This can be
supported by the core in several different ways:
\begin{itemize}
\item Implement a separate instance of the interface per hart. Each instance can be connected
to a separate encoder instance, allowing all harts to be traced concurrently. Alternatively,
external muxing may be used in conjunction with a single encoder in order to trace one particular
hart at a time;
\item Implement a singe interface for the core, with muxing inside the core to select which hart to
connect to the interface.
\end{itemize}
(Whilst it is technically feasible to use a single encoder with multiple harts operating
in a fine-grained multi-threaded configuration, the frequent context changes that would occur
as a result of thread-switching would result in extremely poor encoding efficiency, and so
this configuration is not recommended.)
\section{Instruction Trace Interface}\label{sec:InstructionTraceInterface}
This section describes the interface between a RISC-V hart and the
trace encoder that conveys the information described in the section ~\ref{sec:InstructionInterfaceRequirements}.
Signals are assigned to one of the following groups:
\begin{itemize}
\item M: Mandatory. The interface must include an instance of this signal.
\item O: Optional. The interface may include an instance of this signal.
\item MR: Mandatory, may be replicated. For harts that can retire a maximum of N "special" instructions per clock
cycle, the interface must include N instances of this signal.
\item OR: Optional, may be replicated. For harts that can retire a maximum of N "special" per clock
cycle, the interface must include zero or N instances of this signal.
\item BR: Block, may be replicated. Mandatory for harts that can retire multiple instructions in a block.
Replication as per OR. If omitted, the interface must include SR group signals instead.
\item SR: Single, may be replicated. Mandatory for harts that can only retire one instruction in a block.
Replication as per OR (see section~\ref{sec:alt-multi}). If omitted, the interface must include BR
group signals instead.
\end{itemize}
"Special" instructions are those that require \textbf{itype} to be non-zero.
\begin{table}[htp]
\centering
\caption{Instruction interface signals}
\label{tab:common-ingress}
\begin{tabulary}{\textwidth}{|l|l|p{80mm}|}
\hline
\textbf{Signal} & \textbf{Group} & \textbf{Function} \\
\hline
\textbf{itype}[\textit{itype\_width\_p}-1:0] & MR & Termination type of the instruction block. Encoding given in Table~\ref{tab:itype}
(see Section~\ref{Jump Classes} for definitions of codes 6 - 15).\\
\hline
\textbf{cause}[\textit{ecause\_width\_p}-1:0] & M & Exception or interrupt cause (\textbf{\textit{ucause/scause/ vscause/mcause}}).
Ignored unless \textbf {itype}=1 or 2.\\
\hline
\textbf{tval}[\textit{iaddress\_width\_p}-1:0] & M & The associated trap value, e.g. the
faulting virtual address for address exceptions, as would be
written to the \textbf{utval/stval/vstval/mtval} CSR. Future optional extensions may define \textbf{tval}
to provide ancillary information in cases where it currently supplies zero.
Ignored unless \textbf{itype}=1.\\
\hline
\textbf{priv}[\textit{privilege\_width\_p}-1:0] & M & Privilege level for all instructions retired on this cycle.
Encoding given in Table~\ref{tab:priv}. Codes 4-7 optional.\\
\hline
\textbf{iaddr}[\textit{iaddress\_width\_p}-1:0] & MR & The address of the 1st instruction retired in this block.
Invalid if \textbf{iretire}=0 unless \textbf{itype}=1, in which case it indicates the address of the
instruction which incurred the exception. \\
\hline
\textbf{context}[\textit{context\_width\_p}-1:0] & O & Context for all instructions retired on this cycle.\\
\hline
\textbf{time}[\textit{time\_width\_p}-1:0] & O & Time generated by the core.\\
\hline
\textbf{ctype}[\textit{ctype\_width\_p}-1:0] & O & Reporting behavior for \textbf{context}.
Encoding given in Table~\ref{tab:context-type}. Codes 2-3 optional.\\
\hline
\textbf{sijump} & OR & If \textbf{itype} indicates that this block ends with an uninferable discontinuity, setting this signal to 1
indicates that it is sequentially inferable and may be treated as inferable by the encoder if the preceding
\textbf{\textit{auipc}}, \textbf{\textit{lui}} or \textbf{\textit{c.lui}} has been traced.
Ignored for \textbf{itype} codes other than 6, 8, 10, 12 or 14.\\
\hline
\end{tabulary}
\end{table}
\begin{table}[htp]
\centering
\caption{Instruction interface signals - multiple retirement per block}
\label{tab:multi-ingress}
\begin{tabulary}{\textwidth}{|l|l|p{80mm}|}
\hline
\textbf{Signal} & \textbf{Group} & \textbf{Function} \\
\hline
\textbf{iretire}[\textit{iretire\_width\_p}-1:0] & BR & Number of halfwords represented by instructions retired in this block.\\
\hline
\textbf{ilastsize}[\textit{ilastsize\_width\_p}-1:0] & BR & The size of the last retired instruction is 2\textsuperscript{\textbf{ilastsize}} half-words.\\
\hline
\end{tabulary}
\end{table}
\begin{table}[htp]
\centering
\caption{Instruction interface signals - single retirement per block}
\label{tab:single-ingress}
\begin{tabulary}{\textwidth}{|l|l|p{80mm}|}
\hline
\textbf{Signal} & \textbf{Group} & \textbf{Function} \\
\hline
\textbf{iretire}[0:0] & SR & Number of instructions retired in this block (0 or 1).\\
\hline
\textbf{ilastsize}[\textit{ilastsize\_width\_p-1}:0] & SR & The size of the retired instruction is 2\textsuperscript{\textbf{ilastsize}} half-words.\\
\hline
\end{tabulary}
\end{table}
\begin{table}[htp]
\centering
\caption{Instruction Type (\textbf{itype}) encoding}
\label{tab:itype}
\begin{tabulary}{\textwidth}{|l|p{140mm}|}
\hline
\textbf{Value} & \textbf{Description} \\
\hline
0 & Final instruction in the block is none of the other named \textbf{itype} codes\\
1 & Exception. An exception that traps occurred following the final retired instruction in the block\\
2 & Interrupt. An interrupt that traps occurred following the final retired instruction in the block\\
3 & Exception or interrupt return\\
4 & Nontaken branch\\
5 & Taken branch\\
6 & Uninferable jump if \textit{itype\_width\_p} is 3, reserved otherwise\\
7 & reserved\\
8 & Uninferable call\\
9 & Inferrable call\\
10 & Uninferable tail-call\\
11 & Inferrable tail-call\\
12 & Co-routine swap\\
13 & Return\\
14 & Other uninferable jump\\
15 & Other inferable jump\\
\hline
\end{tabulary}
\end{table}
\begin{table}[htp]
\centering
\caption{Privilege level (\textbf{priv}) encoding}
\label{tab:priv}
\begin{tabulary}{\textwidth}{|l|p{140mm}|}
\hline
\textbf{Value} & \textbf{Description} \\
\hline
0 & U\\
1 & S/HS\\
2 & reserved\\
3 & M\\
4 & D (debug mode)\\
5 & VU\\
6 & VS\\
7 & reserved\\
\hline
\end{tabulary}
\end{table}
Tables~\ref{tab:common-ingress} and ~\ref{tab:multi-ingress}
list the signals in the interface designed to efficiently support retirement of multiple
instructions per cycle. The following discussion describes the multiple-retirement behavior.
However, for harts that can only retire one instruction at a time, the signalling can be
simplified, and this is discussed subsequently in Section~\ref{sec:single-retire}.
The information presented in a block represents a contiguous
block of instructions starting at \textbf{iaddr}, all of which retired
in the same cycle. Note if \textbf{itype} is 1 or 2 (indicating an
exception or an interrupt), the number of instructions retired may be
zero. \textbf{cause} and \textbf{tval} are only defined if
\textbf{itype} is 1 or 2. If \textbf{iretire}=0 and \textbf{itype}=0,
the values of all other signals are undefined.
\textbf{iretire} contains the number of (16-bit) half-words represented by
instructions retired in this block, and \textbf{ilastsize} the size of the last
instruction. Half-words rather than instruction count enables the encoder to easily compute the address of
the last instruction in the block without having access to the size of every
instruction in the block.
\textbf{itype} can be 3 or 4 bits wide. If \textit{itype\_width\_p} is 3, a single code (6) is used to
indicate all uninferable jumps. This is simpler to implement, but precludes use of the implicit return mode
(see section~\ref{sec:implicit-return}), which requires jump types to be fully classified.
Whilst \textbf{iaddr} is typically a virtual address, it does not affect the encoder's behavior
if it is a physical address.
For harts that can retire a maximum of N non-zero \textbf{itype} values per clock
cycle, the signal groups MR, OR and either BR or SR must be replicated N times.
Typically N is determined by the maximum number of branches that can be retired per clock cycle.
Signal group 0 represents information about the oldest instruction block, and group N-1
represents the newest instruction block. The interface supports no more
than one privilege change, context change,
exception or interrupt per cycle and so signals in
groups M and O are not replicated. Furthermore, \textbf{itype} can only
take the value 1 or 2 in one of the signal groups, and this must be
the newest valid group (i.e. \textbf{iretire} and \textbf{itype} must
be zero for higher numbered groups). If fewer than N groups are required
in a cycle, then lower numbered groups must be used
first. For example, if there is one branch, use only group 0, if
there are two branches, instructions up to the 1st branch
must be reported in group 0 and instructions up to the 2nd branch
must be reported in group 1 and so on.
\textbf{sijump} is optional and may be omitted if the hart does not implement the logic to detect
sequentially inferable jumps. If the encoder offers an \textbf{sijump} input it must also provide a
parameter to indicate whether the input is connected to a hart that implements this capability, or
tied off. This is to ensure the decoder can be made aware of the hart's capability. Enabling
sequentially inferable jump mode in the encoder and decoder when the hart does not support it will
prevent correct reconstruction by the decoder.
The \textbf{context} and/or the \textbf{time} field can be used to convey any additional information to the decoder. For example:
\begin{itemize}
\item The address space and virtual machine IDs (\textbf{ASID} and \textbf{VMID}
respectively). Where present it is recommended these values be wired to bits [15:0] and [29:16];
\item The software thread ID;
\item The process ID from an operating system;
\item It could be used to convey the values of CSRs to the decoder by setting \textbf{context} to the
CSR number and value when a CSR is written;
\item In cases where a single encoder is being shared amongst multiple harts
(see section~\ref{sec:relationship}), it could also be used to indicate the hart ID, in cases where the
hart ID can be changed dynamically.
\item Time from within the hart
\end{itemize}
Table~\ref{tab:context-type} specifies the actions for the various
\textbf{ctype} values. A typical behavior would be for this signal to
remain zero except on the 1st retirement after a context change
or when a time value should be reported.
\textit{ctype\_width\_p} may be 1 or 2. The reduced width option only
provides support for reporting context changes imprecisely.
\begin{table}[htp]
\centering
\caption{Context type \textbf{ctype} values and corresponding actions}
\label{tab:context-type}
\begin{tabulary}{\textwidth}{|l|l|p{90mm}|}
\hline
\textbf{Type} & \textbf{Value} & \textbf{Actions} \\
\hline
Unreported & 0 & No action (don't report context).\\
\hline
Report context imprecisely & 1 & An example would be a SW thread or operating system process change.\newline
Report the new context value at the earliest convenient opportunity.\newline
It is reported without any address information, and the assumption is that the precise
point of context change can be deduced from the source code (e.g. a CSR write). \\
\hline
Report context precisely & 2 & Report the address of the 1st instruction retired in this block, and the new context.\newline
If there were unreported branches beforehand, these need to be reported first.\newline
Treated the same as a privilege change.\\
\hline
Report context as an & 3 & An example would be a change of hart.\\
asynchronous discontinuity & &
Need to report the last instruction retired on the previous context, as well as the 1st on the new context.\newline
Treated the same as an exception.\\
\hline
\end{tabulary}
\end{table}
\subsection{Simplifications for single-retirement} \label{sec:single-retire}
For harts that can only retire one instruction at a time, the interface can be simplified to
the signals listed in tables~\ref{tab:common-ingress} and ~\ref{tab:single-ingress}. The
simplifications can be summarized as follows:
\begin{itemize}
\item \textbf{iretire} simply indicates whether an instruction retired or not;
\end{itemize}
\textbf{Note:} \textbf{ilastsize} is still needed in order to determine the address of the next
instruction, as this is the predicted return address for implicit return mode
(see Section~\ref{sec:implicit-return}).
The parameter \textit{retires\_p} which indicates to the encoder the maximum number of
instructions that can be retired per cycle can be used by an encoder capable of supporting single or
multiple retirement to select the appropriate interpretation of \textbf{iretire}.
\subsection{Alternative multiple-retirement interface configurations} \label{sec:alt-multi}
For a hart that can retire multiple instructions per cycle, but no more than one branch, the preferred
solution is to use one instance of signals from groups BR, MR and OR. However, if the hart can retire
N branches in a cycle, N instances of signals from groups MR, OR and either SR or BR must be used
(each instance can be either a single instruction or a block).
If the hart can retire N instructions per cycle, but only one branch, it is allowed (though not recommended)
to provide explicit details of every instruction retired by using N instances of signals from groups
SR, MR and OR.
\subsection{Optional sideband signals}
Optional sideband signals may be included to provide additional functionality, as described in
tables \ref{tab:ingress-side-band} and \ref{tab:egress-side-band}.
Note, any user defined information that needs to be output by the encoder
will need to be applied via the \textbf{context} input.
\begin{table}[htp]
\centering
\caption{Optional sideband encoder input signals}
\label{tab:ingress-side-band}
\begin{tabulary}{\textwidth}{|l|p{18mm}|p{85mm}|}
\hline
\textbf{Signal} & \textbf{Group} & \textbf{Function} \\
\hline
\textbf{impdef}[\textit{impdef\_width\_p}-1:0] & O & Implementation defined sideband signals. A typical use for
these would be for filtering (see Chapter~\ref{ch:filtering}.\\
\hline
\textbf{trigger}[2+:0] & [1:0]: O \newline
[2+]: OR & A pulse on bit 0 will cause the encoder to start tracing, and continue until further
notice, subject to other filtering criteria also being met.\newline
A pulse on bit 1 will cause the encoder to stop tracing until further notice. See section~\ref{sec:trigger}).\\
\hline
\textbf{halted} & O & Hart is halted. Upon assertion, the encoder will output a packet to report the address
of the last instruction retired before halting, followed by a support packet to indicate that tracing has stopped.
Upon deassertion, the encoder will start tracing again, commencing with a synchronization packet.
\textbf{Note:} If this signal is not provided, it is strongly recommended that Debug mode can be signalled via a 3-bit
\textbf{privilege} signal. This will allow tracing in Debug mode to be controlled via the optional filtering capabilities.\\
\hline
\textbf{reset} & O & Hart is in reset. Provided the encoder is in a different reset domain to the hart, this
allows the encoder to indicate that tracing has ended on entry to reset, and restarted on exit.
Behavior is as described above for halt.\\
\hline
\end{tabulary}
\end{table}
\begin{table}[htp]
\centering
\caption{Optional sideband encoder output signals}
\label{tab:egress-side-band}
\begin{tabulary}{\textwidth}{|l|l|p{125mm}|}
\hline
\textbf{Signal} & \textbf{Group} & \textbf{Function} \\
\hline
\textbf{stall} & O & Stall request to hart. Some applications may require lossless trace, which can be achieved by
using this signal to stall the hart if the trace encoder is unable to output a trace packet (for example due to
back-pressure from the packet transport infrastructure).\\
\hline
\end{tabulary}
\end{table}
\FloatBarrier
\subsection{Using trigger outputs from the Debug Module} \label{sec:trigger}
The debug module of the RISC-V hart may have a trigger unit. This defines a match control register
(\textbf{\textit{mcontrol}}) containing a 4-bit \textbf{action} field, and reserves codes 2 - 5
of this field for trace use.
These action codes are hereby defined as shown in table \ref{tab:debugModuleTriggerSupport}.
If implemented, each action must generate a pulse on an output from the hart, on the same cycle as the instruction which
caused the trigger is retired.
\begin{table}[!h]
\centering
\caption{Debug Module trigger support (\textbf{\textit{mcontrol}} \textbf{action})}
\label{tab:debugModuleTriggerSupport}
\begin{tabulary}{\textwidth}{|l|p{140mm}|}
\hline
\textbf {Value} & \textbf {Description} \\
\hline
2 & \textit{Trace-on}. This should be connected to
\textbf{trigger[0]} if the encoder provides it. \\
\hline
3 & \textit{Trace-off}. This should be connected to
\textbf{trigger[1]} if the encoder provides it. \\
\hline
4 & \textit{Trace-notify}. This should be connected to
\textbf{trigger[1 + \textit{blocks}:}2] if the encoder provides it. This will cause the encoder to output
a packet containing the address of the last instruction in the block if it is enabled.
One bit per block.\\
\hline
\end{tabulary}
\end{table}
Trace-on and Trace-off actions provide a means for the hart to control when tracing
starts and stops. It is recommended that tracing starts from the oldest
instruction retired in the cycle that Trace-on is asserted, and stops following the newest
instruction retired in the cycle that Trace-off is asserted (subject to any optional filtering).
Trace-notify provides means to ensure that a specified instruction is explicitly reported
(subject to any optional filtering). This capability is sometimes known as a watchpoint.
\FloatBarrier
\subsection{Example retirement sequences}
\begin{table}[htp]
\centering
\caption{Example 1 : 9 Instructions retired over four cycles, 2 branches}
\label{tab:signal-block-9-instructions-2-branches}
\begin{tabulary}{\textwidth}{|l|l|}
\hline
\textbf {Retired} & \textbf {Instruction Trace Block} \\
\hline
1000: \textbf{\textit{divuw}} & \textbf{iretire}=7, \textbf{iaddr}=0x1000, \textbf{itype}=8\\
1004: \textbf{\textit{add}} & \\
1008: \textbf{\textit{or}} & \\
100C: \textbf{\textit{c.jalr}} & \\
\hline
0940: \textbf{\textit{addi}} & \textbf{iretire}=3, \textbf{iaddr}=0x0940, \textbf{itype}=4\\
0944: \textbf{\textit{c.beq}} & \\
\hline
0946: \textbf{\textit{c.bnez}} & \textbf{iretire}=1, \textbf{iaddr}=0x0946, \textbf{itype}=5\\
\hline
0988: \textbf{\textit{lbu}} & \textbf{iretire}=4, \textbf{iaddr}=0x0988, \textbf{itype}=0\\
098C: \textbf{\textit{csrrw}} & \\
\hline
\end{tabulary}
\end{table}
\section{Data Trace Interface requirements}\label{sec:DataInterfaceRequirements}
This section describes in general terms the information which must be
passed from the RISC-V hart to the trace encoder for the purposes of
Data Trace, and distinguishes between what is mandatory, and what is
optional.
If Data Trace is not needed in a system then there is no requirement
for the RISC-V hart to supply any of the signals in section
~\ref{sec:DataTraceInterface}.
Data trace supports up to four data access types: load, store, atomic and CSR.
Support for both atomic and CSR accesses are independently optional.
The signalling protocol can take one of two forms, depending on the needs of the RISC-V
hart: \textit{unified} or \textit{split}.
Unified is the simplest form, suitable for simpler, in-order harts. In this form, all
information about a data access is signalled by the RISC-V hart in the
same cycle that the associated data access instruction is reported on the instruction
trace interface.
For harts with out of order or speculative execution capabilities, many loads may be in
progress simultaneously, and this approach is not practical as it would require the hart
to maintain a large amount of state relating to all the in-progress loads. For this
reason, the interface also supports splitting loads into two parts:
\begin{itemize}
\item The \textit{request} phase provides all the information about the load that
originates from the hart (address, size, etc.) when the instruction retires;
\item The \textit{response} phase provides the data and response status when it has
been returned to the hart from the memory system.
\end{itemize}
The two parts of a split load are associated by use of a transaction ID.
Any extensions that result in multiple memory accesses such as the
push and pop instructions proposed by the code size working group will
each result in multiple loads or stores. To allow the resulting loads
or stores to be associated with the correct instruction, these
multi-memory-access instructions must be reported on the instruction
trace interface multiple times (once for each individual load or
store) using \textbf{itype} 0 except for the final load or store,
which must retire using the natural \textbf{itype} for the instruction
(for example, a popret instruction must use \textbf{itype} 13 for the
final load to signal the return).
If an exception
occurs part way through the sequence of loads or stores initiated by such an instruction,
and the instruction is re-executed after the exception handler has been serviced, the
load or store sequence must recommence from the beginning.
If an exception
occurs part way through the sequence of loads or stores initiated by such an instruction,
and the instruction is re-executed after the exception handler has been serviced, the
load or store sequence must recommence from the beginning.
\section{Data Trace Interface}\label{sec:DataTraceInterface}
This section describes the interface between a RISC-V hart and the
trace encoder that conveys the information described in the
section~\ref{sec:DataInterfaceRequirements}. Signals are assigned to
one of the following groups:
\begin{itemize}
\item M: Mandatory. The interface must include an instance of this signal;
\item U: Unified. Mandatory for unified signalling;
\item S: Split. Mandatory for split load signalling;
\item O: Optional. The interface may include an instance of this signal.
\end{itemize}
\begin{table}[htp]
\centering
\caption{Data interface signals}
\label{tab:data-ingress}
\begin{tabulary}{\textwidth}{|l|l|p{80mm}|}
\hline
\textbf{Signal} & \textbf{Group} & \textbf{Function} \\
\hline
\textbf{dretire} & M & Data access retired (when high)\\
\hline
\textbf{dtype}[\textit{dtype\_width\_p}-1:0] & M & Data access type. Encoding given in Table~\ref{tab:dtype}\\
\hline
\textbf{daddr}[\textit{daddress\_width\_p}-1:0] & M & The data access address\\
\hline
\textbf{dsize}[\textit{dsize\_width\_p}-1:0] & M & The data access size is 2\textsuperscript{\textbf{dsize}} bytes\\
\hline
\textbf{data}[\textit{data\_width\_p}-1:0] & U & The data\\
\hline
\textbf{iaddr\_lsbs}[\textit{iaddr\_lsbs\_width\_p}-1:0] & O & LSBs of the data access instruction address. Required if \textit{retires\_p} > 1\\
\hline
\textbf{dblock}[\textit{dblock\_width\_p}-1:0] & O & Instruction block in which the data access instruction is retired. Required if there
are replicated instruction block signals\\
\hline
\textbf{lrid}[\textit{lrid\_width\_p}-1:0] & S & Load request ID. Valid when \textbf{dretire} is high\\
\hline
\textbf{lresp}[\textit{lresp\_width\_p}-1:0] & S & Load response:\newline
0: None\newline
1: reserved\newline
2: Okay. Load successful; \textbf{ldata} valid\newline
3: Error. Load failed; \textbf{ldata} not valid\\
\hline
\textbf{lid}[\textit{lrid\_width\_p}-1:0] & S & Split Load ID. Valid when \textbf{lresp} is non-zero\\
\hline
\textbf{sdata}[\textit{sdata\_width\_p}-1:0] & S & Store data. Valid when \textbf{dretire} is high\\
\hline
\textbf{ldata}[\textit{ldata\_width\_p}-1:0] & S & Load data. Valid when \textbf{lresp} is non-zero\\
\hline
\end{tabulary}
\end{table}
\begin{table}[htp]
\centering
\caption{Data access type (\textbf{dtype}) encoding}
\label{tab:dtype}
\begin{tabulary}{\textwidth}{|l|p{140mm}|}
\hline
\textbf{Value} & \textbf{Description} \\
\hline
0 & Load\\
1 & Store\\
2 & reserved\\
3 & reserved\\
4 & CSR read-write\\
5 & CSR read-set\\
6 & CSR read-clear\\
7 & reserved\\
8 & Atomic swap\\
9 & Atomic add\\
10 & Atomic AND\\
11 & Atomic OR\\
12 & Atomic XOR\\
13 & Atomic max\\
14 & Atomic min\\
15 & Conditional store failure\\
\hline
\end{tabulary}
\end{table}
All signals in M, U and O groups are only valid when \textbf{dretire} is high. Signals in the S group are valid as
indicated in table \ref{tab:data-ingress}.
For harts that can retire a maximum of M data accesses per cycle, the implemented signal
groups must be replicated M times. If fewer than M groups are required in a cycle, then lower numbered groups must
be used first. For example, if there is one data access, use only group 0.
The maximum value of \textit{dtype\_width\_p} is 4. However, if only loads and stores are supported,
\textit{dtype\_width\_p} can be 1. If CSRs are supported but atomics are not, \textit{dtype\_width\_p} can be 3.
Atomic and CSR accesses have either both load and store data, or store data and an operand. For CSRs and unified
atomics, both values are reported via \textbf{data}, with the store data in the LSBs and the load data or operand
in the MSBs.
\textit{lrid\_width\_p} is determined by the maximum number of loads that can be in progress simultaneously, such
that at any one time there can be no more than one load in progress with a given ID.
\textbf{iaddr\_lsbs} and \textbf{dblock} are provided to support filtering of which data accesses to trace
based on their instruction address. This is best illustrated by considering the following instruction sequence:
\begin{enumerate}
\item load
\item <some non data access instruction>
\item load
\item <some non data access instruction>
\item <some non data access instruction>
\end{enumerate}
Suppose the hart is capable of retiring up to 4 instructions in a cycle, via a single block. Instruction trace is
enabled throughout, but the requirement is to collect data trace for the 1st load (instruction 1), and filtering is
configured to match the address of this instruction only. However, information about instruction addresses is passed
to the encoder at the block level, and the block boundaries are invisible to the decoder. For instruction trace,
all instructions in a block are traced if any of the instructions in that block match the filtering criteria.
That is fine for instruction trace - the address of the 1st and last traced instruction are output explicitly.
There will be some fuzziness about precisely what those addresses will be depending on where the block boundaries
fall, but this is not a concern as everything is always self-consistent.
However, that is not the case for data trace. Consider two scenarios:
\begin{itemize}
\item Case 1: 1st block contains instructions 1, 2, 3; second block contains 4, 5
\item Case 2: 1st block contains instructions 1, 2; second block contains 3, 4, 5
\end{itemize}
Given that \textbf{iretire} is non-zero in the same cycle that the data access retires, the encoder knows the address of
the 1st and last instructions in a block, but does not know precisely where in the block the data access is.
In both cases, the first block matches the filtering criteria (it contains the address of instruction 1), and the second block does not.
But if the encoder traced all the data accesses in the matching block, then in case 1 it would trace both instructions 1 and 3, whereas
in the second case it would trace only instruction 1. The decoder has no visibility of the block boundaries so cannot account for this.
It is expecting only instruction 1 to be traced, and so may misinterpret instruction 3. If this code is in a loop for example, it will
assume that the 2nd traced load is in fact instruction 1 from the next loop iteration, rather than instruction 3 from this iteration.
Providing the LSBs of the data access instruction address allows the decoder to determine precisely whether the data access should be
traced or not, and removes the dependency on the block sizes and boundaries. The number of bits required is one more bit than the
number required to index within the block because blocks can start on any half-word boundary.
For harts that replicate the block signals to allow multiple blocks to retire per cycle it is also necessary to indicate which block
each data access is associated with, so the encoder knows which block address to combine with the LSBs in order to construct the
actual data access instruction address. 1 bit for 2 blocks per cycle, 2 bits for 4, and so on.