This repository has been archived by the owner on Aug 28, 2024. It is now read-only.

Introduction

For a Machine-level environment, extension Smcdeleg (‘Sm’ for Privileged architecture and Machine-level extension, ‘cdeleg’ for Counter Delegation) encompasses all added CSRs and all behavior modifications for a hart, over all privilege levels. For a Supervisor-level environment, extension Ssccfg (‘Ss’ for Privileged architecture and Supervisor-level extension, ‘ccfg’ for Counter Configuration) provides access to delegated counters, and to new supervisor-level state. These extensions depend on the Zicntr and/or Zihpm extensions, and on the Sscsrind extension.

Motivation

The Zicntr extension defines a set of fixed-event counters (cycle, time, instret), while the Zihpm extension defines programmable counters (hpmcounteri and hpmeventi). The mcounteren CSR provides a means to make select counter CSRs readable in supervisor (S) mode, while the scounteren CSR provides a means for S-mode to further expose those selected counter CSRs as readable in user (U) mode. Counters and event selector CSRs can only be written in machine (M) mode.
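The two-level gating described above can be sketched in C as a small software model. This is an illustration only, not text from the specification: the `priv_t` type and the `counter_readable` helper are hypothetical names introduced here, and the sketch assumes a hart that implements S-mode and uses the standard bit layout shared by mcounteren and scounteren (bit 0 for cycle, bit 1 for time, bit 2 for instret, bits 3 to 31 for hpmcounter3 to hpmcounter31).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Privilege modes, named for this sketch only (hypothetical type). */
typedef enum { PRIV_U, PRIV_S, PRIV_M } priv_t;

/* Bit positions shared by mcounteren and scounteren. */
enum { CY_BIT = 0, TM_BIT = 1, IR_BIT = 2 };

/*
 * Hypothetical helper: returns true if the counter selected by `bit`
 * is readable from `mode`, given the current mcounteren and scounteren
 * values. Assumes S-mode is implemented.
 */
static bool counter_readable(priv_t mode, unsigned bit,
                             uint32_t mcounteren, uint32_t scounteren)
{
    assert(bit < 32);
    bool m_ok = (mcounteren >> bit) & 1u; /* M-mode exposes it to S-mode */
    bool s_ok = (scounteren >> bit) & 1u; /* S-mode exposes it to U-mode */

    switch (mode) {
    case PRIV_M: return true;         /* M-mode can always read */
    case PRIV_S: return m_ok;         /* gated by mcounteren only */
    case PRIV_U: return m_ok && s_ok; /* gated by both registers */
    }
    return false;
}
```

For example, `counter_readable(PRIV_U, IR_BIT, 1u << IR_BIT, 0)` is false because scounteren does not expose instret, even though mcounteren does. The model covers reads only; it says nothing about writes, which is the gap this extension addresses.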

In modern “Rich OS” environments, hardware performance monitoring resources are managed by the kernel, kernel driver, and/or hypervisor. Counters may be configured with differing scopes, in some cases counting events system-wide, while in others counting events on behalf of a single virtual machine or application. In such environments, the latency of counter writes has a direct impact on overall profiling overhead as a result of frequent counter writes during:

  1. Sample collection, to clear overflow indication, and reload overflowed counter(s)

  2. Context switch, between processes, threads, containers, or virtual machines

This extension provides a means for M-mode to allow writing select counters and event selectors from S/HS-mode. The purpose is to avert transitions to and from M-mode that add latency to these performance-critical supervisor/hypervisor code sections. This extension also defines one new CSR, scountinhibit.

Note

Indirect versus direct access to counters and event selectors was discussed at length. While a direct access method (e.g., new shpmcounteri CSRs) has the potential to reduce latency for performance-sensitive operations such as context switch and counter overflow handling, by avoiding the need to write an index CSR for each counter access, in practice the benefits are difficult to reap. Because the CSR number is embedded in the immediate of CSR access instructions, functions that access individual counters by index must use a switch statement to jump to the instruction that accesses the chosen counter. Counters are typically accessed infrequently (say every sample, or every context switch), so this switch statement is likely to incur a branch misprediction, which undermines the performance benefit that avoiding an indirect access mechanism was meant to provide. A static routine that accesses all counters could be crafted without branches, but Linux perf accesses only the counters associated with active perf events.

With indirect access, branching can be avoided in all cases: the counter index is simply written to the index register, followed by a static flow that reads or writes the associated alias register. While strong ordering between the index write and the alias register access is required, it is believed that pipelining of CSR accesses can ensure that the costs associated with this ordering are less than the cost associated with the mispredictions that result from the direct method.
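The contrast between the two access styles can be sketched with a small C model. This is a software illustration under stated assumptions, not real CSR access: `struct hart_model`, `read_direct`, `write_index`, `read_alias`, `write_alias`, and `read_indirect` are all hypothetical names, the array stands in for the hardware counters, and the `select` field stands in for the index register. On real hardware each `case` in the direct version would be a separate CSR instruction with its own immediate, which is why the switch cannot be removed.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_COUNTERS 32

/* Hypothetical software model of one hart's counter state. */
struct hart_model {
    uint64_t counter[NUM_COUNTERS]; /* stands in for the hardware counters */
    uint64_t select;                /* stands in for the index register */
};

/*
 * Direct style: the CSR number is an instruction immediate, so software
 * must branch to the one instruction that names the chosen counter.
 * Only three cases are shown; a full version needs one per counter.
 */
static uint64_t read_direct(const struct hart_model *h, unsigned idx)
{
    switch (idx) {
    case 3: return h->counter[3]; /* would be a CSR read with immediate #3 */
    case 4: return h->counter[4]; /* would be a CSR read with immediate #4 */
    case 5: return h->counter[5]; /* would be a CSR read with immediate #5 */
    default: return 0;            /* remaining counters omitted for brevity */
    }
}

/* Indirect style, step 1: write the counter index to the index register. */
static void write_index(struct hart_model *h, uint64_t idx)
{
    assert(idx < NUM_COUNTERS);
    h->select = idx;
}

/* Indirect style, step 2: access the alias register, which the model
 * resolves to whichever counter the index register currently selects. */
static uint64_t read_alias(const struct hart_model *h)
{
    return h->counter[h->select];
}

static void write_alias(struct hart_model *h, uint64_t value)
{
    h->counter[h->select] = value;
}

/* The same two-step flow serves every counter, with no software branch. */
static uint64_t read_indirect(struct hart_model *h, unsigned idx)
{
    write_index(h, idx);
    return read_alias(h);
}
```

In this model, `read_indirect` always executes the same two steps regardless of `idx`, whereas `read_direct` takes a data-dependent branch. The model does not capture the ordering requirement between the index write and the alias access; in C it holds trivially through program order.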