-
Notifications
You must be signed in to change notification settings - Fork 140
IPC sensitivity to cache line size
This page is maintained by Alexandr Misevich
A cache is a hardware or software component that stores data so future requests for that data can be served faster. Data is transferred between memory and cache in blocks of fixed size, called cache lines or cache blocks. A cache hit occurs when the requested line can be found in a cache, while a cache miss occurs when it cannot. In the second case, it is needed to fetch necessary data from the main memory, which generates latency, and store it in the cache. In addition, it may be needed to evict another line from the cache to free space for the new one.
Increase of cache line size can have different effect on the performance. Firstly, larger cache line takes longer time to perform a cache line fill. Moreover, it takes longer time to evict lines from cache. Thus, larger line size means larger miss penalty.
Secondly, if there are too big blocks, we may fetch unused data, while possibly evicting useful cache lines. As the result, miss rate goes up. However, the situation can be quite the opposite. If a program works with consecutive pieces of data, large cache blocks will decrease memory access latency. In other words, larger line size takes advantage of spatial locality.
This page is devoted to the study of the L1 Instruction Cache. Instruction cache is controlled by 5 values, which are configurable with Config objects:
- cache size in bytes;
- cache associativity;
- size of cache line;
- cache miss penalty latency;
Class CacheTagArray
(to get more information read Cache-model ) and subsidiary functions are used to reproduce the work of a real cache. On cache miss, bubble occurs in pipeline and special long-latency loop port models memory access. Special thread clock_instr_cache(Cycle cycle)
listens to that port and fill cache with actual data.
The main target of the research is to determine the correlation between the IPC(instructions per cycle) and the cache parameters. Performance studies are made on instruction stress trace. Sensitivity to cache line size is demonstrated with dc_ic_stress.s. Since dc_ic_stress.s
does not contain loops, there is not much sensitivity to cache size and/or number of ways. So, the only effective parameter is cache line size which corresponds to throughput of our fictitious memory->cache interface. Next table proves what was said before:
Cache size, bytes | IPC |
---|---|
512 | 0.41 |
1K | 0.41 |
2K | 0.41 |
4K | 0.41 |
Constant parameters of the simulation:
- instruction cache size in bytes = 2048
- cache associativity = 4
- number of instructions = 400.000
- cache miss penalty = 30 cycles
The following results were obtained during the study:
Cache line size, bytes | IPC |
---|---|
4 | 0.042 |
8 | 0.080 |
16 | 0.149 |
32 | 0.258 |
64 | 0.410 |
128 | 0.580 |
256 | 0.732 |
512 | 0.842 |
Result can be achieved analytically as well. In case of ideal cache, each instruction is fetched immediately and CPI == 1
. If cache is not ideal, then each Nth instruction is a cache miss, where N is amount of instructions in a cache line. Since each instruction is 4 bytes long, N = cache_line_size / 4
. In case of cache miss, CPI is equal to miss penalty plus 1 cycle. Averaging CPI, we have the following formula:
CPI = (4 / cache_line_size) * (miss_penalty + 1) + (1 - 4 / cache_line_size) * 1 =
= 1 + 4 * miss_penalty / cache_line_size
IPC = 1 / (1 + 4 * miss_penalty / cache_line_size))
If miss penalty is around cache line size, function is linear to cache line size logarithm N
, as one may see expanding the formula with Tailor series:
IPC = 1 / (1 + 4 * miss_penalty / 2^N) =
= 1 / (1 + exp(ln(4 * miss_penalty) - N * ln(2)) ≈
≈ 0.5 + (N * ln(2) - ln(4 * miss_penalty)) / 4 =
= N * ln(2) / 4 + const
Linear dependency to cache line exponent is clearly visible on a chart:
MIPT-V / MIPT-MIPS — Cycle-accurate pre-silicon simulation.