[spatz_vrf] experimenting with normal VRF layout instead of barber pole
With a normal layout, the conflicts occurring at the VRF are structural. With the barber's pole layout, on the other hand,
conflicts may or may not occur, depending on the registers used by the instructions.
Since we add buffers on the FPU and VLSU1, we can afford a normal layout, and the conflicts happen only initially,
as can be seen for the dotp kernel.
Performance of kernels (4k / 32k):
1) axpy_4096 : 52.0 % / 56.0 %
2) dotp_4096 : 75.6 % / 96.1 %
3) fmatmul_64x64x64 : 97.8 % / 97.8 %
Navaneeth-KunhiPurayil committed Jan 17, 2025
1 parent 681c9e0 commit fc4a53c
Showing 2 changed files with 5 additions and 5 deletions.
2 changes: 1 addition & 1 deletion hw/ip/spatz/src/spatz_vrf.sv
@@ -53,7 +53,7 @@ module spatz_vrf
automatic logic [1:0] vreg8 = addr[$clog2(8*NrWordsPerVector) +: 2];

// Barber's pole. Advance the starting bank of each vector by one every eight vector registers.
- f_bank = addr[$clog2(NrVRFBanks)-1:0] + vreg8;
+ f_bank = addr[$clog2(NrVRFBanks)-1:0];
endfunction: f_bank

/////////////
8 changes: 4 additions & 4 deletions sw/spatzBenchmarks/dp-fdotp/main.c
@@ -94,17 +94,17 @@ int main() {
result[0] = acc;
}

// End timer and check if new best runtime
if (cid == 0)
timer = benchmark_get_cycle() - timer;

// Wait for all cores to finish
snrt_cluster_hw_barrier();

// End dump
if (cid == 0)
stop_kernel();

// End timer and check if new best runtime
if (cid == 0)
timer = benchmark_get_cycle() - timer;

// Check and display results
if (cid == 0) {
long unsigned int performance = 1000 * 2 * dotp_l.M / timer;