Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Part 1 A] Initial (unmodified) report different than expected #2

Open
mbforbes opened this issue May 23, 2017 · 0 comments
Open

[Part 1 A] Initial (unmodified) report different than expected #2

mbforbes opened this issue May 23, 2017 · 0 comments

Comments

@mbforbes
Copy link

After going into zynq/hls/mmult_float and running vivado_hls -f hls.tcl, the numbers in my report, without changing anything, are different from expected (what's given in the assignment). I ran multiple times. It does seem to be using the correct device (xc7z020clg484-1). Given we're doing optimizations, does this difference matter?

It looks (from my naive investigation) like the L3 inner loop has 11 instead of 10 iteration latency, which bumps up the overall latency by 10%.

Here's what I get:

Latency

expected:

+--------+--------+--------+--------+---------+
|     Latency     |     Interval    | Pipeline|
|   min  |   max  |   min  |   max  |   Type  |
+--------+--------+--------+--------+---------+
|  209851|  209851|  209852|  209852|   none  |
+--------+--------+--------+--------+---------+

mine (about 10% slower):

    +--------+--------+--------+--------+---------+
    |     Latency     |     Interval    | Pipeline|
    |   min  |   max  |   min  |   max  |   Type  |
    +--------+--------+--------+--------+---------+
    |  230331|  230331|  230332|  230332|   none  |
    +--------+--------+--------+--------+---------+

Utilization

expected:

+-----------------+---------+-------+--------+-------+
|       Name      | BRAM_18K| DSP48E|   FF   |  LUT  |
+-----------------+---------+-------+--------+-------+
|DSP              |        -|      -|       -|      -|
|Expression       |        -|      -|       0|    308|
|FIFO             |        -|      -|       -|      -|
|Instance         |        0|      5|     384|    751|
|Memory           |       16|      -|       0|      0|
|Multiplexer      |        -|      -|       -|    381|
|Register         |        -|      -|     714|      -|
+-----------------+---------+-------+--------+-------+
|Total            |       16|      5|    1098|   1440|
+-----------------+---------+-------+--------+-------+
|Available        |      280|    220|  106400|  53200|
+-----------------+---------+-------+--------+-------+
|Utilization (%)  |        5|      2|       1|      2|
+-----------------+---------+-------+--------+-------+

mine (FF/LUT higher):

+-----------------+---------+-------+--------+-------+
|       Name      | BRAM_18K| DSP48E|   FF   |  LUT  |
+-----------------+---------+-------+--------+-------+
|DSP              |        -|      -|       -|      -|
|Expression       |        -|      -|       0|    537|
|FIFO             |        -|      -|       -|      -|
|Instance         |        0|      5|     384|    751|
|Memory           |       16|      -|       0|      0|
|Multiplexer      |        -|      -|       -|    558|
|Register         |        -|      -|     779|      -|
+-----------------+---------+-------+--------+-------+
|Total            |       16|      5|    1163|   1846|
+-----------------+---------+-------+--------+-------+
|Available        |      280|    220|  106400|  53200|
+-----------------+---------+-------+--------+-------+
|Utilization (%)  |        5|      2|       1|      3|
+-----------------+---------+-------+--------+-------+

Loop performance

expected:

+--------------+--------+--------+----------+-----------+-----------+------+----------+
|              |     Latency     | Iteration|  Initiation Interval  | Trip |          |
|   Loop Name  |   min  |   max  |  Latency |  achieved |   target  | Count| Pipelined|
+--------------+--------+--------+----------+-----------+-----------+------+----------+
|- LOAD_OFF_1  |      10|      10|         2|          -|          -|     5|    no    |
|- LOAD_W_1    |    2580|    2580|       258|          -|          -|    10|    no    |
| + LOAD_W_2   |     256|     256|         2|          -|          -|   128|    no    |
|- LOAD_I_1    |    2064|    2064|       258|          -|          -|     8|    no    |
| + LOAD_I_2   |     256|     256|         2|          -|          -|   128|    no    |
|- L1          |  205056|  205056|     25632|          -|          -|     8|    no    |
| + L2         |   25630|   25630|      2563|          -|          -|    10|    no    |
|  ++ L3       |    2560|    2560|        10|          -|          -|   256|    no    |
|- STORE_O_1   |     136|     136|        17|          -|          -|     8|    no    |
| + STORE_O_2  |      15|      15|         3|          -|          -|     5|    no    |
+--------------+--------+--------+----------+-----------+-----------+------+----------+

mine (L1/L2/L3 are slower---might all be stemming from L3 having 11 instead of 10 iteration latency?):

        +--------------+--------+--------+----------+-----------+-----------+------+----------+
        |              |     Latency     | Iteration|  Initiation Interval  | Trip |          |
        |   Loop Name  |   min  |   max  |  Latency |  achieved |   target  | Count| Pipelined|
        +--------------+--------+--------+----------+-----------+-----------+------+----------+
        |- LOAD_OFF_1  |      10|      10|         2|          -|          -|     5|    no    |
        |- LOAD_W_1    |    2580|    2580|       258|          -|          -|    10|    no    |
        | + LOAD_W_2   |     256|     256|         2|          -|          -|   128|    no    |
        |- LOAD_I_1    |    2064|    2064|       258|          -|          -|     8|    no    |
        | + LOAD_I_2   |     256|     256|         2|          -|          -|   128|    no    |
        |- L1          |  225536|  225536|     28192|          -|          -|     8|    no    |
        | + L2         |   28190|   28190|      2819|          -|          -|    10|    no    |
        |  ++ L3       |    2816|    2816|        11|          -|          -|   256|    no    |
        |- STORE_O_1   |     136|     136|        17|          -|          -|     8|    no    |
        | + STORE_O_2  |      15|      15|         3|          -|          -|     5|    no    |
        +--------------+--------+--------+----------+-----------+-----------+------+----------+
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant