Fix test case for duplicate FSM optimization #2204

calebmkim · 2024-07-11T13:54:00Z

@parthsarkar17 The runt test for duplicate FSMs was passing in the incorrect parameter. What probably happened was that it was using an old pass input parameter name that was updated but then the runt command was not updated.

This is one of the reasons why we should have at least one snapshot test for the FSM optimizations, so things like this are easier to catch (not a big deal at all, but I think it's good practice from now on).

calebmkim · 2024-07-11T13:55:46Z

tests/passes/tdcc-duplicate/seq.expect

+      fsm0.write_en = fsm0.out == 2'd2 & C[done] ? 1'd1;
+      fsm1.in = fsm0.out == 2'd2 & C[done] ? 2'd3;
+      fsm1.write_en = fsm0.out == 2'd2 & C[done] ? 1'd1;
+      tdcc[done] = fsm0.out == 2'd3 ? 1'd1;


All of these internal transitions use fsm0. Is that intentional?

I think it might be worth doing the following: all fsm==0../n2 guards should read from fsm0 and all fsm==n/2....n guards should read from fsm1.
It seems like right now, we only do this for certain queries.

@parthsarkar17 lmk if you want to take a shot at this. I think I could also probably do this as well.

Nvm, I'm impatient so I just did it myself.

@parthsarkar17 please take a sec to review though to see if the changes look good to you.

Yes, that is intentional. The enables (e.g. B[go] = !B[done] & fsm{x}.out == 2'd2 ? 1'd1;) are the ones that are split evenly between the two registers. The transitions for each of the registers are based on the logic for fsm0. I thought that was fine because the second register can just update based off of the first register's logic. If we were to split this logic, then that might just add more complexity. Is that right?

On second thought, I can see how all transition logic coming from fsm0 may also contribute to excessive fanout from the first register. If we split those guards between the two registers, that might help. I originally thought we are simply using the transition values computed for fsm0 to update fsm1, which is true, but it might be even better to split that logic also.

So your idea is there are two choices:

register A and register B both read exclusively from register A

register A and register B both read from both registers A and B.

And your idea is that (1) will use more resources/have a worse slack than (2). That may be the case: it might be simpler for the synthesis tools to place everything if the dataflow graph is simpler.

Here's the reason I think it might actually be more profitable to do (2).
The number of queries (i.e., fsm == i) you have to do is actually fewer:
In case (1) fsm0 will probably have to perform N queries since it's in charge of all the transitions. fsm1 will probably have to perform N/2 queries (for the 1/2 of the enable queries). Which leads to 1.5 N queries total.
In case (2) fsm0 and fsm1 will both have to perform N/2 queries (since all queries are split), which leads to N queries total.

I'm not sure which is correct: it might be worth running some experiments to see. We can test this commit vs. a previous commit.

Yeah those are also the options as I see them. I think resources wise, they should be the same (because the logic for each state transition needs to come from one of the registers, doesn't matter from which, right?), but worst slack might be worse if we read exclusively from A.

And yeah, I totally didn't think about the unevenness that could come from all transition reads coming from A. The math sounds good to me!

'

parthsarkar17 · 2024-07-11T14:58:05Z

calyx-opt/src/passes/top_down_compile_control.rs

+    // num_states+1 is needed to prevent error (the done condition needs
+    // to check past the number of states, i.e., will check fsm == 3 when
+    // num_states == 3).
+    let reg_to_query: usize = (state * num_registers / (num_states + 1))


So I had a whole back and forth with myself about this +1, and I eventually got rid of it for the following reason. Please lmk if my reasoning is off:

When you use reg_to_query, you pass in fsm_rep.last_state, which is itself computed by sch.last_state(). This function looks through all the transitions, which are tuples of u64s, and returns the maximum u64 value along the second element of the tuple. As you pointed out in the comment, this includes the state above the index of the true last state, meaning sch.last_state() returns the number of states, and not the index of the last state. I think the way it currently is, with the + 1, only the final state will query from the second register because that's the way the integer division works out if the denominator num_states is too high. Lmk if that checks out.

The reason why I did +1 was only pragmatic:
If you have a single register, then you always want reg_to_query to return 0. The only way to do this for the last state (i.e., when state==num_state is if you do num_states +1).

The only thing i don't quite understand is this:

I think the way it currently is, with the + 1, only the final state will query from the second register because that's the way the integer division works out if the denominator num_states is too high.

Won't it query from the second register any time state > (num_states + 1)/2 which will be basically half of the states?

Oh sorry, I forgot to mention that the biggest thing is that state will never equal num_state from what I can tell. The transition guards are computed by mapping over each (s, e, g) element in sch.transitions with respect to s. This s is always < num_states.

As for your second point, if there are n states, then the final state is n-1. This means that, if state = n-2, then (state * num_registers / (num_states + 1)) is equal to ((n-2)* 2)/(n + 1), which, for n >=2 results in integer division of 0.

That was wrong, sorry. It doesn't matter one way or another.

…ng them

add test case

4762d90

calebmkim commented Jul 11, 2024

View reviewed changes

calebmkim added 9 commits July 11, 2024 10:10

generalize duplication

b8c78c3

fix bug

0325231

comment

5ca2811

fixed tests

a74bf2d

added comment

d51783e

weird clippy error I can't locally replicate

3df970f

small change

a02b7c5

does this fix it

63887b2

'

somethign

e33dae0

parthsarkar17 reviewed Jul 11, 2024

View reviewed changes

calebmkim and others added 3 commits July 11, 2024 11:00

add

9c3ddf9

defaulted transition queries to first register, instead of distributi…

94fd565

…ng them

updated snapshot test to reflect no dist. of transition queries

9a4f44c

parthsarkar17 merged commit 894e221 into main Jul 14, 2024
18 checks passed

parthsarkar17 deleted the duplicate-fixes branch July 14, 2024 16:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix test case for duplicate FSM optimization #2204

Fix test case for duplicate FSM optimization #2204

calebmkim commented Jul 11, 2024 •

edited

Loading

calebmkim Jul 11, 2024 •

edited

Loading

calebmkim Jul 11, 2024

parthsarkar17 Jul 11, 2024 •

edited

Loading

calebmkim Jul 11, 2024 •

edited

Loading

parthsarkar17 Jul 11, 2024

parthsarkar17 Jul 11, 2024 •

edited

Loading

calebmkim Jul 11, 2024 •

edited

Loading

parthsarkar17 Jul 11, 2024 •

edited

Loading

Fix test case for duplicate FSM optimization #2204

Fix test case for duplicate FSM optimization #2204

Conversation

calebmkim commented Jul 11, 2024 • edited Loading

calebmkim Jul 11, 2024 • edited Loading

Choose a reason for hiding this comment

calebmkim Jul 11, 2024

Choose a reason for hiding this comment

parthsarkar17 Jul 11, 2024 • edited Loading

Choose a reason for hiding this comment

calebmkim Jul 11, 2024 • edited Loading

Choose a reason for hiding this comment

parthsarkar17 Jul 11, 2024

Choose a reason for hiding this comment

parthsarkar17 Jul 11, 2024 • edited Loading

Choose a reason for hiding this comment

calebmkim Jul 11, 2024 • edited Loading

Choose a reason for hiding this comment

parthsarkar17 Jul 11, 2024 • edited Loading

Choose a reason for hiding this comment

calebmkim commented Jul 11, 2024 •

edited

Loading

calebmkim Jul 11, 2024 •

edited

Loading

parthsarkar17 Jul 11, 2024 •

edited

Loading

calebmkim Jul 11, 2024 •

edited

Loading

parthsarkar17 Jul 11, 2024 •

edited

Loading

calebmkim Jul 11, 2024 •

edited

Loading

parthsarkar17 Jul 11, 2024 •

edited

Loading