-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix test case for duplicate FSM optimization #2204
Conversation
fsm0.write_en = fsm0.out == 2'd2 & C[done] ? 1'd1; | ||
fsm1.in = fsm0.out == 2'd2 & C[done] ? 2'd3; | ||
fsm1.write_en = fsm0.out == 2'd2 & C[done] ? 1'd1; | ||
tdcc[done] = fsm0.out == 2'd3 ? 1'd1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these internal transitions use fsm0
. Is that intentional?
I think it might be worth doing the following: all fsm==0../n2
guards should read from fsm0
and all fsm==n/2....n
guards should read from fsm1
.
It seems like right now, we only do this for certain queries.
@parthsarkar17 lmk if you want to take a shot at this. I think I could also probably do this as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nvm, I'm impatient so I just did it myself.
@parthsarkar17 please take a sec to review though to see if the changes look good to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is intentional. The enables (e.g. B[go] = !B[done] & fsm{x}.out == 2'd2 ? 1'd1;
) are the ones that are split evenly between the two registers. The transitions for each of the registers are based on the logic for fsm0
. I thought that was fine because the second register can just update based off of the first register's logic. If we were to split this logic, then that might just add more complexity. Is that right?
On second thought, I can see how all transition logic coming from fsm0
may also contribute to excessive fanout from the first register. If we split those guards between the two registers, that might help. I originally thought we are simply using the transition values computed for fsm0
to update fsm1
, which is true, but it might be even better to split that logic also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So your idea is there are two choices:
- register A and register B both read exclusively from register A
- register A and register B both read from both registers A and B.
And your idea is that (1) will use more resources/have a worse slack than (2). That may be the case: it might be simpler for the synthesis tools to place everything if the dataflow graph is simpler.
Here's the reason I think it might actually be more profitable to do (2).
The number of queries (i.e., fsm == i
) you have to do is actually fewer:
In case (1) fsm0
will probably have to perform N queries since it's in charge of all the transitions. fsm1
will probably have to perform N/2 queries (for the 1/2 of the enable queries). Which leads to 1.5 N queries total.
In case (2) fsm0
and fsm1
will both have to perform N/2
queries (since all queries are split), which leads to N queries total.
I'm not sure which is correct: it might be worth running some experiments to see. We can test this commit vs. a previous commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah those are also the options as I see them. I think resources wise, they should be the same (because the logic for each state transition needs to come from one of the registers, doesn't matter from which, right?), but worst slack might be worse if we read exclusively from A.
And yeah, I totally didn't think about the unevenness that could come from all transition reads coming from A. The math sounds good to me!
// num_states+1 is needed to prevent error (the done condition needs | ||
// to check past the number of states, i.e., will check fsm == 3 when | ||
// num_states == 3). | ||
let reg_to_query: usize = (state * num_registers / (num_states + 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I had a whole back and forth with myself about this +1
, and I eventually got rid of it for the following reason. Please lmk if my reasoning is off:
When you use reg_to_query
, you pass in fsm_rep.last_state
, which is itself computed by sch.last_state()
. This function looks through all the transitions, which are tuples of u64
s, and returns the maximum u64
value along the second element of the tuple. As you pointed out in the comment, this includes the state above the index of the true last state, meaning sch.last_state()
returns the number of states, and not the index of the last state. I think the way it currently is, with the + 1
, only the final state will query from the second register because that's the way the integer division works out if the denominator num_states
is too high. Lmk if that checks out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why I did +1
was only pragmatic:
If you have a single register, then you always want reg_to_query
to return 0. The only way to do this for the last state (i.e., when state==num_state
is if you do num_states +1
).
The only thing i don't quite understand is this:
I think the way it currently is, with the + 1, only the final state will query from the second register because that's the way the integer division works out if the denominator num_states is too high.
Won't it query from the second register any time state > (num_states + 1)/2
which will be basically half of the states?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry, I forgot to mention that the biggest thing is that state
will never equal num_state
from what I can tell. The transition guards are computed by mapping over each (s, e, g)
element in sch.transitions
with respect to s
. This s
is always < num_states
.
As for your second point, if there are n
states, then the final state is n-1
. This means that, if state = n-2
, then (state * num_registers / (num_states + 1))
is equal to ((n-2)* 2)/(n + 1)
, which, for n >=2
results in integer division of 0
.
That was wrong, sorry. It doesn't matter one way or another.
@parthsarkar17 The runt test for duplicate FSMs was passing in the incorrect parameter. What probably happened was that it was using an old pass input parameter name that was updated but then the runt command was not updated.
This is one of the reasons why we should have at least one snapshot test for the FSM optimizations, so things like this are easier to catch (not a big deal at all, but I think it's good practice from now on).