Jiahan Lab Notebook #1731

jiahanxie353 · 2023-09-25T21:28:04Z

jiahanxie353
Sep 25, 2023
Collaborator

Hi @sampsyo @rachitnigam , just created my lab notebook and I will start the discussion here!

Goal

My current goal is to fix the issue, i.e. add support for scf.if during lowering by extending the lowering of yieldOp.

Progress

Reproduced the error mentioned in the issue.
Used gdb to trace the call stack, so I now understand the whole process by which the MLIR source file gets lowered to Calyx:
- runOnOperation runs first; it applies the lowering patterns to each operation in sequence, including scf.yield in our case;
- runPartialPattern is then invoked, which applies the patterns and folds greedily over the test code according to the configuration;
- the actual rewriting is executed here and here;
- the partial lowering function's work then happens here, which is a hook that lets the rewriter lower different ops during the pass;
- specifically, funcOp (the test code) walks its operations using a switch over operation types. Since we have an scf::YieldOp inside scf.if, buildOp is used, and the overload for yieldOp gets called here. However, that overload assumes the yieldOp is inside either a for loop or a while loop. It can also be inside an ifOp, and the current code does not take that into account.

Blocker

1. To actually fix it, I'm wondering whether something like the following would be a good starting point:
- add a branch for IfOp, just like the existing if statements for ForOp and WhileOp;
- and dynamically cast the parent to IfOp, something like auto ifOp = dyn_cast<scf::IfOp>(yieldOp->getParentOp());

2. Not a blocker, but a general question: why do we keep saying "partial lowering"? Is it "partial" because there are many HLS passes and MLIR.SCF -> Calyx is just one part of them?
3. Not a blocker, but something I don't understand. What does

the pattern driver will re-enqueue the op again

mean here? What's the benefit of using that queue?
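A minimal sketch of the first proposal above, written as a toy Python model rather than real CIRCT code. The Op class, the lower_yield function, and the returned strings are all hypothetical; the only point is that the yield handler dispatches on its parent operation and that scf.if becomes a third case next to the two existing ones.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Op:
    """Toy stand-in for an MLIR operation: a name plus an optional parent."""
    name: str
    parent: Optional["Op"] = None


def lower_yield(yield_op: Op) -> str:
    """Choose a lowering branch for scf.yield based on its parent op."""
    parent = yield_op.parent
    if parent is None:
        raise ValueError("scf.yield must be nested inside another op")
    if parent.name == "scf.for":
        return "existing scf.for branch"
    if parent.name == "scf.while":
        return "existing scf.while branch"
    if parent.name == "scf.if":
        # Hypothetical new case: the one the proposal wants to add.
        return "new scf.if branch"
    raise NotImplementedError(f"scf.yield inside {parent.name} is not handled")


print(lower_yield(Op("scf.yield", parent=Op("scf.if"))))  # new scf.if branch
```

In the real pass the dispatch would be the dyn_cast on yieldOp->getParentOp() mentioned above; the toy only mirrors its shape.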

sampsyo · 2023-09-28T00:59:47Z

sampsyo
Sep 28, 2023
Maintainer

Awesome! Thank you for getting this started!

1. I unfortunately can't pretend to be enough of an expert on SCFToCalyx to have a direct answer to your proposal, but it sounds plausible enough to me. I would recommend forging right ahead with that plan and seeing how well it works out! Having even a draft PR to demonstrate the general idea you're pursuing could be a great way to start engaging with the CIRCT community; they might be able to provide some feedback too.
2. AFAICT, "partial lowering" is a term of art in the MLIR world (see https://github.com/bollu/mlir/blob/master/orig_docs/Tutorials/Toy/Ch-5.md#partial-lowering). It seems to mean that we aren't fully eliminating the old dialect's constructs and replacing them with constructs from the new (lower) dialect. Instead, "partial lowering" eliminates only some of the higher-level dialect's constructs, producing a program with a mix of dialects. I'm not 100% sure how that applies in this case, but perhaps some scf operations remain after this pass?
3. It might be helpful to read a bit more about MLIR's pattern rewriting engine. There seem to be complicated rules about when the rewriter decides to "try again" on previously rewritten code, which is important because doing this willy-nilly could lead to nontermination or inefficiency.
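To make the last two points concrete, here is a toy worklist-driven rewriter in Python. It is only a sketch under stated assumptions: the PATTERNS table, the toy.sugar op, and greedy_rewrite are invented for illustration and are not MLIR's API, and the real driver has more elaborate rules. It shows why ops produced by a rewrite go back on the queue (they may match another pattern), why a step limit guards against nontermination, and how ops without a pattern survive, leaving a mix of dialects.

```python
from collections import deque

# Hypothetical one-step rewrite rules: op name -> replacement op names.
PATTERNS = {
    "scf.if": ["calyx.if"],
    "scf.yield": ["calyx.assign", "calyx.group_done"],
    "toy.sugar": ["scf.yield"],  # produces an op that is itself rewritable
}


def greedy_rewrite(ops, patterns, max_steps=100):
    """Rewrite ops until no pattern applies, re-enqueueing newly created ops."""
    worklist = deque(ops)
    result = []
    steps = 0
    while worklist:
        steps += 1
        if steps > max_steps:
            raise RuntimeError("rewriting did not converge")
        op = worklist.popleft()
        if op in patterns:
            # Put the replacements at the front of the queue, in order,
            # so they get another chance to match a pattern.
            worklist.extendleft(reversed(patterns[op]))
        else:
            # No pattern applies: the op is kept as it is.
            result.append(op)
    return result


print(greedy_rewrite(["arith.cmpi", "scf.if", "toy.sugar", "func.return"], PATTERNS))
# ['arith.cmpi', 'calyx.if', 'calyx.assign', 'calyx.group_done', 'func.return']
```

The arith and func ops are left alone because no pattern covers them, which is the "partial" outcome described above. A pair of rules that rewrite into each other would hit the step limit instead of terminating.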

0 replies

jiahanxie353 · 2023-10-22T17:51:13Z

jiahanxie353
Oct 22, 2023
Collaborator Author

Goal

Concretize the implementation of lowering scf.if and scf.yield operations, and eventually solve the issue above.

Progress

I have opened a PR to address the aforementioned issue. Discussions are ongoing with members of the CIRCT community, and I have had several productive one-on-one discussions with Chris, who has been exceptionally helpful.
To solve that issue and deepen my understanding of MLIR, CIRCT, and Calyx, I have taken the following steps:

Document study: I have spent time on the MLIR documentation, focusing on the IR lowering and pattern rewriting flows, and I have also been studying the Calyx documentation itself.
Debugging approach: Rather than relying solely on gdb, which can be cumbersome because backtraces pass through many intermediate function calls, I have found helper print functions and additional MLIR debug flags useful. This approach has made the lowering and rewriting flows much easier to follow.
Building on that, I decided to take a step back and first propose a desired lowered IR for the example I'm working on. If the proposed IR makes sense, I will move on to the details. I had found myself stuck in the intricacies without a firm grasp of the big picture, which made it easy to go down the wrong track and led to unproductive outcomes.
To recap, the example I have been working on is:

func.func @main(%arg0 : i32, %arg1 : i32) -> i32 {
  %0 = arith.cmpi slt, %arg0, %arg1 : i32
  %1 = scf.if %0 -> i32 {
    %3 = arith.addi %arg0, %arg1 : i32
    scf.yield %3 : i32
  } else {
    scf.yield %arg1 : i32
  }
  return %1 : i32
}

And my proposed lowered IR is:

module attributes {calyx.entrypoint = "main"} {
  calyx.component @main(%in0: i32, %in1: i32, %clk: i1 {clk}, %reset: i1 {reset}, %go: i1 {go}) -> (%out0: i32, %done: i1 {done}) {
    // Constant used to drive the write_en ports below.
    %true = hw.constant true
    %std_slt_0.left, %std_slt_0.right, %std_slt_0.out = calyx.std_slt @std_slt_0 : i32, i32, i1
    %std_add_0.left, %std_add_0.right, %std_add_0.out = calyx.std_add @std_add_0 : i32, i32, i32
    %yield_reg.in, %yield_reg.write_en, %yield_reg.clk, %yield_reg.reset, %yield_reg.out, %yield_reg.done = calyx.register @yield_reg : i32, i1, i1, i1, i32, i1
    %ret_reg.in, %ret_reg.write_en, %ret_reg.clk, %ret_reg.reset, %ret_reg.out, %ret_reg.done = calyx.register @ret_reg : i32, i1, i1, i1, i32, i1
    calyx.wires {
      calyx.assign %out0 = %ret_reg.out : i32
      // Computes the scf.if condition.
      calyx.comb_group @bb0_0 {
        calyx.assign %std_slt_0.left = %in0 : i32
        calyx.assign %std_slt_0.right = %in1 : i32
      }
      // Computes the sum yielded by the then branch.
      calyx.comb_group @bb0_1 {
        calyx.assign %std_add_0.left = %in0 : i32
        calyx.assign %std_add_0.right = %in1 : i32
      }
      calyx.group @assign_then_block {
        calyx.assign %yield_reg.in = %std_add_0.out : i32
        calyx.assign %yield_reg.write_en = %true : i1
        calyx.group_done %yield_reg.done : i1
      }
      calyx.group @assign_else_block {
        calyx.assign %yield_reg.in = %in1 : i32
        calyx.assign %yield_reg.write_en = %true : i1
        calyx.group_done %yield_reg.done : i1
      }
      calyx.group @ret_assign {
        calyx.assign %ret_reg.in = %yield_reg.out : i32
        calyx.assign %ret_reg.write_en = %true : i1
        calyx.group_done %ret_reg.done : i1
      }
    }
    calyx.control {
      calyx.seq {
        // @bb0_0 drives std_slt_0, whose output is the condition.
        calyx.if %std_slt_0.out with @bb0_0 {
          calyx.seq {
            calyx.enable @assign_then_block
          }
        } else {
          calyx.enable @assign_else_block
        }
      }
    }
  } {toplevel}
}

(I have also proposed this in the PR.)
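As a sanity check on the proposal, the source function and the proposed control program can both be read as ordinary software and compared. The sketch below does that in Python. The function names are made up for illustration, inputs are assumed to already be in signed 32-bit range, and the second function assumes @ret_assign runs after the calyx.if, which the proposed control above does not spell out.

```python
def to_i32(x: int) -> int:
    """Wrap a Python int to a signed 32-bit value, as i32 addition would."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def main_reference(arg0: int, arg1: int) -> int:
    """Direct reading of the scf.if source function."""
    if arg0 < arg1:                # arith.cmpi slt
        return to_i32(arg0 + arg1) # then branch: scf.yield %3
    return arg1                    # else branch: scf.yield %arg1


def main_proposed_control(in0: int, in1: int) -> int:
    """Software reading of the proposed Calyx groups and control."""
    std_slt_out = in0 < in1                # comparison feeding calyx.if
    if std_slt_out:
        yield_reg = to_i32(in0 + in1)      # @assign_then_block
    else:
        yield_reg = in1                    # @assign_else_block
    ret_reg = yield_reg                    # @ret_assign (assumed to run last)
    return ret_reg


for a, b in [(1, 2), (5, 2), (3, 3), (1, 2**31 - 1)]:
    assert main_reference(a, b) == main_proposed_control(a, b)
```

The two agree on these inputs, which says the proposal computes the intended value. It says nothing about whether a register is the right hardware choice for the yielded value.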

Blocker

For my proposal, a few things I'm not confident about are:

What should we do with the condition operand of scf.if? Should we instantiate a Calyx component for it, or should we just wire/assign it?
I'm not confident about the design choice of creating a yield_reg register for the result of yield, because I don't think we are storing state; it is pretty much combinational, except that there are two possible combinational results depending on whether the then branch or the else branch executes.
I'm also not confident about the Calyx control part of the output. For guidance, I printed the final IR after lowering the following function, which contains an scf.while:

module {
  func.func @counter(%start : i32, %end : i32) -> i32 {
    %c1 = arith.constant 1 : i32
    // The 'while' loop.
    %result = scf.while (%current = %start) : (i32) -> i32 {
      // Before region: Computes the loop condition.
      %cond = arith.cmpi slt, %current, %end : i32
      scf.condition(%cond) %current : i32
    } do {
      ^bb0(%loop_var: i32):
        // Loop body: Increment the counter by 1.
        %next = arith.addi %loop_var, %c1 : i32
        scf.yield %next : i32
    }
    return %result : i32
  }
}

And the corresponding Calyx control is:

    calyx.control {                                  // line 1
      calyx.seq {                                    // line 2
        calyx.seq {                                  // line 3
          calyx.par {                                // line 4
            calyx.enable @assign_while_0_init_0      // line 5
          }                                          // line 6
          calyx.while %std_slt_0.out with @bb0_0 {   // line 7
            calyx.seq {                              // line 8
              calyx.enable @assign_while_0_latch     // line 9
            }                                        // line 10
          }                                          // line 11
          calyx.enable @ret_assign_0                 // line 12
        }
      }
    }
  } {toplevel}

Area of confusion: I don't quite understand the rule of thumb for placing calyx.seq, calyx.par, and calyx.enable. I understand that calyx.seq is for sequential work and calyx.par is for tasks that run in parallel.
However, I'm puzzled about how we decide where to put calyx.seq. For example, why are there two consecutive calyx.seq on line 2 and line 3, especially when line 2's calyx.seq has no siblings at its level (whereas line 4's calyx.par and line 7's calyx.while are siblings nested inside line 3's calyx.seq)? Why can't we just delete line 2's calyx.seq as it seems like an extra wrap?
I also don't understand why line 5's calyx.enable is placed inside a calyx.par that contains only one thing. It's counter-intuitive to me, since parallel composition only matters when there is more than one task.
Also, for group enable, could you explain the idea behind it? The documentation says it's used for "naming a group in a control statement", so can I understand it as a kind of function call on Calyx groups (i.e. calling a group we declared above from the control program)? The documentation also says "a group enable, executes the group to completion". Does that mean to the completion of the whole program?

Design consideration/philosophy behind Calyx constructs

I'm trying to grasp the design considerations behind the various Calyx constructs, such as component, cell, wire, and group.
Firstly, in MLIR, "a pass is always rooted with an operation" (reference), and inside an Operation we can have Regions, each of which holds a list of Blocks, in which Operations can be nested again. We can therefore view MLIR's structure as both hierarchical and recursive.
Secondly, in hardware design, we often separate a design into the data path (or structure) and the control unit.
Since Calyx is an IR for representing hardware accelerators, I assume it borrows ideas from both domains (IRs and programming languages on one side, hardware/RTL on the other).
I'd like to share some observations and assumptions I've made:

Data path analogy: cells and wires are like the data path in RTL.
Control unit analogy: I think of control as similar to RTL's control unit, and a component connects the cells and wires (the data path) with the control (the control unit). Please correct me if I'm wrong.
The role of group? But I'm not sure about what's special about group, I think of it as nothing special but a group of assignments. Is there a nested-structure consideration behind it, as in MLIR's Operation->Region->Block->Operation?
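The recursive Operation -> Region -> Block -> Operation nesting described above can be sketched with a few Python dataclasses. This is a toy model, not MLIR's API: the class fields and the walk helper are assumptions made for illustration, and the tree built at the end mirrors the scf.if example from earlier in this post.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Operation:
    name: str
    regions: List["Region"] = field(default_factory=list)


@dataclass
class Block:
    operations: List[Operation] = field(default_factory=list)


@dataclass
class Region:
    blocks: List[Block] = field(default_factory=list)


def walk(op: Operation, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, name) for op and every op nested in it, in pre-order."""
    yield depth, op.name
    for region in op.regions:
        for block in region.blocks:
            for inner in block.operations:
                yield from walk(inner, depth + 1)


# scf.if has two regions: the then branch and the else branch.
then_region = Region([Block([Operation("arith.addi"), Operation("scf.yield")])])
else_region = Region([Block([Operation("scf.yield")])])
func = Operation("func.func", [Region([Block([
    Operation("arith.cmpi"),
    Operation("scf.if", [then_region, else_region]),
    Operation("func.return"),
])])])

for depth, name in walk(func):
    print("  " * depth + name)
```

The walk visits func.func first, then the three ops in its body, descending into both regions of scf.if before reaching func.return.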

For SCF to Calyx lowering pass

I'm wondering whether the MLIR lowering pass already eliminates dead code. I found something interesting when I have:

module {
  func.func @counter(%start : i32, %end : i32) -> i32 {
    %c1 = arith.constant 1 : i32
    // The 'while' loop.
    %result = scf.while (%current = %start) : (i32) -> i32 {
      // Before region: Computes the loop condition.
      %cond = arith.cmpi slt, %current, %end : i32
      scf.condition(%cond) %current : i32
    } do {
      ^bb0(%loop_var: i32):
        // Loop body: Increment the counter by 1.
        %next = arith.addi %loop_var, %c1 : i32
        %another_next = arith.addi %next, %c1 : i32  // Note: this extra value is never used.
        scf.yield %next : i32
    }
    return %result : i32
  }
}

The latch group in the resulting IR is no different from the one we get without the %another_next assignment:

calyx.group @assign_while_0_latch {
        calyx.assign %while_0_arg0_reg.in = %std_add_0.out : i32
        calyx.assign %while_0_arg0_reg.write_en = %true : i1
        calyx.assign %std_add_0.left = %while_0_arg0_reg.out : i32
        calyx.assign %std_add_0.right = %c1_i32 : i32
        calyx.group_done %while_0_arg0_reg.done : i1
}

So that line is effectively eliminated, since %another_next is never used anywhere.
But if I have:

do {
      ^bb0(%loop_var: i32):
        // Loop body: Increment the counter by 1.
        %next = arith.addi %loop_var, %c1 : i32
        %another_next = arith.addi %next, %c1 : i32
        scf.yield %another_next : i32
}

the resulting IR becomes:

calyx.group @assign_while_0_latch {
        calyx.assign %while_0_arg0_reg.in = %std_add_1.out : i32
        calyx.assign %while_0_arg0_reg.write_en = %true : i1
        calyx.assign %std_add_1.left = %std_add_0.out : i32
        calyx.assign %std_add_0.left = %while_0_arg0_reg.out : i32
        calyx.assign %std_add_0.right = %c1_i32 : i32
        calyx.assign %std_add_1.right = %c1_i32 : i32
        calyx.group_done %while_0_arg0_reg.done : i1
 }

which takes that assignment into consideration.

Also, I want to apologize for this very late progress update. I was exploring different things and was also blocked by other work over the past weeks. I will keep my lab notebook up to date from now on :)
Thank you!

2 replies

rachitnigam Oct 23, 2023
Maintainer

I'll read this later but also tagging @cgyurgyik on this since he asked a couple of relevant questions in the CIRCT repo.

rachitnigam Oct 25, 2023
Maintainer

Okay, finally had a chance to read this @jiahanxie353; thanks for the detailed write-up! Some answers:

Why can't we just delete line 2's calyx.seq as it seems like an extra wrap?

You definitely can! In fact, the Calyx compiler has a pass called collapse-control that transforms seq { seq { ... } } into seq { ... }. In general, the extra seq nests come from compilers that are just conservatively adding them when compiling things that could generate a sequence of statements.
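A rough Python sketch of the seq { seq { ... } } to seq { ... } rewrite described above. It is not the actual collapse-control implementation: the tuple encoding of control programs and the collapse_seq name are assumptions for illustration, and it only flattens a seq nested directly inside a seq.

```python
def collapse_seq(node):
    """Flatten seq nodes that sit directly inside another seq.

    Control is encoded as ("enable", name), ("seq", [children]),
    ("par", [children]) or ("while", cond, body).
    """
    kind = node[0]
    if kind == "enable":
        return node
    if kind == "while":
        _, cond, body = node
        return ("while", cond, collapse_seq(body))
    children = []
    for child in node[1]:
        child = collapse_seq(child)
        if kind == "seq" and child[0] == "seq":
            # Splice the inner seq's children into the outer seq.
            children.extend(child[1])
        else:
            children.append(child)
    return (kind, children)


# The control program from the scf.while example (lines 2 to 12).
control = ("seq", [("seq", [
    ("par", [("enable", "assign_while_0_init_0")]),
    ("while", "std_slt_0.out", ("seq", [("enable", "assign_while_0_latch")])),
    ("enable", "ret_assign_0"),
])])

print(collapse_seq(control))
```

On this input the outer seq from line 2 disappears and the par, while, and enable become children of a single seq. The one-element par and the seq inside the while body are left alone, since this sketch only handles the nested-seq case.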

Also, for group enable, could you explain the idea behind it? The documentation says it's used for "naming a group in a control statement", so can I understand it as a kind of function call on Calyx groups (i.e. calling a group we declared above from the control program)? The documentation also says "a group enable, executes the group to completion". Does that mean to the completion of the whole program?

A group enable runs a group's assignments "to completion". Each group defines a bunch of assignments and a done condition. Running to completion means running to the point when the done condition's 1-bit signal becomes 1. For more information, read the Calyx paper.
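To illustrate "to completion", here is a toy cycle-level model in Python. It is a sketch only: ToyUnit, enable, and run_seq are hypothetical names, the latencies are made up, and the timing is not meant to match Calyx's real semantics. Each group stays active until its own done signal becomes 1, and only then does a seq move on to the next group.

```python
class ToyUnit:
    """Hypothetical cell whose done signal rises after `latency` active cycles."""

    def __init__(self, latency: int):
        self.latency = latency
        self.elapsed = 0

    @property
    def done(self) -> bool:
        return self.elapsed >= self.latency

    def tick(self) -> None:
        self.elapsed += 1


def enable(group_name: str, unit: ToyUnit, trace: list) -> int:
    """Keep one group active until its done signal is 1; return cycles spent."""
    cycles = 0
    while not unit.done:
        unit.tick()
        cycles += 1
        trace.append(group_name)  # record which group owned this cycle
    return cycles


def run_seq(groups) -> list:
    """Run groups one after another, like a seq of group enables."""
    trace = []
    for name, unit in groups:
        enable(name, unit, trace)
    return trace


print(run_seq([("write_reg", ToyUnit(1)), ("multiply", ToyUnit(3))]))
# ['write_reg', 'multiply', 'multiply', 'multiply']
```

So "completion" is per group: write_reg finishes after one cycle and hands control to multiply, which runs for three more. The program as a whole finishes only when the control program does.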

The role of group? But I'm not sure about what's special about group, I think of it as nothing special but a group of assignments.

The power of a group is to define an isolated computation without needing to worry about the rest of the hardware structure. Again, the Calyx paper highlights this well, but the idea is that you can define a group that "increments a value in register r" without having to worry about any other circuitry that also does something to r. This compositional nature of groups allows us to reason about compiler optimizations much more easily than with normal hardware.

I'm wondering whether the MLIR lowering pass already eliminates dead code.

The -canonicalize option to MLIR passes might do this sometimes. I would try making the code use %another_next and see if %next gets removed instead.

jiahanxie353 · 2023-10-27T04:39:19Z

jiahanxie353
Oct 27, 2023
Collaborator Author

Thanks @rachitnigam for your detailed explanation :)

In fact, the Calyx compiler has a pass called collapse-control that transforms seq { seq { ... } } into seq { ... }. In general, the extra seq nests come from compilers that are just conservatively adding them when compiling things that could generate a sequence of statements.

Does that mean we don't need to worry about it while lowering MLIR to Calyx, and can instead let Calyx itself handle it after the MLIR-to-Calyx step?

A group enable runs a group's assignments "to completion". Each group defines a bunch of assignments and a done condition. Running to completion means running to the point when the done condition's 1-bit signal becomes 1.

The power of a group is to define an isolated computation without needing to worry about the rest of the hardware structure.

These make sense! Thanks for the explanation! Are the groups used temporarily for grouping things so that we can (1) isolate computation and (2) identify code-simplification opportunities? I ask because when I was reading the paper, Figure 2d shows RemoveGroups, and it seems like groups are removed after the CompileControl pass?

I would try making the code use %another_next and see if %next gets removed instead.

In this situation, %another_next is used, or rather, the value of %another_next is passed to the input port of the corresponding return register. After pondering, I feel like it's not some optimization tools that are doing this, but rather the power of SSA and the way we design it. To be specific, we care about values rather than symbolic names, and we care about the values exposed at each component's interface (its input/output ports). We assign values to the corresponding ports, and a value that is never assigned to anything or connected to any input/output interface naturally won't appear in the hardware. I hope this line of thinking makes sense.

1 reply

rachitnigam Oct 27, 2023
Maintainer

Are the groups used temporarily for grouping things so that we can (1) isolate computation and (2) identify code-simplification opportunities?

Yup, exactly! Groups are like LLVM's basic blocks: they let us perform optimizations but must ultimately be transformed into something that looks like "normal hardware". This is achieved by RemoveGroups although in the current version of the compiler, that pass is called TopDownCompileControl.

I feel like it's not some optimization tools that are doing this, but rather the power of SSA and the way we design it.

Yes, I think you're right! That is, because the compilation walks over the SSA use-def graph, it can just ignore values that do not affect the computation for the output.
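A small Python sketch of that idea, using the loop body from the scf.while example. It is an illustration under assumptions, not how SCFToCalyx is actually written: the defs table and the live_values helper are invented here. Starting from the yielded value and following operands backwards, only the values that feed the output are ever reached.

```python
def live_values(defs, roots):
    """Return the defined values reachable from roots via operand edges.

    defs maps a value name to (op name, list of operand names). Names that
    are not in defs (block arguments, constants) are treated as leaves.
    """
    live = set()
    stack = list(roots)
    while stack:
        value = stack.pop()
        if value in live or value not in defs:
            continue
        live.add(value)
        stack.extend(defs[value][1])
    return live


# The two arith.addi ops from the loop body.
defs = {
    "%next": ("arith.addi", ["%loop_var", "%c1"]),
    "%another_next": ("arith.addi", ["%next", "%c1"]),
}

print(sorted(live_values(defs, ["%next"])))          # ['%next']
print(sorted(live_values(defs, ["%another_next"])))  # ['%another_next', '%next']
```

With scf.yield %next only one addition is reached, matching the latch group with a single std_add. With scf.yield %another_next both are reached, matching the version with std_add_0 and std_add_1.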
