This GitHub repository summarizes the progress I made in the ASIC class. Quick links:
- Day-2-Timing libs, hierarchical and flat synthesis, efficient flop coding styles
- Day-4-GLS, blocking vs non-blocking and Synthesis-Simulation mismatch
Summary
I installed the needed tools.
Yosys
I installed Yosys using the following commands:
$ git clone https://github.com/YosysHQ/yosys.git
$ cd yosys
$ sudo apt install make
$ sudo apt-get install build-essential clang bison flex \
libreadline-dev gawk tcl-dev libffi-dev git \
graphviz xdot pkg-config python3 libboost-system-dev \
libboost-python-dev libboost-filesystem-dev zlib1g-dev
$ make
$ sudo make install
Below is the screenshot showing successful launch:
Iverilog
I installed iverilog using the following command:
sudo apt-get install iverilog
Below is the screenshot showing successful launch:
Gtkwave
I installed gtkwave using the following command:
sudo apt-get install gtkwave
Below is the screenshot showing successful launch:
Ngspice
I downloaded the tarball from https://sourceforge.net/projects/ngspice/files/ to a local directory and unpacked it using the following commands:
tar -zxvf ngspice-40.tar.gz
cd ngspice-40
mkdir release
cd release
../configure --with-x --with-readline=yes --disable-debug
make
sudo make install
Below is the screenshot showing successful launch:
Magic
I installed magic using the following commands:
sudo apt-get install m4
sudo apt-get install tcsh
sudo apt-get install csh
sudo apt-get install libx11-dev
sudo apt-get install tcl-dev tk-dev
sudo apt-get install libcairo2-dev
sudo apt-get install mesa-common-dev libglu1-mesa-dev
sudo apt-get install libncurses-dev
Below is the screenshot showing successful launch:
OpenSTA
I installed and built OpenSTA (including the needed packages) using the following commands:
sudo apt-get install cmake clang gcc tcl swig bison flex
git clone https://github.com/The-OpenROAD-Project/OpenSTA.git
cd OpenSTA
mkdir build
cd build
cmake ..
make
Below is the screenshot showing successful launch:
OpenLANE
I installed and built OpenLANE (including the needed packages) using the following commands:
sudo apt-get update
sudo apt-get upgrade
sudo apt install -y build-essential python3 python3-venv python3-pip make git
sudo apt install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io
sudo docker run hello-world
sudo groupadd docker
sudo usermod -aG docker $USER
sudo reboot
# After reboot
docker run hello-world
Below is the screenshot showing successful launch:
Introduction to Verilog RTL design and Synthesis
RTL Design: In simple terms, RTL (Register Transfer Level) design describes a digital circuit in terms of the data transferred between registers and the logic operations performed on that data. In RTL design we write code for combinational and sequential circuits in an HDL (Hardware Description Language) such as Verilog or VHDL, which can model both logical and hardware operation. An RTL design can be a single file or a set of Verilog files. One key point is that the RTL design must be optimized and synthesizable (realizable as physical gates).
Sample RTL design outline:
module module_name (port list);
//declarations;
//initializations;
//continuous concurrent assignments;
//procedural blocks;
endmodule
Test Bench: Using Verilog we can write a test bench that applies stimulus to the RTL design and verifies its results by instantiating the design within the test bench. Up-front verification becomes more important as the design grows in size and complexity over the course of a project. It also helps ensure that the simulation results match the post-synthesis results. A test bench can have two parts: one generates the input signals for the model under test, and the other checks the output signals from the design under test. It can be represented as follows.
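The two parts of a test bench can be sketched as follows. This is only an illustration and not part of the workshop files: the names mux2_dut and tb_mux2_check are hypothetical, and the small DUT is a stand-in for a 2:1 mux such as good_mux so that the sketch runs on its own.

```verilog
`timescale 1ns / 1ps

// Stand-in design under test (hypothetical name), a plain 2:1 mux.
module mux2_dut (input i0, input i1, input sel, output y);
    assign y = sel ? i1 : i0;
endmodule

// Hypothetical self-checking test bench.
module tb_mux2_check;
    reg  i0, i1, sel;   // stimulus driven by the test bench
    wire y;             // response observed from the design
    integer k;
    integer errors;

    mux2_dut uut (.i0(i0), .i1(i1), .sel(sel), .y(y));

    initial begin
        errors = 0;
        // Part 1: stimulus generator, walks through all 8 input combinations.
        for (k = 0; k < 8; k = k + 1) begin
            {sel, i1, i0} = k[2:0];
            #10;  // let the combinational output settle
            // Part 2: checker, compares the output with the expected mux value.
            if (y !== (sel ? i1 : i0)) begin
                errors = errors + 1;
                $display("MISMATCH sel=%b i1=%b i0=%b y=%b", sel, i1, i0, y);
            end
        end
        if (errors == 0)
            $display("All checks passed");
        $finish;
    end
endmodule
```

The test bench used later in this README only generates stimulus and dumps a VCD file; the checking is then done by eye in gtkwave.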
Simulation: The RTL design is checked for adherence to its design specification by simulating it with sample inputs. This helps find and fix bugs in the RTL design in the early stages of design development.
Simulator: The simulator is the tool used for this process. It looks for changes on the input signals and evaluates the outputs when they occur; if there is no change on the input signals, the outputs are not re-evaluated. Here is the flow of frontend design:
Introduction to open source simulator iverilog and gtkwave
iverilog: iverilog stands for Icarus Verilog. Icarus Verilog is an implementation of the Verilog hardware description language. It supports the 1995, 2001 and 2005 versions of the standard, portions of SystemVerilog, and some extensions.
Gtkwave: GTKWave is a fully featured GTK+ based wave viewer for Unix, Win32, and Mac OSX which reads LXT, LXT2, VZT, FST, and GHW files as well as standard Verilog VCD/EVCD files and allows their viewing.
We were introduced to the Linux operating system and its basic commands. Using the git clone command, we cloned the library files (the standard cell library and the primitives used for synthesis) and a few Verilog codes for practice. In this session, I simulated a multiplexer. I compiled both the RTL design code and the test bench code with iverilog and ran the simulation to generate a VCD file, which I then opened in gtkwave to view the output waveforms. The output was generated from the inputs applied by the test bench code.
Here is the code :
module good_mux (input i0 , input i1 , input sel , output reg y);
always @ (*)
begin
if(sel)
y <= i1;
else
y <= i0;
end
endmodule
`timescale 1ns / 1ps
module tb_good_mux;
// Inputs
reg i0,i1,sel;
// Outputs
wire y;
// Instantiate the Unit Under Test (UUT), name based instantiation
good_mux uut (.sel(sel),.i0(i0),.i1(i1),.y(y));
//good_mux uut (sel,i0,i1,y); //order based instantiation
initial begin
$dumpfile("tb_good_mux.vcd");
$dumpvars(0,tb_good_mux);
// Initialize Inputs
sel = 0;
i0 = 0;
i1 = 0;
#300 $finish;
end
always #75 sel = ~sel;
always #10 i0 = ~i0;
always #55 i1 = ~i1;
endmodule
Introduction to Yosys synthesizer
Synthesis: Synthesis transforms the simple RTL design into a gate-level netlist with all the constraints as specified by the designer. In simple language, Synthesis is a process that converts the abstract form of design to a properly implemented chip in terms of logic gates.
Synthesis takes place in multiple steps:
- Converting RTL into simple logic gates.
- Mapping those gates to actual technology-dependent logic gates available in the technology libraries.
- Optimizing the mapped netlist keeping the constraints set by the designer intact.
Synthesizer: It is the tool we use to convert our RTL design code to a netlist. Yosys is the tool I've used in this workshop. Here is the flow of the above processes.
Yosys: Yosys is a framework for RTL synthesis and more. It currently has extensive Verilog-2005 support and provides a basic set of synthesis algorithms for various application domains. Yosys is the core component of most of our implementation and verification flows.
I was given an overview of how the tool operates and of the files we need to provide to get the required netlist. We provide the RTL design code and the .lib file, which has all the building blocks of the netlist. Using these two files, the Yosys synthesizer generates a netlist file. The .lib is basically a collection of logical modules like AND, OR, NOT etc. The netlist built from them is the equivalent gate-level representation of the RTL code.
Below are the Yosys commands used for each file during synthesis:
- RTL Design - read_verilog
- .lib - read_liberty
- Netlist file - write_verilog
Operational flow of Yosys Synthesizer
Verification of Synthesized design: In order to make sure that there are no errors in the netlist, we'll have to verify the synthesized circuit. The netlist verification flow can be seen in the below image:
The gtkwave output for the netlist should match the output waveform for the RTL design file. As netlist and design code have same set of inputs and outputs, we can use the same testbench and compare the waveforms.
Introduction to logic synthesis: Below is a snippet of RTL code and the equivalent digital circuit:
In the above image, mapping of code and digital circuit is done using Synthesis.
.lib: It is a collection of logical modules like AND, OR, NOT etc. It has different flavors of the same gate, such as 2-input and 3-input AND gates, each available in versions with different speeds.
Need for different flavours of gate: In order to make a faster circuit, the clock frequency should be high. For that, the time period of the clock should be as low as possible. However, in a sequential circuit, the clock period depends on three factors that must be respected so that data is not lost and the output is glitch free.
For the below circuit the three factors are
The equation is as follows
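As a sketch, assuming the usual arrangement of a launch flip-flop A driving a capture flip-flop B through combinational logic, the three factors are the clock-to-Q delay of A, the propagation delay of the combinational logic, and the setup time of B. The symbol names below are my own labels and are not taken from the figure.

```latex
\begin{aligned}
T_{clk} &\ge T_{cq,A} + T_{comb} + T_{setup,B}
&&\Longrightarrow\quad
f_{clk,\max} = \frac{1}{T_{cq,A} + T_{comb} + T_{setup,B}} \\
T_{hold,B} &\le T_{cq,A} + T_{comb,\min}
\end{aligned}
```

The first line is the setup constraint that limits the clock frequency; the second is the corresponding hold constraint, where $T_{comb,\min}$ is the shortest combinational path delay.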
As per the above equation, for a smaller propagation delay, we need faster cells. But then why do we have slower cells? They are needed to ensure that there are no HOLD time violations at flip-flop B. This complete collection of faster and slower cells forms the .lib.
Faster Cells vs Slower Cells: The load in a digital circuit is capacitive. The faster the capacitance charges or discharges, the smaller the cell delay. However, to charge or discharge the capacitor quickly, we need transistors capable of sourcing more current, i.e., we need WIDE TRANSISTORS.
Wider transistors have lesser delay but consume more area and power. Narrow transistors are other way around. Faster cells come with a cost of area and power.
Selection of the Cells: We need to guide the synthesizer to choose the flavour of cells that is optimum for implementing the logic circuit. Using too many fast cells leads to a large, power-hungry circuit and possible hold time violations, while using too many slow cells leads to a sluggish circuit. We offer this guidance to the synthesizer in the form of Constraints.
Below is an illustration of Synthesis.
Invoking Yosys:
Snippet below illustrates reading .lib, design and choosing the module to synthesize:
Generating Netlist: The logic of good_mux will be realizable using gates in the sky130_fd_sc_hd__tt_025C_1v80.lib file
Below is the snippet showing the synthesis results and synthesized circuit for multiplexer.
Netlist code:
Simplified netlist code: The netlist above contains additional attributes. To simplify it further, we write it out with the -noattr switch, as in the commands below:
$ yosys
yosys> read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
yosys> read_verilog good_mux.v
yosys> synth -top good_mux
yosys> abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
yosys> show
yosys> write_verilog good_mux_netlist.v
yosys> !vim good_mux_netlist.v
yosys> write_verilog -noattr good_mux_netlist.v
yosys> !vim good_mux_netlist.v
Introduction to timing .libs
This lab guides us through the .lib files, in which all the gates are described. The libraries are characterized according to the parameters below to model the variations.
Within the .lib file, the gates are declared as follows to account for the variations due to process, temperature and voltage.
For the above example, the delay, power and all the related parameters of the gate are listed for all 32 combinations, i.e. 2^5 (5 is the number of input variables).
This image displays the power consumption comparison.
The below image shows the delay order for the different flavors of gates.
LAB- Hierarchical synthesis and flat synthesis
multiple_module
module sub_module2 (input a, input b, output y);
assign y = a | b;
endmodule
module sub_module1 (input a, input b, output y);
assign y = a&b;
endmodule
module multiple_modules (input a, input b, input c , output y);
wire net1;
sub_module1 u1(.a(a),.b(b),.y(net1)); //net1 = a&b
sub_module2 u2(.a(net1),.b(c),.y(y)); //y = net1|c ,ie y = a&b + c;
endmodule
This is the schematic as per the connections in the above module.
However, the Yosys synthesizer generates the following schematic instead of the above one: it shows the instances u1 and u2, and the gate-level connections are made within the submodules.
$ yosys
yosys> read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
yosys> read_verilog multiple_modules.v
yosys> synth -top multiple_modules
yosys> show multiple_modules
The synthesizer preserves the module hierarchy and does the mapping according to the instantiation. Here is the hierarchical netlist code for multiple_modules:
module multiple_modules(a, b, c, y);
input a;
input b;
input c;
wire net1;
output y;
sub_module1 u1 (.a(a),.b(b),.y(net1) );
sub_module2 u2 (.a(net1),.b(c),.y(y));
endmodule
module sub_module1(a, b, y);
wire _0_;
wire _1_;
wire _2_;
input a;
input b;
output y;
sky130_fd_sc_hd__and2_0 _3_ (.A(_1_),.B(_0_),.X(_2_));
assign _1_ = b;
assign _0_ = a;
assign y = _2_;
endmodule
module sub_module2(a, b, y);
wire _0_;
wire _1_;
wire _2_;
input a;
input b;
output y;
sky130_fd_sc_hd__lpflow_inputiso1p_1 _3_ (.A(_1_),.SLEEP(_0_),.X(_2_) );
assign _1_ = b;
assign _0_ = a;
assign y = _2_;
endmodule
Flattened netlist:
In the flattened netlist, the hierarchies are flattened out and there is a single module, i.e. gates are instantiated directly instead of sub_modules. Here is the flattened netlist code for multiple_modules:
module multiple_modules(a, b, c, y);
wire _0_;
wire _1_;
wire _2_;
wire _3_;
wire _4_;
wire _5_;
input a;
input b;
input c;
wire net1;
wire \u1.a ;
wire \u1.b ;
wire \u1.y ;
wire \u2.a ;
wire \u2.b ;
wire \u2.y ;
output y;
sky130_fd_sc_hd__and2_0 _6_ (
.A(_1_),
.B(_0_),
.X(_2_)
);
sky130_fd_sc_hd__lpflow_inputiso1p_1 _7_ (
.A(_4_),
.SLEEP(_3_),
.X(_5_)
);
assign _4_ = \u2.b ;
assign _3_ = \u2.a ;
assign \u2.y = _5_;
assign \u2.a = net1;
assign \u2.b = c;
assign y = \u2.y ;
assign _1_ = \u1.b ;
assign _0_ = \u1.a ;
assign \u1.y = _2_;
assign \u1.a = a;
assign \u1.b = b;
assign net1 = \u1.y ;
endmodule
The commands to get the hierarchical and flattened netlists are shown below:
yosys> write_verilog -noattr multiple_modules_hier.v
- Executing Verilog backend.
Dumping module `\multiple_modules'.
Dumping module `\sub_module1'.
Dumping module `\sub_module2'.
yosys> !gvim multiple_modules_hier.v
- Shell command: gvim multiple_modules_hier.v
yosys> flatten
- Executing FLATTEN pass (flatten design). Deleting now unused module sub_module1. Deleting now unused module sub_module2. <suppressed ~2 debug messages>
yosys> write_verilog -noattr multiple_modules_flat.v
- Executing Verilog backend. Dumping module `\multiple_modules'.
yosys> !gvim multiple_modules_flat.v
- Shell command: gvim multiple_modules_flat.v
This is the synthesized circuit for the flattened netlist. Here u1 and u2 are flattened and the gates are realized directly.
Here is the synthesized circuit of sub_module1. We also synthesize at module level so that, if a top module instantiates the same sub-module multiple times, we can synthesize it once and reuse the same netlist for every instance in the top module netlist.
Another reason to synthesize at module level and then stitch the netlists together is to avoid errors when the top module is massive and consists of several sub-modules. Generating a netlist per sub-module and stitching them together at the top level is easier and reduces the risk of output mismatch.
We control this synthesis using synth -top <module_name> command
Various Flop coding styles and optimization
Why Flops and Flop coding styles
In this session, the discussion was about how to code various types of flops and various styles of coding a flop.
Why a Flop?
In a combinational circuit, the output changes after the propagation delay of the circuit once inputs are changed. During the propagation of data, if there are different paths with different propagation delays, there might be a chance of getting a glitch at the output.
If there are multiple combinational circuits in the design, glitches occur more often, making the output unstable.
To curb this drawback, we use flops to store the data from the combinational circuits. When a flop is used, the output of the combinational circuit is stored in it and is propagated only at the posedge or negedge of the clock, so that the next combinational circuit gets a glitch-free input, thereby stabilising the output.
We use initialization signals or control pins called set and reset to initialize a flop; otherwise a garbage value is sent out to the next combinational circuit. These control pins can be synchronous or asynchronous.
d-flipflop with asynchronous reset- Here the output q goes low whenever reset is high and will not wait for the clock's posedge, i.e irrespective of clock, the output is changed to low.
module dff_asyncres ( input clk , input async_reset , input d , output reg q );
always @ (posedge clk , posedge async_reset)
begin
if(async_reset)
q <= 1'b0;
else
q <= d;
end
endmodule
Simulation:
Synthesized circuit:
d-flipflop with asynchronous set- Here the output q goes high whenever set is high and will not wait for the clock's posedge, i.e irrespective of clock, the output is changed to high.
module dff_async_set ( input clk , input async_set , input d , output reg q );
always @ (posedge clk , posedge async_set)
begin
if(async_set)
q <= 1'b1;
else
q <= d;
end
endmodule
Simulation:
Synthesized circuit:
d-flipflop with synchronous reset- Here the output q goes low only at a positive edge of the clock at which reset is high, i.e. the reset of the output depends on the clock.
module dff_syncres ( input clk , input async_reset , input sync_reset , input d , output reg q );
always @ (posedge clk )
begin
if (sync_reset)
q <= 1'b0;
else
q <= d;
end
endmodule
Simulation:
Synthesized circuit:
d-flipflop with synchronous and asynchronous reset- Here the output q goes low whenever the asynchronous reset is high, independent of the clock, and also when the synchronous reset is high at a posedge of the clock.
module dff_asyncres_syncres ( input clk , input async_reset , input sync_reset , input d , output reg q );
always @ (posedge clk , posedge async_reset)
begin
if(async_reset)
q <= 1'b0;
else if (sync_reset)
q <= 1'b0;
else
q <= d;
end
endmodule
Simulation:
Synthesized circuit:
Interesting optimisations
This lab session deals with some automatic and interesting optimisations of circuits based on logic. In the below example, multiplying a number by 2 doesn't need any additional hardware: connecting the bits of a to the upper bits of y and grounding the LSB of y is enough, and that is what Yosys realizes.
module mul2 (input [2:0] a, output [3:0] y);
assign y = a * 2;
endmodule
Synthesized circuit:
When it comes to multiplying with powers of 2, it just needs shifting as shown in the below image:
Netlist for the above schematic
Special case of multiplying a with 9. The result is shown in the below image:
The schematic for the same is shown below:
Netlist for the above schematic
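The multiply-by-9 case can be checked with a small sketch. It assumes a 3-bit input a and a 6-bit output y; the module names mul9_sketch and tb_mul9_sketch are hypothetical. Since a*9 = a*8 + a = {a, 3'b000} + a, and the lower three bits of the shifted term are zero, the sum is simply {a, a}, so no adder or multiplier is needed.

```verilog
// Sketch only: for a 3-bit a, a*9 equals a concatenated with itself.
module mul9_sketch (input [2:0] a, output [5:0] y);
    assign y = {a, a};   // pure wiring, a copied into both halves of y
endmodule

// Exhaustive check of the identity against the * operator.
module tb_mul9_sketch;
    reg  [2:0] a;
    wire [5:0] y;
    integer k;

    mul9_sketch uut (.a(a), .y(y));

    initial begin
        for (k = 0; k < 8; k = k + 1) begin
            a = k[2:0];
            #1;
            if (y !== a * 9)
                $display("MISMATCH a=%d y=%d", a, y);
        end
        $display("check finished");
        $finish;
    end
endmodule
```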
Combinational logic optimization with examples
Optimising the combinational logic circuit is squeezing the logic to get the most optimized digital design so that the circuit finally is area and power efficient. This is achieved by the synthesis tool using various techniques and gives us the most optimized circuit.
Techniques for optimization:
- Constant propagation, which is a direct optimization technique
- Boolean logic optimization using K-map or Quine-McCluskey
Here is an example for Constant Propagation
In the above example, if we consider the transistor-level circuit of output Y, it has 6 MOS transistors, whereas an inverter needs only 2 transistors. The reduction is achieved by treating A as a constant and propagating it to the output.
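A sketch of constant propagation follows. It assumes the example is Y = ~((A & B) | C) with A tied to 0, which matches the 6-transistor count of an AND-OR-invert gate but is my assumption about the figure; the module names are hypothetical.

```verilog
// Assumed example: y = ~((a & b) | c) with a held at constant 0.
module const_prop_sketch (input b, input c, output y_full, output y_opt);
    wire a = 1'b0;                      // the constant input
    assign y_full = ~((a & b) | c);     // logic as written
    assign y_opt  = ~c;                 // after propagating a = 0: just an inverter
endmodule

// Exhaustive check that both forms agree.
module tb_const_prop_sketch;
    reg b, c;
    wire y_full, y_opt;
    integer k;

    const_prop_sketch uut (.b(b), .c(c), .y_full(y_full), .y_opt(y_opt));

    initial begin
        for (k = 0; k < 4; k = k + 1) begin
            {b, c} = k[1:0];
            #1;
            if (y_full !== y_opt)
                $display("MISMATCH b=%b c=%b", b, c);
        end
        $display("check finished");
        $finish;
    end
endmodule
```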
Example for Boolean logic optimization:
Let's consider an example concurrent statement assign y=a?(b?c:(c?a:0)):(!c)
The above expression uses a ternary operator, which realizes a series of multiplexers. However, when we write the boolean expression at the output of each mux and simplify further using boolean reduction techniques, the output y turns out to be just ~(a^c).
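The reduction can be confirmed by brute force. The sketch below (test bench name hypothetical) compares the nested ternary expression with ~(a^c) for all eight input combinations. For a = 0 the expression is !c, and for a = 1 both inner branches reduce to c, which is exactly an XNOR of a and c.

```verilog
// Hypothetical test bench comparing the original and the reduced expression.
module tb_bool_opt_sketch;
    reg a, b, c;
    wire y_mux  = a ? (b ? c : (c ? a : 1'b0)) : (!c);  // expression as written
    wire y_xnor = ~(a ^ c);                             // reduced form
    integer k;

    initial begin
        for (k = 0; k < 8; k = k + 1) begin
            {a, b, c} = k[2:0];
            #1;
            if (y_mux !== y_xnor)
                $display("MISMATCH a=%b b=%b c=%b", a, b, c);
        end
        $display("check finished");
        $finish;
    end
endmodule
```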
Command to optimize the circuit by yosys is yosys> opt_clean -purge
Example-1
module opt_check (input a , input b , output y);
assign y = a?b:0;
endmodule
Optimized circuit
Example-2
module opt_check2 (input a , input b , output y);
assign y = a?1:b;
endmodule
Example-3
module opt_check3 (input a , input b, input c , output y);
assign y = a?(c?b:0):0;
endmodule
Example-4
module opt_check4 (input a , input b , input c , output y);
assign y = a?(b?(a & c ):c):(!c);
endmodule
Example-5: Here multiple modules are present, so we check whether all of those modules are actually used by running the following commands:
yosys:read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
yosys:read_verilog multiple_module_opt2.v
yosys:synth -top multiple_module_opt2
yosys:abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
yosys:flatten
yosys:opt_clean -purge
yosys:show
module sub_module(input a , input b , output y);
assign y = a & b;
endmodule
module multiple_module_opt2(input a , input b , input c , input d , output y);
wire n1,n2,n3;
sub_module U1 (.a(a) , .b(1'b0) , .y(n1));
sub_module U2 (.a(b), .b(c) , .y(n2));
sub_module U3 (.a(n2), .b(d) , .y(n3));
sub_module U4 (.a(n3), .b(n1) , .y(y));
endmodule
Before Flatten
After Flatten
Example-6
module sub_module1(input a , input b , output y);
assign y = a & b;
endmodule
module sub_module2(input a , input b , output y);
assign y = a^b;
endmodule
module multiple_module_opt(input a , input b , input c , input d , output y);
wire n1,n2,n3;
sub_module1 U1 (.a(a) , .b(1'b1) , .y(n1));
sub_module2 U2 (.a(n1), .b(1'b0) , .y(n2));
sub_module2 U3 (.a(b), .b(d) , .y(n3));
assign y = c | (b & n1);
endmodule
Before Flatten
After Flatten
Sequential Logic Optimization with examples
Below are the various techniques used for sequential logic optimisations:
- Basic
  - Sequential constant propagation
- Advanced
  - State optimisation
  - Retiming
  - Sequential Logic Cloning (Floor Plan Aware Synthesis)
Sequential constant propagation- Here only the first logic can be optimized, as the output of its flop is always zero. For the second flop, however, the output changes continuously, therefore it cannot be used for constant propagation.
State Optimisation: This is the optimisation of unused states. Using this technique we can come up with the most optimised state machine.
Cloning: This is done when performing PHYSICAL AWARE SYNTHESIS. Let's consider a flop A which is connected to flop B and flop C through combinational logic. If B and C are placed far from A in the floorplan, there is a routing path delay. To avoid this, we connect A to two intermediate flops and then from these flops the output is sent to B and C, thereby decreasing the delay. This process is called cloning since we are generating two new flops with the same functionality as A.
Retiming: Retiming is a powerful sequential optimization technique used to move registers across the combinational logic or to optimize the number of registers to improve performance via power-delay trade-off, without changing the input-output behavior of the circuit.
Example-1
Here flop will be inferred as the output is not constant.
module dff_const1(input clk, input reset, output reg q);
always @(posedge clk, posedge reset)
begin
if(reset)
q <= 1'b0;
else
q <= 1'b1;
end
endmodule
Simulation
Synthesis
In the synthesis report, we'll see that a Dflop was inferred in this example.
Example-2
Here flop will not be inferred as the output is always high.
module dff_const2(input clk, input reset, output reg q);
always @(posedge clk, posedge reset)
begin
if(reset)
q <= 1'b1;
else
q <= 1'b1;
end
endmodule
Simulation
Synthesis
Example-3
module dff_const3(input clk, input reset, output reg q);
reg q1;
always @(posedge clk, posedge reset)
begin
if(reset)
begin
q <= 1'b1;
q1 <= 1'b0;
end
else
begin
q1 <= 1'b1;
q <= q1;
end
end
endmodule
Simulation
Synthesis
Example-4
module dff_const4(input clk, input reset, output reg q);
reg q1;
always @(posedge clk, posedge reset)
begin
if(reset)
begin
q <= 1'b1;
q1 <= 1'b1;
end
else
begin
q1 <= 1'b1;
q <= q1;
end
end
endmodule
Simulation
Synthesis
Example-5
module dff_const5(input clk, input reset, output reg q);
reg q1;
always @(posedge clk, posedge reset)
begin
if(reset)
begin
q <= 1'b0;
q1 <= 1'b0;
end
else
begin
q1 <= 1'b1;
q <= q1;
end
end
endmodule
Simulation
Synthesis
Sequential optimisation of unused outputs
Example-1
module counter_opt (input clk , input reset , output q);
reg [2:0] count;
assign q = count[0];
always @(posedge clk ,posedge reset)
begin
if(reset)
count <= 3'b000;
else
count <= count + 1;
end
endmodule
Synthesis
Updated counter logic-
module counter_opt (input clk , input reset , output q);
reg [2:0] count;
assign q = {count[2:0]==3'b100};
always @(posedge clk ,posedge reset)
begin
if(reset)
count <= 3'b000;
else
count <= count + 1;
end
endmodule
Synthesis
All the other blocks in the synthesized circuit are for incrementing the counter; the output comes only from the three-input NOR gate.
GLS, Synthesis-Simulation mismatch and Blocking, Non-blocking statements
What is GLS- Gate Level Simulation?:
GLS means running the test bench with the netlist generated by synthesis as the design under test. The netlist is logically the same as the RTL code, therefore the same test bench can be used for it.
Why GLS?:
We perform GLS to verify the logical correctness of the design after synthesis, and also to ensure that the timing of the design is met.
Below picture gives an insight of the procedure. Here while using iverilog, we also include gate level verilog models to generate GLS simulation.
There are three main reasons for Synthesis Simulation Mismatch:
- Missing sensitivity list in always block
- Blocking vs Non-Blocking Assignments
- Non standard Verilog coding
Missing sensitivity list in always block:
If we consider Example-2 (bad_mux), only sel is mentioned in the sensitivity list. In RTL simulation the output is therefore re-evaluated only when sel changes, so the waveform resembles a latched output. The netlist simulation does not show this behaviour, because the synthesizer looks only at the statements within the procedural block and not at the sensitivity list.
Since the synthesizer ignores the sensitivity list, it infers the correct circuit (a mux), and when we simulate the netlist the result differs from the RTL simulation, which is a synthesis simulation mismatch.
To avoid such mismatches, it is very important to check the behaviour of the netlist and match it with the expected output seen in RTL simulation. This is why we use GLS.
Blocking vs Non-Blocking Assignments:
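Blocking statements (=) execute in the order they are written inside the always block, and each assignment completes before the next statement runs. Non-blocking statements (<=) evaluate all the RHS expressions when the always block is entered and assign the values to the LHS only afterwards, so the order of the statements does not matter. Improper use of blocking statements can cause a mismatch, because the simulated behaviour then depends on the statement order and can look like a latch or flop that the synthesized circuit does not contain. The blocking_caveat lab below shows such a case.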
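The difference can be sketched with a two-stage shift register. The module names are hypothetical and the example is not from the workshop files; it only illustrates how the same statement order describes different hardware depending on the assignment type.

```verilog
// Non-blocking: both RHS values are sampled before any update,
// so q receives the OLD q0 and two flops in series are described.
module shift_nonblocking_sketch (input clk, input d, output reg q);
    reg q0;
    always @(posedge clk) begin
        q0 <= d;
        q  <= q0;
    end
endmodule

// Blocking, same statement order: q0 is updated first,
// so q receives the NEW q0 (that is, d) and one stage of delay is lost.
module shift_blocking_sketch (input clk, input d, output reg q);
    reg q0;
    always @(posedge clk) begin
        q0 = d;
        q  = q0;
    end
endmodule
```

This is why non-blocking assignments are the usual choice for sequential logic, while blocking assignments need careful statement ordering.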
Lab- GLS Synth Sim Mismatch
Example-1 There is no mismatch in this example, as the netlist simulation and RTL simulation waveforms are the same.
module ternary_operator_mux (input i0 , input i1 , input sel , output y);
assign y = sel?i1:i0;
endmodule
Simulation
Synthesis
Netlist Simulation
Example-2
module bad_mux (input i0 , input i1 , input sel , output reg y);
always @ (sel)
begin
if(sel)
y <= i1;
else
y <= i0;
end
endmodule
Simulation
Synthesis
Netlist Simulation
MISMATCH
Here the first picture shows the netlist simulation, which behaves as a correct mux, whereas in RTL simulation the bad_mux output changes only when sel changes. For a mux to work properly, it should be sensitive to all the input signals.
Example-3
module good_mux (input i0 , input i1 , input sel , output reg y);
always @ (*)
begin
if(sel)
y <= i1;
else
y <= i0;
end
endmodule
Simulation
Synthesis
Netlist Simulation
Lab- Synthesis simulation mismatch blocking statement
Here the output depends on the past value of x, which in turn depends on a and b, so in simulation it appears as if there were a flop.
module blocking_caveat (input a , input b , input c, output reg d);
reg x;
always @ (*)
begin
d = x & c;
x = a | b;
end
endmodule
Simulation
Synthesis
Netlist Simulation
MISMATCH
This is how the circuit should behave, but the correct waveform is obtained only in netlist simulation. The first picture shows the netlist simulation, with the DUT working properly, while the last picture shows the RTL simulation, where the DUT appears to misbehave because of the order of the blocking statements. This synthesis simulation mismatch is caught by running GLS on the netlist.
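One way to remove the mismatch, sketched below under a hypothetical module name, is to reorder the blocking statements so that x is computed before it is used. RTL simulation then evaluates d from the current a, b and c, which is what the synthesized gates do.

```verilog
// Sketch of a corrected version: the intermediate value is assigned first.
module blocking_caveat_fixed (input a, input b, input c, output reg d);
    reg x;
    always @(*) begin
        x = a | b;   // compute the intermediate value first
        d = x & c;   // then use the up-to-date x
    end
endmodule
```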
If and Case constructs
The if construct is mainly used to create priority logic. In a nested if else construct, the conditions are given priority from top to bottom. The statements under the first condition that is satisfied are executed and the remaining branches are skipped. If a condition fails, the next condition is checked, and so on, as shown below.
Syntax for nested if else
if (<condition 1>)
begin
-----------
-----------
end
else if (<condition 2>)
begin
-----------
-----------
end
else if (<condition 3>)
.
.
.
Dangers with IF:
Using a bad coding style, i.e. an incomplete if else construct, will infer a latch. We definitely don't want an unintended latch in a combinational circuit. When an incomplete construct is used and all the conditions fail, the output holds its previous value, so we don't get the desired output unless a latch is what we need.
This can be shown in below example:
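A minimal sketch of the two cases, with hypothetical module and port names:

```verilog
// Incomplete if: there is no else branch, so y must hold its old value
// when en is 0, and a latch is inferred.
module if_latch_sketch (input en, input d, output reg y);
    always @(*) begin
        if (en)
            y = d;
    end
endmodule

// Complete if: y is assigned on every path, so the logic stays
// purely combinational (a 2:1 mux).
module if_comb_sketch (input en, input d, input d_else, output reg y);
    always @(*) begin
        if (en)
            y = d;
        else
            y = d_else;
    end
endmodule
```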
Syntax for case
case(statement)
case1: begin
--------
--------
end
case2: begin
--------
--------
end
default:
endcase
In a case construct, the expression is compared against the case items and the statements of the matching item are executed. If there is no match, the default statement is executed. Unlike the if construct, a case construct is not meant to encode priority: the case items should be mutually exclusive, because overlapping items can lead to a synthesis simulation mismatch, as the bad_case lab below shows.
Caveats in Case
Caveats in case occur due to two reasons. One is incomplete case statements and the other is partial assignments in case statements.
Lab- Incomplete IF
This incomplete if construct infers a D-latch on output y, with i1 as the data input and i0 as the enable.
Example-1
module incomp_if (input i0 , input i1 , input i2 , output reg y);
always @ (*)
begin
if(i0)
y <= i1;
end
endmodule
Simulation
Synthesis
Example-2
The below code is equivalent to two cascaded 2:1 muxes with i0 and i2 as select lines and i1 and i3 as the respective inputs. Here as well, the missing final else makes the output hold its value, which infers a latch whose enable is the OR of i0 and i2.
module incomp_if2 (input i0 , input i1 , input i2 , input i3, output reg y);
always @ (*)
begin
if(i0)
y <= i1;
else if (i2)
y <= i3;
end
endmodule
Simulation
Synthesis
Lab- incomplete overlapping Case
Example-1
This is an example of an incomplete case, where the other two combinations of sel, 10 and 11, are not included. This infers a latch at the output of the multiplexer, which holds the previous value of y whenever sel is 10 or 11.
module incomp_case (input i0 , input i1 , input i2 , input [1:0] sel, output reg y);
always @ (*)
begin
case(sel)
2'b00 : y = i0;
2'b01 : y = i1;
endcase
end
endmodule
Simulation
Synthesis
Example-2- Complete case
This is a complete case statement, as the default case is given. If none of the listed case items matches, the default statement is executed, so a latch is not inferred.
module comp_case (input i0 , input i1 , input i2 , input [1:0] sel, output reg y);
always @ (*)
begin
case(sel)
2'b00 : y = i0;
2'b01 : y = i1;
default : y = i2;
endcase
end
endmodule
Simulation
Synthesis
Example-3
In the below example, y is assigned in every case item, so it has a defined output for all cases. Therefore no latch is inferred for y.
x, however, is not assigned for the input 01, therefore a latch is inferred for x.
module partial_case_assign (input i0 , input i1 , input i2 , input [1:0] sel, output reg y , output reg x);
always @ (*)
begin
case(sel)
2'b00 : begin
y = i0;
x = i2;
end
2'b01 : y = i1;
default : begin
x = i1;
y = i2;
end
endcase
end
endmodule
Simulation
Synthesis
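One common way to avoid the latch on x is to assign default values at the top of the always block. The following is a sketch only: the module name partial_case_assign_fixed is hypothetical, and it assumes x should take the value i1 when sel is 01.

```verilog
// Hypothetical variant of partial_case_assign with default assignments.
module partial_case_assign_fixed (input i0, input i1, input i2, input [1:0] sel, output reg y, output reg x);
always @ (*)
begin
	// Defaults come first, so every path through the block drives both outputs.
	// They mirror the default branch of the original example.
	y = i2;
	x = i1;
	case (sel)
		2'b00 : begin
			y = i0;
			x = i2;
		end
		2'b01 : y = i1;   // x keeps the assumed default i1 instead of holding its old value
	endcase
end
endmodule
```

Since both outputs are assigned before the case statement, omitting an assignment in one branch no longer implies storage.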
Example-4-Bad case construct
module bad_case (input i0 , input i1, input i2, input i3 , input [1:0] sel, output reg y);
always @(*)
begin
case(sel)
2'b00: y = i0;
2'b01: y = i1;
2'b10: y = i2;
2'b1?: y = i3;
//2'b11: y = i3;
endcase
end
endmodule
Simulation
Synthesis
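Netlist Simulation
The RTL simulation waveform and the netlist simulation waveform differ. In a plain case statement the item 2'b1? is compared as a literal z, so it never matches sel = 11 and y holds its previous value in RTL simulation. The synthesis tool resolves the invalid item on its own and builds a regular mux, which leads to a synthesis-simulation mismatch. Overlapping or wildcard case items like this should be avoided in the code.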
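A sketch of a non-overlapping version of the mux from Example-4, in which every value of sel has exactly one explicit item. The module name good_case is hypothetical.

```verilog
// Hypothetical non-overlapping version of bad_case.
module good_case (input i0, input i1, input i2, input i3, input [1:0] sel, output reg y);
always @ (*)
begin
	case (sel)
		2'b00 : y = i0;
		2'b01 : y = i1;
		2'b10 : y = i2;
		2'b11 : y = i3;   // explicit item instead of the wildcard 2'b1?
	endcase
end
endmodule
```

With mutually exclusive items, RTL simulation and netlist simulation should agree for all four binary values of sel.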
For Loop and For Generate
For Loop
- A for loop is used inside an always block
- It is used only for evaluating expressions, not for instantiating hardware
Generate For loop
- A generate for loop is used for instantiating hardware
- It must be used only outside an always block
A for loop can be used to describe larger circuits such as a 256:1 multiplexer or a 1:256 demultiplexer, where the coding style of a small mux is not feasible and is prone to human error because of the huge number of combinations that would have to be listed.
A generate for loop can be used to instantiate any number of submodules within a top module. For example, for a 32-bit ripple carry adder, instead of instantiating 32 full adders by hand, we can write a generate for loop and connect the full adders appropriately.
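As an illustration of the first point, a large mux can be sketched with a for loop inside an always block. The module name mux_n, its parameters N and SEL_W, and the default of 0 before the loop are assumptions made for this sketch.

```verilog
// Hypothetical parameterized N:1 mux written with a for loop.
// SEL_W must be wide enough to index N inputs (8 bits for N = 256).
module mux_n #(parameter N = 256, parameter SEL_W = 8)
	(input [N-1:0] in, input [SEL_W-1:0] sel, output reg y);
integer k;
always @ (*)
begin
	y = 1'b0;                       // assumed default, avoids a latch if sel >= N
	for (k = 0; k < N; k = k + 1) begin
		if (k == sel)
			y = in[k];              // only the iteration matching sel drives y
	end
end
endmodule
```

The loop is unrolled by the synthesis tool, so the same few lines cover a mux of any size by changing only the parameters.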
Lab- For and For Generate
Example-1- Mux using generate
Here a for loop is used to design a 4:1 mux. This can also be written using a case or if-else block; however, for a large mux, only the for loop model is practical.
module mux_generate (input i0 , input i1, input i2 , input i3 , input [1:0] sel , output reg y);
wire [3:0] i_int;
assign i_int = {i3,i2,i1,i0};
integer k;
always @ (*)
begin
for(k = 0; k < 4; k=k+1) begin
if(k == sel)
y = i_int[k];
end
end
endmodule
Simulation
Synthesis
Netlist Simulation
Example-2-Demux using Case
module demux_case (output o0 , output o1, output o2 , output o3, output o4, output o5, output o6 , output o7 , input [2:0] sel , input i);
reg [7:0]y_int;
assign {o7,o6,o5,o4,o3,o2,o1,o0} = y_int;
integer k;
always @ (*)
begin
y_int = 8'b0;
case(sel)
3'b000 : y_int[0] = i;
3'b001 : y_int[1] = i;
3'b010 : y_int[2] = i;
3'b011 : y_int[3] = i;
3'b100 : y_int[4] = i;
3'b101 : y_int[5] = i;
3'b110 : y_int[6] = i;
3'b111 : y_int[7] = i;
endcase
end
endmodule
Simulation
Synthesis
Netlist Simulation
Example-3-Demux using Generate
The code in the above example is long, and there is a chance of human error while writing it. Using a for loop as shown below removes this drawback to a great extent.
module demux_generate (output o0 , output o1, output o2 , output o3, output o4, output o5, output o6 , output o7 , input [2:0] sel , input i);
reg [7:0]y_int;
assign {o7,o6,o5,o4,o3,o2,o1,o0} = y_int;
integer k;
always @ (*)
begin
y_int = 8'b0;
for(k = 0; k < 8; k=k+1) begin
if(k == sel)
y_int[k] = i;
end
end
endmodule
Simulation
Synthesis
Netlist Simulation
Example-4- Ripple carry adder using fulladder
In this ripple carry adder example, instead of instantiating the full adder 8 times by hand, a generate for loop instantiates it 7 times (bits 1 to 7), and only the first full adder (bit 0) is instantiated separately. With the same code, changing just the bus sizes and the loop bound gives a ripple carry adder of any required size.
module rca (input [7:0] num1 , input [7:0] num2 , output [8:0] sum);
wire [7:0] int_sum;
wire [7:0]int_co;
genvar i;
generate
for (i = 1 ; i < 8; i=i+1) begin
fa u_fa_1 (.a(num1[i]),.b(num2[i]),.c(int_co[i-1]),.co(int_co[i]),.sum(int_sum[i]));
end
endgenerate
fa u_fa_0 (.a(num1[0]),.b(num2[0]),.c(1'b0),.co(int_co[0]),.sum(int_sum[0]));
assign sum[7:0] = int_sum;
assign sum[8] = int_co[7];
endmodule
module fa (input a , input b , input c, output co , output sum);
assign {co,sum} =a+b+c;
endmodule
Simulation
Synthesis
Netlist Simulation
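The remark about changing bus sizes and the loop bound can be sketched as a parameterized adder. The names rca_n, fa_cell, carry and g_fa and the parameter N are hypothetical; fa_cell simply repeats the function of fa so that the block compiles on its own.

```verilog
// Hypothetical full adder cell, identical in function to fa above.
module fa_cell (input a, input b, input c, output co, output sum);
assign {co, sum} = a + b + c;
endmodule

// Hypothetical N-bit ripple carry adder built with a generate for loop.
module rca_n #(parameter N = 8) (input [N-1:0] num1, input [N-1:0] num2, output [N:0] sum);
wire [N:0] carry;           // carry[i] is the carry into bit i
assign carry[0] = 1'b0;     // no carry into the least significant bit
genvar i;
generate
	for (i = 0; i < N; i = i + 1) begin : g_fa
		// One full adder per bit; the carry out feeds the next stage.
		fa_cell u_fa (.a(num1[i]), .b(num2[i]), .c(carry[i]), .co(carry[i+1]), .sum(sum[i]));
	end
endgenerate
assign sum[N] = carry[N];   // final carry becomes the top bit of the result
endmodule
```

Tying carry[0] to zero lets the loop start at bit 0, so no separate instance is needed for the first full adder. Instantiating rca_n with #(.N(32)) would give the 32-bit adder mentioned earlier.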
I sincerely thank Mr. Kunal Ghosh (Founder, VSD) for helping me complete this flow smoothly.
- Kunal Ghosh, VSD Corp. Pvt. Ltd.
- Skywater Foundry
- ChatGPT
- Kanish R, Colleague, IIIT B
- Sumanto Kar, VSD Corp.
- Dantu Nandini, Senior, IIIT B
- Mariam Rakka
- Nanditha Rao, Professor, IIITB
- Madhav Rao, Professor, IIITB
- Manikandan, Professor, IIITB