Alwin_iiitb_asic_class

This GitHub repository summarizes my progress in the ASIC class. Quick links:

Day-0-Installation
Day-1-Introduction to Verilog RTL design and Synthesis
Day-2-Timing libs, hierarchical, flat synthesis, efficient flop coding styles
Day-3-Combinational and sequential optimizations
Day-4-GLS, blocking vs non-blocking and Synthesis-Simulation mismatch
Day-5-if, case, for loop and for generate
Word of Thanks
Reference

Day-0-Installation

Summary

I installed the needed tools.

Yosys

I installed Yosys using the following commands:

$ git clone https://github.com/YosysHQ/yosys.git
$ cd yosys
$ sudo apt install make 
$ sudo apt-get install build-essential clang bison flex \
    libreadline-dev gawk tcl-dev libffi-dev git \
    graphviz xdot pkg-config python3 libboost-system-dev \
    libboost-python-dev libboost-filesystem-dev zlib1g-dev
$ make 
$ sudo make install

Below is the screenshot showing successful launch:

Iverilog

I installed iverilog using the following command:

sudo apt-get install iverilog

Below is the screenshot showing successful launch:

Gtkwave

I installed gtkwave using the following command:

sudo apt-get install gtkwave

Below is the screenshot showing successful launch:

Ngspice

I downloaded the tarball from https://sourceforge.net/projects/ngspice/files/ to a local directory and unpacked it using the following commands:

tar -zxvf ngspice-40.tar.gz
cd ngspice-40
mkdir release
cd release
../configure  --with-x --with-readline=yes --disable-debug
make
sudo make install

Below is the screenshot showing successful launch:

Magic

I installed magic using the following commands:

sudo apt-get install m4
sudo apt-get install tcsh
sudo apt-get install csh
sudo apt-get install libx11-dev
sudo apt-get install tcl-dev tk-dev
sudo apt-get install libcairo2-dev
sudo apt-get install mesa-common-dev libglu1-mesa-dev
sudo apt-get install libncurses-dev

Below is the screenshot showing successful launch:

OpenSTA

I installed and built OpenSTA (including the needed packages) using the following commands:

sudo apt-get install cmake clang gcc tcl swig bison flex
git clone https://github.com/The-OpenROAD-Project/OpenSTA.git
cd OpenSTA
mkdir build
cd build
cmake ..
make

Below is the screenshot showing successful launch:

OpenLANE

I installed and built OpenLANE (including the needed packages) using the following commands:

sudo apt-get update
sudo apt-get upgrade
sudo apt install -y build-essential python3 python3-venv python3-pip make git

sudo apt install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt update

sudo apt install docker-ce docker-ce-cli containerd.io

sudo docker run hello-world

sudo groupadd docker
sudo usermod -aG docker $USER
sudo reboot 

# After reboot
docker run hello-world

Below is the screenshot showing successful launch:

Day-1-Introduction to Verilog RTL design and Synthesis

Introduction to Verilog RTL design and Synthesis

RTL Design: RTL (Register Transfer Level) design describes a digital circuit in terms of how data moves between registers and the logic operations performed on that data. We write the code for combinational and sequential circuits in an HDL (Hardware Description Language) such as Verilog or VHDL, which can model both the logical behaviour and the hardware structure. An RTL design can be a single Verilog file or a set of Verilog files. A key point is that the RTL must be optimized and synthesizable, i.e. realizable as physical gates.

Sample RTL design outline:

module module_name (port list);
	//declarations;
	//initializations;
	//continuous concurrent assignments;
	//procedural blocks;
endmodule

Test Bench: Using Verilog we can write a test bench that applies stimulus to the RTL design and verifies its results by instantiating the design within the test bench. Up-front verification becomes more important as the design grows in size and complexity over the course of a project. The same test bench can later be used to check that the simulation results match the post-synthesis results. A test bench can have two parts: one generates the input signals for the design under test, and the other checks the output signals coming from it. It can be represented as follows.

Simulation: The RTL design is checked against its design specification by simulating it with sample inputs. This helps find and fix bugs in the RTL design in the early stages of development.

Simulator: The simulator is the tool used for this process. It watches for changes on the input signals and evaluates the outputs when they occur. If no input signal changes, the outputs are not re-evaluated. Here is the flow of front-end design:

Introduction to open source simulator iverilog and gtkwave

iverilog: iverilog stands for Icarus Verilog. Icarus Verilog is an implementation of the Verilog hardware description language. It supports the 1995, 2001 and 2005 versions of the standard, portions of SystemVerilog, and some extensions.

Gtkwave: GTKWave is a fully featured GTK+ based wave viewer for Unix, Win32, and Mac OSX which reads LXT, LXT2, VZT, FST, and GHW files as well as standard Verilog VCD/EVCD files and allows their viewing.

Lab examples using iverilog and gtkwave

We were introduced to the Linux operating system and its basic commands. Using git clone, we cloned the library files (the standard cell library and the primitives used for synthesis) along with a few Verilog codes for practice. In this session, I simulated a multiplexer. I passed both the RTL design code and the test bench code to iverilog, ran the simulation to generate a vcd file, and opened that file in gtkwave to view the output waveforms. The stimulus for the simulation comes from the test bench code.

Here is the code :

module good_mux (input i0 , input i1 , input sel , output reg y); 
	always @ (*)
	begin
		if(sel)
		y <= i1;
		else 
		y <= i0;
	end
endmodule


`timescale 1ns / 1ps
module tb_good_mux;
// Inputs
reg i0,i1,sel;
// Outputs
wire y;
  		// Instantiate the Unit Under Test (UUT), name based instantiation
	good_mux uut (.sel(sel),.i0(i0),.i1(i1),.y(y));
	//good_mux uut (sel,i0,i1,y);  //order based instantiation
initial begin
	$dumpfile("tb_good_mux.vcd");
	$dumpvars(0,tb_good_mux);
	// Initialize Inputs
	sel = 0;
	i0 = 0;
	i1 = 0;
	#300 $finish;
end
always #75 sel = ~sel;
always #10 i0 = ~i0;
always #55 i1 = ~i1;
endmodule
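
The test bench above only generates stimulus. As a minimal sketch of the checking part of a test bench, the variant below compares y against the expected mux output for every input combination. The name tb_good_mux_check and the errors counter are assumptions made for illustration, and good_mux is repeated only so the block is self-contained (leave it out if the original good_mux.v is compiled alongside).

```verilog
`timescale 1ns / 1ps

// Design under test, repeated from above so this block compiles on its own.
module good_mux (input i0 , input i1 , input sel , output reg y);
	always @ (*)
	begin
		if(sel)
			y <= i1;
		else
			y <= i0;
	end
endmodule

// Hypothetical self-checking test bench (name chosen for illustration).
module tb_good_mux_check;
	reg i0, i1, sel;
	wire y;
	integer i, errors;

	good_mux uut (.sel(sel), .i0(i0), .i1(i1), .y(y));

	initial begin
		errors = 0;
		// Walk through all 8 combinations of {sel, i1, i0}.
		for (i = 0; i < 8; i = i + 1) begin
			{sel, i1, i0} = i;
			#10; // let the output settle before checking it
			if (y !== (sel ? i1 : i0)) begin
				errors = errors + 1;
				$display("FAIL sel=%b i1=%b i0=%b y=%b", sel, i1, i0, y);
			end
		end
		$display("errors: %0d", errors);
		$finish;
	end
endmodule
```

Because the check lives in the test bench, the same file can be reused unchanged on the synthesized netlist later on.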

Introduction to Yosys synthesizer

Synthesis: Synthesis transforms the simple RTL design into a gate-level netlist with all the constraints as specified by the designer. In simple language, Synthesis is a process that converts the abstract form of design to a properly implemented chip in terms of logic gates.

Synthesis takes place in multiple steps:

Converting RTL into simple logic gates.
Mapping those gates to actual technology-dependent logic gates available in the technology libraries.
Optimizing the mapped netlist keeping the constraints set by the designer intact.

Synthesizer: It is the tool we use to convert our RTL design code to a netlist. Yosys is the tool I've used in this workshop. Here is the flow of the above processes.

Yosys: Yosys is a framework for RTL synthesis and more. It currently has extensive Verilog-2005 support and provides a basic set of synthesis algorithms for various application domains. Yosys is the core component of most of our implementation and verification flows.

I was given an overview of how the tool operates and of the files it needs in order to produce the required netlist. We provide the RTL design code and a .lib file, which contains all the building blocks of the netlist. Using these two files, the Yosys synthesizer generates a netlist file. The .lib is basically a collection of logical modules such as AND, OR, NOT, etc. The netlist is the equivalent gate-level representation of the RTL code, built from these modules.

Below are the commands to perform above synthesis.

RTL Design - read_verilog
.lib - read_liberty
netlist file- write_verilog

Operational flow of Yosys Synthesizer

Verification of Synthesized design: In order to make sure that there are no errors in the netlist, we'll have to verify the synthesized circuit. The netlist verification flow can be seen in the below image:

The gtkwave output for the netlist should match the output waveform for the RTL design file. As netlist and design code have same set of inputs and outputs, we can use the same testbench and compare the waveforms.

Introduction to logic synthesis: Below is a snippet of RTL code and the equivalent digital circuit:

In the above image, mapping of code and digital circuit is done using Synthesis.

.lib: It is a collection of logical modules such as AND, OR, NOT, etc. It has different flavours of the same gate, such as 2-input AND gate, 3-input AND gate, etc., with different performance (speed).

Need for different flavours of gate: To make a faster circuit, the clock frequency should be high, so the clock period should be as small as possible. However, in a sequential circuit, the minimum clock period is limited by three factors so that data is captured correctly.

For the below circuit the three factors are

Clock to Q of flipflop A
Propagation delay of combinational circuit
Setuptime of flipflop B

The equation is as follows
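
From the three factors listed, the constraint on the clock period can be written as below. The symbol names are shorthand assumed here, and clock skew is ignored. The second line is the corresponding hold condition at flip-flop B, which is why the combinational path must not be too fast either.

```latex
T_{clk} \;\ge\; T_{cq,A} + T_{comb} + T_{setup,B}
\qquad\Longrightarrow\qquad
f_{clk,\max} = \frac{1}{T_{cq,A} + T_{comb} + T_{setup,B}}

T_{cq,A} + T_{comb} \;\ge\; T_{hold,B}
```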

As per the above equation, for a smaller propagation delay, we need faster cells. But then, why do we need slower cells? They are needed to ensure that there are no hold time violations at flip-flop B. The complete collection of faster and slower cells forms the .lib.

Faster Cells vs Slower Cells: The load in a digital circuit is a capacitance. The faster the capacitance charges or discharges, the smaller the cell delay. However, to charge or discharge the capacitor quickly, we need transistors capable of sourcing more current, i.e. we need WIDE TRANSISTORS.

Wider transistors have less delay but consume more area and power. Narrow transistors are the other way around. Faster cells come at a cost of area and power.

Selection of the Cells: We need to guide the synthesizer to choose the flavour of cells that is optimum for implementing the logic circuit. Using too many fast cells leads to larger, power-hungry circuits and possible hold time violations, while using too many slow cells leads to sluggish circuits. We offer this guidance to the synthesizer in the form of constraints.

Below is an illustration of Synthesis.

Labs on Yosys introduction

Invoking Yosys:

Snippet below illustrates reading .lib, design and choosing the module to synthesize:

Generating Netlist: The logic of good_mux will be realizable using gates in the sky130_fd_sc_hd__tt_025C_1v80.lib file

Below is the snippet showing the synthesis results and synthesized circuit for multiplexer.

Netlist code:

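Simplified netlist code: The netlist written by a plain write_verilog contains additional attribute annotations. To get a simplified netlist, we pass the -noattr switch to write_verilog, as in the commands below.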

$ yosys
yosys> read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib 
yosys> read_verilog good_mux.v 
yosys> synth -top good_mux 
yosys> abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
yosys> show

yosys> write_verilog good_mux_netlist.v 
yosys> !vim good_mux_netlist.v 

yosys> write_verilog -noattr good_mux_netlist.v
yosys> !vim good_mux_netlist.v

Day-2-Timing libs, hierarchical, flat synthesis, efficient flop coding styles

Introduction to timing .libs

LAB- Introduction to dot Lib

This lab guides us through the .lib file, in which all the gates are described. The libraries are characterized according to the parameters below to model the variations.

Within the .lib file, the gates are declared as follows to account for variations in process, voltage and temperature.

For the above example, the delay, power and all the related parameters of the gate are listed for all 32 combinations, i.e. 2^5 (5 is the number of variables).

This image displays the power consumption comparison.

The image below shows the delay order for the different flavours of gates.

LAB- Hierarchical synthesis and flat synthesis

multiple_module

module sub_module2 (input a, input b, output y);
	assign y = a | b;
endmodule

module sub_module1 (input a, input b, output y);
	assign y = a&b;
endmodule


module multiple_modules (input a, input b, input c , output y);
wire net1;
sub_module1 u1(.a(a),.b(b),.y(net1));  //net1 = a&b
sub_module2 u2(.a(net1),.b(c),.y(y));  //y = net1|c ,ie y = a&b + c;
endmodule

This is the schematic as per the connections in the above module.

However, the Yosys synthesizer generates the following schematic instead of the above one; the gate-level connections are made within the submodules.

$ yosys
yosys> read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib 
yosys> read_verilog multiple_modules.v
yosys> synth -top multiple_modules
yosys> show multiple_modules

The synthesizer preserves the module hierarchy and does the mapping according to the instantiations. Here is the hierarchical netlist code for multiple_modules:

module multiple_modules(a, b, c, y);
	  input a;
	 input b;
	 input c;
	  wire net1;
	 output y;
  sub_module1 u1 (.a(a),.b(b),.y(net1) );
  sub_module2 u2 (.a(net1),.b(c),.y(y));
endmodule

module sub_module1(a, b, y);
 wire _0_;
 wire _1_;
 wire _2_;
 input a;
 input b;
 output y;
 sky130_fd_sc_hd__and2_0 _3_ (.A(_1_),.B(_0_),.X(_2_));
 assign _1_ = b;
 assign _0_ = a;
 assign y = _2_;
endmodule

module sub_module2(a, b, y);
wire _0_;
 wire _1_;
 wire _2_;
input a;
input b;
 output y;
 sky130_fd_sc_hd__lpflow_inputiso1p_1 _3_ (.A(_1_),.SLEEP(_0_),.X(_2_) );
 assign _1_ = b;
 assign _0_ = a;
 assign y = _2_;
endmodule

Flattened netlist:

In the flattened netlist, the hierarchies are flattened out and there is a single module, i.e. gates are instantiated directly instead of sub_modules. Here is the flattened netlist code for multiple_modules:

module multiple_modules(a, b, c, y);
	 wire _0_;
	 wire _1_;
	 wire _2_;
	 wire _3_;
	 wire _4_;
	 wire _5_;
	 input a;
	 input b;
	 input c;
	 wire net1;
	 wire \u1.a ;
	 wire \u1.b ;
	 wire \u1.y ;
	 wire \u2.a ;
	 wire \u2.b ;
	 wire \u2.y ;
	output y;
	 sky130_fd_sc_hd__and2_0 _6_ (
	  .A(_1_),
	 .B(_0_),
	 .X(_2_)
	);
	 sky130_fd_sc_hd__lpflow_inputiso1p_1 _7_ (
	  .A(_4_),
	  .SLEEP(_3_),
	  .X(_5_)
	 );
	 assign _4_ = \u2.b ;
	 assign _3_ = \u2.a ;
	 assign \u2.y  = _5_;
	 assign \u2.a  = net1;
	 assign \u2.b  = c;
	 assign y = \u2.y ;
	 assign _1_ = \u1.b ;
	 assign _0_ = \u1.a ;
	 assign \u1.y  = _2_;
	 assign \u1.a  = a;
	 assign \u1.b  = b;
	 assign net1 = \u1.y ;
	endmodule

The commands to get the hierarchical and flattened netlists are shown below:

yosys> write_verilog -noattr multiple_modules_hier.v

Executing Verilog backend. Dumping module `\multiple_modules'. Dumping module `\sub_module1'. Dumping module `\sub_module2'.

yosys> !gvim multiple_modules_hier.v

Shell command: gvim multiple_modules_hier.v

yosys> flatten

Executing FLATTEN pass (flatten design). Deleting now unused module sub_module1. Deleting now unused module sub_module2. <suppressed ~2 debug messages>

yosys> write_verilog -noattr multiple_modules_flat.v

Executing Verilog backend. Dumping module `\multiple_modules'.

yosys> !gvim multiple_modules_flat.v

Shell command: gvim multiple_modules_flat.v

This is the synthesized circuit for the flattened netlist. Here u1 and u2 are flattened and the AND and OR gates are realized directly.

Here is the synthesized circuit of sub_module1. We also synthesize at the module level so that, if a top module instantiates the same sub_module multiple times, we can synthesize it once and connect the same netlist multiple times in the top module netlist.

Another reason to synthesize at the module level and then stitch the results together is to avoid errors in a top module that is massive and consists of several sub modules. Generating a netlist for each sub module and then stitching them together at the top level is easier and reduces the risk of output mismatch.

We control which module is synthesized using the synth -top <module_name> command.

Various Flop coding styles and optimization

Why Flops and Flop coding styles

In this session, the discussion was about how to code various types of flops and various styles of coding a flop.

Why a Flop?

In a combinational circuit, the output changes after the propagation delay of the circuit once the inputs change. During the propagation of data, if there are different paths with different propagation delays, there is a chance of getting a glitch at the output.
If there are multiple combinational circuits in the design, glitches occur more often, making the output unstable.
To curb this drawback, we use flops to store the data from the combinational circuits. When a flop is used, the output of the combinational circuit is stored in it and is propagated only at the posedge or negedge of the clock, so that the next combinational circuit gets a glitch-free input, thereby stabilising the output.

We use control pins called set and reset to initialize the flop; otherwise a garbage value is sent out to the next combinational circuit. These control pins can be synchronous or asynchronous.

Lab- flop synthesis simulations

d-flipflop with asynchronous reset- Here the output q goes low whenever reset is high and will not wait for the clock's posedge, i.e irrespective of clock, the output is changed to low.

 module dff_asyncres ( input clk ,  input async_reset , input d , output reg q );
	always @ (posedge clk , posedge async_reset)
	begin
		if(async_reset)
			q <= 1'b0;
		else	
			q <= d;
	end
endmodule

Simulation:

Synthesized circuit:

d-flipflop with asynchronous set- Here the output q goes high whenever set is high and will not wait for the clock's posedge, i.e irrespective of clock, the output is changed to high.

module dff_async_set ( input clk ,  input async_set , input d , output reg q );
	always @ (posedge clk , posedge async_set)
	begin
		if(async_set)
			q <= 1'b1;
		else
			q <= d;
	end
endmodule

Simulation:

Synthesized circuit:

d-flipflop with synchronous reset- Here the output q goes low whenever reset is high and at the positive edge of the clock. Here the reset of the output depends on the clock.

module dff_syncres ( input clk , input async_reset , input sync_reset , input d , output reg q );
	always @ (posedge clk )
	begin
		if (sync_reset)
			q <= 1'b0;
		else	
			q <= d;
	end
endmodule

Simulation:

Synthesized circuit:

d-flipflop with synchronous and asynchronous reset- Here the output q goes low whenever the asynchronous reset is high, independent of the clock, and also when the synchronous reset is high and a posedge of the clock occurs.

module dff_asyncres_syncres ( input clk , input async_reset , input sync_reset , input d , output reg q );
	always @ (posedge clk , posedge async_reset)
	begin
		if(async_reset)
			q <= 1'b0;
		else if (sync_reset)
			q <= 1'b0;
		else	
			q <= d;
	end
endmodule

Simulation:

Synthesized circuit:

Interesting optimisations

This lab session deals with some interesting optimisations that the tool performs automatically based on the logic. In the example below, multiplying a number by 2 doesn't need any additional hardware: connecting the bits of a to the upper bits of y and grounding the LSB of y is enough, and this is what Yosys realizes.

module mul2 (input [2:0] a, output [3:0] y);
	assign y = a * 2;
endmodule

Synthesized circuit:

When it comes to multiplying with powers of 2, it just needs shifting as shown in the below image:

Netlist for the above schematic

Special case of multiplying a with 9. The result is shown in the below image:

The schematic for the same is shown below:

Netlist for the above schematic
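
As a hedged sketch of the multiply-by-9 case: assuming a 3-bit input and a 6-bit output (the module name mult9 and the port widths are assumptions, since the original code is not shown here), a * 9 equals (a << 3) + a. For a 3-bit a the two terms occupy different bits, so the result is just a concatenated with itself and needs only wiring. The small test bench checks this for all eight inputs.

```verilog
// Hypothetical module name and port widths, chosen for illustration.
module mult9 (input [2:0] a, output [5:0] y);
	// a * 9 = (a << 3) + a; the two terms never overlap for a 3-bit a,
	// so the product is the concatenation {a, a}.
	assign y = a * 9;
endmodule

// Self-checking test bench: compares a * 9 against {a, a} for all inputs.
module tb_mult9;
	reg  [2:0] a;
	wire [5:0] y;
	integer i;

	mult9 uut (.a(a), .y(y));

	initial begin
		for (i = 0; i < 8; i = i + 1) begin
			a = i;
			#1;
			if (y !== {a, a})
				$display("MISMATCH: a=%d y=%d", a, y);
		end
		$display("check done");
		$finish;
	end
endmodule
```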

Day-3-Combinational and sequential optimizations

Combinational logic optimization with examples

Optimising the combinational logic circuit is squeezing the logic to get the most optimized digital design so that the circuit finally is area and power efficient. This is achieved by the synthesis tool using various techniques and gives us the most optimized circuit.

Techniques for optimization:

Constant propagation, which is a direct optimization technique
Boolean logic optimization using K-map or Quine-McCluskey

Here is an example for Constant Propagation

In the above example, if we consider the transistor-level circuit of output Y, it has 6 MOS transistors, whereas an inverter needs only 2 transistors. The reduction is achieved by treating A as a constant and propagating it to the output.
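
One way to picture this in code, assuming the circuit is the AND-OR-INVERT function Y = ~((A & B) | C) with A tied to 0. This specific function is an assumption that is consistent with the 6-transistor count mentioned above, and the module names are made up for illustration.

```verilog
// Assumed example function: Y = ~((A & B) | C), a 6-transistor AOI gate in CMOS.
module const_prop_before (input a, input b, input c, output y);
	assign y = ~((a & b) | c);
endmodule

// With A tied to 0: (0 & B) = 0 and (0 | C) = C, so Y reduces to ~C,
// which is a 2-transistor inverter.
module const_prop_after (input c, output y);
	assign y = ~c;
endmodule
```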

Example for Boolean logic optimization:

Let's consider an example concurrent statement assign y=a?(b?c:(c?a:0)):(!c)

The above expression uses the ternary operator, which realizes a series of multiplexers. However, when we write the boolean expression at the output of each mux and simplify using boolean reduction techniques, the output y turns out to be just ~(a^c).
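
The reduction can be worked out step by step. Each ternary s ? p : q is written as the mux equation s·p + s'·q, and the innermost term c ? a : 0 is simply a·c:

```latex
\begin{aligned}
y &= a\,\bigl(b\,c + \bar{b}\,(a\,c)\bigr) + \bar{a}\,\bar{c} \\
  &= a\,b\,c + a\,\bar{b}\,c + \bar{a}\,\bar{c} \\
  &= a\,c\,(b + \bar{b}) + \bar{a}\,\bar{c} \\
  &= a\,c + \bar{a}\,\bar{c} \;=\; \overline{a \oplus c}
\end{aligned}
```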

Command to optimize the circuit by yosys is yosys> opt_clean -purge

Example-1

module opt_check (input a , input b , output y);
	assign y = a?b:0;
endmodule

Optimized circuit

Example-2

module opt_check2 (input a , input b , output y);
	assign y = a?1:b;
endmodule

Example-3

module opt_check3 (input a , input b, input c , output y);
	assign y = a?(c?b:0):0;
endmodule

Example-4

module opt_check4 (input a , input b , input c , output y);
	assign y = a?(b?(a & c ):c):(!c);
endmodule
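
As a quick sanity check on Examples 1 to 4, the sketch below compares each ternary expression with the reduced form one would expect from boolean simplification (a & b, a | b, a & b & c and ~(a ^ c) respectively). The test bench name and the errors counter are assumptions for illustration; the expressions are copied inline rather than instantiating the modules so the block stands on its own.

```verilog
// Exhaustive check that each ternary expression equals its reduced form.
module tb_opt_check_equiv;
	reg a, b, c;
	integer i, errors;

	initial begin
		errors = 0;
		for (i = 0; i < 8; i = i + 1) begin
			{a, b, c} = i;
			#1;
			// opt_check:  a ? b : 0  reduces to an AND gate
			if ((a ? b : 1'b0) !== (a & b)) errors = errors + 1;
			// opt_check2: a ? 1 : b  reduces to an OR gate
			if ((a ? 1'b1 : b) !== (a | b)) errors = errors + 1;
			// opt_check3: a ? (c ? b : 0) : 0  reduces to a 3-input AND
			if ((a ? (c ? b : 1'b0) : 1'b0) !== (a & b & c)) errors = errors + 1;
			// opt_check4: a ? (b ? (a & c) : c) : !c  reduces to an XNOR of a and c
			if ((a ? (b ? (a & c) : c) : (!c)) !== ~(a ^ c)) errors = errors + 1;
		end
		$display("mismatches: %0d", errors);
		$finish;
	end
endmodule
```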

Example-5: Here multiple modules are present, so we check whether those modules are actually being used by running the following commands:

yosys> read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib 
yosys> read_verilog multiple_module_opt2.v
yosys> synth -top multiple_module_opt2
yosys> abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib 
yosys> flatten
yosys> opt_clean -purge
yosys> show

module sub_module(input a , input b , output y);
	assign y = a & b;
endmodule

module multiple_module_opt2(input a , input b , input c , input d , output y);
	wire n1,n2,n3;
	sub_module U1 (.a(a) , .b(1'b0) , .y(n1));
	sub_module U2 (.a(b), .b(c) , .y(n2));
	sub_module U3 (.a(n2), .b(d) , .y(n3));
	sub_module U4 (.a(n3), .b(n1) , .y(y));
endmodule

Before Flatten

After Flatten

Example-6

	module sub_module1(input a , input b , output y);
	 assign y = a & b;
	endmodule

	module sub_module2(input a , input b , output y);
	 assign y = a^b;
	endmodule

	module multiple_module_opt(input a , input b , input c , input d , output y);
	wire n1,n2,n3;
	sub_module1 U1 (.a(a) , .b(1'b1) , .y(n1));
	sub_module2 U2 (.a(n1), .b(1'b0) , .y(n2));
	sub_module2 U3 (.a(b), .b(d) , .y(n3));

	assign y = c | (b & n1); 
	endmodule

Before Flatten

After Flatten

Sequential Logic Optimization with examples

Below are the various techniques used for sequential logic optimisations:

Basic
- Sequential constant propagation

Advanced
- State optimisation
- Retiming
- Sequential Logic Cloning (Floor Plan Aware Synthesis)

Basic

Sequential constant propagation- Here only the first logic can be optimized, as the output of its flop is always zero. For the second flop, however, the output changes continuously, therefore it cannot be used for constant propagation.

Advanced

State Optimisation: This is the optimisation of unused states. Using this technique we can come up with the most optimised state machine.

Cloning: This is done when performing PHYSICAL AWARE SYNTHESIS. Let's consider a flop A which is connected to flop B and flop C through combinational logic. If B and C are placed far from A in the floorplan, there is a routing path delay. To avoid this, we connect A to two intermediate flops and then from these flops the output is sent to B and C, thereby decreasing the delay. This process is called cloning since we are generating two new flops with the same functionality as A.

Retiming: Retiming is a powerful sequential optimization technique used to move registers across the combinational logic or to optimize the number of registers to improve performance via power-delay trade-off, without changing the input-output behavior of the circuit.

Example-1
Here flop will be inferred as the output is not constant.

module dff_const1(input clk, input reset, output reg q);
	always @(posedge clk, posedge reset)
	begin
		if(reset)
			q <= 1'b0;
		else
			q <= 1'b1;
	end
endmodule

Simulation

Synthesis
In the synthesis report, we'll see that a Dflop was inferred in this example.

Example-2
Here flop will not be inferred as the output is always high.

module dff_const2(input clk, input reset, output reg q);
	always @(posedge clk, posedge reset)
	begin
		if(reset)
			q <= 1'b1;
		else
			q <= 1'b1;
	end
endmodule

Simulation

Synthesis

Example-3

	module dff_const3(input clk, input reset, output reg q);
	reg q1;

	always @(posedge clk, posedge reset)
	begin
		if(reset)
		begin
			q <= 1'b1;
			q1 <= 1'b0;
		end
		else
		begin
			q1 <= 1'b1;
			q <= q1;
		end
	end
	endmodule

Simulation

Synthesis

Example-4

	module dff_const4(input clk, input reset, output reg q);
	reg q1;

	always @(posedge clk, posedge reset)
	begin
		if(reset)
		begin
			q <= 1'b1;
			q1 <= 1'b1;
		end
	else
		begin
			q1 <= 1'b1;
			q <= q1;
		end
	end
	endmodule

Simulation

Synthesis

Example-5

	module dff_const5(input clk, input reset, output reg q);
	reg q1;
	always @(posedge clk, posedge reset)
		begin
			if(reset)
			begin
				q <= 1'b0;
				q1 <= 1'b0;
			end
		else
			begin
				q1 <= 1'b1;
				q <= q1;
			end
		end
	endmodule

Simulation

Synthesis

Sequential optimisation of unused outputs

Example-1

	module counter_opt (input clk , input reset , output q);
	reg [2:0] count;
	assign q = count[0];
	always @(posedge clk ,posedge reset)
	begin
		if(reset)
			count <= 3'b000;
		else
			count <= count + 1;
	end
	endmodule

Synthesis

Updated counter logic-

module counter_opt (input clk , input reset , output q);
	reg [2:0] count;
	assign q = {count[2:0]==3'b100};
	always @(posedge clk ,posedge reset)
	begin
	if(reset)
		count <= 3'b000;
	else
		count <= count + 1;
	end
endmodule

Synthesis

All the other cells in the synthesized circuit are there to increment the counter; the output q itself comes only from the three-input NOR gate.

Day-4-GLS,blocking vs non-blocking and Synthesis-Simulation mismatch

GLS, Synthesis-Simulation mismatch and Blocking, Non-blocking statements

GLS Concepts And Flow Using Iverilog

What is GLS- Gate Level Simulation?:
GLS means generating the simulation output by running the test bench with the netlist file produced by synthesis as the design under test. The netlist is logically the same as the RTL code, therefore the same test bench can be used for it.

Why GLS?:
We perform GLS to verify the logical correctness of the design after synthesis. If the gate-level models are timing-aware, it can also be used to check that the timing of the design is met.

The picture below gives an insight into the procedure. When using iverilog, we also include the gate-level Verilog models to run the GLS.

Synthesis Simulation Mismatch

There are three main reasons for Synthesis Simulation Mismatch:

Missing sensitivity list in always block
Blocking vs Non-Blocking Assignments
Non standard Verilog coding

Missing sensitivity list in always block:

If we consider Example-2, we can see that only sel is mentioned in the sensitivity list. In RTL simulation the always block is evaluated only when sel changes, so y ignores changes on i0 and i1 and the waveform resembles a latched output. The synthesizer, however, looks only at the statements within the procedural block and not at the sensitivity list, so it infers a normal multiplexer.

As a result, the netlist simulates as a correct mux while the RTL does not, and there is a synthesis simulation mismatch.

To catch such mismatches, it is very important to first check the behaviour of the circuit in RTL simulation, then run the same test bench on the netlist and make sure that the two match. This is why we use GLS.

Blocking vs Non-Blocking Assignments:

Blocking statements (=) execute in the order in which they are written inside the always block. Non-blocking statements (<=) evaluate all the right-hand sides when the always block is entered and then assign the values to the left-hand sides together. Improper use of blocking statements can cause a mismatch, because the simulation may behave as if an extra flop or latch were present. See Example-4.
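
A small sketch of how the two assignment types differ in a clocked block. The module names are hypothetical and the example is a generic two-stage register, not one of the lab files.

```verilog
// Blocking assignments: q0 is updated first and its NEW value is copied
// to q in the same clock edge, so q simply follows d after one edge.
// The intermediate register q0 is redundant and only one flop is needed for q.
module shift_blocking (input clk, input d, output reg q);
	reg q0;
	always @ (posedge clk)
	begin
		q0 = d;
		q  = q0;
	end
endmodule

// Non-blocking assignments: both right-hand sides are sampled before any
// update, so q receives the OLD value of q0. This is a true two-flop shift register.
module shift_nonblocking (input clk, input d, output reg q);
	reg q0;
	always @ (posedge clk)
	begin
		q0 <= d;
		q  <= q0;
	end
endmodule
```

This is why non-blocking assignments are the usual choice for sequential logic: the result does not depend on the order of the statements.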

Lab- GLS Synth Sim Mismatch

Example-1 There is no mismatch in this example, as the netlist simulation and RTL simulation waveforms are the same.

module ternary_operator_mux (input i0 , input i1 , input sel , output y);
	assign y = sel?i1:i0;
endmodule

Simulation

Synthesis

Netlist Simulation

Example-2

module bad_mux (input i0 , input i1 , input sel , output reg y);
	always @ (sel)
	begin
		if(sel)
			y <= i1;
		else 
			y <= i0;
	end
endmodule

Simulation

Synthesis

Netlist Simulation

MISMATCH
The netlist simulation behaves as a correct mux, whereas the RTL simulation of bad_mux only updates the output when sel changes. For a mux to work properly, the always block should be sensitive to all the input signals.

Example-3

module good_mux (input i0 , input i1 , input sel , output reg y);
	always @ (*)
	begin
		if(sel)
			y <= i1;
		else 
			y <= i0;
	end
endmodule

Simulation

Synthesis

Netlist Simulation

Lab- Synthesis simulation mismatch blocking statement

In simulation, the output d depends on the previous value of x (which itself depends on a and b), so the circuit appears to contain a flop.

Example-4

module blocking_caveat (input a , input b , input  c, output reg d); 
reg x;
always @ (*)
	begin
	d = x & c;
	x = a | b;
end
endmodule

Simulation

Synthesis

Netlist Simulation

MISMATCH

The netlist simulation shows how the circuit should behave, i.e. d = (a | b) & c. The RTL simulation behaves differently because the blocking statements are written in the wrong order, so d is computed from the old value of x. This synthesis simulation mismatch is exposed by GLS, which lets us compare the netlist simulation against the RTL simulation.
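
A minimal sketch of a version that avoids the mismatch, assuming the intended function is d = (a | b) & c. The module name blocking_fixed is made up for illustration. Only the order of the two blocking statements changes.

```verilog
module blocking_fixed (input a , input b , input c , output reg d);
	reg x;
	always @ (*)
	begin
		x = a | b;	// compute the intermediate value first
		d = x & c;	// then use the freshly computed x
	end
endmodule
```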

Day-5- if, case, for loop and for generate

If and Case constructs

If construct

The if construct is mainly used to create priority logic. In a nested if else construct, the conditions are given priority from top to bottom. If a condition is satisfied, its statements are executed and execution leaves the block. If the condition fails, the next condition is checked, and so on, as shown below.

Syntax for nested if else

if (<condition 1>)
begin
-----------
-----------
end
else if (<condition 2>)
begin
-----------
-----------
end
else if (<condition 3>)
.
.
.

Dangers with IF:

Bad coding style, i.e. using an incomplete if else construct, will infer a latch. We definitely don't want an unintended latch in a combinational circuit. When an incomplete construct is used and all the conditions fail, the output holds its previous value, which requires a latch, so we don't get the desired combinational output.

This can be shown in below example:

Case construct

Syntax

case(statement)
  case1: begin
       --------
	 --------
	 end
 case2: begin
	     --------
	 --------
	 end
 default:
 endcase

In the case construct, the expression is compared against the case items, and the statements of the matching item are executed. If there is no match, the default statement is executed. The items are compared in order from top to bottom and only the first match is executed, so overlapping case items make the result depend on the order in which they are written and should be avoided.

Caveats in Case
Caveats in case occur due to two reasons. One is incomplete case statements and the other is partial assignments in case statements.

Lab- Incomplete IF

This incomplete if construct infers a D-latch on output y, with i1 as the data input and i0 as the enable.
Example-1

module incomp_if (input i0 , input i1 , input i2 , output reg y);
always @ (*)
begin
	if(i0)
		y <= i1;
end
endmodule

Simulation

Synthesis

Example-2
The code below is equivalent to two 2:1 muxes with i0 and i2 as select lines and i1 and i3 as inputs respectively. Here as well, the output is held in a latch whose enable is the OR of i0 and i2.

module incomp_if2 (input i0 , input i1 , input i2 , input i3, output reg y);
	always @ (*)
	begin
		if(i0)
			y <= i1;
		else if (i2)
			y <= i3;
	end
endmodule

Simulation

Synthesis

Lab- incomplete overlapping Case

Example-1
This is an example of an incomplete case: the other two combinations of sel, 10 and 11, are not covered. Synthesis therefore infers a latch that holds y for those values of sel, instead of a purely combinational multiplexer.

module incomp_case (input i0 , input i1 , input i2 , input [1:0] sel, output reg y);
	always @ (*)
	begin
	case(sel)
		2'b00 : y = i0;
		2'b01 : y = i1;
	endcase
	end
endmodule

Simulation

Synthesis

Example-2- Complete case

This case statement is complete because a default item is given. If none of the listed items matches, the default statement is executed, so y is assigned for every value of sel and no latch is inferred.

module comp_case (input i0 , input i1 , input i2 , input [1:0] sel, output reg y);
always @ (*)
begin
	case(sel)
		2'b00 : y = i0;
		2'b01 : y = i1;
		default : y = i2;
	endcase
end
endmodule

Simulation

Synthesis

Example-3
In the example below, y is assigned in every case item, so it has a defined value for all values of sel and no latch is inferred for y. However, x is not assigned when sel is 01, so a latch is inferred for x.

module partial_case_assign (input i0 , input i1 , input i2 , input [1:0] sel, output reg y , output reg x);
always @ (*)
begin
	case(sel)
		2'b00 : begin
			y = i0;
			x = i2;
			end
		2'b01 : y = i1;
		default : begin
	         	  x = i1;
			  y = i2;
		 	 end
	endcase
end
endmodule

Simulation

Synthesis
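
One common way to avoid the latch on x is to give every output a default value before the case statement. The sketch below is a hypothetical variant of partial_case_assign, not one of the lab files; the choice of i1 as the value of x when sel is 01 is an assumption made only for illustration.

```verilog
// Hypothetical variant of partial_case_assign (not from the lab files).
module partial_case_assign_fixed (input i0, input i1, input i2, input [1:0] sel,
                                  output reg y, output reg x);
always @ (*)
begin
	// Default values: both outputs are assigned before the case,
	// so an item that skips x or y no longer implies storage.
	x = i1;
	y = i2;
	case (sel)
		2'b00 : begin
			y = i0;
			x = i2;
		end
		2'b01 : y = i1;   // x keeps its default value i1 (assumed choice)
	endcase
end
endmodule
```

For sel values 10 and 11 the defaults apply, which matches the default item of the original module.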

Example-4-Bad case construct

module bad_case (input i0 , input i1, input i2, input i3 , input [1:0] sel, output reg y);
always @(*)
begin
	case(sel)
		2'b00: y = i0;
		2'b01: y = i1;
		2'b10: y = i2;
		2'b1?: y = i3;
		//2'b11: y = i3;
	endcase
end
endmodule

Simulation

Synthesis

Netlist Simulation

The RTL simulation waveform and the netlist simulation waveform differ here: the ambiguous item 2'b1? is handled differently by the simulator and by the synthesis tool, so the netlist does not behave like the RTL. Case items like this should be avoided in the code.
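
One way to avoid the problem seen in bad_case is to list each value of sel exactly once. The sketch below is a hypothetical rewrite (the module name good_case is not part of the lab files), offered as one possible fix rather than the course's official one.

```verilog
// Hypothetical rewrite of bad_case (not from the lab files): each sel value appears once.
module good_case (input i0, input i1, input i2, input i3, input [1:0] sel, output reg y);
always @ (*)
begin
	case (sel)
		2'b00: y = i0;
		2'b01: y = i1;
		2'b10: y = i2;
		2'b11: y = i3;   // explicit item instead of the wildcard 2'b1?
	endcase
end
endmodule
```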

For Loop and For Generate

For Loop

A for loop is used inside an always block.
It is used only for evaluating expressions, not for instantiating hardware.

Generate For loop

A generate for loop is used for instantiating hardware.
It must be used only outside an always block.

A for loop can be used to describe larger circuits such as a 256:1 multiplexer or a 1:256 demultiplexer, where the coding style used for a smaller mux is not feasible and is prone to human error, because a huge number of combinations would have to be written out.

A generate for loop can be used to instantiate any number of submodules within a top module. For example, for a 32-bit ripple carry adder, instead of instantiating 32 full adders by hand, we can write a generate for loop and connect the full adders appropriately.

Lab- For and For Generate

Example-1- Mux using generate
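Here a for loop is used to design a 4:1 mux. This can also be written using a case or if-else block; however, for a large mux, the for loop style is the only practical one. Despite the module name, the loop is an ordinary for loop inside an always block, not a generate block.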

module mux_generate (input i0 , input i1, input i2 , input i3 , input [1:0] sel  , output reg y);
wire [3:0] i_int;
assign i_int = {i3,i2,i1,i0};
integer k;
always @ (*)
begin
	for(k = 0; k < 4; k=k+1) begin
		if(k == sel)
			y = i_int[k];
	end
end
endmodule

Simulation

Synthesis

Netlist Simulation
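
To illustrate how the same loop scales to a larger mux, here is a minimal sketch with the width as a parameter. The module name mux_n, the parameters N and SEL_W, and the default value of y are assumptions for illustration and are not part of the lab files.

```verilog
// Hypothetical generalization of mux_generate (not from the lab files).
module mux_n #(parameter N = 8, parameter SEL_W = 3)
              (input [N-1:0] in, input [SEL_W-1:0] sel, output reg y);
integer k;
always @ (*)
begin
	y = 1'b0;                          // default, so y is assigned even if sel >= N
	for (k = 0; k < N; k = k + 1) begin
		if (k == sel)
			y = in[k];                 // only the iteration with k == sel drives y
	end
end
endmodule
```

Changing N and SEL_W together (for example N = 256, SEL_W = 8) resizes the mux without touching the loop body.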

Example-2-Demux using Case

module demux_case (output o0 , output o1, output o2 , output o3, output o4, output o5, output o6 , output o7 , input [2:0] sel  , input i);
reg [7:0]y_int;
assign {o7,o6,o5,o4,o3,o2,o1,o0} = y_int;
integer k;
always @ (*)
begin
	y_int = 8'b0;
	case(sel)
		3'b000 : y_int[0] = i;
		3'b001 : y_int[1] = i;
		3'b010 : y_int[2] = i;
		3'b011 : y_int[3] = i;
		3'b100 : y_int[4] = i;
		3'b101 : y_int[5] = i;
		3'b110 : y_int[6] = i;
		3'b111 : y_int[7] = i;
	endcase
end
endmodule

Simulation

Synthesis

Netlist Simulation

Example-3-Demux using Generate

The code in the above example is long, and there is a chance of human error while writing it. Using a for loop, as shown below, largely eliminates this drawback. Note that, despite the module name, this is an ordinary for loop inside an always block, not a generate block.

module demux_generate (output o0 , output o1, output o2 , output o3, output o4, output o5, output o6 , output o7 , input [2:0] sel  , input i);
reg [7:0]y_int;
assign {o7,o6,o5,o4,o3,o2,o1,o0} = y_int;
integer k;
always @ (*)
begin
	y_int = 8'b0;
	// k = k + 1 is used instead of k++, which is SystemVerilog syntax
	for(k = 0; k < 8; k = k + 1) begin
		if(k == sel)
			y_int[k] = i;
	end
end
endmodule

Simulation

Synthesis

Netlist Simulation

Example-4- Ripple carry adder using fulladder

In this ripple carry adder example, instead of instantiating the full adder eight times by hand, a generate for loop instantiates it seven times (bits 1 to 7); only the first full adder (bit 0) is instantiated separately, with its carry-in tied to 0. Using the same code, just by changing the bus sizes and the loop bound, we can design a ripple carry adder of any required size.

module rca (input [7:0] num1 , input [7:0] num2 , output [8:0] sum);
wire [7:0] int_sum;
wire [7:0]int_co;

genvar i;
generate
	for (i = 1 ; i < 8; i=i+1) begin
		fa u_fa_1 (.a(num1[i]),.b(num2[i]),.c(int_co[i-1]),.co(int_co[i]),.sum(int_sum[i]));
	end

endgenerate
fa u_fa_0 (.a(num1[0]),.b(num2[0]),.c(1'b0),.co(int_co[0]),.sum(int_sum[0]));


assign sum[7:0] = int_sum;
assign sum[8] = int_co[7];
endmodule

module fa (input a , input b , input c, output co , output sum);
assign {co,sum} =a+b+c;
endmodule

Simulation

Synthesis

Netlist Simulation
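
As a sketch of how the adder could be resized without editing the loop, the width can be made a parameter. The names rca_n, fa_bit, N and g_fa below are assumptions for illustration and are not part of the lab files; the carry-in is folded into the carry vector so that all N full adders come from the generate loop.

```verilog
// Hypothetical parameterized version of rca (not from the lab files).
module rca_n #(parameter N = 8)
              (input [N-1:0] num1, input [N-1:0] num2, output [N:0] sum);
wire [N:0] carry;            // carry[0] is the carry-in, carry[N] the carry-out
assign carry[0] = 1'b0;

genvar i;
generate
	for (i = 0; i < N; i = i + 1) begin : g_fa
		fa_bit u_fa (.a(num1[i]), .b(num2[i]), .c(carry[i]),
		             .co(carry[i+1]), .sum(sum[i]));
	end
endgenerate

assign sum[N] = carry[N];
endmodule

// One-bit full adder with the same behavior as fa; renamed so the sketch is self-contained.
module fa_bit (input a, input b, input c, output co, output sum);
assign {co, sum} = a + b + c;
endmodule
```

Instantiating it as rca_n #(.N(32)) would give a 32-bit adder from the same source.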

Word of Thanks

I sincerely thank Mr. Kunal Ghosh (Founder, VSD) for helping me complete this flow smoothly.

Acknowledgement

Kunal Ghosh, VSD Corp. Pvt. Ltd.
Skywater Foundry
ChatGPT
Kanish R, Colleague, IIIT B
Sumanto Kar, VSD Corp.
DantuNandini, Senior, IIIT B
Mariam Rakka
Nanditha Rao, Professor, IIITB
Madhav Rao, Professor, IIITB
Manikandan, Professor, IIITB
