This project is done in the course ["Advanced Physical Design using OpenLANE/Sky130"]
Introduction
With the introduction of open-source technology for chip creation, many RTL designs and EDA tools were made available for free. The [SKY130 PDK](https://skywater-pdk.readthedocs.io/en/latest/rules.html) from Skywater Technologies and Google fills the remaining gap in fully open-source chip development. However, there were a number of EDA tools with distinct functions throughout the design cycle, the design flow was not clear, and the Skywater PDK was only compatible with industrial tools. These problems were addressed by [OpenLane](https://github.com/The-OpenROAD-Project/OpenLane), which offers a fully automated and clean RTL-to-GDSII flow. OpenLane is not a single tool; rather, it is a flow made up of a number of EDA tools, automation scripts, and Skywater PDK files that have been optimized for use with open-source EDA tools.
Overall Design Flow
An RTL design is created from a design specification using HDLs like Verilog or VHDL, or using high-level synthesis tools such as SystemC, MATLAB HDL Coder, Bluespec, etc. The process of converting the RTL netlist into a manufacturable IC then starts, and is known as the Physical Design Flow. Floorplanning, which entails placing the preplaced cells, power planning, etc., comes first in the physical design process. Placement of the synthesized logic cells comes next. CTS (Clock Tree Synthesis) is then performed so that the clock skew is minimal or under the required threshold. Following CTS, all of the placed components are routed. Static Timing Analysis (STA) is run between the steps of the physical design flow, from logic synthesis through routing, to analyze the design at each stage and confirm that it meets timing. Magic is an open-source tool for viewing the layout at every stage. You can also extract a netlist from the layout, run a post-layout SPICE simulation using ngspice, and compare the results.
OpenLane Flow
The RTL-level design is synthesized using a logic synthesizer. We use Yosys, which is an open-source logic synthesizer. The RTL is converted into a synthesized netlist that contains details about the standard cells and their implementations. Yosys takes the RTL design, the timing .libs and the Verilog models of the standard cells and converts them into a gate-level netlist. abc does the technology mapping to the required skywater-pdk variants.
Different strategies can be used to synthesize for either the least area or the best timing. To analyse this, the synthesis exploration utility generates a report showing the effect on delays/timing/area etc.
Design exploration is used to sweep the design configuration and generate reports with different metrics so that the best configuration can be selected. It is also used for regression testing.
Design for Test (DFT) insertion is an optional step carried out by Fault. It is used to test the design.
This is done by the OpenROAD flow. The macros and IPs are placed in the core before proceeding further; this is called pre-placement. Floorplanning is done separately for the macros and is called macro floorplanning. Macros are placed so that they are close to the inputs/outputs/other macros with which they share the most connections. Then, to prevent loading effects, decoupling capacitors are placed so that the logic states stay well within the noise margin.
When several blocks tap power from a single source, there is a problem of Voltage Droop at Vdd and Ground Bounce at Vss, which can again push the logic out of the required noise margin into the undefined state. To mitigate this, Vdd and Vss are laid out as horizontal and vertical straps across the chip so that the blocks can tap power from the nearest source.
After the macros are in place, the remaining logic (the standard cells) is placed optimally. Placement is done in two steps
- Global Placement - finds the optimal position for each cell. These positions are not necessarily legal; cells may overlap
- Detailed Placement - after global placement, minimal alterations are made to legalize the cell positions
To ensure minimum skew, the clock is routed optimally through the circuit using different algorithms. This is done in the OpenROAD flow by TritonCTS.
Long wires act as antennas and accumulate charge during fabrication, which can damage the transistor gates. To avoid this, bridging is used to pass the wire through different layers, or an antenna diode cell is added to leak away the charge
- OpenLane approach - Insert a fake diode at every cell input during placement. The fake diode matches the footprint of the library's antenna diode cell. The antenna checker is then run; if there are violations, the fake diode is swapped with a real one.
- OpenROAD approach - In the global route step, antenna violations are addressed automatically by inserting an antenna diode.
OpenLane allows the user to choose either of the above approaches
This step is used to implement the interconnect using the different metal layers specified in the PDK. There are two steps
- Global Routing - This is done inside the OpenROAD flow (FastRoute)
- Detailed Routing - This is performed using TritonRoute outside the OpenROAD flow after the global routing. Before performing this step, the Logic Equivalence Check is performed by Yosys, since OpenROAD makes some optimisations to the circuit.
From the .def file, parasitic extraction is done to generate the .spef file (Standard Parasitic Exchange Format), which gives an accurate analog model of the circuit by including parasitic effects such as wire resistance and capacitance.
At this stage again OpenSTA is used to perform the Static Timing Analysis.
- Design Rule Check (DRC) is performed by Magic
- Layout Versus Schematic (LVS) is performed by Netgen
The routed .def file is used by Magic to generate the GDSII file.
Refer to Kanish R1 GitHub or the [OpenLane build script by Nickson Jose](https://github.com/nickson-jose/openlane_build_script) for OpenLane installation and environment setup. If the installation is carried out on a virtual machine/Linux, the same repository can be used for reference.
First go to the openlane directory and open terminal
make mount
The terminal changes into the docker instance. Open the OpenLane in interactive mode.
./flow.tcl -interactive
Set the package required by OpenLane
package require openlane 0.9
Run the synthesis
run_synthesis
OpenLane invokes the following
Yosys
- RTL synthesis; maps the design to Yosys generic cells
abc
- Technology mapping with the Skywater130 PDK. Here the sky130_fd_sc_hd library (high-density standard cells produced by Skywater Foundry) is used.
OpenSTA
- Performs Static Timing Analysis on the netlist generated after synthesis and generates the timing reports
View the synthesis statistics
- The flop ratio is defined as the ratio of the number of flops to the total number of cells
- Here flop ratio is 1596/10104 = 0.1579 (i.e: 15.8%) [From the synthesis statistics]
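The flop ratio arithmetic above can be reproduced with a few lines of Python. This is only a sketch; the function name `flop_ratio` is illustrative, and the two counts are the ones quoted from the synthesis statistics.

```python
def flop_ratio(num_flops: int, num_cells: int) -> float:
    """Return the fraction of cells in the netlist that are flip-flops."""
    if num_cells <= 0:
        raise ValueError("num_cells must be positive")
    return num_flops / num_cells


# Counts quoted above from the synthesis statistics.
ratio = flop_ratio(1596, 10104)
print(f"flop ratio = {ratio:.4f} ({ratio * 100:.1f}%)")
```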
Chip Floor Planning Consideration
- The ratio of area occupied by the cells in the netlist to the total area of the core
- Best practice is to set the utilisation factor less than 50% so that there will be space for optimisations, routing, inserting buffers etc.,
- Aspect ratio is the ratio of height to the width of the die.
- Aspect Ratio of 1 indicates that the die is a square die
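As a quick illustration of the two definitions above, the sketch below computes both quantities. The core dimensions and the cell area are hypothetical numbers chosen only for the example, not values from the picorv32a run.

```python
def utilization_factor(cell_area: float, core_area: float) -> float:
    """Area occupied by the netlist cells divided by the total core area."""
    return cell_area / core_area


def aspect_ratio(height: float, width: float) -> float:
    """Height divided by width; 1.0 means a square."""
    return height / width


# Hypothetical example: a 1000 um x 1000 um core holding 450000 um^2 of cells.
core_height, core_width = 1000.0, 1000.0
util = utilization_factor(450_000.0, core_height * core_width)
ratio = aspect_ratio(core_height, core_width)
print(f"utilization = {util:.0%}, aspect ratio = {ratio}")
```

With these assumed numbers the utilisation stays below the 50% guideline, leaving room for buffers and routing.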
Floorplanning involves the following stages
- Whenever there is complex logic that is repeated multiple times, or a design supplied by a third party, it can be treated as an abstract black box with input and output ports, clocks, etc.
- These modules can be either macros or IPs
  - Macro - a module, such as a CPU core, that is developed by the entity fabricating the chip
  - IP - an "Intellectual Property" block that the entity fabricating the chip gets as a package from a third party, or a packaged hard IP developed by the same entity. Common examples of IPs are SRAM, PLL, protocol converters, etc.
- These macros and IPs are placed in the core first, before placing the standard cells and power planning
- They are placed optimally, such that blocks that are heavily connected to each other sit close together and are oriented towards their inputs and outputs
- The power lines can have some RLC component, causing the voltage to drop at the node where it enters the block, or the ground of the cell can sit at a higher potential than the ideal 0V
- When this happens, there is a chance that the logic transitions do not reach the upper or lower noise margins but land in the forbidden region, causing the circuit to misbehave
- This is prevented by adding a capacitor in parallel with the power and ground nodes of the block, so that the capacitor decouples the block from the power source whenever there is a logic transition
- When several cells or blocks draw power from the same power rail and sink current to the same ground pin, the following effects are observed
- Whenever a large number of cells switch from 0 to 1 simultaneously, they all draw charge from the supply, and there is a Voltage Droop on the power lines as the voltage drops below Vdd
- Whenever a large number of cells switch from 1 to 0 simultaneously, they all discharge into ground, causing the ground potential to rise above 0V; this is called Ground Bounce
- These effects pose a risk of driving the logic state out of the specified noise margin.
- To avoid this, Vdd and Gnd are laid out as a grid of horizontal and vertical tracks, and a cell near an intersection can draw power from, or sink current to, the nearest Vdd or Gnd intersection respectively
- The input, output and clock pins are placed optimally so that routing is less complicated or delay is optimised
- There are different styles of pin placement in OpenLane, such as random pin placement, uniformly spaced, etc.
Floorplan run on OpenLANE & review layout in Magic
Floorplan environment variables or switches:
- FP_CORE_UTIL - core utilization percentage
- FP_ASPECT_RATIO - the core's aspect ratio
- FP_CORE_MARGIN - the length of the margin surrounding the core area
- FP_IO_MODE - defines pin configurations around the core (1 = randomly equidistant / 0 = not equidistant)
- FP_CORE_VMETAL - vertical metal layer where I/O pins are placed
- FP_CORE_HMETAL - horizontal metal layer where I/O pins are placed
Note: Usually, the parameter values for the vertical metal layer and horizontal metal layer will be 1 more than that specified in the files
Important files in increasing priority order:
- floorplan.tcl - system default settings
- config.tcl
- sky130A_sky130_fd_sc_hd_config.tcl
To run the picorv32a floorplan in openLANE:
run_floorplan
After the floorplan run, a .def file will have been created within the results/floorplan directory. We may review the floorplan defaults by checking floorplan.tcl.
The system defaults will have been overridden by switches set in config.tcl, and further overridden by switches set in sky130A_sky130_fd_sc_hd_config.tcl.
To view the floorplan, Magic is invoked after moving to the results/floorplan directory:
magic -T /home/parallels/OpenLane/vsdstdcelldesign/libs/sky130A.tech lef read tmp/merged.nom.lef def read results/floorplan/picorv32a.def &
One can zoom into Magic layout by selecting an area with left and right mouse click followed by pressing "z" key.
Various components can be identified by using the what command in tkcon window after making a selection on the component.
Zooming in also provides a view of decaps present in picorv32a chip.
The standard cell can be found at the bottom left corner.
You can clearly see I/O pins, Decap cells and Tap cells. Tap cells are placed in a zig zag manner or you can say diagonally
Library Binding and Placement
First we need to bind the netlist to physical cells. For practice purposes we draw distinct shapes for OR, AND and every other gate, but in reality cells do not have such shapes; each cell has physical dimensions, a rectangle or square with a given height and width. This information is given in the libs and LEFs. We then place these cells in our design by initialising it.
The next step is placement. Once the design is initialised, the logic cells of the netlist, with their physical dimensions, are placed on the floorplan. Placement is performed in 2 stages:
Global Placement: Cells are placed in approximately optimal positions, which may not be legal, and cells may overlap. Optimization is done by reducing the half-perimeter wire length (HPWL).
Detailed Placement: It alters the positions of cells after global placement so as to legalise them. Legalisation of cells is important from a timing point of view.
Optimization is the stage where we estimate wire length and capacitance and, based on that, add buffers. Ideally, optimization is done for better timing.
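To make the half-perimeter wire length metric used by global placement concrete, here is a minimal sketch. The net names and pin coordinates are hypothetical; a real placer evaluates this cost over every net in the design.

```python
def hpwl(pins):
    """Half-perimeter wire length of one net.

    `pins` is a list of (x, y) pin positions; the result is the width plus
    the height of the smallest bounding box that encloses all of them.
    """
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


# Hypothetical nets, coordinates in micrometres.
nets = {
    "n1": [(0.0, 0.0), (3.0, 4.0)],
    "n2": [(1.0, 1.0), (2.0, 5.0), (6.0, 2.0)],
}
total = sum(hpwl(pins) for pins in nets.values())
print(f"total HPWL = {total} um")
```

Moving connected cells closer together shrinks each bounding box, which is why minimising this sum is a reasonable proxy for wire length.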
Post placement, the design can be viewed on magic within results/placement directory:
magic -T /home/parallels/OpenLane/vsdstdcelldesign/libs/sky130A.tech lef read tmp/merged.nom.lef def read results/floorplan/picorv32a.def &
Note: Power distribution network generation is usually a part of the floorplan step. However, in the OpenLANE flow, floorplan does not generate the PDN; it is created after the post-CTS step. The order is: floorplan, placement, CTS, post-CTS and then PDN.
From logic synthesis to routing and STA, each and every stage has one thing in common: logic gates/logic cells. In order for the tools to understand these gates and their timing, we need to characterize these cells.
The library is where we get information about every cell. It has different cells with different sizes, functionality and threshold voltages. A typical cell design flow has the following steps.
- Inputs : PDKS(process design kit) : DRC & LVS, SPICE Models, library & user-defined specs.
- Design Steps :Circuit design, Layout design (Art of layout Euler's path and stick diagram), Extraction of parasitics, Characterization (timing, noise, power).
- Outputs: CDL (circuit description language), LEF, GDSII, extracted SPICE netlist (.cir), timing, noise and power .lib files
A typical standard cell characterization flow that is followed in the industry includes the following steps:
- Read in the models and tech files
- Read extracted spice Netlist
- Recognise behavior of the cells
- Read the subcircuits
- Attach power sources
- Apply stimulus to characterization setup
- Provide necessary output capacitance loads
- Provide necessary simulation commands
Now all these 8 steps are fed in together as a configuration file to a characterization software called GUNA. This software generates timing, noise, power models. These .libs are classified as Timing characterization, power characterization and noise characterization.
In standard cell characterisation, one class of .lib data is timing characterisation.
| Timing definition | Value |
|---|---|
| slew_low_rise_thr | 20% value |
| slew_high_rise_thr | 80% value |
| slew_low_fall_thr | 20% value |
| slew_high_fall_thr | 80% value |
| in_rise_thr | 50% value |
| in_fall_thr | 50% value |
| out_rise_thr | 50% value |
| out_fall_thr | 50% value |
Propagation Delay: The time difference between when the input transition reaches 50% of its final value and when the output reaches 50% of its final value. A poor choice of threshold values can lead to negative delay values. Even with good threshold values, the delay may still come out positive or negative depending on how good or bad the slew is.
Propagation delay = time(out_thr) - time(in_thr)
Transition Time
The time it takes the signal to move between states is the transition time, measured between 10% and 90% or between 20% and 80% of the signal levels.
Rise transition time = time(slew_high_rise_thr) - time(slew_low_rise_thr)
Fall transition time = time(slew_low_fall_thr) - time(slew_high_fall_thr)
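The threshold definitions above can be applied to a sampled waveform as in the following sketch. The helper `crossing_time` and the idealised piecewise-linear waveforms are assumptions made for illustration; they are not ngspice output. The fall transition is taken from the 80% crossing to the 20% crossing so that it comes out positive.

```python
def crossing_time(times, values, level, rising):
    """First time the waveform crosses `level` in the given direction.

    Linear interpolation is used between the two samples around the crossing.
    """
    for i in range(1, len(times)):
        v0, v1 = values[i - 1], values[i]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            t0, t1 = times[i - 1], times[i]
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


# Idealised inverter waveforms (hypothetical): time in ns, voltage in V.
VDD = 1.0
t = [0.0, 1.0, 2.0, 3.0, 4.0]
vin = [0.0, 1.0, 1.0, 1.0, 1.0]   # input rises
vout = [1.0, 1.0, 0.5, 0.0, 0.0]  # output falls

# Propagation delay = time(out_fall_thr) - time(in_rise_thr), both at 50%.
propagation_delay = (crossing_time(t, vout, 0.5 * VDD, rising=False)
                     - crossing_time(t, vin, 0.5 * VDD, rising=True))

# Fall transition = time at the 20% crossing - time at the 80% crossing.
fall_transition = (crossing_time(t, vout, 0.2 * VDD, rising=False)
                   - crossing_time(t, vout, 0.8 * VDD, rising=False))

print(f"delay = {propagation_delay:.2f} ns, fall transition = {fall_transition:.2f} ns")
```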
CMOS inverter ngspice simulations
``ngspice`` is an open-source engine in which the simulations are run.
- PnR is an iterative flow, and hence we can change the environment variables on the fly to observe the effect on our design.
- For example, to change the pin configuration around the core from equidistant random placement to some other placement, we set the IO mode variable at the command prompt as shown below
set ::env(FP_IO_MODE) 2
- Before performing a SPICE simulation we need to create a SPICE deck. The SPICE deck provides information about the following:
- Component connectivity - connectivity of Vdd, Vss, Vin and the substrate. The substrate tunes the threshold voltage of the MOS.
- Component values - sizes of the PMOS and NMOS, output load, input gate voltage, supply voltage.
- Node identification and naming - nodes are required to define the SPICE netlist. For example: M1 out in vdd vdd pmos w=0.375u l=0.25u, cload out 0 10f
- Simulation commands
- Model file - information about the parameters of the transistors
Simulating the CMOS inverter with different widths and lengths shows that, irrespective of the sizes, the shape of the waveform is almost the same.
From the waveform we can see that the characteristics are maintained across all sizes of CMOS. So CMOS as a circuit is robust, hence its use in designing logic gates. One parameter that defines the robustness of CMOS is the switching threshold:
- The switching threshold of a CMOS inverter is the point where Vin = Vout on the DC transfer characteristics.
- At this point, both transistors are in the saturation region, meaning both are turned on, and there is a high chance of current flowing directly from VDD to ground (short-circuit current).
Through transient analysis, we calculate the rise and fall delays of the CMOS inverter by SPICE simulation. As noted earlier, delays are measured at 50% of the final values.
- First, clone the required mag files and SPICE models of the inverter, pmos and nmos for sky130. The command to clone the files from GitHub is:
git clone https://github.com/nickson-jose/vsdstdcelldesign.git
Once this command is run, it creates the vsdstdcelldesign folder in the openlane directory.
In order to open the mag file and run Magic, go to that directory.
For the layout we run the magic command
magic -T sky130A.tech sky130_inv.mag &
The ampersand at the end frees the prompt for the next command; otherwise Magic keeps the prompt busy. Once we run the magic command, we get the layout of the inverter in the Magic window.
Inception of Layout and CMOS Fabrication Process
The 16-mask CMOS (Complementary Metal-Oxide-Semiconductor) fabrication process involves several crucial steps for creating integrated circuits. Let's break it down with some jargon:
- Substrate Selection:
  - In the initial phase, the appropriate semiconductor substrate is chosen.
- Active Region Creation:
  - To isolate the active regions for transistors, the process begins with the deposition of SiO2 and Si3N4 layers, followed by photolithography and silicon nitride etching.
  - Oxide is then grown in the exposed regions; this is known as LOCOS (Local Oxidation of Silicon).
  - Subsequently, Si3N4 is removed using hot phosphoric acid.
- N-Well and P-Well Formation:
  - The N-well and P-well regions are created separately.
  - P-well formation involves photolithography and ion implantation of p-type Boron into the p-substrate.
  - The N-well is formed similarly with n-type Phosphorus.
  - A high-temperature furnace step drives in the dopants by diffusion to establish the well depths; this is known as the twin-tub process.
- Gate Formation:
  - The gate is a pivotal CMOS transistor terminal that controls threshold voltages for transistor switching.
  - A polysilicon layer is deposited and photolithography techniques are applied to create the NMOS and PMOS gates.
  - Important parameters for gate formation include oxide capacitance and doping concentration.
- Lightly Doped Drain (LDD) Formation:
  - LDD is created to mitigate hot electron and short channel effects.
- Source & Drain Formation:
  - Thin oxide layers are added to avoid channeling effects during ion implantation.
  - N+ and P+ implants are performed using ion implantation (Arsenic for the N+ regions), followed by high-temperature annealing.
- Local Interconnect Formation:
  - The thin screen oxide is removed by etching in HF solution.
  - Titanium is deposited by sputtering.
  - Heat treatment results in chemical reactions, producing low-resistance titanium silicide (TiSi2) for interconnect contacts and titanium nitride (TiN) for top-level connections, enabling local communication.
- Higher Level Metal Formation:
  - To achieve suitable metal interconnects, the non-planar surface topography must be addressed.
  - Silicon oxide doped with Boron or Phosphorus is deposited, and Chemical Mechanical Polishing (CMP) is used to planarize the surface.
  - TiN and blanket Tungsten layers are deposited and subjected to CMP.
  - An aluminum (Al) layer is added and subjected to photolithography and CMP.
  - This constitutes the first level of interconnects, and additional interconnect layers are added to reach higher-level metal layers.
- Dielectric Layer Addition:
  - Finally, a dielectric layer, typically Si3N4, is applied to protect the chip.
This complex process results in the creation of advanced integrated circuits with multiple layers of interconnects, essential for modern electronic devices.
- From the layout, we see the layers which are required for a CMOS inverter. An inverter is a PMOS and an NMOS connected together.
- The gates of both PMOS and NMOS are connected together and fed to the input (here, A); the NMOS source is connected to ground (here, VGND); the PMOS source is connected to VDD (here, VPWR); the drains of PMOS and NMOS are connected together and fed to the output (here, Y).
The first layer in skywater130 is the local interconnect layer (locali); above that, metal 1 is purple and metal 2 is pink. If you want to see the connections between two different parts, place the cursor over that area and press S once. The tkcon window gives the component name.
- The layout of a design is described in a specific file format called LEF.
- It includes design rules (tech LEF) and abstract information about the cells.
Tech LEF
- The technology LEF file contains information about the metal layers, via definitions and DRCs.
Macro LEF
- Contains physical information about the cell such as its size, pins and their direction.
- First we need to provide the bounding box width and height in the tkcon window. Let's say that the width of the BBOX is 1.38u and the height is 2.72u. The command to give these values to Magic is
property Fixed BBOX (0 0 1.32 2.72)
- After this, the Vdd and GND segments, which are in the metal 1 layer, and their respective contacts are defined, and finally the logic gate layout is defined.
To verify the logical functioning of the inverter, we extract the SPICE netlist and then simulate it. To extract it, we open the tkcon window; the steps are
- Check the present directory with pwd
- Create an extraction file with the command extract all; the file sky130_inv.ext is created
- Create a SPICE file from the .ext file for use with ngspice. The commands are
  - ext2spice cthresh 0 rthresh 0 - also extracts parasitic capacitances, since these are actual layers. Nothing is created in the folder at this point
  - ext2spice - the file sky130_inv.spice is created
SKY130 Tech File Labs
Let us see what is inside the SPICE deck. In the SPICE file, the subcircuit (subckt) and the pmos and nmos node connections are defined.
For NMOS: X0 Y A VGND VGND sky130_fd_pr__nfet_01v8. The order is cell_name drain gate source substrate model_name.
For PMOS: X1 Y A VPWR VPWR sky130_fd_pr__pfet_01v8. The order is cell_name drain gate source substrate model_name.
For transient analysis, we define the following connections and extra nodes in the SPICE file
- VGND to VSS
- Supply voltage from VPWR to ground - the extra nodes here will be 0 and VDD, with a value of 3.3V
- A sweep/pulse source between the A pin and VGND (0)
Before editing the file, make sure the scaling is correct: we measure the grid size from the Magic layout and define it in the deck file using
.option scale=0.01u
Now, keeping these connections in mind, define the required commands in the file. Along with this we need to include the libs for nmos (nshort.lib) and pmos (pshort.lib) and define the transient analysis commands. We comment out the subckt since we are adding the controls and the transient analysis directly. Model names are changed to nshort_model.0 and pshort_model.0 according to the nmos and pmos libs.
These voltage sources and simulation commands are defined in the deck file.
.include ./libs/pshort.lib
.include ./libs/nshort.lib
VDD VPWR 0 3.3V
VSS VGND 0 0V
Va A VGND PULSE(0V 3.3V 0 0.1ns 0.1ns 2ns 4ns)
.tran 1n 20n
.control
run
.endc
.end
Spice Deck is done and now to run spice simulation invoke ngspice in the tool and pass the source file.
ngspice sky130_inv.spice
On the prompt you can see the values the ngspice has taken. To see the plot, use
plot y vs time a
Characterization of the inverter standard cell depends on four timing parameters:
Rise Transition: time taken for the output to rise from 20% to 80% of the max value
Fall Transition: time taken for the output to fall from 80% to 20% of the max value
Cell Rise Delay: difference between time(50% output rise) and time(50% input fall)
Cell Fall Delay: difference between time(50% output fall) and time(50% input rise)
The above timing parameters can be computed by noting down various values from the ngspice waveform.
Rise Transition : 2.25421 - 2.18636 = 0.06785 ns / 67.85 ps
Fall Transition : 4.09605 - 4.05554 = 0.04051 ns / 40.51 ps
Cell Rise Delay : 2.21701 - 2.14989 = 0.06712 ns / 67.12 ps
Cell Fall Delay : 4.07816 - 4.05011 = 0.02805 ns / 28.05 ps
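The subtractions above are easy to get wrong by a decimal place, so a short script can redo them directly from the time points read off the ngspice plot. The variable names, and the mapping of each time point to a threshold crossing, are assumptions that follow the four definitions given earlier.

```python
# Time points in ns, as read from the ngspice waveform above.
out_rise_20, out_rise_80 = 2.18636, 2.25421   # output rising through 20% and 80%
out_fall_80, out_fall_20 = 4.05554, 4.09605   # output falling through 80% and 20%
in_fall_50, out_rise_50 = 2.14989, 2.21701    # 50% points around the output rise
in_rise_50, out_fall_50 = 4.05011, 4.07816    # 50% points around the output fall

timing_ps = {
    "rise_transition": (out_rise_80 - out_rise_20) * 1000,
    "fall_transition": (out_fall_20 - out_fall_80) * 1000,
    "cell_rise_delay": (out_rise_50 - in_fall_50) * 1000,
    "cell_fall_delay": (out_fall_50 - in_rise_50) * 1000,
}

for name, value in timing_ps.items():
    print(f"{name:16s} = {value:6.2f} ps")
```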
- In-depth overview of Magic's DRC engine
- Introduction to Google/Skywater DRC rules
- Lab : Warm-up exercise : Fixing a simple rule error
- Lab : Main exercise : Fixing or creating a complex error
- setup to view the layouts
- For extracting and generating views, the Google/Skywater repo files were built with Magic
- Every layout depends heavily on the technology file; hence, this file is created first.
- Since the PDK is still under development, there are some unfinished tech files, and these are packaged for Magic along with the lab exercise layouts and other material into a tarball
We can download the packaged files from the web using the wget command. wget stands for web get, a non-interactive file downloader command.
wget http://opencircuitdesign.com/open_pdks/archive/drc_tests.tgz
The archive file drc_tests.tgz is downloaded into our user directory.
Once extraction is done, the drc_tests directory is created, containing all the Magic layout information for this lab exercise.
Now run Magic.
For better graphics use the command magic -d XR
Now, let's see an example of a simple failing set of rules for the metal 1 layer. You can either run this from the command line with magic -d XR met1.mag
or from the Magic console window via menu - File - Open - load file (here, met1.mag)
We use following commands to see metal cut as shown.
cif see VIA2
First load the poly file by load poly.mag
on tkcon window.
Locating the error with the mouse cursor and a box selection shows that Poly.9 is violated due to the spacing between polyres and poly.
We find that the distance between regular polysilicon and the poly resistor should be 22um but is drawn at 17um, and still no error is reported. We modify the sky130A.tech file as follows to detect this error.
In line
spacing npres *nsd 480 touching_illegal \
"poly.resistor spacing to N-tap < %d (poly.9)"
change to
spacing npres allpolynonres 480 touching_illegal \
"poly.resistor spacing to N-tap < %d (poly.9)"
Also,
spacing xhrpoly,uhrpoly,xpc alldiff 480 touching_illegal \
"xhrpoly/uhrpoly resistor spacing to diffusion < %d (poly.9)"
change to
spacing xhrpoly,uhrpoly,xpc allpolynonres 480 touching_illegal \
"xhrpoly/uhrpoly resistor spacing to diffusion < %d (poly.9)"
Timing Analysis and Clock Tree Synthesis (CTS)
During placement, the entire mag information is not necessary. Only the PR boundary, I/O ports, and power and ground rails of the cell are required. This information is defined in the LEF file. The main objective is to extract the LEF from the mag file and plug it into our design flow.
Track: A path or line on which metal layers are drawn for routing. Tracks are used to define the height of the standard cell.
To implement our own standard cell, a few guidelines must be followed
- I/O ports must lie on the intersection of horizontal and vertical tracks
- Width and height of the standard cell are odd multiples of the horizontal track pitch and vertical track pitch
This information is defined in tracks.info
.
li1 X 0.23 0.46
li1 Y 0.17 0.34
before grid on:
To ensure that ports lie on the intersection point, the grid spacing in Magic (tkcon) must be changed to the li1 X and li1 Y values. After providing the command, we have following:
grid 0.46um 0.34um 0.23um 0.17um
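The relationship between the tracks.info entries and the grid command can be sketched in Python. This is an illustrative helper, not part of OpenLane or Magic; it assumes each tracks.info line reads "layer axis offset pitch" and uses `Decimal` so that the on-track test is exact.

```python
from decimal import Decimal

# The two li1 lines quoted above from tracks.info.
TRACKS_INFO = """\
li1 X 0.23 0.46
li1 Y 0.17 0.34
"""


def parse_tracks(text):
    """Map (layer, axis) to (offset, pitch), all in micrometres."""
    tracks = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        layer, axis, offset, pitch = line.split()
        tracks[(layer, axis)] = (Decimal(offset), Decimal(pitch))
    return tracks


def grid_command(tracks, layer="li1"):
    """Build the Magic grid command: x pitch, y pitch, x offset, y offset."""
    x_off, x_pitch = tracks[(layer, "X")]
    y_off, y_pitch = tracks[(layer, "Y")]
    return f"grid {x_pitch}um {y_pitch}um {x_off}um {y_off}um"


def on_track(coord, offset, pitch):
    """True if `coord` sits exactly on a track (offset + n * pitch)."""
    return (coord - offset) % pitch == 0


tracks = parse_tracks(TRACKS_INFO)
print(grid_command(tracks))
# A port centred at x = 0.69 um lies on the second vertical li1 track.
print(on_track(Decimal("0.69"), *tracks[("li1", "X")]))
```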
However, certain properties and definitions need to be set to the pins of the cell. For LEF files, a cell that contains ports is written as a macro cell, and the ports are the declared as PINs of the macro.
The way to define a port is through Magic console and following are the steps:
- In the Magic layout window, first load the .mag file for the design (here, the inverter). Then choose Edit >> Text, which opens a dialogue box.
- When you press S twice at an I/O label, the text field automatically takes the string name and size. Ensure the Port enable checkbox is checked and the default checkbox is unchecked as shown in the figure:
- In the above figure, the number in the text area near the enable checkbox defines the order in which the ports will be written in the LEF file (0 being the first).
- For power and ground layers, the definition could be the same as or different from the signal layer. Here, ground and power connectivity are taken from metal1
After defining ports, the next step is setting port class and port use attributes.
Select port A in magic:
port class input
port use signal
Select Y area
port class output
port use signal
Select VPWR area
port class inout
port use power
Select VGND area
port class inout
port use ground
Name the custom cell through the tkcon window as sky130_vsdinv.mag.
We generate lef file by command:
lef write
This generates sky130_vsdinv.lef file.
In the previous steps we created a custom standard cell for an inverter. Copy the lef file, sky130_fd_sc_hd__typical.lib, sky130_fd_sc_hd__slow.lib & sky130_fd_sc_hd__fast.lib to the src folder of picorv32a from the libs folder of vsdstdcelldesign. Then modify config.tcl as follows.
# Design
set ::env(DESIGN_NAME) "picorv32a"
set ::env(VERILOG_FILES) "$::env(DESIGN_DIR)/src/picorv32a.v"
set ::env(CLOCK_PORT) "clk"
set ::env(CLOCK_NET) $::env(CLOCK_PORT)
set ::env(GLB_RESIZER_TIMING_OPTIMIZATIONS) {1}
set ::env(LIB_SYNTH) "$::env(OPENLANE_ROOT)/designs/picorv32a/src/sky130_fd_sc_hd__typical.lib"
set ::env(LIB_SLOWEST) "$::env(OPENLANE_ROOT)/designs/picorv32a/src/sky130_fd_sc_hd__slow.lib"
set ::env(LIB_FASTEST) "$::env(OPENLANE_ROOT)/designs/picorv32a/src/sky130_fd_sc_hd__fast.lib"
set ::env(LIB_TYPICAL) "$::env(OPENLANE_ROOT)/designs/picorv32a/src/sky130_fd_sc_hd__typical.lib"
set ::env(EXTRA_LEFS) [glob $::env(OPENLANE_ROOT)/designs/$::env(DESIGN_NAME)/src/*.lef]
set filename $::env(DESIGN_DIR)/$::env(PDK)_$::env(STD_CELL_LIBRARY)_config.tcl
if { [file exists $filename] == 1} {
source $filename
}
To integrate the standard cell into the OpenLane flow, run the following commands after make mount:
prep -design picorv32a -tag RUN_2023.09.09_20.37.18 -overwrite
set lefs [glob $::env(DESIGN_DIR)/src/*.lef]
add_lefs -src $lefs
run_synthesis
synthesis report :
sta report:
Basically, Delay is a parameter that has huge impact on our cells in the design. Delay decides each and every other factor in timing.
For a cell with different size, threshold voltages, delay model table is created where we can it as timing table.
Delay of a cell depends on input transition and out load
.
Lets say two scenarios,
we have long wire and the cell(X1) is sitting at the end of the wire : the delay of this cell will be different because of the bad transition that caused due to the resistance and capcitances on the long wire.
we have the same cell sitting at the end of the short wire: the delay of this will be different since the tarn is not that bad comapred to the earlier scenario.
Eventhough both are same cells, depending upon the input tran, the delay got chaned. Same goes with o/p load also.
VLSI engineers have identified specific constraints when inserting buffers to preserve signal integrity. They noticed that each buffer level must maintain consistent sizing, but that buffer delays vary with the load being driven. To address this, they introduced the concept of "delay tables": 2D arrays of delay values indexed by input slew and load capacitance, with one table per buffer size. These tables serve as timing models for the design.
When the algorithm works with these delay tables, it uses the given input slew and load capacitance to look up the corresponding buffer delay. When the exact point is not in the table, it interpolates between the closest available data points, or extrapolates from them when the point lies outside the table range, to estimate the required delay.
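The lookup described above can be sketched as bilinear interpolation over a small table. This is only an illustration: the axis values, the delay numbers and the names `lookup_delay`, `SLEWS`, `LOADS` and `TABLE` are hypothetical and are not taken from the sky130 liberty files, and real STA tools may handle table edges differently.

```python
from bisect import bisect_right

def _bracket(axis, x):
    """Pick the axis segment (i, i+1) used for x.
    Points outside the axis reuse the first or last segment (extrapolation)."""
    i = bisect_right(axis, x) - 1
    i = max(0, min(i, len(axis) - 2))
    return i, i + 1

def lookup_delay(slews, loads, table, slew, load):
    """Bilinear interpolation of table[slew_index][load_index]."""
    i0, i1 = _bracket(slews, slew)
    j0, j1 = _bracket(loads, load)
    ts = (slew - slews[i0]) / (slews[i1] - slews[i0])
    tl = (load - loads[j0]) / (loads[j1] - loads[j0])
    # Interpolate along the load axis on both bracketing slew rows...
    d0 = table[i0][j0] + tl * (table[i0][j1] - table[i0][j0])
    d1 = table[i1][j0] + tl * (table[i1][j1] - table[i1][j0])
    # ...then along the slew axis between the two rows.
    return d0 + ts * (d1 - d0)

# Hypothetical table: rows follow input slew (ns), columns follow load (pF).
SLEWS = [0.01, 0.05, 0.10]
LOADS = [0.001, 0.005, 0.010]
TABLE = [
    [0.10, 0.20, 0.30],
    [0.15, 0.25, 0.35],
    [0.20, 0.30, 0.40],
]

if __name__ == "__main__":
    print(lookup_delay(SLEWS, LOADS, TABLE, 0.03, 0.003))
```

A point that lies exactly on a table entry returns that entry, and a point between entries returns a weighted blend of its four neighbours.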
We performed synthesis and found that the design has positive slack and meets the timing constraints.
During floorplan, 504 endcaps and 6731 tap cells were placed. The design has 275 original rows.
Now run_placement
After placement, we check for legality. To view the layout, invoke Magic from the results/placement directory:
magic -T /home/parallels/OpenLane/vsdstdcelldesign/libs/sky130A.tech lef read tmp/merged.nom.lef def read results/placement/picorv32a.def &
Post-synthesis timing analysis Using OpenSTA
Timing analysis is carried out outside the OpenLANE flow using the OpenSTA tool. This requires a pre_sta.conf file. Invoke OpenSTA outside the OpenLANE flow as follows:
sta pre_sta.conf
The SDC file for OpenSTA is modified like this:
my_base.sdc is located in the vsdstdcelldesign/extras directory, so I copied it into our design folder using:
cp my_base.sdc /home/parallels/OpenLane/designs/picorv32a/src/
Since my design had no violations, I skipped this step, though I have hands-on experience with timing analysis using OpenSTA.
The clock is propagated only after CTS, so in the placement stage the clock is considered ideal. Therefore only setup slack is taken into consideration before CTS.
Setup time: the minimum time the data must be stable before the active edge of the clock in order to be captured properly.
Setup slack: data required time - data arrival time
The clock is generated by a PLL, which is itself a circuit built from cells and logic. The generated clock can vary depending on that circuit, and these variations are collectively known as clock uncertainty. Jitter is one of its components: the clock edge is not guaranteed to arrive at exactly the expected time, which is why it is called clock uncertainty. Skew, jitter and margin all contribute to clock uncertainty.
Clock jitter: the deviation of a clock edge from its ideal position.
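As a small worked example of the setup slack formula, the sketch below assumes an ideal clock, a single-cycle path launched at time 0, and that clock uncertainty is subtracted from the required time. The function name `setup_slack` and all numbers are hypothetical and are not taken from the picorv32a reports.

```python
def setup_slack(clock_period, setup_time, clock_uncertainty, data_arrival_time):
    """Setup slack = data required time - data arrival time.

    Assumption for this sketch: with an ideal clock, the data required time
    is the clock period minus the capture flop's setup time minus the
    clock uncertainty (skew + jitter + margin).
    """
    data_required_time = clock_period - setup_time - clock_uncertainty
    return data_required_time - data_arrival_time

if __name__ == "__main__":
    # Hypothetical values in ns.
    slack = setup_slack(clock_period=10.0, setup_time=0.2,
                        clock_uncertainty=0.3, data_arrival_time=7.0)
    print("met" if slack >= 0 else "violated", round(slack, 3))
```

A negative result means the data arrives after it is required, i.e. a setup violation. Increasing the uncertainty shrinks the slack by the same amount.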
Based on the timing report, we can improve slack by upsizing cells, i.e. replacing them with cells of higher drive strength, which can change the slack significantly.
Clock Tree Synthesis using TritonCTS
Clock tree synthesis (CTS) can be implemented in various ways, and the choice of the specific technique depends on the design requirements, constraints, and goals. Here are some different types or approaches to clock tree synthesis:
Balanced Tree CTS: In a balanced tree CTS, the clock signal is distributed in a balanced manner, often resembling a binary tree structure. This approach aims to provide roughly equal path lengths to all clock sinks (flip-flops) to minimize clock skew. It's relatively straightforward to implement and analyze but may not be the most power-efficient solution.
H-tree CTS: An H-tree CTS uses a hierarchical tree structure, resembling the letter "H." It is particularly effective for distributing clock signals across large chip areas. The hierarchical structure can help reduce clock skew and optimize power consumption.
Star CTS: In a star CTS, the clock signal is distributed from a single central point (like a star) to all the flip-flops. This approach simplifies clock distribution and minimizes clock skew but may require a higher number of buffers near the source.
Global-Local CTS: Global-Local CTS is a hybrid approach that combines elements of both star and tree topologies. The global clock tree distributes the clock signal to major clock domains, while local trees within each domain further distribute the clock. This approach balances between global and local optimization, addressing both chip-wide and domain-specific clocking requirements.
Mesh CTS: In a mesh CTS, clock wires are arranged in a mesh-like grid pattern, and each flip-flop is connected to the nearest available clock wire. It is often used in highly regular and structured designs, such as memory arrays. Mesh CTS can offer a balance between simplicity and skew minimization.
Adaptive CTS: Adaptive CTS techniques adjust the clock tree structure dynamically based on the timing and congestion constraints of the design. This approach allows for greater flexibility and adaptability in meeting design goals but may be more complex to implement.
Impact: Crosstalk is a significant concern in VLSI design due to the high integration density of components on a chip. Uncontrolled crosstalk can lead to data corruption, timing violations, and increased power consumption.
Mitigation: VLSI designers employ various techniques to mitigate crosstalk, such as optimizing layout and routing, using appropriate shielding, implementing proper clock distribution strategies, and utilizing clock gating to reduce dynamic power consumption when logic is idle.
Purpose: In VLSI circuits, the clock distribution network is crucial for synchronous operation. Clock signals must reach all parts of the chip while minimizing skew and maintaining signal integrity.
Shielding Techniques: VLSI designers may use shielding techniques to isolate the clock network from other signals, reducing the risk of interference. This can include dedicated clock routing layers, clock tree synthesis algorithms, and buffer insertion to manage clock distribution more effectively.
Clock Domain Isolation: VLSI designs often have multiple clock domains. Shielding and proper clock gating help ensure that clock signals do not propagate between domains, avoiding metastability issues and maintaining synchronization.
In this stage the clock is propagated, and we make sure it reaches every clock pin from the clock source with minimum skew and insertion delay. To do this, we implement an H-tree using the midpoint strategy. To balance the skew, we use clock inverters or buffers in the clock path.
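The midpoint idea can be illustrated with a toy recursive split. This is not TritonCTS's algorithm: it is a simplified sketch that places each tap at the midpoint of the bounding box of the sinks it serves and alternates x and y cuts. The names `build_tree` and `sink_wire_lengths` and the sink coordinates are hypothetical, and buffer insertion and wire RC are ignored.

```python
def build_tree(sinks, depth=0):
    """Recursively split clock sinks around the midpoint, alternating x/y cuts."""
    xs = [x for x, _ in sinks]
    ys = [y for _, y in sinks]
    # Tap point: midpoint of the bounding box of the sinks below this node.
    tap = ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)
    if len(sinks) == 1:
        return {"tap": tap, "children": []}
    axis = depth % 2  # 0: cut along x, 1: cut along y
    ordered = sorted(sinks, key=lambda p: p[axis])
    half = len(ordered) // 2
    return {
        "tap": tap,
        "children": [build_tree(ordered[:half], depth + 1),
                     build_tree(ordered[half:], depth + 1)],
    }

def sink_wire_lengths(node, travelled=0.0):
    """Manhattan wire length from the root tap to every sink (leaf)."""
    if not node["children"]:
        return [travelled]
    lengths = []
    for child in node["children"]:
        step = (abs(child["tap"][0] - node["tap"][0])
                + abs(child["tap"][1] - node["tap"][1]))
        lengths.extend(sink_wire_lengths(child, travelled + step))
    return lengths

if __name__ == "__main__":
    # Four hypothetical flip-flop clock pins at the corners of a square.
    tree = build_tree([(0, 0), (0, 4), (4, 0), (4, 4)])
    lengths = sink_wire_lengths(tree)
    print(tree["tap"], lengths, max(lengths) - min(lengths))
```

For this symmetric placement every sink sees the same wire length from the root, so the length-based skew is zero. On irregular placements the lengths differ, which is where buffers or inverters are used for balancing.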
Before running CTS with the TritonCTS tool, note that if slack reduction was attempted in a previous run, the netlist may have been modified by cell replacement. In that case the Verilog file needs to be updated using the write_verilog command. Then synthesis, floorplan and placement are run again. To run CTS, use the command below:
run_cts
After the CTS run, my slack values are:
Setup: 12.97, Hold: 0.23
Neither value is violating.
Since the clock is now propagated, from this stage on we do timing analysis with real clocks. Post-CTS analysis is performed with OpenROAD within the OpenLane flow:
openroad
read_lef <path of merged.nom.lef>
read_def <path of def>
write_db pico_cts.db
read_db pico_cts.db
read_verilog /home/parallels/OpenLane/designs/picorv32a/runs/RUN_09-09_11-20/results/synthesis/picorv32a.v
link_design picorv32a
read_liberty $::env(LIB_SYNTH_COMPLETE)
read_sdc /home/parallels/OpenLane/designs/picorv32a/src/my_base.sdc
set_propagated_clock [all_clocks]
report_checks -path_delay min_max -format full_clock_expanded -digits 4
Hold slack:
setup slack:
Type the following in the OpenLane shell:
echo $::env(CTS_CLK_BUFFER_LIST)
set ::env(CTS_CLK_BUFFER_LIST) [lreplace $::env(CTS_CLK_BUFFER_LIST) 0 0]
echo $::env(CTS_CLK_BUFFER_LIST)
After changing the buffer list, load the placement-stage DEF file and run CTS again. Then run OpenROAD again and create another db; everything else stays the same. The post-CTS report gives:
Setup slack: 2.2379, Hold slack: 0.1869
Final steps in RTL2GDS
Routing is the process of establishing a physical connection between two pins. Algorithms designed for routing take source and target pins and aim to find the most efficient path between them, ensuring a valid connection exists.
The Maze Routing algorithm, such as the Lee algorithm, is one approach for solving routing problems. In this method, a grid similar to the one created during cell customization is utilized for routing purposes. The Lee algorithm starts with two designated points, the source and target, and leverages the routing grid to identify the shortest or optimal route between them.
The algorithm assigns labels to neighboring grid cells around the source, incrementing them from 1 until it reaches the target (for instance, from 1 to 7). Various paths may emerge during this process, including L-shaped and zigzag-shaped routes. The Lee algorithm prioritizes selecting the best path, typically favoring L-shaped routes over zigzags. If no L-shaped paths are available, it may resort to zigzag routes. This approach is particularly valuable for global routing tasks.
However, the Lee algorithm has limitations. It essentially constructs a maze and then numbers its cells from the source to the target. While effective for routing between two pins, it can be time-consuming when dealing with millions of pins. There are alternative algorithms that address similar routing challenges.
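The wave expansion and backtrace of the Lee algorithm can be sketched as a breadth-first search on a small grid. This is a minimal illustration under stated assumptions: the grid, the `#` obstacle marker and the name `lee_route` are hypothetical, the source is labelled 0 rather than 1, and the preference for fewer bends (L-shaped over zigzag) is not modelled.

```python
from collections import deque

def _neighbours(cell):
    r, c = cell
    return ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))

def lee_route(grid, source, target):
    """Return a shortest path (list of cells) from source to target, or None.

    grid is a list of equal-length strings; '#' marks a blocked cell.
    """
    rows, cols = len(grid), len(grid[0])
    label = {source: 0}          # wave expansion: distance from the source
    queue = deque([source])
    while queue:
        cell = queue.popleft()
        if cell == target:
            break
        for nr, nc in _neighbours(cell):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in label):
                label[(nr, nc)] = label[cell] + 1
                queue.append((nr, nc))
    if target not in label:
        return None              # the target cannot be reached
    # Backtrace: walk from the target towards strictly decreasing labels.
    path = [target]
    while path[-1] != source:
        current = path[-1]
        for nb in _neighbours(current):
            if label.get(nb) == label[current] - 1:
                path.append(nb)
                break
    return path[::-1]

GRID = ["....",
        ".##.",
        "...."]

if __name__ == "__main__":
    print(lee_route(GRID, (0, 0), (2, 3)))
```

Every reachable cell is labelled once, so the cost grows with the grid area rather than with the length of the final route. This is why the method becomes slow for designs with millions of pins.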
Design Rule Check (DRC) verifies whether a design meets the process technology rules defined by the foundry for manufacturing. DRC is an essential part of the physical design flow: it ensures that the design meets manufacturing requirements and will not result in a chip failure, and so it determines the quality of the chip. There are many design rules; let us look at a few of them.
Design rules for physical wires
- Minimum width of the wire
- Minimum spacing between the wires
- Minimum pitch of the wire
To resolve a signal short violation, we move one of the wires to an upper metal layer. We then also check the via rules:
- Via width
- Via spacing
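The width and spacing rules above can be illustrated with a toy checker for parallel wires on one metal layer. This is only a sketch: each wire is described by the two edges of its cross-section, and the rule values (140 units) and the name `check_wires` are hypothetical, not taken from the SKY130 rule deck.

```python
def check_wires(wires, min_width, min_spacing):
    """Check parallel wires on one layer against width and spacing rules.

    Each wire is (low_edge, high_edge) measured across the wire, in integer
    units. Returns a list of violation tuples.
    """
    violations = []
    ordered = sorted(wires)
    for low, high in ordered:
        if high - low < min_width:
            violations.append(("width", (low, high)))
    # Spacing is measured edge to edge between neighbouring wires.
    for first, second in zip(ordered, ordered[1:]):
        if second[0] - first[1] < min_spacing:
            violations.append(("spacing", first, second))
    return violations

if __name__ == "__main__":
    # Hypothetical wires and rule values.
    wires = [(0, 140), (280, 420), (500, 600)]
    for violation in check_wires(wires, min_width=140, min_spacing=140):
        print(violation)
```

The minimum pitch follows from the other two rules, since pitch is the centre-to-centre distance, i.e. wire width plus spacing for wires of equal width.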
Power Distribution Network generation
Unlike the general ASIC flow, Power Distribution Network (PDN) generation is not part of the floorplan run in OpenLANE. The PDN must be generated after CTS and post-CTS STA:
We can check whether the PDN has been created by checking the current DEF environment variable: echo $::env(CURRENT_DEF)
prep -design picorv32a -tag Run 12.07.10.11
gen_pdn
- Once the command is given, the power distribution network is generated.
- The power distribution network has to take design_cts.def as the input DEF file.
- Power rings, straps and rails are created by the PDN.
- From the VDD and VSS pads, power is drawn to the power rings.
- Next, the horizontal and vertical straps connected to the rings draw power from the rings.
- Straps are connected to the rails, and the rails are connected to the standard cells, so the standard cells get their power from the rails.
- The standard cells are designed so that their height is a multiple of the track pitch. Here, the pitch is 2.72. Only if this condition is met is it possible to power the standard cells.
- There are definitions for the straps and the rails. In this design, the straps are on metal layers 4 and 5 and the standard cell rails are on metal layer 1. Vias connect across the layers as required.
Routing
In EDA tools, both OpenLANE and commercial ones, routing is exceptionally intricate because the design space is vast. To manage this complexity, routing is typically divided into two stages: Global Routing and Detailed Routing.
The two routing engines responsible for handling these two stages are as follows:
- Global Routing: In this stage, the routing region is subdivided into rectangular grid cells and represented as a coarse 3D routing graph. This task is accomplished by the FastRoute engine.
- Detailed Routing: Here, finer grid granularity and routing guides are employed to implement the physical wiring. The TritonRoute engine comes into play at this stage. FastRoute generates the initial routing guides, while TritonRoute uses the global routing information and refines it, applying various strategies and optimizations to find the best path for connecting the pins.
TritonRoute's detailed routing has the following features:
- Initial Detail Routing: TritonRoute initiates the detailed routing process, providing the foundation for the subsequent routing steps.
- Adherence to Pre-Processed Route Guides: TritonRoute places significant emphasis on following pre-processed route guides. This involves several actions:
  - Initial Route Guide Analysis: TritonRoute analyzes the directions of the route guides against the preferred routing direction.
  - Guide Splitting: Guides that do not follow the preferred direction are split into unit widths to facilitate routing.
  - Guide Merging: TritonRoute merges touching guides that are orthogonal to the preferred direction, streamlining the routing process.
  - Guide Bridging: When it encounters guides that run parallel to the preferred routing direction, TritonRoute uses an additional layer to bridge them, ensuring efficient routing within the preprocessed guides.
- Inter-Guide Connectivity: TritonRoute assumes that the route guides of each net satisfy inter-guide connectivity, i.e. guides on the same metal layer touch, or guides on neighbouring metal layers have a nonzero vertically overlapped area (where vias are placed). Each unconnected terminal, i.e. a pin of a standard cell instance, should have its pin shape overlapped by a routing guide (a black dot (pin) inside a purple box (metal1 layer)).
In summary, TritonRoute not only performs initial detail routing but also places a strong emphasis on routing within the pre-processed route guides, splitting, merging and bridging them as needed to achieve efficient routing results.
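The inter-guide connectivity assumption can be checked with a short sketch. It is only an illustration of the rule as stated here, not TritonRoute's implementation: each guide is a hypothetical tuple `(layer, x0, y0, x1, y1)`, and the names `guides_connected` and `net_is_connected` are made up for this example.

```python
def guides_connected(a, b):
    """Two guides connect if they touch on the same layer, or overlap with
    nonzero area on neighbouring layers (where a via can be placed)."""
    layer_a, ax0, ay0, ax1, ay1 = a
    layer_b, bx0, by0, bx1, by1 = b
    dx = min(ax1, bx1) - max(ax0, bx0)   # overlap extent along x
    dy = min(ay1, by1) - max(ay0, by0)   # overlap extent along y
    if layer_a == layer_b:
        # Touching along an edge or overlapping (a shared corner does not count).
        return dx >= 0 and dy >= 0 and (dx > 0 or dy > 0)
    if abs(layer_a - layer_b) == 1:
        return dx > 0 and dy > 0
    return False

def net_is_connected(guides):
    """True if all guides of one net form a single connected group."""
    if not guides:
        return True
    seen = {0}
    stack = [0]
    while stack:
        i = stack.pop()
        for j in range(len(guides)):
            if j not in seen and guides_connected(guides[i], guides[j]):
                seen.add(j)
                stack.append(j)
    return len(seen) == len(guides)

if __name__ == "__main__":
    # Hypothetical guides: two touching boxes on layer 1, one box on layer 2.
    net = [(1, 0, 0, 10, 2), (1, 10, 0, 12, 8), (2, 10, 6, 20, 8)]
    print(net_is_connected(net))
```

Guides two or more layers apart never connect directly in this sketch, since a via only joins neighbouring metal layers.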
TritonRoute works on a MILP (mixed integer linear programming) based panel routing scheme with an intra-layer parallel and inter-layer sequential routing framework.
Inputs : LEF, DEF, Preprocessed route guides
Output : Detailed routing solution with optimized wire length and via count
Constraints : Route guide honoring, connectivity constraints and design rules.
The space where the detailed route takes place has been defined. Now TritonRoute handles the connectivity in two ways.
Access Point (AP): An on-grid point on the metal of the route guide, used to connect to lower-layer segments, pins or IO ports, or upper-layer segments.
Access Point Cluster (APC): A union of all the APs derived from the same lower-layer segment, pin or IO port, or upper-layer guide.
TritonRoute run for routing
Make sure the CURRENT_DEF is set to pdn.def
Start routing by using
run_routing
The options for routing can be set in the config.tcl file. Routing can also be optimised by specifying a routing strategy, which selects a different version of the TritonRoute engine. There is a trade-off between the quality of the optimised route and the routing runtime.
With the default settings, routing picorv32a takes approximately 30 minutes with the current version of TritonRoute.
Here the number of DRC violations is zero:
The design can be viewed in Magic from the results/routing directory. Run the following command in that directory:
magic -T /home/parallels/OpenLane/vsdstdcelldesign/libs/sky130A.tech lef read tmp/merged.nom.lef def read results/routing/picorv32a.def &
In tkcon, type the following commands to check whether sky130_vsdinv exists:
getcell sky130_vsdinv (not working in some Magic versions)
what
expand
flip-flop to standard cell ratio = 1613/18508 = 0.0871
cd Desktop/work/tools/openlane_working_dir/OpenLane/
./flow.tcl -interactive
package require openlane 0.9
prep -design picorv32a
run_synthesis
run_floorplan
detailed_placement
run_cts
run_routing
cd Desktop/OpenLane
make mount
./flow.tcl -design picorv32a
I sincerely thank Mr. Kunal Ghosh (Founder/VSD) for helping me complete this flow smoothly.
- Kunal Ghosh, VSD Corp. Pvt. Ltd.
- ChatGPT
- Kanish R, Colleague, IIIT B
- Pruthvi Parate, Colleague, IIIT B
- Bhargav DV, Colleague, IIIT B