VLSI PD Master Reference Guide

The VLSI Physical Design Master Reference Guide provides comprehensive study notes covering foundational and advanced topics in VLSI design, including CMOS basics, synthesis, and physical design methodologies. It also includes a personalized skills roadmap and 130 interview questions with answers to aid in job preparation. This guide is tailored for Shivani Shetkar, an ASIC Physical Design Engineer, focusing on 14nm and 32nm technologies.

Uploaded by

Tara Sharma

VLSI Physical Design

Master Reference Guide


Study Notes · Advanced Topics · Skills Roadmap · 130 Interview Q&A

Chapters 1–15 | Advanced Chapters A1–A7 | Interview Sections 1–15

CMOS Basics · Synthesis · Floorplan · Placement · CTS · Routing · STA · Low Power · DRC/LVS · FinFET · DFT · CDC ·
ECO · Scripting · Power Analysis

Tools: ICC-II · PrimeTime · Design Compiler · Calibre · Fusion Compiler · OpenLane

Prepared for: Shivani Shetkar · ASIC Physical Design Engineer · 14nm & 32nm

This is the complete VLSI Physical Design Master Reference — combining foundational study notes (Chapters
1–15), advanced topics (CDC, SI scripting, power analysis, mixed-signal, FinFET), a personalised job-readiness
skills plan, and 130 interview questions with full spoken answers. Use it to study, revise before interviews, and as
an ongoing reference during your UK job search.

Table of Contents

PART 1 Study Notes — Chapters 1 to 15

Ch 1 VLSI & CMOS Fundamentals Transistors · NMOS/PMOS · CMOS · Logic gates · Flip-flops

Ch 2 Complete ASIC Design Flow RTL to GDS · All stages · Key file formats

Ch 3 Logic Synthesis SDC constraints · Design Compiler · Multi-Vt

Ch 4 Floorplanning & Power Planning Die sizing · Macro placement · PDN · IR drop · EM

Ch 5 Placement Global/detailed · Blockages · Congestion · Scan reorder

Ch 6 Clock Tree Synthesis Skew · Latency · ICG · Useful skew · Hold violations

Ch 7 Routing DRC rules · LVS · Antenna effect · Crosstalk · NDR

Ch 8 Static Timing Analysis Setup/Hold derivation · MMMC · OCV/AOCV · CPPR · Closure

Ch 9 Low Power Design UPF · Power gating · Level shifters · Retention · DVFS

VLSI Physical Design Master Reference · Shivani Shetkar · Page 1


Ch 10 Physical Verification & Sign-off Calibre DRC/LVS · ERC · DFM · Formal · Tape-out checklist

Ch 11 Advanced Nodes — FinFET FinFET structure · Multi-patterning · EUV · PD implications

Ch 12 Design For Test (DFT) Scan chains · ATPG · MBIST · Boundary Scan · Reordering

Ch 13 ECO Methodology Timing ECO · Setup/hold fixes · 8-step process

Ch 14 EDA Tools Reference ICC-II · PrimeTime · DC · Calibre · OpenLane commands

Ch 15 Glossary & Formulas 18 key equations · 15 rules of thumb · 50-term glossary

PART 2 Advanced Topics — Chapters A1 to A7

A1 Clock Domain Crossing (CDC) Metastability · 2-flop sync · Gray code · Async FIFO

A2 Advanced STA GBA vs PBA · SI-aware STA · Half-cycle paths · CPPR deep dive

A3 Tcl Scripting for PD Variables · Collections · Procedures · Automation scripts

A4 Power Analysis & Thermal Voltus/RedHawk · IR/EM sign-off · Thermal management

A5 Advanced DFT Scan compression · EDT · At-speed testing · IJTAG

A6 Mixed-Signal PD Substrate noise · Guard rings · Analog/digital co-existence

A7 Skills & Job Readiness UK interview topics · Skill gaps · Learning plan · GitHub portfolio

PART 3 130 Interview Questions & Answers — Sections 1 to 15

S1 VLSI & CMOS Basics Transistors, CMOS, flip-flops — 7 Q&A

S2 Logic Synthesis SDC, compile_ultra, timing reports — 5 Q&A

S3 Floorplanning & Power PDN, IR drop, EM, macro placement — 5 Q&A

S4 Placement place_opt, congestion, blockages, scan reorder — 4 Q&A

S5 CTS Skew, hold fixes, ICG, useful skew, NDR — 6 Q&A

S6 Routing route_opt, antenna, crosstalk, DRC, LVS — 5 Q&A

S7 STA Deep Dive Setup/hold eqs, MMMC, OCV, CPPR, closure — 7 Q&A

S8 Low Power UPF, level shifters, isolation, power gating, DVFS — 5 Q&A

S9 DRC, LVS & Sign-off Calibre, ERC, metal fill, formal verification — 4 Q&A

S10 DFT Scan chains, MBIST, PD responsibility — 3 Q&A

S11 CDC Metastability, multi-bit buses, SDC constraints — 3 Q&A

S12 Advanced Nodes FinFET, 14nm DRC, multi-patterning — 3 Q&A

S13 Behavioural / Project Your projects, challenges, career — 5 Q&A

S14 Rapid-Fire Technical 15 short-answer questions

S15 Questions to Ask Interviewer 12 strong questions to ask the company



Chapter 1: VLSI & CMOS Fundamentals

Transistors · CMOS · Logic Gates · Flip-Flops · Timing Basics

1.1 What is VLSI?

VLSI (Very Large Scale Integration) integrates millions to billions of transistors on a single silicon chip. Moore's Law
(1965, revised 1975) observes that transistor count doubles roughly every 2 years. Modern chips such as the Apple M2
contain about 20 billion transistors on a 5nm-class process node.

Era Level Transistors Example

1960s SSI < 100 74-series basic gates

1970s MSI 100 – 10K Counters, multiplexers

1975–85 LSI 10K – 100K Intel 8080 (6K), Z80 (8.5K)

1985–2000 VLSI 100K – 10M Pentium (3.1M), i486 (1.2M)

2000–now ULSI 10M – 100B+ Apple M2 (20B), NVIDIA H100 (80B)

1.2 Silicon & Doping Basics

Intrinsic Si Pure silicon. 4 valence electrons, crystal lattice. Very few free carriers at room temperature.
Poor conductor.

N-type doping Add phosphorus or arsenic (5 valence electrons) → extra free electron → n-type. Used for
NMOS channel.

P-type doping Add boron (3 valence electrons) → missing electron = 'hole' (positive carrier) → p-type.
Used for PMOS channel.

P-N junction Interface between P and N silicon. Allows current in one direction (diode). Basis of all
transistors.

1.3 The MOSFET Transistor

The MOSFET (Metal-Oxide-Semiconductor Field-Effect Transistor) is the fundamental switch. Understanding it is the
foundation of ALL Physical Design.

NMOS Cross-Section

               Gate (G)
                  |
        +--------------------+
        |  Polysilicon gate  |
        +--------------------+
        |  Gate oxide (SiO2 / High-k)
  +-----+--------------------+-----+
  | n+  |      channel       | n+  |
  +-----+                    +-----+
 Source (S)                Drain (D)     ← n+ diffusion
 -----------------------------------
        p-substrate (Body B)

When Vgs > Vt → electrons attracted → n-channel forms → ID flows

When Vgs < Vt → no channel → transistor OFF → ID ≈ 0



Property NMOS PMOS

Turn ON Vgs > +Vt (gate HIGH) Vgs < -|Vt| (gate LOW)

Pulls output LOW (strong '0') HIGH (strong '1')

Speed Faster (electron mobility ~2-3× hole mobility) Slower for same W

Body connection To GND To VDD

Symbol Arrow pointing IN Arrow pointing OUT

Threshold Voltage Variants (Multi-Vt)

Vt Type   Vt (typical)  Speed    Leakage  Use in PD
HVT       ~0.5V         Slow     Low      Non-critical paths — save power
SVT/RVT   ~0.4V         Medium   Medium   Default for most cells
LVT       ~0.3V         Fast     High     Critical timing paths only
ULVT      ~0.2V         Fastest  Highest  Most critical paths only

MOSFET Operating Regions

Cut-off (OFF) Vgs < Vt. Channel absent. ID = leakage only. Transistor is OFF.

Linear (triode) Vgs > Vt, Vds < (Vgs-Vt). Full channel. Acts like a resistor. ID = µCox(W/L)[(Vgs-Vt)Vds -
Vds²/2].

Saturation (ON) Vgs > Vt, Vds ≥ (Vgs-Vt). Channel pinched off at drain. ID = ½µCox(W/L)(Vgs-Vt)². Max
current.

Subthreshold Vgs slightly < Vt. Weak inversion. Exponential leakage: ID ∝ exp((Vgs-Vt)/(n·kT/q)), where kT/q ≈ 26mV
is the thermal voltage (not the threshold voltage Vt). Source of static power.

■ PD relevance: larger W → more saturation current → faster switching → shorter cell delay. Swapping a cell to its low-Vt (LVT)
variant reduces delay but increases leakage. PD engineers choose cell size AND Vt variant to close timing while minimising power.
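
The region equations above can be checked numerically. The following is a minimal sketch of the long-channel square-law model only; the function name and the values chosen for µCox, W/L and Vt are illustrative assumptions, not data from any real library, and subthreshold leakage is ignored.

```python
def nmos_drain_current(vgs, vds, vt=0.4, k=200e-6, w_over_l=2.0):
    """Square-law NMOS drain current in amperes.

    k stands for µ·Cox in A/V^2. All default values are illustrative.
    """
    vov = vgs - vt  # overdrive voltage (Vgs - Vt)
    if vov <= 0:
        return 0.0  # cut-off: leakage is ignored in this sketch
    if vds < vov:
        # linear (triode) region
        return k * w_over_l * (vov * vds - vds ** 2 / 2)
    # saturation region
    return 0.5 * k * w_over_l * vov ** 2


i_x1 = nmos_drain_current(vgs=0.8, vds=0.8, w_over_l=2.0)   # baseline cell
i_x2 = nmos_drain_current(vgs=0.8, vds=0.8, w_over_l=4.0)   # double width
i_lvt = nmos_drain_current(vgs=0.8, vds=0.8, vt=0.3)        # lower threshold
print(f"X1: {i_x1 * 1e6:.1f} uA, X2: {i_x2 * 1e6:.1f} uA, LVT: {i_lvt * 1e6:.1f} uA")
```

Doubling W doubles the saturation current, while lowering Vt raises it through the squared overdrive term. Both are ways to buy speed, which is why cell sizing and Vt swapping are the two standard timing knobs.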

1.4 CMOS Inverter

CMOS Inverter Circuit

VDD
 |
PMOS (gate = Vin)
 |
 +------ Vout
 |
NMOS (gate = Vin)
 |
GND

Vin = LOW: PMOS ON, NMOS OFF → Vout = VDD ('1')

Vin = HIGH: PMOS OFF, NMOS ON → Vout = GND ('0')

Steady state: always one device OFF → no DC path → P_static ≈ 0



P_dynamic = α · C_load · VDD² · f

P_static = VDD · I_leakage [dominant at advanced nodes, idle state]

■ Key CMOS advantage: ideal static power ≈ 0. In steady state, one transistor is always OFF, blocking the DC path from VDD to
GND. Apart from leakage (P_static above), power is consumed only during switching transitions.
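
A quick numeric check of the two power equations, as a sketch. The activity factor, switched capacitance, frequency and leakage current below are made-up block-level values chosen only to show the scaling.

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """P_dynamic = alpha * C_load * VDD^2 * f, in watts."""
    return alpha * c_load * vdd ** 2 * freq


def static_power(vdd, i_leak):
    """P_static = VDD * I_leakage, in watts."""
    return vdd * i_leak


# Illustrative block: 1 nF total switched capacitance, 10% activity, 500 MHz.
p_dyn = dynamic_power(alpha=0.1, c_load=1e-9, vdd=0.80, freq=500e6)
p_dyn_low = dynamic_power(alpha=0.1, c_load=1e-9, vdd=0.72, freq=500e6)
p_stat = static_power(vdd=0.80, i_leak=5e-3)

print(f"dynamic @0.80V: {p_dyn * 1e3:.2f} mW")
print(f"dynamic @0.72V: {p_dyn_low * 1e3:.2f} mW ({100 * (1 - p_dyn_low / p_dyn):.0f}% lower)")
print(f"static  @0.80V: {p_stat * 1e3:.2f} mW")
```

Because VDD enters squared, a 10% supply reduction cuts dynamic power by about 19%, which is the motivation for voltage scaling.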

1.5 Logic Gates in CMOS

NAND2: PMOS in PARALLEL (pull-up), NMOS in SERIES (pull-down)

NOR2: PMOS in SERIES (pull-up), NMOS in PARALLEL (pull-down)

NAND2 truth table:        NOR2 truth table:
A B | Y = NOT(A·B)        A B | Y = NOT(A+B)
0 0 | 1                   0 0 | 1
0 1 | 1                   0 1 | 0
1 0 | 1                   1 0 | 0
1 1 | 0                   1 1 | 0

NAND/NOR are preferred in CMOS synthesis — they use transistors efficiently.

AND = NAND + INV (one extra cell). OR = NOR + INV.
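
The series/parallel rule can be modelled at switch level. This is a simplified sketch in which each transistor is an ideal switch (NMOS conducts on 1, PMOS conducts on 0); the function names are illustrative.

```python
def nand2(a, b):
    pull_down = bool(a) and bool(b)      # NMOS in series: both must conduct
    pull_up = (not a) or (not b)         # PMOS in parallel: either conducts
    assert pull_up != pull_down          # exactly one network conducts
    return 1 if pull_up else 0


def nor2(a, b):
    pull_down = bool(a) or bool(b)       # NMOS in parallel: either conducts
    pull_up = (not a) and (not b)        # PMOS in series: both must conduct
    assert pull_up != pull_down
    return 1 if pull_up else 0


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "NAND2 =", nand2(a, b), "NOR2 =", nor2(a, b))
```

The internal assertion captures the CMOS property that the pull-up and pull-down networks are complementary, so the output is always driven and there is never a DC path in steady state.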

Transistor count per gate (28nm standard cell, W=1)

Gate NMOS count PMOS count Total Notes

INV (X1) 1 1 2 Simplest cell

NAND2 2 series 2 parallel 4 NMOS drives output LOW when both A=B=1

NOR2 2 parallel 2 series 4 PMOS drives output HIGH when both A=B=0

NAND3 3 series 3 parallel 6 Each NMOS series = more resistance → upsize

AND2 3 (NAND+INV) 3 6 = NAND2 + INV

DFF ~8 ~8 ~16 Latch + clock logic

ICG (clock gate) ~4 ~4 ~8 Latch + AND for glitch-free gating

1.6 D Flip-Flop — Memory and Timing

D Flip-Flop Timing Diagram

          ____      ____      ____
CLK: ____|    |____|    |____|    |____
D:   --[   0    ][   1    ][   0    ]--
Q:   --------[   0    ][   1    ][   0    ]--
         ↑         ↑         ↑
     rising edge captures D → Q after Tcq delay

Setup window: D must be stable ≥ Tsu BEFORE clock edge
Hold window:  D must remain stable ≥ Th AFTER clock edge

        |← Tsu →|← Th →|
D:   ---[    stable    ]---
CLK:            ↑ edge



Setup time (Tsu) Minimum time D must be stable BEFORE the clock edge. Violation → metastability.
Enforced by setup STA check.

Hold time (Th) Minimum time D must stay stable AFTER the clock edge. Violation → wrong data captured.
Enforced by hold STA check.

Clock-to-Q (Tcq) Propagation delay from clock edge to Q output valid. ~50–200ps. Adds to the launch-side
data arrival time.

Metastability If Tsu or Th is violated, the FF enters an undefined intermediate voltage state. It eventually
resolves, but the time is unbounded — and may exceed one clock period, causing system
failure.

■ The ENTIRE purpose of Static Timing Analysis (STA) is to ensure: (1) data arrives at every FF before the setup window
closes, AND (2) data does not arrive so fast it violates hold. Every timing fix in PD serves this goal.



Chapter 2: Complete ASIC Design Flow — RTL to GDS

Every stage from specification to silicon — detailed

2.1 The Full Flow

SPECIFICATION
  ↓ (system reqs, PPA targets, interfaces, power budget)
RTL DESIGN (Verilog / SystemVerilog / VHDL)
  ↓ (behaviour description — NOT physical layout yet)
FUNCTIONAL SIMULATION (VCS, Questa, Xcelium)
  ↓ (verify RTL does the right thing — testbenches, coverage)
LOGIC SYNTHESIS (Design Compiler / Genus / Fusion Compiler)
  ↓ (RTL → gate netlist; apply SDC timing constraints)
DFT INSERTION (Tessent / Modus)
  ↓ (add scan FFs, BIST controllers for manufacturing test)

--------------- PHYSICAL DESIGN ------------------------

FLOORPLANNING (ICC-II / Innovus)
  ↓ (die size, IO, macro placement, power grid creation)
PLACEMENT (place_opt)
  ↓ (place standard cells at legal x,y coordinates)
CLOCK TREE SYNTHESIS (clock_opt)
  ↓ (build balanced clock distribution, fix hold violations)
ROUTING (route_opt)
  ↓ (connect all nets with metal wires on correct layers)
SIGN-OFF (PrimeTime SI, Calibre DRC/LVS, Voltus IR/EM)
  ↓ (timing PASS, DRC clean, LVS clean, IR/EM within limits)
TAPE-OUT → GDS to Foundry → Photomasks → Silicon Wafers

2.2 Key File Formats

.v / .sv / .vhd RTL source (Verilog/SV/VHDL). Input to synthesis.

.sdc Synopsys Design Constraints. Timing constraints (clocks, delays, exceptions). Used by ALL
tools.



.lib / .db Liberty timing & power tables per cell per PVT corner (.db is the compiled binary form). NLDM tables are
indexed by (input_slew, output_load).

.lef Library Exchange Format. Cell physical abstract (pin positions, obstruction layers) +
technology rules (routing layers, pitch).

.ndm New Data Model (Synopsys ICC-II). Unified .lib + .lef + .gds in one file.

.def Design Exchange Format. Snapshot of physical state: die, placements, routing.

.spef Standard Parasitic Exchange Format. Extracted R + C of routed wires. Back-annotated to PrimeTime.

.gds / .oasis Binary layout polygon data by layer. Final deliverable to foundry.

.upf Unified Power Format (IEEE 1801). Power domains, isolation, level shifters for multi-VDD
designs.

2.3 Standard Cell Library Anatomy

The standard cell library is the set of pre-characterised logic primitives used by synthesis and PD. Key properties:

• All cells have the SAME HEIGHT, set by the library track count (e.g., 7.5 tracks, a fraction of a micron at 14nm) — they tile perfectly in placement rows
• Variable WIDTH — proportional to function complexity and drive strength
• Timing characterised at multiple PVT corners (SS, TT, FF) → multiple .lib files
• Multiple drive strengths: X1, X2, X4, X8 — same logic, different W/L ratio
• Multiple Vt variants: HVT, SVT, LVT — same logic, different threshold voltage

Example: How a NAND2 cell is characterised

NAND2_X2_SVT: 2-input NAND, drive strength X2, standard Vt. Characterised at SS (0.72V, 125°C): rise delay =
0.15ns @ (slew=0.05ns, load=0.02pF). Characterised at FF (0.88V, -40°C): rise delay = 0.06ns. The PD tool uses
the worst-case corner for each timing check: typically the slow corner for setup and the fast corner for hold.
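
NLDM tables store delay only at discrete (input_slew, output_load) points, so tools interpolate between them. The sketch below shows plain bilinear interpolation over a hypothetical 3×3 table; only the 0.15ns entry at (0.05ns, 0.02pF) comes from the example above, and every other number is invented for illustration. Real tools may use different interpolation and extrapolation rules.

```python
from bisect import bisect_right

# Hypothetical SS-corner rise-delay table (ns). Rows: input slew, columns: output load.
SLEWS_NS = [0.01, 0.05, 0.10]
LOADS_PF = [0.01, 0.02, 0.04]
DELAY_NS = [
    [0.08, 0.11, 0.17],
    [0.11, 0.15, 0.22],
    [0.15, 0.20, 0.28],
]


def _bracket(axis, x):
    """Return the lower index and fractional position of x on a sorted axis."""
    i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
    t = (x - axis[i]) / (axis[i + 1] - axis[i])
    return i, t  # t falls outside 0..1 when x is outside the table (extrapolation)


def nldm_delay(slew, load):
    """Bilinear interpolation of the delay table at (slew, load)."""
    i, ts = _bracket(SLEWS_NS, slew)
    j, tl = _bracket(LOADS_PF, load)
    d00, d01 = DELAY_NS[i][j], DELAY_NS[i][j + 1]
    d10, d11 = DELAY_NS[i + 1][j], DELAY_NS[i + 1][j + 1]
    return (d00 * (1 - ts) * (1 - tl) + d01 * (1 - ts) * tl
            + d10 * ts * (1 - tl) + d11 * ts * tl)


print(f"{nldm_delay(0.05, 0.02):.3f} ns")   # on a table point
print(f"{nldm_delay(0.05, 0.03):.3f} ns")   # halfway between two load points
```

Heavier load or slower input slew moves the lookup toward the bottom-right of the table, which is why upsizing the driver (sharper slew at the next stage) and reducing wire capacitance both shorten delay.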



Chapter 3: Logic Synthesis

RTL → Gate Netlist · SDC · Design Compiler · Optimisation

3.1 Synthesis — Three Phases

Phase 1: ELABORATION

- Parse RTL, build technology-independent GTECH network

- Infer FFs from always_ff constructs

- Infer arithmetic (adders, multipliers, comparators)

Phase 2: TECHNOLOGY-INDEPENDENT OPTIMISATION

- Boolean minimisation (remove redundant logic)

- Common sub-expression elimination (share repeated logic)

- Constant propagation (remove always-0 / always-1 logic)

Phase 3: TECHNOLOGY MAPPING

- Replace GTECH with library cells (NAND, NOR, INV, DFF...)

- Select cells meeting timing + area + power targets

- Insert buffers for fanout / load driving

- Multi-Vt assignment (HVT on slack paths, LVT on critical)

3.2 SDC Constraints — Complete Reference

SDC is the universal language for timing constraints — used by synthesis, PnR, and STA sign-off.

Clock Definition
# Define 500 MHz clock (period = 2ns)

create_clock -name clk -period 2.0 [get_ports clk]

# Uncertainty: models jitter + skew budget

set_clock_uncertainty -setup 0.1 [get_clocks clk] ;# 100ps

set_clock_uncertainty -hold 0.05 [get_clocks clk] ;# 50ps

# Source latency: PLL delay before reaching chip

set_clock_latency -source 0.3 [get_clocks clk]

# Generated (divided) clock

create_generated_clock -name clk_div2 \

-source [get_ports clk] -divide_by 2 [get_pins u_div/Q]

Input / Output Constraints


# Input delay: data arrives 0.5ns after clock edge at our port

set_input_delay -max 0.5 -clock clk \

[remove_from_collection [all_inputs] [get_ports clk]]

set_input_delay -min 0.1 -clock clk \

[remove_from_collection [all_inputs] [get_ports clk]]

# Output delay: external FF needs data stable 0.3ns before edge

set_output_delay -max 0.3 -clock clk [all_outputs]

set_output_delay -min 0.0 -clock clk [all_outputs]



# Set drive / load for realistic analysis

set_driving_cell -lib_cell BUF_X4_SVT -pin Y [all_inputs]

set_load 0.05 [all_outputs]

Timing Exceptions
# FALSE PATH: never a real functional path

set_false_path -from [get_ports rst_n] ;# async reset

set_false_path -from [get_clocks clkA] -to [get_clocks clkB] ;# CDC

# MULTICYCLE PATH: allow 2 cycles for data to settle

set_multicycle_path 2 -setup -from [get_cells u_mult*]

set_multicycle_path 1 -hold -from [get_cells u_mult*] ;# CRITICAL!

# MAX/MIN DELAY override

set_max_delay 1.5 -from [get_ports data_in] -to [get_pins u_reg/D]

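■ ALWAYS pair setup and hold for multicycle paths. A -setup 2 multicycle on its own also moves the hold check to one cycle
before the new capture edge (one full period after launch), so the tool pads the path with unnecessary delay. Adding
set_multicycle_path 1 -hold moves the hold check back to the launch edge.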
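
The edge arithmetic behind that rule, as a sketch. It assumes a single-clock path launched at t = 0 with the default SDC conventions (setup multiplier counted at the capture clock, hold multiplier moving the check back from its default position); the function name is illustrative.

```python
def multicycle_check_edges(period, setup_mult=1, hold_mult=0):
    """Return (setup_capture_time, hold_check_time) for a launch at t = 0.

    setup_mult: N from `set_multicycle_path N -setup` (default 1)
    hold_mult:  H from `set_multicycle_path H -hold`  (default 0)
    """
    setup_edge = setup_mult * period
    # By default the hold check sits one cycle before the setup capture edge.
    hold_edge = (setup_mult - 1 - hold_mult) * period
    return setup_edge, hold_edge


T = 2.0  # ns, the 500 MHz clock used in the constraints above
print(multicycle_check_edges(T))                             # single-cycle default
print(multicycle_check_edges(T, setup_mult=2))               # setup only
print(multicycle_check_edges(T, setup_mult=2, hold_mult=1))  # setup + hold pair
```

With only `-setup 2`, the hold check lands at 2.0 ns, so the data path would need at least a full period of minimum delay. Adding `-hold 1` returns it to 0 ns, the same requirement as a normal single-cycle path.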

3.3 Design Compiler Key Commands

# 1. Library setup

set_app_var target_library {[Link] [Link]}

set_app_var link_library {* [Link] [Link]}

# 2. Read and elaborate RTL

analyze -format verilog design.v

elaborate TOP

current_design TOP

# 3. Apply constraints

source [Link]

# 4. Compile (compile_ultra = highest quality)

compile_ultra -gate_clock -scan -no_autoungroup

# 5. Key reports

report_qor ;# WNS, TNS, area summary

report_timing -delay_type max ;# setup worst path

report_timing -delay_type min ;# hold worst path

report_area ;# cell area breakdown

report_power ;# dynamic + leakage

# 6. Write outputs

write_file -format verilog -hierarchy -output netlist.v

write_sdc [Link]

write_scan_def -output [Link]

3.4 Multi-Vt Assignment Strategy

Vt Assignment Based on Path Slack:

Slack < 0.05ns (critical): → LVT (fast, accept more leakage)

Slack 0.05–0.2ns (near-crit): → SVT (balanced, default)



Slack > 0.2ns (non-critical):→ HVT (save leakage, acceptable speed)

Example result: 10% LVT + 30% SVT + 60% HVT

Leakage saving vs all-SVT: ~20-50%, depending on the library's HVT/LVT leakage ratios

Timing impact: zero (all LVT on critical paths already pass)

Example: Leakage calculation with multi-Vt

100K cells. All-SVT leakage = 2.0mW, i.e. 20nW per SVT cell. HVT leakage ≈ 0.4× SVT. LVT leakage ≈ 2.5× SVT. Mix: 60% HVT + 30%
SVT + 10% LVT. Leakage = 60K×20nW×0.4 + 30K×20nW×1.0 + 10K×20nW×2.5 = 0.48 + 0.60 + 0.50 = 1.58mW
→ 21% reduction from 2.0mW. Larger savings need a lower HVT leakage ratio or a higher HVT share.
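
The mix calculation is easy to script when exploring other Vt ratios. This sketch takes the 2.0mW all-SVT total as the baseline, which implies 20nW per SVT cell, and uses the 0.4× and 2.5× leakage ratios quoted above; the variable names are illustrative.

```python
n_cells = 100_000
all_svt_mw = 2.0
per_cell_svt_mw = all_svt_mw / n_cells   # 20 nW per SVT cell

# Vt type -> (fraction of cells, leakage relative to SVT)
mix = {"HVT": (0.60, 0.4), "SVT": (0.30, 1.0), "LVT": (0.10, 2.5)}

contrib_mw = {
    vt: n_cells * frac * per_cell_svt_mw * ratio
    for vt, (frac, ratio) in mix.items()
}
total_mw = sum(contrib_mw.values())
reduction = 1 - total_mw / all_svt_mw

for vt, mw in contrib_mw.items():
    print(f"{vt}: {mw:.2f} mW")
print(f"total: {total_mw:.2f} mW, reduction vs all-SVT: {reduction:.0%}")
```

Note that the 10% of LVT cells contribute about as much leakage as the 60% of HVT cells, which is why LVT usage is normally capped.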



Chapter 4: Floorplanning & Power Planning

Die sizing · Macro placement · PDN · IR drop · EM

4.1 Floorplanning Overview

Floorplanning is the most influential physical design step. A bad floorplan can make timing closure impossible
regardless of other optimisations.

Good vs Bad Floorplan (top view):

BAD:                          GOOD:
+------------------+          +------------------+
|  SRAM (centre)   |          |SRAM|  cells |SRAM|
|                  |          |    |        |    |
|  cells scattered |          |ctrl+--------+PHY |
|                  |          |    |  cells |    |
|  PHY (far)       |          +------------------+
+------------------+          Macros on edges,
→ Long wires, congestion      open routing in centre

4.2 Die and Core Sizing

Core Area = Total_Std_Cell_Area / Target_Utilisation

Die Side = sqrt(Core_Area + Macro_Area) + 2 × IO_Margin (for a 1:1 aspect ratio; other ratios split the same area into width × height)

Example: Die sizing worked example

Synthesis: std cell area = 800,000 µm². SRAMs = 200,000 µm². Target util = 65%. Core for std cells = 800,000/0.65
= 1,230,769 µm². Total core = 1,230,769 + 200,000 = 1,430,769 µm². Aspect ratio 1:1 → core = 1196µm × 1196µm ≈
1.2mm × 1.2mm. Add IO ring (60µm each side) → die = 1320µm × 1320µm = 1.74mm².
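
The same sizing steps as a small script, useful for trying other utilisation targets. This is a sketch: the function name is illustrative, and aspect_ratio is assumed to mean core width divided by core height.

```python
import math


def size_die(std_cell_area, macro_area, utilisation, io_margin, aspect_ratio=1.0):
    """Return (core_area, die_width, die_height); areas in µm², lengths in µm."""
    core_area = std_cell_area / utilisation + macro_area
    core_h = math.sqrt(core_area / aspect_ratio)
    core_w = core_area / core_h
    return core_area, core_w + 2 * io_margin, core_h + 2 * io_margin


core_area, die_w, die_h = size_die(
    std_cell_area=800_000, macro_area=200_000, utilisation=0.65, io_margin=60
)
print(f"core area: {core_area:,.0f} um^2")
print(f"die: {die_w:.0f} um x {die_h:.0f} um = {die_w * die_h / 1e6:.2f} mm^2")
```

The unrounded die side is about 1316µm. The worked example rounds the core side up to 1.2mm first, which gives the 1320µm figure.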

Utilisation  Routing Headroom  Risk      Recommendation
50–60%       High (abundant)   Low       Complex SoCs or first tapeouts
65–70%       Good              Medium    Standard for most designs
70–75%       Tight             High      Only with expert team
>75%         Insufficient      Critical  DRC/routing failures likely

4.3 Macro Placement Rules

Macro Placement Best Practices:

+--------------------------------------------------------+
|         IO Pads / Bond Pads / Flip-chip Bumps          |
| +----------------------------------------------------+ |
| | +----+  +----+                +----+               | |
| | |SRAM|  |SRAM|                |PHY | ← near IO     | |
| | |    |  |    |   Standard     +----+               | |
| | |ctrl|  +----+   Cell                              | |
| | +----+           Region       +-----+              | |
| |                  (open for    | PLL | ← corner     | |
| |                  std cells    +-----+              | |
| |                  and routing)                      | |
| +----------------------------------------------------+ |
+--------------------------------------------------------+

Rules:

1. Macros along die EDGES → open routing channel in centre

2. Related macros TOGETHER (SRAM next to its controller)

3. IO-connected macros NEAR their pads (PHY, SerDes)

4. PLL in corner (isolated from switching noise)

5. Align macro corners to routing grid (e.g., 0.2µm)

6. Minimum 5–10 routing tracks between adjacent macros

7. Add HALOS (keep-out zones) around every macro

4.4 Power Distribution Network (PDN)

PDN Hierarchy:

Package / PCB power delivery
  ↓ Bond wires / C4 bumps
Core Ring (M8/M9, wide, perimeter of core)
  ↓ Via arrays to lower layers
Horizontal Straps (M7/M8, pitch ~8-20µm, wide)
Vertical Straps (M5/M6, pitch ~8-20µm, wide)
  ↓ Via arrays
M1 Cell Rails (narrow VDD/VSS rail inside every cell row)
  ↓ Contacts
Transistor source/drain

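# ICC-II PDN creation (simplified; production flows define create_pg_ring_pattern / create_pg_mesh_pattern and set_pg_strategy before compile_pg)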

create_pg_ring -nets {VDD VSS} \

-layers {{M8 width=1.2 spacing=0.5} {M9 width=1.2 spacing=0.5}}

create_pg_mesh -nets {VDD VSS} \

-layers {{M7 width=0.4 pitch=8.0} {M8 width=0.4 pitch=8.0}}

connect_pg_net -automatic -all_blocks

compile_pg

check_pg_connectivity -check_std_cell_pins all



4.5 IR Drop — Theory, Calculation, Fix

V_drop = I_cell × R_grid_path

Acceptable: V_drop < 5% × VDD (e.g., <40mV for VDD=0.8V)

Timing impact: every 10mV IR drop ≈ 1–2% increase in cell delay (rule of thumb; library and voltage dependent)

Static IR drop DC component from average (time-averaged) current draw. Always present. Fix: add straps.

Dynamic IR drop Transient voltage droop during simultaneous switching. Peak can be 3× static. Fix: decap
cells.

Hot spot Region with many switching cells drawing simultaneous current. Identified by IR map
colours.

Decap cell Capacitor cell placed near hot spots. Provides local charge reservoir to absorb dynamic
current demand.

■ IR drop causes timing failure indirectly: cell receives lower VDD → slower delay → setup slack worsens. A 5% IR drop at 0.8V
(40mV) can add 4–8% delay to affected cells.
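
A minimal budget check for the V_drop formula and the 5% rule, as a sketch. The current and grid-resistance values are invented for illustration, and the function name is hypothetical.

```python
def ir_drop_report(i_amps, r_ohms, vdd, limit_frac=0.05):
    """Apply V_drop = I * R and compare against a fraction-of-VDD budget."""
    v_drop = i_amps * r_ohms
    return {
        "v_drop_mV": v_drop * 1e3,
        "pct_of_vdd": 100 * v_drop / vdd,
        "within_budget": v_drop < limit_frac * vdd,
    }


# Region drawing 20 mA through an effective 1.5 ohm grid path at VDD = 0.8 V.
ok_region = ir_drop_report(i_amps=0.020, r_ohms=1.5, vdd=0.8)
# Hot spot: same path, 30 mA of simultaneous switching current.
hot_spot = ir_drop_report(i_amps=0.030, r_ohms=1.5, vdd=0.8)

print(ok_region)
print(hot_spot)
```

Halving the effective path resistance (more straps, more vias) or the peak current (decaps, spreading cells) are the two ways to bring the hot spot back under the 40mV budget.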

4.6 Electromigration (EM)

EM = gradual metal atom displacement from high current density → voids (opens) or hillocks (shorts) over years.

Black's Law: MTF = A × J^(-n) × exp(Ea / kT)

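where MTF = mean time to failure; A = process-dependent constant; J = current density; n = current-density exponent (typically 1–2); Ea ≈ 0.9eV (Cu) is the activation energy; k = Boltzmann constant; T = temperature (K).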

Jmax limit Foundry specifies max J per layer at temperature. Typical: M1 = 1mA/µm, M8 = 8mA/µm.

Power vs signal EM Power nets: use average current. Signal nets: use RMS current (accounts for duty cycle).

Via EM Current squeeze through small via cross-section. Most vulnerable location. Use double/quad
vias.

Fix: wider wire Current density J ∝ 1/width. Double width → half current density → 2^n longer lifetime (4× for n = 2).

Fix: redundant via 2×1 or 2×2 via array distributes current across multiple vias.

■ EM is a 10-YEAR reliability concern. Chips must survive 10 years at 125°C junction temperature. Cadence Voltus and
Ansys RedHawk perform EM sign-off. Calibre PERC also checks EM rules.
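
Black's Law is most useful as a ratio, because the constant A cancels. The sketch below compares lifetimes before and after a change in current density or temperature. The exponent n = 2 is an assumption (the text does not fix a value), and Ea = 0.9eV is the copper figure quoted above.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K


def mtf_ratio(j_ratio=1.0, t_old_c=125.0, t_new_c=125.0, n=2.0, ea_ev=0.9):
    """MTF_new / MTF_old when J scales by j_ratio and temperature changes.

    n = 2.0 is an assumed exponent; foundry EM models supply the real value.
    """
    t_old = t_old_c + 273.15
    t_new = t_new_c + 273.15
    thermal = math.exp(ea_ev / K_BOLTZMANN_EV * (1 / t_new - 1 / t_old))
    return j_ratio ** (-n) * thermal


wider = mtf_ratio(j_ratio=0.5)      # wire width doubled, so J is halved
cooler = mtf_ratio(t_new_c=105.0)   # junction runs 20 °C cooler
print(f"double width: {wider:.1f}x lifetime")
print(f"20 C cooler:  {cooler:.1f}x lifetime")
```

Current density acts through a power law, while temperature acts through the exponential term, so thermal hot spots can erode EM margin as quickly as narrow wires do.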



Chapter 5: Placement

Global placement · Legalisation · Detailed optimisation · Congestion

5.1 Placement Overview

Placement assigns physical (x,y) coordinates to every standard cell. It is timing-driven — cells on critical paths are
placed closer together to minimise wire delay. Placement happens in three progressive stages.

Three-Stage Placement Flow:

1. GLOBAL PLACEMENT

-----------------------------------------------------

Goal: minimise total wirelength (HPWL — half-perimeter wirelength)

Method: analytical or force-directed — treat wires as springs

Result: coarse distribution, cells MAY overlap

Before global placement:      After global placement:
+------------------+          +------------------+
| all cells        |          |  .   ..  .  .  . |
| stacked at       |          | ..  .   ..   .   |
| (0,0)            |          | .  .  ..  .  ..  |
|                  |          |  .  .  ..  .  .  |
+------------------+          +------------------+
                              (cells may overlap)

2. LEGALISATION

-----------------------------------------------------

Goal: move cells to nearest legal placement row/site

All cells: aligned to row + site grid, no overlaps

Preserves global placement distribution as much as possible

3. DETAILED PLACEMENT

-----------------------------------------------------

Goal: local perturbations to improve timing + congestion

Methods: cell swapping, sliding, mirror/flip, Vt swapping

5.2 Placement Rows and Sites

Standard Cell Placement Rows

Row 3 (flipped): ---- VDD --------------------------
                 [cell] [cell] [cell] [cell]  (flipped)
Row 2 (normal):  ---- VSS --------------------------
                 [cell] [cell] [cell] [cell]
Row 1 (flipped): ---- VDD --------------------------
                 [cell] [cell] [cell] [cell]  (flipped)
Row 0 (normal):  ---- VSS --------------------------
                 [cell] [cell] [cell] [cell]

(VDD/VSS rails are shared between adjacent rows)



Site = minimum atomic grid unit (e.g., 0.19µm at 14nm)

All cell x-coordinates must be integer multiples of site pitch

Placement row Horizontal strip of uniform height. All cells in one row are the same height.

Site (x-grid) Minimum x-coordinate unit. Cells placed on integer multiples of site pitch.

Row orientation Alternate rows flipped (FS) so adjacent rows share VDD/VSS rails. Reduces N-well area.

Double-height cell Spans 2 rows. Used for drive strength > 12 or power switch cells. Must start at even row.

Cell flip Mirror the cell horizontally so its output pin faces its fanout cells, reducing wire length.

5.3 Timing-Driven Placement

In TDP (Timing-Driven Placement), the placer assigns positions with timing as the primary objective, not just minimum
wirelength. Critical nets are weighted to pull connected cells together.

How Timing-Driven Placement Works:

1. Estimate wire delay using HPWL model:

Wire delay ≈ R_wire × C_wire

C_wire ≈ 0.2fF/µm × wire_length

Wire_length estimate = HPWL = (xmax - xmin) + (ymax - ymin)

for all pins connected to the net

2. Weight critical nets:

net_weight = 1 / (path_slack + epsilon)

Critical paths (slack → 0) get very high weight

Non-critical paths get low weight

3. Placer minimises:

Σ (net_weight × HPWL) across all nets

→ critical nets pull their cells together

→ non-critical nets have freedom to spread

■ ICC-II place_opt runs timing analysis DURING placement using virtual RC models, and iteratively updates net weights to guide
cells into timing-optimal positions.
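
The weighted-HPWL objective from the steps above can be sketched in a few lines. Pin coordinates and slack values are invented for illustration. One assumption is added: negative slack is clamped to zero before weighting, because the raw 1/(slack + epsilon) expression misbehaves once slack goes below -epsilon.

```python
def hpwl(pins):
    """Half-perimeter wirelength of a net given its pin (x, y) coordinates."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def net_weight(slack_ns, eps=0.01):
    """net_weight = 1 / (slack + epsilon), with negative slack clamped (assumption)."""
    return 1.0 / (max(slack_ns, 0.0) + eps)


def placement_cost(nets):
    """Sum of net_weight * HPWL over all nets."""
    return sum(net_weight(n["slack"]) * hpwl(n["pins"]) for n in nets)


nets = [
    {"name": "crit_net", "slack": 0.00, "pins": [(0, 0), (30, 40)]},
    {"name": "easy_net", "slack": 0.49, "pins": [(0, 0), (100, 50), (20, 80)]},
]
for n in nets:
    print(n["name"], "HPWL =", hpwl(n["pins"]), "weight =", round(net_weight(n["slack"]), 1))
print("cost =", round(placement_cost(nets), 1))
```

Although the non-critical net is more than twice as long, the critical net dominates the cost, so the placer gains most by shortening it.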

5.4 Placement Blockages

Blockage Type      Prevents                                 When Used                                ICC-II Command
Hard               ALL cell placement                       Under macros, IO pads, analog            create_placement_blockage -type hard
Soft               Placement unless legalisation requires   Near macro edges — lower priority zone   create_placement_blockage -type soft
Partial (density)  Exceeding a density %                    Congestion reduction near macros         -type partial -blocked_percentage 50
Buffer only        Buffer/inverter cells                    CTS halos — stop clock buffers here      -type buffer_only

5.5 Congestion — Measurement and Fixes

Congestion = routing demand exceeds routing capacity in a local area. Identified by the global-route congestion map,
reported per GRC (global routing cell, also called a GCell).

Congestion Overflow = max(0, Routing_Demand - Routing_Capacity) per GCell

Congestion Map Interpretation:

Low congestion: grey/white — most area

Medium congestion: yellow — monitor

High congestion: orange/red — MUST fix

GCell overflow > 0 means: some nets CANNOT be routed in that tile.

→ If not fixed before routing, causes unrouted nets + DRC violations

Typical target: ZERO overflow tiles before starting detailed routing

Allow a few (<0.5% of tiles) for experienced teams
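
The overflow formula applied to a toy GCell grid, as a sketch. The demand and capacity numbers are invented, and the function name is illustrative; a real tool reports this per layer and per routing direction.

```python
def overflow_summary(demand, capacity):
    """Per-GCell overflow = max(0, demand - capacity), plus summary statistics."""
    overflow = [
        [max(0, d - c) for d, c in zip(d_row, c_row)]
        for d_row, c_row in zip(demand, capacity)
    ]
    tiles = sum(len(row) for row in overflow)
    bad_tiles = sum(1 for row in overflow for o in row if o > 0)
    total = sum(sum(row) for row in overflow)
    return overflow, total, 100.0 * bad_tiles / tiles


demand = [[8, 12, 9],
          [15, 10, 7]]        # routing tracks requested per GCell
capacity = [[10, 10, 10],
            [10, 10, 10]]     # routing tracks available per GCell

overflow, total, pct_bad = overflow_summary(demand, capacity)
print(overflow)
print(f"total overflow = {total} tracks, {pct_bad:.1f}% of tiles overflowed")
```

A GCell whose demand exactly equals capacity shows zero overflow but has no spare tracks, so it is still worth watching on the map.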

Congestion fixes (in order of preference):


• 1. Reduce local cell density — use set_app_options -name [Link].max_density -value 0.60
• 2. Insert placement blockages near macros to push cells away from routing-blocked areas
• 3. Widen routing channels — increase gap between adjacent macros
• 4. Move macros to open up blocked routing paths (floorplan change)
• 5. Increase core size (more area → lower density → less congestion)
• 6. Add routing layers (expensive — process option)

# ICC-II congestion commands

report_congestion ;# print overflow summary

set_app_options -name [Link].max_density -value 0.65

place_opt ;# re-run with new density limit

# Check specific congested region

report_congestion -layer {M2 M3} ;# per-layer overflow

5.6 Scan Chain Reordering

After synthesis, scan chains are in synthesis order (alphabetical/hierarchical). After placement, these FFs may be
scattered across the die — creating very long scan wires.

Before Reorder:                 After Reorder:
+--------------------+          +--------------------+
| FF1 ----------- FF6|          | FF1 - FF2 - FF3    |
|   ↑ long wire      |          | (adjacent, short)  |
| FF2                |          | FF4 - FF5 - FF6    |
|                    |          | (adjacent, short)  |
|  FF3 ---------- FF5|          |                    |
| FF4  ↑ long wires  |          +--------------------+
+--------------------+          → 30-50% less scan wire



# Scan reorder (AFTER place_opt, BEFORE route_opt); needs the SCANDEF written at synthesis, and exact command names vary by tool

set_scan_configuration -chain_count 16

compile_scan

report_scan_chain ;# verify chain connectivity

■ Scan reordering reduces scan routing wirelength by 30–50%. Always run AFTER placement, BEFORE routing. Skipping this
step wastes routing resources on long scan wires.



Chapter 6: Clock Tree Synthesis (CTS)

Skew · Latency · Topologies · ICG · Useful Skew · Hold Fix

6.1 Why CTS?

Without CTS, a single clock wire drives thousands of flip-flops directly. The wire resistance and capacitance cause
different arrival times at different FFs — this is clock skew. On a large block the skew can reach hundreds of
picoseconds or more, consuming much of the clock period and making timing closure impossible.

Without CTS (single wire):        With CTS (balanced tree):

CLK ------------------- FF1                  CLK
 |                                          /   \
 +--------------------- FF2              BUF     BUF
 (FF2 clock arrives                     /  \    /  \
  0.8ns later!)                       BUF BUF BUF BUF
                                       |   |   |   |
                                      FF1 FF2 FF3 FF4
                                      All arrive ±30ps

Skew without CTS: 800ps → wastes 800ps of your clock period!

Skew with CTS: 30ps → only 30ps wasted on skew

■ Clock power is 20–40% of total dynamic power. CTS must balance low skew, low insertion delay, and low power
simultaneously.

6.2 Key Clock Parameters

Clock Skew Difference in clock arrival time at two FFs. Skew = |Lat_FF1 - Lat_FF2|. Target: < 50–150ps
at 500MHz+.

Insertion Delay Total delay through the clock tree buffers (clock port → FF clock pin). Typical: 300ps–2ns.
The CTS tool builds the tree to equalise this across all FFs.

Source Latency Delay from external clock source to the chip clock port (models PCB trace + PLL). Set via
set_clock_latency -source.

Clock Slew Rise/fall transition time at FF clock pin. Slow slew → high power, jitter susceptibility. Target
< 100–200ps. Controlled by buffer sizing.

Clock Jitter Cycle-to-cycle variation in clock period. Caused by PLL phase noise. Modelled as clock
uncertainty in STA.

Half-cycle path Path between opposite-edge FFs (rising launch, falling capture). Only half a period
available. Very tight timing constraint.

Clock domain Set of FFs driven by the same clock. A chip may have 2–20+ independent clock domains.
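
Skew and insertion delay both fall out of the per-sink latency list that a clock report provides. A minimal sketch, with four invented latency values and hypothetical helper names:

```python
# Clock latency (clock port → FF clock pin) per sink, in ps. Values are illustrative.
latencies_ps = {"FF1": 812, "FF2": 790, "FF3": 835, "FF4": 801}


def pair_skew(lat, a, b):
    """Skew between two sinks: |Lat_a - Lat_b|."""
    return abs(lat[a] - lat[b])


def global_skew(lat):
    """Worst-case skew across all sinks: max latency minus min latency."""
    return max(lat.values()) - min(lat.values())


def max_insertion_delay(lat):
    """Longest clock-port-to-sink delay in the tree."""
    return max(lat.values())


print("FF1-FF2 skew:", pair_skew(latencies_ps, "FF1", "FF2"), "ps")
print("global skew:", global_skew(latencies_ps), "ps")
print("max insertion delay:", max_insertion_delay(latencies_ps), "ps")
```

Only the skew between FFs that actually exchange data affects a timing path, which is why pairwise (local) skew often matters more than the global figure.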

6.3 CTS Topologies

Buffered Tree (most common):

CLK

■■ BUF ■■■ BUF ■■■ BUF ■■ FF1

■ ■■ BUF ■■ FF2

■■ BUF ■■■ BUF ■■ FF3

■■ BUF ■■ FF4

Skew: ~50-150ps. Power: medium. Most flexible — handles irregular FF distribution.

H-Tree (for regular arrays):

CLK

■■■■■■■■■

■ ■ Each branch is equal length

■■■■■ ■■■■■ → perfect skew balance from geometry

FF FF FF FF

Skew: <20ps. Good for memory arrays, datapaths.

Clock Mesh (ultra-low skew, high power):

■■■■■■■■■■■■■ Grid of connected wires.

■■■■■■■■■■■■■ Very low impedance → very low skew (<5ps).

■■■■■■■■■■■■■ Power: HIGH (large capacitance).

■■■■■■■■■■■■■ Used in high-end CPUs, GPUs.

6.4 ICC-II CTS Setup and Commands

# ---- Pre-CTS Setup ----------------------------------------------

# Set CTS quality targets

set_clock_tree_options -max_transition 0.1 ;# 100ps max slew

set_clock_tree_options -target_skew 0.05 ;# 50ps target skew

set_clock_tree_options -max_capacitance 0.05 ;# 50fF max cap at buffer output

# Define cells CTS can use for buffering

set_lib_cell_purpose -include cts {CLKBUF* CLKINV*}

set_lib_cell_purpose -exclude cts {*HVT*} ;# don't use HVT for CTS

# Apply NDR (2x width, 2x spacing) to all clock nets

create_routing_rule CLK_NDR -multiplier_width 2 -multiplier_spacing 2

set_clock_tree_options -routing_rule CLK_NDR

# Prevent CTS buffering into halos (keep-out around macros)

# Done via placement blockage -type buffer_only

# ---- Run CTS ----------------------------------------------------

clock_opt ;# CTS + post-CTS opt (holds fixed too)

# ---- Reports ----------------------------------------------------

report_clock_qor ;# skew, insertion delay, buffer count

report_clock_timing -type skew ;# skew between all sink pairs

report_clock_timing -type latency ;# insertion delay per FF

report_power ;# check clock power % of total

6.5 Hold Violations After CTS — Why They Appear

Hold violations do NOT appear during synthesis or before CTS because clock latencies are assumed to be zero (ideal
clock). After CTS, real latency is applied — and short data paths between adjacent FFs may violate hold.

Why hold violations appear after CTS:

FF1 (launch) FF2 (capture)

■ ■

Q■■[data path: 0.08ns]■■D ■

■ ■

CLK_launch = 0.50ns CLK_capture = 0.55ns

Hold check (same clock edge):

Data arrival = 0.50 + 0.08 = 0.58ns

Hold required = 0.55 + Th + uncertainty = 0.55 + 0.03 + 0.05 = 0.63ns

Hold slack = 0.58 - 0.63 = -0.05ns ← VIOLATION

The data path is SO SHORT that data from cycle N

reaches FF2 BEFORE FF2's clock captures cycle N-1!

Fix: insert a delay buffer on the 0.08ns data path.

Hold Slack = (T_launch_lat + T_data_min) - (T_capture_lat + Th + T_hold_uncert)
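The hold slack formula can be turned into a small calculator. This is a minimal sketch using the numbers from the diagram above (0.03ns hold time, 0.05ns hold uncertainty); the function and variable names are illustrative, not tool commands.

```python
def hold_slack(launch_lat, data_min, capture_lat, t_hold, hold_uncert):
    """Hold slack in ns: data arrival (min delays) minus hold requirement."""
    arrival = launch_lat + data_min
    required = capture_lat + t_hold + hold_uncert
    return arrival - required


slack = hold_slack(launch_lat=0.50, data_min=0.08,
                   capture_lat=0.55, t_hold=0.03, hold_uncert=0.05)

# A negative slack is the minimum extra data-path delay a hold buffer must add.
buffer_delay_needed = max(0.0, -slack)

print(f"hold slack          = {slack:+.2f} ns")
print(f"delay to be inserted = {buffer_delay_needed:.2f} ns")
```

Note that the clock period does not appear anywhere in the function, which is why hold violations cannot be fixed by slowing the clock.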

Hold fix methods:


• Insert delay/hold buffers on the data path (clock_opt does this automatically)
• Swap data path cells to HVT (slower = more delay = hold fixed)
• Useful skew: delay launch FF's clock arrival (add buffers to launch clock branch)

■ clock_opt in ICC-II automatically inserts hold buffers during post-CTS optimisation. Fix as many hold violations as possible BEFORE routing: fixing them after routing means ECO re-routing of the changed nets, which disturbs the existing routing.

6.6 Useful Skew

Useful skew intentionally makes the clock arrive at different times at launch and capture FFs, to improve timing
between a specific pair.

Setup improvement with useful skew:

Normal: Launch FF CLK = 0.50ns

Capture FF CLK = 0.50ns (zero skew)

Setup slack = 2.0 + (0.50-0.50) - 1.4 - 0.05 - 0.1 = 0.45ns

With useful skew (delay capture by 0.1ns):

Launch FF CLK = 0.50ns

Capture FF CLK = 0.60ns (+0.1ns useful skew)

Setup slack = 2.0 + (0.60-0.50) - 1.4 - 0.05 - 0.1 = 0.55ns

→ 0.10ns improvement!

Hold check after useful skew:

Hold slack = (0.50 + data_min) - (0.60 + Th + uncert)

→ might worsen hold! Must check.

Setup slack with useful skew = Tclk + (Tcap_lat - Tlaunch_lat) - Tdata - Tsu -
Tuncert

■ Useful skew is bounded: adding skew on one path may create hold violations on adjacent paths. The tool must optimise skew
scheduling globally across all paths simultaneously.
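The setup/hold trade-off of useful skew can be sketched as below. The setup numbers follow the example above; the minimum data delay (0.20ns), hold time (0.03ns) and hold uncertainty (0.05ns) are assumed values chosen only for illustration.

```python
def setup_slack(t_clk, launch_lat, capture_lat, t_data_max, t_su, uncert):
    """Setup slack = Tclk + (Tcap_lat - Tlaunch_lat) - Tdata - Tsu - Tuncert."""
    return t_clk + (capture_lat - launch_lat) - t_data_max - t_su - uncert


def hold_slack(launch_lat, capture_lat, t_data_min, t_hold, uncert):
    """Hold slack = (Tlaunch_lat + Tdata_min) - (Tcap_lat + Th + Tuncert)."""
    return (launch_lat + t_data_min) - (capture_lat + t_hold + uncert)


results = {}
for useful_skew in (0.0, 0.1):           # extra delay added to the capture clock (ns)
    capture = 0.50 + useful_skew
    results[useful_skew] = (
        setup_slack(2.0, 0.50, capture, 1.4, 0.05, 0.1),
        hold_slack(0.50, capture, 0.20, 0.03, 0.05),   # assumed min-delay values
    )
    s, h = results[useful_skew]
    print(f"skew={useful_skew:.1f}ns  setup={s:+.2f}ns  hold={h:+.2f}ns")
```

Every picosecond of capture-clock delay gained on setup is lost on hold for the same FF pair, so the skew budget is bounded by the smallest hold margin on the affected endpoints.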

6.7 Clock Gating — Power Reduction

Clock gating stops the clock from reaching a register bank when it is idle. Since P_dynamic ∝ f, a stopped clock means the gated registers and the clock tree below the gate stop toggling, so their dynamic power drops to nearly zero.

ICG (Integrated Clock Gate) structure:

EN ■■■ D Q ■■■

CLK ■■■ CLK(latch) ■

■■■■ AND ■■■ Gated CLK → FFs

CLK ■■■■■■■■■■■■■■

The latch holds EN stable during the HIGH phase of CLK.

This prevents glitches from EN from creating spurious clock pulses.

The AND gate passes/blocks CLK based on the latched EN.

EN=1: clock passes through → FFs receive clock → can capture new data

EN=0: clock blocked → FFs hold state → zero dynamic power for those FFs

Clock gating coverage = (# FFs with ICG) / (total # FFs)

Target: > 80% coverage for low-power designs

Typical savings: 30-50% of total dynamic power

ICG enable timing: The EN signal must meet setup and hold timing at the ICG input relative to CLK. STA must check ICG enable paths.

ICG placement: ICGs should be placed close to their sink FF cluster to minimise clock tree imbalance.

Auto clock gating: Design Compiler inserts ICGs automatically during compile_ultra -gate_clock when register banks are found.

Glitch prevention: The latch in the ICG prevents EN glitches from creating spurious clock pulses. This is why a plain AND gate is NOT used.
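The glitch-prevention argument can be seen in a tiny behavioural model. This is a sketch only: time is discretised into samples, the latch is modelled as transparent while CLK is low, and the waveforms are invented to show one EN glitch during the CLK high phase.

```python
def and_gated_clock(clk, en):
    """Plain AND gate: any EN activity while CLK is high reaches the output."""
    return [c & e for c, e in zip(clk, en)]


def icg_gated_clock(clk, en):
    """Latch + AND: the latch follows EN while CLK is low, holds while CLK is high."""
    latched = 0
    out = []
    for c, e in zip(clk, en):
        if c == 0:
            latched = e          # transparent phase
        out.append(c & latched)  # AND with the latched enable
    return out


clk = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
# EN glitches high at sample 3 (CLK high), then is really asserted from sample 8.
en = [0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1]

print("AND :", and_gated_clock(clk, en))   # contains a runt pulse at sample 3
print("ICG :", icg_gated_clock(clk, en))   # only the full, intended clock pulse
```

The plain AND output shows a half-width pulse caused by the glitch, while the latch-based version passes only the complete clock pulse that follows the real enable.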

Chapter 7: Routing

Global routing · DRC rules · LVS · Antenna effect · SI · NDR

7.1 Routing Flow

Routing Flow:

1. GLOBAL ROUTING

Divides chip into GCells (tiles). Assigns each net to a

GCell sequence without exact tracks. Fast — reveals

congestion. Drives congestion-fix iterations.

2. TRACK ASSIGNMENT

Assigns exact metal tracks within each GCell.

Creates actual wire geometry. Enforces basic DRC.

3. DETAILED ROUTING

Assigns exact layer + coordinates. Enforces ALL DRC rules.

Uses pattern routing + search-and-repair.

4. SEARCH & REPAIR

Iterative post-routing pass. Fixes remaining DRC violations.

5. VIA OPTIMISATION

Replaces single vias with double/stacked vias for EM + yield.

6. REDUNDANT VIA INSERTION

Adds extra vias on non-critical nets for yield improvement.
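The way global routing reveals congestion (step 1 above) can be sketched as a demand-versus-capacity count per GCell. This is a simplified illustration: the nets, tile coordinates and the single capacity number are hypothetical, and real routers track capacity per layer and direction.

```python
from collections import Counter


def gcell_overflow(net_paths, capacity):
    """Return {gcell: overflow} for every GCell whose demand exceeds capacity.

    net_paths: net name -> list of (col, row) GCells the net passes through.
    capacity:  routing tracks available in each GCell (assumed uniform here).
    """
    demand = Counter()
    for tiles in net_paths.values():
        for tile in set(tiles):      # a net uses one track per GCell it crosses
            demand[tile] += 1
    return {tile: d - capacity for tile, d in demand.items() if d > capacity}


# Hypothetical global routes: three nets squeeze through GCell (1, 1).
routes = {
    "net_a": [(0, 1), (1, 1), (2, 1)],
    "net_b": [(1, 0), (1, 1), (1, 2)],
    "net_c": [(0, 0), (1, 0), (1, 1)],
}

overflow = gcell_overflow(routes, capacity=2)
print(overflow)
```

GCells that appear in the result are the hotspots that drive congestion-fix iterations such as cell spreading or re-routing around the tile.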

7.2 Metal Layer Stack and Usage

Typical 14nm Metal Stack (10 metal layers):

M9/M10 (Ultra-thick)   Power ring, very long buses, supply straps
M7/M8  (Thick)         Power mesh, wide data buses
M5/M6  (Intermediate)  Long signal routing, power straps
M3/M4  (Standard)      Medium-length signal routing
M2     (Standard)      Short signal routing (vertical)
M1     (Local)         Local cell connections (horizontal, cell rails)
LI     (Local inter)   Gate-to-diffusion connections inside cells [FinFET only]

Routing direction convention (alternating H/V):

M1 Horizontal, M2 Vertical, M3 Horizontal, M4 Vertical ...

(With alternating directions, any point-to-point route can be built from horizontal and vertical segments on adjacent layers joined by a single via per bend)

7.3 Design Rules — Complete Reference

Design rules are geometric constraints that ensure the foundry can reliably manufacture the layout. Violating them
causes manufacturing failures. The foundry provides a DRC deck (SVRF language for Calibre).

Minimum Width (W.M1): Each metal shape must be at least W_min wide. Prevents resistive/open connections. e.g., on the order of a few tens of nanometres for M1 at 14nm; the exact value comes from the foundry rule deck.

Minimum Spacing (SP.M1): Two shapes on the same layer must be separated by at least S_min. Prevents shorts between wires.

Minimum Enclosure (ENC.V1): Metal must surround each via by at least E_min on all sides. Ensures electrical contact despite lithography variation.

Minimum Area (MA.M1): Metal shapes must meet a minimum area to avoid thin, high-resistance connections.

EOL (End-of-Line) Spacing: Wire ends need extra spacing to adjacent wires, because wire tips are more vulnerable to lithography rounding.

Metal Density: Each layer must have density between min% and max% for CMP planarisation uniformity.

Antenna Rule: Max ratio of wire area to connected gate oxide area. Exceeding it causes gate damage during plasma etching.

Via Enclosure: Metal must extend beyond via edges by a minimum amount on all four sides.

Multi-patterning Colour: Adjacent wires on the same patterned layer must have different lithography mask 'colours'.

7.4 LVS — Layout vs Schematic

LVS verifies the physical layout represents the correct circuit. Calibre extracts transistors and connections from GDS,
then compares to the reference netlist.

LVS Flow:

GDS (layout)                 Reference Netlist (from synthesis)
     |                                    |
     v  Calibre LVS extraction            |
Extracted Netlist                         |
     |                                    |
     +-------------- COMPARE -------------+
                        |
               +--------+--------+
             PASS              FAIL
       (layout matches)   (differences found)

Short: two nets merged without reason

Open: net broken, missing connection

Missing device: cell in netlist, not in layout

Extra device: cell in layout, not in netlist

Parameter mismatch: wrong W/L or fin count

# Calibre LVS command

calibre -lvs -hier -turbo [Link] lvs_rules.svrf

# For large designs: black-box SRAMs (don't compare internals)

# (controlled in the .svrf rule file with LVS BOX statement)

7.5 Antenna Effect — Theory and Fix

During plasma etching (a fabrication step), long metal wires act as antennas and accumulate electric charge. If the
gate oxide is not yet connected to a drain/source (charge dissipation path), the accumulated charge tunnels through
the gate oxide — causing permanent damage.

Antenna Effect Mechanism:

Step 1: M1 wire deposited and etched

■■■■■■■ long M1 wire ■■■■■■■

■ ■

■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■GATE

(Drain not yet connected — no discharge path)

Step 2: Plasma etching injects charge onto M1

■■■■■■■[+++++++++++++++++]■■■■■■■

■ charge accumulates on wire ■

■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■GATE OXIDE

→ Charge tunnels through thin oxide → permanent damage!

Antenna Ratio = (Wire Area + Via Area above gate) / Gate Oxide Area

Foundry limit: e.g., 400:1 for M1, 800:1 for M2

→ Exceeding limit = antenna violation

Three fix methods:


• Fix 1 — Antenna Diode: insert a reverse-biased diode at the gate pin. Provides safe discharge path for
accumulated charge. Standard fix. No functional impact.
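• Fix 2: Wire Bridging (Metal Jumper): close to the gate, route up to a higher metal layer and back down. The lower-layer segment attached to the gate is then short, so little charge collects on it while that layer is etched. By the time the higher layer completes the net, the driver's diffusion is connected and provides a discharge path. Effectively resets the antenna counter.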
• Fix 3 — Wire Splitting: configure the router to limit maximum wire length per metal layer, preventing antenna
build-up.

Antenna Ratio = Σ(wire area + via area on layers above gate) / gate oxide area
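The antenna ratio check can be sketched as a simple calculation. The limits used are the example values quoted above (400:1 for M1, 800:1 for M2); the wire, via and gate areas are made-up numbers for illustration.

```python
# Example per-layer limits from the text above; real limits come from the foundry deck.
ANTENNA_LIMIT = {"M1": 400.0, "M2": 800.0}


def antenna_ratio(wire_area_um2, via_area_um2, gate_area_um2):
    """Antenna ratio = (wire area + via area) / gate oxide area."""
    return (wire_area_um2 + via_area_um2) / gate_area_um2


def violates(layer, wire_area_um2, via_area_um2, gate_area_um2):
    """True when the ratio for this layer exceeds its limit."""
    return antenna_ratio(wire_area_um2, via_area_um2, gate_area_um2) > ANTENNA_LIMIT[layer]


# Hypothetical net: 24 um^2 of M1 wire plus 1 um^2 of via area on a 0.05 um^2 gate.
ratio = antenna_ratio(24.0, 1.0, 0.05)
print(f"ratio = {ratio:.0f}:1, M1 violation = {violates('M1', 24.0, 1.0, 0.05)}")

# Shortening the M1 segment with a metal jumper reduces the area seen by the gate.
print(f"after jumper: M1 violation = {violates('M1', 9.0, 1.0, 0.05)}")
```

The second call shows why a jumper works: only the area connected to the gate at the time a layer is etched counts toward that layer's ratio.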

# ICC-II: check and fix antenna

check_antenna

report_antenna_violations ;# list all nets exceeding antenna ratio

insert_diodes -nets [get_nets *] -cell ANTDIODE_X1 ;# auto-fix

7.6 Signal Integrity — Crosstalk

Crosstalk is capacitive coupling between adjacent parallel metal wires. When an aggressor net switches, it induces
voltage noise on the victim net.

Crosstalk Coupling Mechanism:

Aggressor net: ■■■■■■■■■[switching 0→1]■■■■■

■ Cc (coupling capacitor)

Victim net: ■■■■■■■■■■■■■[noise spike]■■■

Coupling capacitor Cc increases with:

- Longer parallel run distance

- Smaller spacing between wires

- Same metal layer

Two types of crosstalk effect:

1. GLITCH (victim is static, aggressor switches):

→ Voltage spike on victim

→ If spike > noise margin: FUNCTIONAL FAILURE

2. DELAY (victim is switching):

→ In-phase (both switch same direction): victim switches FASTER

→ Out-of-phase (opposite directions): victim switches SLOWER

→ Out-of-phase INCREASES setup delay: WORST CASE for timing

Induced noise voltage ∆V ≈ Cc / (Cc + Cvictim) × ∆V_aggressor
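The charge-sharing estimate above can be evaluated directly. In this sketch the capacitances, the aggressor swing and the 0.25V noise margin are assumed values, and the formula ignores the victim driver's holding strength, so it is a first-order upper bound.

```python
def crosstalk_noise(cc_ff, c_victim_ff, dv_aggressor):
    """Noise on a quiet victim: dV = Cc / (Cc + Cvictim) * dV_aggressor."""
    return cc_ff / (cc_ff + c_victim_ff) * dv_aggressor


NOISE_MARGIN_V = 0.25        # assumed receiver noise margin

# Hypothetical net: 2 fF coupling, 6 fF victim ground capacitance, 0.8 V aggressor swing.
noise_min_spacing = crosstalk_noise(2.0, 6.0, 0.8)

# Assume doubling the spacing roughly halves the coupling capacitance.
noise_double_spacing = crosstalk_noise(1.0, 6.0, 0.8)

for label, v in (("min spacing", noise_min_spacing), ("2x spacing", noise_double_spacing)):
    status = "FAIL" if v > NOISE_MARGIN_V else "ok"
    print(f"{label:12s} noise = {v:.3f} V  ({status})")
```

Because Cc appears in both numerator and denominator, spacing and shielding (which move capacitance from Cc to ground) help twice over.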

SI fixes:
• Increase wire spacing between aggressor and victim (reduces Cc)
• Add shielding wires (VDD or VSS) between aggressor and victim (Cc couples to supply, not victim)
• Reroute aggressor or victim to a different layer or path
• Upsize the victim driver (lower output impedance reduces Cc × Rvictim time constant)
• Apply NDR (wider wire, wider spacing) to critical nets

■ Crosstalk is the dominant sign-off challenge at 28nm and below. At 14nm, coupling capacitance can be 30–50% of total wire capacitance. Always run PrimeTime SI for final sign-off (set_app_var si_enable_analysis true, with coupling capacitances kept in the extracted parasitics).

7.7 Non-Default Routing Rules (NDR)

2W/2S NDR: Double minimum width AND double minimum spacing. Applied to all clock nets. Reduces wire resistance (EM), reduces coupling capacitance (SI).

Shielding: Route VDD or VSS alongside a critical net. Eliminates coupling from adjacent aggressors.

Where applied: Clock nets: always 2W/2S. PLL output: shielded. High-speed data buses: 2W/2S on selected layers.

# ICC-II: create and apply NDR for clock nets

create_routing_rule CLK_NDR \

-multiplier_width 2 -multiplier_spacing 2

set_routing_rule -rule CLK_NDR \

[get_nets -of_objects [get_clock_network_cells]]

# Apply shielding to a critical net

add_shielding_net -net_name critical_data_bus \

-with_existing_routing -shield_nets {VDD VSS}

Chapter 8: Static Timing Analysis (STA)

Setup · Hold · MMMC · OCV · AOCV · CPPR · SI · Timing Closure

8.1 What is STA?

STA is an exhaustive, formal method of verifying that every timing path in the design meets its setup and hold
requirements — without simulation. The tool builds a directed timing graph, computes delay along EVERY path
simultaneously, and checks constraints. This covers all 2^N possible input combinations implicitly.

Property           STA                                     Gate-Level Simulation (GLS)
Coverage           Exhaustive: all paths simultaneously    Only paths exercised by test vectors
Speed              Minutes for 100M gate design            Hours per scenario
Accuracy           Conservative (worst-case models)        Accurate for simulated scenarios
Setup effort       Write SDC constraints                   Write comprehensive testbench
Industrial usage   PRIMARY sign-off method                 Supplementary validation

8.2 Timing Path Components

A Register-to-Register Timing Path:

LAUNCH side: CAPTURE side:

■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■

CLK CLK

■ ■

BUF■BUF■BUF (clock tree) BUF■BUF■BUF

■ T_launch_lat ■ T_capture_lat

▼ ▼

[FF1 Q]■■■logic1■■■logic2■■■logic3■■[FF2 D]

T_cq T_l1 T_l2 T_l3 T_su

■■■■■■■■■ T_data_path = Tcq + ΣT_logic ■■■■■■■■■

Data Arrival = T_launch_edge + T_launch_lat + T_data_path

Data Required = T_capture_edge + T_capture_lat - T_su - T_uncert

Setup Slack = Data_Required - Data_Arrival

[MUST be >= 0 for timing closure]

8.3 Setup Timing — Complete Derivation

Arrival = T_launch + T_launch_latency + T_cq + Σ(T_cell_i) + Σ(T_wire_i)

Required = T_capture + T_capture_latency - T_su - T_uncertainty

Setup Slack = Required - Arrival [must be ≥ 0]

Example: Setup calculation — full worked example

Given: Clock period Tclk = 2.0ns. Launch latency = 0.50ns. Capture latency = 0.55ns. Tcq = 0.12ns. Logic delay = 0.95ns. Wire delay = 0.25ns. Tsu = 0.05ns. Clock uncertainty = 0.10ns.

Total data path = 0.12 + 0.95 + 0.25 = 1.32ns
Arrival = 0 + 0.50 + 1.32 = 1.82ns
Required = 2.0 + 0.55 - 0.05 - 0.10 = 2.40ns
Setup Slack = 2.40 - 1.82 = +0.58ns → PASS ✓
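The same setup calculation can be scripted so that different numbers are easy to try. This is a minimal sketch of the arrival/required equations from section 8.3, with the launch edge taken at t = 0; the function name is illustrative.

```python
def setup_check(t_clk, launch_lat, capture_lat, t_cq, t_logic, t_wire, t_su, uncert):
    """Return (arrival, required, slack) in ns for a register-to-register path."""
    arrival = launch_lat + t_cq + t_logic + t_wire       # launch edge at t = 0
    required = t_clk + capture_lat - t_su - uncert       # capture edge at t = Tclk
    return arrival, required, required - arrival


arrival, required, slack = setup_check(
    t_clk=2.0, launch_lat=0.50, capture_lat=0.55,
    t_cq=0.12, t_logic=0.95, t_wire=0.25, t_su=0.05, uncert=0.10,
)

verdict = "PASS" if slack >= 0 else "FAIL"
print(f"arrival={arrival:.2f}ns required={required:.2f}ns slack={slack:+.2f}ns {verdict}")
```

Re-running with a shorter clock period shows how quickly the margin disappears: at Tclk = 1.4ns the same path fails by 0.02ns.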

8.4 Hold Timing — Complete Derivation

The hold check prevents new data from arriving at a FF before it has safely captured the previous data. Hold is
checked at the SAME clock edge (0-cycle difference), using MINIMUM delays.

Hold Arrival = T_launch + T_launch_latency + T_data_MIN

Hold Required = T_capture + T_capture_latency + T_hold + T_hold_uncertainty

Hold Slack = Arrival - Required [must be ≥ 0]

Example: Hold violation worked example

Given: Launch latency = 0.50ns. Capture latency = 0.55ns. Min data path delay = 0.05ns (very short: adjacent FFs, no logic). Th = 0.03ns. Hold uncertainty = 0.05ns.

Hold Arrival = 0.50 + 0.05 = 0.55ns
Hold Required = 0.55 + 0.03 + 0.05 = 0.63ns
Hold Slack = 0.55 - 0.63 = -0.08ns → FAIL ✗

Fix: insert a delay buffer of ≥ 0.08ns on the data path.

■ Hold violations are INDEPENDENT of clock frequency. They exist even at 1 MHz. They only appear after CTS when real clock
latency is known. Short paths between adjacent FFs (placed close together after placement) are most vulnerable.

8.5 Path Types and WNS/TNS/FEP

PI→FF path: Primary input to flip-flop. Constrained by set_input_delay.

FF→FF path: Register to register, the most common and critical type. Constrained by clock period.

FF→PO path: Flip-flop to primary output. Constrained by set_output_delay.

PI→PO path: Purely combinational, input to output. Constrained by both input/output delays.

WNS (Worst Negative Slack): Most negative slack across ALL paths. The hardest single path to fix. Must be ≥ 0 for timing closure.

TNS (Total Negative Slack): Sum of all negative slacks. Zero = all paths pass. Large TNS = many violations.

FEP (Failing Endpoint): FF or output with at least one failing path. Count reduces as ECOs are applied.
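The three summary metrics can be derived from a list of endpoint slacks. The sketch below assumes one worst slack per endpoint (the usual basis for TNS in tool reports) and uses invented endpoint names and values.

```python
def timing_summary(endpoint_slacks):
    """Compute (WNS, TNS, FEP) from {endpoint: worst slack in ns}."""
    negative = [s for s in endpoint_slacks.values() if s < 0]
    wns = min(endpoint_slacks.values())   # worst slack; negative means a violation
    tns = sum(negative)                   # 0.0 when every endpoint passes
    fep = len(negative)                   # number of failing endpoints
    return wns, tns, fep


# Hypothetical worst slack per endpoint after a sign-off run.
slacks = {
    "u_fpu/acc_reg[3]/D": -0.12,
    "u_fpu/acc_reg[4]/D": -0.03,
    "u_ctrl/state_reg[0]/D": 0.25,
    "data_out[7]": 0.40,
}

wns, tns, fep = timing_summary(slacks)
print(f"WNS = {wns:+.2f} ns, TNS = {tns:+.2f} ns, FEP = {fep}")
```

A small WNS with a large TNS points to many mildly failing paths (often a global issue such as congestion), whereas a large WNS with a small FEP points to a few paths that need targeted ECOs.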

8.6 MMMC — Multi-Mode Multi-Corner

MMMC analyses all operating modes and PVT corners simultaneously. This is the MANDATORY approach for
production sign-off — single-corner analysis misses real failures.

MMMC Concept:

Modes: Functional mode | Test/scan mode | Low-power mode

Corners: SS (slow-slow) | TT (typical) | FF (fast-fast)

+ RC_worst | + RC_typical | + RC_best

Analysis views (combinations):

func_setup = functional mode + SS corner → worst setup

func_hold = functional mode + FF corner → worst hold

test_setup = test mode + SS corner → scan timing

Each view has different SDC + timing libraries.

STA checks ALL views simultaneously — catches all corners.

Corner           Process              VDD           Temperature     Analysed For
SS (Slow-Slow)   Slow NMOS + PMOS     Low (−10%)    Hot (+125°C)    Setup: slowest cells, worst delay
FF (Fast-Fast)   Fast NMOS + PMOS     High (+10%)   Cold (−40°C)    Hold: fastest cells, minimum delay
TT (Typical)     Nominal              Nominal       Nominal         Power estimation, functional sim
SF / FS          Skewed               Varies        Varies          Hold on specific asymmetric paths
RC_worst         Nominal devices      Nominal       Hot             Wire resistance maximum → setup
RC_best          Nominal devices      Nominal       Cold            Wire resistance minimum → hold

# MMMC setup (illustrative: exact command names differ between tools and versions; check your tool's reference manual)

create_mode func_mode

create_mode test_mode

create_corner ss_corner -lib_files {slow_rc_worst.db}

create_corner ff_corner -lib_files {fast_rc_best.db}

create_analysis_view func_setup -mode func_mode -corner ss_corner

create_analysis_view func_hold -mode func_mode -corner ff_corner

create_analysis_view test_setup -mode test_mode -corner ss_corner

set_analysis_view -setup {func_setup test_setup} -hold {func_hold}

update_timing -full

report_qor ;# all views simultaneously

8.7 OCV, AOCV, POCV — Variation Modelling

Even within one PVT corner, cells on different parts of the chip experience different conditions (process gradients, IR drop, temperature). OCV models this on-chip variation by applying pessimistic derating factors to path delays.

OCV Concept:

Launch path: cells are 'slower' than nominal (late derate)

Capture path: cells are 'faster' than nominal (early derate)

This models the worst case: launch side is slow (data arrives late)

AND capture side is fast (clock arrives early = less time for data).

Flat OCV (simple but pessimistic):

Apply uniform derating to ALL cells on the path:

Late derate (for setup, launch): 1.05 → cells 5% slower

Early derate (for setup, capture): 0.95 → cells 5% faster

AOCV (more accurate — depth-based derating):

Deep paths (20 stages): statistical averaging reduces variation

→ apply smaller derating (e.g., 1.02 instead of 1.05)

Short paths (2 stages): little averaging → larger derating needed

→ apply larger derating (e.g., 1.08)

POCV (most accurate — uses sigma from characterisation):

Each cell has mean delay + sigma (standard deviation)

Statistical STA: combines the per-cell sigmas statistically (root-sum-square, not a straight sum) and applies N×sigma to the path

Most accurate, least pessimistic, requires POCV-enabled .lib
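The difference between flat and depth-based derating can be sketched with a lookup table. The table below is hypothetical: it only reuses the example factors mentioned above (1.08 for short paths, 1.02 at 20 stages, 1.05 flat), whereas real AOCV tables come from library characterisation and are also indexed by distance.

```python
import bisect

FLAT_LATE_DERATE = 1.05
# Hypothetical AOCV late-derate table: (minimum path depth, derate factor).
AOCV_LATE_TABLE = [(1, 1.08), (5, 1.06), (10, 1.04), (20, 1.02)]


def aocv_late_derate(depth):
    """Pick the derate of the deepest table entry that does not exceed `depth`."""
    depths = [d for d, _ in AOCV_LATE_TABLE]
    index = max(bisect.bisect_right(depths, depth) - 1, 0)
    return AOCV_LATE_TABLE[index][1]


def derated_delay(stage_delays_ns, derate):
    """Late path delay after applying one derate factor to every stage."""
    return sum(stage_delays_ns) * derate


deep_path = [0.05] * 20      # 20 stages, 1.0 ns nominal
short_path = [0.05] * 2      # 2 stages, 0.1 ns nominal

for name, path in (("deep", deep_path), ("short", short_path)):
    flat = derated_delay(path, FLAT_LATE_DERATE)
    aocv = derated_delay(path, aocv_late_derate(len(path)))
    print(f"{name:5s} path: flat={flat:.3f} ns  aocv={aocv:.3f} ns")
```

On the deep path AOCV recovers margin relative to the flat factor, while on the short path it is more conservative, which matches the statistical-averaging argument above.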

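# Flat OCV (one set_timing_derate call per early/late bound)

set_timing_derate -early 0.95 -cell_delay -data

set_timing_derate -late 1.05 -cell_delay -data

set_timing_derate -early 0.98 -cell_delay -clock

set_timing_derate -late 1.02 -cell_delay -clock

# Enable AOCV (requires AOCV derating tables for the library cells)

set_app_var timing_aocvm_enable_analysis true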

# CPPR — always enable for sign-off

set_app_var timing_remove_clock_reconvergence_pessimism true

8.8 CPPR — Clock Path Pessimism Removal

In OCV analysis, both launch and capture clock paths are derated. But they share a common segment from the clock
root to their divergence point. This shared segment is derated twice — once as late (launch) and once as early
(capture). This is physically impossible — the same wire CANNOT be both late AND early.

CPPR Example:

CLK (root)

■■■■■■■■■ ← COMMON SEGMENT

■ (shared by launch and capture clock paths)

■■BUF■BUF■■FF1 (launch FF)

■■BUF■BUF■■FF2 (capture FF)

Without CPPR: common segment is derated LATE for launch path

AND derated EARLY for capture path

→ double pessimism on the common segment

CPPR credit = (late_derated_common) - (early_derated_common)

= typically 10–30ps

Corrected slack = raw_slack + CPPR_credit

→ Recovers false pessimism without any real circuit change

CPPR Credit = (T_common × late_derate) - (T_common × early_derate)
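The CPPR credit formula can be evaluated for a concrete case. In this sketch the 0.30ns common-segment delay and the raw slack are assumed values; the derates are the clock derates (0.98 / 1.02) used in the flat OCV example.

```python
def cppr_credit(t_common_ns, late_derate, early_derate):
    """Pessimism on the shared clock segment: T_common * (late - early)."""
    return t_common_ns * late_derate - t_common_ns * early_derate


# Assumed: 0.30 ns of clock tree shared by launch and capture paths.
credit = cppr_credit(0.30, late_derate=1.02, early_derate=0.98)

raw_slack = -0.005                      # hypothetical setup slack before CPPR (ns)
corrected_slack = raw_slack + credit

print(f"CPPR credit     = {credit * 1000:.0f} ps")
print(f"corrected slack = {corrected_slack * 1000:+.0f} ps")
```

Here a path that appears to fail by 5ps actually passes once the double-counted variation on the common segment is removed, without touching the design.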

■ Always enable CPPR in production sign-off: set_app_var timing_remove_clock_reconvergence_pessimism true. Not enabling
CPPR typically wastes 10–30ps — forcing unnecessary cell upsizing.

8.9 Timing Closure — Step-by-Step Methodology

Timing Closure Flowchart:

Run PrimeTime sign-off

WNS < 0? ■■No■■■ DONE (timing closed!)

Yes

Identify worst path (report_timing -path_type full_clock)

Diagnose root cause:

■■■ Cell delay dominant? → upsize or swap to LVT

■■■ Wire delay dominant? → move cells closer

■■■ Skew dominant? → apply useful skew

■■■ Fanout high? → insert buffer tree

Apply fix (place_eco_cells / size_cell / insert_buffer)

Re-route changed nets (route_eco)

Re-extract SPEF (write_parasitics)

Re-run PrimeTime → verify improvement

Loop back to top

Setup Fix Techniques

Fix                How It Works                                     Side Effects                  When to Use
Cell upsize        BUF_X1 → BUF_X4: more drive strength → faster    Area + power increase         Driver has high fanout load
LVT swap           HVT → LVT: lower Vt → faster switching           Leakage power increase        Quick wins on near-critical paths
Buffer insertion   Split long wire → reduces wire delay             Area increase, 2 new cells    Long wire is dominant delay source
Useful skew        Delay capture clock → data has more time         May worsen adjacent hold      Clock latency imbalance available
Cell relocation    Move cells closer → shorter wire                 Local congestion may worsen   Large wire delay between critical cells

Chapter 9: Low Power Design

Power sources · Multi-voltage · UPF · Power gating · Retention · DVFS

9.1 Sources of Power

P_total = P_dynamic + P_short_circuit + P_static

P_dynamic = α · C · VDD² · f [switching power]

P_static = VDD · I_leakage [dominant at advanced nodes]

α (activity factor): Fraction of cycles a node switches. Clock: α≈1.0. Data bus: α≈0.1–0.3. Idle register: α≈0.

VDD² lever: Most powerful: reduce VDD by 20% → save 36% dynamic power. Foundation of DVFS.

I_leakage at 7nm: Exponential with temp. Can be 40–60% of total power at idle. HVT cells have 5–10× less leakage than LVT.
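The VDD² lever can be checked with the dynamic power equation. This is a minimal sketch; the activity factor, switched capacitance and frequency are assumed round numbers chosen only to make the arithmetic easy to follow.

```python
def dynamic_power(alpha, c_farads, vdd, freq_hz):
    """P_dynamic = alpha * C * VDD^2 * f, in watts."""
    return alpha * c_farads * vdd ** 2 * freq_hz


# Assumed block: alpha = 0.2, 1 nF total switched capacitance, 1 GHz clock.
p_nominal = dynamic_power(0.2, 1e-9, 1.0, 1e9)
p_reduced = dynamic_power(0.2, 1e-9, 0.8, 1e9)     # VDD lowered by 20%, same f

saving = 1.0 - p_reduced / p_nominal

print(f"P at 1.0 V = {p_nominal:.3f} W")
print(f"P at 0.8 V = {p_reduced:.3f} W  ({saving:.0%} saved)")
```

The saving depends only on the voltage ratio squared, so the 36% figure holds for any α, C and f as long as they stay unchanged.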

9.2 Multi-Voltage Design and UPF

Multi-Voltage Domain Example:

VDD_HIGH (1.0V) VDD_LOW (0.7V)

■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■

■■■■■■■■■■■■■■■■■■■ ■■■■■■■■■■■■■■■■■

■ High-perf CPU ■ ■ Low-power DSP■

■ core ■ ■ block ■

■ ■■■■■■ ■

■ (fast, high V) ■ LS ■ (slow, low V) ■

■■■■■■■■■■■■■■■■■■■ ■■■■■■■■■■■■■■■■■

Level Shifter (LS)

converts 0.7V logic → 1.0V logic

At boundary, you need:

- Level Shifters (LS): convert signal voltage level

- Isolation Cells: clamp outputs when domain powers off

- Retention Registers: save/restore state during power-down

Level Shifter (LS): Converts signal from low-VDD domain to high-VDD domain or vice versa. Must be powered by BOTH supply domains.

ELS (Enable Level Shifter): Combines level shifting + isolation in one cell. Most efficient at domain crossings.

Isolation Cell: When a domain powers off, its outputs float. Isolation cells clamp output to safe value (0 or 1). Controlled by iso_enable signal.

Retention Register: DFF with shadow latch backed by always-on supply. SAVE: copies main FF to shadow. RESTORE: copies back after power-on.

Header switch (PMOS): PMOS between primary VDD and virtual VVDD. Switches OFF to cut power to the block.

Footer switch (NMOS): NMOS between virtual VVSS and primary VSS. Alternative to header. NMOS smaller than PMOS.

UPF Code Example


# Define power domains

create_power_domain PD_TOP -elements {u_top}

create_power_domain PD_LP -elements {u_lp_block}

# Define supply nets

create_supply_port VDD -domain PD_TOP

create_supply_port VDD_LP -domain PD_LP

# Power states

add_power_state PD_LP -state ON {-supply_expr {VDD_LP == FULL_ON}}

add_power_state PD_LP -state OFF {-supply_expr {VDD_LP == OFF}}

# Isolation: clamp outputs to 0 when PD_LP powers off

set_isolation ISO_LP \

-domain PD_LP -isolation_signal iso_en \

-isolation_sense high -clamp_value 0

# Level shifting on PD_LP outputs

set_level_shifter LS_LP -domain PD_LP -applies_to outputs

# Retention for all FFs in PD_LP

set_retention RET_LP -domain PD_LP \

-save_signal {save_n low} -restore_signal {restore_n low}

9.3 Power Gating

Power Gating with Header Switch:

VDD (always on)

■■■ PMOS Header Switch ← sleep_n signal

VVDD (virtual VDD — can be cut off)

[Standard cells in power-gated domain]

VSS (always on)

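sleep_n = 0 (sleep active): PMOS OFF → VVDD floats → domain has no power

sleep_n = 1 (awake): PMOS ON → VVDD = VDD → domain powered normally

(A PMOS conducts when its gate is LOW, so this polarity assumes the switch cell inverts sleep_n before it reaches the PMOS gate.)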

Wake-up sequence (to prevent di/dt noise):

- Enable switches gradually (daisy-chain)

- Deactivate isolation cells AFTER power is stable

- Apply RESTORE signal to retention registers

Switch sizing: one switch for every 50-100µm of cell row

Area overhead: ~5-10% of gated domain area

■ Isolation cells must be activated BEFORE the domain powers off, and deactivated AFTER power-on is stable. Violating this
sequence causes X-propagation (unknown values) into the always-on domain.

9.4 DVFS — Dynamic Voltage and Frequency Scaling

P_dynamic = α · C · VDD² · f

Reducing VDD by 20% and f by 20%: P reduces to (0.8)² × 0.8 = 0.512 → 49% savings!

DVFS Operating Points Example:

Mode VDD f P_dynamic (relative)

■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■

High Perf: 1.0V 1.0GHz 1.00 (baseline)

Normal: 0.9V 800MHz 0.65 (35% savings)

Low Power: 0.8V 500MHz 0.32 (68% savings!)

Ultra-low: 0.7V 200MHz 0.10 (90% savings!)

PD implications of DVFS:

- Design must meet timing at ALL (VDD, f) operating points

- MMMC must include VF corners for all operating modes

- Power grid must handle current at all supply voltages

- IR drop analysis needed at each VDD level
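Relative dynamic power for a set of DVFS operating points follows directly from P_dynamic = α · C · VDD² · f. The sketch below assumes α and C stay constant across modes and normalises to the 1.0V / 1.0GHz point; the function name is illustrative.

```python
def relative_dynamic_power(vdd, freq_mhz, base_vdd=1.0, base_freq_mhz=1000.0):
    """P / P_baseline = (VDD / VDD_base)^2 * (f / f_base), with alpha and C fixed."""
    return (vdd / base_vdd) ** 2 * (freq_mhz / base_freq_mhz)


operating_points = [
    ("High Perf", 1.0, 1000.0),
    ("Normal", 0.9, 800.0),
    ("Low Power", 0.8, 500.0),
    ("Ultra-low", 0.7, 200.0),
]

relative = {name: relative_dynamic_power(v, f) for name, v, f in operating_points}

for name, v, f in operating_points:
    p = relative[name]
    print(f"{name:10s} {v:.1f} V  {f:6.0f} MHz  P_rel = {p:.3f}  ({1 - p:.0%} saved)")
```

Energy per operation scales only with VDD², so lowering frequency alone reduces power but not the energy needed to finish a fixed task; the voltage reduction is what makes DVFS worthwhile.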

Chapter 10: Physical Verification & Sign-off

Calibre DRC/LVS · ERC · DFM · Formal verification · Tape-out checklist

10.1 Sign-off Checklist

Before delivering GDS to the foundry, ALL of the following must be clean. Missing even one sign-off check = no
tape-out.

■ DRC: Zero violations in Calibre DRC using the foundry's official rule deck
■ LVS: Layout netlist matches reference netlist exactly (Calibre LVS)
■ ERC: No floating gates, no open power/ground connections, no latch-up risk
■ Antenna: All antenna ratios within foundry limits after diode insertion or metal jumpers
■ Metal Density: All metal layers within min/max density range for CMP uniformity
■ STA Sign-off: WNS ≥ 0 and TNS = 0 across ALL MMMC views, PrimeTime SI with SPEF, AOCV, CPPR
■ IR Drop: Static < 5% VDD; Dynamic < 10% VDD in all operating modes
■ EM Sign-off: All wires/vias within foundry Jmax EM current density limits
■ Formal Verification: Post-PD netlist == pre-PD netlist (Synopsys Formality or Cadence Conformal)
■ Power Analysis: Total chip power within product specification
■ GDS Quality: Correct cell names, layer mapping, scale, seal ring, pad frame, fill density
10.2 Calibre DRC — Flow and Debugging

Calibre DRC Flow:

GDS file + DRC rule deck (foundry .svrf)

calibre -drc -hier -turbo [Link] drc_rules.svrf

Output: [Link] (RVE database)

[Link] (violation count per rule)

Open in Calibre RVE GUI:

- Rule list → click a rule → see all violations

- Each violation: rule name, layer, exact coordinates

- Navigate from violation list to layout view

- Fix violated shapes → re-run DRC → verify clean

Common DRC violations and how to fix them:


SP.M1 (spacing): Two M1 wires too close. Fix: move one wire, increase spacing, or reroute.

W.M2 (width): M2 wire too narrow (e.g., after antenna fix shortened the wire). Fix: widen.

ENC.V1 (enclosure): Via not fully enclosed by M1 or M2 metal. Fix: extend metal in the failing direction.

DENSITY.M3 (density): Metal density too low/high. Fix: add/remove metal fill patterns.

[Link] (antenna): Antenna ratio exceeded. Fix: insert diode cell or add metal jumper to higher layer.

EOL.M1 (end-of-line): Wire end too close to adjacent wire. Fix: increase spacing at wire end.

10.3 ERC and DFM

Floating Gate ERC: Gate terminal not connected to any driver. Will have undefined logic state. Fix: connect it to its intended driver, or tie unused inputs to VDD or VSS through tie-high/tie-low cells.

Open Power ERC: Cell VDD or VSS pin not connected to power grid. Fix: fix PDN connectivity in ICC-II.

Latch-up ERC: N-well to P-diffusion spacing too small (or well taps too far apart). Latch-up triggers a parasitic thyristor that creates a low-impedance path from VDD to GND, which can destroy the chip.

DFM (Redundant Via): Replace single vias with 2×1 or 2×2 via arrays. Most impactful yield improvement. Reduces open-via failure rate.

DFM (Critical Area): Regions where a particle defect would most likely cause a short or open. Minimise by spreading wires and avoiding narrow spaces.

10.4 Formal Verification

Formal verification uses mathematical proof to confirm two circuit representations are functionally IDENTICAL. Run
after every netlist modification:
• After synthesis: RTL vs gate netlist
• After DFT insertion: pre-scan vs post-scan netlist
• After every ECO: pre-ECO vs post-ECO netlist
• Before tape-out: final post-route netlist vs reference netlist

# Synopsys Formality

read_sverilog -container r -libname WORK -01 reference.v

read_sverilog -container i -libname WORK -01 implementation.v

set_top r:/WORK/TOP

set_top i:/WORK/TOP

verify

report_failing_points ;# list mismatched logic cones

Chapter 11: Advanced Nodes — FinFET & Sub-28nm

FinFET structure · Multi-patterning · EUV · PD implications

11.1 Planar MOSFET vs FinFET

Planar MOSFET (≥28nm): FinFET (<20nm):

■■■■■■■■■■■■■■■■■■■■■ ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■

Gate Gate wraps around THREE sides

■ ■■■■■■■■■■■■

■■■■■■■ oxide ■■ gate on ■■ oxide

■■■■■■■■■■■ channel ■ top+sides■

n+ n+ ■■■■■■■■■■■■

S D ↑ ↑ ↑

■■■■■■■■■■■■■ fin (vertical Si channel)

(gate controls ONE side) (gate controls 3 sides)

Issues at short gate lengths in planar:

- Short-channel effects: hard to turn off → high leakage

- DIBL (Drain Induced Barrier Lowering): drain voltage

lowers the energy barrier → leaks even when OFF

FinFET advantage:

- Gate wraps 3 sides → much better electrostatic control

- Can turn OFF much more decisively → lower leakage

- Subthreshold slope approaches ideal 60mV/decade

Property                 Planar (≥28nm)       FinFET (<20nm)
Gate control             1 side (planar)      3 sides (tri-gate) → better control
Leakage                  High at short L      Significantly lower
Drive strength sizing    Continuous W         Quantised: 1, 2, 3, N fins
Subthreshold slope       80–100 mV/decade     ~65 mV/decade
Used at                  130nm → 28nm         20nm → 3nm (TSMC, Samsung, Intel)

11.2 FinFET PD Implications

• Drive strength quantisation: you CANNOT continuously size cells in FinFET. Cell size = number of fins (1-fin,
2-fin, 4-fin). PnR selects from discrete library entries only.
• Tighter DRC rules: metal pitch at 14nm ≈ 48nm vs 90nm at 28nm. DRC deck has 500+ rules vs ~100 at 130nm.
• Fin direction: fins run vertically across the full cell height. Gate runs horizontally over fins. Fin orientation is fixed
— all NFET and PFET fins must follow the global fin direction.
• Local interconnect (LI/M0): at TSMC N5/N3, additional routing layers below M1 (LISD, M0, LIG) provide local
connections between gate, source, and drain without using M1 routing tracks.
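• Well proximity effects (WPE): transistors placed close to a well edge see a Vt shift, because implant ions scatter off the photoresist edge during well implantation. Foundry rules therefore impose strict spacing between devices and N-well edges, and between N-well regions.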



• Congestion: routing resources per unit area are proportionally scarcer at advanced nodes. Congestion
management is significantly harder at 14nm vs 32nm.

Note: Your OpenSPARC FPU project was at 14nm. At this node, every routing track counts. The minimum M1 pitch is ~48nm, so a single M1 wire is only about 24nm wide, on the order of a hundred atoms across. This is why DRC decks are so large and complex at FinFET nodes.

11.3 Multi-Patterning

At sub-20nm, a single lithography exposure cannot print the minimum pitch. Multi-patterning decomposes one layout
layer into multiple masks to achieve sub-resolution pitch.

SADP (Self-Aligned Double Patterning), used at 14nm/10nm:

Step 1: Print CORE (mandrel) pattern at 2× the target pitch

   C       C      (2 core lines, one mask)

Step 2: Deposit sidewall spacers on both edges of each core line

  sCs     sCs     (spacers form on edges)

Step 3: Etch away core, leave spacers

  s s     s s     (4 lines from 2 cores!)

Result: 2 final lines per printed core line = 2× density from 1 mask

Final pitch = core pitch / 2; final line width = spacer width

Router 'colouring' requirement for LELE (Litho-Etch-Litho-Etch):

Adjacent wires on M3 must have DIFFERENT colours (masks):

■■■ M3 Colour A (Mask 1) ■■■

■■■ M3 Colour B (Mask 2) ■■■ ← must be different mask

■■■ M3 Colour A (Mask 1) ■■■

Violation: two adjacent same-colour wires = DRC error
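
The colouring rule above amounts to two-colouring a conflict graph: wires are nodes, and an edge joins any two wires spaced too closely to share a mask. The following is a minimal Python sketch of that idea, not the algorithm any specific router uses; the function name `assign_masks` and the wire names are hypothetical. An odd cycle of conflicts cannot be two-coloured, which is the "unresolvable conflict" case.

```python
from collections import deque

def assign_masks(wires, conflicts):
    """Two-colour a conflict graph with breadth-first search.

    wires:     list of wire names
    conflicts: list of (wire_a, wire_b) pairs that must be on different masks
    Returns {wire: "A" | "B"}, or None if an odd cycle makes it impossible.
    """
    adjacency = {w: [] for w in wires}
    for a, b in conflicts:
        adjacency[a].append(b)
        adjacency[b].append(a)

    colour = {}
    for start in wires:
        if start in colour:
            continue
        colour[start] = "A"
        queue = deque([start])
        while queue:
            wire = queue.popleft()
            for neighbour in adjacency[wire]:
                if neighbour not in colour:
                    # Neighbour gets the opposite mask
                    colour[neighbour] = "B" if colour[wire] == "A" else "A"
                    queue.append(neighbour)
                elif colour[neighbour] == colour[wire]:
                    return None  # same-mask neighbours: unresolvable
    return colour

# Three parallel wires, each conflicting only with its direct neighbour
ok = assign_masks(["w1", "w2", "w3"], [("w1", "w2"), ("w2", "w3")])

# Three mutually conflicting wires form an odd cycle
bad = assign_masks(["w1", "w2", "w3"],
                   [("w1", "w2"), ("w2", "w3"), ("w3", "w1")])
```

In the first case the wires alternate A, B, A, matching the M3 picture above; in the second the function returns `None`, and the layout itself would have to change (for example, by spreading one wire away).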

Technique | Masks | Used At | Key Challenge

LELE | 2 | 20nm–14nm | Router must colour-assign all wires; avoid unresolvable conflicts

SADP | 1+spacers | 14nm–7nm | Spacer width sets the final line width; core mask must be correct

SAQP | 2+spacers | 7nm–5nm | Two SADP rounds; very restrictive design rules

EUV | 1 | 7nm and below | Single-mask; fewer MP constraints; EUV source power challenges



Chapter 12: Design For Test (DFT)

Scan chains · ATPG · MBIST · Boundary Scan · PD impact

12.1 Why DFT?

Manufacturing defects are unavoidable. Without DFT, defective chips reach customers. DFT adds controllability and
observability to every node so test patterns can detect faults.

Stuck-at-0 fault Node permanently 0 due to manufacturing defect.

Stuck-at-1 fault Node permanently 1 due to manufacturing defect.

Transition fault Node fails to switch within expected time. Detected by at-speed testing.

Fault coverage % of modelled faults detectable. Target: >98%.

Controllability Ability to set any node to 0 or 1 via primary inputs (high = easier to test).

Observability Ability to observe any internal node at a primary output (high = faults detectable).

12.2 Scan Chain Architecture

Scan Flip-Flop vs Normal FF:

Normal DFF: D → [D FF] → Q (clocked by CK)

Scan FF (SFF): D and SI → [2:1 MUX, select = SE] → [D FF] → Q (clocked by CK)

SE=0: functional (D passes)

SE=1: scan mode (SI passes)

Chain of N scan FFs:

SCAN_IN → SFF1 → SFF2 → ... → SFFN → SCAN_OUT

(SE=1, apply N clocks to shift in a test pattern)

Test Sequence:

1. SHIFT IN: SE=1, shift test pattern into all N FFs (N clocks)

2. CAPTURE: SE=0, apply ONE functional clock — all FFs capture

their combinational logic response simultaneously

3. SHIFT OUT: SE=1, shift out captured response (N clocks)

4. COMPARE: compare shift-out to expected pattern → fault detected?

Example: Scan test time calculation

1,000,000 FFs, 100 scan chains → 10,000 FFs per chain. Each pattern: 10,000 shift clocks + 1 capture clock. At
100MHz shift frequency: 10,001 × 10ns = 100µs per pattern. 50,000 test patterns → 5 seconds test time per chip.
More chains = shorter test time but more IO pins needed.
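
The arithmetic in this example can be reproduced with a few lines of Python. This is a sketch under the same simplifying assumptions as the text (balanced chains, one capture clock per pattern, no overlap of shift-out with the next shift-in); the function name is hypothetical.

```python
def scan_test_time_s(num_ffs, num_chains, num_patterns, shift_freq_hz):
    """Estimate scan test time in seconds for balanced scan chains."""
    chain_length = -(-num_ffs // num_chains)   # ceiling division
    cycles_per_pattern = chain_length + 1      # shift clocks + 1 capture clock
    return num_patterns * cycles_per_pattern / shift_freq_hz

# Numbers from the example: 1M FFs, 100 chains, 50k patterns, 100 MHz shift
t_100_chains = scan_test_time_s(1_000_000, 100, 50_000, 100e6)

# Doubling the chain count roughly halves the test time
t_200_chains = scan_test_time_s(1_000_000, 200, 50_000, 100e6)

print(f"100 chains: {t_100_chains:.4f} s, 200 chains: {t_200_chains:.4f} s")
```

The trade-off noted in the text shows up directly: the second call needs twice as many scan-in/scan-out pins unless test compression is used.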

12.3 Scan Reordering

Before reorder (synthesis order = random): the chain hops FF1 (top-left) → FF6 (bottom-right) → FF2 (top-middle) → ..., creating long and very long scan wires across the block.

After reorder (geometric): FF1 → FF2 → FF3 along one row, then FF4 → FF5 → FF6 along the next, each FF connected to a nearest neighbour. Result: 30–50% shorter scan wires.

ICC-II commands:

set_scan_configuration -chain_count 16

compile_scan ;# reorder to nearest-neighbour

report_scan_chain ;# verify chains

Note: Run scan reorder AFTER place_opt, BEFORE route_opt. It reduces scan routing wirelength by 30–50%. Skipping this causes unnecessary routing congestion from long scan wires.
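
To make the idea of geometric reordering concrete, here is a small Python sketch using a greedy nearest-neighbour heuristic. It is an illustration only, not what ICC-II implements internally; the coordinates, FF names, and function names are hypothetical.

```python
import math

def chain_length(order, pos):
    """Total straight-line wirelength of a scan chain visiting FFs in order."""
    return sum(math.dist(pos[a], pos[b]) for a, b in zip(order, order[1:]))

def reorder_nearest_neighbour(order, pos):
    """Greedy reorder: keep the first FF, then always hop to the closest one left."""
    remaining = list(order[1:])
    new_order = [order[0]]
    while remaining:
        last = pos[new_order[-1]]
        nxt = min(remaining, key=lambda ff: math.dist(last, pos[ff]))
        remaining.remove(nxt)
        new_order.append(nxt)
    return new_order

# Hypothetical placement: two rows of three flip-flops (units arbitrary)
pos = {"FF1": (0, 10), "FF2": (5, 10), "FF3": (10, 10),
       "FF4": (0, 0),  "FF5": (5, 0),  "FF6": (10, 0)}

synth_order = ["FF1", "FF6", "FF2", "FF5", "FF3", "FF4"]   # placement-unaware
placed_order = reorder_nearest_neighbour(synth_order, pos)

before = chain_length(synth_order, pos)
after = chain_length(placed_order, pos)
print(placed_order, f"wirelength {before:.1f} -> {after:.1f}")
```

Scan reordering is legal because the order of flops inside a chain does not affect functional behaviour; only the test patterns need to be regenerated or remapped for the new order.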

12.4 ATPG and MBIST

Test Type | What It Tests | Detected Faults | Speed

Stuck-at ATPG | Logic stuck at 0 or 1 | Stuck-at, bridging | Shift freq (slow)

Transition ATPG | Fails to switch in time | Timing defects | Functional speed

IDDQ | Excess quiescent current | Bridges, oxide defects | Static (no clock)

MBIST | All memory cells (March algorithms) | SRAM stuck-at, coupling | Dedicated controller

Boundary Scan (JTAG) | Board-level pin connections | PCB opens/shorts | JTAG clock

MBIST (Memory Built-In Self Test): SRAMs cannot be tested efficiently via scan chains. MBIST adds a dedicated
controller that applies March algorithms to test every memory cell. PD task: place MBIST controller close to its SRAM
(short routing to memory ports).
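
As an illustration of what a March algorithm does, the sketch below runs March C- (one common member of the March family, chosen here as an example; the text does not say which variant the controller uses) against a toy one-bit-per-address memory model with an optional stuck-at cell. The class and function names are hypothetical.

```python
class FaultyMemory:
    """Toy 1-bit-wide memory; stuck_at maps address -> forced value."""
    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck_at = stuck_at or {}

    def write(self, addr, value):
        self.cells[addr] = self.stuck_at.get(addr, value)

    def read(self, addr):
        return self.stuck_at.get(addr, self.cells[addr])

def march_c_minus(mem, size):
    """Run March C- and return the sorted list of failing addresses."""
    up = range(size)
    down = range(size - 1, -1, -1)
    # Each element: (address order, expected read value, value to write)
    elements = [
        (up,   None, 0),     # write 0 everywhere
        (up,   0,    1),     # ascending: read 0, write 1
        (up,   1,    0),     # ascending: read 1, write 0
        (down, 0,    1),     # descending: read 0, write 1
        (down, 1,    0),     # descending: read 1, write 0
        (up,   0,    None),  # final read 0
    ]
    failures = set()
    for order, expect, write in elements:
        for addr in order:
            if expect is not None and mem.read(addr) != expect:
                failures.add(addr)
            if write is not None:
                mem.write(addr, write)
    return sorted(failures)

good_result = march_c_minus(FaultyMemory(8), 8)
bad_result = march_c_minus(FaultyMemory(8, stuck_at={3: 1}), 8)
print(good_result, bad_result)
```

A hardware MBIST controller implements the same sequence as a small FSM with an address counter and comparator, which is why it sits next to the SRAM it tests.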



Chapter 13: ECO Methodology

Timing ECO · Setup fixes · Hold fixes · Functional ECO

13.1 What is an ECO?

An ECO (Engineering Change Order) is a minimal targeted modification to a placed-and-routed design to fix a specific
problem without re-running the full PnR flow. ECOs preserve surrounding layout.

ECO Type | Purpose | When | Risk

Timing ECO | Fix setup or hold violations | Post-route sign-off | Low: local changes

Functional ECO | Fix a logic bug discovered late | Post-tapeout respin | High: logic must change

Power ECO | Fix IR drop or EM violation | After sign-off | Medium: PDN change

DRC ECO | Fix DRC violations from Calibre | After DRC run | Low: geometry only

13.2 Timing ECO — Complete Step-by-Step

Step 1: Identify failing path

report_timing -delay_type max -slack_lesser_than 0 -max_paths 10

→ Note: startpoint, endpoint, path slack (WNS)

Step 2: Analyse the path (report_timing -path_type full_clock)

→ Read: clock latency (launch+capture), cell delays, wire delays

→ Identify the BIGGEST contributor to delay

Step 3: Classify root cause

Cell delay > 60%? → upsize or swap Vt

Wire delay > 60%? → move cells closer or insert buffer

Skew contribution? → apply useful skew

Fanout > 6? → insert buffer tree

Step 4: Generate ECO (PrimeTime automatic ECO)

fix_eco_timing -type setup ;# PrimeTime's ECO fixing command

write_changes -format icc2tcl -output [Link]

Step 5: Apply in ICC-II

source [Link] ;# apply the ECO changes (resized, swapped, inserted cells)

route_eco -nets {changed_nets};# re-route only changed nets

Step 6: Re-extract and re-verify

write_parasitics -output [Link]

;# back to PrimeTime → confirm WNS improved

Step 7: Formal equivalence check

;# Formality: confirm ECO did not change logic function
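
The Step 3 decision rules can be written down as a small helper. This Python sketch only encodes the thresholds stated above (60% cell or wire share, fanout above 6); the function name, argument names, and the boolean skew flag are hypothetical, and real triage would look at more than these four inputs.

```python
def classify_setup_violation(cell_delay_ns, wire_delay_ns, max_fanout,
                             skew_hurts=False):
    """Map a failing path's delay breakdown to the fixes suggested in Step 3."""
    total = cell_delay_ns + wire_delay_ns
    fixes = []
    if total > 0 and cell_delay_ns / total > 0.60:
        fixes.append("upsize or swap Vt")
    if total > 0 and wire_delay_ns / total > 0.60:
        fixes.append("move cells closer or insert buffer")
    if skew_hurts:
        fixes.append("apply useful skew")
    if max_fanout > 6:
        fixes.append("insert buffer tree")
    return fixes

# Cell-dominated path: 1.0 ns of cell delay vs 0.2 ns of wire delay
cell_case = classify_setup_violation(1.0, 0.2, max_fanout=3)

# Wire-dominated path with a high-fanout driver
wire_case = classify_setup_violation(0.3, 0.9, max_fanout=8)

print(cell_case, wire_case)
```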



Setup Fix Techniques

Fix | Mechanism | Side Effects | When to Apply

Cell upsizing | BUF_X1→BUF_X4: more drive → faster | Area+power | Driver has high fanout load

LVT swap | HVT→LVT: lower Vt → faster | Leakage increase | Quick wins on near-critical paths

Buffer insertion | Split long wire → reduce RC delay | 2 new cells | Wire delay dominant

Useful skew | Delay capture clock arrival | May worsen hold | Available skew margin exists

Cell relocation | Cells closer → shorter wire | Local congestion | Large wire between critical cells

Hold Fix Techniques

Fix | Mechanism | Notes

Hold buffer insertion | Add delay cell on data path (adds ~20–50ps each) | Most common; auto-done by ICC-II clock_opt

HVT cell swap | Swap LVT/SVT to HVT: slower = more delay | Also saves leakage

Useful skew (delay launch) | Delay launch FF's clock → data launches later | Must check setup impact on adjacent paths



Chapter 14: EDA Tools — Complete Command Reference

ICC-II · PrimeTime · Design Compiler · Calibre · OpenLane

14.1 ICC-II — Complete PnR Flow

# -- 1. Create Library --
create_lib [Link] \
    -technology [Link] \
    -ref_libs {[Link] [Link] [Link]}

# -- 2. Read Netlist & Constraints --
read_verilog netlist.v
link_block
read_sdc [Link]
load_upf [Link] ;# ICC-II reads UPF with load_upf

# -- 3. Floorplan --
initialize_floorplan \
    -core_utilization 0.70 \
    -core_offset {2 2 2 2}
place_pins -ports [get_ports *]
set_cell_location -coordinates {50 50} [get_cells u_sram]

# -- 4. Power Planning --
# Simplified view of the ring and mesh. Production ICC-II flows define them
# with create_pg_ring_pattern / create_pg_mesh_pattern and set_pg_strategy,
# then build the geometry with compile_pg.
create_pg_ring -nets {VDD VSS} \
    -layers {{M8 0.8 1.6} {M9 0.8 1.6}}
create_pg_mesh -nets {VDD VSS} \
    -layers {{M7 0.4 4.0} {M8 0.4 4.0}}
connect_pg_net -automatic -all_blocks
compile_pg

# -- 5. Placement --
place_opt
check_legality
report_congestion

# -- 6. CTS --
set_clock_tree_options -max_transition 0.1 -target_skew 0.05
clock_opt
report_clock_qor

# -- 7. Routing --
route_auto ;# global, track and detail routing
route_opt ;# post-route optimisation
check_routes
report_route_drc

# -- 8. Filler Cells --
create_stdcell_fillers -lib_cells {FILLCAP_X8 FILL_X4 FILL_X2 FILL_X1}
# Metal fill: ICC-II sign-off fill is normally run with signoff_create_metal_fill
create_metal_fill -layers {M1 M2 M3 M4 M5}

# -- 9. Outputs --
write_parasitics -format spef -output [Link]
write_gds -output [Link]
write_verilog -output design_final.v

14.2 PrimeTime — Full STA Sign-off Flow

# 1. Library

set_app_var link_path {* [Link] [Link]}

# 2. Read design

read_verilog design_final.v

link_design TOP

# 3. MMMC

# PrimeTime analyses one mode/corner per session. Repeat this script per
# scenario (setup at the SS corner, hold at the FF corner) with the matching
# libraries in link_path, or run all scenarios together under DMSA
# (pt_shell -multi_scenario with create_scenario).
# create_mode / create_corner / create_analysis_view are ICC-II and Cadence
# style commands and are not available in pt_shell.

# 4. Constraints

read_sdc [Link]

# 5. Parasitics

read_parasitics -keep_capacitive_coupling -format spef [Link]

# 6. OCV + CPPR

set_timing_derate -cell_delay -data -early 0.95

set_timing_derate -cell_delay -data -late 1.05

set_app_var timing_remove_clock_reconvergence_pessimism true

# 7. SI

set_app_var si_enable_analysis true ;# crosstalk delay analysis (PrimeTime SI)

# 8. Update and report

update_timing -full

report_qor

report_timing -delay_type max -slack_lesser_than 0 -max_paths 20

report_timing -delay_type min -slack_lesser_than 0 -max_paths 20

report_clock_timing -type skew

report_power

14.3 Design Compiler — Synthesis Reference

set_app_var target_library {[Link] [Link]}

set_app_var link_library {* [Link] [Link]}

analyze -format verilog design.v ;# elaborate needs an analyzed design

elaborate TOP

current_design TOP

source [Link]

compile_ultra -gate_clock -scan -no_autoungroup

report_qor

write_file -format verilog -hierarchy -output netlist.v

write_sdc [Link]

14.4 Calibre and OpenLane Reference

# Calibre DRC

calibre -drc -hier -turbo [Link] drc_rules.svrf

# Calibre LVS

calibre -lvs -hier [Link] lvs_rules.svrf

# KLayout DRC (free — Skywater 130nm)

klayout -b -r [Link] -rd input=[Link] [Link]

# Netgen LVS (free)

netgen -batch lvs '[Link] CELL' '[Link] CELL'

# OpenLane (free RTL-to-GDS)

pip install openlane

python -m openlane [Link]

OpenLane Tool | ICC-II Equivalent | Function

Yosys | Design Compiler | RTL synthesis → gate netlist

OpenROAD floorplan | initialize_floorplan | Die sizing, IO placement

OpenROAD placement | place_opt | Global + detailed placement

TritonCTS | clock_opt | Clock tree synthesis

TritonRoute/FastRoute | route_opt | Global + detailed routing

OpenSTA | PrimeTime | Static timing analysis

Magic | Calibre DRC | Layout DRC check

Netgen | Calibre LVS | Layout vs schematic



Chapter 15: Glossary, Formulas & Quick Reference

All key equations · Rules of thumb · 50+ term A–Z glossary

15.1 All Key Formulas

Setup Slack:

Slack = (Tcapture + Tcap_lat - Tsu - Tuncert) - (Tlaunch + Tlaunch_lat + Tdata)

Hold Slack:

Slack = (Tlaunch + Tlaunch_lat + Tdata_min) - (Tcapture + Tcap_lat + Th + Tuncert_hold)

Dynamic Power:

P_dynamic = α × C_load × VDD² × f

Leakage Power:

P_leakage = VDD × I_leakage [exponential with temperature]

CMOS Inverter Switching:

E_switch = C × VDD² (energy per switching event)

IR Drop:

V_drop = I × R → acceptable < 5% × VDD

Timing impact of IR drop:

10mV IR drop ≈ 0.5–1% increase in cell delay

Electromigration (Black's Law):

MTF = A × J^(-n) × exp(Ea / kT) [J must stay below Jmax]

Antenna Ratio:

Ratio = Σ(wire+via area above gate) / gate oxide area [< foundry limit]

Crosstalk Noise:

∆V ≈ Cc / (Cc + Cvictim) × ∆V_aggressor

Core Area:

Core Area = Total_Std_Cell_Area / Target_Utilisation

Congestion Overflow:

Overflow = max(0, Routing_Demand - Routing_Capacity) per GCell

DVFS Power Saving (V only):

Reduce VDD 20%: P_dyn × (0.8)² = 0.64 → 36% savings

DVFS Power Saving (V + f):

Reduce both 20%: P_dyn × (0.8)² × 0.8 = 0.512 → 49% savings



Clock Skew:

Skew = |Latency_FF1 - Latency_FF2| [target < 50–150ps]

CPPR Credit:

Credit = (T_common × late_derate) - (T_common × early_derate) [~10-30ps]

NLDM Delay:

Cell delay = f(input_slew, output_load) [2D lookup table in .lib]

Wire RC delay:

T_wire ≈ 0.38 × R_wire × C_wire [50% delay of a distributed RC line; the Elmore estimate is 0.5 × R × C]
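
The slack and DVFS formulas in this section are easy to sanity-check numerically. The Python sketch below simply transcribes them; the function names and the example numbers are illustrative assumptions, not values from a real design.

```python
def setup_slack(t_capture, t_cap_lat, t_su, t_uncert,
                t_launch, t_launch_lat, t_data):
    """Setup slack: required time minus arrival time (all in ns)."""
    required = t_capture + t_cap_lat - t_su - t_uncert
    arrival = t_launch + t_launch_lat + t_data
    return required - arrival

def hold_slack(t_launch, t_launch_lat, t_data_min,
               t_capture, t_cap_lat, t_h, t_uncert_hold):
    """Hold slack: earliest arrival minus hold requirement (all in ns)."""
    arrival = t_launch + t_launch_lat + t_data_min
    required = t_capture + t_cap_lat + t_h + t_uncert_hold
    return arrival - required

def dvfs_dynamic_power_ratio(v_scale, f_scale=1.0):
    """Dynamic power relative to nominal after scaling VDD and f (P ∝ V² × f)."""
    return v_scale ** 2 * f_scale

# 2 ns clock, capture edge at 2.0, launch edge at 0.0 (illustrative values)
s_setup = setup_slack(2.0, 0.40, 0.05, 0.10, 0.0, 0.45, 1.4)

# Same-edge hold check with a 0.20 ns minimum data delay
s_hold = hold_slack(0.0, 0.45, 0.20, 0.0, 0.40, 0.03, 0.05)

v_only = dvfs_dynamic_power_ratio(0.8)        # VDD reduced by 20%
v_and_f = dvfs_dynamic_power_ratio(0.8, 0.8)  # VDD and f both reduced by 20%

print(f"setup {s_setup:.3f} ns, hold {s_hold:.3f} ns, "
      f"DVFS ratios {v_only:.3f} / {v_and_f:.3f}")
```

A negative return value from either slack function corresponds to a violation; the DVFS ratios reproduce the 36% and 49% savings quoted above.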

15.2 Industry Rules of Thumb

Core utilisation 60–70% for complex SoCs; 50% for first tapeouts; max 75% before congestion risk

IR drop limit Static < 5% VDD; Dynamic < 10% VDD; e.g., <40mV for VDD=0.8V

Clock skew target < 50–150ps for 500MHz+ designs

CTS insertion delay 300ps–2ns typical depending on frequency and technology

Hold buffer count 2–8% of total cell count is typical post-CTS

Scan chain length 500–2000 FFs per chain; more chains = faster test but more IO

OCV derating (28nm) 5% data path, 3% clock path (flat OCVM)

Clock power fraction 20–40% of total dynamic power

NDR for clocks Always: 2W (double width) + 2S (double spacing)

Metal fill density Foundry typically requires 20–80% density per metal layer

DFM — redundant via Replace all single vias with double vias wherever space allows

EM limit (signal) 1–5 mA/µm wire width (check foundry spec per layer and temperature)

Scan reorder savings 30–50% scan routing wirelength reduction

LVT leakage vs SVT LVT has ~2–4× more leakage than SVT at same cell function

FO4 delay Inverter driving four identical inverters (4× its own input load). Standard benchmark; on the order of 10ps at 14nm (50–100ps is typical of roughly 180–130nm)

15.3 Complete A–Z Glossary

AOCV Advanced On-Chip Variation — depth/distance-based OCV. More accurate than flat OCVM.

ATPG Automatic Test Pattern Generation — creates test vectors for fault detection.

BIST Built-In Self Test — dedicated hardware to test memories or logic blocks autonomously.

BSC Boundary Scan Cell — scan cell at IO pad for JTAG board-level test (IEEE 1149.1).

CDC Clock Domain Crossing — signal between different clock domains. Needs synchroniser.

CMP Chemical Mechanical Planarisation — fab process that polishes each metal layer flat.

CPPR Common Path Pessimism Removal (CRPR, Clock Reconvergence Pessimism Removal, in PrimeTime): removes double-derating of the shared clock path.



CTS Clock Tree Synthesis — builds buffered clock distribution tree with low skew.

DEF Design Exchange Format — snapshot of physical design state.

DFM Design For Manufacturability — yield improvement beyond basic DRC compliance.

DFT Design For Test — adds testability (scan, BIST, JTAG) for manufacturing testing.

DRC Design Rule Check — geometric verification against foundry manufacturing rules.

DVFS Dynamic Voltage and Frequency Scaling — adjusts VDD and f to save power.

ECO Engineering Change Order — targeted minimal fix to placed/routed design.

ECSM Effective Current Source Model: Cadence current-source cell model for delay and SI analysis (the Synopsys counterpart is CCS).

ELS Enable Level Shifter — combined level shift + isolation cell at domain boundaries.

EM Electromigration — metal atom migration from high current density. Long-term failure.

ERC Electrical Rule Check — floating gates, open power pins, latch-up spacing.

EUV Extreme Ultraviolet Lithography — 13.5nm wavelength; single-pattern at 7nm and below.

FEP Failing Endpoint: FF data pin or output port where at least one path has negative slack.

FinFET Fin Field-Effect Transistor — 3D transistor; gate wraps 3 sides of vertical fin.

FO4 Fan-out of 4: inverter driving 4× its own input capacitance. Standard delay benchmark.

GDS GDSII — binary layout polygon file. Final deliverable to foundry for mask making.

GRC Global Routing Congestion — tile-based routing demand vs supply map.

HVT High Threshold Voltage — slower cell with less leakage. Used on non-critical paths.

ICG Integrated Clock Gate — glitch-free clock gating cell (latch + AND gate).

IR Drop Voltage drop across power grid resistance. Reduces effective VDD at cells.

JTAG Joint Test Action Group (IEEE 1149.1) — boundary scan standard.

LEF Library Exchange Format — cell physical abstracts + technology layer rules.

LVS Layout vs Schematic — verifies extracted layout matches reference netlist.

LVT Low Threshold Voltage — faster cell with more leakage. Used on critical paths.

MBIST Memory BIST — tests SRAMs using March algorithms via dedicated controller.

MMMC Multi-Mode Multi-Corner — simultaneous STA across all modes and PVT corners.

MTF Mean Time to Failure — EM reliability metric. Must meet 10-year product lifetime.

NDM New Data Model — Synopsys ICC-II unified library format (.lib + .lef + .gds).

NDR Non-Default Routing Rule — custom width/spacing. Applied to clock nets (2W/2S).

OCV On-Chip Variation — spatial PVT variation modelled by derating cell delays.

PBA Path-Based Analysis — cell-by-cell accurate STA for worst-case paths (vs graph-based GBA).

PDN Power Distribution Network — VDD/VSS grid delivering power to all cells.



POCV Parametric OCV — statistical (mean+N×sigma) variation model. Most accurate.

PT PrimeTime — Synopsys industry-standard STA and sign-off tool.

SADP Self-Aligned Double Patterning — spacer technique for 2× pitch reduction at 14nm/10nm.

SAQP Self-Aligned Quad Patterning — two SADP iterations for 4× pitch reduction at 7nm/5nm.

SDC Synopsys Design Constraints — timing constraint language used by all EDA tools.

SFF Scan Flip-Flop — DFF with extra scan-in and scan-enable inputs for DFT.

SI Signal Integrity — analysis of crosstalk noise and delay from capacitive coupling.

SPEF Standard Parasitic Exchange Format — extracted R+C for STA back-annotation.

STA Static Timing Analysis — formal exhaustive timing verification without simulation.

TAP Test Access Port — JTAG controller FSM (TDI, TDO, TMS, TCK).

TNS Total Negative Slack — sum of all failing path slacks. Zero = timing closed.

UPF Unified Power Format (IEEE 1801) — describes power domains, isolation, level shifters.

WNS Worst Negative Slack — most negative single path slack. Must reach ≥ 0.

Chapter A1: Clock Domain Crossing (CDC)

Metastability · Synchronisers · FIFO · CDC verification

A1.1 What is CDC and Why It's Dangerous

CDC occurs when a signal passes from a flip-flop in one clock domain to a flip-flop in a different, asynchronous clock
domain. The receiving FF may sample the signal at the exact moment it is transitioning — violating setup or hold time
— causing metastability. This is one of the most common sources of hard-to-debug silicon failures.

CDC Problem Illustration:

Domain A (clk_a = 100MHz): Domain B (clk_b = 133MHz, different PLL):

FF_A (clocked by clk_a) → data → FF_B (clocked by clk_b)

Problem: clk_a and clk_b have no fixed phase relationship.

FF_B might sample 'data' at EXACTLY the moment FF_A is switching.

→ FF_B enters metastable state (output hovers near mid-voltage ~VDD/2)

→ If metastability doesn't resolve before next clk_b edge:

→ Wrong data propagates → functional failure

Metastability resolution time: exponentially distributed.

P(still unresolved after time T) ∝ exp(-T / τ)

where τ is the flip-flop's resolution time constant (≈ 50–100ps for modern FFs)

Note: CDC bugs do NOT appear in RTL simulation (clocks are ideal) or in STA (false paths are set across domains). They only manifest on real silicon, intermittently, and are extremely hard to reproduce and debug.



A1.2 Two-Flop Synchroniser

The most common CDC fix: add two flip-flops in the destination domain before using the signal. The first FF may go
metastable, but has a full clock period to resolve before the second FF samples it.

Two-Flop Synchroniser:

Domain A (clk_a) | Domain B (clk_b)

FF_A → data → | → FF_sync1 → FF_sync2 → FF_dest

The CDC crossing point is the domain boundary; FF_sync1, FF_sync2 and FF_dest are all clocked by clk_b.

FF_sync1: may go metastable; has one clk_b period to resolve

FF_sync2: samples resolved output of FF_sync1, so its output is safe to use

SystemVerilog implementation:

always_ff @(posedge clk_b or negedge rst_b_n) begin
    if (!rst_b_n) begin
        sync_stage1 <= 1'b0;
        sync_out    <= 1'b0;
    end else begin
        sync_stage1 <= data_a;       // may go metastable
        sync_out    <= sync_stage1;  // safe: resolved
    end
end

PD requirement: FF_sync1 and FF_sync2 must be physically adjacent (minimise wire length between them) and use low-Vt, fast FFs.

A1.3 Multi-Bit CDC — FIFO and Gray Code

Two-flop synchroniser only works for SINGLE-BIT signals. For multi-bit buses, multiple bits transition simultaneously,
and a synchroniser may capture them in different combinations — corrupting data.

Gray code counter  Increment a counter where only ONE bit changes per count step. Apply a 2-flop sync to the Gray-coded count. Only 1 bit ever transitions → safe to synchronise. Decode Gray back to binary after the synchroniser. Used for FIFO pointers.

Async FIFO  Most robust multi-bit CDC solution. Write pointer in write domain (clk_a). Read pointer in read domain (clk_b). Both pointers Gray-coded before crossing. FIFO flags (full, empty) computed by comparing synchronised pointers.

Handshake protocol  Req/ack protocol: sender asserts REQ, waits for ACK from receiver. Each signal is individually synchronised. Slow (2–4 latency cycles) but correct for single transfers.

Enable pulse synchroniser  For single-cycle enable pulses: stretch the pulse to be wider than the destination clock period before synchronising. Or use a toggle synchroniser.

Gray Code vs Binary — Example (3-bit):



Binary: 000 001 010 011 100 101 110 111

Gray: 000 001 011 010 110 111 101 100

↑ ↑ ↑ ↑ ↑ ↑

Only 1 bit changes at each step!

Binary 2→3: 010→011 — 1 bit changes ✓

Binary 3→4: 011→100 — 3 bits change ✗ (unsafe for CDC!)

Gray 3→4: 010→110 — 1 bit changes ✓
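
The table above can be regenerated, and the one-bit-per-step property checked, with a short Python sketch. The XOR-shift conversion used here is the standard reflected binary Gray code; the function names are illustrative.

```python
def binary_to_gray(n):
    """Reflected binary Gray code: XOR the value with itself shifted right by 1."""
    return n ^ (n >> 1)

def gray_to_binary(g):
    """Inverse conversion: XOR together all right shifts of the Gray value."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def bits_changed(a, b):
    """Number of bit positions that differ between two values."""
    return bin(a ^ b).count("1")

gray_sequence = [format(binary_to_gray(i), "03b") for i in range(8)]
print(gray_sequence)

# Binary 3 -> 4 flips three bits; the Gray-coded pointer flips only one
binary_flips = bits_changed(3, 4)
gray_flips = bits_changed(binary_to_gray(3), binary_to_gray(4))
```

In an async FIFO, `binary_to_gray` corresponds to the logic placed before the synchroniser and `gray_to_binary` to the decode after it.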

A1.4 CDC Verification

set_false_path in STA  CDC paths must be false-pathed in SDC: set_false_path -from [get_clocks clk_a] -to [get_clocks clk_b]. Otherwise STA gives false timing violations on crossing paths.

CDC static analysis tools  Synopsys SpyGlass CDC, Cadence JasperGold CDC: formally identify all crossing signals, classify them (single-bit, multi-bit), verify synchroniser presence.

Metastability MTBF calc  MTBF = exp(Tw/τ) / (fc × fa × Td), where (reading the symbols in the standard way) Tw is the time allowed for resolution, τ the FF resolution time constant, fc the receiving clock frequency, fa the asynchronous data toggle rate, and Td the metastability window. Must be > 100 years for reliable operation. PD must ensure synchroniser FFs are fast (use LVT) and physically close.

PD placement constraint  Synchroniser FFs (sync_stage1 + sync_stage2) must be placed adjacent. Use create_placement_constraint -type cluster to keep them together.
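
To get a feel for the MTBF formula, the Python sketch below evaluates it for one and for two clock periods of resolution time. All numbers are illustrative assumptions (1 GHz receiving clock, 100 MHz data toggle rate, τ = 50ps, 20ps metastability window, 100ps of each period lost to setup and routing), and the mapping of the formula's symbols to the function arguments is an interpretation of the text, not a datasheet definition.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def synchroniser_mtbf_s(t_resolve_s, tau_s, f_clk_hz, f_data_hz, t_window_s):
    """MTBF = exp(Tw / tau) / (fc * fa * Td), returned in seconds."""
    return math.exp(t_resolve_s / tau_s) / (f_clk_hz * f_data_hz * t_window_s)

# One clock period of resolution time, minus 100 ps lost to setup and wire
mtbf_one_period = synchroniser_mtbf_s(0.9e-9, 50e-12, 1e9, 100e6, 20e-12)

# Two clock periods of resolution time (for example, an extra synchroniser stage)
mtbf_two_periods = synchroniser_mtbf_s(1.9e-9, 50e-12, 1e9, 100e6, 20e-12)

print(f"one period:  {mtbf_one_period:.1f} s")
print(f"two periods: {mtbf_two_periods / SECONDS_PER_YEAR:.0f} years")
```

Because the resolution time sits in an exponent, each extra 50ps of slack between the synchroniser flops multiplies MTBF by e. That is the quantitative reason for placing the flops adjacent and using fast cells.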



Chapter A2: Advanced STA — Beyond the Basics

PBA · SI-aware STA · Half-cycle paths · Timing constraints debug

A2.1 GBA vs PBA — Graph-Based vs Path-Based Analysis

STA tools use two modes of delay calculation. Understanding the difference is important for both accuracy and
efficient ECO closure.

Property | GBA (Graph-Based) | PBA (Path-Based)

Method | Tag each node with worst-case arrival/required times from all incoming paths | Trace each specific path end-to-end with actual input slew at each stage

Speed | Very fast: one pass through the timing graph | Slow: must re-compute for each path individually

Accuracy | Conservative: may report pessimistic violations | More accurate: computes actual slew/delay per path

Usage | General timing analysis, all paths | ECO fixing, final sign-off on critical paths

Tool command | (default mode) | report_timing -pba_mode path (or exhaustive)

Note: PBA can recover 10–50ps of false pessimism on critical paths compared to GBA. PrimeTime's ECO fixing (fix_eco_timing) can run with PBA via its -pba_mode option, which is why PT-suggested ECOs are sometimes more aggressive than what you'd estimate manually from GBA reports.

A2.2 SI-Aware STA

Standard STA without SI uses extracted SPEF parasitics but ignores coupling capacitors. SI-aware STA (PrimeTime
SI) models crosstalk delay on every net.

SI delay calculation flow in PrimeTime:

1. Load SPEF with coupling caps (Cc between nets)

read_parasitics -keep_capacitive_coupling -format spef design_with_cc.spef

2. Enable SI calculation

set_app_var si_enable_analysis true

# Without -keep_capacitive_coupling the coupling caps are grounded and

# no crosstalk delta delay is computed

3. SI analysis computes for each net:

For each aggressor switching OUT-OF-PHASE with victim:

→ Worst-case crosstalk delay = delay + ∆t_crosstalk

4. Reports

report_si_bottleneck -cost_type delta_delay ;# worst SI-affected paths

report_noise_on_net [get_nets critical_net] ;# glitch analysis on net

5. Fixes for SI delta_delay:

- Increase spacing between aggressor and victim

- Add shield wire (VDD/VSS) between them

- Reroute aggressor/victim on different layers



- Upsize victim driver (reduces Zout, less coupling effect)

A2.3 Half-Cycle Paths

A half-cycle path connects a FF that launches on the rising clock edge to one that captures on the FALLING edge (or
vice versa). Only half the clock period is available.

Half-Cycle Path Example:

CLK: each period is a high phase followed by a low phase

↑ rising edge at the start of the period, ↓ falling edge half a period later

FF1 (rising edge): captures at each ↑

FF2 (falling edge): captures at each ↓

Data from FF1 must travel to FF2 in only HALF a clock period!

For a 2ns clock: only 1ns for the data path (vs 2ns for normal paths)

STA handling:

Setup: constraint = Tclk/2 (half-period path group)

Hold: checked at the PREVIOUS falling edge (−Tclk/2)

Where half-cycle paths appear:

- Negative-edge triggered FFs in a design using both edges

- DDR interfaces (data valid on both rising and falling edges)

- Pulse-width modulator outputs

- Some high-speed multipliers using both edges

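Note: PrimeTime does not create a dedicated half-cycle path group by default; these paths are grouped by capture clock like any other. Find them from the launch and capture edges in the report, for example with report_timing -rise_from [get_clocks clk] -fall_to [get_clocks clk], or collect them in a custom group with group_path. They are harder to close, so treat them as double-frequency paths.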

A2.4 Clock Reconvergence and CPPR Deep Dive

Clock Reconvergence Paths (setup timing):

CLK_ROOT → SHARED SEGMENT (Tcommon = 0.2ns delay here)

    +-- BUF → FF_launch (launch latency = 0.45ns total)

    +-- BUF → FF_capture (capture latency = 0.40ns total)

Without CPPR:

Launch clock (late derate 1.05): 0.45ns × 1.05 = 0.4725ns

Capture clock (early derate 0.95): 0.40ns × 0.95 = 0.3800ns

Slack = 2.0 + 0.3800 - 0.4725 - 1.4 - 0.05 - 0.10 = 0.358ns

With CPPR (remove double-derating on 0.2ns shared segment):

CPPR credit = 0.2 × (1.05 - 0.95) = 0.2 × 0.10 = 0.020ns

Corrected slack = 0.358 + 0.020 = 0.378ns (+20ps recovered!)



Command: set_app_var timing_remove_clock_reconvergence_pessimism true
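
The worked numbers above can be re-derived in a few lines of Python. This is only a transcription of the example: the 1.4ns term is the data path delay, and the 0.05ns and 0.10ns terms are lumped together here as setup time plus uncertainty, since the example does not label them individually.

```python
def cppr_credit(t_common, late_derate, early_derate):
    """Pessimism removed on the clock segment shared by launch and capture."""
    return t_common * (late_derate - early_derate)

period = 2.0                       # ns
launch_latency = 0.45 * 1.05       # late-derated launch clock
capture_latency = 0.40 * 0.95      # early-derated capture clock
data_delay = 1.4
setup_plus_uncertainty = 0.05 + 0.10

slack_without_cppr = (period + capture_latency
                      - launch_latency - data_delay - setup_plus_uncertainty)

credit = cppr_credit(0.2, 1.05, 0.95)
slack_with_cppr = slack_without_cppr + credit

print(f"without CPPR: {slack_without_cppr:.4f} ns")
print(f"credit:       {credit:.4f} ns")
print(f"with CPPR:    {slack_with_cppr:.4f} ns")
```

The unrounded values are 0.3575ns and 0.3775ns, which the text rounds to 0.358ns and 0.378ns.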



Chapter A3: Tcl Scripting for Physical Design

Variables · Collections · Procs · Automation scripts

A3.1 Tcl Language Fundamentals

Tcl (Tool Command Language) is the scripting language inside every Synopsys and Cadence EDA tool. Writing your
own scripts is what separates junior from senior PD engineers.

# -- Variables --
set my_var "hello" ;# string variable
set count 42 ;# integer
set slack -0.050 ;# float
set name [get_object_name [current_design]] ;# command substitution

# -- Arithmetic --
set result [expr {$count + 10}] ;# = 52
set bad [expr {$slack < 0}] ;# = 1 (true) if slack is negative

# -- String operations --
set full_name [string cat $name "_v2"]
set upper [string toupper $my_var]
set length [string length $my_var]

# -- Lists --
set my_list {a b c d e}
set item0 [lindex $my_list 0] ;# = a
lappend my_list f ;# add to list
set len [llength $my_list] ;# = 6

# -- Conditionals --
if {$slack < 0} {
    puts "FAIL: slack = $slack"
} elseif {$slack < 0.05} {
    puts "WARNING: tight slack = $slack"
} else {
    puts "PASS"
}

# -- Loops --
foreach item $my_list {
    puts "item: $item"
}

for {set i 0} {$i < 10} {incr i} {
    puts "i = $i"
}

A3.2 EDA Tool Collections and Queries



The most important Tcl skill for PD is manipulating design object collections — cells, nets, pins, ports, clocks. These
commands work in ICC-II and PrimeTime.

# -- Get objects --
get_cells * ;# cells at the current level (add -hierarchical for the whole design)
get_cells u_cpu* ;# cells starting with u_cpu
get_nets {clk rst_n data_bus*} ;# specific nets by name
get_pins u_reg/D ;# specific pin
get_ports [all_inputs] ;# all input ports
get_clocks * ;# all clock objects

# -- Filter collections --
# Get all LVT cells (filter expressions use plain attribute names)
set lvt_cells [filter_collection [get_cells *] "ref_name =~ *LVT*"]

# Get all cells with area above a threshold
set large_cells [filter_collection [get_cells *] "area > 10"]

# -- Iterate over collection --
foreach_in_collection cell [get_cells *] {
    set cell_name [get_object_name $cell]
    set ref [get_attribute $cell ref_name]
    set area [get_attribute $cell area]
    puts "$cell_name $ref area=$area"
}

# -- Count objects --
set num_cells [sizeof_collection [get_cells *]]
puts "Total cells: $num_cells"

# -- Get attributes --
get_attribute [get_cells u_buf] ref_name ;# library cell name
get_attribute [get_cells u_buf] full_name ;# hierarchical path
get_attribute [get_nets clk] net_type ;# clock/signal/power

A3.3 Procedures (Functions)

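# Define a reusable procedure
proc report_slack_summary {threshold} {
    set failing [get_timing_paths -delay_type max \
        -slack_lesser_than $threshold -max_paths 1000]
    set count [sizeof_collection $failing]
    if {$count > 0} {
        # Collections are indexed with index_collection, not lindex;
        # paths come back worst-first, so index 0 holds the WNS
        set wns [get_attribute [index_collection $failing 0] slack]
        puts "Failing paths: $count WNS: $wns ns"
    }
    return $count
}

# Call the procedure
set violations [report_slack_summary 0.0]
if {$violations > 0} { puts "TIMING NOT CLOSED" }

# Procedure with default argument
proc get_critical_cells {{slack_limit -0.05}} {
    set paths [get_timing_paths -slack_lesser_than $slack_limit -max_paths 1000]
    set cells ""
    foreach_in_collection path $paths {
        # Walk the timing points of each path and collect the owning cells
        foreach_in_collection pt [get_attribute $path points] {
            set obj [get_attribute $pt object]
            if {[get_attribute $obj object_class] eq "pin"} {
                set cells [add_to_collection -unique $cells \
                    [get_cells -of_objects $obj]]
            }
        }
    }
    return $cells
}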

A3.4 Practical PD Automation Scripts

# Script 1: Auto-swap near-critical cells to LVT
# Find cells on paths with slack below the threshold (e.g. 0.05) and swap SVT → LVT
proc swap_to_lvt {slack_threshold} {
    set paths [get_timing_paths -delay_type max \
        -slack_lesser_than $slack_threshold -max_paths 500]
    set changed 0
    foreach_in_collection path $paths {
        set cells [get_cells -of_objects $path]
        foreach_in_collection cell $cells {
            set ref [get_attribute $cell ref_name]
            if {[regexp {SVT} $ref]} {
                set lvt_ref [regsub {SVT} $ref {LVT}]
                # Swap only if an LVT variant of that name exists in a loaded library
                if {[sizeof_collection [get_lib_cells -quiet */$lvt_ref]] > 0} {
                    size_cell $cell $lvt_ref
                    incr changed
                }
            }
        }
    }
    puts "Swapped $changed cells to LVT"
    return $changed
}

# Script 2: Write timing summary to file
proc write_timing_report {filename} {
    set fh [open $filename w]
    puts $fh "=== TIMING SUMMARY ==="
    puts $fh "WNS setup: [get_attribute [current_design] slack_wns_max_delay]"
    puts $fh "TNS setup: [get_attribute [current_design] slack_tns_max_delay]"
    close $fh
}

# Script 3: Identify and report all hold violators
proc report_hold_violators {} {
    # Without -max_paths only the single worst path is returned
    set paths [get_timing_paths -delay_type min -slack_lesser_than 0 \
        -max_paths 1000]
    puts "Hold violations: [sizeof_collection $paths]"
    foreach_in_collection p $paths {
        # These attributes return objects, so convert them to names before printing
        set clk [get_object_name [get_attribute $p startpoint_clock]]
        set end [get_object_name [get_attribute $p endpoint]]
        puts "  $clk → $end slack=[get_attribute $p slack]"
    }
}


Chapter A4: Power Analysis and Thermal Management

Voltus/RedHawk · Power grid analysis · Thermal · EM sign-off

A4.1 Static vs Dynamic Power Analysis

Power analysis determines whether the chip's power delivery system is adequate and whether the chip will overheat.
Two primary analysis types:

Type | Input Activity | Analyses | Tool
Static IR drop | Average current (time-averaged switching plus leakage) | DC resistance of grid | Voltus static mode
Dynamic IR drop | VCD/SAIF switching activity file | Transient voltage droop | Voltus dynamic mode
Static EM | Average current (Iavg per net) | Avg Jmax check | Voltus/RedHawk EM
Dynamic EM | RMS current (Irms per net with duty cycle) | Irms Jmax check | Voltus/RedHawk EM
Thermal | Power map + package thermal model | Junction temperature | Ansys Totem / Celsius

A4.2 Power Analysis Flow (Voltus)

# Cadence Voltus power/IR/EM flow
# (illustrative sequence; exact command names and options vary by Voltus release)

# 1. Start Voltus within Innovus or standalone
set_db rail_analysis_config {
    -mode time_domain
    -method dynamic_vectorbased
}

# 2. Read design and power intent
read_power_intent -1801 [Link]

# 3. Provide switching activity
# Option A: VCD (Value Change Dump from simulation)
read_activity_file -format vcd sim_output.vcd

# Option B: SAIF (Switching Activity Interchange Format)
read_activity_file -format saif [Link]

# Option C: Estimated activity
set_switching_activity -toggle_rate 0.2 -static_probability 0.5 \
    [all_registers]

# 4. Run analysis
analyze_rail -power_domains {PD_TOP PD_LOW}

# 5. Reports
report_rail -type power            ;# dynamic + leakage per domain
report_rail -type static_ir_drop   ;# IR drop map
report_rail -type dynamic_ir_drop  ;# transient IR map
report_rail -type em               ;# EM violations list

A4.3 Interpreting IR Drop and EM Results

IR Drop Report interpretation:

Net: VDD   Domain: PD_TOP
-----------------------------------------
Max static IR drop:    18mV (2.25% of 0.8V)  ✓
Max dynamic IR drop:   47mV (5.88% of 0.8V)  ✗ FAIL!
Location of max drop:  (350um, 420um)
Worst cell:            u_alu/u_adder/FA_32
Action: add VDD strap at (350um, 420um) and add DCAP cells nearby

EM Report interpretation:
-----------------------------------------
Violation  Layer  Net          J_avg      Jmax_limit
---------  -----  -----------  ---------  ----------
EM_VIO_1   M3     clk_net      1.82mA/um  1.50mA/um   ← widen wire
EM_VIO_2   VIA2   power_strap  8.43mA/um  5.00mA/um   ← add redundant via

Fix EM_VIO_1: set NDR on clk_net to 2x width → J halved
Fix EM_VIO_2: replace single via with 2x2 via array
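
The pass/fail arithmetic in both reports can be reproduced with a few lines of Python. This is only an illustrative sketch: the 5% IR-drop limit is an assumption chosen to match the pass/fail marks in the report, the helper names are hypothetical, and the via fix assumes current splits evenly across the array.

```python
# Illustrative check of the report numbers above (limits are assumed, not from a PDK).

def ir_drop_percent(drop_mv, vdd_v):
    """IR drop expressed as a percentage of the nominal supply."""
    return drop_mv / (vdd_v * 1000.0) * 100.0

def scaled_density(j, scale):
    """Current density after spreading the same current over `scale` times the metal."""
    return j / scale

VDD = 0.8
IR_LIMIT_PCT = 5.0  # assumed limit for this example

static_pct = ir_drop_percent(18, VDD)
dynamic_pct = ir_drop_percent(47, VDD)
print(f"static  {static_pct:.2f}%  pass={static_pct <= IR_LIMIT_PCT}")
print(f"dynamic {dynamic_pct:.2f}%  pass={dynamic_pct <= IR_LIMIT_PCT}")

# EM_VIO_1: doubling the wire width halves J.
j_clk_fixed = scaled_density(1.82, 2)
# EM_VIO_2: a 2x2 via array shares the current across 4 vias (even split assumed).
j_via_fixed = scaled_density(8.43, 4)
print(f"clk_net J after fix:     {j_clk_fixed:.2f} mA/um (limit 1.50)")
print(f"power_strap J after fix: {j_via_fixed:.2f} mA/um (limit 5.00)")
```

Both fixed densities land below their limits, which is why widening and via doubling are the first remedies to try.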

A4.4 Thermal Analysis Basics

Junction temperature affects transistor speed, leakage, and reliability. High temperature → more leakage → more
power → higher temperature (thermal runaway risk).

T_junction = T_ambient + (P_chip × R_theta_JA)

where R_theta_JA = junction-to-ambient thermal resistance (°C/W). Typical package: 20–50°C/W. Chip with 2W and
R_theta=40°C/W at 25°C ambient: Tj = 25 + 2×40 = 105°C.
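
As a quick worked check of the formula, the Python sketch below recomputes the example and inverts the relation to get a power budget. The 125°C junction limit is an assumed value for illustration, and the function names are hypothetical.

```python
def junction_temp_c(t_ambient_c, power_w, r_theta_ja):
    """T_junction = T_ambient + P_chip * R_theta_JA (R_theta_JA in C/W)."""
    return t_ambient_c + power_w * r_theta_ja

def max_power_w(tj_max_c, t_ambient_c, r_theta_ja):
    """Largest chip power that keeps the junction at or below tj_max_c."""
    return (tj_max_c - t_ambient_c) / r_theta_ja

tj = junction_temp_c(25.0, 2.0, 40.0)
print(f"Tj = {tj:.0f} C")

# Assumed junction limit of 125 C, same package and ambient as above.
budget = max_power_w(125.0, 25.0, 40.0)
print(f"Power budget = {budget:.2f} W")
```

With these assumed numbers the 2W chip sits 20°C below the limit, leaving about 0.5W of headroom.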

Hotspot: Local region of the chip with peak power density. Usually in high-activity CPU cores or arithmetic units.

Thermal gradient: Temperature difference across the die. Large gradients → Vt variation → timing variation (timing closure becomes temperature-dependent).

Thermal-aware placement: Place high-power cells away from each other to distribute heat. Avoid concentrated hot spots.

Chip-package co-design: Thermal bumps (for flip-chip packages) placed directly under hotspots to improve heat extraction.



Chapter A5: Advanced DFT — Compression & At-Speed Testing

Scan compression · EDT · At-speed ATPG · IJTAG

A5.1 Scan Compression

Modern chips have tens of millions of flip-flops. Without compression, test data volume and test time would be
unacceptably large. Scan compression reduces the number of scan channels needed by a compression ratio (e.g.,
100:1).

Basic Scan (no compression):
  Test I/O → chain1 (10K FFs)
  Test I/O → chain2 (10K FFs)
  ...
  Test I/O → chain100 (10K FFs)
  100 scan pins needed!

EDT Compressed Scan:
  Test I/O → DECOMPRESSOR → internal chains (1000 FFs each) → COMPRESSOR → Test I/O

With 100:1 compression: only 1-2 scan I/O pins needed

The compressor/decompressor are small hardware blocks added by the DFT tool (Tessent, Modus)

Compression ratio = (# internal scan chains) / (# test I/O pins)

Typical: 50:1 to 200:1 in production designs

EDT (Embedded Deterministic Test): Siemens EDA (formerly Mentor) Tessent scan compression

DFTMAX: Synopsys equivalent
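
The ratio formula and the effect of shorter internal chains can be sketched in Python. The design size (1,000,000 flip-flops) and the channel count of 10 are assumed numbers chosen to match the chain lengths in the diagram, and the function names are hypothetical.

```python
def compression_ratio(internal_chains, scan_channels):
    """Compression ratio = internal scan chains / external test I/O channels."""
    return internal_chains / scan_channels

def shift_cycles_per_pattern(total_flops, chains):
    """Shift cycles needed to load one pattern = length of the longest chain."""
    return -(-total_flops // chains)  # ceiling division

TOTAL_FLOPS = 1_000_000  # assumed design size

# Uncompressed: 100 chains driven directly from 100 scan pins.
basic_cycles = shift_cycles_per_pattern(TOTAL_FLOPS, 100)

# Compressed: 1000 short internal chains fed from 10 external channels.
edt_cycles = shift_cycles_per_pattern(TOTAL_FLOPS, 1000)
ratio = compression_ratio(1000, 10)

print(f"basic: {basic_cycles} shift cycles per pattern")
print(f"EDT:   {edt_cycles} shift cycles per pattern, ratio {ratio:.0f}:1")
```

Shorter internal chains cut the shift time per pattern, while the decompressor keeps the external pin count low.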

A5.2 At-Speed Testing (Transition Fault)

Stuck-at ATPG detects manufacturing defects that fix a node permanently. But some defects cause a node to switch
correctly but too slowly. These are only detected by running at the actual functional clock frequency.

Launch-on-Shift (LoS): The last shift clock pulse acts as the launch clock, and capture happens one functional cycle later. Scan enable must switch at functional speed between the two pulses, which makes LoS harder to implement physically, but it typically reaches high transition coverage with fewer patterns.

Launch-on-Capture (LoC): Scan enable is de-asserted first, then two at-speed pulses are applied in capture mode: the first launches the transition and the second captures it one cycle later. Scan enable is not timing-critical, but LoC needs special clock handling and its coverage is usually somewhat lower than LoS.

Test coverage target: >95% transition fault coverage for production. Requires careful at-speed clock routing: the test clock must meet functional timing constraints.

PD impact: At-speed test mode is a separate SDC mode in MMMC. Clock must transition at functional frequency during test. Clock routing in test mode must meet functional timing.

A5.3 IJTAG (IEEE 1687) — Instrument-Level Test Access

IJTAG extends IEEE 1149.1 (JTAG) to provide standardised access to on-chip instruments such as BIST controllers,
PLL configuration registers, and embedded sensors.

SIB (Segment Insertion Bit): A 1-bit scan element that enables/disables access to a subtree of instruments. Allows efficient navigation to any instrument without shifting through unrelated instruments.

ICL (Instrument Connectivity Language): IEEE 1687 language that describes the topology of instruments and their connections.

PDL (Procedural Description Language): Describes test procedures for accessing instruments in a retargetable way.

PD relevance: IJTAG/JTAG cells must be placed in the IO ring. Their timing (to TCK) must be analysed in the JTAG clock domain corner in MMMC.



Chapter A6: Mixed-Signal PD Awareness

Analog/digital co-existence · Substrate noise · Guard rings · Floorplan

A6.1 Why Mixed-Signal PD is Different

Modern SoCs combine digital logic with analog circuits (ADCs, DACs, PLLs, SerDes, RF). Digital switching creates
substrate noise that disturbs analog circuits. The PD engineer must create physical isolation strategies.

Mixed-Signal Floorplan Strategy:

+--------------------------------------------------------+
|                  IO Ring / Bump Array                  |
|  +--------------------------------------------------+  |
|  | ANALOG BLOCK   | ISOLATION ZONE  | DIGITAL LOGIC |  |
|  | (PLL, ADC,     | (deep N-well,   |               |  |
|  |  Bandgap,      |  guard rings)   |               |  |
|  |  SerDes PHY)   |                 |               |  |
|  +--------------------------------------------------+  |
+--------------------------------------------------------+

Key rules:

1. Analog macros on opposite side of die from high-switching logic

2. Wide substrate separation (>50µm) between analog and digital

3. Guard rings (VDD/VSS rings) around analog blocks

4. Separate analog VDD (AVDD) and digital VDD (DVDD)

5. Analog power never routes over digital switching cells

6. Independent ground plane for analog (AGND)

A6.2 Substrate Noise and Guard Rings

Substrate noise: Digital switching injects current into the silicon substrate. This couples to analog circuits, appearing as noise on sensitive nodes (ADC input, PLL VCO).

Deep N-well: A buried N-type layer implanted in the P-substrate beneath a circuit's P-well. Together with the surrounding N-well it isolates that P-well from noise in the common substrate. Used under sensitive analog circuits. Requires process support.

Guard ring (P+): A ring of heavily-doped P+ contacts connected to VSS surrounding analog. Intercepts noise current before it reaches analog. Width: typically 2–10µm.

Guard ring (N+): A ring of N+ contacts connected to VDD surrounding NMOS circuits. Prevents latch-up and reduces noise coupling.

Latch-up: A parasitic PNPN thyristor structure formed between PMOS source/body and NMOS source/body. If triggered, it creates a low-impedance path from VDD to GND that persists until power is removed and can destroy the chip. Prevented by proper guard ring spacing and N-well distances.

PD rule: Never route high-activity digital nets directly over or under analog supply rails. Use metal shielding layers between digital routing and analog supply.



Chapter A7: Skills to Learn for Job Readiness

Prioritised skill gaps · Learning plan · What UK employers test

A7.1 What UK PD Employers Actually Test

Based on typical UK VLSI physical design engineer interviews (ARM, Imagination Technologies, SiFive, MediaTek
UK, Qualcomm UK, Cadence, Synopsys, FTDI, Dialog, Nordic, Renesas UK):

Topic | Frequency in Interviews | Your Current Level | Priority
Setup/Hold timing fundamentals | Always asked (100%) | Good | Deepen
STA path analysis (reading PT output) | Always (100%) | Good | Practice daily
Clock tree concepts + skew | Always (100%) | Good | Deepen ICG
Floorplan / macro placement | Very frequent (85%) | Good | Add PDN detail
IR drop and PDN analysis | Very frequent (85%) | Moderate | LEARN NOW
DRC / LVS concepts | Very frequent (85%) | Weak | CRITICAL GAP
Routing DRC rules | Frequent (70%) | Moderate | Deepen
CDC / synchronisers | Frequent (65%) | Basic | LEARN NOW
Power analysis / UPF | Frequent (65%) | Moderate | Good base
Tcl scripting (practical) | Frequent (60%) | Basic | Practice weekly
MMMC / OCV in STA | Moderate (55%) | Moderate | Deepen
DFT / scan chain PD impact | Moderate (50%) | Basic | Know concepts
FinFET/advanced node DRC | Moderate (50%) | Some (14nm exp) | Mention your exp
Formal verification | Occasional (30%) | None | Know concepts
Mixed-signal awareness | Occasional (25%) | None | Know basics

A7.2 Skill-by-Skill Learning Plan

Calibre DRC/LVS (Critical — learn first)

Why it matters: Calibre is the industry standard. Not knowing it is the #1 reason graduates fail PD interviews.

How to learn: FREE: Download Skywater 130nm PDK ([Link]/google/skywater-pdk). Install KLayout (free,
[Link]). Run DRC on sample GDS. Read 10 DRC rules. Install Netgen (free). Run LVS on a sample design.
PAID: Mentor Calibre student license (check university access). Week 1–2: understand all 10 common violation
types. Week 3: fix them manually.

IR Drop and PDN Analysis

Why it matters: Frequently asked in UK interviews: 'Describe your PDN analysis experience. What was your IR
drop?'

How to learn: Action: in your existing ICC-II projects, run check_pg_connectivity and report_pg_supply_violation.
Then use report_rail if Voltus available. At minimum, learn the theory: V=IR, IR drop targets (5% static, 10%
dynamic), fix strategies (add straps, decap). Practice drawing PDN hierarchy from memory.

CDC (Clock Domain Crossing)



Why it matters: CDC questions appear in 65% of UK PD interviews. 'What is a 2-flop synchroniser and when is it
needed?'

How to learn: Theory: understand metastability, MTBF, two-flop synchroniser, Gray code, async FIFO. Tool:
SpyGlass CDC (if available). Free alternative: write CDC scenarios in RTL and verify using open-source tools
(Verilator + checks). Interview prep: be able to draw a 2-flop synchroniser from memory and explain why 2 FFs.

Tcl Scripting

Why it matters: Senior engineers are expected to automate common tasks. Having 2–3 scripts on GitHub
demonstrates practical skill.

How to learn: Practice: write ONE script per week. Start with: (1) a PrimeTime script that reads SPEF, runs timing,
and logs WNS/TNS to a file, (2) an ICC-II script that identifies the 10 most congested GCells and reports their
locations, (3) a Tcl utility proc library you can reuse across projects. Resource: Tcl Tutorial at
[Link]/man/tcl8.6/tutorial. Then upload all scripts to GitHub with clear README.

OpenLane + Skywater 130nm Portfolio

Why it matters: UK employers want EVIDENCE of hands-on work. A GitHub portfolio separates you from other
candidates.

How to learn: Projects to build (in order): (1) 8-bit ALU: RTL → GDS in OpenLane. Show timing report, congestion
map, DRC clean. (2) RISC-V (picorv32): run through OpenLane. Document every step. (3) Multi-power domain: add
UPF to a design. Show isolation cells inserted. Each project: 1-page PDF writeup + screenshots + GitHub README.
Install: pip install openlane (Docker). Free Skywater PDK.

MMMC and AOCV/POCV

Why it matters: Modern sign-off always uses MMMC + AOCV or POCV. Single-corner STA is no longer acceptable.

How to learn: Practice: in PrimeTime, create a 3-view MMMC setup (func_setup SS, func_hold FF, test_setup SS).
Enable AOCV with set_ocvm_mode advanced. Enable CPPR. Compare WNS with and without AOCV — see how
much pessimism is removed. Resource: PrimeTime User Guide Chapter 9 (MMMC) — free via Synopsys
SolvNetPlus.

A7.3 GitHub Portfolio — What to Build

A strong GitHub profile is increasingly important for UK semiconductor roles. Here is a prioritised list of what to build,
in order of impact:

Priority | Project | What to Show | Time Estimate
1 (Must have) | 8-bit ALU (OpenLane/Sky130) | RTL → GDS, DRC clean, timing report, congestion map, README with screenshots | 2 weeks
2 (Must have) | Timing Analysis Deep-Dive | Take a design, introduce a setup violation, document finding + ECO fix in PrimeTime | 1 week
3 (Strong) | Tcl Automation Script Library | 5+ reusable procs: timing summary, congestion report, LVT swap automation | 1 week
4 (Strong) | Multi-Power Domain Design | UPF-annotated design, show isolation cells + level shifters in ICC-II | 2 weeks
5 (Good) | DRC Analysis Report | Run KLayout DRC, document 5 violations, explain each + fix strategy | 1 week
6 (Bonus) | RISC-V through OpenLane | picorv32 full flow, timing closure steps, before/after ECO comparison | 2 weeks



Interview Questions & Answers — Complete Guide

130 Q&A across all topics · Beginner to Advanced · Real interview style

This section contains 130 interview questions exactly as asked in UK VLSI Physical Design interviews (ARM,
Imagination, MediaTek, Qualcomm, Cadence, Synopsys, Nordic, Dialog). Each answer is written in first-person so
you can speak it aloud directly. Questions are organised from basic to advanced within each topic.

Section 1: VLSI & CMOS Fundamentals

Q: What is a MOSFET and how does it work?

A: A MOSFET is a voltage-controlled switch. It has four terminals: Gate, Drain, Source, and Body. In an NMOS
transistor, when the gate voltage exceeds the threshold voltage Vt, an inversion layer forms between drain and
source — creating a conducting channel. Current then flows from drain to source. When gate voltage is below Vt, no
channel forms and the device is OFF. This switching behaviour is the basis of all digital logic.

Q: What is the difference between NMOS and PMOS?

A: NMOS uses an n-type channel and turns ON when the gate is HIGH (Vgs > Vt). It is a strong pull-down device.
PMOS uses a p-type channel and turns ON when the gate is LOW (Vgs < -|Vt|). It is a strong pull-up device. NMOS
is approximately 2–3× faster than PMOS for the same width because electron mobility is higher than hole mobility. In
CMOS design, NMOS and PMOS are paired in complementary networks — PMOS as pull-up, NMOS as pull-down.

Q: Why is CMOS preferred over NMOS-only or PMOS-only logic?

A: CMOS consumes near-zero static power because in any steady state, either the PMOS or the NMOS is OFF,
blocking the DC path from VDD to GND. NMOS-only logic requires a pull-up resistor, which always draws current.
CMOS also provides full voltage swing (output reaches exactly VDD or GND), giving large noise margins. The
combination of low power, full swing, and scalability makes CMOS the dominant technology for all digital ICs.

Q: What is threshold voltage and what are HVT, SVT, and LVT cells?

A: Threshold voltage Vt is the minimum gate-to-source voltage required to create a conducting channel in a
MOSFET. In standard cell libraries, three variants are offered: HVT (High Vt) cells have a higher threshold, so they
are slower but have very low leakage — used on non-critical paths to save power. SVT (Standard Vt) is the balanced
default. LVT (Low Vt) has lower threshold, making it faster but with more leakage — used only on critical timing
paths. The PD tool assigns Vt variants based on path slack to optimise power while meeting timing.

Q: What is the CMOS inverter and explain its operation?

A: The CMOS inverter consists of one PMOS and one NMOS transistor. The PMOS source connects to VDD and
the NMOS source to GND. Both gates connect to the input. Both drains connect to the output. When input is LOW:
NMOS is OFF, PMOS is ON — output is pulled to VDD (logic 1). When input is HIGH: PMOS is OFF, NMOS is ON
— output is pulled to GND (logic 0). In steady state, one device is always OFF, so no DC current flows — this is why
CMOS has near-zero static power.

Q: Explain static and dynamic power in CMOS.



A: Dynamic power is P = alpha × C × VDD² × f, where alpha is the activity factor, C is the load capacitance, VDD is
the supply voltage, and f is the clock frequency. This is consumed during each switching transition as the
capacitance is charged and discharged. Static power is P = VDD × Ileakage, caused by subthreshold leakage
current that flows even when transistors are OFF. At advanced nodes like 7nm and 5nm, leakage can be 40–60% of
total power. This is why HVT cells and power gating are so important at these nodes.
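
To make the two formulas concrete, here is a small Python sketch. Every number in it (activity factor, switched capacitance, leakage current) is an assumed illustrative value, not data from a real design.

```python
def dynamic_power_w(alpha, c_farads, vdd_v, f_hz):
    """P_dyn = alpha * C * VDD^2 * f"""
    return alpha * c_farads * vdd_v ** 2 * f_hz

def static_power_w(vdd_v, i_leak_a):
    """P_static = VDD * I_leakage"""
    return vdd_v * i_leak_a

# Assumed values: 20% activity, 1 nF total switched capacitance, 1 GHz clock.
p_dyn_08 = dynamic_power_w(0.2, 1e-9, 0.8, 1e9)
p_dyn_07 = dynamic_power_w(0.2, 1e-9, 0.7, 1e9)
p_static = static_power_w(0.8, 0.05)  # assumed 50 mA total leakage

print(f"dynamic @0.8V: {p_dyn_08 * 1e3:.1f} mW")
print(f"dynamic @0.7V: {p_dyn_07 * 1e3:.1f} mW")
print(f"static  @0.8V: {p_static * 1e3:.1f} mW")
print(f"dynamic saving from 0.8V to 0.7V: {(1 - p_dyn_07 / p_dyn_08) * 100:.1f}%")
```

Because dynamic power scales with VDD squared, a 12.5% supply reduction saves roughly 23% of dynamic power in this example.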

Q: What is a D flip-flop? Explain setup and hold time.

A: A D flip-flop is an edge-triggered memory element. On every rising clock edge, it captures the value at its D input
and presents it at Q. Setup time Tsu is the minimum time the D input must be stable before the clock edge to ensure
correct capture. Hold time Th is the minimum time D must remain stable after the clock edge. If setup is violated, the
FF may not capture the correct value and could go metastable — entering an undefined intermediate voltage state. If
hold is violated, the new data overwrites the just-captured data before it is safely stored.



Section 2: Logic Synthesis

Q: What is logic synthesis and what are its inputs and outputs?

A: Logic synthesis converts RTL (Verilog or VHDL) into a technology-mapped gate-level netlist. The inputs are: the
RTL source files, SDC timing constraints specifying the target frequency and other requirements, and the standard
cell library (.db or .lib files) characterised for the target PVT corner. The outputs are: a gate-level Verilog netlist, an
updated SDC file, and reports for timing, area, and power. In my work I used Synopsys Design Compiler and DC
NXT.

Q: What is an SDC file and what are the key commands in it?

A: SDC stands for Synopsys Design Constraints. It is the universal timing constraint language used by synthesis,
place-and-route, and PrimeTime. Key commands are: create_clock to define the clock frequency;
set_clock_uncertainty to model jitter and skew margin; set_input_delay and set_output_delay to constrain interface
paths; set_false_path for paths that are never sensitised such as CDC crossings and async resets;
set_multicycle_path for paths that need more than one cycle; set_max_transition for slew limits; and set_driving_cell
and set_load for interface modelling.

Q: What is a false path and a multicycle path? When do you use them?

A: A false path is a timing path that is never exercised in real operation, so STA should ignore it. Common examples
are: asynchronous reset paths, clock domain crossing paths between unrelated clocks, and configuration bits that
are set once at startup. I set these with set_false_path in SDC. A multicycle path is a path where the logic
intentionally takes more than one clock cycle to settle because the logic is too complex or the operation is not
needed every cycle — like a divider or multiplier. I set set_multicycle_path N -setup AND always pair it with
set_multicycle_path N-1 -hold, otherwise the hold check is performed at the wrong clock edge.
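
The edge arithmetic behind that pairing can be sketched in Python. This is a simplified model that assumes launch and capture on the same clock with the launch edge at time 0; the function name and the 2ns period are illustrative and not part of any tool.

```python
def check_edges(period_ns, setup_mult=1, hold_mult=0):
    """Return (setup_capture_edge, hold_check_edge) in ns for a launch at t=0.

    Simplified same-clock model of multicycle semantics:
    - the setup check is made at setup_mult * period
    - the hold check defaults to one period before the setup edge,
      and each unit of hold_mult moves it one period earlier
    """
    setup_edge = setup_mult * period_ns
    hold_edge = (setup_mult - 1 - hold_mult) * period_ns
    return setup_edge, hold_edge

PERIOD = 2.0
default_edges = check_edges(PERIOD)                                 # no multicycle
setup_only = check_edges(PERIOD, setup_mult=3)                      # -setup 3 only
setup_and_hold = check_edges(PERIOD, setup_mult=3, hold_mult=2)     # -setup 3, -hold 2

print("default:         ", default_edges)
print("-setup 3 only:   ", setup_only)
print("-setup 3 -hold 2:", setup_and_hold)
```

With only the setup multiplier, the hold check lands at 4ns, forcing the data path to be slower than two full cycles; adding the N-1 hold multiplier returns it to the launch edge.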

Q: What is compile_ultra in Design Compiler? What does it do differently from compile?

A: compile_ultra is Design Compiler's highest-quality compilation mode. Compared to basic compile, it uses more
sophisticated algorithms: it performs restructuring to reduce logic levels on critical paths, applies timing-driven
technology mapping, does more aggressive multi-Vt optimisation, and can perform register retiming to move
flip-flops across combinational logic to balance path delays. Key flags I use are -gate_clock to auto-insert clock
gating for power saving, -scan to prepare the netlist for DFT, and -no_autoungroup to preserve hierarchy for easier
ECOs later.

Q: How do you check if synthesis met timing? What do you look for in the report?

A: I run report_qor which gives a summary of the worst negative slack and total negative slack. The key metrics are
WNS — Worst Negative Slack — which must be zero or positive for timing closure, and TNS — Total Negative Slack
— which must be zero meaning no paths are failing. I also run report_timing -delay_type max to see the actual worst
setup path, which shows the complete breakdown of each stage: clock latency, each cell's delay, each wire's delay,
and the final slack. If WNS is negative I look at which cell or wire is contributing most delay and address that first.



Section 3: Floorplanning & Power Planning

Q: Walk me through your floorplanning process.

A: I start by estimating the die size using the formula: Core Area = Standard Cell Area from synthesis divided by the
target utilisation, which I typically set at 65–70%. I add the macro areas on top. Then I determine the aspect ratio
based on package constraints and dominant macro shapes. For macro placement, I follow the rule of placing macros
along the die edges to leave the centre open for standard cell routing. I group related macros together — for
example, placing an SRAM next to its controller. I align all macro corners to the routing grid and add halos of
typically 10–15µm around each macro. In my RISC-V project at 32nm, I placed hard macros containing
approximately 25,000 cells using this methodology.

Q: What is utilisation in floorplanning? What happens if it's too high?

A: Utilisation is the fraction of the core area occupied by standard cells. For example, 70% utilisation means 70% of
the core area is used by cells, leaving 30% as white space for routing, buffer insertion, and hold fixing. If utilisation is
too high — above 75–80% — there is insufficient routing headroom. This causes congestion, where the router
cannot fit all required wires, leading to DRC violations and unrouted nets. The fix is to increase the core size, which
reduces utilisation. I typically target 65–70% for complex designs.
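
The sizing arithmetic is easy to sanity-check in Python. The cell and macro areas below are assumed example values, and the helper names are hypothetical.

```python
import math

def core_area_mm2(std_cell_area_mm2, utilisation, macro_area_mm2=0.0):
    """Core area = standard cell area / target utilisation, plus macro area on top."""
    return std_cell_area_mm2 / utilisation + macro_area_mm2

def core_dimensions_mm(area_mm2, aspect_ratio=1.0):
    """Return (width, height) for a core with aspect_ratio = height / width."""
    width = math.sqrt(area_mm2 / aspect_ratio)
    return width, width * aspect_ratio

STD_CELL_AREA = 0.35  # mm^2 from synthesis (assumed)
MACRO_AREA = 0.20     # mm^2 (assumed)

area_70 = core_area_mm2(STD_CELL_AREA, 0.70, MACRO_AREA)
area_80 = core_area_mm2(STD_CELL_AREA, 0.80, MACRO_AREA)
width, height = core_dimensions_mm(area_70)

print(f"core area at 70% utilisation: {area_70:.4f} mm^2")
print(f"core area at 80% utilisation: {area_80:.4f} mm^2")
print(f"square core at 70%: {width:.3f} mm x {height:.3f} mm")
```

Relaxing utilisation from 80% to 70% costs about 10% more core area here, which is the price paid for routing headroom.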

Q: What is a power distribution network and how is it structured?

A: The PDN is the hierarchy of metal structures that delivers VDD and VSS to every transistor in the design. Starting
from the top: the chip has bond pads or flip-chip bumps connected to the package. Inside the chip, a thick core ring
on the top metal layers (typically M8 or M9) runs around the core perimeter. From the ring, horizontal and vertical
power straps on intermediate metals (M5 to M8) carry current through the core. At the standard cell level, thin VDD
and VSS rails on M1 run inside every placement row. I create the PDN in ICC-II by defining the ring and mesh with
create_pg_ring_pattern and create_pg_mesh_pattern, applying them with set_pg_strategy and compile_pg, and then
running connect_pg_net to connect all cell power pins.

Q: What is IR drop and how do you fix it?

A: IR drop is the voltage loss along the resistance of the power grid, following Ohm's law V = I × R. A cell receiving
less than nominal VDD operates more slowly, which translates into timing violations. The acceptable limit is typically
less than 5% of VDD for static IR drop — so less than 40mV on a 0.8V supply. I fix IR drop by first identifying the
hot-spot location from the Voltus IR map. The most effective fix is to add more VDD or VSS metal straps in the
affected area, reducing the grid resistance. I also insert decoupling capacitor cells near high-switching regions to
absorb dynamic current surges. If a single area has very high activity, spreading the cells also helps.

Q: What is electromigration and how is it avoided?

A: Electromigration is the gradual physical displacement of metal atoms caused by momentum transfer from
conducting electrons at high current density. Over years of operation, this causes voids — open circuits — or hillocks
— short circuits — in the metal wires. It is governed by Black's equation where the mean time to failure is inversely
proportional to current density squared. The foundry specifies a maximum current density Jmax for each metal layer.
I avoid EM violations by ensuring no wire exceeds Jmax. Fixes include widening the wire (more width = lower current
density), using double or quad via arrays instead of single vias, and applying NDR rules with double width on clock
nets that carry continuous high-frequency switching current.
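
The effect of widening a wire can be estimated from Black's equation, MTTF = A · J^(-n) · exp(Ea / kT). The Python sketch below compares relative lifetimes only, so the constant A cancels. The exponent n = 2 follows the answer above; the activation energy of 0.9 eV and the 105°C temperature are assumed illustrative values.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def relative_mttf(j, temp_k, n=2.0, ea_ev=0.9):
    """Black's equation without the constant A (only ratios are meaningful)."""
    return j ** (-n) * math.exp(ea_ev / (K_BOLTZMANN_EV * temp_k))

TEMP_K = 378.0  # about 105 C (assumed)

# Same current, wire width doubled: current density halves.
j_before = 1.82
j_after = j_before / 2

lifetime_gain = relative_mttf(j_after, TEMP_K) / relative_mttf(j_before, TEMP_K)
print(f"MTTF improvement from 2x width: {lifetime_gain:.2f}x")
```

Halving the current density gives about a 4x longer expected lifetime under these assumptions, which is why double-width NDRs are so effective on clock nets.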



Section 4: Placement

Q: Describe the placement flow in ICC-II.

A: Placement in ICC-II happens in three stages. First, global placement uses an analytical force-directed method to
minimise total estimated wirelength across all nets. Cells may overlap at this stage. Second, legalisation moves cells
to the nearest legal placement row and site, resolving all overlaps while preserving the global placement distribution.
Third, detailed placement performs local perturbations — cell swapping, sliding, and flipping — to improve timing and
congestion beyond what global placement achieves. I run all three with the single command place_opt, which also
runs timing analysis throughout to guide cells on critical paths closer together.

Q: What is congestion and how do you reduce it?

A: Congestion occurs when the routing demand in a local area exceeds the available routing capacity. I identify it
using the GRC — Global Routing Congestion — map, where red tiles indicate overflow. Overflow means some nets
cannot be routed within that tile, which will cause DRC violations. To fix congestion I first try reducing the local cell
density target — setting max_density to 60% in the congested region so place_opt spreads cells out. I add
placement blockages near macros to prevent cells from crowding into areas where routing channels are obstructed.
If congestion is severe I may need to revisit the floorplan — move a macro to open a routing channel, or increase the
core size. I verify the fix by re-running global routing and checking that overflow is zero.
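
Overflow is simply routing demand minus capacity per tile, floored at zero. The Python sketch below applies that to a hypothetical 3x3 grid of GCells; the track counts are made-up numbers, and a real tool reports this per layer and direction.

```python
def overflow_map(demand, capacity):
    """Per-tile overflow = max(0, demand - capacity)."""
    return [[max(0, d - c) for d, c in zip(d_row, c_row)]
            for d_row, c_row in zip(demand, capacity)]

# Hypothetical routing demand (tracks needed) and capacity (tracks available).
demand = [[8, 12, 9],
          [10, 15, 11],
          [7, 9, 6]]
capacity = [[10] * 3 for _ in range(3)]

overflow = overflow_map(demand, capacity)
total_overflow = sum(sum(row) for row in overflow)
worst_tile = max(((r, c) for r in range(3) for c in range(3)),
                 key=lambda rc: overflow[rc[0]][rc[1]])

for row in overflow:
    print(row)
print(f"total overflow: {total_overflow}, worst tile: {worst_tile}")
```

The worst tile is where a density limit or placement blockage should be applied first; the target after the fix is a total overflow of zero.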

Q: What are placement blockages and when do you use each type?

A: There are three main types. A hard blockage prevents any standard cell from being placed in a region — I use
these under hard macros, IO pads, and analog blocks. A soft blockage tells the placer to prefer not to place cells
there, but allows it during legalisation if there is no other space — I use these near macro edges. A partial or density
blockage limits cell density to a percentage — for example 50% — in a region, which reduces congestion near
macros without completely blocking placement. I also use buffer blockages specifically to prevent clock tree buffers
from being placed inside macro halos, which is important for clean CTS.

Q: What is scan reordering and why is it important?

A: After synthesis, the scan chain connects flip-flops in their synthesis order, which is typically alphabetical or
hierarchical. After placement, these FFs may be scattered across the die, creating very long scan routing wires that
consume routing resources and worsen congestion. Scan reordering re-stitches the scan chain after placement in
geometrical nearest-neighbour order, so each FF's scan output drives the physically closest FF's scan input. This
typically reduces scan routing wirelength by 30 to 50%. I always run compile_scan in ICC-II after place_opt and
before routing.
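
The idea of nearest-neighbour re-stitching can be illustrated with a greedy Python sketch. The flip-flop names and coordinates are hypothetical, and this heuristic ignores constraints a real tool respects, such as clock domains and chain length balancing.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def chain_length(order, pos):
    """Total Manhattan wirelength of the scan connections along the chain."""
    return sum(manhattan(pos[order[i]], pos[order[i + 1]])
               for i in range(len(order) - 1))

def reorder_nearest_neighbour(start, pos):
    """Greedy reorder: always connect to the closest flip-flop not yet in the chain."""
    remaining = set(pos) - {start}
    order = [start]
    while remaining:
        last = pos[order[-1]]
        nxt = min(remaining, key=lambda name: (manhattan(last, pos[name]), name))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Hypothetical placed locations in um: two clusters at opposite corners.
pos = {"ff_a": (0, 0), "ff_b": (90, 80), "ff_c": (10, 5),
       "ff_d": (85, 90), "ff_e": (20, 10), "ff_f": (95, 95)}

synthesis_order = sorted(pos)  # alphabetical, as stitched after synthesis
placed_order = reorder_nearest_neighbour("ff_a", pos)

len_before = chain_length(synthesis_order, pos)
len_after = chain_length(placed_order, pos)
print("before:", synthesis_order, len_before, "um")
print("after: ", placed_order, len_after, "um")
```

In this contrived example the alphabetical chain zig-zags between the two clusters, so the saving is far larger than the 30 to 50% typical of real designs.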



Section 5: Clock Tree Synthesis

Q: What is CTS and why is it needed?

A: CTS — Clock Tree Synthesis — builds a balanced buffered tree to distribute the clock to all flip-flops with minimal
skew. Without CTS, a single clock wire driving thousands of FFs would have different arrival times at each FF due to
wire RC. The skew could be several nanoseconds — larger than the combinational logic delay — making timing
closure impossible. CTS inserts a hierarchy of buffers to equalise the clock arrival time at all sinks. I run CTS in
ICC-II using clock_opt, which performs CTS followed by post-CTS optimisation to fix hold violations introduced by
the real clock latency.

Q: Define clock skew, insertion delay, and clock uncertainty.

A: Clock skew is the difference in clock arrival time at two flip-flops. For example if FF1 receives the clock at 0.50ns
and FF2 at 0.55ns, the skew is 50ps. My target is typically 50 to 150ps or less for high-frequency designs. Insertion delay is
the total propagation delay through the clock tree buffers from the clock port to the FF clock pin — typically 300ps to
2ns. Clock uncertainty is modelled in SDC with set_clock_uncertainty and accounts for jitter from the PLL plus the
residual skew margin after CTS. I typically use 100ps setup uncertainty and 50ps hold uncertainty.

Q: Why do hold violations appear after CTS and how do you fix them?

A: Before CTS, the STA tool assumes ideal clocks with zero latency, so every flip-flop sees the clock edge at the
same time. After CTS, clocks are propagated and each flip-flop gets its real clock latency. On a short data path
between adjacent flip-flops, if the capture clock arrives later than the launch clock, the newly launched data can
reach the capture FF before its hold window for the same edge has closed, causing a hold violation. The data path
delay is simply too small compared with the clock latency difference. Hold violations are fixed by inserting delay
buffers on the short data path, which adds propagation delay so that data does not arrive too early. ICC-II clock_opt
does this automatically during post-CTS optimisation. I also swap cells to HVT to add more delay on the shortest paths.
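
The effect can be illustrated with a small hold-slack calculation. This is a minimal sketch with made-up numbers:
the function name hold_slack_ps and every delay value are assumptions for illustration, not tool output.

```python
def hold_slack_ps(launch_latency, capture_latency, clk_to_q, data_delay,
                  hold_time, uncertainty=0):
    """Hold slack for a same-edge register-to-register check (all values in ps)."""
    # Earliest time the newly launched data reaches the capture D pin.
    arrival = launch_latency + clk_to_q + data_delay
    # The old data must stay stable until this time.
    required = capture_latency + hold_time + uncertainty
    return arrival - required


# Ideal clocks before CTS: both latencies are zero and the short path passes.
pre_cts = hold_slack_ps(0, 0, clk_to_q=60, data_delay=20, hold_time=30)

# After CTS the capture clock arrives 80 ps later than the launch clock.
post_cts = hold_slack_ps(500, 580, clk_to_q=60, data_delay=20, hold_time=30)

# A 40 ps delay buffer on the data path restores positive slack.
fixed = hold_slack_ps(500, 580, clk_to_q=60, data_delay=20 + 40, hold_time=30)

print(pre_cts, post_cts, fixed)
```

The same path goes from passing to failing purely because of the latency difference, which is why the fix is applied
to the data path rather than to the logic itself.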

Q: What is an Integrated Clock Gate (ICG) and why is a plain AND gate not used?

A: An ICG is a glitch-free clock gating cell that stops the clock from reaching idle flip-flops, eliminating their dynamic
power consumption. A plain AND gate cannot be used because if the enable signal changes while the clock is HIGH,
it creates a spurious short clock pulse — a glitch — that could cause incorrect FF captures. The ICG contains a D
latch that samples the enable signal on the LOW phase of the clock and holds it stable throughout the HIGH phase
before passing it to the AND gate. This ensures the enable is only presented at the safe time. In my projects, clock
gating coverage was over 80%, which is typical for low-power designs.
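
A small behavioural model makes the glitch visible. This is only a sketch: the sampled waveforms and the function
names and_gated and icg_gated are assumptions for illustration, and the model ignores real cell delays.

```python
def and_gated(clk, en):
    """Plain AND gate: the enable is passed straight through."""
    return [c & e for c, e in zip(clk, en)]


def icg_gated(clk, en):
    """Latch-based gating: the enable is sampled only while the clock is LOW."""
    out, latched = [], 0
    for c, e in zip(clk, en):
        if c == 0:          # latch is transparent during the LOW phase
            latched = e
        out.append(c & latched)
    return out


# Two samples per clock phase; the enable rises in the middle of a HIGH phase.
clk = [0, 0, 1, 1, 0, 0, 1, 1]
en = [0, 0, 0, 1, 1, 1, 1, 1]

print(and_gated(clk, en))  # contains a half-width pulse in the first HIGH phase
print(icg_gated(clk, en))  # first gated pulse is a full-width one
```

With the AND gate the output shows a truncated pulse where the enable changed mid-phase; with the latch the enable
takes effect only from the next full clock pulse.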

Q: What is useful skew and how does it help timing closure?

A: Useful skew is the intentional introduction of skew between launch and capture flip-flops to improve path timing.
For setup improvement, I delay the capture FF clock arrival by adding extra buffers to its clock branch. This gives the
data more time to propagate, which is equivalent to increasing the clock period for that specific path. The improvement
equals the skew I introduce. However, I must be careful: delaying the capture clock reduces the hold margin on that
same path and reduces the setup margin on any path launched from the delayed flip-flop. The CTS tool optimises useful
skew globally to maximise overall slack improvement without creating new violations.

Q: What NDR rules do you apply to clock nets and why?

A: I always apply Non-Default Routing Rules of double-width and double-spacing (2W/2S) to all clock nets. The
double width reduces wire resistance, which lowers RC delay, sharpens clock transitions, and reduces electromigration
risk since clock nets carry continuous high-frequency switching current. The double spacing increases the distance to
adjacent wires, reducing capacitive coupling and crosstalk noise. This is critical because any noise-induced glitch on
a clock net would cause incorrect data capture across an entire bank of flip-flops. I create the NDR with
create_routing_rule CLK_NDR -multiplier_width 2 -multiplier_spacing 2 in ICC-II.


Section 6: Routing

Q: Describe the routing flow in ICC-II.

A: I run route_auto, which performs the base routing flow in one command, followed by route_opt for post-route
optimisation. Internally the flow executes: global routing which assigns nets to GCell channels without exact tracks;
track assignment which assigns nets to routing tracks; detailed routing which fixes exact layer and coordinates while
enforcing all DRC rules; search-and-repair which is an iterative pass to fix remaining DRC violations; via optimisation
which replaces single vias with double vias for EM and yield; and signal integrity optimisation which adjusts routing
to reduce crosstalk on critical nets. After routing I run check_routes to verify zero DRC violations before sign-off.

Q: What is the antenna effect and how do you fix it?

A: The antenna effect occurs during plasma etching in fabrication. Long metal wires act as antennas and accumulate
charge from the plasma. If the gate oxide at the end of the wire has no discharge path — because the drain or
source is not yet connected at that fabrication step — the accumulated charge tunnels through the thin gate oxide
and damages it permanently. The antenna ratio is the cumulative wire and via area above the gate divided by the
gate oxide area, and the foundry specifies a maximum allowed ratio. I fix antenna violations in three ways: inserting
an antenna diode at the gate pin which provides a safe discharge path; adding a wire jumper to route the wire up to a
higher metal layer which resets the antenna counter; or configuring the router to limit wire length per layer.
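
The ratio check itself is simple arithmetic. The sketch below uses assumed areas and an assumed limit of 400; real
limits come from the foundry rule deck, and the function names are hypothetical.

```python
def antenna_ratio(metal_area_um2, gate_area_um2):
    """Metal area connected to the gate divided by the gate oxide area."""
    return metal_area_um2 / gate_area_um2


def violates(metal_area_um2, gate_area_um2, max_ratio):
    return antenna_ratio(metal_area_um2, gate_area_um2) > max_ratio


MAX_RATIO = 400          # assumed limit, for illustration only
GATE_AREA = 0.25         # um^2, assumed

# Long wire on one layer connected directly to the gate.
before = antenna_ratio(120.0, GATE_AREA)

# After a jumper, only part of the wire is attached to the gate at that step.
after = antenna_ratio(60.0, GATE_AREA)

print(before, violates(120.0, GATE_AREA, MAX_RATIO))
print(after, violates(60.0, GATE_AREA, MAX_RATIO))
```

A diode fix works differently: it does not change the ratio but gives the charge a discharge path, which rule decks
usually model as a higher allowed ratio.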

Q: What is crosstalk and how does it impact timing?

A: Crosstalk is capacitive coupling between adjacent parallel wires. When an aggressor net switches, it induces a
voltage change on the victim net through their mutual coupling capacitance. The induced noise is approximately Cc
divided by Cc plus Cvictim, multiplied by the aggressor's voltage swing. For timing, the most damaging case is when
the aggressor and victim switch in opposite directions — the coupling opposes the victim's transition, making it
slower. This is called out-of-phase crosstalk delay and it adds to the setup path delay. I fix it by increasing spacing
between aggressor and victim, adding shield wires, or rerouting onto different layers.
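
The noise estimate above can be written out directly. This is a sketch of the simple charge-sharing approximation
quoted in the answer; the capacitance and voltage values are assumptions, and a sign-off tool uses far more detailed
models.

```python
def crosstalk_noise_v(c_coupling_ff, c_victim_ff, aggressor_swing_v):
    """Approximate noise bump on the victim: Cc / (Cc + Cvictim) * Vswing."""
    return c_coupling_ff / (c_coupling_ff + c_victim_ff) * aggressor_swing_v


# 2 fF of coupling against 6 fF of victim capacitance, 0.8 V aggressor swing.
noise = crosstalk_noise_v(2.0, 6.0, 0.8)

# Doubling the spacing roughly halves the coupling capacitance (assumed here).
noise_wider = crosstalk_noise_v(1.0, 6.0, 0.8)

print(round(noise, 3), round(noise_wider, 3))
```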

Q: What is DRC and what are the most common violations?

A: DRC — Design Rule Check — verifies that all physical shapes in the layout satisfy the foundry's manufacturing
constraints. Common violations include: minimum spacing where two wires on the same layer are too close together;
minimum width where a wire is too narrow; enclosure where a via is not surrounded by enough metal on all sides;
minimum area where a metal shape is too small; and antenna violations. I run Calibre DRC using the
foundry-supplied rule deck to identify all violations. The tool produces a results file I open in the RVE GUI to navigate
to each violation location in the layout and fix it.

Q: What is LVS and what errors can it report?

A: LVS (Layout versus Schematic) extracts the netlist from the GDS layout and compares it to the reference
netlist from synthesis. It verifies that the physical layout represents the correct circuit. LVS can report: shorts, where
two nodes are connected in layout that should not be, such as a VDD-to-VSS short which is critical; opens, where a
connection is missing; extra devices that are present in the layout but not in the netlist; missing devices that are
present in the netlist but not in the layout; and parameter mismatches where a transistor has wrong dimensions. I run
Calibre LVS for production sign-off.


Section 7: Static Timing Analysis — Deep Dive

Q: Explain the setup timing check with the full equation.

A: For a register-to-register path, the data arrival time equals the launch clock edge time plus the launch clock
latency plus the data path delay — which includes Tcq of the launch FF plus all combinational cell delays and wire
delays. The data required time equals the capture clock edge time plus the capture clock latency minus the setup
time Tsu minus the clock uncertainty. Setup slack equals required time minus arrival time, and it must be zero or
positive. In equation form: Slack = Tclk + Tcap_latency - Tlaunch_latency - Tdata - Tsu - Tuncertainty. Positive slack
means the path passes. Negative slack is a violation that must be fixed.
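
As a worked instance of the slack equation, here is a minimal sketch in picoseconds. The parameter names mirror the
terms of the equation; the numeric values are assumptions chosen only to show the arithmetic.

```python
def setup_slack_ps(t_clk, t_cap_latency, t_launch_latency, t_data, t_su,
                   t_uncertainty):
    """Slack = Tclk + Tcap_latency - Tlaunch_latency - Tdata - Tsu - Tuncertainty."""
    required = t_clk + t_cap_latency - t_su - t_uncertainty
    arrival = t_launch_latency + t_data   # t_data includes Tcq, cell and wire delay
    return required - arrival


# 1660 ps clock period, 50 ps of skew in favour of the capture FF.
slack = setup_slack_ps(t_clk=1660, t_cap_latency=550, t_launch_latency=500,
                       t_data=1400, t_su=50, t_uncertainty=100)

# The same path with 200 ps more logic delay now fails.
slack_slow = setup_slack_ps(1660, 550, 500, 1600, 50, 100)

print(slack, slack_slow)
```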

Q: What are WNS and TNS? How do you use them to track ECO progress?

A: WNS is the Worst Negative Slack — the single most negative slack value across all timing paths. It represents the
hardest path to fix. TNS is the Total Negative Slack — the sum of all negative slacks across all failing paths. TNS
gives a measure of the overall design health. When tracking ECO progress, I monitor both. Fixing the WNS path
improves WNS directly. But if TNS is large, it means many paths are failing and I need to work systematically
through all violating endpoints, not just the single worst path. Timing is closed when WNS reaches zero or above and
TNS equals zero.
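
Both metrics are easy to compute from a list of endpoint slacks, which is useful when scripting ECO tracking. The
sketch below assumes the slacks have already been parsed from a timing report; the function name and values are
hypothetical.

```python
def wns_tns(endpoint_slacks_ps):
    """Return (WNS, TNS, number of failing endpoints) from per-endpoint slacks."""
    failing = [s for s in endpoint_slacks_ps if s < 0]
    wns = min(endpoint_slacks_ps)   # worst slack; negative means a violation
    tns = sum(failing)              # only negative slacks contribute
    return wns, tns, len(failing)


before_eco = [-200, -35, 12, -5, 40]
after_eco = [-20, -35, 12, 3, 40]

print(wns_tns(before_eco))
print(wns_tns(after_eco))
```

In this example the ECO fixed the original worst path, so the WNS endpoint changes and TNS drops sharply, which is
the pattern to look for when checking that an ECO round made real progress.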

Q: What is MMMC and why is single-corner STA not sufficient?

A: MMMC stands for Multi-Mode Multi-Corner analysis. It runs STA simultaneously across all operating modes —
such as functional, test, and low-power — and all PVT corners. Single-corner analysis is insufficient because a path
that passes at the typical corner may fail at the slow-slow corner where transistors are slower and temperature is
higher. Similarly, hold violations that are hidden at the slow corner may appear at the fast-fast corner where
minimum path delays are shorter. For sign-off I always use at minimum a func_setup view with the SS slow-slow
corner for setup checking and a func_hold view with the FF fast-fast corner for hold checking.

Q: What is OCV and AOCV? How do they affect timing analysis?

A: OCV (On-Chip Variation) models the fact that cells on different parts of the die experience slightly different
process, voltage, and temperature conditions. With flat OCV, I apply a derating factor to all cells. For setup
analysis, late derating (for example 1.05) makes cells on the launch clock and data path appear 5% slower, while early
derating (0.95) makes cells on the capture clock path appear 5% faster; for hold analysis the roles are reversed. Flat
OCV is pessimistic because deep paths with many logic stages statistically average out their variations. AOCV,
Advanced OCV, addresses this by applying depth-based and distance-based derating: shallow paths get larger derating
and deep paths get smaller derating. This is more accurate and recovers unnecessary pessimism, allowing tighter
timing closure.

Q: What is CPPR and why must it always be enabled?

A: CPPR stands for Clock Path Pessimism Removal. In OCV analysis, the launch and capture clock paths both
share a common segment from the clock root to their divergence point. Without CPPR, this shared segment is
derated twice — once as late for the launch path and once as early for the capture path. This is physically impossible
because the same wire segment cannot simultaneously be both fast and slow. CPPR removes this double-derating
by crediting back the difference. The credit is typically 10 to 30 picoseconds. Not enabling CPPR forces unnecessary
ECO work to fix violations that are actually false pessimism. I always enable it with set_app_var
timing_remove_clock_reconvergence_pessimism true.
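
The size of the credit follows directly from the derates. This sketch assumes a 300 ps common clock segment and the
flat OCV derates of 1.05 and 0.95; the function name and the starting slack are illustrative assumptions.

```python
def cppr_credit_ps(common_clock_delay_ps, late_derate, early_derate):
    """Pessimism from derating the shared clock segment both late and early."""
    late = common_clock_delay_ps * late_derate
    early = common_clock_delay_ps * early_derate
    return late - early


credit = cppr_credit_ps(300, 1.05, 0.95)

slack_without_cppr = -20                       # ps, assumed reported slack
slack_with_cppr = slack_without_cppr + credit  # credit is added back to slack

print(round(credit, 3), round(slack_with_cppr, 3))
```

Here a path that looks like a 20 ps violation is actually passing once the double-counted derate on the shared
segment is removed.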

Q: How do you approach timing closure when WNS is -200ps?

A: First I run report_qor to see WNS, TNS, and the number of failing endpoints. Then I take the worst path with
report_timing -path_type full_clock to see the complete path breakdown. I identify whether the delay is dominated by
cell delay or wire delay. For cell delay, I upsize the driver or swap it to LVT. For wire delay, I move cells closer
together or insert a buffer to break the long wire. I also check the clock skew contribution — if the capture clock
arrives earlier than the launch clock I can apply useful skew to delay it. After each ECO I re-route changed nets,
re-extract SPEF, and re-run PrimeTime to verify improvement. I continue iterating until WNS reaches zero and TNS
equals zero.

Q: What is SI-aware STA and when is it needed?

A: SI-aware STA models the effect of crosstalk on timing delay. Standard STA does not model switching aggressors:
coupling capacitors between adjacent nets are treated as grounded capacitance. SI-aware STA keeps the coupling caps
from SPEF and computes the worst-case scenario where an aggressor net switches in the opposite direction to the
victim, adding extra delay to the victim's transition. This is critical at 28nm and below where coupling capacitance
can be 30 to 50% of total net capacitance. In PrimeTime I enable it with set_app_var si_enable_analysis true, read the
SPEF with read_parasitics -keep_capacitive_coupling, and identify the worst crosstalk paths with
report_si_bottleneck -cost_type delta_delay.


Section 8: Low Power Design

Q: What is UPF and what does it define?

A: UPF stands for Unified Power Format, standardised as IEEE 1801. It is a specification language that defines the
power intent of a multi-voltage design. UPF defines: power domains which group logic blocks sharing the same
supply; supply ports and nets which describe the voltage sources; power states for each domain such as ON, OFF,
or RETENTION; isolation strategies which specify what happens to outputs when a domain powers down; level
shifter strategies for signals crossing between domains at different voltages; and retention strategies for flip-flops
that must preserve their state across a power-down. In my OpenSPARC FPU project I worked with two power domains
at 14nm using UPF.

Q: What is a level shifter and when is it needed?

A: A level shifter converts a logic signal from the voltage level of one power domain to that of another. It is needed at
every signal crossing between domains operating at different supply voltages. For example if a 0.8V domain drives a
signal into a 1.0V domain, the 0.8V HIGH level may not meet the 1.0V domain's input high threshold VIH, and it leaves
the receiving PMOS partially on, causing static current. There are two types: LH for low-to-high conversion, which
uses a cross-coupled PMOS stage on the higher supply and needs both supplies, and HL for high-to-low conversion,
which can usually run from the lower (destination) supply alone. An LH level shifter must therefore be placed where
both supply rails are available.

Q: What is an isolation cell and why is it needed?

A: An isolation cell is required at the output of a power domain that can be switched off. When the domain is
powered down, its flip-flops lose their state and their outputs float to an undefined voltage. If these floating signals
propagate into the always-on domain, they create X-propagation which corrupts the logic of the live domain. The
isolation cell clamps the output to a defined safe value — either logic 0 or logic 1 — when activated by an isolation
enable signal. The isolation cell is powered by the always-on domain so it remains functional even when the source
domain is off. A critical rule is that isolation must be activated BEFORE the domain is powered off and deactivated
AFTER power is stable.

Q: What is power gating and how is it implemented physically?

A: Power gating cuts the power supply to an idle block to eliminate its leakage current. It is implemented using
header switches — large PMOS transistors inserted between the primary VDD and the virtual VVDD of the block.
When the sleep signal makes the PMOS gate HIGH, the switch is OFF and the block has no power. When sleep is
LOW the switch is ON and VVDD equals VDD. In physical design, the header switches form a row at the top of the
power-gated cell region. They must be sized to handle the peak current of the entire domain — typically one switch
per 50 to 100 microns of standard cell row. Decoupling caps near the switches absorb the inrush current during
wake-up to prevent VDD bounce.

Q: Explain DVFS and its impact on physical design.

A: DVFS — Dynamic Voltage and Frequency Scaling — reduces both the supply voltage and clock frequency when
maximum performance is not needed. Since dynamic power scales as VDD squared times frequency, reducing both
by 20% saves approximately 49% of dynamic power. From a physical design perspective, the design must meet
timing at ALL voltage-frequency operating points, not just the maximum. This means MMMC must include corners
for all VF operating points. The power grid must deliver adequate current at the highest frequency without IR drop
violations, and the timing must not fail at lower voltages where cells are slower. In my experience the binding
constraint is usually the lowest-voltage highest-frequency corner.
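
The 49% figure follows from the VDD squared times frequency relationship. A minimal sketch of the arithmetic, with a
hypothetical helper name:

```python
def dynamic_power_ratio(v_scale, f_scale):
    """Dynamic power relative to nominal when VDD and f are scaled (P ~ V^2 * f)."""
    return v_scale ** 2 * f_scale


# Reduce both voltage and frequency by 20%.
ratio = dynamic_power_ratio(0.8, 0.8)
saving = 1 - ratio

# Frequency scaling alone gives only a linear saving.
saving_f_only = 1 - dynamic_power_ratio(1.0, 0.8)

print(round(ratio, 3), round(saving, 3), round(saving_f_only, 3))
```

The comparison shows why voltage scaling matters: lowering frequency alone saves 20%, while lowering both saves
about 49%.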


Section 9: DRC, LVS & Physical Verification

Q: What tool do you use for DRC and LVS sign-off?

A: For production sign-off, Calibre from Siemens EDA is the industry standard. The foundry provides an official
Calibre DRC rule deck in SVRF format that encodes all their manufacturing constraints. I run calibre -drc -hier -turbo
with the GDS and the rule deck. The results are viewed in the Calibre RVE GUI which shows violations by rule
name, layer, and exact coordinates. For LVS I run calibre -lvs -hier. In my open-source work and for learning I use
KLayout with the SkyWater 130nm DRC script and Netgen for LVS, which are free and available on GitHub.

Q: What is the difference between DRC and ERC?

A: DRC — Design Rule Check — verifies geometric and physical manufacturing constraints: minimum wire width,
minimum spacing, enclosure rules, area rules, density rules, and antenna rules. These ensure the foundry can
reliably print and etch the designed patterns. ERC — Electrical Rule Check — verifies circuit-level electrical
correctness: no floating gate terminals that would have undefined states, no disconnected power or ground pins,
proper guard ring spacing to prevent latch-up, and correct N-well connections. DRC can pass while ERC fails — for
example a cell with its VDD pin not connected to the power grid may be DRC-clean but is an ERC failure.

Q: What is metal fill and why is it required?

A: Metal fill consists of small floating metal shapes inserted into areas of the die where a metal layer has insufficient
density. It is required because the CMP (Chemical Mechanical Planarisation) step in fabrication polishes each
metal layer flat, and CMP works well only when the metal density is uniform. Where density varies, the polish rate
varies too: wide copper features suffer dishing and densely patterned regions suffer erosion, so metal thickness
deviates from nominal, which changes resistance and can cause reliability failures. Foundries specify a min and max
density window, typically 20% to 80%, for each layer. ICC-II inserts fill automatically with create_metal_fill.

Q: What is formal verification and when is it run?

A: Formal verification uses mathematical proof — specifically SAT solving and BDD-based equivalence checking —
to prove that two circuit representations are functionally identical. It is run at every stage where the netlist is modified.
After synthesis: the gate netlist must be equivalent to the RTL. After DFT insertion: the scan-inserted netlist must be
functionally equivalent to the pre-DFT netlist. After every ECO: the post-ECO netlist must match the pre-ECO netlist
for a timing ECO, or the updated RTL for a functional ECO.
Before tape-out: the final post-route netlist must match the reference. I use Synopsys Formality for this. Unlike
simulation which only checks specific scenarios, formal verification covers all possible inputs.


Section 10: Design For Test (DFT)

Q: Explain scan chain architecture and the test sequence.

A: In scan design, all standard flip-flops are replaced with scan flip-flops. A scan FF has an extra scan input SI and a
scan enable SE. When SE is HIGH — scan mode — the MUX passes SI to the FF input, forming a long shift register
chain. When SE is LOW — functional mode — the FF captures its D input normally. The test sequence has three
steps: shift-in — SE=1, apply N clock pulses to load the test pattern into all N FFs in the chain; capture — SE=0,
apply one functional clock — all FFs simultaneously capture the logic response of the combinational logic; shift-out
— SE=1, shift the captured response out to scan_out and compare to the expected pattern to detect faults.
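
The three steps can be walked through with a tiny behavioural model. This is a sketch only: the 3-FF chain, the
inverting "combinational logic", and the function names are assumptions made for illustration.

```python
def shift(chain, scan_in_bits):
    """SE=1: one clock per bit. Returns the bits seen on scan_out."""
    scan_out = []
    for bit in scan_in_bits:
        scan_out.append(chain[-1])        # last FF drives scan_out
        chain[:] = [bit] + chain[:-1]     # each FF takes its neighbour's value
    return scan_out


def capture(chain, logic):
    """SE=0: one functional clock, every FF captures its D input."""
    chain[:] = logic(chain)


def invert_all(bits):
    """Stand-in for the combinational logic under test."""
    return [1 - b for b in bits]


chain = [0, 0, 0]
shift(chain, [1, 1, 0])          # shift-in: load the test pattern
loaded = list(chain)
capture(chain, invert_all)       # capture: one functional clock
captured = list(chain)
response = shift(chain, [0, 0, 0])   # shift-out: unload the response

print(loaded, captured, response)
```

On a tester the response would be compared bit by bit against the expected pattern, and the next pattern is normally
shifted in while the previous response is shifted out.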

Q: What is the PD engineer's responsibility in DFT?

A: The PD engineer has several DFT-related responsibilities. First, after placement, I run scan reordering to re-stitch
the scan chain in geometrical order, reducing scan routing wirelength by 30 to 50%. Second, I ensure the scan input
and scan output pins are connected to IO pads accessible from the tester. Third, I must verify that the DFT test mode
timing is satisfied — for at-speed testing, the test clock must meet the same timing as the functional clock, so the
clock routing must support functional-frequency operation. Fourth, for MBIST controllers I must place them physically
adjacent to their target SRAM to minimise routing to the memory ports.

Q: What is MBIST and how does it differ from scan-based SRAM testing?

A: MBIST — Memory Built-In Self Test — is a dedicated hardware controller that tests embedded SRAMs by
applying March algorithms directly through the memory's normal read-write interface. It differs from scan-based
testing in that SRAM cells are not scan-accessible — you cannot shift a test pattern into individual memory bit cells
through a scan chain because there are millions of them and they are accessed through address/data ports, not
individual pins. The MBIST controller writes and reads specific patterns — for example the March C minus algorithm
— to detect stuck-at, transition, and coupling faults in every bit cell. The result is a single PASS or FAIL output.


Section 11: Clock Domain Crossing (CDC)

Q: What is metastability and how does a two-flop synchroniser help?

A: Metastability occurs when a flip-flop samples a signal at the exact moment it is transitioning, violating setup or
hold time. The FF enters an unstable equilibrium state where its output is neither HIGH nor LOW but somewhere in
between. Given enough time it will resolve to a valid state, but if the resolution takes longer than one clock period,
the metastable state propagates and corrupts downstream logic. A two-flop synchroniser addresses this by giving
the first FF a full clock period to resolve before the second FF samples it. The first FF may go metastable, but the
probability of it remaining metastable for a full clock period is exponentially small. The second FF then samples a
valid resolved value.

Q: Why can't you use a two-flop synchroniser for a multi-bit bus?

A: A two-flop synchroniser works only for single-bit signals because it samples at an arbitrary time with respect to the
source domain. For a multi-bit bus, each bit may be sampled at a different phase of its transition, so some bits are
captured with their new values and some with their old values, creating a corrupted intermediate combination that
was never a valid state. For example on a 4-bit counter going from 0111 to 1000, all four bits change and the
synchroniser might capture 1111 — a value that never existed. The correct solutions are Gray-coded pointers —
which change only one bit at a time — or an asynchronous FIFO for bulk data transfer.
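
The difference between binary and Gray counting is easy to check numerically. A minimal sketch, with hypothetical
helper names:

```python
def bin_to_gray(n):
    """Standard binary-to-Gray conversion."""
    return n ^ (n >> 1)


def bits_changed(a, b):
    """Number of bit positions that differ between two values."""
    return bin(a ^ b).count("1")


# Binary counter step 0111 -> 1000: every bit changes.
binary_step = bits_changed(7, 8)

# The same step in Gray code: 0100 -> 1100, a single bit changes.
gray_step = bits_changed(bin_to_gray(7), bin_to_gray(8))

print(binary_step, gray_step)
print(format(bin_to_gray(7), "04b"), format(bin_to_gray(8), "04b"))
```

Because only one bit changes per increment, a synchroniser that samples mid-transition sees either the old or the
new pointer value, never an invalid mixture.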

Q: What SDC constraints are needed for CDC paths?

A: CDC crossing paths must be excluded from timing in the SDC, for example with set_false_path -from [get_clocks clk_a]
-to [get_clocks clk_b] plus the same constraint in the reverse direction, or with set_clock_groups -asynchronous
-group clk_a -group clk_b, which covers both directions. This tells STA not to perform timing checks across the
crossing because the clocks are asynchronous and have no timing relationship. Without this, STA would report false
violations on these paths since it would try to enforce setup and hold constraints between clocks that have no fixed
phase. The actual safety of the crossing is ensured by the synchroniser circuit, not by timing constraints. I also run
SpyGlass CDC or Cadence JasperGold CDC to verify all crossing signals have proper synchronisation structures.


Section 12: Advanced Nodes & FinFET

Q: What is a FinFET and how does it differ from a planar MOSFET?

A: A FinFET uses a vertical fin of silicon as the channel, and the gate wraps around three sides of the fin — top and
both sidewalls. This tri-gate structure gives the gate much better electrostatic control over the channel compared to a
planar MOSFET where the gate only controls one face. Better gate control means the transistor can be turned off
much more decisively at short gate lengths, dramatically reducing leakage. The subthreshold slope approaches 65
mV per decade, close to the theoretical ideal of 60. Drive strength in FinFET is quantised — it is set by the number of
fins rather than by continuous width sizing. FinFETs were introduced at 22nm and are the standard transistor from the
16nm and 14nm nodes downward.

Q: You worked at 14nm. What DRC challenges did you face at that node?

A: At 14nm the DRC deck is significantly more complex than at 32nm. The minimum metal pitch on M1 is
approximately 48nm compared to about 90nm at 32nm, leaving very little margin for error. I encountered more
end-of-line spacing rules, via enclosure rules with directional requirements, and multi-patterning colour constraints
where adjacent wires on the same layer must be assigned to different lithography masks. Fin direction rules required
all standard cells to align their fins in the same global orientation. The antenna ratio limits are also stricter at 14nm
because the gate oxide is thinner. Managing all of these required very careful cell placement near macro edges
where routing channels are narrow.

Q: What is multi-patterning and how does it affect routing?

A: Multi-patterning is a technique used at sub-20nm nodes where the minimum feature pitch is smaller than what a
single lithography exposure can print. The layout for one metal layer is split across multiple masks. For example,
LELE (litho-etch-litho-etch) uses two litho-etch cycles with two separate masks. Adjacent wires on the same layer must be
placed on different masks, which the router refers to as different colours. A colouring conflict occurs when three
mutually adjacent wires exist because they cannot all be assigned valid different colours with only two masks. The
router must avoid these conflicts during routing. At 14nm I needed to be aware of M3 and M4 patterning constraints
during routing to avoid colouring DRC violations.
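
Checking whether a set of wires can be split across two masks is a graph 2-colouring problem. The sketch below is an
illustration only: the wire names and conflict lists are assumptions, and a real router works on geometry rather than
on a hand-written conflict graph.

```python
from collections import deque


def two_colour(wires, conflicts):
    """Return {wire: 0 or 1} mask assignment, or None if no valid one exists."""
    neighbours = {w: [] for w in wires}
    for a, b in conflicts:               # a and b are too close for one mask
        neighbours[a].append(b)
        neighbours[b].append(a)
    colour = {}
    for start in wires:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            w = queue.popleft()
            for n in neighbours[w]:
                if n not in colour:
                    colour[n] = 1 - colour[w]   # neighbour goes on the other mask
                    queue.append(n)
                elif colour[n] == colour[w]:
                    return None                 # odd cycle: colouring conflict
    return colour


# Three parallel wires where only direct neighbours conflict: colourable.
print(two_colour(["a", "b", "c"], [("a", "b"), ("b", "c")]))

# Three mutually adjacent wires: not colourable with two masks.
print(two_colour(["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")]))
```

The second case is the three-wire conflict described above; the fix is to increase spacing on one pair so that the
conflict edge disappears.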


Section 13: Behavioural & Project-Based Questions

Q: Tell me about your most challenging project in physical design.

A: My most challenging project was the OpenSPARC FPU implementation at 14nm with two power domains. The FPU
had a clock period of 1.66ns and two voltage domains requiring level shifters and isolation cells at the boundaries.
The challenge was closing timing at 14nm where the DRC rules are much stricter and routing resources are scarcer.
I had significant congestion near the power domain boundary because of the level shifter cells which are physically
larger than standard cells. I resolved it by increasing the halo around that region, spreading cells into adjacent areas,
and reordering the power domain boundary in the floorplan to give more routing clearance. I also had hold violations
after CTS in the domain with higher latency which I fixed by inserting delay buffers during clock_opt.

Q: Describe a timing violation you faced and how you solved it.

A: In my RISC-V project at 32nm I had a setup WNS of negative 120ps on a path through the ALU. Using
report_timing with full_clock path type, I identified that 65% of the slack violation came from a long wire between the
adder output register and the result mux. The wire was approximately 800 microns — it had been placed far away
because of a macro in between. I first tried inserting a buffer to break the wire, which recovered about 60ps. Then I
moved the mux cell closer to the adder register using set_cell_location, reducing wire delay further and recovering
roughly another 70ps. After re-routing and re-extracting SPEF, the final slack was positive 15ps and timing was closed.

Q: How do you handle pressure when timing is not closing near tapeout?

A: I prioritise systematically. First I check whether any violations are false positives by verifying CPPR is enabled and
AOCV derating is correctly configured — sometimes false pessimism accounts for some of the reported violations.
Then I focus on paths that contribute most to TNS, not just the single worst WNS path, because fixing many
moderate violations often reduces TNS faster than chasing one very hard path. I communicate clearly with the team
— reporting the current WNS, TNS, and my fix rate per day — so the project manager can make an informed
decision about schedule. I also flag any violations that may require a floorplan change early, because those take the
most time to implement and re-verify.

Q: Why did you choose Physical Design over RTL design or verification?

A: I am drawn to Physical Design because it combines multiple disciplines simultaneously. Every decision I make —
cell placement, clock routing, power grid — directly impacts timing, power, area, and reliability all at once. I enjoy the
problem-solving aspect: tracking down why a specific path is failing, understanding whether it is cell delay, wire
delay, or skew, and applying the targeted fix. During my training at Maven Silicon I had the opportunity to work
through the full RTL-to-GDS flow on four projects, and I found the physical implementation stage the most engaging.
The work is very concrete — you can visualise the circuit physically and see exactly what is happening.

Q: Where do you see yourself in 3 years in VLSI Physical Design?

A: In three years I want to be working as a mid-level Physical Design Engineer independently leading blocks of 500K
to 1M cells through full PD flow including sign-off. I want to deepen my expertise in advanced nodes — specifically
7nm and below — and become proficient in power grid analysis using Voltus or RedHawk. I am actively building my
skills in Calibre DRC and LVS and developing an OpenLane portfolio on GitHub to demonstrate hands-on capability
beyond my training projects. Longer term I am interested in physical design methodology — developing scripts and
flows that improve engineering efficiency across a team.


Section 14: Rapid-Fire Technical Questions (Short Answers)

These are quick one-to-two sentence answers for common short questions in interviews:

Q: What is the difference between a latch and a flip-flop?

A: A latch is level-sensitive — it is transparent when its enable is active and captures continuously. A flip-flop is
edge-triggered — it captures only at the active clock edge and holds value otherwise. Flip-flops are used almost
exclusively in synchronous digital design.

Q: What is clock jitter?

A: Clock jitter is the cycle-to-cycle variation in clock period caused by PLL phase noise and power supply noise. It is
modelled as part of clock uncertainty in STA.

Q: What is a via and what DRC rules apply to it?

A: A via is a vertical metal connection between two adjacent metal layers. DRC rules require: minimum enclosure
(metal must surround the via on all sides), minimum via-to-via spacing, and minimum area of the via itself.

Q: What is the difference between setup margin and hold margin?

A: Setup margin is how much earlier the data arrives than the start of the setup window (positive means margin to
spare). Hold margin is how much longer the data stays stable than the end of the hold window after the capture edge.

Q: What is filler cell insertion and why is it done?

A: Filler cells are inserted in gaps between standard cells to complete the N-well continuity and supply rail continuity
across rows. Without fillers, the N-well breaks and power rails are interrupted.

Q: What is the difference between global routing and detailed routing?

A: Global routing assigns each net to a sequence of GCell channels without exact tracks — it is fast and used for
congestion estimation. Detailed routing assigns exact layer, track, and coordinates while enforcing all DRC rules.

Q: What is SPEF and how is it used?

A: SPEF — Standard Parasitic Exchange Format — contains extracted resistance and capacitance of every routed
net. It is back-annotated to PrimeTime to replace estimated wire delays with actual post-route values for accurate
sign-off STA.

Q: What is Fusion Compiler?

A: Fusion Compiler is Synopsys's unified synthesis-to-implementation tool that integrates Design Compiler and
ICC-II into a single flow, enabling concurrent optimisation of synthesis and physical design for better QoR.

Q: What is the difference between functional power and test power?

A: Functional power is the power consumed during normal chip operation. Test power is the power during scan shift,
which is often higher because a large fraction of flip-flops can toggle on every shift clock, far above functional
activity. If it is not managed with scan power reduction techniques, it can cause excessive IR drop, false test
failures, or even damage to the chip.

Q: What is ECO and what types exist?

A: ECO — Engineering Change Order — is a targeted minimal change to a placed-and-routed design to fix a
problem. Types include timing ECO (fix setup/hold), functional ECO (fix logic bug), power ECO (fix IR drop or EM),
and DRC ECO (fix geometric violations).

Q: What is a half-cycle path?

A: A path between a FF triggered on the rising edge and one triggered on the falling edge. Only half the clock period
is available for the combinational logic (assuming a 50% duty cycle), so the delay budget is roughly half that of a
full-cycle path and timing closure is correspondingly harder.
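
A quick budget calculation shows how much logic delay a half-cycle path loses. The sketch assumes a 50% duty cycle,
zero skew, and made-up clock-to-Q and setup values; the function name is hypothetical.

```python
def comb_budget_ps(period_ps, clk_to_q_ps, t_setup_ps, half_cycle=False):
    """Time left for combinational logic between launch and capture flops."""
    available = period_ps // 2 if half_cycle else period_ps
    return available - clk_to_q_ps - t_setup_ps

full = comb_budget_ps(1000, clk_to_q_ps=80, t_setup_ps=50)
half = comb_budget_ps(1000, clk_to_q_ps=80, t_setup_ps=50, half_cycle=True)
print(full, half)  # 870 370
```

Because clock-to-Q and setup time are fixed overheads, the logic budget shrinks by more than half (870ps to 370ps in
this example).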

Q: Why is utilisation limited to 70–75%?

A: The remaining 25–30% of core area is needed for routing resources, buffer insertion during CTS and ECO, hold fix
buffers, and metal fill. Higher utilisation leaves insufficient routing headroom, causing congestion and DRC violations.
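
Utilisation is simply standard-cell area divided by core area. The sketch below uses assumed area figures to show how
much free area a 70% target leaves for later buffers and routing relief.

```python
def utilisation(std_cell_area_um2, core_area_um2):
    """Fraction of the core area occupied by standard cells."""
    return std_cell_area_um2 / core_area_um2

def free_area_um2(std_cell_area_um2, core_area_um2):
    """Core area still available for CTS, ECO, and hold-fix cells."""
    return core_area_um2 - std_cell_area_um2

# Assumed block: 700,000 um^2 of cells in a 1,000,000 um^2 core.
u = utilisation(700_000, 1_000_000)
free = free_area_um2(700_000, 1_000_000)
print(u, free)  # 0.7 300000
```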

Q: What is a clock mesh versus a clock tree?

A: A clock tree is a hierarchy of buffers with balanced paths to each sink — skew of 50–150ps typical. A clock mesh
is a grid of metal wires shorted at crossings, giving very low skew under 5ps but consuming much more power due to
high capacitance. Meshes are used in CPUs and GPUs.

Q: What is the purpose of decoupling capacitor cells?

A: Decap cells provide a local charge reservoir between VDD and VSS. During simultaneous switching, cells draw a
large instantaneous current. The decap absorbs this surge, preventing a large voltage droop on the local VDD —
reducing dynamic IR drop.
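
A first-order estimate of the droop follows from charge conservation: delta-V = I x delta-t / C. The sketch ignores
current supplied by the power grid during the event, and the current, duration, and capacitance values are assumptions.

```python
def droop_mv(surge_ma, duration_ps, decap_pf):
    """Voltage droop if the decap alone supplies the surge: dV = I * dt / C.

    With mA, ps, and pF the result comes out directly in millivolts.
    """
    return surge_ma * duration_ps / decap_pf

print(droop_mv(10, 100, 10))  # 100.0 mV with 10 pF of local decap
print(droop_mv(10, 100, 50))  # 20.0 mV with 50 pF of local decap
```

Five times the local decap gives one fifth of the droop for the same switching event.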

Q: What is PBA versus GBA in STA?

A: GBA (Graph-Based Analysis) propagates the worst-case arrival and worst-case slew at each node across all paths
that merge there. It is fast but conservative. PBA (Path-Based Analysis) retraces each individual path with its actual
input slews. It is slower but more accurate, and can recover 10–50ps of false pessimism on specific critical paths.
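
The source of GBA pessimism can be shown with a toy model: two paths merge at one node, and the next stage's delay
depends on input slew. The linear delay model, its coefficients, and the arrival and slew numbers are all assumptions
for illustration, not a real library characterisation.

```python
def stage_delay_ps(input_slew_ps, intrinsic_ps=40, slew_factor=0.5):
    """Assumed linear delay model for the stage after the merge point."""
    return intrinsic_ps + slew_factor * input_slew_ps

# (arrival_ps, slew_ps) at the merge node for each incoming path (assumed).
paths = {"A": (300, 20), "B": (250, 80)}

# GBA: combine the worst arrival with the worst slew, even though they
# come from different paths.
worst_arrival = max(arr for arr, _ in paths.values())
worst_slew = max(slew for _, slew in paths.values())
gba_arrival = worst_arrival + stage_delay_ps(worst_slew)

# PBA: evaluate each path with its own slew, then take the worst path.
pba_arrival = max(arr + stage_delay_ps(slew) for arr, slew in paths.values())

print(gba_arrival, pba_arrival, gba_arrival - pba_arrival)  # 380.0 350.0 30.0
```

GBA pairs path A's late arrival with path B's slow slew, a combination no real path has; PBA removes that 30ps.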

Section 15: Questions to Ask Your Interviewer

Always prepare 3–5 questions to ask. This demonstrates genuine interest and technical depth. Pick the ones most
relevant to the company and role:

Technical depth questions:


• What technology node are your current projects targeting — and do you foresee moving to advanced nodes like
3nm or 2nm in the next product cycle?
• What is your primary PnR tool — ICC-II or Innovus — and how mature is your internal flow and scripting
infrastructure?
• How do you handle multi-power domain designs in terms of UPF verification and power-gating sequencing?
• What does the sign-off process look like here — do you run MMMC with AOCV, and how many corners do you
sign off on?
• What is the typical block size a junior-to-mid engineer owns end-to-end here?

Team and growth questions:


• How is the physical design team structured — do engineers own full blocks or work in specialist groups like CTS,
routing, and sign-off?
• What does the onboarding process look like — is there a mentor assigned, and how long before engineers are
running their own blocks?
• How do you approach knowledge sharing — do you have regular design reviews or internal tech talks?
• What would success look like in this role after 6 months?

Technology and tools:


• Are you primarily a Synopsys shop (ICC-II, PrimeTime, Formality) or do you use Cadence tools as well?
• How automated is your PD flow — do you have a mature run-script infrastructure or is it largely manual?
• Do your engineers write their own Tcl automation scripts, or is that handled by a CAD/methodology team?
