VLSI PD Master Reference Guide
CMOS Basics · Synthesis · Floorplan · Placement · CTS · Routing · STA · Low Power · DRC/LVS · FinFET · DFT · CDC ·
ECO · Scripting · Power Analysis
Prepared for: Shivani Shetkar · ASIC Physical Design Engineer · 14nm & 32nm
This is the complete VLSI Physical Design Master Reference — combining foundational study notes (Chapters
1–15), advanced topics (CDC, SI scripting, power analysis, mixed-signal, FinFET), a personalised job-readiness
skills plan, and 130 interview questions with full spoken answers. Use it to study, revise before interviews, and as
an ongoing reference during your UK job search.
Table of Contents
Ch 1 VLSI & CMOS Fundamentals Transistors · NMOS/PMOS · CMOS · Logic gates · Flip-flops
Ch 2 Complete ASIC Design Flow RTL to GDS · All stages · Key file formats
Ch 6 Clock Tree Synthesis Skew · Latency · ICG · Useful skew · Hold violations
Ch 9 Low Power Design UPF · Power gating · Level shifters · Retention · DVFS
Ch 12 Design For Test (DFT) Scan chains · ATPG · MBIST · Boundary Scan · Reordering
A1 Clock Domain Crossing (CDC) Metastability · 2-flop sync · Gray code · Async FIFO
A2 Advanced STA GBA vs PBA · SI-aware STA · Half-cycle paths · CPPR deep dive
A7 Skills & Job Readiness UK interview topics · Skill gaps · Learning plan · GitHub portfolio
S7 STA Deep Dive Setup/hold eqs, MMMC, OCV, CPPR, closure · 7 Q&A
S8 Low Power UPF, level shifters, isolation, power gating, DVFS · 5 Q&A
S9 DRC, LVS & Sign-off Calibre, ERC, metal fill, formal verification · 4 Q&A
VLSI (Very Large Scale Integration) integrates millions to billions of transistors on a single silicon chip. Moore's Law
(stated in 1965, revised in 1975) observes that transistor count doubles roughly every 2 years. Modern chips like the Apple M2
contain about 20 billion transistors on a 5nm-class process node.
Intrinsic Si Pure silicon. 4 valence electrons, crystal lattice. Very few free carriers at room temperature.
Poor conductor.
N-type doping Add phosphorus or arsenic (5 valence electrons) → extra free electron → n-type. Used for
NMOS source/drain regions and for the N-well that PMOS devices sit in.
P-type doping Add boron (3 valence electrons) → missing electron = 'hole' (positive carrier) → p-type.
Used for PMOS source/drain regions and for the p-substrate that NMOS devices sit in.
P-N junction Interface between P and N silicon. Allows current in one direction (diode). Basis of all
transistors.
The MOSFET (Metal-Oxide-Semiconductor Field-Effect Transistor) is the fundamental switch. Understanding it is the
foundation of ALL Physical Design.
NMOS Cross-Section
[Figure: NMOS cross-section. Gate (G) sits above the channel between two n+ regions in a p-substrate (body B).]
Turn ON Vgs > +Vt (gate HIGH) Vgs < -|Vt| (gate LOW)
Speed Faster (electron mobility ~2-3× hole mobility) Slower for same W
Cut-off (OFF) Vgs < Vt. Channel absent. ID = leakage only. Transistor is OFF.
Linear (triode) Vgs > Vt, Vds < (Vgs-Vt). Full channel. Acts like a resistor. ID = µCox(W/L)[(Vgs-Vt)Vds -
Vds²/2].
Saturation (ON) Vgs > Vt, Vds ≥ (Vgs-Vt). Channel pinched off at drain. ID = ½µCox(W/L)(Vgs-Vt)². Max
current.
Subthreshold Vgs slightly < Vt. Weak inversion. Exponential leakage: ID ∝ exp(Vgs/nVT). Source of static
power.
■ PD relevance: larger W → more saturation current → faster switching → shorter cell delay. Swapping a cell to its low-Vt (LVT)
variant reduces delay but increases leakage. PD engineers choose cell size AND Vt variant to close timing while minimising power.
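The region equations in the table above can be sketched in Python as follows. This is a minimal illustration that assumes the long-channel square-law model those equations describe; the function name and the lumped parameter `k` (standing for µCox(W/L)) are illustrative choices, not part of any tool.

```python
def nmos_drain_current(vgs, vds, vt, k):
    """Return (region, drain current in A) for a long-channel NMOS.

    k is the lumped factor mu * Cox * (W / L) in A/V^2 (an assumed input).
    Subthreshold leakage is ignored, so cut-off returns exactly 0.
    """
    vov = vgs - vt  # overdrive voltage
    if vov <= 0:
        return "cut-off", 0.0
    if vds < vov:
        # Linear (triode) region: the channel behaves like a resistor.
        return "linear", k * (vov * vds - vds ** 2 / 2)
    # Saturation: current no longer depends on Vds in this simple model.
    return "saturation", 0.5 * k * vov ** 2


if __name__ == "__main__":
    for vds in (0.1, 0.8):
        print(vds, nmos_drain_current(vgs=0.8, vds=vds, vt=0.3, k=1e-3))
    # Doubling W doubles k, and therefore doubles the saturation current.
    print(nmos_drain_current(vgs=0.8, vds=0.8, vt=0.3, k=2e-3))
```

In this model a wider device (larger `k`) or a lower Vt (larger overdrive) both raise the saturation current, which is the sizing and Vt trade-off described in the PD relevance note.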
[Figure: CMOS circuit between VDD and GND with output Vout]
■ Key CMOS advantage: static power ≈ 0. In steady state, one transistor is always OFF, blocking the DC path from VDD to
GND. Power is consumed only during switching transitions.
A B ■ Y = NOT(A·B) A B ■ Y = NOT(A+B)
0 0 ■ 1 0 0 ■ 1
0 1 ■ 1 0 1 ■ 0
1 0 ■ 1 1 0 ■ 0
1 1 ■ 0 1 1 ■ 0
NAND2 2 series 2 parallel 4 NMOS drives output LOW when both A=B=1
NOR2 2 parallel 2 series 4 PMOS drives output HIGH when both A=B=0
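The series/parallel structure in the table can be checked with a small switch-level sketch in Python. It treats every transistor as an ideal switch, which is a simplifying assumption for illustration; the function names are hypothetical.

```python
def nand2(a, b):
    """Switch-level NAND2: series NMOS pull-down, parallel PMOS pull-up."""
    a, b = bool(a), bool(b)
    pull_down = a and b              # both series NMOS must conduct
    pull_up = (not a) or (not b)     # either parallel PMOS conducts
    assert pull_up != pull_down      # exactly one network is ON
    return int(pull_up)


def nor2(a, b):
    """Switch-level NOR2: parallel NMOS pull-down, series PMOS pull-up."""
    a, b = bool(a), bool(b)
    pull_down = a or b               # either parallel NMOS conducts
    pull_up = (not a) and (not b)    # both series PMOS must conduct
    assert pull_up != pull_down
    return int(pull_up)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, nand2(a, b), nor2(a, b))
```

The internal assertion expresses the CMOS property noted earlier: in steady state exactly one of the two networks conducts, so there is no DC path from VDD to GND.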
[Figure: flip-flop timing diagram. D must be stable for Tsu before the rising CLK edge and for Th after it; Q changes after the edge.]
Hold time (Th) Minimum time D must stay stable AFTER the clock edge. Violation → wrong data captured.
Enforced by hold STA check.
Clock-to-Q (Tcq) Propagation delay from clock edge to Q output valid. ~50–200ps. Adds to the launch-side
data arrival time.
Metastability If Tsu or Th is violated, the FF enters an undefined intermediate voltage state. It eventually
resolves, but the time is unbounded — and may exceed one clock period, causing system
failure.
■ The ENTIRE purpose of Static Timing Analysis (STA) is to ensure: (1) data arrives at every FF before the setup window
closes, AND (2) data does not arrive so fast it violates hold. Every timing fix in PD serves this goal.
SPECIFICATION
PLACEMENT (place_opt)
ROUTING (route_opt)
.sdc Synopsys Design Constraints. Timing constraints (clocks, delays, exceptions). Used by ALL
tools.
.lef Library Exchange Format. Cell physical abstract (pin positions, obstruction layers) +
technology rules (routing layers, pitch).
.ndm New Data Model (Synopsys ICC-II). Unified .lib + .lef + .gds in one file.
.def Design Exchange Format. Snapshot of physical state: die, placements, routing.
.gds / .oasis Binary layout polygon data by layer. Final deliverable to foundry.
.upf Unified Power Format (IEEE 1801). Power domains, isolation, level shifters for multi-VDD
designs.
The standard cell library is the set of pre-characterised logic primitives used by synthesis and PD. Key properties:
• All cells have the SAME HEIGHT (a fixed number of routing tracks, e.g., a 7.5-track library), so they tile perfectly in placement rows
• Variable WIDTH — proportional to function complexity and drive strength
• Timing characterised at multiple PVT corners (SS, TT, FF) → multiple .lib files
• Multiple drive strengths: X1, X2, X4, X8 — same logic, different W/L ratio
• Multiple Vt variants: HVT, SVT, LVT — same logic, different threshold voltage
NAND2_X2_SVT: 2-input NAND, drive strength X2, standard Vt. Characterised at SS (0.72V, 125°C): rise delay =
0.15ns @ (slew=0.05ns, load=0.02pF). Characterised at FF (0.88V, -40°C): rise delay = 0.06ns. The PD tool uses
the slow (SS) corner for setup checks and the fast (FF) corner for hold checks.
Phase 1: ELABORATION
SDC is the universal language for timing constraints — used by synthesis, PnR, and STA sign-off.
Clock Definition
# Define 500 MHz clock (period = 2ns)
Timing Exceptions
# FALSE PATH: never a real functional path
■ ALWAYS pair setup and hold for multicycle paths. set_multicycle_path 2 -setup without a matching hold setting = incorrect hold analysis.
By default the hold check moves to 1 cycle before the new setup capture edge, so add set_multicycle_path 1 -hold to return it to the launch edge.
# 1. Library setup
read_verilog design.v
elaborate TOP
current_design TOP
# 3. Apply constraints
source [Link]
# 5. Key reports
# 6. Write outputs
write_sdc [Link]
100K cells. All-SVT leakage = 2.0mW, i.e. 20nW per cell. HVT leakage ≈ 0.4× SVT. LVT leakage ≈ 2.5× SVT. Mix: 60% HVT + 30%
SVT + 10% LVT. Leakage = 60K×20nW×0.4 + 30K×20nW×1.0 + 10K×20nW×2.5 = 0.48 + 0.60 + 0.50 = 1.58mW
→ 21% reduction from 2.0mW.
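The weighted sum behind a mixed-Vt leakage estimate can be sketched in Python using the premises stated in the example (2.0mW all-SVT baseline, HVT ≈ 0.4× and LVT ≈ 2.5× SVT, 60/30/10 mix). The function and dictionary names are illustrative, and the model assumes every cell has the same SVT leakage.

```python
def mixed_vt_leakage_mw(all_svt_leakage_mw, mix, ratio_vs_svt):
    """Leakage of a mixed-Vt design, scaled from an all-SVT baseline.

    mix:          fraction of cells per Vt flavour (should sum to 1.0)
    ratio_vs_svt: leakage of each flavour relative to SVT
    """
    scale = sum(mix[vt] * ratio_vs_svt[vt] for vt in mix)
    return all_svt_leakage_mw * scale


if __name__ == "__main__":
    mix = {"HVT": 0.60, "SVT": 0.30, "LVT": 0.10}
    ratio = {"HVT": 0.4, "SVT": 1.0, "LVT": 2.5}
    total = mixed_vt_leakage_mw(2.0, mix, ratio)
    saving = 1.0 - total / 2.0
    print(f"leakage = {total:.2f} mW, saving = {saving:.0%}")
```

With these inputs the scale factor is 0.24 + 0.30 + 0.25 = 0.79, so the small LVT population contributes about as much leakage as the much larger HVT population.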
Floorplanning is the most influential physical design step. A bad floorplan can make timing closure impossible
regardless of other optimisations.
[Figure: BAD vs GOOD floorplan comparison]
Synthesis: std cell area = 800,000 µm². SRAMs = 200,000 µm². Target util = 65%. Core for std cells = 800,000/0.65
= 1,230,769 µm². Total core = 1,230,769 + 200,000 = 1,430,769 µm². Aspect ratio 1:1 → core = 1196µm × 1196µm ≈
1.2mm × 1.2mm. Add IO ring (60µm each side) → die = 1320µm × 1320µm = 1.74mm².
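The sizing steps in the example above can be reproduced with a short Python sketch. It assumes a square (1:1) core and a uniform IO ring, as in the example; the function name is hypothetical.

```python
import math


def size_floorplan(std_cell_area_um2, macro_area_um2, utilization, io_ring_um):
    """Return (core area in um^2, core side in um, die side in um).

    Assumes a 1:1 aspect ratio and an IO ring of equal width on all sides.
    Utilisation is applied to the standard cells only, as in the example.
    """
    core_area = std_cell_area_um2 / utilization + macro_area_um2
    core_side = math.sqrt(core_area)
    die_side = core_side + 2 * io_ring_um
    return core_area, core_side, die_side


if __name__ == "__main__":
    area, core, die = size_floorplan(800_000, 200_000, 0.65, 60)
    print(f"core area = {area:,.0f} um^2")
    print(f"core side = {core:.0f} um, die side = {die:.0f} um")
    print(f"die area  = {(die / 1000) ** 2:.2f} mm^2")
```

The unrounded die side comes out near 1316µm; the example rounds the core to 1.2mm first, which gives 1320µm.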
[Figure: floorplan showing the core area for cells and routing inside the die boundary]
Rules:
PDN Hierarchy:
■ Via arrays
■ Contacts
Transistor source/drain
compile_pg
Static IR drop DC voltage drop caused by the average current drawn through the grid resistance. Always present. Fix: add straps.
Dynamic IR drop Transient voltage droop during simultaneous switching. Peak can be 3× static. Fix: decap
cells.
Hot spot Region with many switching cells drawing simultaneous current. Identified by IR map
colours.
Decap cell Capacitor cell placed near hot spots. Provides local charge reservoir to absorb dynamic
current demand.
■ IR drop causes timing failure indirectly: cell receives lower VDD → slower delay → setup slack worsens. A 5% IR drop at 0.8V
(40mV) can add 4–8% delay to affected cells.
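EM = gradual metal atom displacement from high current density → voids (opens) or hillocks (shorts) over years.
Black's equation: MTF = A · J^(-n) · exp(Ea / (k·T))
where MTF = mean time to failure; A = process-dependent constant; J = current density; n ≈ 1–2; Ea ≈ 0.9eV (Cu);
k = Boltzmann constant; T = temperature (K).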
Jmax limit Foundry specifies max J per layer at temperature. Typical: M1 = 1mA/µm, M8 = 8mA/µm.
Power vs signal EM Power nets: use average current. Signal nets: use RMS current (accounts for duty cycle).
Via EM Current squeeze through small via cross-section. Most vulnerable location. Use double/quad
vias.
Fix: wider wire R ∝ 1/width and J ∝ 1/width. Double width → half current density → about 4× longer lifetime for n = 2, since MTF ∝ J^(-n).
Fix: redundant via 2×1 or 2×2 via array distributes current across multiple vias.
■ EM is a 10-YEAR reliability concern. Chips must survive 10 years at 125°C junction temperature. Cadence Voltus and
Ansys RedHawk perform EM sign-off. Calibre PERC also checks EM rules.
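How strongly EM lifetime reacts to current density and temperature can be sketched with Black's equation, MTF = A · J^(-n) · exp(Ea / (k·T)). The Python below only computes lifetime ratios, so the constant A cancels. The exponent n = 2 is an assumed typical value, Ea = 0.9eV is the copper figure quoted above, and the function names are illustrative.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def lifetime_gain_from_current(j_new_over_j_old, n=2.0):
    """MTF_new / MTF_old when only the current density changes."""
    return j_new_over_j_old ** (-n)


def lifetime_gain_from_temperature(t_old_k, t_new_k, ea_ev=0.9):
    """MTF_new / MTF_old when only the temperature (in kelvin) changes."""
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_new_k - 1.0 / t_old_k))


if __name__ == "__main__":
    # Doubling the wire width halves J.
    print("half J     :", lifetime_gain_from_current(0.5))
    # Running 20 K cooler than 125 C (398.15 K).
    print("20 K cooler:", lifetime_gain_from_temperature(398.15, 378.15))
```

The current-density dependence is a power law, while the temperature dependence is exponential, which is why EM limits are always quoted at a stated temperature.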
Placement assigns physical (x,y) coordinates to every standard cell. It is timing-driven — cells on critical paths are
placed closer together to minimise wire delay. Placement happens in three progressive stages.
1. GLOBAL PLACEMENT
[Figure: before global placement all cells are stacked at (0,0); afterwards they are spread across the core]
2. LEGALISATION
3. DETAILED PLACEMENT
Placement row Horizontal strip of uniform height. All cells in one row are the same height.
Site (x-grid) Minimum x-coordinate unit. Cells placed on integer multiples of site pitch.
Row orientation Alternate rows flipped (FS) so adjacent rows share VDD/VSS rails. Reduces N-well area.
Double-height cell Spans 2 rows. Used for drive strength > 12 or power switch cells. Must start at even row.
Cell flip Mirror cell horizontally when output is on the left — reduces wire length to fanout cells on the
left side.
In TDP (Timing-Driven Placement), the placer assigns positions with timing as the primary objective, not just minimum
wirelength. Critical nets are weighted to pull connected cells together.
3. Placer minimises:
■ ICC-II place_opt runs timing analysis DURING placement using virtual RC models, and iteratively updates net weights to guide
cells into timing-optimal positions.
Congestion = routing demand exceeds routing capacity in a local area. Identified on the congestion map, which is drawn per
GRC (global routing cell).
GCell overflow > 0 means: some nets CANNOT be routed in that tile.
After synthesis, scan chains are in synthesis order (alphabetical/hierarchical). After placement, these FFs may be
scattered across the die — creating very long scan wires.
[Figure: scan chain before reordering (long wires between scattered FFs) versus after reordering (adjacent FFs, short wires)]
set_scan_configuration -chain_count 16
compile_scan
■ Scan reordering reduces scan routing wirelength by 30–50%. Always run AFTER placement, BEFORE routing. Skipping this
step wastes routing resources on long scan wires.
Without CTS, a single clock wire drives thousands of flip-flops directly. The wire resistance and capacitance cause
different arrival times at different FFs — this is clock skew. Skew can easily be several nanoseconds, making timing
closure impossible.
[Figure: clock tree branching from the root towards the FF clock pins, with example arrival times such as 0.8ns]
■ Clock power is 20–40% of total dynamic power. CTS must balance low skew, low insertion delay, and low power
simultaneously.
Clock Skew Difference in clock arrival time at two FFs. Skew = |Lat_FF1 - Lat_FF2|. Target: < 50–150ps
at 500MHz+.
Insertion Delay Total delay through the clock tree buffers (clock port → FF clock pin). Typical: 300ps–2ns.
The CTS tool builds the tree to equalise this across all FFs.
Source Latency Delay from external clock source to the chip clock port (models PCB trace + PLL). Set via
set_clock_latency -source.
Clock Slew Rise/fall transition time at FF clock pin. Slow slew → high power, jitter susceptibility. Target
< 100–200ps. Controlled by buffer sizing.
Clock Jitter Cycle-to-cycle variation in clock period. Caused by PLL phase noise. Modelled as clock
uncertainty in STA.
Half-cycle path Path between opposite-edge FFs (rising launch, falling capture). Only half a period
available. Very tight timing constraint.
Clock domain Set of FFs driven by the same clock. A chip may have 2–20+ independent clock domains.
[Figure: clock tree topologies, with CLK driving the FFs through buffer (BUF) levels]
# Reports
■ clock_opt in ICC-II automatically inserts hold buffers during post-CTS optimisation. Hold must be fixed BEFORE routing —
fixing holds after routing requires re-routing changed nets which disrupts the existing routing.
Useful skew intentionally makes the clock arrive at different times at launch and capture FFs, to improve timing
between a specific pair.
→ 0.10ns improvement!
Setup slack with useful skew = Tclk + (Tcap_lat - Tlaunch_lat) - Tdata - Tsu - Tuncert
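The setup slack formula above can be evaluated directly. The Python sketch below uses made-up numbers (a 2ns clock and a 1.9ns data path) purely to illustrate how delaying the capture clock by 0.10ns changes the slack; none of these values come from a real design.

```python
def setup_slack(t_clk, t_launch_lat, t_cap_lat, t_data, t_su, t_uncert):
    """Setup slack in ns: positive means the path meets timing."""
    return t_clk + (t_cap_lat - t_launch_lat) - t_data - t_su - t_uncert


if __name__ == "__main__":
    # Hypothetical path: balanced clock latencies first, then useful skew.
    common = dict(t_clk=2.0, t_launch_lat=0.50, t_data=1.90,
                  t_su=0.05, t_uncert=0.10)
    balanced = setup_slack(t_cap_lat=0.50, **common)
    skewed = setup_slack(t_cap_lat=0.60, **common)
    print(f"balanced: {balanced:+.2f} ns")
    print(f"skewed  : {skewed:+.2f} ns")
    print(f"gain    : {skewed - balanced:.2f} ns")
```

The gain equals the added capture latency. The same 0.10ns is taken away from the setup budget of any path launched by the delayed FF, and added to the hold requirement of paths captured by it, so both must be rechecked.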
Clock gating stops the clock from reaching a register bank when it is idle. Since P_dynamic ∝ f, stopping the clock
eliminates all dynamic power for idle cells.
[Figure: ICG cell. EN passes through a latch (D → Q) and is combined with CLK to produce the gated clock.]
EN=1: clock passes through → FFs receive clock → can capture new data
EN=0: clock blocked → FFs hold state → zero dynamic power for those FFs
ICG enable timing The EN signal must meet setup and hold timing at the ICG input relative to CLK. STA must
check ICG enable paths.
ICG placement ICGs should be placed close to their sink FF cluster to minimise clock tree imbalance.
Auto clock gating Design Compiler inserts ICGs automatically during compile_ultra -gate_clock when register
banks are found.
Glitch prevention The latch in the ICG prevents EN glitches from creating spurious clock pulses — this is why
a plain AND gate is NOT used.
Routing Flow:
1. GLOBAL ROUTING
2. TRACK ASSIGNMENT
3. DETAILED ROUTING
5. VIA OPTIMISATION
Design rules are geometric constraints that ensure the foundry can reliably manufacture the layout. Violating them
causes manufacturing failures. The foundry provides a DRC deck (SVRF language for Calibre).
Minimum Width (W.M1) Each metal shape must be at least W_min wide. Prevents resistive/open connections. e.g., M1 min-width = 0.064µm at 14nm.
Minimum Enclosure (ENC.V1) Metal must surround each via by at least E_min on all sides. Ensures electrical contact despite lithography variation.
Minimum Area (MA.M1) Metal shapes must meet a minimum area to avoid thin, high-resistance connections.
EOL (End-of-Line) Spacing Wire ends need extra spacing to adjacent wires because wire tips are more vulnerable to lithography rounding.
Metal Density Each layer must have density between min% and max% for CMP planarisation uniformity.
Antenna Rule Max ratio of wire area to connected gate oxide area. Exceeding it causes gate damage during plasma etching.
Via Enclosure Metal must extend beyond via edges by a minimum amount on all four sides.
Multi-patterning Colour Adjacent wires on the same patterned layer must have different lithography mask 'colours'.
LVS verifies the physical layout represents the correct circuit. Calibre extracts transistors and connections from GDS,
then compares to the reference netlist.
LVS Flow:
[Figure: LVS flow. The netlist extracted from the layout is compared with the reference netlist, giving PASS or FAIL.]
[Figure: antenna effect. Charge (+) collected on a metal wire connected to a GATE stresses the GATE OXIDE.]
Antenna Ratio = Σ(wire area + via area on layers above gate) / gate oxide area
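A minimal Python sketch of the antenna ratio check follows. The areas and the limit of 300 are hypothetical values chosen for illustration; real limits come from the foundry rule deck and are usually specified per layer.

```python
def antenna_ratio(metal_and_via_areas_um2, gate_oxide_area_um2):
    """Sum of wire and via areas connected to the gate, over gate oxide area."""
    return sum(metal_and_via_areas_um2) / gate_oxide_area_um2


def is_antenna_violation(ratio, limit):
    """True when the ratio exceeds the (foundry-defined) limit."""
    return ratio > limit


if __name__ == "__main__":
    limit = 300.0                      # hypothetical foundry limit
    before = antenna_ratio([30.0, 20.0], gate_oxide_area_um2=0.125)
    print(before, is_antenna_violation(before, limit))
    # A layer jumper leaves only the short lower-layer segment on the gate
    # while the long wire is being etched.
    after = antenna_ratio([10.0], gate_oxide_area_um2=0.125)
    print(after, is_antenna_violation(after, limit))
```

This shows why a layer jumper helps: it reduces the metal area connected to the gate at the moment of etching without changing the final connectivity.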
check_antenna
Crosstalk is capacitive coupling between adjacent parallel metal wires. When an aggressor net switches, it induces
voltage noise on the victim net.
■ Cc (coupling capacitor)
SI fixes:
• Increase wire spacing between aggressor and victim (reduces Cc)
• Add shielding wires (VDD or VSS) between aggressor and victim (Cc couples to supply, not victim)
• Reroute aggressor or victim to a different layer or path
• Upsize the victim driver (lower output impedance reduces Cc × Rvictim time constant)
• Apply NDR (wider wire, wider spacing) to critical nets
■ Crosstalk is the dominant sign-off challenge at 28nm and below. At 14nm, coupling capacitance can be 30–50% of total wire
capacitance. Always run PrimeTime SI (set_app_var si_enable_analysis true) for final sign-off.
2W/2S NDR Double minimum width AND double minimum spacing. Applied to all clock nets. Reduces
wire resistance (EM), reduces coupling capacitance (SI).
Shielding Route VDD or VSS alongside a critical net. Eliminates coupling from adjacent aggressors.
Where applied Clock nets: always 2W/2S. PLL output: shielded. High-speed data buses: 2W/2S on
selected layers.
create_routing_rule CLK_NDR \
-multiplier_width 2 -multiplier_spacing 2
STA is an exhaustive, formal method of verifying that every timing path in the design meets its setup and hold
requirements — without simulation. The tool builds a directed timing graph, computes delay along EVERY path
simultaneously, and checks constraints. This covers all 2^N possible input combinations implicitly.
Coverage Exhaustive — all paths simultaneously Only paths exercised by test vectors
[Figure: timing path. CLK reaches FF1 via T_launch_lat and FF2 via T_capture_lat; data travels FF1 Q → logic1 → logic2 → logic3 → FF2 D.]
The hold check prevents new data from arriving at a FF before it has safely captured the previous data. Hold is
checked at the SAME clock edge (0-cycle difference), using MINIMUM delays.
Launch latency = 0.50ns. Capture latency = 0.55ns. Min data path delay = 0.05ns (very short: adjacent FFs, no
logic). Th = 0.03ns. Hold uncertainty = 0.05ns.
Hold Arrival = 0.50 + 0.05 = 0.55ns. Hold Required = 0.55 + 0.03 + 0.05 = 0.63ns.
Hold Slack = 0.55 - 0.63 = -0.08ns → FAIL ✗ Fix: insert a delay buffer of ≥ 0.08ns on the data path.
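The hold calculation in this example can be written as a short Python sketch. The function name is illustrative; the numbers are the ones from the example.

```python
def hold_slack(launch_lat, cap_lat, min_data_delay, t_hold, uncertainty):
    """Hold slack in ns, checked at the same clock edge with minimum delays."""
    arrival = launch_lat + min_data_delay
    required = cap_lat + t_hold + uncertainty
    return arrival - required


if __name__ == "__main__":
    slack = hold_slack(launch_lat=0.50, cap_lat=0.55,
                       min_data_delay=0.05, t_hold=0.03, uncertainty=0.05)
    print(f"hold slack = {slack:+.2f} ns")
    if slack < 0:
        print(f"add at least {-slack:.2f} ns of delay on the data path")
```

The clock period does not appear anywhere in the function, which is the reason hold violations are independent of clock frequency.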
■ Hold violations are INDEPENDENT of clock frequency. They exist even at 1 MHz. They only appear after CTS when real clock
latency is known. Short paths between adjacent FFs (placed close together after placement) are most vulnerable.
FF→FF path Register to register — most common and critical. Constrained by clock period.
PI→PO path Purely combinational — input to output. Constrained by both input/output delays.
WNS (Worst Negative Slack) Most negative slack across ALL paths. The hardest single path to fix. Must reach 0.
TNS (Total Negative Slack) Sum of all negative slacks. Zero = all paths pass. Large TNS = many violations.
FEP (Failing Endpoint) FF or output with at least one failing path. Count reduces as ECOs are applied.
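The three metrics can be derived from a list of per-endpoint worst slacks. The Python sketch below assumes the input is a mapping from endpoint name to its worst slack in ns; the endpoint names and values are invented for illustration.

```python
def summarize_slacks(endpoint_slacks):
    """Return (WNS, TNS, FEP) from a dict of endpoint -> worst slack in ns.

    WNS is reported as the minimum slack, so it is positive when every
    endpoint passes. TNS sums only the negative slacks.
    """
    negatives = [s for s in endpoint_slacks.values() if s < 0]
    wns = min(endpoint_slacks.values())
    tns = sum(negatives)
    fep = len(negatives)
    return wns, tns, fep


if __name__ == "__main__":
    slacks = {"ff_a/D": 0.12, "ff_b/D": -0.05, "out_q": -0.20}
    wns, tns, fep = summarize_slacks(slacks)
    print(f"WNS = {wns:.2f} ns, TNS = {tns:.2f} ns, FEP = {fep}")
```

WNS shows how hard the worst path is, while TNS and FEP show how widespread the problem is, so the three are normally tracked together during closure.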
MMMC analyses all operating modes and PVT corners simultaneously. This is the MANDATORY approach for
production sign-off — single-corner analysis misses real failures.
MMMC Concept:
create_mode func_mode
create_mode test_mode
update_timing -full
Even within one PVT corner, cells on different parts of the chip experience different conditions (process gradients, IR
drop, temperature). OCV models this variation by derating delays pessimistically.
OCV Concept:
This models the worst case: launch side is slow (data arrives late)
AND capture side is fast (clock arrives early = less time for data).
# Flat OCVM
set_ocvm_mode advanced
In OCV analysis, both launch and capture clock paths are derated. But they share a common segment from the clock
root to their divergence point. This shared segment is derated twice — once as late (launch) and once as early
(capture). This is physically impossible — the same wire CANNOT be both late AND early.
CPPR Example:
CLK (root)
= typically 10–30ps
■ Always enable CPPR in production sign-off: set_app_var timing_remove_clock_reconvergence_pessimism true. Not enabling
CPPR typically wastes 10–30ps — forcing unnecessary cell upsizing.
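The size of the CPPR credit can be estimated from the common-path delay and the two derate factors. The Python sketch below assumes flat OCV derates applied to a nominal common-path delay; the 200ps delay and the ±5% derates are hypothetical values.

```python
def cppr_credit(common_path_delay, late_derate, early_derate):
    """Pessimism removed: late-derated minus early-derated common-path delay."""
    return common_path_delay * (late_derate - early_derate)


if __name__ == "__main__":
    # Hypothetical: 0.200 ns shared clock segment, +5% late / -5% early derate.
    credit_ns = cppr_credit(0.200, late_derate=1.05, early_derate=0.95)
    print(f"CPPR credit = {credit_ns * 1000:.0f} ps")
```

With these inputs the credit lands inside the 10–30ps range quoted above, and it grows with the length of the shared clock segment, so paths whose launch and capture FFs diverge late in the tree benefit most.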
Fix | How | Cost | When to use
Cell upsize | BUF_X1 → BUF_X4: more drive strength → faster | Area + power increase | Driver has high fanout load
LVT swap | HVT → LVT: lower Vt → faster switching | Leakage power increase | Quick wins on near-critical paths
Buffer insertion | Split long wire → reduces wire delay | Area increase, 2 new cells | Long wire is dominant delay source
Useful skew | Delay capture clock → data has more time | May worsen adjacent hold | Clock latency imbalance available
Cell relocation | Move cells closer → shorter wire | Local congestion may worsen | Large wire delay between critical cells
α (activity factor) Fraction of cycles a node switches. Clock: α≈1.0. Data bus: α≈0.1–0.3. Idle register: α≈0.
VDD² lever Most powerful: reduce VDD by 20% → save 36% dynamic power. Foundation of DVFS.
I_leakage at 7nm Exponential with temp. Can be 40–60% of total power at idle. HVT cells have 5–10× less
leakage than LVT.
[Figure: power domains containing a core and a block]
Level Shifter (LS) Converts signal from low-VDD domain to high-VDD domain or vice versa. Must be powered
by BOTH supply domains.
ELS (Enable Level Combines level shifting + isolation in one cell. Most efficient at domain crossings.
Shifter)
Isolation Cell When a domain powers off, its outputs float. Isolation cells clamp output to safe value (0 or
1). Controlled by iso_enable signal.
Retention Register DFF with shadow latch backed by always-on supply. SAVE: copies main FF to shadow.
RESTORE: copies back after power-on.
Footer switch (NMOS) NMOS between virtual VVSS and primary VSS. Alternative to header. NMOS smaller than
PMOS.
# Power states
set_isolation ISO_LP \
■ Isolation cells must be activated BEFORE the domain powers off, and deactivated AFTER power-on is stable. Violating this
sequence causes X-propagation (unknown values) into the always-on domain.
P_dynamic = α · C · VDD² · f
Reducing VDD by 20% and f by 20%: P reduces to (0.8)² × 0.8 = 0.512 → 49% savings!
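The scaling in P_dynamic = α · C · VDD² · f can be checked numerically. The Python sketch below computes power relative to the nominal operating point, assuming α and C stay constant; the function name is illustrative.

```python
def relative_dynamic_power(vdd_scale, freq_scale):
    """Dynamic power relative to nominal when VDD and f are scaled.

    Assumes activity factor and switched capacitance are unchanged.
    """
    return vdd_scale ** 2 * freq_scale


if __name__ == "__main__":
    for vdd_scale, freq_scale in [(0.8, 1.0), (0.8, 0.8), (1.0, 0.8)]:
        p = relative_dynamic_power(vdd_scale, freq_scale)
        print(f"VDD x{vdd_scale}, f x{freq_scale}: "
              f"P = {p:.3f} of nominal ({1 - p:.0%} saved)")
```

Scaling frequency alone saves power only linearly, whereas the voltage term is squared, which is why DVFS lowers VDD together with f whenever timing allows.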
PD implications of DVFS:
Before delivering GDS to the foundry, ALL of the following must be clean. Missing even one sign-off check = no
tape-out.
■ DRC: Zero violations in Calibre DRC using the foundry's official rule deck
■ LVS: Layout netlist matches reference netlist exactly (Calibre LVS)
■ ERC: No floating gates, no open power/ground connections, no latch-up risk
■ Antenna: All antenna ratios within foundry limits, with violations fixed by antenna diodes or layer jumpers
■ Metal Density: All metal layers within min/max density range for CMP uniformity
■ STA Sign-off: WNS ≥ 0 and TNS = 0 across ALL MMMC views, PrimeTime SI with SPEF, AOCV, CPPR
■ IR Drop: Static < 5% VDD; Dynamic < 10% VDD in all operating modes
■ EM Sign-off: All wires/vias within foundry Jmax EM current density limits
■ Formal Verification: Post-PD netlist == pre-PD netlist (Synopsys Formality or Cadence Conformal)
■ Power Analysis: Total chip power within product specification
■ GDS Quality: Correct cell names, layer mapping, scale, seal ring, pad frame, fill density
10.2 Calibre DRC — Flow and Debugging
W.M2 (width) M2 wire too narrow (e.g., after antenna fix shortened the wire). Fix: widen.
ENC.V1 (enclosure) Via not fully enclosed by M1 or M2 metal. Fix: extend metal in the failing direction.
DENSITY.M3 (density) Metal density too low/high. Fix: add/remove metal fill patterns.
EOL.M1 (end-of-line) Wire end too close to adjacent wire. Fix: increase spacing at wire end.
Floating Gate ERC Gate terminal not connected to any net. Will have undefined logic state. Fix: connect to VDD
or VSS pull-up/pull-down.
Open Power ERC Cell VDD or VSS pin not connected to power grid. Fix: fix PDN connectivity in ICC-II.
Latch-up ERC N-well to P-diffusion spacing too small. Latch-up can latch VDD to GND permanently —
destroying the chip.
DFM — Redundant Via Replace single vias with 2×1 or 2×2 via arrays. Most impactful yield improvement. Reduces
open-via failure rate.
DFM — Critical Area Regions where a particle defect would most likely cause a short or open. Minimise by
spreading wires and avoiding narrow spaces.
Formal verification uses mathematical proof to confirm two circuit representations are functionally IDENTICAL. Run
after every netlist modification:
• After synthesis: RTL vs gate netlist
• After DFT insertion: pre-scan vs post-scan netlist
• After every ECO: pre-ECO vs post-ECO netlist
• Before tape-out: final post-route netlist vs reference netlist
# Synopsys Formality
set_top r:/WORK/TOP
set_top i:/WORK/TOP
verify
[Figure: transistor cross-sections showing n+ source (S) and drain (D) regions]
FinFET advantage:
• Drive strength quantisation: you CANNOT continuously size cells in FinFET. Cell size = number of fins (1-fin,
2-fin, 4-fin). PnR selects from discrete library entries only.
• Tighter DRC rules: metal pitch at 14nm ≈ 48nm vs 90nm at 28nm. DRC deck has 500+ rules vs ~100 at 130nm.
• Fin direction: fins run horizontally along the cell row, parallel to the power rails. Gates run vertically across the fins. Fin orientation is fixed:
all NFET and PFET fins must follow the global fin direction.
• Local interconnect (LI/M0): at TSMC N5/N3, additional routing layers below M1 (LISD, M0, LIG) provide local
connections between gate, source, and drain without using M1 routing tracks.
• N-well proximity effects: at advanced nodes, neighbouring transistors' N-wells interact, causing Vt variation. Strict
spacing rules between N-well regions.
■ Your OpenSPARC FPU project was at 14nm. At this node, every routing track counts. The minimum M1 pitch is ~48nm, so a
single M1 wire is only around a hundred atoms wide. This is why DRC decks are so large and complex at FinFET nodes.
11.3 Multi-Patterning
At sub-20nm, a single lithography exposure cannot print the minimum pitch. Multi-patterning decomposes one layout
layer into multiple masks to achieve sub-resolution pitch.
Technique | Masks | Nodes | PD impact
LELE | 2 | 20nm–14nm | Router must colour-assign all wires; avoid unresolvable conflicts
SADP | 1+spacers | 14nm–7nm | Spacer width controls final pitch; core mask must be correct
SAQP | 2+spacers | 7nm–5nm | Two SADP rounds; very restrictive design rules
EUV | 1 | 7nm and below | Single-mask; fewer MP constraints; EUV source power challenges
Manufacturing defects are unavoidable. Without DFT, defective chips reach customers. DFT adds controllability and
observability to every node so test patterns can detect faults.
Transition fault Node fails to switch within expected time. Detected by at-speed testing.
Controllability Ability to set any node to 0 or 1 via primary inputs (high = easier to test).
Observability Ability to observe any internal node at a primary output (high = faults detectable).
[Figure: scan chain. SE and CK drive every scan flop; SCAN_IN → SFF1 → SFF2 → ... → SFFN → SCAN_OUT.]
Test Sequence:
1. SHIFT IN: SE=1, shift test pattern into all N FFs (N clocks)
1,000,000 FFs, 100 scan chains → 10,000 FFs per chain. Each pattern: 10,000 shift clocks + 1 capture clock. At
100MHz shift frequency: 10,001 × 10ns = 100µs per pattern. 50,000 test patterns → 5 seconds test time per chip.
More chains = shorter test time but more IO pins needed.
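The test-time estimate above can be sketched in Python. It follows the same simple model as the example (one full shift plus one capture clock per pattern, with no overlap between shifting out one pattern and shifting in the next); the function name is hypothetical.

```python
import math


def scan_test_time_s(num_ffs, num_chains, num_patterns, shift_freq_hz):
    """Estimated scan test time in seconds for balanced chains."""
    chain_length = math.ceil(num_ffs / num_chains)
    cycles_per_pattern = chain_length + 1  # shift clocks + 1 capture clock
    return num_patterns * cycles_per_pattern / shift_freq_hz


if __name__ == "__main__":
    for chains in (100, 200, 400):
        t = scan_test_time_s(1_000_000, chains, 50_000, 100e6)
        print(f"{chains} chains: {t:.2f} s per chip")
```

Test time falls almost in proportion to the chain count, which is the trade-off against IO pins mentioned in the example.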
ICC-II commands:
set_scan_configuration -chain_count 16
■ Run scan reorder AFTER place_opt, BEFORE route_opt. Reduces scan routing wirelength by 30–50%. Skipping this causes
unnecessary routing congestion from long scan wires.
MBIST (Memory Built-In Self Test): SRAMs cannot be tested efficiently via scan chains. MBIST adds a dedicated
controller that applies March algorithms to test every memory cell. PD task: place MBIST controller close to its SRAM
(short routing to memory ports).
An ECO (Engineering Change Order) is a minimal targeted modification to a placed-and-routed design to fix a specific
problem without re-running the full PnR flow. ECOs preserve surrounding layout.
ECO type | Purpose | When | Risk
Functional ECO | Fix a logic bug discovered late | Post-tapeout respin | High: logic must change
DRC ECO | Fix DRC violations from Calibre | After DRC run | Low: geometry only
Setup fix | How | Cost | When to use
Cell upsizing | BUF_X1→BUF_X4: more drive → faster | Area+power | Driver has high fanout load
Buffer insertion | Split long wire → reduce RC delay | 2 new cells | Wire delay dominant
Useful skew | Delay capture clock arrival | May worsen hold | Available skew margin exists
Cell relocation | Cells closer → shorter wire | Local congestion | Large wire between critical cells
Hold fix | How | Notes
Hold buffer insertion | Add delay cell on data path (adds ~20–50ps each) | Most common; auto-done by ICC-II clock_opt
HVT cell swap | Swap LVT/SVT to HVT: slower = more delay | Also saves leakage
Useful skew (delay launch) | Delay launch FF's clock → data launches later | Must check setup impact on adjacent paths
create_lib [Link] \
-technology [Link] \
read_verilog netlist.v
link_block
read_sdc [Link]
read_upf [Link]
# --- 3. Floorplan ---
initialize_floorplan \
-core_utilization 0.70 \
-core_offset {2 2 2 2}
compile_pg
# --- 5. Placement ---
place_opt
check_legality
report_congestion
# --- 6. CTS ---
clock_opt
report_clock_qor
# --- 7. Routing ---
route_opt
check_routes
report_route_drc
# --- 9. Outputs ---
# 1. Library
# 2. Read design
read_verilog design_final.v
link_design TOP
# 3. MMMC
create_mode func_mode
# 4. Constraints
read_sdc [Link]
# 5. Parasitics
# 6. OCV + CPPR
# 7. SI
update_timing -full
report_qor
report_power
read_verilog design.v
elaborate TOP
current_design TOP
report_qor
write_sdc [Link]
# Calibre DRC
# Calibre LVS
Setup Slack:
Hold Slack:
Dynamic Power:
Leakage Power:
IR Drop:
Antenna Ratio:
Ratio = Σ(wire+via area above gate) / gate oxide area [< foundry limit]
Crosstalk Noise:
Core Area:
Congestion Overflow:
CPPR Credit:
NLDM Delay:
Wire RC delay:
Core utilisation 60–70% for complex SoCs; 50% for first tapeouts; max 75% before congestion risk
IR drop limit Static < 5% VDD; Dynamic < 10% VDD; e.g., <40mV for VDD=0.8V
Scan chain length 500–2000 FFs per chain; more chains = faster test but more IO
Metal fill density Foundry typically requires 20–80% density per metal layer
DFM — redundant via Replace all single vias with double vias wherever space allows
EM limit (signal) 1–5 mA/µm wire width (check foundry spec per layer and temperature)
LVT leakage vs SVT LVT has ~2–4× more leakage than SVT at same cell function
FO4 delay Buffer driving 4× its own gate load. Standard benchmark ≈ 50–100ps at 14nm
AOCV Advanced On-Chip Variation — depth/distance-based OCV. More accurate than flat OCVM.
ATPG Automatic Test Pattern Generation — creates test vectors for fault detection.
BIST Built-In Self Test — dedicated hardware to test memories or logic blocks autonomously.
BSC Boundary Scan Cell — scan cell at IO pad for JTAG board-level test (IEEE 1149.1).
CDC Clock Domain Crossing — signal between different clock domains. Needs synchroniser.
CMP Chemical Mechanical Planarisation — fab process that polishes each metal layer flat.
CPPR Common Path Pessimism Removal: removes double-derating of the shared clock path.
DFM Design For Manufacturability — yield improvement beyond basic DRC compliance.
DFT Design For Test — adds testability (scan, BIST, JTAG) for manufacturing testing.
DRC Design Rule Check — geometric verification against foundry manufacturing rules.
DVFS Dynamic Voltage and Frequency Scaling — adjusts VDD and f to save power.
ECSM Effective Current Source Model — advanced transistor model for SI analysis.
ELS Enable Level Shifter — combined level shift + isolation cell at domain boundaries.
EM Electromigration — metal atom migration from high current density. Long-term failure.
ERC Electrical Rule Check — floating gates, open power pins, latch-up spacing.
EUV Extreme Ultraviolet Lithography — 13.5nm wavelength; single-pattern at 7nm and below.
FEP Failing Endpoint — FF or output port with at least one negative-slack path ending at it.
FinFET Fin Field-Effect Transistor — 3D transistor; gate wraps 3 sides of vertical fin.
FO4 Fan-out of 4 — buffer driving 4× its own input capacitance. Standard delay benchmark.
GDS GDSII — binary layout polygon file. Final deliverable to foundry for mask making.
HVT High Threshold Voltage — slower cell with less leakage. Used on non-critical paths.
ICG Integrated Clock Gate — glitch-free clock gating cell (latch + AND gate).
IR Drop Voltage drop across power grid resistance. Reduces effective VDD at cells.
JTAG Joint Test Action Group (IEEE 1149.1) — boundary scan standard.
LEF Library Exchange Format — cell physical abstracts + technology layer rules.
LVT Low Threshold Voltage — faster cell with more leakage. Used on critical paths.
MBIST Memory BIST — tests SRAMs using March algorithms via dedicated controller.
MMMC Multi-Mode Multi-Corner — simultaneous STA across all modes and PVT corners.
MTF Mean Time to Failure — EM reliability metric. Must meet 10-year product lifetime.
NDM New Data Model — Synopsys ICC-II unified library format (.lib + .lef + .gds).
NDR Non-Default Routing Rule — custom width/spacing. Applied to clock nets (2W/2S).
OCV On-Chip Variation — spatial PVT variation modelled by derating cell delays.
PBA Path-Based Analysis — cell-by-cell accurate STA for worst-case paths (vs graph-based GBA).
PDN Power Distribution Network — VDD/VSS grid delivering power to all cells.
SADP Self-Aligned Double Patterning — spacer technique for 2× pitch reduction at 14nm/10nm.
SAQP Self-Aligned Quad Patterning — two SADP iterations for 4× pitch reduction at 7nm/5nm.
SDC Synopsys Design Constraints — timing constraint language used by all EDA tools.
SFF Scan Flip-Flop — DFF with extra scan-in and scan-enable inputs for DFT.
SI Signal Integrity — analysis of crosstalk noise and delay from capacitive coupling.
SPEF Standard Parasitic Exchange Format — extracted R+C for STA back-annotation.
STA Static Timing Analysis — formal exhaustive timing verification without simulation.
TAP Test Access Port — JTAG controller FSM (TDI, TDO, TMS, TCK).
TNS Total Negative Slack — sum of all failing path slacks. Zero = timing closed.
UPF Unified Power Format (IEEE 1801) — describes power domains, isolation, level shifters.
WNS Worst Negative Slack — most negative single path slack. Must reach ≥ 0.
CDC occurs when a signal passes from a flip-flop in one clock domain to a flip-flop in a different, asynchronous clock
domain. The receiving FF may sample the signal at the exact moment it is transitioning — violating setup or hold time
— causing metastability. This is one of the most common sources of hard-to-debug silicon failures.
[Figure: a signal launched by a flip-flop on clk_a and captured by a flip-flop on clk_b]
Note: CDC bugs do NOT appear in RTL simulation (clocks are ideal) or in STA (false paths are set across domains). They only
manifest on real silicon, intermittently, and are extremely hard to reproduce and debug.
The most common CDC fix: add two flip-flops in the destination domain before using the signal. The first FF may go
metastable, but has a full clock period to resolve before the second FF samples it.
Two-Flop Synchroniser:
[Figure: source flip-flop in Domain A (clk_a) driving two back-to-back flip-flops in Domain B (clk_b)]
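Verilog implementation (signal names data_a, sync_ff1 and sync_ff2 are illustrative):
always @(posedge clk_b or negedge rst_b_n) begin
  if (!rst_b_n) begin
    {sync_ff2, sync_ff1} <= 2'b00;
  end else begin
    {sync_ff2, sync_ff1} <= {sync_ff1, data_a};  // shift data_a through two FFs on clk_b
  end
end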
PD note: place the two synchroniser FFs physically close together (minimise wire length between them) and use low-Vt, fast FFs.
Two-flop synchroniser only works for SINGLE-BIT signals. For multi-bit buses, multiple bits transition simultaneously,
and a synchroniser may capture them in different combinations — corrupting data.
Gray code counter Increment a counter where only ONE bit changes per count step. Apply a 2-flop sync to the
Gray-coded count. Only 1 bit ever transitions → safe to synchronise. Decode Gray back to
binary after the synchroniser. Used for FIFO pointers.
Async FIFO Most robust multi-bit CDC solution. Write pointer in write domain (clk_a). Read pointer in
read domain (clk_b). Both pointers Gray-coded before crossing. FIFO flags (full, empty)
computed by comparing synchronised pointers.
Handshake protocol Req/ack protocol: sender asserts REQ, waits for ACK from receiver. Each signal is
individually synchronised. Slow (2–4 latency cycles) but correct for single transfers.
Enable pulse synchroniser For single-cycle enable pulses: stretch the pulse to be wider than the destination clock
period before synchronising. Or use a toggle synchroniser.
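The Gray-code row above can be sketched in a few lines. This is a minimal illustration, not the guide's own code: the function names bin_to_gray and gray_to_bin are hypothetical, and a 4-bit (depth 16) pointer is assumed for the check.

```python
def bin_to_gray(n: int) -> int:
    """Convert a binary count to Gray code (adjacent counts differ in one bit)."""
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Decode Gray code back to binary after the synchroniser."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


DEPTH = 16  # assumed power-of-two FIFO depth, so the wrap-around is also a 1-bit change
for count in range(DEPTH):
    nxt = (count + 1) % DEPTH
    changed_bits = bin(bin_to_gray(count) ^ bin_to_gray(nxt)).count("1")
    print(f"{count:2d} -> gray {bin_to_gray(count):04b}, bits changing to next: {changed_bits}")
```

Because only one bit changes per step, a synchroniser that samples mid-transition sees either the old or the new pointer value, never an unrelated one.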
set_false_path in STA CDC paths must be false-pathed in SDC: set_false_path -from [get_clocks clk_a] -to
[get_clocks clk_b]. Otherwise STA gives false timing violations on crossing paths.
CDC static analysis tools Synopsys SpyGlass CDC, Cadence JasperGold CDC: formally identify all crossing
signals, classify them (single-bit, multi-bit), verify synchroniser presence.
Metastability MTBF calc MTBF = exp(Tw/τ) / (fc × fa × Td), where Tw is the time available for the FF to resolve, τ is the FF
resolution time constant, fc is the destination clock frequency, fa is the data toggle rate and Td is the
metastability window. Must be > 100 years for reliable operation. PD must ensure synchroniser FFs are fast
(use LVT) and physically close.
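To get a feel for the MTBF formula, here is a rough sketch that evaluates it for two resolution times. All numbers are hypothetical, and the mapping of the guide's symbols (Tw as resolution time, Td as metastability window) is an assumption based on the usual form of the equation.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def synchroniser_mtbf_s(t_resolve_s, tau_s, f_clk_hz, f_data_hz, t_window_s):
    """MTBF = exp(Tw / tau) / (fc * fa * Td), returned in seconds."""
    return math.exp(t_resolve_s / tau_s) / (f_clk_hz * f_data_hz * t_window_s)


# Hypothetical values: 1 GHz destination clock, 100 MHz data toggle rate,
# tau = 20 ps, metastability window = 30 ps.
short_resolve = synchroniser_mtbf_s(0.5e-9, 20e-12, 1e9, 100e6, 30e-12)
long_resolve = synchroniser_mtbf_s(1.5e-9, 20e-12, 1e9, 100e6, 30e-12)

print(f"0.5 ns to resolve: {short_resolve / SECONDS_PER_YEAR:.3g} years")
print(f"1.5 ns to resolve: {long_resolve / SECONDS_PER_YEAR:.3g} years")
```

The exponential term is why a faster FF (smaller τ) or more resolution time (a second FF, a shorter wire between the two FFs) improves MTBF so dramatically.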
STA tools use two modes of delay calculation. Understanding the difference is important for both accuracy and
efficient ECO closure.
GBA (Graph-Based Analysis) vs PBA (Path-Based Analysis):
Method GBA: tag each node with worst-case arrival/required times from all incoming paths. PBA: trace each specific
path end-to-end with actual input slew at each stage.
Speed GBA: very fast, one pass through the timing graph. PBA: slow, must re-compute for each path individually.
Accuracy GBA: conservative, may report pessimistic violations. PBA: more accurate, computes actual slew/delay per path.
Usage GBA: general timing analysis, all paths. PBA: pt_eco_opt, final sign-off on critical paths.
Note: PBA can recover 10–50ps of false pessimism on critical paths compared to GBA. pt_eco_opt uses PBA internally, which is
why PT-suggested ECOs are sometimes more aggressive than what you'd estimate manually from GBA reports.
Standard STA without SI uses extracted SPEF parasitics but ignores coupling capacitors. SI-aware STA (PrimeTime
SI) models crosstalk delay on every net.
2. Enable SI calculation
4. Reports
A half-cycle path connects a FF that launches on the rising clock edge to one that captures on the FALLING edge (or
vice versa). Only half the clock period is available.
[Figure: FF1 launches on the rising clock edge (↑); FF2 captures on the falling clock edge (↓)]
Data from FF1 must travel to FF2 in only HALF a clock period!
For a 2ns clock: only 1ns for the data path (vs 2ns for normal paths)
STA handling:
Note: In PrimeTime, half-cycle paths can be reported from their own path group if the flow defines one (for example with
group_path): report_timing -path_group half_cycle_path. They are harder to close, so treat them as double-frequency paths.
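The half-cycle budget can be checked with simple arithmetic. The sketch below assumes ideal clocks (no skew or uncertainty) and uses made-up cell and data delays; the function names are illustrative only.

```python
def available_time_ns(period_ns, launch_edge, capture_edge, duty_cycle=0.5):
    """Time between the launch edge and the next capture edge, ideal clock."""
    if launch_edge == capture_edge:
        return period_ns                      # normal full-cycle path
    if launch_edge == "rise":
        return period_ns * duty_cycle         # rise -> fall: only the high phase
    return period_ns * (1.0 - duty_cycle)     # fall -> rise: only the low phase


def setup_slack_ns(available_ns, clk_to_q_ns, data_delay_ns, setup_ns):
    """Setup slack = available time minus launch clk-to-Q, data delay and setup time."""
    return available_ns - clk_to_q_ns - data_delay_ns - setup_ns


full = available_time_ns(2.0, "rise", "rise")
half = available_time_ns(2.0, "rise", "fall")
print(f"full-cycle budget: {full} ns, half-cycle budget: {half} ns")
# Hypothetical path: 0.10 ns clk-to-Q, 1.20 ns logic, 0.05 ns setup.
print(f"full-cycle slack: {setup_slack_ns(full, 0.10, 1.20, 0.05):.2f} ns")
print(f"half-cycle slack: {setup_slack_ns(half, 0.10, 1.20, 0.05):.2f} ns")
```

The same logic that passes comfortably on a full-cycle path fails on the half-cycle path. A non-50% duty cycle tightens one direction further.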
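[Figure: CLK_ROOT drives a common clock segment (SEGMENT) shared by the launch and capture clock paths before they diverge]
Without CPPR: the shared segment is derated twice, late on the launch path and early on the capture path, even though it is
physically the same set of wires and buffers.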
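A small numeric sketch of the CPPR credit, assuming a flat OCV derate model in which the common segment is scaled by a late and an early factor. The delay, derate and slack values are hypothetical.

```python
def cppr_credit_ns(common_path_delay_ns, late_derate, early_derate):
    """Pessimism removed: late minus early delay of the shared clock segment."""
    return common_path_delay_ns * (late_derate - early_derate)


# Hypothetical: 0.50 ns common clock segment, +/-5% flat OCV derates.
credit = cppr_credit_ns(0.50, late_derate=1.05, early_derate=0.95)

slack_without_cppr = -0.03                      # assumed setup slack before CPPR
slack_with_cppr = slack_without_cppr + credit   # the credit is added back to slack

print(f"CPPR credit: {credit * 1000:.0f} ps")
print(f"slack without CPPR: {slack_without_cppr:.3f} ns, with CPPR: {slack_with_cppr:.3f} ns")
```

The longer the shared segment before the launch and capture branches diverge, the larger the credit, which is one reason to let clock branches split late in the tree.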
Tcl (Tool Command Language) is the scripting language inside every Synopsys and Cadence EDA tool. Writing your
own scripts is what separates junior from senior PD engineers.
# -- Variables --
set slack -0.05            ;# illustrative value
# -- Arithmetic --
set margin [expr {$slack + 0.10}]
# -- Lists --
set my_list {a b c d e}
puts "Length: [llength $my_list]"
# -- Conditionals --
if {$slack < 0} {
    puts "FAIL"
} else {
    puts "PASS"
}
# -- Loops --
foreach item $my_list {
    puts $item
}
return $count
return $cells
# Find cells on paths with slack < 0.05 and swap SVT → LVT
set changed 0
incr changed
slack_wns_max_delay]"}
slack_tns_max_delay]"}
close $fh
proc report_hold_violators {} {
Power analysis determines whether the chip's power delivery system is adequate and whether the chip will overheat.
Two primary analysis types:
Static IR drop Average (DC) current draw; DC resistance of grid; Voltus static mode
set_db rail_analysis_config {
-mode time_domain
-method dynamic_vectorbased
[all_registers]
# 4. Run analysis
# 5. Reports
Action: add VDD strap at (350um, 420um) and add DCAP cells nearby
EM Report interpretation:
Junction temperature affects transistor speed, leakage, and reliability. High temperature → more leakage → more
power → higher temperature (thermal runaway risk).
Tj = Ta + P × R_theta_JA
where R_theta_JA = junction-to-ambient thermal resistance (°C/W), Ta = ambient temperature and P = chip power. Typical
package: 20–50°C/W. A chip dissipating 2W with R_theta_JA = 40°C/W at 25°C ambient: Tj = 25 + 2×40 = 105°C.
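The worked example above can be reproduced, and inverted to get a power budget, with the sketch below. The 125°C junction limit is an assumed figure for illustration, not a value from this guide.

```python
def junction_temp_c(ambient_c, power_w, r_theta_ja_c_per_w):
    """Tj = Ta + P * R_theta_JA."""
    return ambient_c + power_w * r_theta_ja_c_per_w


def max_power_w(tj_limit_c, ambient_c, r_theta_ja_c_per_w):
    """Largest power that keeps the junction at or below the limit."""
    return (tj_limit_c - ambient_c) / r_theta_ja_c_per_w


tj = junction_temp_c(25, 2, 40)
budget = max_power_w(125, 25, 40)   # 125 C limit is a hypothetical spec value
print(f"Tj = {tj} C, power budget for Tj <= 125 C: {budget} W")
```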
Hotspot Local region of the chip with peak power density. Usually in high-activity CPU cores or
arithmetic units.
Thermal gradient Temperature difference across the die. Large gradients → Vt variation → timing variation
(timing closure becomes temperature-dependent).
Thermal-aware placement Place high-power cells away from each other to distribute heat. Avoid concentrated hot
spots.
Chip-package co-design Thermal bumps (for flip-chip packages) placed directly under hotspots to improve heat
extraction.
Modern chips have tens of millions of flip-flops. Without compression, test data volume and test time would be
unacceptably large. Scan compression reduces the number of scan channels needed by a compression ratio (e.g.,
100:1).
Stuck-at ATPG detects manufacturing defects that fix a node permanently. But some defects cause a node to switch
correctly but too slowly. These are only detected by running at the actual functional clock frequency.
Launch-on-Shift (LoS) The last shift clock launches the transition and the capture pulse follows one functional
cycle later. ATPG is simple and coverage is typically high, but scan-enable must switch at speed between launch
and capture, so it has to be timed like a clock.
Launch-on-Capture (LoC) Scan-enable is de-asserted first, then two at-speed functional clock pulses launch and
capture. Scan-enable timing is relaxed, but the launch state comes through functional logic, so coverage is
typically lower and pattern count higher.
Test coverage target >95% transition fault coverage for production. Requires careful at-speed clock routing — the
test clock must meet functional timing constraints.
PD impact At-speed test mode is a separate SDC mode in MMMC. Clock must transition at functional
frequency during test. Clock routing in test mode must meet functional timing.
IJTAG (IEEE 1687) extends IEEE 1149.1 (JTAG) to provide standardised access to on-chip instruments such as BIST
controllers, PLL configuration registers, and embedded sensors.
SIB (Segment Insertion Bit) A 1-bit scan element that enables/disables access to a subtree of instruments. Allows
efficient navigation to any instrument without shifting through unrelated instruments.
PDL (Procedural Description Language) Describes test procedures for accessing instruments in a retargetable way.
PD relevance IJTAG/JTAG cells must be placed in the IO ring. Their timing (to TCK) must be analysed in
the JTAG clock domain corner in MMMC.
Modern SoCs combine digital logic with analog circuits (ADCs, DACs, PLLs, SerDes, RF). Digital switching creates
substrate noise that disturbs analog circuits. The PD engineer must create physical isolation strategies.
[Figure: analog block enclosed by guard rings, physically separated from the surrounding digital logic]
Key rules:
Substrate noise Digital switching induces current into the silicon substrate. This couples to analog circuits,
appearing as noise on sensitive nodes (ADC input, PLL VCO).
Deep N-well A deep N-type layer implanted in the P-substrate beneath a circuit, isolating its local P-well from
substrate noise. Used under sensitive analog circuits. Requires process support.
Guard ring (P+) A ring of heavily-doped P+ contacts connected to VSS surrounding analog. Intercepts noise
current before it reaches analog. Width: typically 2–10µm.
Guard ring (N+) A ring of N+ contacts connected to VDD surrounding NMOS circuits. Prevents latch-up and
reduces noise coupling.
Latch-up A parasitic PNPN thyristor structure between PMOS source/body and NMOS source/body. If
triggered, it latches VDD to GND permanently — destroying the chip. Prevented by proper
guard ring spacing and N-well distances.
PD rule Never route high-activity digital nets directly over or under analog supply rails. Use metal
shielding layers between digital routing and analog supply.
Based on typical UK VLSI physical design engineer interviews (ARM, Imagination Technologies, SiFive, MediaTek
UK, Qualcomm UK, Cadence, Synopsys, FTDI, Dialog, Nordic, Renesas UK):
Floorplan / macro placement Very frequent (85%) Good Add PDN detail
IR drop and PDN analysis Very frequent (85%) Moderate LEARN NOW
FinFET/advanced node DRC Moderate (50%) Some (14nm exp) Mention your exp
Why it matters: Calibre is the industry standard. Not knowing it is the #1 reason graduates fail PD interviews.
How to learn: FREE: Download Skywater 130nm PDK ([Link]/google/skywater-pdk). Install KLayout (free,
[Link]). Run DRC on sample GDS. Read 10 DRC rules. Install Netgen (free). Run LVS on a sample design.
PAID: Mentor Calibre student license (check university access). Week 1–2: understand all 10 common violation
types. Week 3: fix them manually.
Why it matters: Frequently asked in UK interviews: 'Describe your PDN analysis experience. What was your IR
drop?'
How to learn: Action: in your existing ICC-II projects, run check_pg_connectivity and report_pg_supply_violation.
Then use report_rail if Voltus available. At minimum, learn the theory: V=IR, IR drop targets (5% static, 10%
dynamic), fix strategies (add straps, decap). Practice drawing PDN hierarchy from memory.
How to learn: Theory: understand metastability, MTBF, two-flop synchroniser, Gray code, async FIFO. Tool:
SpyGlass CDC (if available). Free alternative: write CDC scenarios in RTL and verify using open-source tools
(Verilator + checks). Interview prep: be able to draw a 2-flop synchroniser from memory and explain why 2 FFs.
Tcl Scripting
Why it matters: Senior engineers are expected to automate common tasks. Having 2–3 scripts on GitHub
demonstrates practical skill.
How to learn: Practice: write ONE script per week. Start with: (1) a PrimeTime script that reads SPEF, runs timing,
and logs WNS/TNS to a file, (2) an ICC-II script that identifies the 10 most congested GCells and reports their
locations, (3) a Tcl utility proc library you can reuse across projects. Resource: Tcl Tutorial at
[Link]/man/tcl8.6/tutorial. Then upload all scripts to GitHub with clear README.
Why it matters: UK employers want EVIDENCE of hands-on work. A GitHub portfolio separates you from other
candidates.
How to learn: Projects to build (in order): (1) 8-bit ALU: RTL → GDS in OpenLane. Show timing report, congestion
map, DRC clean. (2) RISC-V (picorv32): run through OpenLane. Document every step. (3) Multi-power domain: add
UPF to a design. Show isolation cells inserted. Each project: 1-page PDF writeup + screenshots + GitHub README.
Install: pip install openlane (Docker). Free Skywater PDK.
Why it matters: Modern sign-off always uses MMMC + AOCV or POCV. Single-corner STA is no longer acceptable.
How to learn: Practice: in PrimeTime, create a 3-view MMMC setup (func_setup SS, func_hold FF, test_setup SS).
Enable AOCV with set_ocvm_mode advanced. Enable CPPR. Compare WNS with and without AOCV — see how
much pessimism is removed. Resource: PrimeTime User Guide Chapter 9 (MMMC) — free via Synopsys
SolvNetPlus.
A strong GitHub profile is increasingly important for UK semiconductor roles. Here is a prioritised list of what to build,
in order of impact:
2 (Must have) Timing Analysis Deep-Dive: take a design, introduce a setup violation, document finding + ECO fix in PrimeTime (1 week)
3 (Strong) Tcl Automation Script Library: 5+ reusable procs: timing summary, congestion report, LVT swap automation (1 week)
4 (Strong) Multi-Power Domain Design: UPF-annotated design, show isolation cells + level shifters in ICC-II (2 weeks)
5 (Good) DRC Analysis Report: run KLayout DRC, document 5 violations, explain each + fix strategy (1 week)
6 (Bonus) RISC-V through OpenLane: picorv32 full flow, timing closure steps, before/after ECO comparison (2 weeks)
130 Q&A across all topics · Beginner to Advanced · Real interview style
This section contains 130 interview questions exactly as asked in UK VLSI Physical Design interviews (ARM,
Imagination, MediaTek, Qualcomm, Cadence, Synopsys, Nordic, Dialog). Each answer is written in first-person so
you can speak it aloud directly. Questions are organised from basic to advanced within each topic.
A: A MOSFET is a voltage-controlled switch. It has four terminals: Gate, Drain, Source, and Body. In an NMOS
transistor, when the gate voltage exceeds the threshold voltage Vt, an inversion layer forms between drain and
source — creating a conducting channel. Current then flows from drain to source. When gate voltage is below Vt, no
channel forms and the device is OFF. This switching behaviour is the basis of all digital logic.
A: NMOS uses an n-type channel and turns ON when the gate is HIGH (Vgs > Vt). It is a strong pull-down device.
PMOS uses a p-type channel and turns ON when the gate is LOW (Vgs < -|Vt|). It is a strong pull-up device. NMOS
is approximately 2–3× faster than PMOS for the same width because electron mobility is higher than hole mobility. In
CMOS design, NMOS and PMOS are paired in complementary networks — PMOS as pull-up, NMOS as pull-down.
A: CMOS consumes near-zero static power because in any steady state, either the PMOS or the NMOS is OFF,
blocking the DC path from VDD to GND. NMOS-only logic requires a pull-up resistor, which always draws current.
CMOS also provides full voltage swing (output reaches exactly VDD or GND), giving large noise margins. The
combination of low power, full swing, and scalability makes CMOS the dominant technology for all digital ICs.
Q: What is threshold voltage and what are HVT, SVT, and LVT cells?
A: Threshold voltage Vt is the minimum gate-to-source voltage required to create a conducting channel in a
MOSFET. In standard cell libraries, three variants are offered: HVT (High Vt) cells have a higher threshold, so they
are slower but have very low leakage — used on non-critical paths to save power. SVT (Standard Vt) is the balanced
default. LVT (Low Vt) has lower threshold, making it faster but with more leakage — used only on critical timing
paths. The PD tool assigns Vt variants based on path slack to optimise power while meeting timing.
A: The CMOS inverter consists of one PMOS and one NMOS transistor. The PMOS source connects to VDD and
the NMOS source to GND. Both gates connect to the input. Both drains connect to the output. When input is LOW:
NMOS is OFF, PMOS is ON — output is pulled to VDD (logic 1). When input is HIGH: PMOS is OFF, NMOS is ON
— output is pulled to GND (logic 0). In steady state, one device is always OFF, so no DC current flows — this is why
CMOS has near-zero static power.
A: A D flip-flop is an edge-triggered memory element. On every rising clock edge, it captures the value at its D input
and presents it at Q. Setup time Tsu is the minimum time the D input must be stable before the clock edge to ensure
correct capture. Hold time Th is the minimum time D must remain stable after the clock edge. If setup is violated, the
FF may not capture the correct value and could go metastable — entering an undefined intermediate voltage state. If
hold is violated, the new data overwrites the just-captured data before it is safely stored.
Q: What is logic synthesis and what are its inputs and outputs?
A: Logic synthesis converts RTL (Verilog or VHDL) into a technology-mapped gate-level netlist. The inputs are: the
RTL source files, SDC timing constraints specifying the target frequency and other requirements, and the standard
cell library (.db or .lib files) characterised for the target PVT corner. The outputs are: a gate-level Verilog netlist, an
updated SDC file, and reports for timing, area, and power. In my work I used Synopsys Design Compiler and DC
NXT.
Q: What is an SDC file and what are the key commands in it?
A: SDC stands for Synopsys Design Constraints. It is the universal timing constraint language used by synthesis,
place-and-route, and PrimeTime. Key commands are: create_clock to define the clock frequency;
set_clock_uncertainty to model jitter and skew margin; set_input_delay and set_output_delay to constrain interface
paths; set_false_path for paths that are never sensitised such as CDC crossings and async resets;
set_multicycle_path for paths that need more than one cycle; set_max_transition for slew limits; and set_driving_cell
and set_load for interface modelling.
Q: What is a false path and a multicycle path? When do you use them?
A: A false path is a timing path that is never exercised in real operation, so STA should ignore it. Common examples
are: asynchronous reset paths, clock domain crossing paths between unrelated clocks, and configuration bits that
are set once at startup. I set these with set_false_path in SDC. A multicycle path is a path where the logic
intentionally takes more than one clock cycle to settle because the logic is too complex or the operation is not
needed every cycle — like a divider or multiplier. I set set_multicycle_path N -setup AND always pair it with
set_multicycle_path N-1 -hold, otherwise the hold check is performed at the wrong clock edge.
A: compile_ultra is Design Compiler's highest-quality compilation mode. Compared to basic compile, it uses more
sophisticated algorithms: it performs restructuring to reduce logic levels on critical paths, applies timing-driven
technology mapping, does more aggressive multi-Vt optimisation, and can perform register retiming to move
flip-flops across combinational logic to balance path delays. Key flags I use are -gate_clock to auto-insert clock
gating for power saving, -scan to prepare the netlist for DFT, and -no_autoungroup to preserve hierarchy for easier
ECOs later.
Q: How do you check if synthesis met timing? What do you look for in the report?
A: I run report_qor which gives a summary of the worst negative slack and total negative slack. The key metrics are
WNS — Worst Negative Slack — which must be zero or positive for timing closure, and TNS — Total Negative Slack
— which must be zero meaning no paths are failing. I also run report_timing -delay_type max to see the actual worst
setup path, which shows the complete breakdown of each stage: clock latency, each cell's delay, each wire's delay,
and the final slack. If WNS is negative I look at which cell or wire is contributing most delay and address that first.
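As a rough illustration of how WNS and TNS relate to per-endpoint slacks, the sketch below computes both from a list of slack values. The slack numbers and the function name summarise_slack are made up for illustration; in a real flow they come from report_qor or report_timing.

```python
def summarise_slack(slacks_ns):
    """Return (WNS, TNS, failing endpoint count) for a list of endpoint slacks in ns."""
    failing = [s for s in slacks_ns if s < 0]
    wns = min(slacks_ns)   # worst slack; negative means timing is not closed
    tns = sum(failing)     # only negative slacks contribute to TNS
    return wns, tns, len(failing)


wns, tns, failing_endpoints = summarise_slack([0.12, -0.05, 0.03, -0.02])
print(f"WNS = {wns:.3f} ns, TNS = {tns:.3f} ns, failing endpoints = {failing_endpoints}")
```

A small WNS with a large TNS points to many mildly failing paths (often a global issue such as clock uncertainty or utilisation), while a large WNS with a small TNS points to a few outlier paths.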
A: I start by estimating the die size using the formula: Core Area = Standard Cell Area from synthesis divided by the
target utilisation, which I typically set at 65–70%. I add the macro areas on top. Then I determine the aspect ratio
based on package constraints and dominant macro shapes. For macro placement, I follow the rule of placing macros
along the die edges to leave the centre open for standard cell routing. I group related macros together — for
example, placing an SRAM next to its controller. I align all macro corners to the routing grid and add halos of
typically 10–15µm around each macro. In my RISC-V project at 32nm, I placed hard macros containing
approximately 25,000 cells using this methodology.
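The die-size estimate described in this answer can be sketched as follows. The cell area, macro area and aspect ratio are hypothetical inputs, and aspect ratio is taken here as height divided by width, which is an assumption.

```python
import math


def core_area_um2(std_cell_area_um2, utilisation, macro_area_um2=0.0):
    """Core area = standard cell area / target utilisation, plus macro area on top."""
    return std_cell_area_um2 / utilisation + macro_area_um2


def core_width_height_um(area_um2, aspect_ratio=1.0):
    """Split an area into width and height; aspect_ratio = height / width (assumed)."""
    width = math.sqrt(area_um2 / aspect_ratio)
    return width, width * aspect_ratio


# Hypothetical design: 700,000 um^2 of standard cells at 70% utilisation, 210,000 um^2 of macros.
area = core_area_um2(700_000, 0.70, macro_area_um2=210_000)
width, height = core_width_height_um(area)
print(f"core area = {area:,.0f} um^2, core = {width:.0f} um x {height:.0f} um")
```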
A: Utilisation is the fraction of the core area occupied by standard cells. For example, 70% utilisation means 70% of
the core area is used by cells, leaving 30% as white space for routing, buffer insertion, and hold fixing. If utilisation is
too high — above 75–80% — there is insufficient routing headroom. This causes congestion, where the router
cannot fit all required wires, leading to DRC violations and unrouted nets. The fix is to increase the core size, which
reduces utilisation. I typically target 65–70% for complex designs.
A: The PDN is the hierarchy of metal structures that delivers VDD and VSS to every transistor in the design. Starting
from the top: the chip has bond pads or flip-chip bumps connected to the package. Inside the chip, a thick core ring
on the top metal layers (typically M8 or M9) runs around the core perimeter. From the ring, horizontal and vertical
power straps on intermediate metals (M5 to M8) carry current through the core. At the standard cell level, thin VDD
and VSS rails on M1 run inside every placement row. I create the PDN in ICC-II using create_pg_ring_pattern and
create_pg_mesh_pattern with set_pg_strategy and compile_pg, followed by connect_pg_net to connect all cell power pins.
A: IR drop is the voltage loss along the resistance of the power grid, following Ohm's law V = I × R. A cell receiving
less than nominal VDD operates more slowly, which translates into timing violations. The acceptable limit is typically
less than 5% of VDD for static IR drop — so less than 40mV on a 0.8V supply. I fix IR drop by first identifying the
hot-spot location from the Voltus IR map. The most effective fix is to add more VDD or VSS metal straps in the
affected area, reducing the grid resistance. I also insert decoupling capacitor cells near high-switching regions to
absorb dynamic current surges. If a single area has very high activity, spreading the cells also helps.
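The IR drop budget and the effect of adding straps can be illustrated with Ohm's law. This is a simplified lumped model with hypothetical current and resistance values; a real analysis in a rail tool solves the full grid.

```python
def ir_drop_limits_mv(vdd_v, static_pct=5.0, dynamic_pct=10.0):
    """Static and dynamic IR drop budgets in mV for a given VDD."""
    return vdd_v * static_pct * 10.0, vdd_v * dynamic_pct * 10.0  # pct/100 * 1000 mV


def ir_drop_mv(current_a, grid_resistance_ohm):
    """V = I * R, returned in mV."""
    return current_a * grid_resistance_ohm * 1000.0


static_limit, dynamic_limit = ir_drop_limits_mv(0.8)
before = ir_drop_mv(0.5, 0.10)   # hypothetical hotspot: 0.5 A through 0.10 ohm
after = ir_drop_mv(0.5, 0.05)    # extra straps assumed to halve the effective resistance

print(f"static limit = {static_limit:.0f} mV, dynamic limit = {dynamic_limit:.0f} mV")
print(f"before fix: {before:.0f} mV ({'FAIL' if before > static_limit else 'PASS'})")
print(f"after fix:  {after:.0f} mV ({'FAIL' if after > static_limit else 'PASS'})")
```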
A: Electromigration is the gradual physical displacement of metal atoms caused by momentum transfer from
conducting electrons at high current density. Over years of operation, this causes voids — open circuits — or hillocks
— short circuits — in the metal wires. It is governed by Black's equation where the mean time to failure is inversely
proportional to current density squared. The foundry specifies a maximum current density Jmax for each metal layer.
I avoid EM violations by ensuring no wire exceeds Jmax. Fixes include widening the wire (more width = lower current
density), using double or quad via arrays instead of single vias, and applying NDR rules with double width on clock
nets that carry continuous high-frequency switching current.
A: Placement in ICC-II happens in three stages. First, global placement uses an analytical force-directed method to
minimise total estimated wirelength across all nets. Cells may overlap at this stage. Second, legalisation moves cells
to the nearest legal placement row and site, resolving all overlaps while preserving the global placement distribution.
Third, detailed placement performs local perturbations — cell swapping, sliding, and flipping — to improve timing and
congestion beyond what global placement achieves. I run all three with the single command place_opt, which also
runs timing analysis throughout to guide cells on critical paths closer together.
A: Congestion occurs when the routing demand in a local area exceeds the available routing capacity. I identify it
using the GRC — Global Routing Congestion — map, where red tiles indicate overflow. Overflow means some nets
cannot be routed within that tile, which will cause DRC violations. To fix congestion I first try reducing the local cell
density target — setting max_density to 60% in the congested region so place_opt spreads cells out. I add
placement blockages near macros to prevent cells from crowding into areas where routing channels are obstructed.
If congestion is severe I may need to revisit the floorplan — move a macro to open a routing channel, or increase the
core size. I verify the fix by re-running global routing and checking that overflow is zero.
Q: What are placement blockages and when do you use each type?
A: There are three main types. A hard blockage prevents any standard cell from being placed in a region — I use
these under hard macros, IO pads, and analog blocks. A soft blockage tells the placer to prefer not to place cells
there, but allows it during legalisation if there is no other space — I use these near macro edges. A partial or density
blockage limits cell density to a percentage — for example 50% — in a region, which reduces congestion near
macros without completely blocking placement. I also use buffer blockages specifically to prevent clock tree buffers
from being placed inside macro halos, which is important for clean CTS.
A: After synthesis, the scan chain connects flip-flops in their synthesis order, which is typically alphabetical or
hierarchical. After placement, these FFs may be scattered across the die, creating very long scan routing wires that
consume routing resources and worsen congestion. Scan reordering re-stitches the scan chain after placement in
geometrical nearest-neighbour order, so each FF's scan output drives the physically closest FF's scan input. This
typically reduces scan routing wirelength by 30 to 50%. I always run compile_scan in ICC-II after place_opt and
before routing.
A: CTS — Clock Tree Synthesis — builds a balanced buffered tree to distribute the clock to all flip-flops with minimal
skew. Without CTS, a single clock wire driving thousands of FFs would have different arrival times at each FF due to
wire RC. The skew could be several nanoseconds — larger than the combinational logic delay — making timing
closure impossible. CTS inserts a hierarchy of buffers to equalise the clock arrival time at all sinks. I run CTS in
ICC-II using clock_opt, which performs CTS followed by post-CTS optimisation to fix hold violations introduced by
the real clock latency.
A: Clock skew is the difference in clock arrival time at two flip-flops. For example if FF1 receives the clock at 0.50ns
and FF2 at 0.55ns, the skew is 50ps. My target is to keep skew below 50 to 150ps, depending on the design's frequency. Insertion delay is
the total propagation delay through the clock tree buffers from the clock port to the FF clock pin — typically 300ps to
2ns. Clock uncertainty is modelled in SDC with set_clock_uncertainty and accounts for jitter from the PLL plus the
residual skew margin after CTS. I typically use 100ps setup uncertainty and 50ps hold uncertainty.
Q: Why do hold violations appear after CTS and how do you fix them?
A: Before CTS, the STA tool assumes ideal clocks with zero latency. After CTS, clocks are propagated, so every
flip-flop sees its real clock latency. For short data paths between adjacent flip-flops, if the capture FF's clock
arrives later than the launch FF's clock, the newly launched data can reach the capture FF before its hold window
for the same clock edge has closed, causing a hold violation. The data path is simply too short relative to the
clock latency difference. Hold violations are fixed by inserting delay buffers on the short data path, which adds
propagation delay to ensure data does not arrive too early. ICC-II clock_opt does this automatically during
post-CTS optimisation. I also swap cells to HVT to add more delay on the shortest paths.
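A minimal numeric sketch of why the same short path passes hold before CTS and fails after. It reuses the 0.50ns and 0.55ns clock arrival times and the 50ps hold uncertainty mentioned elsewhere in this guide; the clk-to-Q, data delay and hold time values are hypothetical.

```python
def hold_slack_ns(launch_latency, capture_latency, clk_to_q, data_delay_min,
                  hold_time, uncertainty=0.0):
    """Hold slack = earliest data arrival minus the time data must stay stable until."""
    data_arrival = launch_latency + clk_to_q + data_delay_min
    data_required = capture_latency + hold_time + uncertainty
    return data_arrival - data_required


path = dict(clk_to_q=0.06, data_delay_min=0.04, hold_time=0.02, uncertainty=0.05)

pre_cts = hold_slack_ns(0.0, 0.0, **path)       # ideal clocks: zero latency on both FFs
post_cts = hold_slack_ns(0.50, 0.55, **path)    # capture clock arrives 50 ps later
buffer_delay_needed = max(0.0, -post_cts)       # delay to add on the data path

print(f"pre-CTS hold slack:  {pre_cts * 1000:+.0f} ps")
print(f"post-CTS hold slack: {post_cts * 1000:+.0f} ps")
print(f"delay to insert:     {buffer_delay_needed * 1000:.0f} ps")
```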
Q: What is an Integrated Clock Gate (ICG) and why is a plain AND gate not used?
A: An ICG is a glitch-free clock gating cell that stops the clock from reaching idle flip-flops, eliminating their dynamic
power consumption. A plain AND gate cannot be used because if the enable signal changes while the clock is HIGH,
it creates a spurious short clock pulse — a glitch — that could cause incorrect FF captures. The ICG contains a D
latch that samples the enable signal on the LOW phase of the clock and holds it stable throughout the HIGH phase
before passing it to the AND gate. This ensures the enable is only presented at the safe time. In my projects, clock
gating coverage was over 80%, which is typical for low-power designs.
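The glitch argument can be demonstrated with a tiny discrete-time model. This is a behavioural sketch only, assuming an active-high clock, a latch that is transparent while the clock is low, and an arbitrary 8-step clock period; it is not a model of any particular library cell.

```python
def pulse_widths(wave):
    """Return the width (in time steps) of every high pulse in a 0/1 waveform."""
    widths, run = [], 0
    for v in wave:
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


def and_gate_clock(clk, en):
    """Plain AND gating: the enable reaches the clock path immediately."""
    return [c & e for c, e in zip(clk, en)]


def icg_clock(clk, en):
    """Latch-based gating: enable is sampled while clk is low and held while clk is high."""
    latched, out = 0, []
    for c, e in zip(clk, en):
        if c == 0:
            latched = e
        out.append(c & latched)
    return out


clk = ([0] * 4 + [1] * 4) * 3                          # three clock cycles, 4 low + 4 high
en = [1 if 6 <= t <= 13 else 0 for t in range(len(clk))]  # enable toggles mid-high-phase

print("AND gate pulse widths:", pulse_widths(and_gate_clock(clk, en)))
print("ICG pulse widths:     ", pulse_widths(icg_clock(clk, en)))
```

The AND gate produces truncated pulses whenever the enable changes during the high phase, while the latch-based version only ever passes full-width clock pulses.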
A: Useful skew is the intentional introduction of a skew between launch and capture flip-flops to improve path timing.
For setup improvement, I delay the capture FF clock arrival by adding extra buffers to its clock branch. This gives the
data more time to propagate — it is equivalent to increasing the clock period for that specific path. The improvement
equals the skew I introduce. However I must be careful because adding useful skew to one path can worsen hold
timing on adjacent paths sharing the same clock branch. The CTS tool optimises useful skew globally to maximise
overall slack improvement without creating new violations.
A: I run route_opt which performs the complete routing flow in one command. Internally it executes: global routing
which assigns nets to GCell channels without exact tracks; track assignment which assigns exact metal tracks within
each GCell; detailed routing which assigns exact layer and coordinates while enforcing all DRC rules;
search-and-repair which is an iterative pass to fix remaining DRC violations; via optimisation which replaces single
vias with double vias for EM and yield; and signal integrity optimisation which adjusts routing to reduce crosstalk on
critical nets. After routing I run check_routes to verify zero DRC violations before sign-off.
A: The antenna effect occurs during plasma etching in fabrication. Long metal wires act as antennas and accumulate
charge from the plasma. If the gate oxide at the end of the wire has no discharge path — because the drain or
source is not yet connected at that fabrication step — the accumulated charge tunnels through the thin gate oxide
and damages it permanently. The antenna ratio is the cumulative wire and via area above the gate divided by the
gate oxide area, and the foundry specifies a maximum allowed ratio. I fix antenna violations in three ways: inserting
an antenna diode at the gate pin which provides a safe discharge path; adding a wire jumper to route the wire up to a
higher metal layer which resets the antenna counter; or configuring the router to limit wire length per layer.
A: Crosstalk is capacitive coupling between adjacent parallel wires. When an aggressor net switches, it induces a
voltage change on the victim net through their mutual coupling capacitance. The induced noise is approximately Cc
divided by Cc plus Cvictim, multiplied by the aggressor's voltage swing. For timing, the most damaging case is when
the aggressor and victim switch in opposite directions — the coupling opposes the victim's transition, making it
slower. This is called out-of-phase crosstalk delay and it adds to the setup path delay. I fix it by increasing spacing
between aggressor and victim, adding shield wires, or rerouting onto different layers.
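The noise estimate above is a capacitive divider and is easy to evaluate numerically. The sketch below uses
hypothetical capacitance values; c_victim is taken to be the victim's capacitance to ground, and the assumption that
extra spacing halves the coupling capacitance is purely illustrative.

```python
def crosstalk_noise(c_couple, c_victim, v_swing):
    """Peak induced noise on the victim: Cc / (Cc + Cvictim) * aggressor swing."""
    return c_couple / (c_couple + c_victim) * v_swing

# Hypothetical values: 2 fF coupling, 6 fF victim ground cap, 0.8 V aggressor swing.
before = crosstalk_noise(2e-15, 6e-15, 0.8)
# Illustrative assumption: wider spacing halves the coupling capacitance.
after = crosstalk_noise(1e-15, 6e-15, 0.8)

print(f"noise before spacing fix: {before * 1000:.0f} mV")
print(f"noise after spacing fix:  {after * 1000:.0f} mV")
```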
A: DRC — Design Rule Check — verifies that all physical shapes in the layout satisfy the foundry's manufacturing
constraints. Common violations include: minimum spacing where two wires on the same layer are too close together;
minimum width where a wire is too narrow; enclosure where a via is not surrounded by enough metal on all sides;
minimum area where a metal shape is too small; and antenna violations. I run Calibre DRC using the
foundry-supplied rule deck to identify all violations. The tool produces a results file I open in the RVE GUI to navigate
to each violation location in the layout and fix it.
A: LVS — Layout versus Schematic — extracts the netlist from the GDS layout and compares it to the reference
netlist from synthesis. It verifies that the physical layout represents the correct circuit. LVS can report: shorts — two
nodes connected in layout that should not be, such as a VDD-to-VSS short which is critical; opens — a missing
connection; extra devices present in the layout but not in the netlist; missing devices present in the netlist but
not in the layout; and
parameter mismatches where a transistor has wrong dimensions. I run Calibre LVS for production sign-off.
A: For a register-to-register path, the data arrival time equals the launch clock edge time plus the launch clock
latency plus the data path delay — which includes Tcq of the launch FF plus all combinational cell delays and wire
delays. The data required time equals the capture clock edge time plus the capture clock latency minus the setup
time Tsu minus the clock uncertainty. Setup slack equals required time minus arrival time, and it must be zero or
positive. In equation form: Slack = Tclk + Tcap_latency - Tlaunch_latency - Tdata - Tsu - Tuncertainty. Positive slack
means the path passes. Negative slack is a violation that must be fixed.
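A worked instance of the slack equation, written as a short Python sketch. The parameter names mirror the terms in
the equation above; the numbers are hypothetical picosecond values and the launch edge is assumed to be at time zero.

```python
def setup_slack(t_clk, t_launch_latency, t_cap_latency, t_data, t_su, t_uncertainty):
    """Setup slack = data required time - data arrival time (all in ps)."""
    arrival = t_launch_latency + t_data                        # launch edge at t = 0
    required = t_clk + t_cap_latency - t_su - t_uncertainty    # capture edge at t = Tclk
    return required - arrival

# Hypothetical register-to-register path.
passing = setup_slack(t_clk=1660, t_launch_latency=300, t_cap_latency=320,
                      t_data=1500, t_su=40, t_uncertainty=50)
failing = setup_slack(t_clk=1660, t_launch_latency=300, t_cap_latency=320,
                      t_data=1600, t_su=40, t_uncertainty=50)

print("slack with 1500 ps data path:", passing)
print("slack with 1600 ps data path:", failing)
```

The first path has 90ps of positive slack; lengthening the data path by 100ps turns it into a 10ps violation.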
Q: What are WNS and TNS? How do you use them to track ECO progress?
A: WNS is the Worst Negative Slack — the single most negative slack value across all timing paths. It represents the
hardest path to fix. TNS is the Total Negative Slack — the sum of all negative slacks across all failing paths. TNS
gives a measure of the overall design health. When tracking ECO progress, I monitor both. Fixing the WNS path
improves WNS directly. But if TNS is large, it means many paths are failing and I need to work systematically
through all violating endpoints, not just the single worst path. Timing is closed when WNS reaches zero or above and
TNS equals zero.
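A small Python sketch of how the two metrics are derived from per-endpoint slacks and how they move during ECO. The
slack lists are invented for illustration; in practice the values would come from the timing report.

```python
def wns_tns(endpoint_slacks):
    """Return (WNS, TNS, failing endpoint count) for a list of slacks in ps."""
    failing = [s for s in endpoint_slacks if s < 0]
    return min(endpoint_slacks), sum(failing), len(failing)

def timing_closed(endpoint_slacks):
    wns, tns, _ = wns_tns(endpoint_slacks)
    return wns >= 0 and tns == 0

# Hypothetical snapshots of the same five endpoints across ECO iterations.
initial = [-120, -35, -5, 10, 42]
worst_fixed = [15, -35, -5, 10, 42]    # only the WNS path was fixed
all_fixed = [15, 5, 2, 10, 42]

for name, slacks in [("initial", initial), ("worst fixed", worst_fixed), ("all fixed", all_fixed)]:
    print(name, wns_tns(slacks), "closed" if timing_closed(slacks) else "open")
```

Fixing only the worst path improves WNS from -120ps to -35ps, but TNS stays at -40ps until the remaining endpoints
are also fixed.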
A: MMMC stands for Multi-Mode Multi-Corner analysis. It runs STA simultaneously across all operating modes —
such as functional, test, and low-power — and all PVT corners. Single-corner analysis is insufficient because a path
that passes at the typical corner may fail at the slow-slow corner where transistors are slower and temperature is
higher. Similarly, hold violations that are hidden at the slow corner may appear at the fast-fast corner where
minimum path delays are shorter. For sign-off I always use at minimum a func_setup view with the SS slow-slow
corner for setup checking and a func_hold view with the FF fast-fast corner for hold checking.
A: OCV — On-Chip Variation — models the fact that cells on different parts of the die experience slightly different
process, voltage, and temperature conditions. With flat OCV, I apply a derating factor to all cells. For a setup check,
late derating (for example 1.05) makes cells on the launch path appear 5% slower, while early derating (0.95) makes
cells on the capture path appear 5% faster. This is pessimistic because deep paths with many logic stages statistically
average out their variations. AOCV, Advanced OCV, addresses this by applying depth-based and distance-based
derating: shallow paths get larger derating and deep paths get smaller derating. This is more accurate and recovers
unnecessary pessimism, allowing tighter timing closure.
A: CPPR stands for Clock Path Pessimism Removal. In OCV analysis, the launch and capture clock paths both
share a common segment from the clock root to their divergence point. Without CPPR, this shared segment is
derated twice — once as late for the launch path and once as early for the capture path. This is physically impossible
because the same wire segment cannot simultaneously be both fast and slow. CPPR removes this double-derating
by crediting back the difference. The credit is typically 10 to 30 picoseconds. Not enabling CPPR forces unnecessary
ECO work to fix violations that are actually false pessimism. I always enable it with set_app_var
timing_remove_clock_reconvergence_pessimism true.
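The size of the credit follows directly from the derates: it is the difference between the late and early delay of the
shared clock segment. A minimal Python sketch for a setup check, with a hypothetical 200ps common path and the
1.05 / 0.95 flat derates used as examples earlier:

```python
def cppr_credit(common_path_delay, late_derate, early_derate):
    """Pessimism on the shared clock segment: late delay minus early delay (ps)."""
    return common_path_delay * late_derate - common_path_delay * early_derate

# Hypothetical numbers: 200 ps of shared clock path, flat OCV derates.
credit = cppr_credit(200.0, late_derate=1.05, early_derate=0.95)

slack_without_cppr = -12.0                      # hypothetical reported setup slack, ps
slack_with_cppr = slack_without_cppr + credit   # credit is added back to the slack

print(f"CPPR credit: {credit:.1f} ps")
print(f"slack: {slack_without_cppr:.1f} ps -> {slack_with_cppr:.1f} ps")
```

Here a 20ps credit turns an apparent 12ps violation into 8ps of positive slack, which is the kind of false violation the
answer above refers to.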
A: SI-aware STA models the effect of crosstalk on timing delay. Standard STA does not model switching on adjacent
nets: the coupling capacitors in the SPEF are simply treated as grounded. SI-aware STA keeps the coupling caps and
computes the worst-case scenario where an aggressor net switches in the opposite direction to the victim, adding
extra delay to the victim's transition. This is critical at 28nm and below where coupling capacitance can be 30 to 50%
of total net capacitance. I enable it in PrimeTime with set_app_var si_enable_analysis true and identify the worst
crosstalk paths with report_si_bottleneck -cost_type delta_delay.
A: UPF stands for Unified Power Format, standardised as IEEE 1801. It is a specification language that defines the
power intent of a multi-voltage design. UPF defines: power domains which group logic blocks sharing the same
supply; supply ports and nets which describe the voltage sources; power states for each domain such as ON, OFF,
or RETENTION; isolation strategies which specify what happens to outputs when a domain powers down; level
shifter strategies for signals crossing between domains at different voltages; and retention strategies for flip-flops
that must preserve their state across a power-down. In my OpenSPARC FPU project I worked with two power domains
at 14nm using UPF.
A: A level shifter converts a logic signal from the voltage level of one power domain to that of another. It is needed at
every signal crossing between domains operating at different supply voltages. For example if a 0.8V domain drives a
signal into a 1.0V domain, the 0.8V HIGH level may not meet the 1.0V domain's input high threshold VIH. The level
shifter restores a full-swing signal referenced to the destination supply. There are two types: LH for low-to-high
conversion, which needs both the source and destination supplies and uses PMOS transistors from the higher supply,
and HL for high-to-low, which can usually be powered from the lower supply alone. Any supply a level shifter depends
on must be on whenever the crossing is active.
A: An isolation cell is required at the output of a power domain that can be switched off. When the domain is
powered down, its flip-flops lose their state and their outputs float to an undefined voltage. If these floating signals
propagate into the always-on domain, they create X-propagation which corrupts the logic of the live domain. The
isolation cell clamps the output to a defined safe value — either logic 0 or logic 1 — when activated by an isolation
enable signal. The isolation cell is powered by the always-on domain so it remains functional even when the source
domain is off. A critical rule is that isolation must be activated BEFORE the domain is powered off and deactivated
AFTER power is stable.
A: Power gating cuts the power supply to an idle block to eliminate its leakage current. It is implemented using
header switches — large PMOS transistors inserted between the primary VDD and the virtual VVDD of the block.
When the sleep signal makes the PMOS gate HIGH, the switch is OFF and the block has no power. When sleep is
LOW the switch is ON and VVDD equals VDD. In physical design, the header switches form a row at the top of the
power-gated cell region. They must be sized to handle the peak current of the entire domain — typically one switch
per 50 to 100 microns of standard cell row. Decoupling caps near the switches absorb the inrush current during
wake-up to prevent VDD bounce.
A: DVFS — Dynamic Voltage and Frequency Scaling — reduces both the supply voltage and clock frequency when
maximum performance is not needed. Since dynamic power scales as VDD squared times frequency, reducing both
by 20% saves approximately 49% of dynamic power. From a physical design perspective, the design must meet
timing at ALL voltage-frequency operating points, not just the maximum. This means MMMC must include corners
for all VF operating points. The power grid must deliver adequate current at the highest frequency without IR drop
violations, and the timing must not fail at lower voltages where cells are slower. In my experience the binding
constraint is usually the lowest-voltage highest-frequency corner.
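The 49% figure can be checked with a one-line calculation. The sketch below assumes the simple dynamic power model
stated above (proportional to VDD squared times frequency) and ignores leakage and short-circuit power.

```python
def dynamic_power_ratio(v_scale, f_scale):
    """Dynamic power relative to nominal when VDD and frequency are scaled."""
    return v_scale ** 2 * f_scale

# Reduce both supply voltage and frequency by 20%.
ratio = dynamic_power_ratio(v_scale=0.8, f_scale=0.8)
saving = 1.0 - ratio

print(f"remaining dynamic power: {ratio:.1%}")
print(f"dynamic power saved:     {saving:.1%}")
```

The ratio is 0.8 cubed, about 51.2%, so the saving is about 48.8%.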
A: For production sign-off, Calibre from Siemens EDA is the industry standard. The foundry provides an official
Calibre DRC rule deck in SVRF format that encodes all their manufacturing constraints. I run calibre -drc -hier -turbo
with the GDS and the rule deck. The results are viewed in the Calibre RVE GUI which shows violations by rule
name, layer, and exact coordinates. For LVS I run calibre -lvs -hier. In my open-source work and for learning I use
KLayout with the SkyWater 130nm DRC script and Netgen for LVS, which are free and available on GitHub.
A: DRC — Design Rule Check — verifies geometric and physical manufacturing constraints: minimum wire width,
minimum spacing, enclosure rules, area rules, density rules, and antenna rules. These ensure the foundry can
reliably print and etch the designed patterns. ERC — Electrical Rule Check — verifies circuit-level electrical
correctness: no floating gate terminals that would have undefined states, no disconnected power or ground pins,
proper guard ring spacing to prevent latch-up, and correct N-well connections. DRC can pass while ERC fails — for
example a cell with its VDD pin not connected to the power grid may be DRC-clean but is an ERC failure.
A: Metal fill consists of small floating metal shapes inserted into areas of the die where a metal layer has insufficient
density. It is required because the CMP — Chemical Mechanical Planarisation — step in fabrication polishes each
metal layer to be flat. CMP works well only when the metal density is uniform. If density is too low in some areas,
those areas polish faster — called dishing — making them thinner than nominal, increasing resistance and causing
reliability failures. If density is too high, neighbouring areas are over-polished. Foundries specify a min and max
density window — typically 20% to 80% — for each layer. ICC-II inserts fill automatically with create_metal_fill.
A: Formal verification uses mathematical proof — specifically SAT solving and BDD-based equivalence checking —
to prove that two circuit representations are functionally identical. It is run at every stage where the netlist is modified.
After synthesis: the gate netlist must be equivalent to the RTL. After DFT insertion: the scan-inserted netlist must be
functionally equivalent to the pre-DFT netlist. After every ECO: the post-ECO netlist must match the pre-ECO netlist.
Before tape-out: the final post-route netlist must match the reference. I use Synopsys Formality for this. Unlike
simulation which only checks specific scenarios, formal verification covers all possible inputs.
A: In scan design, all standard flip-flops are replaced with scan flip-flops. A scan FF has an extra scan input SI and a
scan enable SE. When SE is HIGH — scan mode — the MUX passes SI to the FF input, forming a long shift register
chain. When SE is LOW — functional mode — the FF captures its D input normally. The test sequence has three
steps: shift-in — SE=1, apply N clock pulses to load the test pattern into all N FFs in the chain; capture — SE=0,
apply one functional clock — all FFs simultaneously capture the logic response of the combinational logic; shift-out
— SE=1, shift the captured response out to scan_out and compare to the expected pattern to detect faults.
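The three-step sequence can be mimicked with a toy Python model. This is a sketch only: the 4-flop chain, the XOR
logic between flops, and the injected stuck-at-0 fault are all hypothetical, chosen to show how a shifted-out
response exposes a fault.

```python
def shift(chain, scan_in_bits):
    """SE=1: shift one bit per clock; return the bits seen on scan_out."""
    scan_out = []
    for bit in scan_in_bits:
        scan_out.append(chain[-1])          # last FF drives scan_out
        chain[:] = [bit] + chain[:-1]       # each FF takes its neighbour's value
    return scan_out

def capture(chain, comb_logic):
    """SE=0: one functional clock; all FFs capture their D inputs together."""
    chain[:] = comb_logic(chain)

def good_logic(q):
    """Hypothetical combinational logic: D[i] = Q[i] XOR Q[i+1] (wrapping)."""
    n = len(q)
    return [q[i] ^ q[(i + 1) % n] for i in range(n)]

def faulty_logic(q):
    """Same logic with a stuck-at-0 fault on the D input of FF 0."""
    d = good_logic(q)
    d[0] = 0
    return d

def run_scan_test(pattern, comb_logic):
    chain = [0] * len(pattern)
    shift(chain, reversed(pattern))          # shift-in: first bit ends up deepest
    capture(chain, comb_logic)               # capture the logic response
    return shift(chain, [0] * len(pattern))  # shift-out the response

pattern = [1, 0, 1, 1]
expected = run_scan_test(pattern, good_logic)
observed = run_scan_test(pattern, faulty_logic)
print(expected, observed, "FAULT DETECTED" if observed != expected else "PASS")
```

The fault-free response is [0, 0, 1, 1] and the faulty circuit returns [0, 0, 1, 0], so the mismatch in the last
shifted-out bit flags the defect.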
A: The PD engineer has several DFT-related responsibilities. First, after placement, I run scan reordering to re-stitch
the scan chain in geometrical order, reducing scan routing wirelength by 30 to 50%. Second, I ensure the scan input
and scan output pins are connected to IO pads accessible from the tester. Third, I must verify that the DFT test mode
timing is satisfied — for at-speed testing, the test clock must meet the same timing as the functional clock, so the
clock routing must support functional-frequency operation. Fourth, for MBIST controllers I must place them physically
adjacent to their target SRAM to minimise routing to the memory ports.
Q: What is MBIST and how does it differ from scan-based SRAM testing?
A: MBIST — Memory Built-In Self Test — is a dedicated hardware controller that tests embedded SRAMs by
applying March algorithms directly through the memory's normal read-write interface. It differs from scan-based
testing in that SRAM cells are not scan-accessible — you cannot shift a test pattern into individual memory bit cells
through a scan chain because there are millions of them and they are accessed through address/data ports, not
individual pins. The MBIST controller writes and reads specific patterns — for example the March C minus algorithm
— to detect stuck-at, transition, and coupling faults in every bit cell. The result is a single PASS or FAIL output.
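As an illustration, the March C minus sequence can be run against a toy memory model in Python. The Memory class
and its stuck-at fault injection are hypothetical helpers for this sketch; a real MBIST controller implements the
same element sequence in hardware.

```python
class Memory:
    """Toy bit-addressable memory with optional stuck-at faults."""
    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck_at = stuck_at or {}       # {address: forced value}

    def write(self, addr, value):
        self.cells[addr] = self.stuck_at.get(addr, value)

    def read(self, addr):
        return self.stuck_at.get(addr, self.cells[addr])

def march_c_minus(mem, size):
    """Return True (PASS) or False (FAIL) after running March C minus."""
    up = range(size)
    down = range(size - 1, -1, -1)
    # Each element: (address order, expected read value or None, value to write or None)
    elements = [
        (up, None, 0),      # any order: w0
        (up, 0, 1),         # ascending: r0, w1
        (up, 1, 0),         # ascending: r1, w0
        (down, 0, 1),       # descending: r0, w1
        (down, 1, 0),       # descending: r1, w0
        (up, 0, None),      # any order: r0
    ]
    for order, expect, write in elements:
        for addr in order:
            if expect is not None and mem.read(addr) != expect:
                return False
            if write is not None:
                mem.write(addr, write)
    return True

SIZE = 16
print("fault-free:      ", march_c_minus(Memory(SIZE), SIZE))
print("cell 5 stuck-at-1:", march_c_minus(Memory(SIZE, {5: 1}), SIZE))
print("cell 5 stuck-at-0:", march_c_minus(Memory(SIZE, {5: 0}), SIZE))
```

The fault-free memory passes, and either stuck-at fault on a single cell makes the run fail, matching the single
PASS or FAIL result described above.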
A: Metastability occurs when a flip-flop samples a signal at the exact moment it is transitioning, violating setup or
hold time. The FF enters an unstable equilibrium state where its output is neither HIGH nor LOW but somewhere in
between. Given enough time it will resolve to a valid state, but if the resolution takes longer than one clock period,
the metastable state propagates and corrupts downstream logic. A two-flop synchroniser addresses this by giving
the first FF a full clock period to resolve before the second FF samples it. The first FF may go metastable, but the
probability of it remaining metastable for a full clock period is exponentially small. The second FF then samples a
valid resolved value.
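The "exponentially small" probability is usually quantified with the standard synchroniser model
MTBF = exp(t_r / tau) / (T_w * f_clk * f_data), where t_r is the time allowed for resolution. The Python sketch
below uses placeholder values for tau and T_w, which are technology-dependent constants and are not taken from any
real library.

```python
import math

def synchroniser_mtbf(t_resolve, tau, t_window, f_clk, f_data):
    """Mean time between synchronisation failures, in seconds."""
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)

# Placeholder technology constants and clock rates (assumptions for illustration).
TAU = 20e-12         # resolution time constant, s
T_WINDOW = 50e-12    # metastability window, s
F_CLK = 500e6        # destination clock, Hz (2 ns period)
F_DATA = 50e6        # rate of asynchronous data transitions, Hz

# Little resolution time (logic samples the first FF almost immediately)
# versus a full clock period provided by the second synchroniser FF.
short = synchroniser_mtbf(0.2e-9, TAU, T_WINDOW, F_CLK, F_DATA)
full = synchroniser_mtbf(2.0e-9, TAU, T_WINDOW, F_CLK, F_DATA)

print(f"MTBF with 0.2 ns to resolve: {short:.3g} s")
print(f"MTBF with 2.0 ns to resolve: {full:.3g} s")
```

Because the resolution time sits in the exponent, giving the first flop a full clock period improves MTBF by a factor
of exp(1.8 ns / tau) with these numbers.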
A: A two-flop synchroniser works only for single-bit signals because it samples at an arbitrary time with respect to the
source domain. For a multi-bit bus, each bit may be sampled at a different phase of its transition, so some bits are
captured with their new values and some with their old values, creating a corrupted intermediate combination that
was never a valid state. For example on a 4-bit counter going from 0111 to 1000, all four bits change and the
synchroniser might capture 1111 — a value that never existed. The correct solutions are Gray-coded pointers —
which change only one bit at a time — or an asynchronous FIFO for bulk data transfer.
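The difference between binary and Gray-coded pointers can be checked exhaustively in Python. This sketch assumes the
worst case, where every changing bit is independently captured as either its old or new value; the helper names are
illustrative.

```python
def bin_to_gray(n):
    return n ^ (n >> 1)

def gray_to_bin(g):
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def possible_captures(old, new, width):
    """All values a receiver could sample if each changing bit is old or new."""
    changing = [i for i in range(width) if ((old ^ new) >> i) & 1]
    results = set()
    for mask in range(1 << len(changing)):
        value = old
        for k, bit in enumerate(changing):
            if (mask >> k) & 1:
                value ^= 1 << bit
        results.add(value)
    return results

# Binary counter 0111 -> 1000: all four bits change.
binary_seen = possible_captures(0b0111, 0b1000, 4)
# Same step with Gray-coded values: only one bit changes.
gray_seen = possible_captures(bin_to_gray(7), bin_to_gray(8), 4)

print("binary:", sorted(f"{v:04b}" for v in binary_seen))
print("gray:  ", sorted(f"{v:04b}" for v in gray_seen))
```

The binary step can be sampled as any of 16 values, including 1111, whereas the Gray-coded step can only be seen as
the old code 0100 or the new code 1100, both of which are valid.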
A: CDC crossing paths must be false-pathed in the SDC with set_false_path -from [get_clocks clk_a] -to [get_clocks
clk_b]. This tells STA not to perform timing checks across the crossing because the clocks are asynchronous and
have no timing relationship. Without this, STA would report false violations on these paths since it would try to
enforce setup and hold constraints between clocks that have no fixed phase. The actual safety of the crossing is
ensured by the synchroniser circuit, not by timing constraints. I also run SpyGlass CDC or Cadence JasperGold
CDC to verify all crossing signals have proper synchronisation structures.
A: A FinFET uses a vertical fin of silicon as the channel, and the gate wraps around three sides of the fin — top and
both sidewalls. This tri-gate structure gives the gate much better electrostatic control over the channel compared to a
planar MOSFET where the gate only controls one face. Better gate control means the transistor can be turned off
much more decisively at short gate lengths, dramatically reducing leakage. The subthreshold slope approaches 65
mV per decade, close to the theoretical ideal of 60. Drive strength in FinFET is quantised — it is set by the number of
fins rather than by continuous width sizing. FinFETs were introduced at 22nm by Intel and are standard at 16/14nm
and below.
Q: You worked at 14nm. What DRC challenges did you face at that node?
A: At 14nm the DRC deck is significantly more complex than at 32nm. The minimum metal pitch on M1 is
approximately 48nm compared to about 90nm at 32nm, leaving very little margin for error. I encountered more
end-of-line spacing rules, via enclosure rules with directional requirements, and multi-patterning colour constraints
where adjacent wires on the same layer must be assigned to different lithography masks. Fin direction rules required
all standard cells to align their fins in the same global orientation. The antenna ratio limits are also stricter at 14nm
because the gate oxide is thinner. Managing all of these required very careful cell placement near macro edges
where routing channels are narrow.
A: Multi-patterning is a technique used at sub-20nm nodes where the minimum feature pitch is smaller than what a
single lithography exposure can print. The layout for one metal layer is split across multiple masks — for example
LELE (litho-etch-litho-etch) uses two litho-etch cycles with two separate masks. Adjacent wires on the same layer must be
placed on different masks, which the router refers to as different colours. A colouring conflict occurs when three
mutually adjacent wires exist because they cannot all be assigned valid different colours with only two masks. The
router must avoid these conflicts during routing. At 14nm I needed to be aware of M3 and M4 patterning constraints
during routing to avoid colouring DRC violations.
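Mask assignment with two colours is a graph two-colouring problem: wires are nodes and each too-close pair is an
edge. The Python sketch below is a generic breadth-first colouring, not the algorithm of any particular router; the
wire names and conflict lists are hypothetical.

```python
from collections import deque

def assign_masks(wires, conflicts):
    """Return {wire: 0 or 1} if two masks suffice, or None on a colouring conflict."""
    adjacent = {w: [] for w in wires}
    for a, b in conflicts:
        adjacent[a].append(b)
        adjacent[b].append(a)

    colour = {}
    for start in wires:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            w = queue.popleft()
            for nb in adjacent[w]:
                if nb not in colour:
                    colour[nb] = 1 - colour[w]   # neighbour goes on the other mask
                    queue.append(nb)
                elif colour[nb] == colour[w]:
                    return None                  # odd cycle: cannot split across two masks
    return colour

wires = ["w1", "w2", "w3"]
# Three parallel wires where only neighbours are too close: colourable.
print(assign_masks(wires, [("w1", "w2"), ("w2", "w3")]))
# Three mutually adjacent wires: a triangle, so no valid two-mask assignment.
print(assign_masks(wires, [("w1", "w2"), ("w2", "w3"), ("w1", "w3")]))
```

The first case alternates masks along the row; the second returns None, which corresponds to the three-wire
colouring conflict described above.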
A: My most challenging project was the OpenSPARC FPU implementation at 14nm with two power domains. The FPU
had a clock period of 1.66ns and two voltage domains requiring level shifters and isolation cells at the boundaries.
The challenge was closing timing at 14nm where the DRC rules are much stricter and routing resources are scarcer.
I had significant congestion near the power domain boundary because of the level shifter cells which are physically
larger than standard cells. I resolved it by increasing the halo around that region, spreading cells into adjacent areas,
and reordering the power domain boundary in the floorplan to give more routing clearance. I also had hold violations
after CTS in the domain with higher latency which I fixed by inserting delay buffers during clock_opt.
Q: Describe a timing violation you faced and how you solved it.
A: In my RISC-V project at 32nm I had a setup WNS of negative 120ps on a path through the ALU. Using
report_timing -path_type full_clock, I identified that 65% of the slack violation came from a long wire between the
adder output register and the result mux. The wire was approximately 800 microns — it had been placed far away
because of a macro in between. I first tried inserting a buffer to break the wire, which recovered 60ps. Then I moved
the mux cell closer to the adder register using set_cell_location, reducing wire delay further and recovering another
70ps. After re-routing and re-extracting SPEF, the final slack was positive 15ps — timing closed.
Q: How do you handle pressure when timing is not closing near tapeout?
A: I prioritise systematically. First I check whether any violations are false positives by verifying CPPR is enabled and
AOCV derating is correctly configured — sometimes false pessimism accounts for some of the reported violations.
Then I focus on paths that contribute most to TNS, not just the single worst WNS path, because fixing many
moderate violations often reduces TNS faster than chasing one very hard path. I communicate clearly with the team
— reporting the current WNS, TNS, and my fix rate per day — so the project manager can make an informed
decision about schedule. I also flag any violations that may require a floorplan change early, because those take the
most time to implement and re-verify.
Q: Why did you choose Physical Design over RTL design or verification?
A: I am drawn to Physical Design because it combines multiple disciplines simultaneously. Every decision I make —
cell placement, clock routing, power grid — directly impacts timing, power, area, and reliability all at once. I enjoy the
problem-solving aspect: tracking down why a specific path is failing, understanding whether it is cell delay, wire
delay, or skew, and applying the targeted fix. During my training at Maven Silicon I had the opportunity to work
through the full RTL-to-GDS flow on four projects, and I found the physical implementation stage the most engaging.
The work is very concrete — you can visualise the circuit physically and see exactly what is happening.
A: In three years I want to be working as a mid-level Physical Design Engineer independently leading blocks of 500K
to 1M cells through full PD flow including sign-off. I want to deepen my expertise in advanced nodes — specifically
7nm and below — and become proficient in power grid analysis using Voltus or RedHawk. I am actively building my
skills in Calibre DRC and LVS and developing an OpenLane portfolio on GitHub to demonstrate hands-on capability
beyond my training projects. Longer term I am interested in physical design methodology — developing scripts and
flows that improve engineering efficiency across a team.
These are quick one-to-two sentence answers for common short questions in interviews:
A: A latch is level-sensitive: it is transparent while its enable is active, so its output follows the input. A flip-flop is
edge-triggered: it captures only at the active clock edge and holds its value otherwise. Flip-flops are used almost
exclusively in synchronous digital design.
A: Clock jitter is the cycle-to-cycle variation in clock period caused by PLL phase noise and power supply noise. It is
modelled as part of clock uncertainty in STA.
A: A via is a vertical metal connection between two adjacent metal layers. DRC rules require: minimum enclosure
(metal must surround the via on all sides), minimum via-to-via spacing, and minimum area of the via itself.
A: Setup margin is how much earlier the data arrives than the start of the setup window; a positive value means there
is margin to spare. Hold margin is how long the data stays stable beyond the end of the hold window after the capture edge.
A: Filler cells are inserted in gaps between standard cells to complete the N-well continuity and supply rail continuity
across rows. Without fillers, the N-well breaks and power rails are interrupted.
A: Global routing assigns each net to a sequence of GCell channels without exact tracks — it is fast and used for
congestion estimation. Detailed routing assigns exact layer, track, and coordinates while enforcing all DRC rules.
A: SPEF — Standard Parasitic Exchange Format — contains extracted resistance and capacitance of every routed
net. It is back-annotated to PrimeTime to replace estimated wire delays with actual post-route values for accurate
sign-off STA.
A: Fusion Compiler is Synopsys's unified synthesis-to-implementation tool that integrates Design Compiler and
ICC-II into a single flow, enabling concurrent optimisation of synthesis and physical design for better QoR.
A: Functional power is the power consumed during normal chip operation. Test power is the power during scan shift,
which is often higher because all flip-flops toggle every shift clock — activity factor approaches 1.0. This can damage
the chip if not managed with scan power reduction techniques.
A: ECO — Engineering Change Order — is a targeted minimal change to a placed-and-routed design to fix a
problem. Types include timing ECO (fix setup/hold), functional ECO (fix logic bug), power ECO (fix IR drop or EM),
and DRC ECO (fix geometric violations).
A: A half-cycle path is a path between a FF triggered on the rising edge and one triggered on the falling edge. Only
half the clock period is available for the combinational logic, so the path must meet setup in half the usual budget.
A: The remaining 25–30% of core area is needed for routing tracks, buffer insertion during CTS and ECO, hold fix
buffers, and metal fill. Higher utilisation leaves insufficient routing headroom, causing DRC violations.
A: A clock tree is a hierarchy of buffers with balanced paths to each sink — skew of 50–150ps typical. A clock mesh
is a grid of metal wires shorted at crossings, giving very low skew under 5ps but consuming much more power due to
high capacitance. Meshes are used in CPUs and GPUs.
A: Decap cells provide a local charge reservoir between VDD and VSS. During simultaneous switching, cells draw a
large instantaneous current. The decap absorbs this surge, preventing a large voltage droop on the local VDD —
reducing dynamic IR drop.
A: GBA — Graph-Based Analysis — tags nodes with worst-case arrivals from all paths simultaneously. Fast but
conservative. PBA — Path-Based Analysis — traces each individual path with actual input slews. Slower but more
accurate. PBA can recover 10–50ps of false pessimism on specific critical paths.
Always prepare 3–5 questions to ask. This demonstrates genuine interest and technical depth. Pick the ones most
relevant to the company and role: