Chapter 8 Virtual Machines & Dynamic Translation
ISA + Environment = Virtual Machine
An ISA alone is not enough: a program also needs I/O.
The ABI (Application Binary Interface)
Binary program =
[ISA instructions] +
[System call interface] +
[Initial state/data]
Contract:
Which instructions available
Which syscalls possible (I/O)
Process creation state
The OS implements the virtual machine: it reads the binary, sets up the environment, and executes the code.
Security Model
Trust boundary:
User code ←[syscall]→ OS kernel ←[driver]→ Hardware
ISA instructions: execute directly on hardware (constrained by privilege level).
I/O: must go through OS (via syscall).
Attack surface: syscall interface = primary target.
Supporting Multiple VMs
Same OS, different versions:
Solaris 10 can run:
- Solaris 10 binaries
- Solaris 9 binaries
- SunOS 4 binaries (BSD)
- Linux binaries (!)
How? OS detects binary format → emulates expected syscalls.
Security risk: old syscall interfaces might have vulnerabilities.
OS-Level VM (Hypervisor)
[Guest OS 1] [Guest OS 2] [Guest OS 3]
[Hypervisor]
[Hardware]
Each guest thinks it owns the machine.
Hypervisor shares physical resources:
CPU time-slicing
Memory partitioning
I/O virtualization
Examples: VMware, Xen, KVM, Hyper-V.
Security boundary: guest isolation critical - escape = compromise all guests.
Partial Software ISA Implementation
Trap-and-emulate for rare or expensive operations:
1. Rare Instructions
Decimal arithmetic on VAX
→ μVax: trap to software emulation
Trade-off: slow emulation vs die area.
2. Exceptional Cases
IEEE FP denormals
→ Most FPUs: trap to software handler
Common case: hardware fast path.
Rare case: software slow path.
3. Forward Compatibility
SPARC v7 CPU running v8 binary:
- v8 multiply instruction → undefined opcode
- Trap handler: emulate in software
Old hardware runs new binaries - no recompile.
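The forward-compatibility case can be sketched in C. This is a minimal sketch, assuming a hypothetical trap handler that has already decoded the two source operands; `emulate_umul` is an illustrative name, not a SPARC or OS API. It builds the 64-bit product from shifts and adds only, which an older CPU without a multiply instruction can still execute.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software fallback for an unimplemented unsigned multiply.
 * Classic shift-and-add: one loop iteration per bit of b. */
uint64_t emulate_umul(uint32_t a, uint32_t b) {
    uint64_t product = 0;
    uint64_t addend = a;            /* widened so the shifts keep all bits */
    while (b != 0) {
        if (b & 1u) {
            product += addend;      /* this bit of b contributes a << k */
        }
        addend <<= 1;
        b >>= 1;
    }
    return product;
}
```

A real handler would also write the result back to the guest's destination register and advance the trapped PC; both steps are omitted here.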
Emulation - Pure Interpretation
Memory layout:
[Emulator Code] [Emulator Data: Guest Memory Image]
Main loop:
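while (!stop) {
    inst = GuestCode[PC / 4];  // fetch (assumes GuestCode is an array of 32-bit words, PC a byte address)
    PC += 4;                   // advance to the next guest instruction
    decode(inst);              // extract opcode and operand fields
    execute(inst);             // update guest registers and memory
}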
Decode Example
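void execute(uint32_t inst) {
    uint8_t opcode = inst >> 26;           // bits 31..26
    switch (opcode) {
    case 0x00: {                           // ADD (on real MIPS, opcode 0 is R-type and funct selects the operation)
        // Braces give the declarations their own scope; C does not allow
        // a declaration directly after a case label.
        uint8_t rd = (inst >> 11) & 0x1F;
        uint8_t rs = (inst >> 21) & 0x1F;
        uint8_t rt = (inst >> 16) & 0x1F;
        GPR[rd] = GPR[rs] + GPR[rt];
        break;
    }
    case 0x23:                             // LW
        // ...
        break;
    }
}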
Performance
~100× slower than native execution (RISC-on-RISC).
Why so slow?
1. Fetch guest instruction (memory read)
2. Decode (switch table, shifts, masks)
3. Access guest register file (array lookup)
4. Execute (host ALU)
5. Update guest state
6. Loop back
Each guest instruction = 20+ host instructions.
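The six steps above can be seen in one place in the following sketch. It is a minimal sketch under stated assumptions: a hypothetical `Guest` struct, a word-addressed code array, and only two MIPS-like instructions (ADDU and ADDIU). Every guest instruction pays for the fetch, the field extraction, the switch, and the register-array accesses, which is where the overhead comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest state for a tiny MIPS-like subset. */
typedef struct {
    uint32_t gpr[32];
    uint32_t pc;                /* byte address into the code array */
} Guest;

/* Interpret until PC leaves the code array or an unknown opcode is hit.
 * Returns the number of guest instructions executed. */
size_t run_guest(Guest *g, const uint32_t *code, size_t n_words) {
    assert(g != NULL && code != NULL);
    size_t executed = 0;
    while (g->pc / 4 < n_words) {
        uint32_t inst = code[g->pc / 4];        /* 1. fetch */
        g->pc += 4;
        uint32_t opcode = inst >> 26;           /* 2. decode */
        uint32_t rs = (inst >> 21) & 0x1F;
        uint32_t rt = (inst >> 16) & 0x1F;
        uint32_t rd = (inst >> 11) & 0x1F;
        switch (opcode) {                       /* 3-5. register access, execute, update */
        case 0x00:                              /* R-type: only ADDU (funct 0x21) handled */
            if ((inst & 0x3F) != 0x21) return executed;
            g->gpr[rd] = g->gpr[rs] + g->gpr[rt];
            break;
        case 0x09:                              /* ADDIU rt, rs, imm (sign-extended) */
            g->gpr[rt] = g->gpr[rs] + (uint32_t)(int32_t)(int16_t)(inst & 0xFFFF);
            break;
        default:
            return executed;                    /* unknown opcode: stop */
        }
        g->gpr[0] = 0;                          /* $zero stays hardwired to 0 */
        executed++;                             /* 6. loop back */
    }
    return executed;
}
```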
Security
Isolation strong: guest code never executes directly on the host.
Attack surface: emulator bugs - memory corruption, type confusion.
Binary Translation - Static
Compile-time: translate guest ISA → native ISA.
[Guest Binary] → [Translator] → [Native Binary]
Example:
Guest (MIPS):
lw $t0, 0($a0)
addi $t1, $t0, 5
Native (x86), assuming $a0 → esi, $t0 → eax, $t1 → ebx:
mov eax, [esi]
lea ebx, [eax + 5]
Optimizations
Unlike emulation, can optimize:
1. Register allocation - guest regs → native regs
2. Dead code elimination - remove unused ISA side-effects
3. Instruction scheduling - reorder for the native pipeline
4. Inlining - expand function calls
Result: 2-10× faster than emulation.
Problems
1. Indirect Jumps
Guest code:
jr $t0 // jump to register
Native code:
??? // where to jump?
Solution: PC mapping table.
PC_Map[guest_addr] = native_addr
native code:
mov eax, guest_t0
mov ebx, [PC_Map + eax*4]
jmp ebx
Cost: table lookup on every indirect jump.
Optimization: inline the lookup for common targets (call/return).
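The PC mapping table can be sketched as follows. This is a minimal sketch, assuming fixed 4-byte guest instructions, a small guest code region, and a sentinel value of 0 for "no translation yet"; `PcMap`, `pc_map_set`, and `pc_map_lookup` are hypothetical names. A miss is the case where control would fall back to the emulator or translator.

```c
#include <assert.h>
#include <stdint.h>

#define GUEST_CODE_BYTES 64u
#define NOT_TRANSLATED   0u     /* assumed sentinel: no native code at address 0 */

/* One entry per 4-byte guest instruction. */
typedef struct {
    uint32_t native_addr[GUEST_CODE_BYTES / 4];
} PcMap;

void pc_map_set(PcMap *m, uint32_t guest_addr, uint32_t native_addr) {
    assert(guest_addr % 4 == 0 && guest_addr < GUEST_CODE_BYTES);
    m->native_addr[guest_addr / 4] = native_addr;
}

/* Returns the native target, or NOT_TRANSLATED if the guest address is
 * misaligned, out of range, or has no translation. */
uint32_t pc_map_lookup(const PcMap *m, uint32_t guest_addr) {
    if (guest_addr % 4 != 0 || guest_addr >= GUEST_CODE_BYTES) {
        return NOT_TRANSLATED;
    }
    return m->native_addr[guest_addr / 4];
}
```

Validating the guest address before indexing matters: the jump target comes from a guest register, so it is attacker-influenced data.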
2. Self-Modifying Code
Guest code:
sw $t0, label // write to code segment
label:
add $t1, $t2, $t3
Problem: native translation of label now stale.
Solutions:
1. Interpreter fallback - detect write, mark page as "interpret only"
2. Invalidate translations - flush affected native code
3. Write-protect code pages - trap on write
Cost: any write to code = expensive.
Modern ISAs: self-modifying code discouraged - JIT compilers emit code into dedicated pages (historically RWX, now usually W^X).
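Solutions 2 and 3 combine naturally: write-protect pages that have translations, and invalidate on the resulting trap. The sketch below models only the bookkeeping, assuming 4 KiB pages and a hypothetical `on_guest_store` hook that the write-protection trap would call; real systems would also flush the affected native code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12u
#define NUM_PAGES  16u

typedef struct {
    bool has_translation[NUM_PAGES];   /* page holds code we translated */
    unsigned invalidations;            /* how many times we had to flush */
} CodePages;

void mark_translated(CodePages *cp, uint32_t guest_addr) {
    uint32_t page = guest_addr >> PAGE_SHIFT;
    assert(page < NUM_PAGES);
    cp->has_translation[page] = true;
}

/* Called when the guest stores to guest_addr. Returns true if the store
 * hit a translated page, in which case its translations are dropped. */
bool on_guest_store(CodePages *cp, uint32_t guest_addr) {
    uint32_t page = guest_addr >> PAGE_SHIFT;
    if (page >= NUM_PAGES || !cp->has_translation[page]) {
        return false;                  /* plain data write: no extra cost */
    }
    cp->has_translation[page] = false; /* stale native code must not run */
    cp->invalidations++;
    return true;
}
```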
3. Precise Exceptions
Guest:
inst 1
inst 2 ← exception here
inst 3
Native (reordered):
inst 3
inst 1
inst 2 ← exception here
Guest expects: exception at inst 2, inst 1 complete, inst 3 not started.
Reality: inst 3 already executed!
Solution: checkpoint guest architectural state, roll back on exception, then re-execute in original order.
Binary Translation Architecture
[Guest Binary] ────────────────┐
│ │
[Translate] ────→ [Native Code]
│ │
[PC Mapping Table] ←───────────┘
│
[Emulator] ← self-modified pages
In native code, indirect jumps use the PC mapping table.
The emulator checks the PC mapping table - if the target hits translated native code, it jumps back to native execution.
IBM AS/400 - High-Level ISA
User Applications
↓
[High-Level Architecture Interface]
↓
[Binary Translator] ← Software layer
↓
[Hardware: PowerPC core]
System/38 (1978): memory-memory ISA, never directly executed.
AS/400 evolution:
48-bit CISC → vertical microcode → hardware
Later: binary translation to PowerPC
Advantage: ISA stability - hardware changes, ISA constant.
Virtualization from day 1 - every application already abstract.
Dynamic Translation (JIT)
Runtime: translate + cache + optimize based on runtime info.
Disk: [Bytecode/Guest ISA]
↓ load
Runtime: [Interpreter] + [Code Cache] + [Translator]
Execution Flow
1. Start: interpret bytecode
2. Hot code detected (loop/function executed N times)
3. Translate to native, optimize
4. Cache translation
5. Execute native code
6. Miss in cache? → translate more
Examples: Java JIT, JavaScript V8, Transmeta Crusoe.
Optimization Levels
Tier 0: Interpreter (slow execution, no translation overhead)
Tier 1: Quick translation (minimal optimization)
Tier 2: Optimizing compiler (heavy optimization)
Trade-off: compilation time vs execution speedup.
Strategy: start in a cheap tier (Tier 0 or Tier 1), promote to Tier 2 only if the code becomes very hot.
Transmeta Crusoe (2000)
x86 ISA → internal VLIW via software "Code Morphing".
x86 Binary
↓
[Code Morphing Software] ← runs on boot
↓
[VLIW Engine]
VLIW Format
64-bit: 2 RISC ops
128-bit: 4 RISC ops
Native ISA hidden - software layer translates.
Advantage: hardware simple (no x86 decode complexity).
System Architecture
[x86 BIOS] [x86 OS] [x86 Apps]
↓
[Code Morph Software] ← in a portion of DRAM
↓
[Translation Cache: VLIW]
↓
[VLIW Processor]
Boot ROM: compressed Code Morph Software.
System DRAM partitioned:
x86-visible: OS/apps think this is all memory
Hidden: Code Morph workspace + translation cache
Translation Example
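x86 (operands written destination-first, matching the RISC ops below):
addl %eax, (%esp) // load from stack, add to %eax, implicit flags
addl %ebx, (%esp) // load the same word again, add to %ebx
movl %esi, (%ebp) // load %esi from memory
subl %ecx, 5 // subtract 5 from %ecx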
RISC ops (intermediate):
ld %r30, [%esp]
add.c %eax, %eax, %r30 // .c = set condition codes
ld %r31, [%esp]
add.c %ebx, %ebx, %r31
ld %esi, [%ebp]
sub.c %ecx, %ecx, 5
Optimized:
ld %r30, [%esp] // load once
add %eax, %eax, %r30 // no .c (not used)
add %ebx, %ebx, %r30 // reuse r30
ld %esi, [%ebp]
sub.c %ecx, %ecx, 5 // only this .c needed
VLIW scheduled:
ld %r30,[%esp]; sub.c %ecx,%ecx,5
ld %esi,[%ebp]; add %eax,%eax,%r30; add %ebx,%ebx,%r30
Optimizations: shared memory load, redundant flag updates removed, parallel issue.
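The removal of redundant `.c` suffixes can be expressed as a backward liveness pass. This is a minimal sketch, not Transmeta's algorithm: it assumes a hypothetical `Op` record that tracks only whether an operation writes or reads the condition codes, and it conservatively treats the flags as live at the end of the block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical IR op: only flag behaviour matters for this pass. */
typedef struct {
    bool sets_flags;    /* the ".c" suffix in the notes */
    bool reads_flags;   /* e.g. a conditional branch */
} Op;

/* Walk the block backwards. A flag write is dead if a later op overwrites
 * the flags before anything reads them. Returns the number of ".c"
 * suffixes removed. */
size_t strip_dead_flag_writes(Op *ops, size_t n) {
    bool flags_live = true;             /* conservative: live at block exit */
    size_t removed = 0;
    for (size_t i = n; i-- > 0; ) {
        if (ops[i].sets_flags) {
            if (!flags_live) {
                ops[i].sets_flags = false;
                removed++;
            }
            flags_live = false;         /* earlier writes are now shadowed */
        }
        if (ops[i].reads_flags) {
            flags_live = true;          /* an earlier write is needed */
        }
    }
    return removed;
}
```

Applied to the six RISC ops above (ld, add.c, ld, add.c, ld, sub.c), the pass keeps only the final `sub.c`, matching the optimized listing.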
Translation Overhead
Highly-optimizing compiler = expensive.
Strategy:
1. Interpret initially (zero translation overhead)
2. Quick translate at threshold (e.g., 100 executions)
3. Optimize heavily at higher threshold (e.g., 10000 executions)
Instrumentation: translations count execution، track branch directions.
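The threshold strategy amounts to a per-block execution counter. A minimal sketch, using the example thresholds from the list above (100 and 10000) and hypothetical names `BlockProfile` and `record_execution`:

```c
#include <assert.h>
#include <stdint.h>

#define QUICK_THRESHOLD 100u      /* example value from the notes */
#define OPT_THRESHOLD   10000u    /* example value from the notes */

typedef enum { TIER_INTERPRET, TIER_QUICK, TIER_OPTIMIZED } Tier;

typedef struct {
    uint32_t exec_count;
    Tier tier;
} BlockProfile;

/* Count one execution of a block and return the tier it should run in.
 * The counter saturates so a very hot block cannot wrap back to 0. */
Tier record_execution(BlockProfile *b) {
    assert(b != 0);
    if (b->exec_count < UINT32_MAX) {
        b->exec_count++;
    }
    if (b->exec_count >= OPT_THRESHOLD) {
        b->tier = TIER_OPTIMIZED;
    } else if (b->exec_count >= QUICK_THRESHOLD) {
        b->tier = TIER_QUICK;
    }
    return b->tier;
}
```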
x86 Compatibility Issues
1. Instruction Ordering
x86 (in-order):
addl %eax, (%esp)
addl %ebx, (%esp)
VLIW (reordered):
ld %esi,[%ebp]; add %eax,...; add %ebx,...
An exception may occur out of order, leaving the wrong x86 PC.
2. Precise State
Solution: Shadow Registers
Working registers: r0-r31 (VLIW uses)
Shadow registers: s0-s31 (x86 architectural state)
At translation block boundary:
commit:
s0 = r0
s1 = r1
...
On exception:
rollback:
r0 = s0
r1 = s1
...
PC = block_start
re-execute using interpreter
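The commit and rollback steps can be modeled in a few lines. This is a software model only, assuming a hypothetical `Cpu` struct; in the hardware described above the copy happens in the register file itself rather than through memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NREGS 32

typedef struct {
    uint32_t working[NREGS];   /* speculative state used by translated code */
    uint32_t shadow[NREGS];    /* last committed x86 architectural state */
    uint32_t pc;
    uint32_t committed_pc;     /* start of the current translation block */
} Cpu;

/* At a translation block boundary: make the working state official. */
void commit(Cpu *c) {
    assert(c != NULL);
    memcpy(c->shadow, c->working, sizeof c->shadow);
    c->committed_pc = c->pc;
}

/* On an exception inside a block: discard speculative work. The caller
 * would then re-execute from c->pc with the interpreter. */
void rollback(Cpu *c) {
    assert(c != NULL);
    memcpy(c->working, c->shadow, sizeof c->working);
    c->pc = c->committed_pc;
}
```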
3. Self-Modifying Code
x86 write to code page
↓
Page marked as translated?
↓ Yes
Trap to Code Morph Software
↓
Invalidate translations for page
Mark page as "translate-on-execute"
Cost: first write expensive, subsequent writes tolerable.
x86 legacy: self-modifying code was common in old code (rare today).
Security Considerations
Attack surface 1: Code Morph Software
Vulnerability in the translator = game over
- Memory corruption
- Type confusion
- Integer overflow
Mitigation: Code Morph Software signed, integrity-checked.
Attack surface 2: Translation Cache
Flush translation cache
Monitor victim translation behavior
→ Learn control flow، data access patterns
Side-channel: translation events visible via timing.
Attack surface 3: Shadow Registers
Fault injection during commit
→ Corrupt architectural state
Mitigation: ECC on shadow registers.
JIT Compilation - Security
Code Injection via JIT
Attacker-controlled input
↓
JIT compiler
↓
Native code (RWX page)
Problem: malicious input → malicious native code.
Example: JavaScript JIT, Java JIT.
Mitigations:
1. Type checking before compilation
2. Sandbox generated code (NaCl, WASM)
3. W^X: code pages either RW or RX, never both
4. JIT spray prevention: limit gadgets in generated code
JIT Spraying
Attacker controls data
↓
JIT emits data as immediate values
↓
mov eax, 0x90909090 // NOP sled
mov ebx, 0x90909090
mov ecx, 0x90909090
↓
Execute from middle of instruction
→ Immediate bytes decode as attacker-chosen instructions (shellcode or gadgets)
Defense: randomized immediate encoding, constant blinding.
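Constant blinding can be sketched as follows. This is a minimal sketch, assuming a 32-bit x86 target and a caller-supplied `key` (a real JIT would draw a fresh random key per constant); `emit_blinded_mov_eax` is a hypothetical emitter name. Instead of `mov eax, imm`, it emits `mov eax, imm ^ key` followed by `xor eax, key`, so the attacker's bytes never appear verbatim in the code page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Append a 32-bit little-endian value to buf at offset off. */
static size_t put_u32(uint8_t *buf, size_t off, uint32_t v) {
    for (int i = 0; i < 4; i++) {
        buf[off++] = (uint8_t)(v >> (8 * i));
    }
    return off;
}

/* Emit "mov eax, imm ^ key ; xor eax, key" (10 bytes) into buf.
 * After both instructions run, eax holds imm. Returns bytes written. */
size_t emit_blinded_mov_eax(uint8_t *buf, uint32_t imm, uint32_t key) {
    assert(buf != NULL);
    size_t off = 0;
    buf[off++] = 0xB8;                    /* mov eax, imm32 */
    off = put_u32(buf, off, imm ^ key);
    buf[off++] = 0x35;                    /* xor eax, imm32 */
    off = put_u32(buf, off, key);
    return off;
}
```

The cost is one extra instruction per blinded constant, which is why engines tend to blind only constants that the script controls.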
Spectre & JIT
JIT-compiled code:
if (index < bound) { // bounds check
load array[index]
}
Speculative execution:
Predict branch taken
Load array[out_of_bounds]
→ Cache side-channel leak
JIT = attractive Spectre target - attacker supplies both code and input → can train the branch predictor and steer speculation.
Mitigation:
1. Insert lfence after bounds check
2. Speculative load hardening (SLH)
3. Index masking
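Index masking (mitigation 3) can be sketched in C. This is a minimal sketch, assuming the backing store is padded to a power-of-two capacity; `mask_index` and `load_masked` are hypothetical names. The mask is applied with data-flow rather than control-flow, so even if the bounds check is mispredicted, the speculative load stays inside the allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp index into [0, capacity_pow2) without a branch. */
size_t mask_index(size_t index, size_t capacity_pow2) {
    assert(capacity_pow2 != 0 && (capacity_pow2 & (capacity_pow2 - 1)) == 0);
    return index & (capacity_pow2 - 1);
}

/* Bounds-checked load. The architectural check uses length; the mask
 * limits what a mispredicted branch can reach to the padded buffer. */
uint8_t load_masked(const uint8_t *array, size_t capacity_pow2,
                    size_t length, size_t index) {
    assert(array != NULL && length <= capacity_pow2);
    if (index < length) {
        return array[mask_index(index, capacity_pow2)];
    }
    return 0;   /* out of bounds: assumed default value for this sketch */
}
```

Whether a compiler preserves the masking exactly as written depends on optimization settings, so production code typically relies on compiler or kernel-provided primitives for this.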
Hypervisor Types
Type 1 (Bare-Metal)
[Guest OS 1] [Guest OS 2]
[Hypervisor]
[Hardware]
Examples: VMware ESXi, Xen, Hyper-V.
Security: hypervisor = TCB (Trusted Computing Base) - must be minimal.
Type 2 (Hosted)
[Guest OS]
[Hypervisor/VMM]
[Host OS]
[Hardware]
Examples: VirtualBox, VMware Workstation.
Security: attack surface = host OS + hypervisor.
VM Escape Attacks
Attacker in Guest
↓
Exploit hypervisor bug
↓
Escape to Host
↓
Compromise all guests
Common vulnerabilities:
Device emulation bugs (e.g., virtual NIC)
DMA attacks (guest DMA to hypervisor memory)
Hypercall interface bugs
Defense:
1. Minimize hypervisor TCB
2. Hardware virtualization (VT-x, AMD-V) - trap in hardware
3. IOMMU - isolate guest DMA
4. Formal verification (seL4)
Hardware Virtualization Support
VT-x (Intel) / AMD-V
Two modes:
Root mode: hypervisor runs
Non-root mode: guest OS runs
VMCS (Virtual Machine Control Structure): holds saved guest state, host state, and execution controls.
VM Entry (hypervisor → guest):
Load VMCS
Switch to non-root mode
VM Exit (guest → hypervisor):
Trap on privileged operation
Save guest state to VMCS
Switch to root mode
Performance: hardware trap faster than software emulation.
Security: guest can't escape non-root mode - hardware enforced.
Nested Paging (EPT / NPT)
Problem: guest page table translation.
Guest virtual addr
↓ Guest page table
Guest physical addr
↓ Hypervisor page table (shadow PT - slow!)
Host physical addr
Solution: Hardware nested page tables
Guest VA → Guest PA (guest PT) → Host PA (EPT)
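Hardware walks both tables itself (a two-dimensional walk) and caches the combined translation in the TLB - no hypervisor involvement on a TLB miss.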
Security: guest can't access hypervisor memory - EPT enforces isolation.
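The two translation stages can be modeled with toy single-level tables. This is a minimal sketch under stated assumptions: 4 KiB pages, eight pages per table, and a hypothetical `Mmu` struct; real page tables are multi-level, which is what makes the hardware walk two-dimensional.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT  12u
#define PAGE_MASK   ((1u << PAGE_SHIFT) - 1u)
#define NPAGES      8u
#define NOT_PRESENT UINT32_MAX

/* Toy tables: index = source page number, value = target page number. */
typedef struct {
    uint32_t guest_pt[NPAGES];  /* guest virtual page  -> guest physical page */
    uint32_t ept[NPAGES];       /* guest physical page -> host physical page */
} Mmu;

/* Translate a guest virtual address to a host physical address.
 * Returns false on a guest page fault or an EPT violation. */
bool translate(const Mmu *m, uint32_t gva, uint32_t *hpa) {
    assert(m != 0 && hpa != 0);
    uint32_t gvpn = gva >> PAGE_SHIFT;
    if (gvpn >= NPAGES || m->guest_pt[gvpn] == NOT_PRESENT) {
        return false;               /* stage 1 miss: guest handles the fault */
    }
    uint32_t gpfn = m->guest_pt[gvpn];
    if (gpfn >= NPAGES || m->ept[gpfn] == NOT_PRESENT) {
        return false;               /* stage 2 miss: exit to the hypervisor */
    }
    *hpa = (m->ept[gpfn] << PAGE_SHIFT) | (gva & PAGE_MASK);
    return true;
}
```

Because the guest can only edit `guest_pt`, anything it maps still passes through `ept`, which only the hypervisor controls.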
Case Study: Rowhammer via VM
Attacker VM:
while (1) {
access(row A)
access(row B)
}
↓
Bit flip in adjacent row C
↓
Row C belongs to hypervisor!
↓
Modify page tables → escape VM
Defense:
1. ECC memory (corrects single-bit flips, detects double-bit flips)
2. Memory refresh rate increase
3. TRR (Target Row Refresh)
4. Isolate VMs to separate DRAM chips
Virtualization Performance
Para-virtualization
Guest OS knows it's virtualized - uses hypercalls instead of privileged instructions.
Guest OS:
// Instead of: cli (disable interrupts)
hypercall(DISABLE_INTERRUPTS)
Advantage: faster than trap-and-emulate.
Disadvantage: requires OS modification.
Example: Xen PV guests.
Hardware Acceleration
Pass-through devices: guest directly accesses hardware.
[Guest] → [Hardware NIC]
(no hypervisor in data path)
SR-IOV: single physical device → multiple virtual functions.
Security: IOMMU mandatory - prevent DMA attacks.
Hypervisor & VM Security
Hypervisor FSM Vulnerabilities
VM state machine:
States:
VM_RUNNING → VM_EXIT_PENDING → VM_STOPPED → EMULATE_DEVICE →
VM_ENTRY_PENDING → VM_RUNNING
Critical transition: VM exit handling
Guest executes privileged instruction:
1. Hardware traps (VM exit)
2. FSM: VM_RUNNING → VM_EXIT_PENDING
3. Save guest state to VMCS
4. Load hypervisor state
5. FSM: VM_EXIT_PENDING → VM_STOPPED
6. Jump to hypervisor handler
Glitch attack on VM exit:
Voltage drop during steps 3-4:
→ Guest state partially saved
→ Hypervisor state partially loaded
→ Mixed context: guest+hypervisor
→ Guest-controlled register values carried into hypervisor context
→ Or: skip privilege checks entirely
Example: VMLAUNCH glitching
Normal FSM:
VMLAUNCH instruction
→ Check VMCS validity (microcode)
→ If invalid: VMfail (error reported via RFLAGS and the VM-instruction error field)
→ If valid: enter guest
Glitched FSM:
Voltage drop during validity check
→ Comparison result flips: invalid→valid
→ Enter guest with malformed VMCS
→ Guest controls hypervisor page tables
→ VM escape
Binary Translator FSM Attacks
Translation state machine:
INTERPRET → PROFILE → DETECT_HOT → TRANSLATE → CACHE → EXECUTE_NATIVE
↑ ↓
└──────────────── INVALIDATE ←──────────────────────────┘
State corruption attack:
Normal: INTERPRET → TRANSLATE
Glitched: INTERPRET → EXECUTE_NATIVE
What happens:
Guest bytecode treated as native code
→ Execute arbitrary host instructions
→ Privilege escalation
Cache poisoning:
Translation cache: [guest_PC] → [native_code_ptr]
Fault injection in cache write:
Corrupt native_code_ptr
→ Points to attacker-controlled memory
→ Guest PC maps to malicious native code
→ Code reuse attack
Microcode Atomicity in Virtualization
VT-x VM entry (VMLAUNCH/VMRESUME) microcode sequence (simplified, illustrative):
μop0: Check VMCS pointer valid
μop1: Check guest state fields
μop2: Save host CR3, RSP, RIP
μop3: Load guest CR3
μop4: Load guest RSP
μop5: Load guest RIP
μop6: Switch to non-root mode
μop7: Flush TLB
μop8: Jump to guest RIP
Interrupt during microcode:
NMI arrives at μop3:
Host CR3 saved, guest CR3 not yet loaded
→ Page tables inconsistent
→ NMI handler runs with mixed context
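L1 Terminal Fault (L1TF; CVE-2018-3620 for OS/SMM, CVE-2018-3646 for VMMs):
Root cause: speculative use of a not-present page-table entry.
Guest (or process) sets up a PTE with Present = 0 or reserved bits set
→ Access raises a terminal fault
→ Before the fault retires, the CPU speculatively treats the PTE's address bits as a host physical address (EPT translation skipped)
→ If that line is in the L1 data cache, dependent loads run speculatively
→ Leak via cache timing
Relevance to VM entry:
- L1D may still hold hypervisor or other-guest data when the guest resumes
- A sibling hyperthread shares the same L1D
Mitigations:
- Flush L1D before VM entry (IA32_FLUSH_CMD MSR)
- PTE inversion for not-present entries in the OS
- Disable SMT, or co-schedule only mutually trusting threads on a core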
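AEX interrupt attacks (SGX):
EENTER microcode (enter enclave; illustrative sequence):
μop10: Load enclave TCS (Thread Control Structure)
μop11: Clear debug registers
μop12: Initialize enclave stack
μop13: ...
AEX (Asynchronous Enclave Exit) on interrupt:
If an interrupt could land at μop11:
→ Debug registers partially cleared
→ Enclave state partially initialized
→ Host can observe intermediate state
→ Leak enclave secrets
Note: AEX-Notify is Intel's mitigation (the enclave is notified when it resumes after an AEX), aimed at interrupt-driven single-stepping attacks; it is not itself an attack.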
JIT Compiler FSM Security
Tiered compilation FSM:
INTERPRET → [threshold] → TIER1_JIT → [threshold] → TIER2_JIT
↑ ↓
└──────────── DEOPTIMIZE ←──────────────────────────┘
State corruption: skip tier
Normal: INTERPRET → TIER1 → TIER2
Glitched: INTERPRET → TIER2
Problem:
TIER2 assumes type profiling done in TIER1
→ Missing type guards
→ Type confusion
→ Memory corruption
Deoptimization attack:
TIER2 optimized code assumes:
x: Integer (based on profiling)
Runtime: x becomes Object
Normal FSM:
Detect type change → DEOPTIMIZE → INTERPRET → re-profile
Glitched FSM:
Type guard check fails
→ Skip deoptimization
→ Continue with wrong type assumption
→ Treat Object pointer as Integer
→ Arbitrary read/write
V8 example:
function f(x) {
return x + 1; // TIER2 assumes x is Integer
}
// Profile phase: always Integer
for (let i = 0; i < 10000; i++) f(42);
// Attack phase: pass Object
let obj = {valueOf: () => { /* exploit */ }};
f(obj); // Type guard should fail and trigger deoptimization
Glitched: skip guard, treat obj as Integer
→ Read obj's tagged pointer bits as a number
→ Leak heap layout
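The role of the type guard can be modeled in C. This is a minimal sketch, not V8's implementation: it assumes a hypothetical tagged `Value` and treats "deoptimize" as falling back to a generic routine. The point is that the fast path is only safe because the guard runs first; skipping it would reinterpret the pointer bits as an integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { TAG_INT, TAG_OBJECT } Tag;

typedef struct {
    Tag tag;
    union { int32_t i; void *obj; } as;
} Value;

typedef struct { unsigned deopt_count; } JitStats;

/* Generic slow path. A real engine would call valueOf on objects and
 * widen to a double on overflow; this sketch just reports failure. */
bool add_one_generic(Value x, int32_t *out) {
    if (x.tag == TAG_INT && x.as.i != INT32_MAX) {
        *out = x.as.i + 1;
        return true;
    }
    return false;
}

/* "Optimized" code specialized for small integers. */
bool add_one_optimized(Value x, int32_t *out, JitStats *stats) {
    assert(out != 0 && stats != 0);
    if (x.tag != TAG_INT || x.as.i == INT32_MAX) {   /* type and overflow guard */
        stats->deopt_count++;                        /* deoptimize */
        return add_one_generic(x, out);
    }
    *out = x.as.i + 1;                               /* fast path */
    return true;
}
```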
Transmeta Code Morphing FSM
Translation cache lookup:
x86_PC → hash() → index → check_tag → HIT: execute_VLIW
→ MISS: translate → cache → execute
Hash table corruption:
Normal: hash(0x401000) = index 0x50 → VLIW_A
Fault: bit flip in index: 0x50 → 0x51 → VLIW_B
Execute wrong cached translation
→ Arbitrary VLIW code execution
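The lookup path and its tag check can be sketched as follows. This is a minimal sketch, assuming a direct-mapped cache of 64 slots and a trivial modulo hash; `TCache`, `tcache_insert`, and `tcache_lookup` are hypothetical names. It shows why the tag comparison matters: a wrong index only yields a wrong translation if the tag check is also defeated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_SLOTS 64u

typedef struct {
    bool valid;
    uint32_t tag;       /* full guest PC this slot was translated from */
    uint32_t native;    /* address of the translated code */
} Slot;

typedef struct { Slot slots[CACHE_SLOTS]; } TCache;

static uint32_t slot_index(uint32_t pc) { return pc % CACHE_SLOTS; }

void tcache_insert(TCache *t, uint32_t pc, uint32_t native) {
    assert(t != 0);
    Slot *s = &t->slots[slot_index(pc)];
    s->valid = true;
    s->tag = pc;
    s->native = native;
}

/* HIT only if the slot is valid and its tag equals the requested PC. */
bool tcache_lookup(const TCache *t, uint32_t pc, uint32_t *native) {
    assert(t != 0 && native != 0);
    const Slot *s = &t->slots[slot_index(pc)];
    if (!s->valid || s->tag != pc) {
        return false;   /* MISS: caller translates, then inserts */
    }
    *native = s->native;
    return true;
}
```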
Shadow register glitching:
Commit point (every 16 x86 instructions):
atomic {
r0_shadow = r0;
r1_shadow = r1;
...
r31_shadow = r31;
drain_store_buffer();
}
Voltage glitch during commit:
→ Partial shadow update
→ Exception occurs
→ Exception handler reads inconsistent shadow state
→ Mix of old and new register values
→ Information leak or corruption
Translation corruption:
TRANSLATE state:
1. Parse x86 instruction
2. Generate VLIW μops
3. Optimize
4. Emit VLIW code to cache
Fault at step 4:
→ Partial VLIW code written
→ Cache entry marked valid (bit flip)
→ Next execution: run partial code
→ Undefined behavior
VM/JIT Attack Surface Summary
Component           Attack Vector                   Impact
Hypervisor FSM      Voltage glitch on VM exit       Guest escape
VMCS validity       Glitch check logic              Malformed guest state
L1D at VM entry     Stale data + speculative load   Host secret leak (L1TF)
JIT tier FSM        Skip tier transition            Type confusion
Deopt guard         Glitch type check               Arbitrary R/W
Translation cache   Index or pointer corruption     Code reuse
Shadow registers    Commit glitch                   State leak
Microcode match     Pattern corruption              Wrong patch applied
Defense layers:
1. Hardware: Voltage sensors, critical path monitors
2. Microcode: Atomic sections, safe interrupt points
3. Software: Type guards, redundant checks
4. Architecture: Flush secrets before guest entry
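The "redundant checks" item in the software layer can be illustrated with a common fault-injection hardening pattern. This is a minimal sketch under stated assumptions: the constants and the names `check_token` and `access_allowed` are hypothetical, and the pattern raises the cost of a single glitch rather than guaranteeing protection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Result codes far apart in Hamming distance, so a single bit flip
 * cannot turn FAIL into PASS. CHECK_FAIL is the complement of CHECK_PASS. */
#define CHECK_PASS 0x3CA5965Au
#define CHECK_FAIL 0xC35A69A5u

/* Compare twice. volatile forces two real loads and compares, so the
 * compiler cannot fold the second check into the first. */
uint32_t check_token(uint32_t supplied, uint32_t expected) {
    volatile uint32_t a = supplied;
    volatile uint32_t b = expected;
    if (a != b) return CHECK_FAIL;
    if (a != b) return CHECK_FAIL;   /* redundant check against a skipped branch */
    return CHECK_PASS;
}

/* Only the exact PASS pattern grants access; 0, 1, or garbage do not. */
bool access_allowed(uint32_t result) {
    return result == CHECK_PASS;
}
```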