
Virtual Machines and Dynamic Translation

Chapter 8 discusses virtual machines and dynamic translation, emphasizing the relationship between the Instruction Set Architecture (ISA) and the environment to form a virtual machine. It covers the security model, the role of hypervisors in managing multiple virtual machines, and the challenges of binary translation and emulation, including performance trade-offs and security risks. Additionally, it explores dynamic translation techniques, JIT compilation, and the implications of hardware virtualization support, highlighting the importance of security measures against potential vulnerabilities in hypervisors and virtual machines.

Uploaded by

taha.aimen

Chapter 8 Virtual Machines & Dynamic Translation

ISA + Environment = Virtual Machine
An ISA alone is not enough: a program also needs I/O.

The ABI (Application Binary Interface)

Binary program =
[ISA instructions] +
[System call interface] +
[Initial state/data]

Contract:

- Which instructions are available
- Which syscalls are possible (I/O)
- Process state at creation

The OS implements the virtual machine: it reads the binary, sets up the environment, and executes the code.

Security Model
Trust boundary:

User code ←[syscall]→ OS kernel ←[driver]→ Hardware

ISA instructions: execute directly on hardware (constrained by privilege level).

I/O: must go through OS (via syscall).

Attack surface: syscall interface = primary target.

Supporting Multiple VMs


Same OS, different versions:
Solaris 10 can run:
- Solaris 10 binaries
- Solaris 9 binaries
- SunOS 4 binaries (BSD)
- Linux binaries (!)

How? OS detects binary format → emulates expected syscalls.
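
The detect-then-emulate step can be pictured as one syscall table per ABI. The following is a minimal sketch under stated assumptions, not how Solaris implements it: the `abi_t` values, the header magic byte, the syscall numbers, and the handler names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ABI "personalities" the kernel can emulate. */
typedef enum { ABI_NATIVE, ABI_LEGACY, ABI_COUNT } abi_t;

typedef long (*syscall_fn)(long arg);

/* Illustrative handlers: both ABIs expose "getpid", under different numbers. */
static long sys_getpid(long arg) { (void)arg; return 1234; }
static long sys_nosys(long arg)  { (void)arg; return -1; }  /* unsupported call */

#define MAX_SYSCALL 4

/* One table per ABI: the same kernel service may sit at a different number. */
static const syscall_fn syscall_table[ABI_COUNT][MAX_SYSCALL] = {
    [ABI_NATIVE] = { sys_nosys, sys_getpid, sys_nosys, sys_nosys },
    [ABI_LEGACY] = { sys_nosys, sys_nosys, sys_nosys, sys_getpid },
};

/* Pick the ABI from the first header byte (made-up magic values). */
abi_t detect_abi(const unsigned char *header) {
    assert(header != NULL);
    return header[0] == 0x7F ? ABI_NATIVE : ABI_LEGACY;
}

/* Route a syscall through the table that matches the process's ABI. */
long dispatch_syscall(abi_t abi, int number, long arg) {
    if (abi >= ABI_COUNT || number < 0 || number >= MAX_SYSCALL) return -1;
    return syscall_table[abi][number](arg);
}
```

Every extra table is extra kernel code reachable from user space, which is why old personalities widen the attack surface.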

Security risk: old syscall interfaces might have vulnerabilities.

OS-Level VM (Hypervisor)

[Guest OS 1] [Guest OS 2] [Guest OS 3]


[Hypervisor]
[Hardware]

Each guest thinks it owns the machine.

Hypervisor shares physical resources:

- CPU time-slicing
- Memory partitioning
- I/O virtualization

Examples: VMware, Xen, KVM, Hyper-V.

Security boundary: guest isolation critical - escape = compromise all guests.

Partial Software ISA Implementation


Trap-and-emulate for rare/expensive operations:

1. Rare Instructions

Decimal arithmetic on VAX


→ μVax: trap to software emulation

Trade-off: slow emulation vs die area.

2. Exceptional Cases
IEEE FP denormals
→ Most FPUs: trap to software handler

Common case: hardware fast path.

Rare case: software slow path.

3. Forward Compatibility

SPARC v7 CPU running v8 binary:


- v8 multiply instruction → undefined opcode
- Trap handler: emulate in software

Old hardware runs new binaries - no recompile.

Emulation - Pure Interpretation


Memory layout:
[Emulator Code] [Emulator Data: Guest Memory Image]

Main loop:
while (!stop) {
    inst = GuestCode[PC / 4];  // fetch; assumes GuestCode is an array of 32-bit words and PC is a byte address
    PC += 4;
    decode(inst);
    execute(inst);
}

Decode Example

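void execute(uint32_t inst) {
    uint8_t opcode = inst >> 26;               // bits 31..26
    switch (opcode) {
    case 0x00: {                               // ADD (real MIPS: opcode 0 is R-type, funct 0x20 selects ADD)
        uint8_t rd = (inst >> 11) & 0x1F;
        uint8_t rs = (inst >> 21) & 0x1F;
        uint8_t rt = (inst >> 16) & 0x1F;
        GPR[rd] = GPR[rs] + GPR[rt];
        break;
    }                                          // braces give the declarations their own scope
    case 0x23:                                 // LW
        // ...
        break;
    }
}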
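
Putting the main loop and the decode step together, a complete interpreter for a three-instruction MIPS subset might look like the sketch below. It is illustrative only: the `guest_t` layout, the 64-word memory, and the choice to stop on any unknown encoding are assumptions, and overflow traps are ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define MEM_WORDS 64

typedef struct {
    uint32_t gpr[32];         /* guest register file */
    uint32_t pc;              /* guest program counter (byte address) */
    uint32_t mem[MEM_WORDS];  /* guest memory image: code and data */
    int      stop;
} guest_t;

/* Execute one guest instruction; unknown encodings stop the machine. */
static void step(guest_t *g) {
    if (g->pc / 4 >= MEM_WORDS) { g->stop = 1; return; }
    uint32_t inst = g->mem[g->pc / 4];          /* fetch */
    g->pc += 4;

    uint32_t opcode = inst >> 26;               /* decode */
    uint32_t rs = (inst >> 21) & 0x1F;
    uint32_t rt = (inst >> 16) & 0x1F;
    uint32_t rd = (inst >> 11) & 0x1F;
    uint32_t funct = inst & 0x3F;
    int32_t  imm = (int16_t)(inst & 0xFFFF);    /* sign-extended immediate */

    switch (opcode) {                           /* execute */
    case 0x00:                                  /* R-type group */
        if (funct == 0x20) g->gpr[rd] = g->gpr[rs] + g->gpr[rt];  /* ADD */
        else g->stop = 1;
        break;
    case 0x08:                                  /* ADDI */
        g->gpr[rt] = g->gpr[rs] + (uint32_t)imm;
        break;
    case 0x23: {                                /* LW */
        uint32_t addr = g->gpr[rs] + (uint32_t)imm;
        if (addr / 4 < MEM_WORDS) g->gpr[rt] = g->mem[addr / 4];
        else g->stop = 1;                       /* out-of-range guest access */
        break;
    }
    default:
        g->stop = 1;
    }
    g->gpr[0] = 0;                              /* $zero is hard-wired to 0 */
}

/* Run until the guest stops or the step budget runs out. */
void run(guest_t *g, int max_steps) {
    assert(g != NULL);
    while (!g->stop && max_steps-- > 0) step(g);
}
```

Even this tiny loop spends a dozen or more host operations per guest instruction, and every guest memory access is bounds-checked by the emulator rather than by hardware.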

Performance
~100× slower than native execution (RISC-on-RISC).

Why so slow?

1. Fetch guest instruction (memory read)
2. Decode (switch table, shifts, masks)
3. Access guest register file (array lookup)
4. Execute (host ALU)
5. Update guest state
6. Loop back

Each guest instruction = 20+ host instructions.

Security
Isolation is strong: guest code never executes directly on the host.

Attack surface: emulator bugs - memory corruption, type confusion.

Binary Translation - Static


Compile-time: translate guest ISA → native ISA.

[Guest Binary] → [Translator] → [Native Binary]

Example:

Guest (MIPS):
lw $t0, 0($a0)
addi $t1, $t0, 5

Native (x86):
mov eax, [esi]
add eax, 5 ; $t0 and $t1 share eax, valid only if $t0 is dead afterwards

Optimizations
Unlike emulation, the translator can optimize:

1. Register allocation - guest regs → native regs
2. Dead code elimination - remove unused ISA side-effects
3. Instruction scheduling - reorder for the native pipeline
4. Inlining - expand function calls

Result: 2-10× faster than emulation.

Problems
1. Indirect Jumps

Guest code:
jr $t0 // jump to register

Native code:
??? // where to jump?

Solution: PC mapping table.

PC_Map[guest_addr] = native_addr

native code:
mov eax, guest_t0
mov ebx, [PC_Map + eax*4]
jmp ebx

Cost: a table lookup on every indirect jump.

Optimization: inline the lookup for common targets (call/return).
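
In C, the PC mapping table could be sketched as below. This is an assumption-laden illustration: it uses one slot per 4-byte guest instruction, a hypothetical 4 KiB guest code region, and the value 0 as the "not translated" marker, none of which come from the text.

```c
#include <assert.h>
#include <stdint.h>

#define GUEST_CODE_BYTES 4096u            /* hypothetical guest code size */
#define NOT_TRANSLATED   ((uintptr_t)0)   /* reserved: no native code yet */

/* One slot per 4-byte guest instruction. */
static uintptr_t pc_map[GUEST_CODE_BYTES / 4];

/* Record that guest_pc was translated to native_addr; -1 on a bad guest PC. */
int pc_map_insert(uint32_t guest_pc, uintptr_t native_addr) {
    assert(native_addr != NOT_TRANSLATED);
    if (guest_pc % 4 != 0 || guest_pc >= GUEST_CODE_BYTES) return -1;
    pc_map[guest_pc / 4] = native_addr;
    return 0;
}

/* Resolve an indirect jump target. NOT_TRANSLATED tells the caller to
   fall back to the emulator (or translate the target first). */
uintptr_t pc_map_lookup(uint32_t guest_pc) {
    if (guest_pc % 4 != 0 || guest_pc >= GUEST_CODE_BYTES) return NOT_TRANSLATED;
    return pc_map[guest_pc / 4];
}
```

The bounds and alignment checks matter for security as well as correctness: the guest controls the jump target, so an unchecked index would let it steer native control flow.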

2. Self-Modifying Code

Guest code:
sw $t0, label // write to code segment
label:
add $t1, $t2, $t3

Problem: native translation of label now stale.

Solutions:
1. Interpreter fallback - detect write, mark page as "interpret only"
2. Invalidate translations - flush affected native code
3. Write-protect code pages - trap on write

Cost: any write to code = expensive.
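
The three solutions are usually combined: code pages are write-protected, the trap handler flushes the stale translations, and the page falls back to the interpreter. A minimal sketch of that bookkeeping, assuming 4 KiB pages and a hypothetical 16-page guest address space:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define NUM_PAGES  16   /* hypothetical guest address space: 16 pages of 4 KiB */

typedef struct {
    bool has_translation;  /* native code exists for this page */
    bool interpret_only;   /* written after translation: use the interpreter */
} page_state_t;

static page_state_t pages[NUM_PAGES];

/* The translator calls this after emitting native code for a page. */
void mark_translated(uint32_t guest_addr) {
    uint32_t page = guest_addr >> PAGE_SHIFT;
    assert(page < NUM_PAGES);
    pages[page].has_translation = true;
}

/* Called from the write-protection trap handler.
   Returns true if stale translations had to be flushed. */
bool on_guest_write(uint32_t guest_addr) {
    uint32_t page = guest_addr >> PAGE_SHIFT;
    assert(page < NUM_PAGES);
    if (!pages[page].has_translation) return false;  /* nothing cached for this page */
    pages[page].has_translation = false;              /* flush stale native code */
    pages[page].interpret_only = true;                /* fall back to the interpreter */
    return true;
}

bool must_interpret(uint32_t guest_addr) {
    uint32_t page = guest_addr >> PAGE_SHIFT;
    assert(page < NUM_PAGES);
    return pages[page].interpret_only;
}
```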

Modern ISAs: self-modifying code is discouraged - JIT compilers instead emit code into dedicated pages (historically mapped RWX).

3. Precise Exceptions

Guest:
inst 1
inst 2 ← exception here
inst 3

Native (reordered):
inst 3
inst 1
inst 2 ← exception here

Guest expects: exception at inst 2, inst 1 complete, inst 3 not started.

Reality: inst 3 already executed!

Solution: checkpoint the guest's architectural state, restore it on exception.

Binary Translation Architecture

[Guest Binary] ────────────────┐


│ │
[Translate] ────→ [Native Code]
│ │
[PC Mapping Table] ←───────────┘

[Emulator] ← self-modified pages

In native code, indirect jumps go through the PC table.

The emulator checks the PC table - on a hit, it jumps back into native code.

IBM AS/400 - High-Level ISA


User Applications

[High-Level Architecture Interface]

[Binary Translator] ← Software layer

[Hardware: PowerPC core]

System/38 (1978): memory-memory ISA, never directly executed.

AS/400 evolution:

- 48-bit CISC → vertical microcode → hardware
- Later: binary translation to PowerPC

Advantage: ISA stability - the hardware changes, the ISA stays constant.

Virtualization from day 1 - every application already abstract.

Dynamic Translation (JIT)


Runtime: translate + cache + optimize based on runtime info.

Disk: [Bytecode/Guest ISA]


↓ load
Runtime: [Interpreter] + [Code Cache] + [Translator]

Execution Flow

1. Start: interpret bytecode
2. Hot code detected (loop/function executed N times)
3. Translate to native, optimize
4. Cache translation
5. Execute native code
6. Miss in cache? → translate more

Examples: Java JIT, JavaScript V8, Transmeta Crusoe.

Optimization Levels
Tier 0: Interpreter (slow, no translation overhead)
Tier 1: Quick translation (minimal optimization)
Tier 2: Optimizing compiler (heavy optimization)

Trade-off: compilation time vs execution speedup.

Strategy: start at Tier 1, promote to Tier 2 if the code becomes very hot.

Transmeta Crusoe (2000)


x86 ISA → internal VLIW via software "Code Morphing".

x86 Binary

[Code Morphing Software] ← runs on boot

[VLIW Engine]

VLIW Format

64-bit: 2 RISC ops


128-bit: 4 RISC ops

Native ISA hidden - software layer translates.

Advantage: hardware simple (no x86 decode complexity).

System Architecture

[x86 BIOS] [x86 OS] [x86 Apps]



[Code Morph Software] ← resides in a hidden portion of DRAM

[Translation Cache: VLIW]

[VLIW Processor]

Boot ROM: compressed Code Morph Software.

System DRAM partitioned:

- x86-visible: OS/apps think this is all memory
- Hidden: Code Morph workspace + translation cache

Translation Example

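x86 (destination operand written first):
addl %eax, (%esp) // load from stack, add to %eax, set flags implicitly
addl %ebx, (%esp) // load the same location again, add to %ebx
movl %esi, (%ebp) // load %esi from memory
subl %ecx, 5 // subtract 5 from %ecx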

RISC ops (intermediate):


ld %r30, [%esp]
add.c %eax, %eax, %r30 // .c = set condition codes
ld %r31, [%esp]
add.c %ebx, %ebx, %r31
ld %esi, [%ebp]
sub.c %ecx, %ecx, 5

Optimized:
ld %r30, [%esp] // load once
add %eax, %eax, %r30 // no .c (not used)
add %ebx, %ebx, %r30 // reuse r30
ld %esi, [%ebp]
sub.c %ecx, %ecx, 5 // only this .c needed

VLIW scheduled:
ld %r30,[%esp]; sub.c %ecx,%ecx,5
ld %esi,[%ebp]; add %eax,%eax,%r30; add %ebx,%ebx,%r30

Optimizations: shared memory access (one load reused), redundant flag updates removed, parallel issue.
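
The redundant-flag removal in the example can be expressed as a backward liveness pass over a translation block. The sketch below is one plausible formulation, not Transmeta's actual algorithm; the `op_t` type and its fields are invented for illustration, and flags are conservatively assumed live at the end of the block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    const char *text;  /* mnemonic, for readability only */
    bool sets_flags;   /* the ".c" suffix in the notation above */
    bool reads_flags;  /* e.g. a conditional branch */
} op_t;

/* Walk the block backwards and drop every flag update that is overwritten
   before anything reads it. Returns the number of updates removed. */
size_t remove_dead_flags(op_t *ops, size_t n) {
    assert(ops != NULL || n == 0);
    bool flags_live = true;  /* code after the block may read the flags */
    size_t removed = 0;
    for (size_t i = n; i-- > 0; ) {
        if (ops[i].sets_flags) {
            if (!flags_live) { ops[i].sets_flags = false; removed++; }
            else flags_live = false;  /* this write satisfies the later readers */
        }
        if (ops[i].reads_flags) flags_live = true;  /* an earlier write is needed */
    }
    return removed;
}
```

Applied to the six RISC ops of the example, the pass strips `.c` from both `add` instructions and keeps it only on the final `sub`.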

Translation Overhead
Highly-optimizing compiler = expensive.

Strategy:

1. Interpret initially (zero translation overhead)
2. Quick translate at threshold (e.g., 100 executions)
3. Optimize heavily at higher threshold (e.g., 10000 executions)

Instrumentation: translations count their own executions and track branch directions.


x86 Compatibility Issues
1. Instruction Ordering

x86 (in-order):
addl %eax, (%esp)
addl %ebx, (%esp)

VLIW (reordered):
ld %esi,[%ebp]; add %eax,...; add %ebx,...

An exception may occur out of order, so the reported x86 PC would be wrong.

2. Precise State

Solution: Shadow Registers

Working registers: r0-r31 (VLIW uses)


Shadow registers: s0-s31 (x86 architectural state)

At translation block boundary:

commit:
s0 = r0
s1 = r1
...

On exception:

rollback:
r0 = s0
r1 = s1
...
PC = block_start
re-execute using interpreter
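
The commit/rollback pair above maps directly onto two register-file copies. A minimal software sketch follows; in Crusoe this is done in hardware, and the `cpu_state_t` layout and the `committed_pc` field are assumptions for illustration. Buffered stores, which must also be held back until commit, are not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_REGS 32

typedef struct {
    uint32_t working[NUM_REGS];  /* r0-r31: speculative values */
    uint32_t shadow[NUM_REGS];   /* s0-s31: last committed x86 state */
    uint32_t pc;                 /* current guest PC */
    uint32_t committed_pc;       /* guest PC at the last commit (block start) */
} cpu_state_t;

/* End of a translation block: working state becomes architectural. */
void commit(cpu_state_t *c) {
    assert(c != NULL);
    memcpy(c->shadow, c->working, sizeof c->shadow);
    c->committed_pc = c->pc;
}

/* Exception inside a block: discard speculative work and return to the
   block start, from where the interpreter re-executes in order. */
void rollback(cpu_state_t *c) {
    assert(c != NULL);
    memcpy(c->working, c->shadow, sizeof c->working);
    c->pc = c->committed_pc;
}
```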

3. Self-Modifying Code

x86 write to code page



Page marked as translated?
↓ Yes
Trap to Code Morph Software

Invalidate translations for page
Mark page as "translate-on-execute"

Cost: first write expensive, subsequent writes tolerable.

x86 legacy: self-modifying code is common in old code (rare today).

Security Considerations
Attack surface 1: Code Morph Software

Vulnerability in the translator = game over


- Memory corruption
- Type confusion
- Integer overflow

Mitigation: Code Morph Software signed, integrity-checked.

Attack surface 2: Translation Cache

Flush translation cache


Monitor victim translation behavior
→ Learn control flow، data access patterns

Side-channel: translation events visible via timing.

Attack surface 3: Shadow Registers

Fault injection during commit


→ Corrupt architectural state

Mitigation: ECC on shadow registers.

JIT Compilation - Security


Code Injection via JIT
Attacker-controlled input

JIT compiler

Native code (RWX page)

Problem: malicious input → malicious native code.

Example: JavaScript JIT, Java JIT.

Mitigations:

1. Type checking before compilation
2. Sandbox generated code (NaCl, WASM)
3. W^X: code pages either RW or RX, never both
4. JIT spray prevention: limit gadgets in generated code

JIT Spraying

Attacker controls data



JIT emits data as immediate values

mov eax, 0x90909090 // NOP sled
mov ebx, 0x90909090
mov ecx, 0x90909090

Jump into the middle of an instruction
→ immediate bytes decode as attacker-chosen instructions (shellcode, ROP gadgets)

Defense: randomize immediate encoding, constant blinding.
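
Constant blinding means the JIT never embeds an attacker-supplied constant verbatim. It emits the constant XORed with a secret key and restores it at run time. The sketch below shows only the arithmetic; the struct and function names are hypothetical, and a real JIT would draw the key from a random source for every constant.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t blinded;  /* value actually embedded in the generated code */
    uint32_t key;      /* secret, applied again with XOR at run time */
} blinded_imm_t;

/* Split an attacker-influenced constant so its raw bytes never appear
   in executable memory. */
blinded_imm_t blind_immediate(uint32_t value, uint32_t key) {
    assert(key != 0);  /* a zero key would leave the constant unblinded */
    blinded_imm_t b = { value ^ key, key };
    return b;
}

/* What the emitted pair "mov reg, blinded; xor reg, key" computes. */
uint32_t unblind(blinded_imm_t b) {
    return b.blinded ^ b.key;
}
```

The cost is one extra instruction per blinded constant, which is why some JITs blind only constants above a certain size.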

Spectre & JIT

JIT-compiled code:
if (index < bound) { // bounds check
load array[index]
}

Speculative execution:
Predict branch taken
Load array[out_of_bounds]
→ Cache side-channel leak

JIT = perfect Spectre target - attacker controls input → controls speculation.

Mitigation:

1. Insert lfence after bounds check


2. Speculative load hardening (SLH)
3. Index masking
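
Index masking (mitigation 3) forces an out-of-bounds index to a safe value through a data dependency rather than a branch, so a mispredicted bounds check cannot reach attacker-chosen memory. The C below only illustrates the computation a JIT would emit as machine instructions. It is a sketch, and an optimizing C compiler is free to simplify the mask away, so do not treat it as a drop-in hardening routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All-ones when index < bound, zero otherwise. */
size_t index_mask(size_t index, size_t bound) {
    return (size_t)0 - (size_t)(index < bound);
}

/* Bounds-checked load whose index is also clamped by the mask. */
uint8_t load_masked(const uint8_t *array, size_t bound, size_t index) {
    assert(array != NULL && bound > 0);
    if (index >= bound) return 0;       /* architectural bounds check */
    index &= index_mask(index, bound);  /* out-of-bounds index collapses to 0 */
    return array[index];
}
```

Compared with `lfence`, masking keeps the pipeline flowing: the load still issues speculatively, but only ever at an in-bounds address.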

Hypervisor Types
Type 1 (Bare-Metal)

[Guest OS 1] [Guest OS 2]
[Hypervisor]
[Hardware]

Examples: VMware ESXi, Xen, Hyper-V.

Security: hypervisor = TCB (Trusted Computing Base) - must be minimal.

Type 2 (Hosted)

[Guest OS]
[Hypervisor/VMM]
[Host OS]
[Hardware]

Examples: VirtualBox, VMware Workstation.

Security: attack surface = host OS + hypervisor.

VM Escape Attacks
Attacker in Guest

Exploit hypervisor bug

Escape to Host

Compromise all guests

Common vulnerabilities:

- Device emulation bugs (e.g., virtual NIC)
- DMA attacks (guest DMA to hypervisor memory)
- Hypercall interface bugs

Defense:

1. Minimize hypervisor TCB
2. Hardware virtualization (VT-x, AMD-V) - trap in hardware
3. IOMMU - isolate guest DMA
4. Formal verification (seL4)

Hardware Virtualization Support


VT-x (Intel) / AMD-V
Two modes:

- Root mode: hypervisor runs
- Non-root mode: guest OS runs

VMCS (Virtual Machine Control Structure): saved guest state.

VM Entry (hypervisor → guest):


Load VMCS
Switch to non-root mode

VM Exit (guest → hypervisor):


Trap on privileged operation
Save guest state to VMCS
Switch to root mode

Performance: hardware trap faster than software emulation.

Security: guest can't escape non-root mode - hardware enforced.

Nested Paging (EPT / NPT)


Problem: guest page table translation.
Guest virtual addr
↓ Guest page table
Guest physical addr
↓ Hypervisor page table (shadow PT - slow!)
Host physical addr

Solution: Hardware nested page tables

Guest VA → Guest PA (guest PT) → Host PA (EPT)

One hardware walk - the MMU traverses both tables itself, with no hypervisor intervention.

Security: guest can't access hypervisor memory - EPT enforces isolation.

Case Study: Rowhammer via VM


Attacker VM:
while (1) {
access(row A)
access(row B)
}

Bit flip in adjacent row C

Row C belongs to hypervisor!

Modify page tables → escape VM

Defense:

1. ECC memory (detects flips)


2. Memory refresh rate increase
3. TRR (Target Row Refresh)
4. Isolate VMs to separate DRAM chips

Performance Virtualization
Para-virtualization
Guest OS knows it's virtualized - uses hypercalls instead of privileged instructions.
Guest OS:
// Instead of: cli (disable interrupts)
hypercall(DISABLE_INTERRUPTS)

Advantage: faster than trap-and-emulate.

Disadvantage: requires OS modification.

Example: Xen PV guests.

Hardware Acceleration
Pass-through devices: guest directly accesses hardware.

[Guest] → [Hardware NIC]


(no hypervisor in data path)

SR-IOV: single physical device → multiple virtual functions.

Security: IOMMU mandatory - prevent DMA attacks.

Hypervisor & VM Security


Hypervisor FSM Vulnerabilities
VM state machine:

States:
VM_RUNNING → VM_EXIT_PENDING → VM_STOPPED → EMULATE_DEVICE →
VM_ENTRY_PENDING → VM_RUNNING

Critical transition: VM exit handling

Guest executes privileged instruction:


1. Hardware traps (VM exit)
2. FSM: VM_RUNNING → VM_EXIT_PENDING
3. Save guest state to VMCS
4. Load hypervisor state
5. FSM: VM_EXIT_PENDING → VM_STOPPED
6. Jump to hypervisor handler
Glitch attack on VM exit:

Voltage drop during step 3-4:


→ Guest state partially saved
→ Hypervisor state partially loaded
→ Mixed context: guest+hypervisor
→ Guest registers leaked to hypervisor
→ Or: skip privilege checks entirely

Example: VMLAUNCH glitching

Normal FSM:
VMLAUNCH instruction
→ Check VMCS validity (microcode)
→ If invalid: VMfail (error reported through RFLAGS and the VM-instruction error field)
→ If valid: enter guest

Glitched FSM:
Voltage drop during validity check
→ Comparison result flips: invalid→valid
→ Enter guest with malformed VMCS
→ Guest controls hypervisor page tables
→ VM escape

Binary Translator FSM Attacks


Translation state machine:

INTERPRET → PROFILE → DETECT_HOT → TRANSLATE → CACHE → EXECUTE_NATIVE


↑ ↓
└──────────────── INVALIDATE ←──────────────────────────┘

State corruption attack:

Normal: INTERPRET → TRANSLATE


Glitched: INTERPRET → EXECUTE_NATIVE

What happens:
Guest bytecode treated as native code
→ Execute arbitrary host instructions
→ Privilege escalation
Cache poisoning:

Translation cache: [guest_PC] → [native_code_ptr]

Fault injection in cache write:


Corrupt native_code_ptr
→ Points to attacker-controlled memory
→ Guest PC maps to malicious native code
→ Code reuse attack

Microcode Atomicity in Virtualization


VT-x VM-entry (VMLAUNCH/VMRESUME) microcode sequence (simplified, illustrative):

μop0: Check VMCS pointer valid


μop1: Check guest state fields
μop2: Save host CR3, RSP, RIP
μop3: Load guest CR3
μop4: Load guest RSP
μop5: Load guest RIP
μop6: Switch to non-root mode
μop7: Flush TLB
μop8: Jump to guest RIP

Interrupt during microcode:

NMI arrives at μop3:


Host CR3 saved, guest CR3 not yet loaded
→ Page tables inconsistent
→ NMI handler runs with mixed context

L1 Terminal Fault (CVE-2018-3646 for the hypervisor variant; CVE-2018-3620 is the OS/SMM variant):

VMENTER microcode:
μop1: Load guest page tables
μop2: Flush L1 cache ← NMI here!
μop3: Enter guest mode

NMI handler executes:


- L1 not flushed yet
- L1 contains hypervisor secrets
- Speculative execution in NMI handler
- Guest page tables active
- Speculative load: L1[secret] → leak via cache timing

Intel mitigation:

Mark VMENTER/VMEXIT as:


- Restartable atomic sections
- NMI delivered only at safe points
- Full L1 flush before resuming guest

AEX-based attack (SGX; AEX-Notify is Intel's later mitigation):

EENTER microcode (enter enclave):


μop10: Load enclave TCS (Thread Control Structure)
μop11: Clear debug registers
μop12: Initialize enclave stack
μop13: ...

AEX (Async Enclave Exit) on interrupt:


If interrupt at μop11:
→ Debug registers partially cleared
→ Enclave state partially initialized
→ Host can observe intermediate state
→ Leak enclave secrets

JIT Compiler FSM Security


Tiered compilation FSM:

INTERPRET → [threshold] → TIER1_JIT → [threshold] → TIER2_JIT


↑ ↓
└──────────── DEOPTIMIZE ←──────────────────────────┘

State corruption: skip tier

Normal: INTERPRET → TIER1 → TIER2


Glitched: INTERPRET → TIER2

Problem:
TIER2 assumes type profiling done in TIER1
→ Missing type guards
→ Type confusion
→ Memory corruption

Deoptimization attack:

TIER2 optimized code assumes:


x: Integer (based on profiling)

Runtime: x becomes Object

Normal FSM:
Detect type change → DEOPTIMIZE → INTERPRET → re-profile

Glitched FSM:
Type guard check fails
→ Skip deoptimization
→ Continue with wrong type assumption
→ Treat Object pointer as Integer
→ Arbitrary read/write
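
The guard that the glitch skips is, in essence, a tag comparison in front of the specialised fast path. A minimal sketch with a hypothetical tagged-value representation (real engines use pointer tagging and hidden-class checks, and overflow handling is omitted here):

```c
#include <assert.h>
#include <stdint.h>

typedef enum { TAG_INT, TAG_OBJECT } tag_t;

typedef struct {
    tag_t tag;
    union { int32_t i; void *obj; } as;
} value_t;

int deopt_count = 0;  /* how often the fast path had to bail out */

value_t make_int(int32_t i)  { value_t v; v.tag = TAG_INT;    v.as.i = i;   return v; }
value_t make_object(void *p) { value_t v; v.tag = TAG_OBJECT; v.as.obj = p; return v; }

/* Generic slow path, standing in for a return to the interpreter. */
static value_t add_one_generic(value_t x) {
    assert(x.tag == TAG_INT || x.tag == TAG_OBJECT);
    if (x.tag == TAG_INT) return make_int(x.as.i + 1);
    return make_int(0);  /* placeholder: an object would be converted first */
}

/* "Optimized" code specialised for integers, protected by a type guard. */
value_t add_one_optimized(value_t x) {
    if (x.tag != TAG_INT) {         /* guard: the profile-based assumption failed */
        deopt_count++;
        return add_one_generic(x);  /* deoptimize instead of misreading x */
    }
    return make_int(x.as.i + 1);    /* fast path: x.as.i is known to be valid */
}
```

Without the guard, the fast path would read `x.as.i` from a value whose union actually holds a pointer, which is exactly the pointer-as-integer confusion described above.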

V8 example:

function f(x) {
  return x + 1; // TIER2 assumes x is an Integer
}

// Profile phase: always Integer
for (let i = 0; i < 10000; i++) f(42);

// Attack phase: pass an Object
let obj = {valueOf: () => { /* exploit */ }};
f(obj); // the type guard should fail here and deoptimize

Glitched: skip the guard, treat obj as an Integer

→ Read obj's map (hidden class) pointer as a number
→ Leak heap layout

Transmeta Code Morphing FSM


Translation cache lookup:
x86_PC → hash() → index → check_tag → HIT: execute_VLIW
→ MISS: translate → cache → execute
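
The lookup path (hash, index, tag check) resembles a direct-mapped cache. The sketch below is illustrative only: the actual Code Morphing hash and entry layout are not given in the text, so `hash_pc`, the 256-entry size, and the `code_offset` field are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define CACHE_ENTRIES 256  /* hypothetical size; must be a power of two */

typedef struct {
    uint32_t tag;          /* full x86 PC, compared on every lookup */
    uint32_t code_offset;  /* where the VLIW translation lives */
    int      valid;
} tcache_entry_t;

static tcache_entry_t tcache[CACHE_ENTRIES];

/* Illustrative hash only. */
static uint32_t hash_pc(uint32_t pc) {
    return (pc ^ (pc >> 12)) & (CACHE_ENTRIES - 1);
}

/* Install a translation, evicting whatever occupied the slot. */
void tcache_insert(uint32_t pc, uint32_t code_offset) {
    tcache_entry_t *e = &tcache[hash_pc(pc)];
    e->tag = pc;
    e->code_offset = code_offset;
    e->valid = 1;
}

/* Returns 1 and fills *code_offset on a hit; 0 means "translate first". */
int tcache_lookup(uint32_t pc, uint32_t *code_offset) {
    assert(code_offset != NULL);
    const tcache_entry_t *e = &tcache[hash_pc(pc)];
    if (!e->valid || e->tag != pc) return 0;  /* empty slot or a different PC */
    *code_offset = e->code_offset;
    return 1;
}
```

Storing the full PC as the tag means a lookup that lands in the wrong slot is reported as a miss, so the tag comparison is the check a fault would have to defeat.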

Hash table corruption:

Normal: hash(0x401000) = index 0x50 → VLIW_A


Fault: bit flip in index: 0x50 → 0x51 → VLIW_B

Execute wrong cached translation


→ Arbitrary VLIW code execution

Shadow register glitching:

Commit point (every 16 x86 instructions):


atomic {
r0_shadow = r0;
r1_shadow = r1;
...
r31_shadow = r31;
drain_store_buffer();
}

Voltage glitch during commit:


→ Partial shadow update
→ Exception occurs
→ Exception handler reads inconsistent shadow state
→ Mix of old and new register values
→ Information leak or corruption

Translation corruption:

TRANSLATE state:
1. Parse x86 instruction
2. Generate VLIW μops
3. Optimize
4. Emit VLIW code to cache

Fault at step 4:
→ Partial VLIW code written
→ Cache entry marked valid (bit flip)
→ Next execution: run partial code
→ Undefined behavior
VM/JIT Attack Surface Summary
Component         | Attack Vector          | Impact
Hypervisor FSM    | Voltage glitch VM exit | Guest escape
VMCS validity     | Glitch check logic     | Malformed guest state
VMENTER microcode | NMI during atomicity   | Host secret leak (L1TF)
JIT tier FSM      | Skip tier transition   | Type confusion
Deopt guard       | Glitch type check      | Arbitrary R/W
Translation cache | Hash collision         | Code reuse
Shadow registers  | Commit glitch          | State leak
Microcode match   | Pattern corruption     | Wrong patch applied

Defense layers:

1. Hardware: Voltage sensors, critical path monitors


2. Microcode: Atomic sections, safe interrupt points
3. Software: Type guards, redundant checks
4. Architecture: Flush secrets before guest entry
