UNIT – 2
1. Virtual Machines and Virtualization
Virtualization technology benefits the computer and IT industries by enabling users to
share expensive hardware resources by multiplexing VMs on the same set of hardware
hosts. Virtualization is a computer architecture technology by which multiple virtual
machines (VMs) are multiplexed in the same hardware machine. The idea of VMs can be
traced back to the 1960s.
The purpose of a VM is to enhance resource sharing by many users and improve
computer performance in terms of resource utilization and application flexibility.
Hardware resources (CPU, memory, I/O devices, etc.) or software resources (operating
system and software libraries) can be virtualized in various functional layers. This
virtualization technology has been revitalized as the demand for distributed and cloud
computing increased sharply in recent years.
The idea is to separate the hardware from the software to yield better system efficiency.
For example, computer users gained access to much enlarged memory space when the
concept of virtual memory was introduced. Similarly, virtualization techniques can be
applied to enhance the use of compute engines, networks, and storage.
According to a 2009 Gartner Report, virtualization was the top strategic technology
poised to change the computer industry. With sufficient storage, any computer platform
can be installed in another host computer, even if they use processors with different
instruction sets and run with distinct operating systems on the same hardware.
2. IMPLEMENTATION LEVELS OF VIRTUALIZATION
2.1 Levels of Virtualization Implementation
Figure 1: The architecture of a computer system before and after virtualization, where VMM stands for
virtual machine monitor.
A traditional computer runs with a host operating system specially tailored for its
hardware architecture, as shown in Figure 1. After virtualization, different user
applications managed by their own operating systems (guest OS) can run on the same
hardware, independent of the host OS. This is often done by adding software, called a
virtualization layer, as shown in the after-virtualization part of Figure 1.
This virtualization layer is known as the hypervisor or virtual machine monitor (VMM). The
VMs are shown in the upper boxes, where applications run with their own guest OS over
the virtualized CPU, memory, and I/O resources.
The main function of the software layer for virtualization is to virtualize the physical
hardware of a host machine into virtual resources to be used by the VMs, exclusively.
This can be implemented at various operational levels. The virtualization software creates
the abstraction of VMs by interposing a virtualization layer at various levels of a
computer system.
Common virtualization layers include the instruction set architecture (ISA) level,
hardware level, operating system level, library support level, and application level.
Figure 2: Virtualization ranging from hardware to applications in five abstraction levels.
The functions of different layers mentioned in Figure 2 are as follows:
At the ISA level, virtualization is performed by emulating a given ISA by the ISA of the
host machine. For example, MIPS binary code can run on an x86-based host machine
with the help of ISA emulation.
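To make ISA emulation concrete, here is a minimal Python sketch of the fetch-decode-execute loop an emulator runs. The guest instruction subset (MIPS-like `addi`, `add`, `bne`), the pre-decoded tuple encoding, and the `emulate` function are illustrative assumptions; a real emulator would first decode 32-bit MIPS machine words and handle details such as branch delay slots.

```python
# ISA-level emulation sketch: the host interprets guest instructions one at
# a time. ASSUMPTION: the "guest ISA" is a tiny MIPS-like subset given as
# pre-decoded tuples; real MIPS binaries would need decoding first.

def emulate(program, max_steps=10_000):
    regs = [0] * 32            # guest register file; r0 always reads as 0
    pc = 0                     # guest program counter (index into program)
    steps = 0
    while 0 <= pc < len(program) and steps < max_steps:
        op, *args = program[pc]            # fetch and decode
        next_pc = pc + 1
        if op == "addi":                   # rd = rs + immediate
            rd, rs, imm = args
            regs[rd] = regs[rs] + imm
        elif op == "add":                  # rd = rs + rt
            rd, rs, rt = args
            regs[rd] = regs[rs] + regs[rt]
        elif op == "bne":                  # branch to target if rs != rt
            rs, rt, target = args
            if regs[rs] != regs[rt]:
                next_pc = target
        else:
            raise ValueError(f"unsupported guest opcode: {op}")
        regs[0] = 0                        # re-enforce the hard-wired zero register
        pc = next_pc
        steps += 1
    return regs


# Guest program: sum 5 + 4 + 3 + 2 + 1 into r2.
program = [
    ("addi", 1, 0, 5),    # r1 = 5 (loop counter)
    ("addi", 2, 0, 0),    # r2 = 0 (accumulator)
    ("add", 2, 2, 1),     # r2 = r2 + r1   (loop body, index 2)
    ("addi", 1, 1, -1),   # r1 = r1 - 1
    ("bne", 1, 0, 2),     # repeat while r1 != 0
]
regs = emulate(program)
print(regs[2])            # prints 15
```

Every guest instruction costs several host operations here, which is why pure emulation is flexible but slow.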
Hardware-level virtualization is performed right on top of the bare hardware. On the
one hand, this approach generates a virtual hardware environment for a VM. On the other
hand, the process manages the underlying hardware through virtualization.
OS-level virtualization creates isolated containers on a single physical server; the OS
instances in these containers share the server's hardware and software in data centers. The
containers behave like real servers. OS-level virtualization is commonly used in creating
virtual hosting environments to allocate hardware resources among a large number of
mutually distrusting users.
Virtualization with library interfaces is possible by controlling the communication link
between applications and the rest of a system through API hooks. The software tool
WINE has implemented this approach to support Windows applications on top of UNIX
hosts.
Application-level virtualization is also known as process-level virtualization. The most
popular approach is to deploy high-level language (HLL) VMs. In this scenario, the
virtualization layer sits as an application program on top of the operating system, and the
layer exports an abstraction of a VM that can run programs written and compiled to a
particular abstract machine definition.
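A minimal sketch of the HLL VM idea follows: a stack-based abstract machine in the spirit of the JVM or .NET CLR. The opcodes and the `run` function are invented for illustration and do not reproduce any real bytecode format; the point is that the same "compiled" program runs on any host that provides this interpreter.

```python
# Hypothetical HLL-VM sketch: a stack-based abstract machine. The opcodes
# below are invented for illustration, not taken from a real VM.

def run(bytecode, env=None):
    stack = []
    env = dict(env or {})          # variables visible to the program
    for op, *arg in bytecode:
        if op == "push":           # push a constant
            stack.append(arg[0])
        elif op == "load":         # push a variable's value
            stack.append(env[arg[0]])
        elif op == "store":        # pop the top of stack into a variable
            env[arg[0]] = stack.pop()
        elif op in ("add", "mul"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "add" else a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return env


# "Compiled" form of  y = (x + 2) * 3
bytecode = [("load", "x"), ("push", 2), ("add",), ("push", 3), ("mul",), ("store", "y")]
result = run(bytecode, {"x": 4})
print(result["y"])   # prints 18
```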
2.2 VMM Design Requirements and Providers
As mentioned earlier, hardware-level virtualization inserts a layer between real hardware
and traditional operating systems. This layer is commonly called the Virtual Machine
Monitor (VMM), and it manages the hardware resources of a computing system. Each
time a program accesses the hardware, the VMM intercepts the access. In this sense, the
VMM acts as a traditional OS. One hardware component, such as the CPU, can be
virtualized as several virtual copies. Therefore, several traditional operating systems,
whether the same or different, can sit on the same set of hardware simultaneously.
There are three requirements for a VMM.
1. A VMM should provide an environment for programs which is essentially identical to
the original machine.
2. Programs run in this environment should show, at worst, only minor decreases in
speed.
3. A VMM should be in complete control of the system resources.
Any program run under a VMM should exhibit a function identical to that which it runs
on the original machine directly.
A VMM should demonstrate efficiency in using the VMs. Compared with a physical
machine, no one prefers a VMM if its efficiency is too low. Traditional emulators and
complete software interpreters (simulators) emulate each instruction by means of
functions or macros. Such a method provides the most flexible solutions for VMMs.
However, emulators or simulators are too slow to be used as real machines. To guarantee
the efficiency of a VMM, a statistically dominant subset of the virtual processor’s
instructions needs to be executed directly by the real processor, with no software
intervention by the VMM.
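The efficiency requirement can be made concrete with a back-of-the-envelope cost model. The model is an illustrative assumption, not a formula from the text: if a fraction f of guest instructions must be handled by the VMM in software at a cost of c native instructions each, run time relative to native execution is (1 - f) + f * c. The cost values 30 and 100 below are made-up illustrations.

```python
def slowdown(trap_fraction, handling_cost):
    """Run time relative to native execution under a simple cost model.

    trap_fraction: fraction of guest instructions the VMM handles in software
    handling_cost: cost of handling one such instruction, measured in units of
                   one natively executed instruction (illustrative assumption)
    """
    if not 0.0 <= trap_fraction <= 1.0:
        raise ValueError("trap_fraction must lie in [0, 1]")
    return (1.0 - trap_fraction) + trap_fraction * handling_cost


pure_emulation = slowdown(1.0, 30)      # every instruction interpreted
direct_mostly = slowdown(0.001, 100)    # 99.9% of instructions run natively
print(pure_emulation, round(direct_mostly, 3))   # 30.0 1.099
```

Even with an expensive per-trap cost, the slowdown stays small as long as the directly executed subset dominates, which is the point of the requirement above.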
2.3 Virtualization Support at the OS Level
With the help of VM technology, a new computing mode known as cloud computing is
emerging. Cloud computing is transforming the computing landscape by shifting the
hardware and staffing costs of managing a computational center to third parties, just like
banks. However, cloud computing has at least two challenges. The first is the ability to
use a variable number of physical machines and VM instances depending on the needs of
a problem. For example, a task may need only a single CPU during some phases of
execution but may need hundreds of CPUs at other times. The second challenge concerns
the slow operation of instantiating new VMs. Currently, new VMs originate either as
fresh boots or as replicates of a template VM, unaware of the current application state.
Therefore, to better support cloud computing, a large amount of research and
development should be done.
Reason for Virtualization Support at the OS Level
As mentioned earlier, it is slow to initialize a hardware-level VM because each VM
creates its own image from scratch. In a cloud computing environment, perhaps
thousands of VMs need to be initialized simultaneously. Besides slow operation, storing
the VM images also becomes an issue.
Operating system virtualization inserts a virtualization layer inside an operating system to
partition a machine’s physical resources. It enables multiple isolated VMs within a single
operating system kernel. This kind of VM is often called a virtual execution environment
(VE), a Virtual Private System (VPS), or simply a container.
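The following toy model sketches the idea of several isolated VEs inside one kernel. The `SharedKernel` class and its methods are hypothetical; real systems build this on kernel features (on Linux, namespaces and control groups), which this sketch only imitates. Note that creating a container is pure bookkeeping, which is why OS-level VMs start quickly.

```python
import itertools

class SharedKernel:
    """Toy model of OS-level virtualization (hypothetical, for illustration).

    One kernel object hosts several containers. Each container has its own
    hostname and PID namespace, but every process still runs on this single
    shared kernel, so no guest OS has to boot when a container is created.
    """

    def __init__(self):
        self._host_pids = itertools.count(1000)   # PIDs as the host sees them
        self.containers = {}

    def create_container(self, cid, hostname):
        # Only bookkeeping happens here, hence near-zero startup cost.
        self.containers[cid] = {"hostname": hostname, "procs": {}, "next_pid": 1}

    def spawn(self, cid, program):
        c = self.containers[cid]
        local_pid = c["next_pid"]                 # PID as seen inside the container
        c["next_pid"] += 1
        c["procs"][local_pid] = (next(self._host_pids), program)
        return local_pid

    def ps(self, cid):
        # A container can list only its own processes.
        return {pid: prog for pid, (_, prog) in self.containers[cid]["procs"].items()}


kernel = SharedKernel()
kernel.create_container("web", "web01")
kernel.create_container("db", "db01")
web_pid = kernel.spawn("web", "nginx")
db_pid = kernel.spawn("db", "postgres")
print(web_pid, db_pid, kernel.ps("web"))   # 1 1 {1: 'nginx'}
```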
Advantages of OS Extensions
Compared to hardware-level virtualization, the benefits of OS extensions are twofold:
(1) VMs at the operating system level have minimal startup/shutdown costs, low resource
requirements, and high scalability.
(2) For an OS-level VM, it is possible for a VM and its host environment to synchronize
state changes when necessary.
Disadvantages of OS Extensions
The main disadvantage of OS extensions is that all the OS-level VMs on a single host
must run the same kind of guest operating system. That is, although
different OS-level VMs may have different operating system distributions, they must
pertain to the same operating system family. For example, a Windows distribution such
as Windows XP cannot run on a Linux-based container. However, users of cloud
computing have various preferences. Some prefer Windows and others prefer Linux or
other operating systems. Therefore, there is a challenge for OS-level virtualization in
such cases.
2.4 Middleware Support for Virtualization
Library-level virtualization is also known as user-level Application Binary Interface
(ABI) or API emulation. This type of virtualization can create execution environments
for running alien programs on a platform rather than creating a VM to run the entire
operating system. API call interception and remapping are the key functions performed.
The WABI offers middleware to convert Windows system calls to Solaris system calls.
Lxrun is really a system call emulator that enables Linux applications written for x86
hosts to run on UNIX systems. Similarly, Wine offers library support for virtualizing x86
processors to run Windows applications on UNIX hosts. Visual MainWin offers a
compiler support system to develop Windows applications using Visual Studio to run on
some UNIX hosts.
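API call interception and remapping can be sketched as a dispatch table. This is not how WABI, Lxrun, or Wine are implemented; it only shows the principle. The foreign API names (`ForeignWriteFile`, `ForeignGetTickCountMs`) and the host service `host_write` are hypothetical, and "files" are kept in a dictionary rather than written to disk.

```python
import time

class ApiRemapper:
    """Library-level virtualization sketch: calls to a foreign API are
    intercepted and remapped onto host services. All API names are hypothetical."""

    def __init__(self):
        self.table = {}      # foreign API name -> host implementation
        self.calls = []      # every intercepted call, in order

    def register(self, foreign_name, host_fn):
        self.table[foreign_name] = host_fn

    def call(self, foreign_name, *args):
        self.calls.append(foreign_name)          # the API hook: interception point
        host_fn = self.table.get(foreign_name)
        if host_fn is None:
            raise NotImplementedError(f"{foreign_name} is not emulated")
        return host_fn(*args)                    # remapped onto the host service


# Hypothetical host-side service standing in for native UNIX functionality.
host_files = {}

def host_write(path, data):
    host_files[path] = host_files.get(path, b"") + data
    return len(data)

remapper = ApiRemapper()
remapper.register("ForeignWriteFile", host_write)
remapper.register("ForeignGetTickCountMs", lambda: int(time.monotonic() * 1000))

written = remapper.call("ForeignWriteFile", "app.log", b"hello")
ticks = remapper.call("ForeignGetTickCountMs")
print(written, host_files, remapper.calls)
# 5 {'app.log': b'hello'} ['ForeignWriteFile', 'ForeignGetTickCountMs']
```

The alien program never learns that its calls were served by host functions; only the calls it actually makes need a mapping, which is far cheaper than booting a whole foreign OS in a VM.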
3. VIRTUALIZATION STRUCTURES/TOOLS AND MECHANISMS
In general, there are three typical classes of VM architecture. Figure 1 showed the
architectures of a machine before and after virtualization. Before virtualization, the
operating system manages the hardware. After virtualization, a virtualization layer is
inserted between the hardware and the operating system. In such a case, the virtualization
layer is responsible for converting portions of the real hardware into virtual hardware.
Therefore, different operating systems such as Linux and Windows can run on the same
physical machine, simultaneously.
Depending on the position of the virtualization layer, there are several classes of VM
architectures, namely the hypervisor architecture, paravirtualization, and host-based
virtualization. The hypervisor is also known as the VMM (Virtual Machine Monitor).
They both perform the same virtualization operations.
3.1 Hypervisor and Xen Architecture
The hypervisor software sits directly between the physical hardware and its OS. This
virtualization layer is referred to as either the VMM or the hypervisor. The hypervisor
provides hypercalls for the guest OSes and applications. Depending on the functionality,
a hypervisor can adopt a micro-kernel architecture, such as Microsoft Hyper-V.
A micro-kernel hypervisor includes only the basic and unchanging functions (such as
physical memory management and processor scheduling). The device drivers and other
changeable components are outside the hypervisor.
Xen is an open source hypervisor program developed by Cambridge University. Xen
is a microkernel hypervisor, which separates the policy from the mechanism. The Xen
hypervisor implements all the mechanisms, leaving the policy to be handled by Domain
0, as shown in Figure 3. Xen does not include any device drivers natively. It just provides
a mechanism by which a guest OS can have direct access to the physical devices. As a
result, the size of the Xen hypervisor is kept rather small. Xen provides a virtual
environment located between the hardware and the OS.
A number of vendors are in the process of developing commercial Xen hypervisors,
among them are Citrix XenServer and Oracle VM.
Figure 3: The Xen architecture’s special domain 0 for control and I/O, and several guest domains for
user applications
The core components of a Xen system are the hypervisor, kernel, and applications. The
organization of the three components is important. Like other virtualization systems,
many guest OSes can run on top of the hypervisor. However, not all guest OSes are
created equal, and one in particular controls the others. The guest OS, which has control
ability, is called Domain 0, and the others are called Domain U. Domain 0 is a privileged
guest OS of Xen. It is first loaded when Xen boots without any file system drivers being
available. Domain 0 is designed to access hardware directly and manage devices.
Therefore, one of the responsibilities of Domain 0 is to allocate and map hardware
resources for the guest domains (the Domain U domains).
3.2 Binary Translation with Full Virtualization
Depending on implementation technologies, hardware virtualization can be classified into
two categories: full virtualization and host-based virtualization.
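Full virtualization does not need to modify the guest OS. It relies on binary translation to
trap and to virtualize the execution of certain sensitive, nonvirtualizable instructions. The
guest OSes and their applications consist of noncritical and critical instructions.
Noncritical instructions run directly on the hardware, while critical instructions are
discovered and replaced with traps into the VMM to be emulated by software.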
In a host-based system, both a host OS and a guest OS are used. A virtualization
software layer is built between the host OS and guest OS.
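Binary translation of a guest basic block can be sketched as follows. Assumptions: instructions are plain strings; `CRITICAL` lists a few x86 instructions commonly cited as sensitive but unprivileged (they do not trap outside Ring 0); and the `vmm_emulate_*` call targets are hypothetical names. Caching translated blocks by address reflects the usual practice of translating each block only once.

```python
# ASSUMPTION: CRITICAL holds a few x86 instructions often cited as sensitive
# but unprivileged; "vmm_emulate_*" call targets are hypothetical.
CRITICAL = {"popf", "pushf", "sgdt", "sidt", "smsw"}

def translate_block(guest_block, cache, block_addr):
    """Translate one guest basic block, caching the result by address."""
    if block_addr in cache:             # reuse an earlier translation
        return cache[block_addr]
    translated = []
    for insn in guest_block:
        opcode = insn.split()[0].lower()
        if opcode in CRITICAL:
            # Critical instruction: replace with a call into the VMM,
            # which emulates its effect on the virtual CPU state.
            translated.append(f"call vmm_emulate_{opcode}")
        else:
            # Noncritical instruction: runs natively, copied unchanged.
            translated.append(insn)
    cache[block_addr] = translated
    return translated


cache = {}
block = ["mov eax, 1", "popf", "add eax, 2"]
out = translate_block(block, cache, 0x1000)
print(out)  # ['mov eax, 1', 'call vmm_emulate_popf', 'add eax, 2']
```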
3.3 Para-Virtualization with Compiler Support
Para-virtualization needs to modify the guest operating systems. A para-virtualized VM
provides special APIs requiring substantial OS modifications in user applications.
Performance degradation is a critical issue of a virtualized system. No one wants to use a
VM if it is much slower than using a physical machine. The virtualization layer can be
inserted at different positions in a machine software stack. However, para-virtualization
attempts to reduce the virtualization overhead, and thus improve performance by
modifying only the guest OS kernel.
Figure 4: Para-virtualized VM architecture, which involves modifying the guest OS kernel to replace
nonvirtualizable instructions with hypercalls for the hypervisor or the VMM to carry out the
virtualization process
Figure 5: The use of a para-virtualized guest OS assisted by an intelligent compiler to replace
nonvirtualizable OS instructions by hypercalls
Figure 4 illustrates the concept of a para-virtualized VM architecture. The guest
operating systems are para-virtualized. They are assisted by an intelligent compiler to
replace the nonvirtualizable OS instructions by hypercalls as illustrated in Figure 5. The
traditional x86 processor offers four instruction execution rings: Rings 0, 1, 2, and 3. The
lower the ring number, the higher the privilege of instruction being executed. The OS is
responsible for managing the hardware and the privileged instructions to execute at Ring
0, while user-level applications run at Ring 3.
3.3.1 Para-Virtualization Architecture
When the x86 processor is virtualized, a virtualization layer is inserted between the
hardware and the OS. According to the x86 ring definition, the virtualization layer should
also be installed at Ring 0. Since the guest OS kernel also expects to run at Ring 0, some of its instructions may cause problems. In
Figure 5, we show that para-virtualization replaces nonvirtualizable instructions with
hypercalls that communicate directly with the hypervisor or VMM. However, when the
guest OS kernel is modified for virtualization, it can no longer run on the hardware
directly.
Although para-virtualization reduces the overhead, it has incurred other problems. First,
its compatibility and portability may be in doubt, because it must support the unmodified
OS as well. Second, the cost of maintaining para-virtualized OSes is high, because they
may require deep OS kernel modifications. Finally, the performance advantage of para-
virtualization varies greatly due to workload variations. Compared with full
virtualization, para-virtualization is relatively easy and more practical. The main problem
in full virtualization is its low performance in binary translation. To speed up binary
translation is difficult. Therefore, many virtualization products employ the para-
virtualization architecture. The popular Xen, KVM, and VMware ESX are good
examples.
3.3.2 Para-Virtualization with Compiler Support
Unlike the full virtualization architecture which intercepts and emulates privileged and
sensitive instructions at runtime, para-virtualization handles these instructions at compile
time. The guest OS kernel is modified to replace the privileged and sensitive instructions
with hypercalls to the hypervisor or VMM. Xen adopts such a para-virtualization
architecture.
The guest OS running in a guest domain may run at Ring 1 instead of at Ring 0. This
implies that the guest OS may not be able to execute some privileged and sensitive
instructions. The privileged instructions are implemented by hypercalls to the hypervisor.
After replacing the instructions with hypercalls, the modified guest OS emulates the
behavior of the original guest OS. On a UNIX system, a system call involves an
interrupt or service routine. The hypercalls apply a dedicated service routine in Xen.
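The sketch below contrasts the two paths: an unmodified kernel running at Ring 1 would fault on a privileged operation, whereas the para-virtualized kernel replaces that operation with a hypercall serviced by the hypervisor at Ring 0. All class names, the `set_page_table` hypercall, and the page-table example are hypothetical and do not reflect Xen's actual hypercall interface.

```python
class PrivilegeFault(Exception):
    """Raised when a privileged operation is attempted outside Ring 0."""

class Hypervisor:
    """Runs at Ring 0 and services hypercalls (all names are hypothetical)."""
    RING = 0

    def __init__(self):
        self.page_table_root = {}                  # per-domain address-space root
        self.handlers = {"set_page_table": self._set_page_table}

    def hypercall(self, domain, name, *args):
        # Dedicated entry point, analogous to a system-call service routine.
        return self.handlers[name](domain, *args)

    def _set_page_table(self, domain, root):
        # A real hypervisor would validate the request before applying it.
        self.page_table_root[domain] = root
        return 0

def load_page_table_directly(ring, root):
    """What the unmodified kernel would do: a privileged register write."""
    if ring != 0:
        raise PrivilegeFault(f"privileged instruction attempted at Ring {ring}")
    return root

class ParavirtGuestKernel:
    """Modified guest kernel running at Ring 1."""
    RING = 1

    def __init__(self, domain, hv):
        self.domain, self.hv = domain, hv

    def switch_address_space(self, root):
        # The privileged instruction has been replaced with a hypercall.
        return self.hv.hypercall(self.domain, "set_page_table", root)


hv = Hypervisor()
guest = ParavirtGuestKernel("domU-1", hv)
status = guest.switch_address_space(0x5000)
try:
    load_page_table_directly(ParavirtGuestKernel.RING, 0x5000)
    faulted = False
except PrivilegeFault:
    faulted = True
print(status, hex(hv.page_table_root["domU-1"]), faulted)   # 0 0x5000 True
```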
4. VIRTUALIZATION OF CPU, MEMORY, AND I/O DEVICES
To support virtualization, processors such as the x86 employ a special running mode and
instructions, known as hardware-assisted virtualization. In this way, the VMM and guest
OS run in different modes and all sensitive instructions of the guest OS and its
applications are trapped in the VMM. To save processor states, mode switching is
completed by hardware. For the x86 architecture, Intel and AMD have proprietary
technologies for hardware-assisted virtualization.
4.1 Hardware Support for Virtualization
Modern operating systems and processors permit multiple processes to run
simultaneously. If there is no protection mechanism in a processor, all instructions from
different processes will access the hardware directly and cause a system crash. Therefore,
all processors have at least two modes, user mode and supervisor mode, to ensure
controlled access of critical hardware. Instructions running in supervisor mode are called
privileged instructions. Other instructions are unprivileged instructions.
The VMware Workstation is a VM software suite for x86 and x86-64 computers. This
software suite allows users to set up multiple x86 and x86-64 virtual computers and to
use one or more of these VMs simultaneously with the host operating system.
4.2 CPU Virtualization
A VM is a duplicate of an existing computer system in which a majority of the VM
instructions are executed on the host processor in native mode. Thus, unprivileged
instructions of VMs run directly on the host machine for higher efficiency. Other critical
instructions should be handled carefully for correctness and stability. The critical
instructions are divided into three categories: privileged instructions, control-sensitive
instructions, and behavior-sensitive instructions.
Privileged instructions execute in a privileged mode and will be trapped if executed
outside this mode.
Control-sensitive instructions attempt to change the configuration of resources used.
Behavior-sensitive instructions have different behaviors depending on the configuration
of resources, including the load and store operations over the virtual memory.
A CPU architecture is virtualizable if it supports the ability to run the VM’s privileged
and unprivileged instructions in the CPU’s user mode while the VMM runs in supervisor
mode. When the privileged instructions including control- and behavior-sensitive
instructions of a VM are executed, they are trapped in the VMM. In this case, the VMM
acts as a unified mediator for hardware access from different VMs to guarantee the
correctness and stability of the whole system.
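The virtualizability condition above is often stated as the Popek and Goldberg criterion: trap-and-emulate works only if every sensitive instruction is also privileged, so that it traps when a guest executes it in user mode. The sketch checks that condition and shows the trap dispatch. The instruction sets are a simplified, illustrative subset of classic (pre-VT) x86, with `mov_to_cr3` as a made-up token for a CR3 load.

```python
def unsafe_instructions(sensitive, privileged):
    """Sensitive instructions that do NOT trap in user mode; in a VM they
    would silently act on real hardware state instead of the VM's."""
    return sorted(set(sensitive) - set(privileged))

def is_virtualizable(sensitive, privileged):
    # Trap-and-emulate works only if every sensitive instruction is privileged.
    return not unsafe_instructions(sensitive, privileged)

def execute(insn, privileged, vmm_log):
    """Guest code runs in user mode: privileged instructions trap to the VMM,
    which emulates them; everything else runs directly on the host CPU."""
    if insn in privileged:
        vmm_log.append(insn)
        return "trapped"
    return "direct"


# Simplified, illustrative subset of classic (pre-VT) x86.
privileged = {"lgdt", "mov_to_cr3", "hlt"}
sensitive = {"lgdt", "mov_to_cr3", "hlt", "popf", "sgdt", "smsw"}

print(is_virtualizable(sensitive, privileged))      # False
print(unsafe_instructions(sensitive, privileged))   # ['popf', 'sgdt', 'smsw']

log = []
results = [execute(i, privileged, log) for i in ["add", "mov_to_cr3", "sub"]]
print(results, log)   # ['direct', 'trapped', 'direct'] ['mov_to_cr3']
```

The non-empty "unsafe" list is exactly why classic x86 needed binary translation or para-virtualization.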
4.2.1 Hardware-Assisted CPU Virtualization
This technique attempts to simplify virtualization because full or para-virtualization is
complicated. Intel and AMD add an additional privilege mode level (some
people call it Ring -1) to x86 processors. Therefore, operating systems can still run at
Ring 0 and the hypervisor can run at Ring -1. All the privileged and sensitive instructions
are trapped in the hypervisor automatically. This technique removes the difficulty of
implementing binary translation of full virtualization. It also lets the operating system run
in VMs without modification.
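With hardware assistance, the unmodified guest kernel runs at its own Ring 0 in guest mode, and the CPU itself switches to the hypervisor (a VM exit) on events the hypervisor has configured, then resumes the guest (a VM entry). In the sketch below, `EXIT_ON` is a simplified, illustrative set of exit events and `mov_to_cr3` is a made-up token; on Intel VT-x, CPUID always causes an exit, while HLT and CR3 loads exit when the hypervisor enables those controls. No translation pass is needed, unlike the binary translation approach.

```python
# ASSUMPTION: instructions are opcode strings; EXIT_ON is illustrative only.
EXIT_ON = {"cpuid", "hlt", "mov_to_cr3"}

def run_unmodified_guest(instructions, exit_on=EXIT_ON):
    """Count instructions the hardware runs natively versus those causing VM exits."""
    native, exits = 0, []
    for insn in instructions:
        if insn in exit_on:
            exits.append(insn)   # VM exit: hardware saves guest state, hypervisor handles it
        else:
            native += 1          # runs directly, including popf at guest Ring 0
    return native, exits


native, exits = run_unmodified_guest(["mov", "popf", "cpuid", "add", "hlt"])
print(native, exits)   # 3 ['cpuid', 'hlt']
```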
4.3 Memory Virtualization
Virtual memory virtualization is similar to the virtual memory support provided by
modern operating systems. In a traditional execution environment, the operating system
maintains mappings of virtual memory to machine memory using page tables, which is a
one-stage mapping from virtual memory to machine memory. All modern x86 CPUs
include a memory management unit (MMU) and a translation lookaside buffer (TLB) to
optimize virtual memory performance. However, in a virtual execution environment,
virtual memory virtualization involves sharing the physical system memory in RAM and
dynamically allocating it to the physical memory of the VMs.
That means a two-stage mapping process should be maintained by the guest OS and the
VMM, respectively: virtual memory to physical memory and physical memory to
machine memory. Furthermore, MMU virtualization should be supported, which is
transparent to the guest OS. The guest OS continues to control the mapping of virtual
addresses to the physical memory addresses of VMs. But the guest OS cannot directly
access the actual machine memory. The VMM is responsible for mapping the guest
physical memory to the actual machine memory. Figure 6 shows the two-level memory
mapping procedure.
Figure 6: Two-level memory mapping procedure.
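The two-stage mapping can be sketched with dictionaries as page tables: the guest OS maps guest virtual pages to guest physical pages, and the VMM maps guest physical pages to machine pages. One common software technique, assumed here, is for the VMM to compose both stages into a single "shadow" page table that the hardware MMU can walk directly; newer CPUs can instead perform the second stage in hardware (Intel EPT, AMD NPT). Page numbers below are arbitrary examples with 4 KB pages.

```python
PAGE_SIZE = 4096

def translate(table, addr):
    """One mapping stage: page number -> frame number; the offset is kept."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in table:
        raise LookupError(f"fault at address {addr:#x}")
    return table[page] * PAGE_SIZE + offset

# Stage 1, kept by the guest OS: guest virtual page -> guest physical page.
guest_page_table = {0: 7, 1: 3}
# Stage 2, kept by the VMM: guest physical page -> host machine page.
vmm_p2m = {3: 42, 7: 19}

def guest_virtual_to_machine(va):
    return translate(vmm_p2m, translate(guest_page_table, va))

def build_shadow_table(guest_pt, p2m):
    """Compose both stages into one virtual -> machine table (shadow page table)."""
    return {vpn: p2m[gpn] for vpn, gpn in guest_pt.items()}


va = 0x1010
shadow = build_shadow_table(guest_page_table, vmm_p2m)
print(hex(guest_virtual_to_machine(va)), hex(translate(shadow, va)))  # 0x2a010 0x2a010
```

The guest keeps editing `guest_page_table` as usual; the VMM must notice such edits and rebuild the affected shadow entries, which is a major source of memory-virtualization overhead.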
4.4 I/O Virtualization
I/O virtualization involves managing the routing of I/O requests between virtual devices
and the shared physical hardware. At the time of this writing, there are three ways to
implement I/O virtualization: Full Device Emulation, Para-Virtualization, and Direct
I/O.
Full device emulation is the first approach for I/O virtualization. Generally, this
approach emulates well-known, real-world devices. All the functions of a device or bus
infrastructure, such as device enumeration, identification, interrupts, and DMA, are
replicated in software. This software is located in the VMM and acts as a virtual device.
The I/O access requests of the guest OS are trapped in the VMM which interacts with the
I/O devices. The full device emulation approach is shown in Figure 7. A single hardware
device can be shared by multiple VMs that run concurrently. However, software
emulation runs much slower than the hardware it emulates.
Figure 7: Device emulation for I/O virtualization implemented inside the middle layer that maps real I/O
devices into the virtual devices for the guest device driver to use.
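Full device emulation can be sketched as a VMM that receives every trapped guest I/O access and forwards it to a software model of a device. The device model loosely follows a 16550-style UART (data register at offset 0, line-status register at offset 5, bit 0x20 meaning "ready to transmit") at the conventional COM1 base 0x3F8; everything else, including the class names and the 0xFF value for unmapped ports, is a simplification for illustration.

```python
class EmulatedUart:
    """Software model of a simple serial port (simplified, for illustration)."""
    DATA, LINE_STATUS, TX_READY = 0, 5, 0x20

    def __init__(self, host_output):
        self.host_output = host_output      # real host I/O path (a list here)

    def read(self, offset):
        return self.TX_READY if offset == self.LINE_STATUS else 0

    def write(self, offset, value):
        if offset == self.DATA:
            self.host_output.append(chr(value))


class Vmm:
    def __init__(self):
        self.io_map = {}        # I/O port base -> (emulated device, port span)
        self.traps = 0

    def attach(self, base, device, span=8):
        self.io_map[base] = (device, span)

    def trap_io(self, port, value=None):
        """Entered on every trapped guest I/O access; value=None means a read."""
        self.traps += 1
        for base, (dev, span) in self.io_map.items():
            if base <= port < base + span:
                off = port - base
                return dev.read(off) if value is None else dev.write(off, value)
        return 0xFF             # unmapped port: this sketch returns all ones


# Guest driver code, unaware that the device is software.
output = []
vmm = Vmm()
vmm.attach(0x3F8, EmulatedUart(output))
for ch in "hi":
    while not vmm.trap_io(0x3F8 + EmulatedUart.LINE_STATUS) & EmulatedUart.TX_READY:
        pass
    vmm.trap_io(0x3F8, ord(ch))
print("".join(output), vmm.traps)   # hi 4
```

Two traps per character, each handled in software, illustrates why emulated devices run much slower than the hardware they imitate.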
The para-virtualization method of I/O virtualization is typically used in Xen. It is also
known as the split driver model consisting of a frontend driver and a backend driver. The
frontend driver is running in Domain U and the backend driver is running in Domain 0.
They interact with each other via a block of shared memory. The frontend driver manages
the I/O requests of the guest OSes and the backend driver is responsible for managing the
real I/O devices and multiplexing the I/O data of different VMs. Although para-I/O-
virtualization achieves better device performance than full device emulation, it comes
with a higher CPU overhead.
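The split driver model can be sketched with a shared request/response ring between a frontend in each Domain U and a single backend in Domain 0 that multiplexes requests onto the real device. Real Xen rings are fixed-size circular buffers in shared memory pages, with event channels used for notification; the bounded deques, class names, and dictionary "disk" below are simplifying assumptions.

```python
from collections import deque

class SharedRing:
    """Stands in for the shared-memory block between frontend and backend."""
    def __init__(self, size=8):
        self.requests, self.responses, self.size = deque(), deque(), size

class Frontend:                      # runs in a Domain U guest
    def __init__(self, ring):
        self.ring, self.next_id = ring, 0

    def submit(self, op, block, data=b""):
        if len(self.ring.requests) >= self.ring.size:
            raise BufferError("ring full")
        req_id, self.next_id = self.next_id, self.next_id + 1
        self.ring.requests.append((req_id, op, block, data))
        return req_id

class Backend:                       # runs in Domain 0 and owns the real device
    def __init__(self, device):
        self.device = device         # shared physical disk (a dict here)

    def service(self, rings):
        """Multiplex pending requests from several guests onto one device."""
        for domain, ring in rings.items():
            while ring.requests:
                req_id, op, block, data = ring.requests.popleft()
                key = (domain, block)      # keeps each guest's blocks separate
                if op == "write":
                    self.device[key] = data
                    result = len(data)
                else:
                    result = self.device.get(key, b"")
                ring.responses.append((req_id, result))


ring1, ring2 = SharedRing(), SharedRing()
f1, f2 = Frontend(ring1), Frontend(ring2)
f1.submit("write", 0, b"A")
f2.submit("write", 0, b"B")
f1.submit("read", 0)
Backend({}).service({"domU-1": ring1, "domU-2": ring2})
print(list(ring1.responses), list(ring2.responses))
# [(0, 1), (1, b'A')] [(0, 1)]
```

Both guests wrote "block 0", yet each reads back only its own data, because the backend keys storage by domain.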
Direct I/O virtualization lets the VM access devices directly. It can achieve close-to-
native performance without high CPU costs. However, current direct I/O virtualization
implementations focus on networking for mainframes. There are a lot of challenges for
commodity hardware devices. For example, when a physical device is reclaimed
(required by workload migration) for later reassignment, it may have been set to an
arbitrary state (e.g., DMA to some arbitrary memory locations) that can function
incorrectly or even crash the whole system.
4.5 Virtualization in Multi-Core Processors
Virtualizing a multi-core processor is relatively more complicated than virtualizing a
uni-core processor. Though multicore processors are claimed to have higher performance by
integrating multiple processor cores in a single chip, multi-core virtualization has raised
some new challenges to computer architects, compiler constructors, system designers,
and application programmers. There are mainly two difficulties: Application programs
must be parallelized to use all cores fully, and software must explicitly assign tasks to the
cores, which is a very complex problem.
Concerning the first challenge, new programming models, languages, and libraries are
needed to make parallel programming easier. The second challenge has spawned research
involving scheduling algorithms and resource management policies. Yet these efforts
cannot balance well among performance, complexity, and other issues. What is worse, as
technology scales, a new challenge called dynamic heterogeneity is emerging to mix the
fat CPU core and thin GPU cores on the same chip, which further complicates the multi-
core or many-core resource management.
4.5.1 Physical versus Virtual Processor Cores
Wells, et al. proposed a multicore virtualization method to allow hardware designers to
get an abstraction of the low-level details of the processor cores. This technique alleviates
the burden and inefficiency of managing hardware resources by software. It is located
under the ISA and remains unmodified by the operating system or VMM (hypervisor).
Figure 8 illustrates the technique of a software-visible VCPU moving from one core to
another and temporarily suspending execution of a VCPU when there are no appropriate
cores on which it can run.
Figure 8: Multi-core virtualization method that exposes four VCPUs to the software, when only three
cores are actually present
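The behavior in Figure 8 can be imitated in software with a round-robin placement of four VCPUs onto three cores. This is only an illustration of the observable effect; the method described by Wells et al. is implemented below the ISA, and the `schedule` function and its rotation policy are assumptions made for this sketch.

```python
from collections import deque

def schedule(vcpus, num_cores, ticks):
    """Round-robin placement of software-visible VCPUs onto fewer physical cores.
    Each tick, up to num_cores VCPUs run; the rest are suspended, and a VCPU may
    resume on a different core than before."""
    ready = deque(vcpus)
    history = []
    for _ in range(ticks):
        running = [ready.popleft() for _ in range(min(num_cores, len(ready)))]
        placement = dict(enumerate(running))   # core index -> VCPU
        suspended = list(ready)                # VCPUs with no core this tick
        history.append((placement, suspended))
        ready.extend(running)                  # rotate: ran this tick, go to the back
    return history


history = schedule(["v0", "v1", "v2", "v3"], 3, 3)
for tick, (placement, suspended) in enumerate(history):
    print(tick, placement, "suspended:", suspended)
# 0 {0: 'v0', 1: 'v1', 2: 'v2'} suspended: ['v3']
# 1 {0: 'v3', 1: 'v0', 2: 'v1'} suspended: ['v2']
# 2 {0: 'v2', 1: 'v3', 2: 'v0'} suspended: ['v1']
```

Software always sees four VCPUs; VCPU v0 runs on core 0, then core 1, then core 2, while one VCPU is suspended at every tick.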
5. Storage Virtualization
The term “storage virtualization” was widely used before the renaissance of system
virtualization. Yet the term has a different meaning in a system virtualization
environment. Previously, storage virtualization was largely used to describe the
aggregation and repartitioning of disks at very coarse time scales for use by physical
machines. In system virtualization, virtual storage includes the storage managed by
VMMs and guest OSes. Generally, the data stored in this environment can be classified
into two categories: VM images and application data. The VM images are special to
the virtual environment, while application data includes all other data which is the same
as the data in traditional OS environments.
The most important aspects of system virtualization are encapsulation and isolation.
Traditional operating systems and applications running on them can be encapsulated in
VMs. Only one operating system runs in each VM, while many applications run in
that operating system. System virtualization allows multiple VMs to run on a physical
machine and the VMs are completely isolated. To achieve encapsulation and isolation,
both the system software and the hardware platform, such as CPUs and chipsets, are
rapidly updated. However, storage is lagging. The storage systems become the main
bottleneck of VM deployment.
In virtualization environments, a virtualization layer is inserted between the hardware and
traditional operating systems or a traditional operating system is modified to support
virtualization. This procedure complicates storage operations. On the one hand, storage
management of the guest OS performs as though it is operating in a real hard disk while
the guest OSes cannot access the hard disk directly. On the other hand, many guest OSes
compete for the hard disk when many VMs are running on a single physical machine.
Therefore, storage management of the underlying VMM is much more complex than that
of guest OSes (traditional OSes).
Parallax is a distributed storage system customized for virtualization environments.
Content Addressable Storage (CAS) is a solution to reduce the total size of VM images,
and therefore supports a large set of VM-based systems in data centers. Since traditional
storage management techniques do not consider the features of storage in virtualization
environments, Parallax designs a novel architecture in which storage features that have
traditionally been implemented directly on high-end storage arrays and switches are
relocated into a federation of storage VMs. These storage VMs share the same physical
hosts as the VMs that they serve.
Figure 9: Parallax is a set of per-host storage appliances that share access to a common block device
and presents virtual disks to client VMs.
Figure 9 provides an overview of the Parallax system architecture. It supports all popular
system virtualization techniques, such as paravirtualization and full virtualization. For
each physical machine, Parallax customizes a special storage appliance VM. The storage
appliance VM acts as a block virtualization layer between individual VMs and the
physical storage device. It provides a virtual disk for each VM on the same physical
machine.
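The text says Content Addressable Storage reduces the total size of VM images but does not say how. A common approach, assumed in this sketch, is to split images into blocks and store each block under a hash of its contents (SHA-256 here), so blocks shared by many images, such as common OS files, are stored once. The 4-byte block size and the sample "images" are illustrative only.

```python
import hashlib

BLOCK_SIZE = 4   # tiny block size so the example stays readable

class ContentAddressableStore:
    """Blocks are stored under the SHA-256 digest of their contents, so identical
    blocks shared by many VM images are kept only once."""

    def __init__(self):
        self.blocks = {}                       # digest -> block bytes

    def put(self, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(digest, data)   # duplicate content is not stored again
        return digest

    def get(self, digest):
        return self.blocks[digest]

def store_image(cas, image):
    """Split an image into fixed-size blocks; the image becomes a list of digests."""
    return [cas.put(image[i:i + BLOCK_SIZE]) for i in range(0, len(image), BLOCK_SIZE)]

def load_image(cas, recipe):
    return b"".join(cas.get(d) for d in recipe)


cas = ContentAddressableStore()
recipe1 = store_image(cas, b"KERNLIBCAPP1")   # two images that differ only
recipe2 = store_image(cas, b"KERNLIBCAPP2")   # in their last block
print(len(cas.blocks), load_image(cas, recipe2))   # 4 b'KERNLIBCAPP2'
```

Six blocks were written but only four are stored; with hundreds of VMs cloned from the same template, the savings grow accordingly.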
*** Students are advised to read Text-Books also for in-depth study of above mentioned
topics for exam point-of-view ***