Cloud-Enabling Technologies Overview
Module 3
Cloud-Enabling Technologies
• Cloud-enabling technologies are the set of technologies that make cloud computing
possible.
• These are tools and methods that allow cloud services to work
smoothly.
• They help in building, running, and managing cloud systems.
1. Broadband Networks and Internet Architecture
2. Data center technology
3. Web technology
4. Multitenant technology
5. Service technology
Broadband Networks and Internet
Architecture
• Broadband networks and robust internet architecture are fundamental to cloud
computing.
• They provide the necessary connectivity and bandwidth to access cloud services from
anywhere in the world.
• High-speed internet connections ensure that data can be transferred quickly and
efficiently between users and cloud servers.
• End-to-end (sender-receiver pair) data flows are divided into packets of a limited size that
are received and processed through network switches and routers, then queued and
forwarded from one intermediary node to the next.
• Each packet carries the necessary location information, such as the Internet Protocol (IP)
or Media Access Control (MAC) address, to be processed and routed at every source,
intermediary, and destination node.
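The packet-based data flow described above can be sketched as follows. This is a minimal illustration, not a real network stack: the `Packet` class, the `packetize` and `reassemble` helpers, the 8-byte size limit, and the example addresses are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    src_ip: str     # source address carried in every packet
    dst_ip: str     # destination address carried in every packet
    seq: int        # sequence number so the receiver can restore the order
    payload: bytes  # a slice of the original data flow, limited in size


def packetize(data: bytes, src_ip: str, dst_ip: str, max_size: int = 8) -> list:
    """Divide a data flow into packets of a limited size (hypothetical limit)."""
    return [
        Packet(src_ip, dst_ip, seq, data[offset:offset + max_size])
        for seq, offset in enumerate(range(0, len(data), max_size))
    ]


def reassemble(packets: list) -> bytes:
    """Rebuild the data flow at the destination, even if packets arrived out of order."""
    ordered = sorted(packets, key=lambda p: p.seq)
    return b"".join(p.payload for p in ordered)


packets = packetize(b"hello cloud consumer", "10.0.0.5", "203.0.113.7")
# Simulate out-of-order arrival by reversing the packet list.
restored = reassemble(list(reversed(packets)))
print(len(packets), restored)
```

Because every packet carries its own addressing information, each one can be forwarded independently by the intermediary nodes.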
Router-Based Interconnectivity
• A router is a device that is connected to multiple networks through which it
forwards packets.
• Even when successive packets are part of the same data flow, routers process and
forward each packet individually while maintaining the network topology
information that locates the next node on the communication path between the
source and destination nodes.
• Routers manage network traffic and gauge the most efficient hop for packet
delivery, since they are privy to both the packet source and packet destination.
• The communication path that connects a cloud consumer with its cloud provider
may involve multiple ISP networks.
• The Internet’s mesh structure connects Internet hosts (endpoint systems) using
multiple alternative network routes that are determined at runtime.
• Communication can therefore be sustained even during simultaneous network
failures, although using multiple network paths can cause routing fluctuations
and latency.
Technical and Business Considerations
Connectivity Issues
• In traditional, on-premise deployment models, enterprise applications and
various IT solutions are commonly hosted on centralized servers and storage
devices residing in the organization’s own data center.
• End-user devices, such as smartphones and laptops, access the data center
through the corporate network, which provides uninterrupted Internet
connectivity.
• TCP/IP facilitates both Internet access and on-premise data exchange over LANs.
• Organizations using this deployment model can directly access the network
traffic to and from the Internet and usually have complete control over and can
safeguard their corporate networks using firewalls and monitoring software.
• These organizations also assume the responsibility of deploying, operating, and
maintaining their IT resources and Internet connectivity.
Fig: The internetworking architecture of a private cloud.
Table: A comparison of on-premise and cloud-based internetworking
Network Bandwidth and Latency Issues
• End-to-End bandwidth is determined by the transmission capacity of the shared data
links that connect intermediary nodes.
• This type of bandwidth is constantly increasing, as Web acceleration technologies, such
as dynamic caching, compression, and pre-fetching, continue to improve end-user
connectivity.
• Latency is the amount of time it takes a packet to travel from one data node to another.
• Latency increases with every intermediary node on the data packet’s path.
• Transmission queues in the network infrastructure can result in heavy load conditions
that also increase network latency.
• Packet networks with “best effort” quality-of-service (QoS) typically transmit packets
on a first-come/first-served basis.
• Data flows that use congested network paths suffer service-level degradation in the
form of bandwidth reduction, latency increase, or packet loss when traffic is not
prioritized.
• The nature of packet switching allows data packets to choose routes dynamically as
they travel through the Internet’s network infrastructure.
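The bandwidth and latency points above can be sketched with a simplified model. It assumes that end-to-end bandwidth is capped by the slowest link on the path and that latency is the sum of per-hop delays plus any queuing delay; the function names and the numbers are illustrative assumptions, not measurements.

```python
def end_to_end_bandwidth(link_capacities_mbps):
    """Assumed model: the slowest shared link limits the whole path."""
    return min(link_capacities_mbps)


def end_to_end_latency(hop_delays_ms, queue_delays_ms=None):
    """Latency grows with every intermediary node; queuing adds further delay."""
    if queue_delays_ms is None:
        queue_delays_ms = [0] * len(hop_delays_ms)
    return sum(hop_delays_ms) + sum(queue_delays_ms)


# Hypothetical path with three links between a cloud consumer and a provider.
capacities = [1000, 100, 400]   # Mbit/s per link
hop_delays = [2, 5, 3]          # ms per hop
congested_queues = [0, 4, 0]    # ms of extra queuing at the middle node

print(end_to_end_bandwidth(capacities))
print(end_to_end_latency(hop_delays))
print(end_to_end_latency(hop_delays, congested_queues))
```

Adding an intermediary node to `hop_delays` or a non-zero entry to the queue list raises the total, which mirrors the two latency bullets above.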
Cloud Carrier and Cloud Provider Selection
Automation
• Data centers have specialized platforms that automate tasks like
provisioning, configuration, patching, and monitoring without supervision.
• Advances in data center management platforms and tools leverage
autonomic computing technologies to enable self-configuration and self-
recovery.
Remote Operation and Management
• Most of the operational and administrative tasks of IT resources in data
centers are commanded through the network’s remote consoles and
management systems.
• Technical personnel are not required to visit the dedicated rooms that house
servers, except to perform highly specific tasks, such as equipment handling
and cabling or hardware-level installation and maintenance.
High Availability
• Since any form of data center outage significantly impacts business continuity
for the organizations that use their services, data centers are designed to
operate with increasingly higher levels of redundancy to sustain availability.
• Data centers usually have redundant, uninterruptible power supplies, cabling,
and environmental control subsystems in anticipation of system failure, along
with communication links and clustered hardware for load balancing.
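The effect of redundancy on availability can be illustrated with a standard reliability calculation. This sketch assumes that the redundant components fail independently and that the subsystem stays up as long as at least one replica works; the function name and the 99% figure are assumptions for the example.

```python
def redundant_availability(component_availability: float, replicas: int) -> float:
    """Availability of a subsystem that works if at least one replica works.

    Assumes independent failures, which real data centers only approximate.
    """
    return 1 - (1 - component_availability) ** replicas


single = redundant_availability(0.99, 1)   # one power supply
double = redundant_availability(0.99, 2)   # two redundant power supplies
print(single, double)
```

Under these assumptions, duplicating a 99%-available component reduces the expected unavailability from 1% to about 0.01%.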
Security-Aware Design, Operation, and
Management
• Requirements for security, such as physical and logical access controls and data
recovery strategies, need to be thorough and comprehensive for data centers,
since they are centralized structures that store and process business data.
Computing Hardware
• Much of the heavy processing in data centers is often executed by
standardized commodity servers that have substantial computing power and
storage capacity.
• Several computing hardware technologies are integrated into these modular
servers, such as:
rackmount form factor server design composed of standardized racks with
interconnects for power, network, and internal cooling
support for different hardware processing architectures, such as 32-bit x86, x86-64,
and RISC
a power-efficient multi-core CPU architecture that houses hundreds of processing cores
in a space as small as a single unit of standardized racks
redundant and hot-swappable components, such as hard disks, power supplies,
network interfaces, and storage controller cards
Storage system
• Data centers have specialized storage systems that maintain enormous amounts of
digital information in order to fulfill considerable storage capacity needs.
• Storage systems usually involve the following technologies:
Hard Disk Arrays – These arrays inherently divide and replicate data among
multiple physical drives, and increase performance and redundancy by including
spare disks.
I/O Caching – This is generally performed through hard disk array controllers,
which enhance disk access times and performance by data caching.
Hot-Swappable Hard Disks – These can be safely removed from arrays without
requiring prior powering down.
Storage Virtualization – This is realized through the use of virtualized hard disks
and storage sharing.
Fast Data Replication Mechanisms – These include snapshotting, which is saving a
virtual machine’s memory into a hypervisor-readable file for future reloading, and
volume cloning, which is copying virtual or physical hard disk volumes and
partitions.
• Networked storage devices usually fall into one of the following categories:
Storage Area Network (SAN) – A high-speed, dedicated network that connects
servers to block-level storage devices.
Network-Attached Storage (NAS) – A file level storage device connected to a
regular IP network, accessible by multiple clients over the network. Appears
to the client as a shared folder rather than a local disk.
NAS, SAN, and other more advanced storage system options provide fault
tolerance in many components through controller redundancy, cooling
redundancy, and hard disk arrays that use RAID storage technology.
Network Hardware
• Data centers require extensive network hardware in order to enable multiple
levels of connectivity.
Carrier and External Networks
Interconnection
• How a data center connects to outside networks using carriers.
• It ensures that data centers can send and receive data from the internet,
other data centers and clients reliably and efficiently.
LAN Fabric
• The network infrastructure within a data center.
• General data traffic inside the data center.
SAN Fabric
• Dedicated storage network connecting servers to storage devices.
Web Technology
• Computers communicate with each other using markup languages and multimedia
packages.
• The World Wide Web is a system of interlinked IT resources that are accessed through
the Internet.
• The two basic components of the Web are the Web browser client and the Web server.
• Other components, such as proxies, caching services, gateways, and load balancers, are
used to improve Web application characteristics such as scalability and security.
• These additional components reside in a layered architecture that is positioned between
the client and the server.
• Three fundamental elements comprise the technology architecture of the
Web:
Uniform Resource Locator (URL) – It is the address of a resource on the
internet. A standard syntax used for creating identifiers that point to Web
based resources, the URL is often structured using a logical network location.
Hypertext Transfer Protocol (HTTP) – Communication protocol used between
the web browser and a web server to exchange information.
Markup Languages (HTML, XML) – Markup languages provide a lightweight
means of expressing Web-centric data and metadata. The two primary
markup languages are HTML (which is used to express the presentation of
Web pages) and XML (which allows for the definition of vocabularies used to
associate meaning to Web-based data via metadata).
• For example, a Web browser can request to execute an action like read,
write, update, or delete on a Web resource on the Internet, and
proceed to identify and locate the Web resource through its URL.
• The request is sent using HTTP to the resource host, which is also
identified by a URL.
• The Web server locates the Web resource and performs the requested
operation, which is followed by a response being sent back to the client.
• The response may be comprised of content that includes HTML and
XML statements.
• Web resources are represented as hypermedia as opposed to hypertext,
meaning media such as graphics, audio, video, plain text, and URLs can
be referenced collectively in a single document.
• Some types of hypermedia resources cannot be rendered without
additional software or Web browser plug-ins.
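The request flow above (locate the resource by URL, then send an HTTP request to its host) can be sketched without touching the network. The helper `build_http_request` and the example URL are assumptions for illustration; only the standard-library `urllib.parse.urlsplit` function is used.

```python
from urllib.parse import urlsplit


def build_http_request(method: str, url: str) -> str:
    """Turn a URL into the text of a minimal HTTP/1.1 request (illustrative only)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return (
        f"{method} {path} HTTP/1.1\r\n"   # requested action and resource path
        f"Host: {parts.netloc}\r\n"       # the resource host named in the URL
        "Connection: close\r\n"
        "\r\n"
    )


url = "http://www.example.com/reports/q3.html?lang=en"
parts = urlsplit(url)
print(parts.scheme, parts.netloc, parts.path, parts.query)
request = build_http_request("GET", url)
print(request)
```

The URL supplies both the host that receives the request and the path of the resource on that host, while the HTTP method expresses the requested action.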
Web Applications
• A distributed application that uses Web-based technologies (and
generally relies on Web browsers for the presentation of user-
interfaces) is typically considered a Web application.
• These applications can be found in all kinds of cloud-based
environments due to their high accessibility.
• The figure presents a common architectural abstraction for Web
applications that is based on the basic three-tier model.
• The first tier is called the presentation layer, which represents the
user-interface.
• The middle tier is the application layer that implements application
logic, while the third tier is the data layer that is comprised of
persistent data stores.
• The presentation layer has components on both the client and server-side. Web
servers receive client requests and retrieve requested resources directly as static
Web content and indirectly as dynamic Web content, which is generated
according to the application logic.
• Web servers interact with application servers in order to execute the requested
application logic, which then typically involves interaction with one or more
underlying databases.
• PaaS ready-made environments enable cloud consumers to develop and deploy
Web applications. Typical PaaS offerings have separate instances of the Web
server, application server, and data storage server environments.
Multitenant Technology
REST Services
Service agents
Service middleware
Web Services
• Web services rely on the Web Service Description Language (WSDL), XML Schema Definition
Language (XML Schema), Simple Object Access Protocol (SOAP), and Universal Description,
Discovery and Integration (UDDI).
• WSDL: Describes web services. It acts as a contract between the service provider and the
client.
• XML Schema: Describes the structure of an XML document. Messages exchanged by web
services must be expressed using XML.
• SOAP: A protocol that allows applications to communicate over the Internet using XML
messages.
• UDDI: Acts as a registry to publish and discover web services.
REST Services
• REST- Representational State Transfer
• It is an architectural style for designing web services.
• It uses the HTTP protocol to allow systems to communicate.
• The six REST design constraints are: Client-Server, Stateless, Cache, Uniform Interface,
Layered System, and Code-on-Demand.
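A minimal sketch of the REST style follows, assuming a hypothetical in-memory `NoteResource` rather than a real web framework. It shows a uniform interface (the same small set of HTTP methods for every resource) and stateless handling (each call carries everything needed; no client session is kept between calls).

```python
class NoteResource:
    """Hypothetical resource collection addressed by note_id."""

    def __init__(self):
        self._notes = {}  # resource state, not per-client session state

    def handle(self, method, note_id, body=None):
        """Return an (HTTP status code, response body) pair for one request."""
        if method == "GET":
            if note_id in self._notes:
                return 200, self._notes[note_id]
            return 404, None
        if method == "PUT":
            created = note_id not in self._notes
            self._notes[note_id] = body
            return (201 if created else 200), body
        if method == "DELETE":
            if note_id not in self._notes:
                return 404, None
            del self._notes[note_id]
            return 204, None
        return 405, None  # method not allowed


service = NoteResource()
print(service.handle("PUT", "42", "buy more storage"))
print(service.handle("GET", "42"))
print(service.handle("DELETE", "42"))
print(service.handle("GET", "42"))
```

Each request is interpreted on its own, so any server instance holding the same resource data could answer it.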
Service Agents
• Service agents are event-driven programs designed to intercept messages at runtime.
There are active and passive service agents.
• Active service agents perform an action upon intercepting and reading the
contents of a message, typically one that changes the message. Passive service
agents, on the other hand, read the message but do not change its contents.
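The difference between the two agent types can be sketched as a chain of functions that each intercept a message on its way to a service. The message layout, the `trace-id` header, and the function names are assumptions made for this illustration.

```python
def make_passive_agent(log):
    """Passive agent: reads the message (here, for logging) but leaves it unchanged."""
    def agent(message):
        log.append(dict(message))  # keep a copy for monitoring purposes
        return message
    return agent


def active_agent(message):
    """Active agent: acts on the message, here by adding a header to a new copy."""
    changed = dict(message)
    changed["headers"] = {**message.get("headers", {}), "trace-id": "demo-123"}
    return changed


def send(message, agents):
    """Pass the message through every intercepting agent before delivery."""
    for agent in agents:
        message = agent(message)
    return message


log = []
original = {"action": "charge", "amount": 10}
delivered = send(original, [make_passive_agent(log), active_agent])
print(delivered)
print(log)
```

The passive agent only records what it saw, while the active agent hands on a modified message.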
Service Middleware
• A bridge that connects different applications or services so they can work
together.
• It helps in integration- allows different systems to communicate and share
data smoothly.
• Two common middleware platforms are the enterprise service bus (ESB) and the
orchestration platform.
Resource Provisioning Techniques
• Physical resources can be assigned to VMs using two provisioning approaches:
static and dynamic.
• In the static approach, VMs are created with a specific volume of resources, and the
capacity of the VM does not change during its lifetime.
• In the dynamic approach, the resource capacity per VM can be adjusted dynamically to
match workload fluctuations.
Static Approach
• Static provisioning is suitable for applications which have predictable and generally
unchanging workload demands.
• In this approach, once a VM is created, it is expected to run for a long time without
incurring any further resource-allocation decision overhead on the system.
• The resource-allocation decision is taken only once, at the beginning, when the
user’s application starts running.
• Thus, this approach leaves a little more time for the resource-allocation decision,
since that decision does not negatively affect the performance of the running
system.
• This provisioning approach fails to deal with unanticipated changes in resource
demands.
• When resource demand crosses the limit specified in the SLA document, it causes trouble
for consumers.
• From the provider’s point of view, some resources remain unutilized because the
provider reserves a sufficient volume of resources to avoid SLA violations.
• So this method has drawbacks from the viewpoint of both the provider and the
consumer.
Dynamic Approach
• With dynamic provisioning, the resources are allocated and de-allocated as per
requirement during run-time.
• This on-demand resource provisioning provides elasticity to the system.
• Providers no longer need to keep a certain volume of resources unutilized for each
system separately; rather, they maintain a common resource pool and
allocate resources from it when required.
• Resources are removed from VMs when they are no longer required and returned
to the pool.
• With this dynamic approach, billing also moves to a pay-per-usage basis.
• Dynamic provisioning is more appropriate for cloud computing, where an
application’s demand for resources is likely to change during
execution.
• But this provisioning approach needs the ability to integrate newly acquired
resources into the existing infrastructure.
• This gives provisioning elasticity to the system.
• Dynamic provisioning allows the system to adapt to changed conditions at the cost of
run-time resource-allocation decision overhead.
• This overhead introduces some delay in the system, but it can be minimized
by putting an upper limit on the complexity of provisioning algorithms.
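The shared-pool idea behind dynamic provisioning can be sketched as follows. The `ResourcePool` class, its CPU-only accounting, and the VM names are assumptions for illustration; a real provisioning system would track more resource types and enforce SLA rules.

```python
class ResourcePool:
    """Hypothetical common pool from which VMs receive and return CPUs at run time."""

    def __init__(self, total_cpus):
        self.free_cpus = total_cpus
        self.allocated = {}  # VM name -> CPUs currently assigned

    def scale(self, vm, cpus_needed):
        """Grow or shrink a VM's allocation to match its current workload."""
        current = self.allocated.get(vm, 0)
        delta = cpus_needed - current
        if delta > self.free_cpus:
            raise RuntimeError("resource pool exhausted")
        self.free_cpus -= delta          # a negative delta returns CPUs to the pool
        self.allocated[vm] = cpus_needed


pool = ResourcePool(total_cpus=8)
pool.scale("vm-a", 2)
pool.scale("vm-b", 4)
pool.scale("vm-a", 4)   # workload on vm-a grows
pool.scale("vm-b", 1)   # workload on vm-b shrinks, CPUs go back to the pool
print(pool.allocated, pool.free_cpus)
```

A static scheme would correspond to calling `scale` once per VM at creation time and never again, which is why it cannot follow workload changes.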
Comparison between static and dynamic approaches
Open Cloud Services
• The open-source cloud community generally focuses on the private cloud computing arena.
1. Eucalyptus
Eucalyptus is an open-source Infrastructure-as-a-Service (IaaS) facility for building private or hybrid cloud
computing environments.
It is a Linux-based system that enables cloud features when installed over distributed
computing resources.
The name ‘Eucalyptus’ is an acronym for ‘Elastic Utility Computing Architecture for Linking Your Programs To
Useful Systems’.
Eucalyptus started as a research project at the University of California, United States, and the company
‘Eucalyptus Systems’ was formed in 2009 to support the commercialization of the
Eucalyptus cloud.
In the same year, the Ubuntu 9.04 distribution of Linux included the Eucalyptus software.
Eucalyptus Systems entered into an agreement with Amazon in March 2012, which allowed them to make
Eucalyptus compatible with the Amazon cloud.
This permits transferring instances between a Eucalyptus private cloud and the Amazon public cloud, making
them a combination for building a hybrid cloud environment.
Such interoperable pairing allows application developers to maintain the private cloud part (deployed as
Eucalyptus) as a sandbox for executing prominent code.
Eucalyptus also offers a storage cloud API that emulates the API of Amazon’s storage service (Amazon S3).
2. OpenNebula
• OpenNebula is an open-source Infrastructure-as-a-Service (IaaS) implementation for building public, private and
hybrid clouds.
• Nebula is a Latin word which means ‘cloud’. OpenNebula started as a research project in 2005, and its
first release was made in March 2008.
• By March 2010, the prime authors of OpenNebula founded C12G Labs with the aim of providing value-added
professional services for OpenNebula, and OpenNebula is currently managed by them.
• OpenNebula is freely available, subject to the requirements of the Apache License version 2. Like Eucalyptus,
OpenNebula is also compatible with the Amazon cloud.
• Consequently, the distributions of Ubuntu and Red Hat Enterprise later integrated OpenNebula.
3. Nimbus
• Nimbus is an open-source IaaS cloud solution compatible with Amazon’s cloud services.
• It was developed at the University of Chicago in the United States and implements the Amazon cloud’s APIs.
• The solution was specifically developed to support the scientific community.
• The Nimbus project has been created by an international collaboration of open-source contributors and
institutions.
• Nimbus code is licensed under the terms of the Apache License version 2.
4. OpenStack
• OpenStack is another free and open-source IaaS solution.
• In July 2010, U.S.-based IaaS cloud service provider [Link] and NASA jointly launched
the initiative for an open-source cloud solution called ‘OpenStack’ to produce a ubiquitous IaaS
solution for public and private clouds.
• NASA donated some parts of the Nebula Cloud Platform technology that it developed. Since
then, more than 200 companies (including AT&T, AMD, Dell, Cisco, HP, IBM, Oracle, Red Hat)
have contributed to the project.
• The project was later taken over and promoted by the OpenStack Foundation, a non-profit
organization founded in 2012 to promote the OpenStack solution. All OpenStack code is
freely available under the Apache 2.0 license.
5. Apache CloudStack
• Apache CloudStack is another open-source IaaS cloud solution. CloudStack was initially
developed by [Link], a software company based in California (United States), which was
later acquired by Citrix Systems, another US-based software firm, in 2011.
• The following year, Citrix Systems handed it over to the Apache Software Foundation, and soon
afterwards CloudStack made its first stable release.
• In addition to its own APIs, CloudStack also supports AWS (Amazon Web Services) APIs, which
facilitates hybrid cloud deployment.
Eucalyptus Architecture
Components of Architecture
• Node Controller: Manages the lifecycle of instances running
on each node. It interacts with the operating system,
hypervisor, and Cluster Controller, and controls the
working of VM instances on the host machine.
• Cluster Controller: Manages one or more Node
Controllers and communicates with the Cloud Controller. It
gathers information and schedules VM execution.
• Storage Controller (Walrus): Allows the creation of
snapshots of volumes and provides persistent block storage
for VM instances. Walrus is a simple
file storage system. It stores images and snapshots,
and stores and serves files using S3 (Simple Storage
Service) APIs.
• Cloud Controller: The front end for the entire
architecture. It exposes compliant Web services to
client tools on one side and interacts with the rest of
the components on the other side.
Important features are:
Images: A good example is the Eucalyptus Machine Image, which is a software module
bundled and uploaded to the cloud.
Instances: When we run an image and use it, it becomes an instance.
Networking: It can be further subdivided into three modes: Static mode (allocates IP
addresses to instances), System mode (assigns a MAC address and attaches the instance’s
network interface to the physical network via the NC), and Managed mode (achieves a
local network of instances).
Access Control: It is used to impose restrictions on clients.
Elastic Block Storage: It provides block-level storage volumes that attach to an instance.
Auto-scaling and Load Balancing: It is used to create or destroy instances or services
based on requirements.
Operation Modes of Eucalyptus
Managed Mode: Offers numerous security groups to users, as the network is large. Each
security group is assigned a set or a subset of IP addresses. Ingress rules are applied
through the security groups specified by the user. The network is isolated by VLAN
between the Cluster Controller and the Node Controller. Assigns two IP addresses to
each virtual machine.
Managed (No VLAN) Mode: The root user on a virtual machine can snoop into other virtual
machines running on the same network layer. It does not provide VM network isolation.
System Mode: The simplest of all modes, with the fewest features. A MAC address is
assigned to a virtual machine instance and attached to the Node Controller's bridge
Ethernet device.
Static Mode: Similar to System mode but with more control over the assignment of IP
addresses. Each MAC address/IP address pair is mapped to a static entry within the DHCP
server. The next set of MAC/IP addresses is mapped.
Open-Nebula Architecture
• To control a VM’s life cycle, the Open-Nebula core coordinates with the following three areas of
management:
1) Image and storage technologies — to prepare disk images
2) The network fabric — to provide the virtual network environment
3) Hypervisors — to create and control VMs
Components of Open-Nebula
• Based on the existing infrastructure, Open-Nebula provides various services and resources. You can view
the components in Figure 3.
• APIs and interfaces: These are used to manage and monitor Open-Nebula components. To manage
physical and virtual resources, they work as an interface.
• Users and groups: These support authentication, and authorize individual users and groups with the
individual permissions.
• Hosts and VM resources: These are a key aspect of a heterogeneous cloud that is managed and
monitored, e.g., Xen, VMware.
• Storage components: These are the basis for centralized or decentralized template repositories.
• Network components: These can be managed flexibly. Naturally, there is support for VLANs and Open
vSwitch.
Nimbus
• Workspace Service: Allows clients to manage and administer VMs.
• Workspace Resource Manager: Implements VM instance creation and management on
a site.
• Workspace Pilot: Provides virtualization with significant changes to the site
configurations.
• Workspace Control: Implements VM instance management, such as starting, stopping, and
pausing VMs. It also provides image management, sets up networks, and provides IP
assignment.
Parallel Computing and
Programming Paradigms
Before turning to parallel computing, let's first look at how computer software
traditionally performed computation and why that approach falls short for the modern era.
Computer software was conventionally written for serial computing. This meant
that to solve a problem, an algorithm divides the problem into smaller instructions.
These discrete instructions are then executed on the Central Processing Unit of a
computer one by one.
Only after one instruction is finished does the next one start.
Parallel Computing
• It is the use of multiple processing elements simultaneously for solving any
problem.
• Problems are broken down into instructions that are solved concurrently, as
each resource applied to the work operates at the same time.
• Advantages of Parallel Computing over Serial Computing are as follows:
It saves time and money, as many resources working together reduce the
time and cut potential costs.
It can be impractical to solve larger problems with serial computing.
It can take advantage of non-local resources when the local resources are
finite.
Serial computing 'wastes' potential computing power, whereas parallel
computing makes better use of the hardware.
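The contrast between serial and parallel execution can be sketched with Python's standard `concurrent.futures` module. The problem (a sum of squares), the chunking scheme, and the worker count are assumptions for the example. Threads are used here only to keep the sketch simple; in CPython, CPU-bound work usually needs processes rather than threads to gain real speed.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    """The unit of work that one processing element performs."""
    return sum(x * x for x in chunk)


def serial_total(numbers):
    """Serial computing: one processing element handles the whole problem."""
    return sum_of_squares(numbers)


def parallel_total(numbers, workers=4):
    """Parallel computing: split the problem and solve the parts concurrently."""
    size = max(1, -(-len(numbers) // workers))  # ceiling division
    chunks = [numbers[i:i + size] for i in range(0, len(numbers), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))


numbers = list(range(1000))
print(serial_total(numbers), parallel_total(numbers))
```

Both functions return the same result; only the way the work is divided among processing elements differs.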
• As a simple illustration of the Map and Reduce functions, the figure shows the
pseudocode and the algorithm and illustrates the process steps using the widely used
“Wordcount” example.
• The Wordcount application counts the number of occurrences of each word in a
large collection of documents.
• The steps of the process are briefly described as follows: The input is read (typically
from a distributed file system) and broken up into key/value pairs (e.g., the Map
function emits a word and its associated count of occurrence, which is just “1”).
• The pairs are partitioned into groups for processing, and they are sorted according
to their key as they arrive for reduction.
• Finally, the key/value pairs are reduced, once for each unique key in the sorted list,
to produce a combined result (e.g., the Reduce function sums all the counts emitted
for a particular word).
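The Wordcount steps above can be sketched in plain Python on a single machine. This is an illustration of the map, group-and-sort, and reduce phases only; it is not Hadoop code, and the function names and sample documents are assumptions.

```python
from collections import defaultdict


def map_fn(document):
    """Map: emit a (word, 1) pair for every word in the input."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group the emitted pairs by key and sort the groups by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_fn(word, counts):
    """Reduce: sum all the counts emitted for one word."""
    return word, sum(counts)


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_fn(doc)]
    return dict(reduce_fn(word, counts) for word, counts in shuffle(pairs))


result = word_count(["the cloud", "the open cloud the"])
print(result)
```

In a real MapReduce system, the map and reduce calls run on many nodes and the grouping step moves data between them, but the data flow is the same.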
Hadoop Library from Apache
• Hadoop is an open-source implementation of MapReduce, written in
Java and maintained by Apache.
• This implementation uses the Hadoop Distributed File System (HDFS)
as its underlying layer.
• It has two fundamental layers: the MapReduce engine and HDFS.
• The MapReduce engine is the computation engine running on top of
HDFS as its data storage manager.
• HDFS: It is a distributed file system that organizes files and stores their
data on a distributed computing system.
Hadoop’s Architecture