DATA-CENTER DESIGN AND INTERCONNECTION NETWORKS
A data center is often built with a large number of servers connected through a huge interconnection
network. Depending on their size, data centers can be divided into two categories: large-scale data
centers, which require acres of land, and small modular data centers that can be housed in a 40-ft
truck container.
Warehouse-Scale Data-Center Design
Dennis Gannon claims: “The cloud is built on massive datacenters.” A warehouse-scale data center can
be as large as a shopping mall (11 times the size of a football field) under one roof. Such a data
center can house 400,000 to 1 million servers. A small data center could have 1,000 servers. Data
centers are built on economies of scale, meaning lower unit cost for larger data centers. The larger
the data center, the lower the operational cost. The network cost to operate a small data center is
about seven times greater than in a large one, and the storage cost is 5.7 times greater.
Data-Center Construction Requirements
Most data centers are built with commercially available components. An off-the-shelf
server consists of a number of processor sockets, each with a multicore CPU and its internal
cache hierarchy, local shared and coherent DRAM, and a number of directly attached disk
drives. The DRAM and disk resources within the rack are accessible through first-level rack
switches and all resources in all racks are accessible via a cluster-level switch.
Consider a data center built with 2,000 servers, each with 8 GB of DRAM and four 1 TB disk
drives. Each group of 40 servers is connected through a 1 Gbps link to a rack-level switch that has
an additional eight 1 Gbps ports used for connecting the rack to the cluster-level switch. The
bandwidth available from local disks is 200 MB/s, whereas the bandwidth from off-rack disks is 25
MB/s via shared rack uplinks. The total disk storage in the cluster (2,000 servers with 4 TB each,
or 8 PB) is about 1 million times larger than the 8 GB of DRAM local to a single server.
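
The figures in this example can be checked with a short calculation. The sketch below is only a back-of-envelope aid: it assumes decimal units (1 TB = 1,000 GB, 1 Gbps = 1,000 Mbps) and assumes that all 40 servers in a rack share the eight uplink ports equally. The constant and function names are made up for illustration.

```python
# Back-of-envelope figures for the example cluster described above.
# Assumptions: decimal units, and an even split of the rack uplinks
# among all servers in the rack.

SERVERS = 2000
DRAM_GB_PER_SERVER = 8
DISKS_PER_SERVER = 4
DISK_TB = 1
SERVERS_PER_RACK = 40
UPLINK_PORTS = 8
PORT_GBPS = 1


def rack_count() -> int:
    """Number of racks needed for all servers."""
    return SERVERS // SERVERS_PER_RACK


def offrack_mbytes_per_sec_per_server() -> float:
    """Per-server share of the rack uplinks, in MB/s."""
    uplink_mbps = UPLINK_PORTS * PORT_GBPS * 1000
    return uplink_mbps / SERVERS_PER_RACK / 8  # 8 bits per byte


def cluster_disk_tb() -> int:
    """Total disk capacity of the cluster, in TB."""
    return SERVERS * DISKS_PER_SERVER * DISK_TB


def disk_to_local_dram_ratio() -> float:
    """Cluster disk capacity divided by the DRAM of one server."""
    return cluster_disk_tb() * 1000 / DRAM_GB_PER_SERVER


if __name__ == "__main__":
    print("racks:", rack_count())
    print("off-rack MB/s per server:", offrack_mbytes_per_sec_per_server())
    print("cluster disk (TB):", cluster_disk_tb())
    print("disk / local DRAM:", disk_to_local_dram_ratio())
```

The per-server uplink share matches the off-rack disk bandwidth quoted above, which suggests that figure is limited by the shared uplinks rather than by the disks themselves.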
A large application must deal with large discrepancies in latency, bandwidth, and capacity. In a very
large-scale data center, components are relatively cheap. The components used in data centers are
very different from those used in building supercomputer systems. At a scale of thousands of servers,
concurrent failure, whether hardware or software, of 1 percent of nodes is common.
Many failures can happen in hardware; for example, CPU failure, disk I/O failure, and network
failure. The whole data center may even stop working in the case of a power failure. Other failures
are caused by software. The service and data should not be lost in a failure situation. Reliability
can be achieved through redundant hardware. The software must keep multiple copies of data in
different locations and keep the data accessible in the face of hardware or software errors.
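
To see why keeping multiple copies helps, consider a rough model in which each node fails independently with the same probability. This independence is an assumption made only for illustration; real failures such as a power loss are often correlated, which is why copies are placed in different locations. The function names below are hypothetical.

```python
def prob_all_replicas_lost(node_failure_prob: float, replicas: int) -> float:
    """Probability that every copy sits on a failed node.

    Assumes independent node failures with identical probability.
    """
    if not 0.0 <= node_failure_prob < 1.0:
        raise ValueError("node_failure_prob must be in [0, 1)")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return node_failure_prob ** replicas


def replicas_needed(node_failure_prob: float, target_loss_prob: float) -> int:
    """Smallest replica count whose loss probability is at or below the target."""
    if target_loss_prob <= 0.0:
        raise ValueError("target_loss_prob must be positive")
    replicas = 1
    while prob_all_replicas_lost(node_failure_prob, replicas) > target_loss_prob:
        replicas += 1
    return replicas


if __name__ == "__main__":
    # 1 percent of nodes down at once, as in the text above.
    for copies in (1, 2, 3):
        print(copies, prob_all_replicas_lost(0.01, copies))
```

Under this simplified model, each extra copy multiplies the chance of losing all copies by the node failure probability, so a small number of replicas already gives a large improvement.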
Cooling System of a Data-Center Room
The data-center room has raised floors for hiding cables, power lines, and cooling supplies. The
cooling system is somewhat simpler than the power system. The raised floor has a steel grid resting
on stanchions about 2–4 ft above the concrete floor.
The under-floor area is often used to route power cables to racks, but its primary use is to distribute
cool air to the server rack. The CRAC (computer room air conditioning) unit pressurizes the raised
floor plenum by blowing cold air into the plenum.
The cold air escapes from the plenum through perforated tiles that are placed in front of server
racks. Racks are arranged in long aisles that alternate between cold aisles and hot aisles to avoid
mixing hot and cold air. The hot air produced by the servers circulates back to the intakes of the
CRAC units that cool it and then exhaust the cool air into the raised floor plenum again.
Data-Center Interconnection Networks
A critical part of data-center design is the interconnection network among all servers in the
data-center cluster. This network design must meet five special requirements: low latency, high
bandwidth, low cost, message-passing interface (MPI) communication support, and fault tolerance.
Network Expandability
The interconnection network should be expandable. With thousands or even hundreds of
thousands of server nodes, the cluster network interconnection should be able to expand as
more servers are added to the data center. The network should be designed to support load
balancing and data movement among the servers. The topology of the interconnection should
avoid bottlenecks.
Fault Tolerance and Graceful Degradation
The interconnection network should provide some mechanism to tolerate link or switch
failures. In addition, multiple paths should be established between any two server nodes in a data
center. Fault tolerance of servers is achieved by replicating data and computing among
redundant servers. Similar redundancy, in both software and hardware, applies to the network to
cope with potential failures. On the software side, the software layer should be aware of network
failures. When a limited number of nodes fail, the network structure should degrade gracefully.
There should be no critical paths or critical points which may become a single point of failure that
pulls down the entire system. The network structure is often divided into two layers. The lower
layer is close to the end servers, and the upper layer establishes the backbone connections among
the server groups or sub-clusters.
Switch-centric Data-Center Design
At the time of this writing, there are two approaches to building data-center-scale
networks: one is switch-centric and the other is server-centric. In a switch-centric network, the
switches are used to connect the server nodes. The switch-centric design does not affect the
server side; no modifications to the servers are needed. The server-centric design does modify the
operating system running on the servers: special drivers are designed for relaying the traffic.
Modular Data Center in Shipping Containers
A modern data center is structured as a shipyard of server clusters housed in truck-towed
containers. Inside the container, hundreds of blade servers are housed in racks surrounding the
container walls. An array of fans forces the heated air generated by the server racks to
go through a heat exchanger, which cools the air for the next rack on a continuous loop.
Container Data-Center Construction
The data-center module is housed in a truck-towable container. The modular container
design includes the network, computer, storage, and cooling gear. The container must be designed to
be weatherproof and easy to transport. Modular datacenter construction and testing may take a few
days to complete if all components are available and power and water supplies are handy. The
modular data-center approach supports many cloud service applications.
Interconnection of Modular Data Centers
Container-based data-center modules are meant for construction of even larger data centers using a
farm of container modules.
Data-Center Management Issues
Here are basic requirements for managing the resources of a data center. These suggestions have
resulted from the design and operational experiences of many data centers in the IT and service
industries.
1. Making common users happy: The data center should be designed to provide quality
service to the majority of users for at least 30 years.
2. Controlled information flow: Information flow should be streamlined. Sustained services
and high availability (HA) are the primary goals.
3. Multiuser manageability: The system must be managed to support all functions of a data
center, including traffic flow, database updating, and server maintenance.
4. Scalability to prepare for database growth: The system should allow growth as workload
increases. The storage, processing, I/O, power, and cooling subsystems should be scalable.
5. Reliability in virtualized infrastructure: Failover, fault tolerance, and VM live migration
should be integrated to enable recovery of critical applications from failures or disasters.
6. Low cost to both users and providers: The cost to users and providers of the cloud system
built over the data centers should be reduced, including all operational costs.
7. Security enforcement and data protection: Data privacy and security defense mechanisms
must be deployed to protect the data center against network attacks and system interrupts
and to maintain data integrity against user abuse.
8. Green information technology: Reducing power consumption and improving energy
efficiency are in high demand when designing and operating current and future data centers.