High-Performance Cluster Computing
INTRODUCTION
General Introduction
Parallel computing has seen many changes since the days of the highly expensive and
proprietary super computers. Changes and improvements in performance have also
been seen in the area of mainframe computing for many environments. But these
compute environments may not be the most cost-effective and flexible solution for a
problem. Over the past decade, cluster technologies have been developed that allow
multiple low cost computers to work in a coordinated fashion to process applications.
The economics, performance, and flexibility of compute clusters make cluster
computing an attractive alternative to centralized computing models, with their
attendant cost, inflexibility, and scalability issues.
Many enterprises are now looking at clusters of high-performance, low cost
computers to provide increased application performance, high availability, and ease of
scaling within the data center. Interest in and deployment of computer clusters has
largely been driven by the increase in the performance of off-the-shelf commodity
computers, high-speed, low-latency network switches and the maturity of the software
components. Application performance continues to be of significant concern for
various entities including governments, military, education, scientific and now
enterprise organizations. This document provides a review of cluster computing, the
various types of clusters and their associated applications. This document is a
high-level informational document; it does not provide details about various
cluster implementations and applications.
Cluster Computing
Cluster computing is best characterized as the integration of a number of off-the-shelf
commodity computers and resources integrated through hardware, networks, and
software to behave as a single computer. Initially, the terms cluster computing and
high performance computing were viewed as one and the same. However, the
technologies available today have redefined the term cluster computing to extend
beyond parallel computing to incorporate load-balancing clusters (for example, web
clusters) and high-availability clusters. Clusters may also be deployed to address
load balancing, parallel processing, systems management, and scalability. Today, clusters
are made up of commodity computers usually restricted to a single switch or group of
interconnected switches operating at Layer 2 and within a single virtual local-area
network (VLAN). Each compute node (computer) may have different characteristics
such as single processor or symmetric multiprocessor design, and access to various
types of storage devices. The underlying network is a dedicated network made up of
high-speed, low-latency switches that may be of a single switch or a hierarchy of
multiple switches.
Cluster Benefits
The main benefits of clusters are scalability, availability, and performance. For
scalability, a cluster uses the combined processing power of compute nodes to run
cluster-enabled applications such as a parallel database server at a higher performance
than a single machine can provide. Scaling the cluster's processing power is achieved
by simply adding additional nodes to the cluster. Availability within the cluster is
assured as nodes within the cluster provide backup to each other in the event of a
failure. In high-availability clusters, if a node is taken out of service or fails, the load
is transferred to another node (or nodes) within the cluster. To the user, this operation
is transparent as the applications and data running are also available on the failover
nodes. An additional benefit comes with the existence of a single system image and
the ease of manageability of the cluster. From the user's perspective, the user sees
an application resource as the provider of services and applications. The user does not
know or care if this resource is a single server, a cluster, or even which node within
the cluster is providing services. These benefits map to needs of today's enterprise
business, education, military and scientific community infrastructures. In summary,
clusters provide:
• Scalable capacity for compute, data, and transaction intensive applications,
including support of mixed workloads
• Horizontal and vertical scalability without downtime
• Ability to handle unexpected peaks in workload
• Central system management of a single systems image
• 24 x 7 availability.
TYPES OF CLUSTER
There are several types of clusters, each with specific design goals and functionality.
These clusters range from distributed or parallel clusters for computation intensive or
data intensive applications that are used for protein, seismic, or nuclear modeling to
simple load-balanced clusters.
High Availability or Failover Clusters
These clusters are designed to provide uninterrupted availability of data or services
(typically web services) to the end-user community. The purpose of these clusters is
to ensure that only a single instance of an application runs on one cluster member
at a time; if and when that cluster member is no longer available, the application
fails over to another cluster member. With a high-availability cluster,
nodes can be taken out-of-service for maintenance or repairs. Additionally, if a node
fails, the service can be restored without affecting the availability of the services
provided by the cluster (see Figure 2.1). While the application will still be available,
there will be a performance drop due to the missing node.
High-availability cluster implementations are best suited for mission-critical
applications, databases, mail, file and print, web, or application servers.
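The failover behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not a real HA implementation (production clusters use dedicated middleware); the node names, the `HEARTBEAT_TIMEOUT` value, and the `active_node` helper are all hypothetical.

```python
import time

# Hypothetical sketch of heartbeat-based failover: the service is provided
# by the first node that is still sending heartbeats.
HEARTBEAT_TIMEOUT = 2.0  # seconds without a heartbeat before a node is "failed"

class Node:
    def __init__(self, name):
        self.name = name
        self.last_heartbeat = time.monotonic()

    def is_alive(self, now):
        return (now - self.last_heartbeat) < HEARTBEAT_TIMEOUT

def active_node(nodes):
    """Return the first healthy node; the service 'fails over' to it."""
    now = time.monotonic()
    for node in nodes:
        if node.is_alive(now):
            return node
    raise RuntimeError("no healthy nodes in the cluster")

nodes = [Node("node-a"), Node("node-b")]
primary = active_node(nodes)       # node-a serves while its heartbeats arrive
nodes[0].last_heartbeat -= 10      # simulate node-a missing its heartbeats
failover = active_node(nodes)      # the service transparently moves to node-b
print(primary.name, "->", failover.name)
```

To the user this switch is invisible, which is the transparency the section describes: the same `active_node` call keeps returning a working provider.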
Parallel/Distributed Processing Clusters
Traditionally, parallel processing was performed by multiple processors in a specially
designed parallel computer. These are systems in which multiple processors share a
single memory and bus interface within a single computer. With the advent of high
speed, low-latency switching technology, computers can be interconnected to form a
parallel-processing cluster. These types of clusters increase availability, performance, and scalability for applications, particularly computationally or data-intensive tasks. A
parallel cluster is a system that uses a number of nodes to simultaneously solve a
specific computational or data-mining task. Unlike load-balancing or
high-availability clusters, which distribute requests or tasks to nodes where a
single node processes the entire request, a parallel environment divides the
request into multiple sub-tasks that are distributed to multiple nodes within the
cluster for processing. Parallel
clusters are typically used for CPU-intensive analytical applications, such as
mathematical computation, scientific analysis (weather forecasting, seismic analysis,
etc.), and financial data analysis. One of the more common cluster architectures
is the Beowulf class of clusters. A Beowulf cluster can be defined as a number of
systems whose collective processing capabilities are simultaneously applied to a
specific technical, scientific, or business application. Each individual computer is
referred to as a “node” and each node communicates with other nodes within a cluster
across standard Ethernet technologies (10/100 Mbps, GbE, or 10GbE).
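The scatter/gather pattern described above — splitting one request into sub-tasks and combining the partial results — can be sketched with Python's standard library. Here worker processes on one machine stand in for cluster nodes; on a real Beowulf cluster a message-passing layer such as MPI would distribute the sub-tasks over Ethernet. The function names and chunking scheme are illustrative assumptions.

```python
from multiprocessing import Pool

def partial_sum(chunk):
    """Sub-task executed by one 'node': sum of squares over its slice."""
    return sum(x * x for x in chunk)

def cluster_sum_of_squares(data, nodes=4):
    # Scatter: split the request into one sub-task per node.
    size = (len(data) + nodes - 1) // nodes
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Gather: combine the partial results into the final answer.
    with Pool(processes=nodes) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    result = cluster_sum_of_squares(list(range(1000)))
    print(result)  # same answer a single machine would compute serially
```

The key property is that the divided computation returns exactly the result a single node would produce; the cluster only changes how long it takes.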
CLUSTER OPERATION
Cluster Nodes
Node technology has migrated from the conventional tower cases to single rack-unit
multiprocessor systems and blade servers that provide a much higher processor
density within a decreased area. Processor speeds and server architectures have
increased in performance, and solutions now offer a choice of either 32-bit or
64-bit processor systems. Additionally, memory performance as well as hard-disk
access speeds and storage capacities have also increased. It is interesting to note that
even though performance is growing exponentially in some cases, the cost of these
technologies has dropped considerably. As shown in Figure 4.1 below, node
participation in the cluster falls into one of two responsibilities: master (or head) node
and compute (or slave) nodes. The master node is the unique server in the cluster.
It is responsible for running the file system and serves as the key system on which
the clustering middleware routes processes and duties and monitors the health and
status of each slave node. A compute (or slave) node within a cluster provides the cluster with
computing and data storage capability. These nodes are derived from fully
operational, standalone computers that are typically marketed as desktop or server
systems that, as such, are off-the-shelf commodity systems.
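The master/compute-node split described above can be illustrated with a small in-process sketch. Threads and a shared queue stand in for the slave nodes and the network; in a real cluster this routing is handled by clustering middleware, and the node names and doubling "computation" here are purely hypothetical.

```python
import queue
import threading

def compute_node(name, tasks, results):
    """A compute node pulls duties from the master's queue and reports back."""
    while True:
        task = tasks.get()
        if task is None:                    # master's shutdown signal
            break
        results.put((name, task, task * 2)) # stand-in for real computation

def master(node_count=3, work=range(6)):
    """The master routes duties to compute nodes and gathers their results."""
    tasks, results = queue.Queue(), queue.Queue()
    nodes = [threading.Thread(target=compute_node,
                              args=(f"node-{i}", tasks, results))
             for i in range(node_count)]
    for n in nodes:
        n.start()
    for item in work:                       # route one duty per work item
        tasks.put(item)
    for _ in nodes:                         # tell every node to shut down
        tasks.put(None)
    for n in nodes:
        n.join()
    collected = []
    while not results.empty():
        collected.append(results.get()[2])
    return sorted(collected)

print(master())  # [0, 2, 4, 6, 8, 10]
```

Note that the master never does the computation itself; it only dispatches work and collects status, which mirrors the head-node role described in the text.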
Compute Intensive Applications
Compute intensive is a term that applies to any computer application that demands a
lot of computation cycles (for example, scientific applications such as meteorological
prediction). These types of applications are very sensitive to end-to-end message
latency. This latency sensitivity arises either because processors must wait for
instruction messages or because transmitting results data between nodes takes time. In
general, the more time spent idle waiting for an instruction or for results data, the
longer it takes to complete the application.
Some compute-intensive applications may also be graphic intensive. Graphic
intensive is a term that applies to any application that demands a lot of computational
cycles where the end result is the delivery of significant information for the
development of graphical output such as ray-tracing applications.
These types of applications are also sensitive to end-to-end message latency. The
longer the processors have to wait for instruction messages or the longer it takes to
send resulting data, the longer it takes to present the graphical representation of the
resulting data.
Message Latency
Message latency is defined as the time it takes to send a zero-length message from
one processor to another (measured in microseconds). For some application types,
the lower the latency, the better.
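A ping-pong round trip is the usual way to estimate this latency. The sketch below uses a connected socket pair on one host as a stand-in for two cluster nodes (so it measures loopback latency, not switch-fabric latency), and sends 1-byte messages since a true zero-length payload cannot be observed on a stream socket; the function name and round count are illustrative.

```python
import socket
import time

def ping_pong_latency(rounds=1000):
    """Estimate one-way message latency, in microseconds, via ping-pong."""
    a, b = socket.socketpair()          # stand-ins for two cluster nodes
    start = time.perf_counter()
    for _ in range(rounds):
        a.sendall(b"x")                 # minimal 1-byte "zero-length" probe
        b.recv(1)
        b.sendall(b"x")                 # echo back to complete the round trip
        a.recv(1)
    elapsed = time.perf_counter() - start
    a.close()
    b.close()
    # One-way latency is half the average round-trip time per round.
    return (elapsed / rounds / 2) * 1e6

print(f"estimated one-way latency: {ping_pong_latency():.1f} us")
```

Run against a remote node instead of a local socket pair, the same loop would also include the switch and NIC latencies that the next paragraph breaks down.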
Message latency is made up of aggregate latency incurred at each element within the
cluster network, including within the cluster nodes themselves (see Figure 4.4.1).
Although attention often focuses on network latency, the protocol-processing
latency of Message Passing Interface (MPI) and TCP processes within the host itself
is typically larger. The throughput of today's cluster nodes is impacted by
protocol processing, both for TCP/IP and for MPI. To maintain cluster stability, node
synchronization, and data sharing, the cluster uses message passing technologies such
as Parallel Virtual Machine (PVM) or MPI. TCP/IP stack processing is a
CPU-intensive task that limits performance within high-speed networks. As CPU
performance has increased and new techniques such as TCP offload engines (TOE)
have been introduced, PCs are now able to drive the bandwidth levels higher to a
point where we see traffic levels reaching near theoretical maximum for TCP/IP on
Gigabit Ethernet and near bus speeds for PCI-X based systems when using 10 Gigabit
Ethernet. These high-bandwidth capabilities will continue to grow as processor speeds
increase and more vendors build network adapters to the PCI-Express specification.