23-03-2011, 10:44 AM
Introduction
What are Clusters?
A cluster is a type of parallel or distributed processing system that consists of a collection of interconnected stand-alone computers working cooperatively as a single, integrated computing resource.
The computers in a cluster share common network characteristics, such as the same namespace, and the cluster is available to other computers on the network as a single resource. The computers are linked to one another through high-speed network interfaces, and the actual binding together of all the individual computers in the cluster is performed by the operating system and the clustering software.
What is a Beowulf Cluster?
It is a kind of high-performance, massively parallel computer built primarily out of commodity hardware components, running a free-software operating system such as Linux or FreeBSD, and interconnected by a private high-speed network.
Motivation For Clustering
High cost of ‘traditional’ High Performance Computing.
Clustering using Commercial Off The Shelf (COTS) components is much cheaper than buying specialized machines for computing. Cluster computing has emerged as a result of the convergence of several trends, including the availability of inexpensive high-performance microprocessors and high-speed networks, and the development of standard software tools for high-performance distributed computing.
Increased need for High Performance Computing
As processing power becomes available, applications that require enormous amounts of processing, such as weather modeling, are becoming more commonplace, and they require the high-performance computing that clusters provide.
Thus, a viable alternative is “Building Your Own Cluster”, which is what Cluster Computing is all about.
Components of a Cluster
The main components of a cluster are the personal computers and the interconnection network. The computers can be built out of Commercial Off The Shelf (COTS) components, which are available economically.
The interconnection network can be either an ATM (Asynchronous Transfer Mode) network, which guarantees a fast and effective connection, or a Fast Ethernet connection, which is commonly available now. Gigabit Ethernet, which provides speeds of up to 1000 Mbps, and Myrinet, a commercial interconnection network with high speed and reduced latency, are also viable options.
For high-end scientific clustering, however, there is a variety of network interface cards designed specifically for clustering.
These include Myricom's Myrinet, Giganet's cLAN and the IEEE 1596 standard Scalable Coherent Interface (SCI). The function of these cards is not only to provide high bandwidth between the nodes of the cluster but also to reduce the latency (the time it takes to send a message). Low latency is crucial for exchanging the state information that keeps the nodes' operations synchronized.
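To see why latency matters as much as bandwidth, a common first-order model treats the time to send one message as a fixed per-message latency plus the message size divided by the link bandwidth. The minimal Python sketch below assumes that simple model. The 50-microsecond latency and 1000 Mbps bandwidth are hypothetical figures chosen for illustration, not measurements of any of the cards named above.

```python
# First-order model of message transfer time on a cluster interconnect:
#   time = latency + message_size / bandwidth
# The latency and bandwidth values used below are hypothetical.


def transfer_time_us(message_bytes: int, latency_us: float, bandwidth_mbps: float) -> float:
    """Estimated time in microseconds to send one message."""
    bits = message_bytes * 8
    # 1 Mbps is 1 bit per microsecond, so bits / Mbps gives microseconds.
    serialization_us = bits / bandwidth_mbps
    return latency_us + serialization_us


def latency_share(message_bytes: int, latency_us: float, bandwidth_mbps: float) -> float:
    """Fraction of the total transfer time spent on the fixed latency."""
    return latency_us / transfer_time_us(message_bytes, latency_us, bandwidth_mbps)


if __name__ == "__main__":
    for size in (64, 1_500, 1_000_000):
        t = transfer_time_us(size, latency_us=50.0, bandwidth_mbps=1000.0)
        share = latency_share(size, 50.0, 1000.0)
        print(f"{size:>9} bytes: {t:10.1f} us, latency share {share:.0%}")
```

With these assumed numbers, latency accounts for almost all of the time for a 64-byte state-update message. For a 1 MB bulk transfer it accounts for well under 1%. This is why small, frequent synchronization messages benefit most from a low-latency interconnect.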
Clusters classified according to their use
Types of clustering
The three most common types of clusters include high-performance scientific clusters, load-balancing clusters, and high-availability clusters.
Scientific clusters
The first type typically involves developing parallel programming applications for a cluster to solve complex scientific problems. That is the essence of parallel computing, although it does not use specialized parallel supercomputers that internally consist of between tens and tens of thousands of separate processors. Instead, it uses commodity systems, such as a group of single- or dual-processor PCs linked via high-speed connections and communicating over a common messaging layer, to run those parallel applications. Thus, every so often, you hear about another cheap Linux supercomputer coming out. That is actually a cluster of computers with processing power equivalent to a real supercomputer, and a decent cluster configuration usually costs over $100,000. That may seem high for the average person but is still cheap compared to a multimillion-dollar specialized supercomputer.
Load-balancing clusters
Load-balancing clusters share the processing load, whether application processing or network traffic, as evenly as possible across the nodes of the cluster. Often, network server applications receive more incoming traffic than they can process quickly enough, and thus the traffic needs to be sent to network server applications running on other nodes. That distribution can also be optimized according to the different resources available on each node or the particular environment of the network.
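The load-balancing idea of sending traffic to whichever node has spare capacity can be illustrated with a least-loaded dispatcher. The Python sketch below is hypothetical. It assumes each node's relative capacity is known in advance and that the dispatcher is told when a request finishes. Real load balancers use richer policies and health checks.

```python
class LeastLoadedBalancer:
    """Hypothetical dispatcher: each new request goes to the node whose active
    requests are lowest relative to its assumed capacity."""

    def __init__(self, capacities):
        if not capacities or any(c <= 0 for c in capacities.values()):
            raise ValueError("need at least one node, each with positive capacity")
        self.capacity = dict(capacities)
        self.active = {node: 0 for node in self.capacity}

    def _load(self, node):
        # Load is normalized by capacity so faster nodes receive more work.
        return self.active[node] / self.capacity[node]

    def assign(self):
        """Pick the least-loaded node (ties broken by name) and record the request."""
        node = min(self.capacity, key=lambda n: (self._load(n), n))
        self.active[node] += 1
        return node

    def finish(self, node):
        """Record that a request on `node` has completed."""
        if self.active[node] == 0:
            raise ValueError(f"{node} has no active requests")
        self.active[node] -= 1


if __name__ == "__main__":
    balancer = LeastLoadedBalancer({"node1": 1, "node2": 2})
    picks = [balancer.assign() for _ in range(6)]
    print(picks)
    balancer.finish("node2")
    print(balancer.assign())
```

For example, with `LeastLoadedBalancer({"node1": 1, "node2": 2})`, six consecutive assignments send two requests to `node1` and four to `node2`. This matches the nodes' assumed capacities.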
High-availability clusters
High-availability clusters exist to keep the overall services of the cluster available as much as possible, allowing for failures in the computing hardware and software. When the primary node in a high-availability cluster fails, it is replaced by a secondary node that has been waiting for that moment. That secondary node is usually a mirror image of the primary node, so that when it replaces the primary, it can completely take over its identity and thus keep the system environment consistent from the user's point of view.
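Failover of this kind is often driven by heartbeats: the standby watches for periodic "I am alive" messages from the primary and takes over when they stop. This Python sketch assumes a single primary/standby pair, a hypothetical 3-second timeout, and an injectable clock so the behaviour can be tested. Moving the service's network identity (for example its IP address) to the new node is not modelled.

```python
import time


class FailoverMonitor:
    """Hypothetical heartbeat monitor for one primary/standby pair."""

    def __init__(self, primary, standby, timeout_s=3.0, clock=time.monotonic):
        self.active = primary
        self.standby = standby
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so tests can control time
        self.last_heartbeat = clock()

    def heartbeat(self):
        """Record that the active node has reported in."""
        self.last_heartbeat = self.clock()

    def check(self):
        """Promote the standby if the active node has been silent too long.

        Returns the node that currently provides the cluster's services."""
        silent_for = self.clock() - self.last_heartbeat
        if silent_for > self.timeout_s and self.standby is not None:
            failed, self.active, self.standby = self.active, self.standby, None
            # A real system would now move the service IP address to the new node.
            self.last_heartbeat = self.clock()
            print(f"{failed} silent for {silent_for:.1f}s; {self.active} takes over")
        return self.active


if __name__ == "__main__":
    fake_now = [0.0]
    monitor = FailoverMonitor("node-a", "node-b", clock=lambda: fake_now[0])
    fake_now[0] = 2.0
    monitor.heartbeat()
    fake_now[0] = 4.0
    print(monitor.check())  # node-a: last heartbeat was 2 s ago
    fake_now[0] = 6.0
    print(monitor.check())  # node-b: 4 s of silence exceeds the 3 s timeout
```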
Hybrids and interbreeding often occur among these three basic types of clusters. Thus, you can find a high-availability cluster that also load-balances users across its nodes while still attempting to maintain a degree of high availability. Similarly, you can find a parallel cluster that also performs load balancing between the nodes, separately from what was programmed into the application. Although the clustering system itself is independent of the software or hardware in use, hardware connections play a pivotal role in running the system efficiently.
Cluster Classification according to Architecture
Clusters can basically be classified into two types:
o Close Clusters
o Open Clusters
Close Clusters
They hide most of the cluster behind the gateway node. Consequently, they need fewer IP addresses and provide better security. They are good for computing tasks.
Open Clusters
All nodes can be seen from outside, and hence they need more IP addresses and raise more security concerns. But they are more flexible and are used for server tasks.
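The difference in address requirements between the two architectures can be sketched as an address plan. In this hypothetical Python example, a close cluster gives a public address only to its gateway node and puts every compute node on a private network. An open cluster gives every node a public address. The networks used are assumptions for illustration: 203.0.113.0/24, a documentation range, stands in for public addresses, and 192.168.1.0/24 serves as the private side.

```python
import ipaddress


def plan_addresses(compute_nodes, closed,
                   public_net="203.0.113.0/24", private_net="192.168.1.0/24"):
    """Return {node: (public_address_or_None, private_address_or_None)}.

    closed=True  -> only a gateway node is visible from outside.
    closed=False -> every compute node is visible from outside.
    The address pools are assumed to be large enough for the cluster.
    """
    public = iter(ipaddress.ip_network(public_net).hosts())
    private = iter(ipaddress.ip_network(private_net).hosts())
    plan = {}
    if closed:
        # The gateway straddles both networks; compute nodes stay private.
        plan["gateway"] = (next(public), next(private))
        for node in compute_nodes:
            plan[node] = (None, next(private))
    else:
        # Every node is directly reachable, so each needs a public address.
        for node in compute_nodes:
            plan[node] = (next(public), None)
    return plan


def public_count(plan):
    """Number of externally visible addresses the plan consumes."""
    return sum(1 for public, _ in plan.values() if public is not None)


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]
    print("close cluster:", public_count(plan_addresses(nodes, closed=True)))
    print("open cluster:", public_count(plan_addresses(nodes, closed=False)))
```

For four compute nodes, the close cluster consumes one public address and the open cluster consumes four.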
Beowulf Cluster
Basically, the Beowulf architecture is a multi-computer architecture used for parallel computation applications. Therefore, Beowulf clusters are meant primarily for processor-intensive, number-crunching applications and not for storage applications. A Beowulf cluster consists of a server computer that controls the functioning of many client nodes, which are connected together with Ethernet or another network built from switches or hubs. One good feature of Beowulf is that all of the system's components are available off the shelf, and no special hardware is required to implement it. It also uses commodity software, most often Linux, along with other commonly available components such as Parallel Virtual Machine (PVM) and the Message Passing Interface (MPI).
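A typical Beowulf application splits its data among the processes (ranks) running on the client nodes and then combines the partial results, for example with MPI's reduce operation. The Python sketch below only simulates that pattern in a single process so that it runs without an MPI installation. The function names are hypothetical, and on a real cluster each rank would compute its own block on its own node.

```python
def block_range(rank, size, n):
    """Half-open range [start, stop) of the n items owned by `rank` out of
    `size` ranks; the first n % size ranks get one extra item."""
    base, extra = divmod(n, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def local_work(rank, size, n):
    """Work done by one rank: here, the sum of squares over its block."""
    start, stop = block_range(rank, size, n)
    return sum(i * i for i in range(start, stop))


def simulated_reduce(size, n):
    """Combine every rank's partial result, as a sum-reduction would.

    The ranks run one after another in this process; on a cluster they would
    run concurrently on separate nodes and communicate over MPI or PVM."""
    return sum(local_work(rank, size, n) for rank in range(size))


if __name__ == "__main__":
    n, size = 1_000, 4
    print([block_range(r, size, n) for r in range(size)])
    print(simulated_reduce(size, n) == sum(i * i for i in range(n)))
```

Block decomposition keeps each rank's data contiguous and spreads the remainder over the first few ranks, so no rank has more than one item more than any other.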
Besides serving all the client nodes in the Beowulf cluster, the server node also acts as a gateway to external users and passes files to the Beowulf system. The server is also used to drive the system console, from which the various parameters and configuration can be monitored. In some cases, especially in very large Beowulf configurations, there is more than one server node, along with other specialized nodes that serve as monitoring stations and additional consoles. In diskless configurations, the individual client nodes often do not even know their own addresses until the server node informs them.
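In the diskless setup, the server tells each client node its address. This works much like BOOTP or DHCP: the server recognizes a node by its network card's MAC address and hands out an address from a pool. The Python sketch below is a hypothetical, in-memory model of that bookkeeping only and does not implement either protocol. The 10.0.0.0/24 network and the reservation of the first address for the server are assumptions.

```python
import ipaddress


class AddressServer:
    """Hypothetical head-node bookkeeping that assigns addresses to diskless
    client nodes, keyed by MAC address (the idea behind BOOTP/DHCP)."""

    def __init__(self, network="10.0.0.0/24", reserved=1):
        self._free = iter(ipaddress.ip_network(network).hosts())
        # Skip the first `reserved` host addresses, assumed to belong to the server.
        for _ in range(reserved):
            next(self._free)
        self._leases = {}

    def request(self, mac):
        """Return the address for this MAC, allocating one on first contact."""
        mac = mac.lower()
        if mac not in self._leases:
            address = next(self._free, None)
            if address is None:
                raise RuntimeError("address pool exhausted")
            self._leases[mac] = address
        return self._leases[mac]


if __name__ == "__main__":
    server = AddressServer()
    for mac in ("AA:BB:CC:00:00:01", "aa:bb:cc:00:00:02", "aa:bb:cc:00:00:01"):
        print(mac, "->", server.request(mac))
```

Because leases are keyed by MAC address, a node that reboots and asks again receives the same address it had before.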
The major difference between the Beowulf clustering system and the more commonly implemented Cluster of Workstations (CoW) is that Beowulf systems tend to appear to the external world as a single unit and not as individual workstations. In most cases, the individual nodes do not even have a keyboard, mouse or monitor and are accessed only by remote login or through a console terminal. In fact, a Beowulf node can be conceptualized as a CPU+memory package that is plugged into the Beowulf system, much as a component is plugged into a motherboard.
It's important to realize that Beowulf is not a specific set of components or a networking topology or even a specialized kernel. Instead, it's simply a technology for clustering together Linux computers to form a parallel, virtual supercomputer.
Technicalities in the Design of a Cluster
Homogeneous and Heterogeneous Clusters
A cluster can be built either from homogeneous machines, which have the same hardware and software configurations, or as a heterogeneous cluster of machines with different configurations. Heterogeneous clusters face problems arising from differing performance profiles and from software configuration management.
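One common way to cope with differing performance profiles is to give each node a share of the work proportional to its speed. The Python sketch below assumes relative speeds are already known (for example, from a benchmark run on each node). It uses largest-remainder rounding so the shares always add up to the total. It does not address the software configuration side of the problem.

```python
def weighted_split(n_items, speeds):
    """Split n_items across nodes in proportion to their assumed relative
    speeds, using largest-remainder rounding so the shares sum to n_items."""
    if not speeds or any(s <= 0 for s in speeds.values()):
        raise ValueError("need at least one node, each with positive speed")
    total = sum(speeds.values())
    exact = {node: n_items * s / total for node, s in speeds.items()}
    shares = {node: int(x) for node, x in exact.items()}  # round down first
    leftover = n_items - sum(shares.values())
    # Hand the remaining items to the nodes with the largest fractional parts
    # (ties broken by node name so the result is deterministic).
    by_remainder = sorted(speeds, key=lambda node: (exact[node] - shares[node], node),
                          reverse=True)
    for node in by_remainder[:leftover]:
        shares[node] += 1
    return shares


if __name__ == "__main__":
    print(weighted_split(100, {"fast": 2.0, "slow": 1.0, "old": 1.0}))
    print(weighted_split(7, {"x": 3.0, "y": 1.0}))
```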