31-05-2013, 01:03 PM
Linux Virtual Server for Scalable Network Services
Abstract
This paper describes the motivation, design, and internal implementation
of Linux Virtual Server. The goal of Linux
Virtual Server is to provide a basic framework to build
highly scalable and highly available network services using
a large cluster of commodity servers. The TCP/IP
stack of Linux kernel is extended to support three IP load
balancing techniques, which make the parallel services
of different kinds of server clusters appear as one service
on a single IP address. Scalability is achieved by
transparently adding or removing a node in the cluster,
and high availability is provided by detecting node or
daemon failures and reconfiguring the system appropriately.
Introduction
With the explosive growth of the Internet, Internet
servers must cope with greater demands than ever. The
potential number of clients that a server must support
has dramatically increased; some hot sites have already
received hundreds of thousands of simultaneous client
connections. With the increasing number of users and
the increasing workload, companies often worry about
how their systems will grow over time. Furthermore, rapid
response and 24x7 availability are mandatory requirements
for mission-critical business applications, as
sites compete to offer users the best access experience.
System Architecture Overview
In this section we present a system architecture for building
highly scalable and highly available network services
on clusters. The three-tier architecture of LVS illustrated
in Figure 1 includes:
Load balancer: the front end to the service as
seen by the outside world. The load balancer directs
network connections from clients who know
a single IP address for services, to a set of servers
that actually perform the work.
Server pool: a cluster of servers that implement
the actual services, such as web, ftp, mail,
dns, and so on.
IP Load Balancing Techniques
Since the IP load balancing techniques have good scalability,
we patch the Linux kernel (2.0 and 2.2) to
support three IP load balancing techniques: LVS/NAT,
LVS/TUN and LVS/DR. The box running Linux Virtual
Server acts as a load balancer of network connections
from clients who know a single IP address for a service,
to a set of servers that actually perform the work. In general,
real servers are identical: they run the same service
and they have the same set of contents. The contents
are either replicated on each server’s local disk, shared
on a network file system, or served by a distributed file
system. We call the data communication between a client’s
socket and a server’s socket a connection, whether it
uses TCP or UDP. The following subsections
describe the working principles of the three techniques and
their advantages and disadvantages.
Linux Virtual Server via NAT
Due to the shortage of IP addresses in IPv4 and for security
reasons, more and more networks use private IP
addresses which cannot be used on the Internet. The
need for network address translation arises when hosts
in internal networks want to access or to be accessed on
the Internet. Network address translation relies on the
fact that the headers of packets can be adjusted appropriately
so that clients believe they are contacting one IP
address, but servers at different IP addresses believe they
are contacted directly by the clients. This feature can be
used to build a virtual server, i.e. parallel services at
different IP addresses can appear as a virtual service on
a single IP address.
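The header adjustment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the kernel code: the Packet and NatBalancer names, the addresses (taken from documentation and private ranges) and the simple rotation used to pick a server are all hypothetical. Requests sent to the virtual address have their destination rewritten to a real server, and replies have their source rewritten back to the virtual address.

```python
from dataclasses import dataclass, replace

# Hypothetical addresses, chosen only for illustration.
VIP = ("192.0.2.10", 80)                                   # virtual service address
REAL_SERVERS = [("10.0.0.2", 8000), ("10.0.0.3", 8000)]    # private real servers


@dataclass(frozen=True)
class Packet:
    src: tuple  # (ip, port)
    dst: tuple  # (ip, port)


class NatBalancer:
    """Toy model of a NAT-based virtual server (not the LVS kernel code)."""

    def __init__(self, vip, servers):
        self.vip = vip
        self.servers = servers
        self.connections = {}   # client (ip, port) -> chosen real server
        self.next_index = 0

    def inbound(self, pkt):
        """Client -> virtual address: rewrite the destination to a real server."""
        if pkt.dst != self.vip:
            return pkt                      # not addressed to the virtual service
        server = self.connections.get(pkt.src)
        if server is None:                  # new connection: pick a server in rotation
            server = self.servers[self.next_index % len(self.servers)]
            self.next_index += 1
            self.connections[pkt.src] = server
        return replace(pkt, dst=server)

    def outbound(self, pkt):
        """Real server -> client: rewrite the source back to the virtual address."""
        if self.connections.get(pkt.dst) == pkt.src:
            return replace(pkt, src=self.vip)
        return pkt
```

In this model the client only ever sees the virtual address, while the real server sees the client's address as the source of the request.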
Linux Virtual Server via Direct Routing
This IP load balancing approach is similar to the one implemented
in IBM’s NetDispatcher. The architecture of
LVS/DR is illustrated in Figure 4. The load balancer and
the real servers must have one of their interfaces physically
linked by an uninterrupted LAN segment, such
as a hub or switch. The virtual IP address is shared by
real servers and the load balancer.
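A rough Python sketch of why the shared LAN segment and the shared virtual IP address matter is given here. It rests on an assumption not spelled out in the text above: that the load balancer leaves the IP header untouched and only re-addresses the frame at the link layer to the chosen real server. The Frame type, the function names and all addresses are hypothetical.

```python
from dataclasses import dataclass, replace

# Hypothetical addresses, chosen only for illustration.
VIP = "192.0.2.10"
BALANCER_MAC = "02:00:00:00:00:01"
REAL_SERVER_MACS = ["02:00:00:00:00:02", "02:00:00:00:00:03"]


@dataclass(frozen=True)
class Frame:
    dst_mac: str
    src_ip: str
    dst_ip: str


def forward_direct(frame, server_mac):
    """Re-address the frame at the link layer only; the IP header is unchanged.

    This works only if the real server is reachable on the same LAN segment.
    """
    return replace(frame, dst_mac=server_mac)


def real_server_reply(frame, local_ips):
    """Model a real server that accepts the frame because the virtual IP
    address is one of its own addresses, and answers the client directly."""
    if frame.dst_ip not in local_ips:
        return None                         # not addressed to this host
    return {"src_ip": frame.dst_ip, "dst_ip": frame.src_ip}
```

Because the reply already carries the virtual IP address as its source, it does not need to pass back through the load balancer in this model.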
Implementation Issues
The system implementation of Linux Virtual Server is illustrated
in Figure 5. The “VS Schedule&Control Module”
is the main module of LVS; it hooks into two places where IP
packets traverse the kernel in order to grab and rewrite IP
packets to support IP load balancing. It looks up the “VS
Rules” hash table for new connections, and checks the
“Connection Hash Table” for established connections.
The “IPVSADM” user-space program is used to administer
virtual servers; it uses the setsockopt function to modify
the virtual server rules inside the kernel, and reads the
virtual server rules through the /proc file system.
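The two-table lookup described above might be modelled as follows. This is a user-space sketch under assumptions, not the kernel data structures: the VirtualServerCore class, the key layouts and the rotation used to choose a server for a new connection are all hypothetical. Established connections are found in the connection table; only packets that miss it consult the rules table.

```python
class VirtualServerCore:
    """Toy model of the rules table and the connection table (hypothetical)."""

    def __init__(self):
        self.rules = {}         # (protocol, vip, vport) -> list of real servers
        self.connections = {}   # (protocol, cip, cport, vip, vport) -> real server
        self._next = {}         # per-rule counter used to rotate through servers

    def add_rule(self, protocol, vip, vport, real_servers):
        """Stand-in for what an administration tool would configure."""
        self.rules[(protocol, vip, vport)] = list(real_servers)
        self._next[(protocol, vip, vport)] = 0

    def lookup(self, protocol, cip, cport, dip, dport):
        conn_key = (protocol, cip, cport, dip, dport)
        server = self.connections.get(conn_key)
        if server is not None:
            return server                   # established connection
        rule_key = (protocol, dip, dport)
        servers = self.rules.get(rule_key)
        if not servers:
            return None                     # not a virtual service
        index = self._next[rule_key]
        server = servers[index % len(servers)]
        self._next[rule_key] = index + 1
        self.connections[conn_key] = server  # remember the choice for later packets
        return server
```

Recording the choice in the connection table is what keeps every later packet of the same connection on the same real server.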
Connection Scheduling
We have implemented four scheduling algorithms for
selecting servers from the cluster for new connections:
Round-Robin, Weighted Round-Robin, Least-Connection
and Weighted Least-Connection. The first
two algorithms are self-explanatory: they do not
use any load information about the servers. The last
two algorithms count the number of active connections for each
server and estimate its load from that count.
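The four policies can be illustrated with a short Python sketch. It is a simplification under assumptions rather than the LVS implementation: the Server type is hypothetical, the weighted round-robin simply repeats each server according to its weight, and the weighted least-connection policy is assumed to compare the ratio of active connections to weight.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Server:
    name: str
    weight: int = 1
    active: int = 0   # current number of active connections


def round_robin(servers):
    """Yield servers in turn, ignoring load."""
    return cycle(servers)


def weighted_round_robin(servers):
    """Naive variant: each server appears `weight` times per cycle."""
    expanded = [s for s in servers for _ in range(s.weight)]
    return cycle(expanded)


def least_connection(servers):
    """Pick the server with the fewest active connections."""
    return min(servers, key=lambda s: s.active)


def weighted_least_connection(servers):
    """Pick the server with the lowest active/weight ratio (assumed metric)."""
    candidates = [s for s in servers if s.weight > 0]
    return min(candidates, key=lambda s: s.active / s.weight)
```

For example, with servers a (weight 1, 4 connections), b (weight 3, 6 connections) and c (weight 1, 3 connections), the least-connection policy picks c, while the weighted variant picks b because 6/3 is the smallest ratio.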