16-11-2012, 02:13 PM
A Gossip Protocol for Dynamic Resource Management in Large Cloud Environments
INTRODUCTION
We consider the problem of resource management for a large-scale cloud environment. Such an environment includes the physical infrastructure and the associated control functionality that enables the provisioning and management of cloud services. While our contribution is relevant in a more general context, we conduct the discussion from the perspective of the Platform-as-a-Service (PaaS) concept, with the specific use case of a cloud service provider that hosts sites in a cloud environment. The stakeholders for this use case are depicted in Figure 1a. The cloud service provider owns and administers the physical infrastructure on which cloud services are provided. It offers hosting services to site owners through a middleware that executes on its infrastructure (see Figure 1b). Site owners provide services to their respective users via sites that are hosted by the cloud service provider. Our contribution can also be applied, with slight modifications, to the Infrastructure-as-a-Service (IaaS) concept. A use case for this concept could include a cloud tenant running a collection of virtual appliances that are hosted on the cloud infrastructure, with services provided to end users through the public Internet. For both perspectives, this paper introduces a resource allocation protocol that dynamically places site modules (or virtual machines, respectively) on servers within the cloud, following global management objectives.
SYSTEM ARCHITECTURE
Datacenters running a cloud environment often contain a large number of machines that are connected by a high-speed network. Users access sites hosted by the cloud environment through the public Internet. A site is typically accessed through a URL that is translated to a network address by a global directory service, such as DNS. A request to a site is routed through the Internet to a machine inside the datacenter, which either processes the request or forwards it.
Figure 2 (left) shows the architecture of the cloud middleware. The components of the middleware layer run on all machines. The resources of the cloud are primarily consumed by module instances, whereby the functionality of a site is made up of one or more modules. In the middleware, a module either contains part of the service logic of a site (denoted by …
FORMALIZING THE PROBLEM OF RESOURCE ALLOCATION BY THE CLOUD MIDDLEWARE
For this work, we consider a cloud as having computational resources (i.e., CPU) and memory resources, which are available on the machines of the cloud infrastructure. As explained earlier, we restrict the discussion to the case where all machines belong to a single cluster and cooperate as peers in the task of resource allocation. The specific problem we address is that of placing modules (more precisely: identical instances of modules) on machines and allocating cloud resources to these modules, such that a cloud utility is maximized under constraints. As the cloud utility we choose the minimum utility generated by any site, where the utility of a site is defined as the minimum utility of its module instances. We formulate the resource allocation problem as that of maximizing the cloud utility under CPU and memory constraints. The solution to this problem is a configuration matrix that controls the module scheduler and the request forwarder components. At discrete points in time, events occur, such as demand changes or the addition and removal of sites or machines.
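The excerpt does not reproduce the optimization problem itself, so the following max-min formulation is only an illustrative sketch in our own notation (not necessarily the paper's): sites $s \in S$, module instances $m \in M_s$ with utility $u_m$, CPU demand $\omega_m$ and memory demand $\gamma_m$, and machines $n \in N$ with CPU capacity $\Omega_n$ and memory capacity $\Gamma_n$.

```latex
% Illustrative max-min formulation; symbols are ours, not the paper's.
\begin{align*}
  \text{maximize}\quad   & \min_{s \in S} U^s,
      \qquad U^s = \min_{m \in M_s} u_m \\
  \text{subject to}\quad & \sum_{m \text{ placed on } n} \omega_m \le \Omega_n
      && \text{(CPU capacity of machine } n\text{)} \\
                         & \sum_{m \text{ placed on } n} \gamma_m \le \Gamma_n
      && \text{(memory capacity of machine } n\text{)}
\end{align*}
```

The configuration matrix mentioned above would then encode which module instances are placed on which machines, and with what resource share.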
A PROTOCOL FOR DISTRIBUTED RESOURCE ALLOCATION
In this section, we present our protocol for resource allocation in a cloud environment, which we call P*. It is based on a heuristic algorithm for solving OP(2) and is implemented in the form of a gossip protocol.

As a gossip protocol, P* has the structure of a round-based distributed algorithm (whereby round-based does not imply that the protocol is synchronous). When executing a round-based gossip protocol, each node selects a subset of other nodes to interact with, whereby the selection function is often probabilistic. Nodes interact via ‘small’ messages, which are processed and trigger local state changes. Node interaction in P* follows the so-called push-pull paradigm, whereby two nodes exchange state information, process this information, and update their local states during a round. Compared to alternative distributed solutions, gossip-based protocols tend to be simpler, more scalable, and more robust.
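The push-pull, round-based pattern can be illustrated with a toy simulation. The sketch below is our own code, not the paper's P*: it uses gossip averaging as a stand-in for P*'s state exchange. In each round every node picks a random peer, the pair exchanges values (push-pull), and both adopt the average.

```python
import random

def push_pull_round(state):
    """One round of push-pull gossip averaging.

    `state` maps node id -> local value (assumes at least two nodes).
    Every node initiates one interaction: it picks a random peer,
    the pair exchanges values, and both adopt the average -- a
    stand-in for the local state update a protocol like P* performs.
    """
    new_state = dict(state)
    for node in state:
        peer = random.choice([n for n in state if n != node])
        avg = (new_state[node] + new_state[peer]) / 2.0
        new_state[node] = avg
        new_state[peer] = avg
    return new_state

def gossip(initial, rounds):
    """Run a number of gossip rounds; values converge toward the mean."""
    state = dict(initial)
    for _ in range(rounds):
        state = push_pull_round(state)
    return state
```

Because each interaction replaces two values with their average, the global sum is conserved while the spread between nodes shrinks round by round, which is why every node converges to the network-wide mean without any central coordinator.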