25-02-2013, 03:57 PM
Towards autonomic detection of SLA violations in Cloud infrastructures
ABSTRACT
Cloud computing has become a popular paradigm for implementing scalable computing infrastructures
provided on-demand on a case-by-case basis. Self-manageable Cloud infrastructures are required in order
to comply with users’ requirements defined by Service Level Agreements (SLAs) and to minimize user
interactions with the computing environment. Thus, adequate SLA monitoring strategies and timely
detection of possible SLA violations represent challenging research issues. This paper presents the
Detecting SLA Violation infrastructure (DeSVi) architecture, sensing SLA violations through sophisticated
resource monitoring. Based on the user requests, DeSVi allocates computing resources for a requested
service and arranges its deployment on a virtualized environment. Resources are monitored using a
novel framework capable of mapping low-level resource metrics (e.g., host up and down time) to user-defined
SLAs (e.g., service availability). The detection of possible SLA violations relies on the predefined
service level objectives and utilization of knowledge databases to manage and prevent such violations.
We evaluate the DeSVi architecture using two application scenarios.
Introduction
Cloud computing represents a novel paradigm for the implementation
of scalable computing infrastructures combining concepts
from virtualization, distributed application design, Grid, and
enterprise IT management [1–3]. Service provisioning in the Cloud
relies on Service Level Agreements (SLAs) representing a contract
signed between the customer and the service provider including
non-functional requirements of the service specified as Quality of
Service (QoS) [4,5]. An SLA specifies obligations, service pricing, and
penalties in case of agreement violations.
Flexible and reliable management of SLAs is of
paramount importance for both Cloud providers and consumers.
On the one hand, preventing SLA violations spares providers the
penalties they would otherwise have to pay; on the other hand,
flexible and timely reactions to possible SLA violations minimize user
interaction with the system, which enables Cloud computing to
take root as a flexible and reliable form of on-demand computing.
Related work
We classify related work into (i) resource monitoring [21,
12,22], (ii) SLA management including violation detection [23–
27], and (iii) mapping techniques of monitored metrics to SLA
parameters [28,11]. Currently, there is little work in the area of
resource monitoring, low-level metrics mapping, and SLA violation
detection in Cloud computing. Because of that, we look into the
related areas of Grid and Service-Oriented Architecture (SOA)
based systems.
Fu et al. [21] propose GridEye, a service-oriented monitoring
system with a flexible architecture that is further equipped with
an algorithm for predicting overall resource performance
characteristics. The authors discuss how resources are monitored
with their approach in a Grid environment, but they consider neither
SLA management nor low-level metric mapping. Gunter et al. [12]
present NetLogger, a distributed monitoring system that
monitors and collects information about networks. Applications
invoke NetLogger's API to survey the load before and after
a request or operation. However, it monitors only network
resources. Wood et al. [22] developed a system, called Sandpiper,
which automates the process of monitoring and detecting hotspots
and remapping/reconfiguring VMs whenever necessary. Their
monitoring system is reminiscent of ours in terms of its goal:
avoiding SLA violations. Similar to our approach, Sandpiper uses
thresholds to check whether SLAs might be violated. However, it
differs from our system by not considering the mapping of low-level
metrics, such as CPU and memory usage, to high-level SLA parameters,
such as response time, for SLA enactment.
Background and motivation
The processes of service provisioning based on SLA and efficient
management of resources in an autonomic manner have been
identified as major research challenges in Cloud environments [30,
1]. The FoSII project (Foundations of Self-governing Infrastructures) is
developing models and concepts for autonomic SLA management
and enforcement in Clouds. FoSII components manage the whole
lifecycle of self-adaptable Cloud services [6] as explained next.
SLAs are used to guarantee customers a certain level of quality
for their services. In a situation where this level of quality is not
met, the provider pays penalties for the breach of contract. In
order to save Cloud providers from paying penalties and to increase
their profit, providers have to monitor the current status of
resources and check frequently whether the established SLAs are
violated. Thus, in order to facilitate appropriate monitoring of SLAs,
we developed the Low Level Metrics to High Level SLA (LoM2HiS)
framework [19], which maps low-level resource metrics to high-level
SLA parameters and detects SLA violations as well as future
SLA violation threats, so as to react before actual SLA violations
occur.
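The idea of reacting before an actual violation occurs can be sketched as a simple threshold check: in addition to the agreed service level objective (SLO) boundary, a stricter warning band flags "violation threats" early. This is an illustrative sketch, not the actual LoM2HiS code; the function name, default margin, and return values are assumptions.

```python
# Illustrative sketch: classify one mapped SLA parameter as a violation,
# a violation threat (early warning), or ok, relative to its SLO limit.
def check_sla(value, slo_limit, threat_margin=0.005, lower_is_violation=True):
    """value: current mapped value of the SLA parameter (e.g. availability).
    slo_limit: agreed SLO boundary (e.g. minimum availability 0.99).
    threat_margin: fraction of the limit used as an early-warning band.
    lower_is_violation: True if falling below the limit violates the SLA."""
    band = slo_limit * threat_margin
    if lower_is_violation:
        if value < slo_limit:
            return "violation"
        if value < slo_limit + band:
            return "threat"   # close to the limit: react before violating
    else:  # e.g. response time, where exceeding the limit is the violation
        if value > slo_limit:
            return "violation"
        if value > slo_limit - band:
            return "threat"
    return "ok"

print(check_sla(0.985, 0.99))  # below the SLO -> violation
print(check_sla(0.993, 0.99))  # inside the warning band -> threat
print(check_sla(0.999, 0.99))  # comfortably above -> ok
```

The width of the warning band is a tuning knob: a wider band gives the provider more time to react but produces more false alarms.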
Knowledge databases
Knowledge management in FoSII is performed based on
knowledge databases and case-based reasoning [31] for proposing
of reactive actions. Case-Based Reasoning (CBR) is the process of
solving problems based on past experience. It tries to solve a case
(a formatted instance of a problem) by looking for similar cases
from the past and reusing the solutions of these cases to solve
the current one. In general, a typical CBR cycle consists of the
following phases, assuming that a new case has just been received:
(i) retrieving the most similar case or cases to the new one, (ii)
reusing the information and knowledge in the similar case(s) to
solve the problem, (iii) revising the proposed solution, and (iv)
retaining the parts of this experience likely to be useful for future
problem solving.
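The "retrieve" and "reuse" phases above can be illustrated with a minimal nearest-neighbor lookup. This is a hedged sketch assuming cases are simple numeric feature vectors paired with reactive actions; the actual FoSII knowledge base and similarity measure are more elaborate.

```python
# Minimal CBR sketch: retrieve the most similar past case and reuse its action.
def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def retrieve(case_base, new_case):
    """Return the stored (features, action) pair closest to new_case."""
    return min(case_base, key=lambda c: distance(c[0], new_case))

# Hypothetical past cases: (cpu_load, mem_usage) -> action that worked before.
case_base = [
    ((0.95, 0.60), "add_vm"),
    ((0.40, 0.92), "increase_memory"),
    ((0.30, 0.30), "no_action"),
]

features, action = retrieve(case_base, (0.90, 0.55))
print(action)  # reuse the action of the most similar past case -> add_vm
```

The remaining phases, revising the proposed solution and retaining the result, would correspond to checking whether the applied action actually resolved the threat and appending the outcome to `case_base`.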
LoM2HiS framework overview
The LoM2HiS framework comprises two core components,
namely host monitor and run-time monitor. The former is responsible
for monitoring low-level resource metrics, whereas the latter
is responsible for metric mapping and SLA violation monitoring.
In order to explain our mapping approach we consider the Service
Level Objectives (SLOs) shown in Table 1, including incoming bandwidth,
outgoing bandwidth, storage, and availability.
As shown in Fig. 1, we distinguish between host monitor and
run-time monitor. Resources are monitored by the host monitor
using arbitrary monitoring tools such as Gmond from Ganglia
project [32]. Low-level resource metrics include downtime, uptime,
and available storage. Based on the predefined mapping rules
stored in a database, monitored metrics are periodically mapped
to high-level SLA parameters. These mapping ideas are similar
to those in Grids, where workflow processes are mapped to Grid
services in order to ensure their quality of service [33]. An example
of an SLA parameter is service availability Av (as shown in Table 1).
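A mapping rule of this kind can be sketched as a small function that turns the low-level metrics uptime and downtime into the SLA parameter Av. The exact rule form used by LoM2HiS is not reproduced here; this is one plausible formulation for illustration.

```python
# Example LoM2HiS-style mapping rule: low-level uptime/downtime -> availability.
def availability(uptime, downtime):
    """Map measured uptime and downtime (e.g. in seconds over the monitoring
    window) to the high-level SLA parameter Av, expressed as a percentage."""
    return 100.0 * uptime / (uptime + downtime)

print(availability(9_950, 50))  # 50 s down out of 10,000 s -> 99.5
```

The run-time monitor would periodically evaluate such rules and compare the resulting Av against the SLO from Table 1.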
DeSVi architecture
This section describes in detail the Detecting SLA Violation
infrastructure (DeSVi) architecture, its components, and how the
components interact with one another (Fig. 3). The proposed architecture
is designed to handle the complete service provisioning
management lifecycle in Cloud environments. The service provisioning
lifecycle includes activities such as service deployment, resource
allocation to tasks, resource monitoring, and SLA violation
detection.
The topmost layer represents the users (customers), who
place service provisioning requests through a defined application
interface (step 1 in Fig. 3) to the Cloud provider. The provider
handles the user service request based on the negotiated and
agreed SLAs with the user. The application deployer, which is
located on the same layer as the run-time monitor, allocates
necessary VM resources for the requested service and arranges
its deployment on the Cloud environment (step 2).
Automated Emulation Framework
The Automated Emulation Framework (AEF) was originally
conceived for automated configuration and execution of emulation
experiments [8]. Nevertheless, it can also be used to set up
arbitrary virtual environments by not activating the emulated
wide-area network support. In the latter case AEF works as
a virtualized infrastructure manager, similar to tools such as
OpenNebula [40], Oracle VM Manager [41], and OpenPEX [42].
Fig. 5 depicts the architecture of the AEF framework. AEF
input consists of two configuration files providing XML description
of both the physical and virtual infrastructures. Using this
information, AEF maps VMs to physical hosts. AEF supports
different algorithms for VM mapping. The algorithm used in this
work tries to reduce the number of hosts used by consolidating
VMs as long as one host has enough resources to host several VMs.
At the end of the mapping process, the resulting mapping is sent
to the deployer, which creates VMs in the hosts accordingly.
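A consolidating mapping of this kind can be sketched as a first-fit packing: place each VM on the first already-used host with enough spare capacity, and open a new host only when none fits. First-fit (with VMs sorted by demand) is an assumption for illustration; AEF's concrete algorithm may differ.

```python
# Sketch of a consolidating VM-to-host mapping: minimize the number of hosts
# by packing VMs onto already-used hosts while capacity allows.
def map_vms(vms, host_capacity):
    """vms: list of per-VM resource demands (single-dimension for simplicity).
    Returns a list of hosts, each a list of the VM demands placed on it."""
    hosts = []
    for demand in sorted(vms, reverse=True):  # larger VMs first packs tighter
        for host in hosts:
            if sum(host) + demand <= host_capacity:
                host.append(demand)           # fits on an existing host
                break
        else:
            hosts.append([demand])            # no host fits: open a new one
    return hosts

print(map_vms([2, 4, 3, 1, 2], host_capacity=6))  # -> [[4, 2], [3, 2, 1]]
```

Five VMs with a total demand of 12 units are consolidated onto two hosts of capacity 6; the resulting host-to-VM assignment is what would be handed to the deployer, which then creates the VMs on the hosts accordingly.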