22-12-2012, 04:35 PM
Deadline-constrained workflow scheduling in software as a service Cloud
1Deadline-constrained.pdf (Size: 1.76 MB / Downloads: 41)
Abstract
The advent of Cloud computing as a new model of service provisioning in distributed systems,
encourages researchers to investigate its benefits and drawbacks in executing scientific applications such
as workflows. In this model, the users request for available services according to their desired Quality of
Service, and they are charged on a pay-per-use basis. One of the most challenging problems in Clouds is
workflow scheduling, i.e., the problem of satisfying the QoS of the user as well as minimizing the cost of
workflow execution. In this paper, we propose a new QoS-based workflow scheduling algorithm based on
a novel concept called Partial Critical Paths (PCP), which tries to minimize the cost of workflow execution
while meeting a user-defined deadline. This algorithm recursively schedules the partial critical paths
ending at previously scheduled tasks. The simulation results show that the performance of our algorithm
is very promising.
Introduction
Cloud computing [1] is the latest emerging trend in
distributed computing that delivers hardware infrastructures
and software applications as services. The users can consume
these services based on a Service Level Agreement (SLA), which
defines their required Quality of Service (QoS) parameters on a
pay-per-use basis. Although there are many papers that address
the problem of scheduling in traditional distributed systems,
like Grids, there are only a few works on this problem in
Clouds. The multi-objective nature of the scheduling problem
in Clouds makes it difficult to solve, especially in the case of
complex jobs like workflows. This has led most researchers to
use time-consuming meta-heuristic approaches, instead of fast
heuristic methods.
Scheduling system model
The proposed scheduling system model consists of an
application model, a Cloud model, and a performance criterion
for scheduling. An application is modeled by a directed acyclic
graph w(T , E), where T is a set of n tasks {t1, t2, . . . , tn}, and
E is a set of dependencies. Each dependency, ei,j = (ti, tj),
represents a precedence constraint, which indicates that task ti
should complete execution before task tj can start. In a given
task graph, a task without any parent is called an entry task,
and a task without any child is called an exit task. As our
algorithm requires a single entry and a single exit task, we
always add two dummy tasks; tentry and texit , to the beginning
and end of the workflow, respectively. These dummy tasks
have zero execution time and are connected with zero-weight
dependencies to the actual entry and exit tasks.
The Cloud partial critical paths algorithm
In this section, we first elaborate on the SC-PCP scheduling
algorithm, and then compute its time complexity.
Main idea
Critical Path heuristics are widely used in workflow scheduling.
The critical path of a workflow is the longest execution path
between the entry and exit tasks of the workflow. Most of these
heuristics try to schedule critical tasks (nodes), i.e. the tasks
belonging to the critical path, first, by assigning them to the resources
that process them earliest, in order to minimize the execution
time of the entire workflow. Our proposed algorithm
is based on a similar heuristic, to schedule the critical nodes
first, yet not to minimize the execution time, but to minimize
the price of executing the critical path before the user-specified
deadline. After scheduling all critical nodes, each of them has
a start time which is a deadline for its parent nodes, i.e. its
(direct) predecessors in the workflow. So, then we can carry out
the same procedure by considering each critical node in turn as
an exit node with its start time as a deadline, and creating a partial
critical path that ends in the critical node and that leads back
to an already scheduled node. In the SaaS Cloud-Partial Critical
Paths (SC-PCP) algorithm, this procedure continues recursively
until all tasks are successfully scheduled. In the following sections,
we elaborate on the details of the SC-PCP algorithm.
The path scheduling algorithm
The SchedulePath algorithm receives a path as input and tries
to find a schedule for its tasks that minimizes the total cost of
the path and finishes each task before its latest finishes time.We
propose three different policies for scheduling a path as follows.
Optimized policy. In this policy, we try to find the cheapest
schedule that can finish the tasks of the path before their
latest finish time. Since the problem of finding the optimal
schedule for an ordered list of tasks, or, more precisely, a
linear workflow is also an NP-complete problem, there is no
polynomial time algorithm to solve it. Fortunately, this problem
can be formulated as an extension of a classic problem, known
as the Multiple Choice Knapsack Problem (MCKP) [18].
Performance evaluation
In this section, we will present our simulations of the Cloud
Partial Critical Paths algorithm.
Experimental setup
A Java-based simulator is developed to simulate the Cloud
environment for our experiments. The simulated Cloud model
consists of a service provider, and offers services to execute each
task type of the five experimental workflows, e.g. 9 different
services for the Montage workflow. Each task type is supported
by 10 services with different execution times and costs. The
execution time of each task type on its fastest service is set to
be equal to its average execution time, explained in [19], and
execution times on slower services are selected randomly. The
execution cost of each service is also selected randomly, such
that a faster service costs more than a slower one. To control
the range of execution times and costs, they are selected, such
that the fastest service for each task type is approximately
10 times faster than the slowest one, and accordingly it is
roughly 10 times more expensive. The average bandwidth
between the computation services and the storage service is
set to 20 MBps, which is the approximate average bandwidth
between the storage service (S3) and the computation service
(EC2) in Amazon [20].
Conclusions
The Cloud computing model enables users to obtain their
required services with desired QoS (such as deadline) by
paying an appropriate price. In this paper, we propose a new
algorithm named the SaaS Cloud Partial Critical Path (SC-PCP)
for workflow scheduling in SaaS Clouds, which minimizes the
total execution cost while meeting a user-defined deadline.
We evaluate our algorithm by simulating it with synthetic
workflows that are based on real scientific workflows with
different structures and sizes. The results show that SC-PCP
outperforms another highly cited algorithm called Deadline-
MDP. Furthermore, the experiments show that the computation
time of the algorithm is very low for the Decrease Cost and
the Fair policies, but is much longer for the Optimized policy,
although still acceptable for the mentioned workflows.