13-11-2012, 03:50 PM
BAR: An Efficient Data Locality Driven Task Scheduling Algorithm for Cloud Computing
BAR.pdf (Size: 662.61 KB / Downloads: 27)
INTRODUCTION
In recent years, large-scale data processing has emerged
as an important part of state-of-the-art internet applications
such as search engines, online map services and social networking services.
These applications not only handle vast amounts
of data, but also generate a large quantity of data every day.
Cloud computing systems like MapReduce[1], Hadoop[2] and
Dryad[3], which are based on simplified parallel programming
models, have been designed for data-intensive applications.
For example, Facebook's Hadoop data warehouse stores more
than 15 PB of data (2.5 PB after compression); on a single day,
more than 10,000 jobs are submitted to process a large amount
of data, and meanwhile more than 60 TB of new data (10 TB after
compression) is loaded[4].
RELATED WORK
A. Data-aware scheduling on distributed systems
Over the past decade, data-intensive applications have
emerged as an important part of distributed computing. Meanwhile,
considerable work has been done on data-aware scheduling
on distributed systems. Stork[12] is a specialized scheduler
for data placement and data movement in Grid environments. The main
idea of Stork is to place data close to computational resources.
Though Stork can be coupled with a computational task
scheduler, no attempt is made to exploit data locality to reduce
data transfer cost. The Gfarm[13] architecture is designed for
petascale data-intensive computing. Its model specifically
targets applications where the data primarily consist of a set
of records or objects that are analysed independently. In
Gfarm, several greedy scheduling algorithms are implemented
to improve data locality. However, these algorithms do not
account for the global optimization of all tasks.
PROBLEM FORMALIZATION
In this section, the system model is formalized. We consider
scheduling a set of independent tasks on a homogeneous
platform. As shown in Fig. 1, there are m (m = 7) tasks
and n (n = 3) servers, where each task processes an input
block on a server. On one hand, since input blocks are of fixed size,
we assume that data-local tasks incur an identical, constant local
cost. On the other hand, since a larger number of remote tasks
causes higher network contention, the remote cost increases
as the number of remote tasks grows. A job is not
completed until all of its tasks are finished. In addition, we take
cluster workload into account: at the start time, if most servers
are idle, the cluster is underloaded; in an overloaded cluster,
many servers cannot become idle in a short time. Based on these
assumptions, our goal is to find an allocation strategy that
minimizes the job completion time.
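The cost model above can be illustrated with a minimal sketch. This is not the paper's actual formulation: the constant LOCAL_COST, the linear remote_cost function, and the per-server summation are illustrative assumptions chosen only to show how local cost, contention-dependent remote cost, and the completion-time objective fit together.

```python
# Illustrative sketch of the completion-time model (assumed parameters,
# not the paper's actual cost values or remote-cost function).

LOCAL_COST = 1.0  # constant cost of a data-local task (fixed-size blocks)

def remote_cost(num_remote):
    # Remote cost grows with the total number of remote tasks,
    # modeling increased network contention (assumed linear here).
    return 1.5 + 0.5 * num_remote

def completion_time(allocation, data_location):
    """Compute the job completion time for a given allocation.

    allocation: dict mapping task -> server it runs on.
    data_location: dict mapping task -> server holding its input block.
    The job completes only when every server has finished its tasks,
    so the completion time is the maximum per-server load.
    """
    remote_tasks = [t for t, s in allocation.items()
                    if data_location[t] != s]
    per_server = {}
    for task, server in allocation.items():
        if data_location[task] == server:
            cost = LOCAL_COST            # data-local task
        else:
            cost = remote_cost(len(remote_tasks))  # remote task
        per_server[server] = per_server.get(server, 0.0) + cost
    return max(per_server.values())
```

For example, running two data-local tasks on one server gives a completion time of 2 * LOCAL_COST, while moving one of them to a remote server trades queueing delay for a higher per-task remote cost; an allocation strategy such as BAR would search over such trade-offs to minimize the maximum.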