Load Rebalancing for Distributed File Systems in Clouds
Abstract
Distributed file systems are key building blocks for cloud computing applications based on the MapReduce programming
paradigm. In such file systems, nodes simultaneously serve computing and storage functions; a file is partitioned into a number of
chunks allocated in distinct nodes so that MapReduce tasks can be performed in parallel over the nodes. However, in a cloud
computing environment, failure is the norm, and nodes may be upgraded, replaced, and added in the system. Files can also be
dynamically created, deleted, and appended. This results in load imbalance in a distributed file system; that is, the file chunks are not
distributed as uniformly as possible among the nodes. Emerging distributed file systems in production systems strongly depend on a
central node for chunk reallocation. This dependence is clearly inadequate in a large-scale, failure-prone environment because the
central load balancer is put under considerable workload that is linearly scaled with the system size, and may thus become the
performance bottleneck and the single point of failure. In this paper, a fully distributed load rebalancing algorithm is presented to cope
with the load imbalance problem. Our algorithm is compared against a centralized approach in a production system and a competing
distributed solution presented in the literature. The simulation results indicate that our proposal is comparable with the existing
centralized approach and considerably outperforms the prior distributed algorithm in terms of load imbalance factor, movement cost,
and algorithmic overhead. The performance of our proposal implemented in the Hadoop distributed file system is further investigated in
a cluster environment.
INTRODUCTION
Cloud computing (or cloud for short) is a compelling
technology. In clouds, clients can dynamically allocate
their resources on demand without sophisticated deployment
and management of resources. Key enabling technologies
for clouds include the MapReduce programming
paradigm [1], distributed file systems (e.g., [2], [3]),
virtualization (e.g., [4], [5]), and so forth. These techniques
emphasize scalability, so clouds (e.g., [6]) can be large in
scale, and their comprising entities can arbitrarily fail and
join while the system maintains reliability.
LOAD REBALANCING PROBLEM
We consider a large-scale distributed file system consisting
of a set of chunkservers V in a cloud, where the cardinality of
V is |V| = n. Typically, n can be 1,000, 10,000, or more. In
the system, a number of files are stored in the n
chunkservers. First, let us denote the set of files as F. Each
file f ∈ F is partitioned into a number of disjoint, fixed-size
chunks denoted by C_f. For example, each chunk has
the same size, 64 MB, in Hadoop HDFS [3]. Second, the
load of a chunkserver is proportional to the number of
chunks hosted by the server [3]. Third, node failure is the
norm in such a distributed system, and the chunkservers
may be upgraded, replaced and added in the system.
Finally, the files in F may be arbitrarily created, deleted,
and appended. The net effect results in file chunks not
being uniformly distributed to the chunkservers. Fig. 1
illustrates an example of the load rebalancing problem with
the assumption that the chunkservers are homogeneous
and have the same capacity.
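Under this homogeneous-capacity assumption, a node is over- or underloaded according to how far its chunk count deviates from the average. The following is a minimal sketch of that classification, not the paper's algorithm; the threshold parameters alpha_L and alpha_U are assumptions introduced here for illustration.

```python
# Sketch (illustrative only): classify homogeneous chunkservers as
# under- or overloaded relative to the ideal (average) load.
# alpha_L < 1 < alpha_U are assumed tuning thresholds, not from the paper.

def classify(chunk_counts, alpha_L=0.8, alpha_U=1.2):
    """chunk_counts: number of chunks hosted by each chunkserver."""
    n = len(chunk_counts)
    ideal = sum(chunk_counts) / n  # ideal load per server
    light = [i for i, c in enumerate(chunk_counts) if c < alpha_L * ideal]
    heavy = [i for i, c in enumerate(chunk_counts) if c > alpha_U * ideal]
    return ideal, light, heavy

ideal, light, heavy = classify([4, 16, 10, 10])
# ideal = 10.0; server 0 is underloaded, server 1 is overloaded
```

A rebalancing algorithm would then migrate chunks from the heavy set to the light set until every node's count falls within the thresholds.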
Managing Replicas
In distributed file systems (e.g., Google GFS [2] and Hadoop
HDFS [3]), a constant number of replicas for each file chunk
are maintained in distinct nodes to improve file availability
with respect to node failures and departures. Our current
load-balancing algorithm does not treat replicas distinctly.
It is unlikely that two or more replicas are placed in an
identical node because of the random nature of our load
rebalancing algorithm. More specifically, each underloaded
node samples a number of nodes, each selected with a
probability of 1/n, to share their loads (where n is the total
number of storage nodes).
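The uniform sampling step described above can be sketched as follows; this is an illustrative fragment, and the sample size is an assumed parameter, not a value from the paper.

```python
import random

# Sketch of the uniform sampling step: an underloaded node draws a
# small sample of peers, each chosen with probability 1/n, so the odds
# that two replicas of the same chunk end up on one node stay low.
# sample_size is a hypothetical parameter for illustration.

def sample_peers(nodes, sample_size=3, rng=random):
    """Pick sample_size peers uniformly at random (with replacement)."""
    return [rng.choice(nodes) for _ in range(sample_size)]

peers = sample_peers(list(range(100)), sample_size=3)
assert len(peers) == 3 and all(0 <= p < 100 for p in peers)
```

Because every node is equally likely to be sampled, no special-case replica handling is needed: placing two replicas on the same node would require the same node to be drawn for both, which is improbable for large n.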
Experimental Results
We present the experimental results in Fig. 11. Fig. 11b
shows the time required by the HDFS load balancer and by
our proposal. Our proposal clearly outperforms the HDFS
load balancer. When the namenode is heavily loaded (i.e.,
for small M's), our proposal performs remarkably better.
For example, if M = 1%, the HDFS load balancer takes
approximately 60 minutes to balance the loads of the
datanodes, whereas our proposal takes nearly 20 minutes in
the case of M = 1%. Notably, unlike the HDFS load balancer,
our proposal is independent of the load of the namenode.
In Figs. 11c and 11d, we further show the distributions of
chunks after performing the HDFS load balancer and our
proposal. As there are 256 file chunks and 25 datanodes, the
ideal number of chunks that each datanode should host is
256/25 ≈ 10. Due to space limitations, we only present the
experimental results for M = 1; the results for M ≠ 1 lead to
similar conclusions. Figs. 11c and 11d indicate that our
proposal is comparable to the HDFS load balancer and
effectively balances the loads of the datanodes.
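The arithmetic behind the ideal load, and one simple way to quantify how far a distribution sits from it, can be sketched as follows; max_deviation is a hypothetical helper introduced here, not a metric defined in the paper.

```python
# Sketch: with 256 chunks spread over 25 datanodes, the ideal per-node
# load is 256/25 = 10.24, i.e., roughly 10 chunks per datanode.

def ideal_load(total_chunks, n_datanodes):
    """Average number of chunks each datanode should host."""
    return total_chunks / n_datanodes

def max_deviation(chunk_counts):
    """Largest gap between any datanode's load and the ideal load
    (a hypothetical illustration of measuring imbalance)."""
    ideal = sum(chunk_counts) / len(chunk_counts)
    return max(abs(c - ideal) for c in chunk_counts)

print(ideal_load(256, 25))  # 10.24, i.e., about 10 chunks per datanode
```

A balanced distribution drives this deviation toward zero, which is what Figs. 11c and 11d show qualitatively for both balancers.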