28-08-2014, 11:30 AM
IN-NETWORK REDUNDANCY GENERATION FOR OPPORTUNISTIC SPEEDUP OF DATA BACKUP SEMINAR REPORT
ABSTRACT
Erasure coding is a storage-efficient alternative to replication for achieving reliable data backup in distributed storage systems. During the storage process, traditional erasure codes require a unique source node to create and upload all the redundant data to the different storage nodes. However, such a source node may have limited communication and computation capabilities, which constrain the storage process throughput. Moreover, the source node and the different storage nodes might not be able to send and receive data simultaneously – e.g., nodes might be busy in a data center setting, or simply be offline in a peer-to-peer setting – which can further threaten the efficacy of the overall storage process. We propose an "in-network" redundancy generation process which distributes the data insertion load among the source and storage nodes by allowing the storage nodes to generate new redundant data by exchanging partial information among themselves, improving the throughput of the storage process. The process is carried out asynchronously, utilizing spare bandwidth and computing resources from the storage nodes. The proposed approach leverages the local reparability property of newly proposed erasure codes, tailor-made for the needs of distributed storage systems. We analytically show that the performance of this technique relies on an efficient usage of the spare node resources, and we derive a set of scheduling algorithms to maximize that usage. We experimentally show, using availability traces from real peer-to-peer applications as well as Google data center availability and workload traces, that our algorithms can, depending on the environmental characteristics, increase the throughput of the storage process significantly (up to 90% in data centers, and 60% in peer-to-peer settings) with respect to the classical naive data insertion approach.
INTRODUCTION
When a large volume of data is involved, deploying a networked distributed storage system becomes essential, since a single storage node cannot scale. Furthermore, distribution provides opportunities for fault tolerance and parallelized I/O. Examples of such distributed storage systems are readily found in data center environments, including distributed file systems such as GFS or HDFS and distributed key-value stores like Dynamo or Cassandra, used for storing huge volumes of scientific or multimedia content, as well as in ad-hoc, end-user resource based peer-to-peer (P2P) settings such as OceanStore and friend-to-friend (F2F) storage systems, a special kind of peer-to-peer system often considered particularly suitable for personal data backup.
Various trade-offs in adopting erasure codes in storage systems have been studied in the literature: storage overhead versus fault tolerance, access frequency versus decoding overheads, and also the repair bandwidth and repair time needed to replenish lost redundancy after failures. These studies reveal in particular that erasure codes are particularly suited for backup and archival storage, where data access is infrequent, and hence the effects of decoding are marginal. Consequently, many modern data-center-oriented storage and file systems, such as Microsoft's Azure, HDFS, and the next generation of Google's file system (Colossus), have incorporated erasure codes to enhance the systems' storage efficiency.
EFFECT OF TEMPORAL RESOURCE (UN) AVAILABILITY
We elaborate on the effect of temporal resource (un)availability issues with two distinct example scenarios, which we also use later in our experiments to determine how (much) in-network redundancy generation may improve the data insertion throughput:
• In data centers, storage nodes might be used for computation processes which require efficient access to local disks. Since backup processes consume large amounts of local disk I/O, system administrators might want to avoid backup transfers while nodes are executing I/O intensive tasks, e.g., MapReduce tasks.
• In peer-to-peer settings, users exchange some of their spare disk resources in order to realize a collaborative data backup service. Such resource sharing may furthermore be driven by other constraints such as trust or friendship. However, different users may be online at different times of the day, complicating the data insertion process.
In both cases, the insertion of new redundancy by the source node is restricted to the periods when the availability windows of the source overlap those of the storage nodes.
MAIN CONTRIBUTIONS OF THIS PAPER
• We introduce the concept of in-network redundancy generation for reducing data insertion latency in erasure code based storage systems, and demonstrate its feasibility using one specific instance of locally repairable codes, namely self-repairing codes.
• We define an analytical framework to explore valid data transfer schedules where the in-network redundancy generation process maximizes the use of the available network resources.
• We show that besides requiring a node availability prediction, determining the optimal data insertion schedule is computationally intractable.
• We propose a set of heuristics for efficient in-network redundancy generation.
• We determine the efficacy of in-network redundancy generation in diverse distributed storage environments using simulations driven by real workloads and real availability traces.
BACKGROUND
The P2P research community has long studied the applicability of erasure codes in low availability environments with limited storage capacity. The growing interest in applying erasure codes in data centers is more recent, and aims at reducing storage costs. While fault tolerance, storage overhead, and reparability, but also I/O and bandwidth, are well recognized critical bottlenecks for the storage of huge amounts of data, the existing literature does not yet explore how the data insertion can be optimized in the context of erasure code based storage. This work leverages one of these recent approaches to distributed storage system reparability, namely novel codes with local reparability, in order to improve the very process of data insertion.
In the following, we provide some background on erasure codes as classically used for distributed storage, as well as on one specific instance of locally repairable codes, Self-Repairing Codes (SRC), which we use in the rest of the paper to demonstrate the feasibility and quantify the benefits of in-network redundancy generation in erasure code based distributed storage systems.
ERASURE CODES FOR DISTRIBUTED STORAGE
A classical ⟨n, k⟩ erasure code redundantly encodes an object of size M into n redundant fragments of size M/k, each to be stored on a different storage node. The data storage overhead (or redundancy factor) is then given by n/k, and the stored object can be reconstructed by downloading an amount of data equal to M from any k or more different nodes out of the n. One of the main drawbacks of using classical erasure codes for storage is that redundant fragments can only be generated by applying coding operations on the original data. The generation of new redundancy is then restricted to the nodes that possess the original object (or a copy), namely: the source node, storage nodes that previously reconstructed the original object, or possibly nodes storing a copy (as is the case in a hybrid model where a full copy of the object is kept together with encoded fragments). When the original raw object is not available, repairing a single node failure consequently entails downloading an amount of information equivalent to the size of the original object, causing a significant communication overhead.
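To make the ⟨n, k⟩ trade-off concrete, here is a minimal toy sketch (not from the paper) of a ⟨n = 3, k = 2⟩ erasure code using plain XOR as the coding operation; real systems use Reed-Solomon or similar codes over larger fields:

```python
# Toy <n=3, k=2> erasure code over bytes, using XOR as the only coding
# operation. Illustrative only: real codes work over larger finite fields.

def encode(data: bytes) -> list[bytes]:
    """Split an object into k=2 chunks and add one XOR parity fragment."""
    half = (len(data) + 1) // 2
    d1 = data[:half]
    d2 = data[half:].ljust(half, b"\0")          # pad to equal chunk size
    parity = bytes(a ^ b for a, b in zip(d1, d2))
    return [d1, d2, parity]                      # n=3 fragments of size M/k

def decode(fragments: dict[int, bytes], size: int) -> bytes:
    """Reconstruct the object from any k=2 of the n=3 fragments."""
    if 0 in fragments and 1 in fragments:
        d1, d2 = fragments[0], fragments[1]
    elif 0 in fragments:                         # recover d2 = d1 xor parity
        d1 = fragments[0]
        d2 = bytes(a ^ b for a, b in zip(d1, fragments[2]))
    else:                                        # recover d1 = d2 xor parity
        d2 = fragments[1]
        d1 = bytes(a ^ b for a, b in zip(d2, fragments[2]))
    return (d1 + d2)[:size]

obj = b"backup"
frags = encode(obj)
# Losing any single fragment is tolerated:
assert decode({0: frags[0], 2: frags[2]}, len(obj)) == obj
```

Any k = 2 of the n = 3 fragments suffice to rebuild the object, at a redundancy factor of n/k = 1.5, versus a factor of 3 for triple replication with the same single-failure tolerance.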
SCHEDULING THE IN-NETWORK REDUNDANCY GENERATION
In-network redundancy generation has the potential to speed up the insertion of new data in distributed storage systems. However, the magnitude of the actual benefit depends on two factors: (i) the availability pattern of the source and storage nodes, which determines the achievable throughput, and (ii) the specific schedule of data transfers among nodes, subject to the constraints of resource availability, which determines the actually achieved throughput for data backup. In this section, we explore the scheduling problem, demonstrating that finding an optimal schedule is computationally very expensive even with a few simplifying assumptions, and accordingly motivate some heuristics instead. Let s be a source node aiming to store a new data object on n different storage nodes, and let i, i = 1, . . . , n, represent each of these n storage nodes. We model our system using discrete time steps of duration τ, where at each time step nodes can be available or unavailable to send/receive redundant data. The binary variable a(i, t) ∈ {0, 1} denotes this availability for each node i at the corresponding time step t. Then, we define the in-network redundancy generation network as a weighted temporal directed graph G = (V(t), E(t)), t ≥ 0, with the set of nodes V(t) ⊆ {s, 1, 2, . . . , n}, and the set of edges E(t) = {(i, j) | i, j ∈ V(t)}. The amount of data that nodes might send among themselves is a mapping f : E(t) → R+, denoted by f(i, j, t), ∀(i, j) ∈ E(t), t ≥ 0. Furthermore, we assume (for algorithmic simplicity) that nodes send data through each of the redundancy generation triplets symmetrically: r(c, t) = f(i, k, t) = f(j, k, t), ∀c ∈ C, where c = (i, j) → k.
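As a minimal sketch of this model (the availability matrix a(i, t) below is made up for illustration), the symmetry constraint implies that a triplet c = (i, j) → k can only be scheduled at a step t where all three nodes are online simultaneously:

```python
# Sketch of the discrete-time availability model a(i, t).
# The 4-node, 5-step availability matrix is invented for this example.

avail = {
    1: [1, 1, 0, 1, 1],
    2: [1, 0, 1, 1, 0],
    3: [0, 1, 1, 1, 1],
    4: [1, 1, 1, 0, 1],
}

def feasible_triplets(avail, t):
    """Triplets c = (i, j) -> k schedulable at step t: the symmetric
    flows f(i, k, t) = f(j, k, t) require i, j and k all online."""
    online = [i for i, a in avail.items() if a[t]]
    return [((i, j), k)
            for i in online for j in online for k in online
            if i < j and k not in (i, j)]

print(feasible_triplets(avail, 3))
# -> [((1, 2), 3), ((1, 3), 2), ((2, 3), 1)]   (nodes 1, 2, 3 online at t=3)
```

A scheduler then has to pick, per step, which of these feasible triplets to activate and how much data r(c, t) to push through each, which is where the complexity arises.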
ADDITIONAL SCHEDULING CONSTRAINTS
In this section, we elaborate that, while being a bandwidth-valid schedule is a necessary condition, it is not a sufficient condition for the schedule to be actually valid. To illustrate this, consider an in-network redundancy generation network using HSRC with parameters ⟨n = 7, k = 3⟩, where the redundant fragments r1, . . . , r7 have to be stored on nodes 1, . . . , 7 respectively. Recall also that each redundant fragment ri is composed of 3 redundant chunks, hence |ri| = 3.
EXPERIMENTAL RESULTS
We have proposed four different policies for the source traffic scheduling problem and four policies for the triplet sorting problem. However, after an extensive experimental evaluation of all policies, we will only report, for each case, the two best policies (in terms of achieved throughput). At the source, the random and minimum data policies consistently outperform the others; at the storage nodes, the maximum flow and minimum data sorting policies for the triplets likewise outperform the others.
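As a hedged sketch of the source-side policies (the function names and the simulation loop are ours, not the paper's), the minimum data policy can be read as: at each step, the source uploads the next chunk to the online node that has received the least data so far, while the random policy picks any online node:

```python
import random

# Sketch of two source-side scheduling policies: "random" and "minimum
# data". Policy details beyond the one-line descriptions are assumptions.

def pick_random(online, received):
    return random.choice(online)

def pick_min_data(online, received):
    return min(online, key=lambda i: received[i])

def run_source_schedule(avail, policy, chunk=1):
    """Simulate the source uploading one chunk per step to a chosen node,
    skipping steps where no storage node is online."""
    received = {i: 0 for i in avail}
    steps = len(next(iter(avail.values())))
    for t in range(steps):
        online = [i for i, a in avail.items() if a[t]]
        if online:
            received[policy(online, received)] += chunk
    return received

avail = {1: [1, 1, 0, 1], 2: [1, 0, 1, 1], 3: [0, 1, 1, 1]}
print(run_source_schedule(avail, pick_min_data))
# -> {1: 2, 2: 1, 3: 1}
```

The minimum data policy tends to equalize how much each storage node has received, which matters because a fragment is only useful for triplet-based redundancy generation once enough of it has arrived.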
RESULTS
Storage throughput. We show the increase in data insertion throughput achieved by the in-network redundancy generation process. We can see that the gain is higher when nodes are more available for redundancy generation. This is a consequence of the constraint in Eq. (5) requiring the redundancy generation triplets to be symmetric, which requires the three nodes involved in each triplet to be available simultaneously. The higher the online availability, the higher the chances of finding all three nodes of a triplet online.
CONCLUSIONS
In this work, we propose and explore how storage nodes can collaborate among themselves to generate erasure-encoded redundancy by leveraging novel erasure codes' local reparability property. Doing so not only reduces a source node's load to insert erasure-encoded data, but also significantly improves the overall throughput of the data insertion process. We demonstrate the idea using self-repairing codes. We show that determining an optimal schedule among nodes to carry out in-network redundancy generation, subject to the resource constraints of the system (nodes and network), is computationally prohibitive even under simplifying assumptions. However, experiments supported by real availability traces from a Google data center and P2P/F2F applications show that some heuristics we propose yield significant gains in storage throughput under these diverse settings, proving the practicality not only of the idea in general, but also of the specific proposed heuristics.