Scalable and Parallel Boosting with MapReduce

**project girl** · 05-11-2012, 12:28 PM



Abstract

MapReduce is a framework for processing large data sets and for distributed computing on clusters of computers. MapReduce libraries have been written in many programming languages. Here we propose two novel parallel boosting algorithms built on MapReduce, AdaBoost.PL (Parallel AdaBoost) and LogitBoost.PL (Parallel LogitBoost), which let multiple computing nodes participate simultaneously in constructing a boosted classifier.
With the recent rapid growth of large-scale data, faster processing algorithms that still perform well have become a pressing need. Our algorithms can induce boosted models whose generalization performance is close to that of the respective baseline classifier. By exploiting their parallel architecture, both algorithms gain significant speedup. Moreover, the algorithms do not require individual computing nodes to communicate with each other, to share their data, or to share the knowledge derived from their data, so they also preserve the privacy of the computation. We implemented our algorithms in the MapReduce framework and ran experiments on a variety of synthetic and real-world data sets to demonstrate their performance in terms of classification accuracy, speedup, and scaleup.

Proposed System

We propose two novel parallel boosting algorithms, ADABOOST.PL (Parallel ADABOOST) and LOGITBOOST.PL (Parallel LOGITBOOST). These algorithms achieve parallelization in both time and space with minimal communication between the computing nodes.
ADABOOST, short for Adaptive Boosting, is a machine learning meta-algorithm that is used in conjunction with many other learning algorithms to improve their performance.
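
To make the idea of boosting a weak learner concrete, here is a minimal sketch of the standard sequential AdaBoost (not the parallel ADABOOST.PL variant). It assumes one-dimensional inputs, labels in {-1, +1}, and threshold "stumps" as the weak learner. The function names `train_stump`, `adaboost`, and `predict` are illustrative and do not come from the attached document.

```python
import math

def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best threshold stump.

    A stump predicts `polarity` when x >= threshold and `-polarity` otherwise.
    """
    best = None
    for thr in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= thr else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Run `rounds` iterations of AdaBoost and return a list of weighted stumps."""
    n = len(xs)
    weights = [1.0 / n] * n          # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        err, thr, pol = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1.0 - 1e-10)   # keep the logarithm finite
        alpha = 0.5 * math.log((1.0 - err) / err)  # vote of this weak learner
        ensemble.append((alpha, thr, pol))
        # Increase the weight of misclassified examples, decrease the rest.
        weights = [
            w * math.exp(-alpha * y * (pol if x >= thr else -pol))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the weighted vote of all stumps."""
    score = sum(a * (pol if x >= thr else -pol) for a, thr, pol in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single stump can separate.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])
```

Each round depends on the example weights produced by the previous round. That dependency is the sequential nature that makes boosting hard to parallelize.
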
LOGITBOOST is an influential boosting algorithm based on additive logistic regression. At each iteration it computes a working response and a weight for each data point.
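
As a rough illustration of the working response and weights, the sketch below follows the usual two-class LogitBoost formulation: with labels in {0, 1} and current score F(x), the class probability is p = 1 / (1 + exp(-2F)), the weight is w = p(1 - p), and the working response is z = (y - p) / w. The function name and the clipping bound `z_max` are assumptions made for this example and are not taken from the attached document.

```python
import math

def working_response_and_weights(labels01, scores, z_max=4.0):
    """Return a list of (z, w) pairs for one LogitBoost iteration.

    labels01: class labels encoded as 0 or 1
    scores:   current additive model output F(x_i) for each example
    z_max:    assumed clipping bound that keeps z finite when p is near 0 or 1
    """
    pairs = []
    for y, f in zip(labels01, scores):
        p = 1.0 / (1.0 + math.exp(-2.0 * f))   # current class-1 probability
        w = max(p * (1.0 - p), 1e-10)          # weight, floored to avoid /0
        z = (y - p) / w                        # working response
        z = max(-z_max, min(z_max, z))
        pairs.append((z, w))
    return pairs

# Before any boosting round every score is 0, so p = 0.5 for all examples.
print(working_response_and_weights([1, 0], [0.0, 0.0]))
```

The weak learner for that iteration is then fitted to the responses z by weighted least squares using the weights w.
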
The Map function is applied in parallel to every key-value pair in the input dataset and produces a list of pairs for each call.
The Reduce function is then applied in parallel to each group of pairs that share a key, and produces a collection of values in the same domain.
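
The Map and Reduce steps described above can be sketched in a few lines. This is a single-process illustration, assuming in-memory data and a word-count task chosen only for the example. `run_mapreduce`, `count_words_map`, and `sum_reduce` are hypothetical names, and a real MapReduce framework would run the map and reduce calls in parallel on different nodes.

```python
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn):
    """Apply map_fn to every (key, value) record, group by key, then reduce."""
    # Map phase: each call may emit any number of (key, value) pairs.
    intermediate = []
    for key, value in records:
        intermediate.extend(map_fn(key, value))

    # Shuffle phase: collect all values that share the same key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: one call per group.
    return {key: reduce_fn(key, values) for key, values in groups.items()}

def count_words_map(doc_id, text):
    """Emit (word, 1) for every word in the document."""
    return [(word, 1) for word in text.split()]

def sum_reduce(word, counts):
    """Add up the partial counts for one word."""
    return sum(counts)

records = [(0, "map reduce map"), (1, "reduce boost")]
print(run_mapreduce(records, count_words_map, sum_reduce))
```

Because each map call sees only its own record and each reduce call sees only its own group, the calls are independent of one another, which is what allows them to run in parallel.
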
Finally, we improve the scalability of the MapReduce method and consider the possibility of using multi-resolution boosting models to reduce the number of iterations.

Existing System

The existing MapReduce method has several limitations. The execution time of previous algorithms used with MapReduce is too high. When data is structured into traditional database tables and columns, the SQL for processing that data is less clear.
The existing boosting method is inherently sequential, so it is not easy to parallelize boosting or to make it scale.
