HADOOP

**seminar ideas** · 30-05-2012, 10:44 AM

INTRODUCTION

Computing has changed direction several times. In the early days, mainframes were predicted to be the future of computing. Mainframes and other large-scale machines were indeed built and used, and in some settings they are still used in much the same way today. The trend, however, turned from bigger and more expensive machines to smaller and more affordable commodity PCs and servers.
Most of our data is stored on local networks, on servers that may be clustered and share storage. This approach has had time to mature into a stable architecture, and it provides decent redundancy when deployed correctly. A newer technology, cloud computing, is now demanding attention and quickly changing the direction of the technology landscape. Whether it is Google’s scalable Google File System or Amazon’s robust Amazon S3 cloud storage model, it is clear that cloud computing has arrived and that there is much to be learned from it.

Need for large data processing

We live in the data age. It is not easy to measure the total volume of data stored electronically, but an IDC estimate put the size of the “digital universe” at 0.18 zettabytes in 2006 and forecast tenfold growth to 1.8 zettabytes by 2011. A zettabyte is 10^21 bytes, or one billion terabytes.
Some of the areas that need large-scale data processing include:

Challenges in distributed computing: meeting Hadoop

Various challenges are faced while developing a distributed application. The first problem to solve is hardware failure: as soon as we start using many pieces of hardware, the chance that one will fail is fairly high. A common way of avoiding data loss is replication: the system keeps redundant copies of the data so that in the event of failure another copy is available. This is how RAID works, for instance, although Hadoop’s filesystem, the Hadoop Distributed Filesystem (HDFS), takes a slightly different approach.
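
The hardware-failure argument above can be made concrete with a little arithmetic. The following is only a rough sketch: it assumes machines fail independently, and the 1% failure probability, the 1,000-machine cluster and the three copies are made-up illustrative values, not figures from this report. The function names are hypothetical as well.

```python
def prob_any_failure(p_single: float, n_machines: int) -> float:
    """Probability that at least one of n_machines fails.

    Assumes every machine fails independently with probability p_single.
    """
    return 1.0 - (1.0 - p_single) ** n_machines


def prob_all_copies_lost(p_single: float, replicas: int) -> float:
    """Probability that every copy of a piece of data is lost.

    Assumes each copy sits on a different machine and that the
    machines fail independently (a simplification).
    """
    return p_single ** replicas


if __name__ == "__main__":
    p = 0.01  # illustrative per-machine failure probability (assumption)

    # With many machines, some failure somewhere becomes very likely.
    print("any failure, 1 machine:     ", prob_any_failure(p, 1))
    print("any failure, 1000 machines: ", prob_any_failure(p, 1000))

    # With redundant copies, losing every copy becomes very unlikely.
    print("data lost, 1 copy:          ", prob_all_copies_lost(p, 1))
    print("data lost, 3 copies:        ", prob_all_copies_lost(p, 3))
```

Under these assumptions a failure somewhere in the cluster is almost certain, while the chance of losing all copies of one piece of data drops sharply with each extra copy. This is why a system built from many commodity machines treats failure as the normal case and relies on replication.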

COMPARISON WITH OTHER SYSTEMS
Comparison with RDBMS

Unless we are dealing with very large volumes of unstructured data (hundreds of gigabytes, terabytes or petabytes) and have a large number of machines available, we will likely find a Hadoop Map/Reduce query much slower than a comparable SQL query on a relational database. Hadoop uses a brute-force access method, whereas an RDBMS has optimizations for accessing data, such as indexes and read-ahead. The benefits of Hadoop only come into play when massive parallelism is achieved, or when the data is so unstructured that no RDBMS optimization can help query performance.
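
The difference between brute-force access and indexed access can be illustrated with a toy example. This is a simplified single-machine sketch, not Hadoop or RDBMS code: the record layout, the `full_scan`, `build_index` and `indexed_lookup` names, and the data set are all hypothetical.

```python
from collections import defaultdict

# Toy data set: (record_id, user) pairs. Purely illustrative.
records = [(i, f"user{i % 100}") for i in range(10_000)]


def full_scan(records, user):
    """Brute-force access: examine every record to find the matches."""
    examined = 0
    matches = []
    for rec_id, rec_user in records:
        examined += 1
        if rec_user == user:
            matches.append(rec_id)
    return matches, examined


def build_index(records):
    """Build a user -> record ids index once, ahead of query time."""
    index = defaultdict(list)
    for rec_id, rec_user in records:
        index[rec_user].append(rec_id)
    return index


def indexed_lookup(index, user):
    """Indexed access: touch only the records that match."""
    matches = index.get(user, [])
    return matches, len(matches)


if __name__ == "__main__":
    scan_matches, scan_examined = full_scan(records, "user7")
    index = build_index(records)
    idx_matches, idx_examined = indexed_lookup(index, "user7")
    print("full scan examined:", scan_examined, "records")
    print("index lookup touched:", idx_examined, "records")
    print("same result:", scan_matches == idx_matches)
```

The full scan reads every record for every query, which only pays off when the work can be split across many machines in parallel. The index answers the same query by touching far fewer records, but it has to be built and maintained, and it needs data structured enough to index.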
