Apache Hadoop and Hive
Hadoop, Why?
Need to process Multi-Petabyte Datasets
Expensive to build reliability into each application.
Nodes fail every day
– Failure is expected, rather than exceptional.
– The number of nodes in a cluster is not constant.
Need common infrastructure
– Efficient, reliable, open source (Apache License)
The above goals are the same as Condor's, but
– Workloads are I/O bound, not CPU bound
Hive, Why?
Need a Multi-Petabyte Warehouse
Files alone are an insufficient data abstraction
Need tables, schemas, partitions, indices
SQL is highly popular
Need for an open data format
– RDBMSs have a closed data format
– Need a flexible schema
Hive is a Hadoop subproject!
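A minimal sketch of what this buys in practice, using the HiveServer2 JDBC driver: a partitioned table defined over plain HDFS files and queried with familiar SQL. The host/port, table name, and columns are illustrative assumptions, not taken from the slides.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: tables, schemas, and partitions over HDFS files, queried via SQL.
// Connection details and the page_views table are hypothetical examples.
public class HiveSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
        Statement stmt = conn.createStatement();

        // A partitioned table: each dt value maps to a directory of files
        stmt.execute("CREATE TABLE IF NOT EXISTS page_views ("
                + "user_id BIGINT, url STRING) "
                + "PARTITIONED BY (dt STRING) "
                + "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'");

        // Plain SQL; the partition predicate prunes the files that are read
        ResultSet rs = stmt.executeQuery(
                "SELECT url, COUNT(*) FROM page_views "
                + "WHERE dt = '2014-03-27' GROUP BY url");
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
        }
        conn.close();
    }
}
```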
Goals of HDFS
Very Large Distributed File System
– 10K nodes, 100 million files, 10 PB
Assumes Commodity Hardware
– Files are replicated to handle hardware failure
– Detects failures and recovers from them
Optimized for Batch Processing
– Data locations exposed so that computations can move to where data resides
– Provides very high aggregate bandwidth
Runs in user space on heterogeneous operating systems
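The "data locations exposed" goal is visible directly in the HDFS client API. A minimal Java sketch, assuming a reachable cluster and an illustrative file path, asks the NameNode which DataNodes hold each block of a file so a scheduler can move computation to the data:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: query the NameNode for per-block replica locations of a file.
public class BlockLocality {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // reads core-site.xml etc.
        FileSystem fs = FileSystem.get(conf);

        FileStatus status = fs.getFileStatus(new Path("/data/input/part-00000"));
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());

        for (BlockLocation b : blocks) {
            // Each block is replicated on several DataNodes
            System.out.println("offset " + b.getOffset()
                    + " len " + b.getLength()
                    + " hosts " + String.join(",", b.getHosts()));
        }
    }
}
```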
NameNode Metadata
Metadata in Memory
– The entire metadata is kept in main memory
– No demand paging of metadata
Types of Metadata
– List of files
– List of Blocks for each file
– List of DataNodes for each block
– File attributes, e.g. creation time, replication factor
A Transaction Log
– Records file creations, file deletions, etc.
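A conceptual Java sketch (not actual Hadoop source) of the structures described above, i.e. what the NameNode holds entirely in RAM:

```java
import java.util.List;
import java.util.Map;

// Conceptual model of NameNode metadata: files map to block lists, blocks
// map to the DataNodes holding replicas, plus per-file attributes.
class NameNodeMetadata {
    static class FileAttributes {
        long creationTime;       // e.g. creation time
        short replicationFactor; // e.g. replication factor
    }

    Map<String, List<Long>> fileToBlocks;       // list of blocks per file
    Map<Long, List<String>> blockToDataNodes;   // DataNodes for each block
    Map<String, FileAttributes> fileAttributes; // attributes per file

    // Every mutation (create/delete) is appended to the transaction log
    // (the edit log) before being applied to the in-memory state.
    void logEdit(String op, String path) { /* append to edit log */ }
}
```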
NameNode Failure
The NameNode is a single point of failure
Transaction Log stored in multiple directories
– A directory on the local file system
– A directory on a remote file system (NFS/CIFS)
Need to develop a real HA solution
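A minimal sketch of the "multiple directories" mitigation. The setting normally lives in hdfs-site.xml; it is shown programmatically here with illustrative paths, using the classic Hadoop 1.x key dfs.name.dir (dfs.namenode.name.dir in later releases):

```java
import org.apache.hadoop.conf.Configuration;

// Sketch: mirror the NameNode image and transaction log into several
// directories, typically one local disk plus one NFS mount.
public class NameDirConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Comma-separated list: a local directory and a remote (NFS) one
        conf.set("dfs.name.dir", "/data/hdfs/name,/mnt/nfs/hdfs/name");
        System.out.println(conf.get("dfs.name.dir"));
    }
}
```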
Job Scheduling
Current state of affairs
– FIFO and Fair Share schedulers (see the sketch below)
– Checkpointing and parallelism are tied together
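A minimal sketch of switching the JobTracker from the default FIFO scheduler to the Fair Scheduler (Hadoop 1.x era). These properties normally live in mapred-site.xml and assume the fair scheduler module is on the classpath:

```java
import org.apache.hadoop.conf.Configuration;

// Sketch: select the Fair Scheduler instead of the default FIFO scheduler.
public class SchedulerConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("mapred.jobtracker.taskScheduler",
                 "org.apache.hadoop.mapred.FairScheduler");
        // Group jobs into pools, e.g. one pool per submitting user
        conf.set("mapred.fairscheduler.poolnameproperty", "user.name");
        System.out.println(conf.get("mapred.jobtracker.taskScheduler"));
    }
}
```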
Topics for Research
– Cycle-scavenging scheduler
– Separate checkpointing from parallelism
– Use resource matchmaking to support heterogeneous Hadoop compute clusters
– Scheduler and API for MPI workloads