09-10-2012, 10:55 AM
Cloud Computing with MapReduce and Hadoop
What is Cloud Computing?
“Cloud” refers to large Internet services like Google, Yahoo, etc. that run on tens of thousands of machines
More recently, “cloud computing” refers to services offered by these companies that let external customers rent computing cycles on their clusters
Amazon EC2: virtual machines at 10¢/hour, billed hourly
Amazon S3: storage at 15¢/GB/month
Attractive features:
Scale: up to 100s of nodes
Fine-grained billing: pay only for what you use
Ease of use: sign up with credit card, get root access
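To make the fine-grained billing concrete, here is a minimal sketch of a cost estimate using the per-unit prices quoted above (10¢ per instance-hour on EC2, billed by whole hours; 15¢ per GB-month on S3). The function name and the whole-hour rounding policy are assumptions for illustration, not Amazon's actual billing code.

```python
import math

def monthly_cost_cents(instance_hours, storage_gb):
    """Hypothetical cost estimate using the prices quoted above.

    EC2: 10 cents per instance-hour, billed hourly (partial
    hours round up to a whole hour -- an assumed policy).
    S3: 15 cents per GB-month of storage.
    Returns the total in whole cents to avoid float rounding.
    """
    ec2 = 10 * math.ceil(instance_hours)  # 10 cents/hour
    s3 = 15 * storage_gb                  # 15 cents/GB-month
    return ec2 + s3
```

For example, 100 instance-hours plus 50 GB of storage comes to 1000¢ + 750¢ = $17.50 for the month.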
What is MapReduce?
Simple data-parallel programming model designed for scalability and fault-tolerance
Pioneered by Google
Processes 20 petabytes of data per day
Popularized by open-source Hadoop project
Used at Yahoo!, Facebook, Amazon, …
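The data-parallel model above boils down to two user-supplied functions: a map function that emits key-value pairs, and a reduce function that combines all values sharing a key. A minimal single-machine sketch (word count, the canonical example) illustrates the model; the function names here are illustrative, and real Hadoop distributes the map, shuffle, and reduce phases across a cluster.

```python
from collections import defaultdict

def map_fn(line):
    # Map phase: emit (word, 1) for every word in the input line
    for word in line.split():
        yield (word, 1)

def reduce_fn(word, counts):
    # Reduce phase: sum all the counts emitted for one word
    yield (word, sum(counts))

def mapreduce(lines, map_fn, reduce_fn):
    # Shuffle phase: group intermediate values by key
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    # Apply the reducer to each key group
    result = {}
    for key, values in groups.items():
        for out_key, out_value in reduce_fn(key, values):
            result[out_key] = out_value
    return result
```

Running `mapreduce(["the quick fox", "the lazy dog"], map_fn, reduce_fn)` yields counts such as `{"the": 2, "quick": 1, ...}`; the framework handles grouping and distribution, so the programmer only writes the two small functions.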
What is MapReduce used for?
At Google:
Index construction for Google Search
Article clustering for Google News
Statistical machine translation
At Yahoo!:
“Web map” powering Yahoo! Search
Spam detection for Yahoo! Mail
At Facebook:
Data mining
Ad optimization
Spam detection
MapReduce Design Goals
Scalability to large data volumes:
1000s of machines, 10,000s of disks
Cost-efficiency:
Commodity machines (cheap, but unreliable)
Commodity network
Automatic fault-tolerance (fewer administrators)
Easy to use (fewer programmers)
Hadoop Distributed File System
Files split into 128MB blocks
Blocks replicated across several datanodes (usually 3)
Single namenode stores metadata (file names, block locations, etc.)
Optimized for large files, sequential reads
Files are append-only
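The block layout above can be sketched as follows: split a file into 128 MB blocks and place each block's replicas on three distinct datanodes. The round-robin placement here is a simplifying assumption for illustration; the real HDFS namenode uses a rack-aware placement policy.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB blocks, as described above
REPLICATION = 3                 # usual replication factor

def plan_blocks(file_size, datanodes):
    """Hypothetical block-placement plan: split a file into
    128 MB blocks and assign each block's 3 replicas to distinct
    datanodes, round-robin (real HDFS placement is rack-aware).
    Returns a list of (block_index, [replica datanodes]) pairs --
    the kind of metadata the namenode keeps in memory.
    """
    n_blocks = -(-file_size // BLOCK_SIZE)  # ceiling division
    plan = []
    for b in range(n_blocks):
        replicas = [datanodes[(b + i) % len(datanodes)]
                    for i in range(REPLICATION)]
        plan.append((b, replicas))
    return plan
```

A 300 MB file, for instance, becomes three blocks (two full 128 MB blocks plus a 44 MB tail), each stored on three of the available datanodes.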
Fault Tolerance in MapReduce
If a task is going slowly (straggler):
Launch second copy of task on another node (“speculative execution”)
Take the output of whichever copy finishes first, and kill the other
Surprisingly important in large clusters
Stragglers occur frequently due to failing hardware, software bugs, misconfiguration, etc.
Single straggler may noticeably slow down a job
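Speculative execution as described above can be modeled in a few lines: run two copies of the same task concurrently, take whichever finishes first, and attempt to cancel the other. This is a toy single-machine sketch using Python threads; the function names are illustrative, and real Hadoop schedules the backup copy on a different node and kills the loser's process.

```python
import concurrent.futures
import time

def run_with_backup(task, backup):
    """Toy model of speculative execution: run two copies of a
    task, return the result of whichever finishes first, and
    cancel the other (best effort -- an already-running thread
    cannot be forcibly killed, unlike a task on another node).
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(task), pool.submit(backup)]
        done, pending = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        for f in pending:
            f.cancel()  # best-effort kill of the slower copy
        return next(iter(done)).result()

def healthy_task():
    return "done"

def straggler_task():
    time.sleep(0.5)  # simulates a slow node
    return "done (late)"
```

Here `run_with_backup(healthy_task, straggler_task)` returns the healthy copy's result without waiting for the straggler to report, which is exactly why a single slow node need not slow down the whole job.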