31-05-2012, 05:45 PM
NEW TECHNOLOGY FILE SYSTEM
Introduction
Hadoop was created by Doug Cutting
Named after a stuffed toy elephant
Originally built for the Nutch search engine
Inspired by Google’s MapReduce and GFS (Google File System)
Implements two important elements:
- Distributed file system (HDFS)
- Computational paradigm (MapReduce)
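To make the computational paradigm concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps, using word counting as the example. It is plain Python, not the Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit one (word, 1) pair per word in the input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine all values of one key into a single result.
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(word_count(["big data big cluster", "data node"]))
```

In a real cluster the map and reduce calls would run on different machines; here everything runs in one process only to show the data flow.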
HDFS
Hadoop Distributed File System
Does not require RAID storage on hosts
By default, data is stored on three nodes: two in the same rack, one in a different rack
Built from a cluster of DataNodes
Block protocol
Allows access to all content from a web browser
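The three-replica placement described above can be sketched roughly as follows. This is only an illustration under assumptions: the first copy is placed on the writing node and the other two on one other rack, and the names place_replicas, racks and the node names are hypothetical, not part of Hadoop.

```python
def place_replicas(racks, writer):
    """Pick three nodes: the writer's node plus two nodes on one other rack.

    racks maps a rack name to the list of node names in that rack.
    """
    local = next(rack for rack, nodes in racks.items() if writer in nodes)
    for rack in sorted(racks):
        # Assumption: the remote rack must hold at least two nodes.
        if rack != local and len(racks[rack]) >= 2:
            return [writer] + sorted(racks[rack])[:2]
    raise ValueError("need a second rack with at least two nodes")

example_racks = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4", "n5"]}
print(place_replicas(example_racks, "n1"))  # ['n1', 'n3', 'n4']
```

With this layout the loss of a whole rack still leaves at least one copy of the data, while two of the three copies share a rack and so cost less inter-rack traffic to write.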
NameNode and DataNode
NameNode holds the file system metadata
DataNode runs the block server
Block Server
– Stores data in the local file system (e.g. ext3)
– Stores meta-data of a block (e.g. CRC)
– Serves data and meta-data to Clients
Client request flow: the client sends metadata operations to the NameNode, then performs block operations directly with the DataNodes, which return the data to the client
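A small in-memory sketch of that split between metadata and block operations, including the per-block CRC mentioned above. The classes and the read_file helper are hypothetical stand-ins, not the real HDFS classes, and a dict replaces the local file system.

```python
import zlib

class DataNode:
    def __init__(self):
        self.blocks = {}  # block_id -> (data, crc); stands in for local disk

    def write_block(self, block_id, data):
        # Store the block together with its checksum (block metadata).
        self.blocks[block_id] = (data, zlib.crc32(data))

    def read_block(self, block_id):
        return self.blocks[block_id]

class NameNode:
    def __init__(self):
        self.files = {}  # path -> ordered list of (block_id, datanode name)

def read_file(namenode, datanodes, path):
    # Step 1: metadata operation against the NameNode.
    locations = namenode.files[path]
    # Step 2: block operations directly against the DataNodes.
    result = b""
    for block_id, node_name in locations:
        data, crc = datanodes[node_name].read_block(block_id)
        if zlib.crc32(data) != crc:
            raise IOError("checksum mismatch in block %s" % block_id)
        result += data
    return result

datanodes = {"dn1": DataNode(), "dn2": DataNode()}
datanodes["dn1"].write_block("b1", b"hello ")
datanodes["dn2"].write_block("b2", b"world")
namenode = NameNode()
namenode.files["/demo.txt"] = [("b1", "dn1"), ("b2", "dn2")]
print(read_file(namenode, datanodes, "/demo.txt"))  # b'hello world'
```

Note that the NameNode never touches file contents in this sketch; it only tells the client where the blocks are.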
MAPREDUCE ENGINE
Above the filesystem
Computational paradigm
Consists of a JobTracker and TaskTrackers
Tasks of the JobTracker and TaskTracker
Rack-aware file system
Reduces network traffic
TaskTracker fails - that part of the job is rescheduled
JobTracker fails - all ongoing work is lost
Hadoop 0.21 – Checkpointing
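The TaskTracker failure case can be sketched like this. It is a toy model under assumptions, not Hadoop's scheduler: the JobTracker class below simply hands each task to the least-loaded live tracker, and all names are hypothetical.

```python
class JobTracker:
    def __init__(self, trackers):
        self.alive = list(trackers)
        self.assignments = {}  # task -> tracker currently running it

    def _load(self, tracker):
        return sum(1 for t in self.assignments.values() if t == tracker)

    def assign(self, task):
        if not self.alive:
            raise RuntimeError("no live TaskTracker available")
        # Least-loaded tracker wins; ties are broken by name.
        tracker = min(self.alive, key=lambda t: (self._load(t), t))
        self.assignments[task] = tracker
        return tracker

    def tracker_failed(self, tracker):
        # Only the tasks of the failed tracker are rescheduled.
        self.alive.remove(tracker)
        lost = sorted(t for t, tr in self.assignments.items() if tr == tracker)
        for task in lost:
            del self.assignments[task]
        return {task: self.assign(task) for task in lost}

jt = JobTracker(["tt1", "tt2"])
for task in ["m1", "m2", "m3"]:
    jt.assign(task)
print(jt.tracker_failed("tt1"))  # {'m1': 'tt2', 'm3': 'tt2'}
```

The sketch also shows why a JobTracker failure is worse: the assignments table lives only inside the JobTracker, so losing it loses the state of the whole job.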
Goals
Tolerating Hardware Failure
Smooth Streaming Data Access
Large Data Sets
Simple Coherency Model
Moving Computation is Cheaper than Moving Data
Portability across Heterogeneous Hardware and Software Platforms
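The "moving computation" goal, together with rack awareness, amounts to preferring a worker that already holds the data. A rough sketch, assuming a simple three-level preference (same node, then same rack, then anywhere); the function pick_node and the node and rack names are hypothetical.

```python
def pick_node(replica_nodes, free_nodes, rack_of):
    """Choose where to run a task that reads one block.

    replica_nodes: nodes holding a copy of the block
    free_nodes:    nodes with a free task slot, in preference order
    rack_of:       dict mapping node name -> rack name
    """
    # Best case: run on a node that already stores the block.
    for node in free_nodes:
        if node in replica_nodes:
            return node, "node-local"
    # Next best: stay inside a rack that stores the block.
    replica_racks = {rack_of[n] for n in replica_nodes}
    for node in free_nodes:
        if rack_of[node] in replica_racks:
            return node, "rack-local"
    if free_nodes:
        return free_nodes[0], "off-rack"
    raise ValueError("no free node")

rack_of = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2"}
print(pick_node({"n1", "n3"}, ["n2", "n3"], rack_of))  # ('n3', 'node-local')
```

Only the last case moves the block across racks, which is the traffic the design tries to keep to a minimum.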