14-06-2012, 12:29 PM
GFS: Google File System
Goals of GFS
Create a distributed file system that is:
Scalable: Many clients, many servers.
Reliable: Tolerant of Faults.
Available:
Fast: Favors sustained throughput over low latency.
Built from commodity hardware
Tailored to Google’s data usage
Large files
Append- and read-heavy workloads
Basic Terms
Master: A single server that coordinates system-wide activities. Can have read-only ‘Shadow’ servers.
Chunk: 64 MB storage block holding a file or a piece of one.
Chunkserver: Many per cluster; each stores Chunks of data.
Replica: A copy of a given Chunk held on a Chunkserver. Either Primary or Secondary.
Client: Runs tasks on data.
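Because Chunks have a fixed 64 MB size, a byte offset within a file maps to a Chunk index by integer division. The sketch below illustrates that arithmetic only; the function names are made up for this example and are not part of any GFS API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the Chunk size given above


def chunk_index(offset: int) -> int:
    """Return the index of the Chunk that contains the given byte offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return offset // CHUNK_SIZE


def chunks_for_range(offset: int, length: int) -> range:
    """Return the Chunk indices touched by `length` bytes starting at `offset`."""
    if length <= 0:
        return range(0)  # an empty read touches no Chunks
    first = chunk_index(offset)
    last = chunk_index(offset + length - 1)
    return range(first, last + 1)
```

A read that straddles a Chunk boundary, such as `chunks_for_range(CHUNK_SIZE - 1, 2)`, touches two Chunks, so a Client would need the locations of both.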
Master
Stores Metadata (in memory)
Access Control
File Namespaces
File to Chunk Mappings
Current Chunk Locations + Version
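As a rough illustration of the metadata listed above, the sketch below keeps the file-to-Chunk mapping and the per-Chunk version and locations in plain in-memory dictionaries. The class names, method names, and the use of integer Chunk handles are assumptions for this example; access control and namespace locking are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class ChunkInfo:
    version: int = 1
    locations: Set[str] = field(default_factory=set)  # Chunkserver names


@dataclass
class MasterMetadata:
    # File path -> ordered list of Chunk handles (file to Chunk mapping).
    file_chunks: Dict[str, List[int]] = field(default_factory=dict)
    # Chunk handle -> version and current replica locations.
    chunks: Dict[int, ChunkInfo] = field(default_factory=dict)
    next_handle: int = 0

    def create_file(self, path: str) -> None:
        if path in self.file_chunks:
            raise FileExistsError(path)
        self.file_chunks[path] = []

    def add_chunk(self, path: str, servers: Iterable[str]) -> int:
        """Append a new Chunk to the file and record where its replicas live."""
        handle = self.next_handle
        self.next_handle += 1
        self.file_chunks[path].append(handle)
        self.chunks[handle] = ChunkInfo(locations=set(servers))
        return handle

    def lookup(self, path: str, index: int) -> Tuple[int, int, Set[str]]:
        """Return (handle, version, locations) for the index-th Chunk of a file."""
        handle = self.file_chunks[path][index]
        info = self.chunks[handle]
        return handle, info.version, set(info.locations)
```

A Client would call something like `lookup(path, index)` once and then talk to the returned Chunkservers directly, which is how the Master stays out of the data path.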
Controls System-Wide Activities
Chunk Lease Management
Garbage Collection
Minimized Involvement in Read/Write
Monitors Chunkservers via HeartBeat
Also provides instructions and collects state.
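A minimal sketch of the state-collecting half of HeartBeat monitoring, assuming each message carries the Chunk handles a Chunkserver currently holds. The class name, the 60-second timeout, and the explicit `now` argument are all assumptions chosen to keep the example deterministic; sending instructions back to Chunkservers is not modeled.

```python
from typing import Dict, Iterable, List, Set


class HeartbeatMonitor:
    def __init__(self, timeout_s: float = 60.0):
        self.timeout_s = timeout_s                 # assumed value, not from the slides
        self.last_seen: Dict[str, float] = {}      # Chunkserver -> last HeartBeat time
        self.reported: Dict[str, Set[int]] = {}    # Chunkserver -> Chunk handles it holds

    def record(self, server: str, chunk_handles: Iterable[int], now: float) -> None:
        """Store the arrival time and the Chunk list from one HeartBeat."""
        self.last_seen[server] = now
        self.reported[server] = set(chunk_handles)

    def dead_servers(self, now: float) -> List[str]:
        """Chunkservers whose last HeartBeat is older than the timeout."""
        return sorted(s for s, t in self.last_seen.items()
                      if now - t > self.timeout_s)

    def live_servers(self, now: float) -> List[str]:
        """Chunkservers heard from within the timeout."""
        dead = set(self.dead_servers(now))
        return sorted(s for s in self.last_seen if s not in dead)
```

In a real deployment the caller would pass `time.monotonic()` as `now`; taking it as a parameter makes the timeout logic easy to test.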
Shadow Servers
May lag slightly behind the Master.
Provide read-only access if Master fails.
Ensures Chunk Redundancy
Defaults to 3 copies of all data
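The redundancy check can be sketched as a scan over the Chunk location map that counts live replicas against the target of 3. The function name and the shape of its inputs are assumptions for illustration; choosing where new copies go is a separate placement decision and is not shown.

```python
from typing import Dict, Iterable, Set

DEFAULT_REPLICAS = 3  # default copy count given above


def under_replicated(chunk_locations: Dict[int, Set[str]],
                     live_servers: Iterable[str],
                     target: int = DEFAULT_REPLICAS) -> Dict[int, int]:
    """Map each Chunk handle with too few live replicas to its missing count."""
    live = set(live_servers)
    missing = {}
    for handle, servers in chunk_locations.items():
        live_count = len(set(servers) & live)  # ignore replicas on dead servers
        if live_count < target:
            missing[handle] = target - live_count
    return missing
```

For example, with locations `{1: {"a", "b", "c"}, 2: {"a", "b"}}` and all three servers alive, only Chunk 2 is reported, with one copy missing.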
Single Master Simplifies Design
Chunk Placement Decisions
Replication Decisions