26-03-2012, 12:35 PM
Hands-On Hadoop Tutorial
HadoopTutorial.ppt (Size: 85.5 KB / Downloads: 228)
General Information
Hadoop uses HDFS, a distributed file system based on GFS, as its shared filesystem
HDFS architecture divides files into large chunks (~64MB) distributed across data servers
HDFS has a global namespace
Slave Nodes
Centurion064 also acts as a slave node
Slave nodes
Manage blocks of data sent from master node
In terms of GFS, these are the chunkservers
Currently centurion060 is also another slave node
Hadoop Paths
Hadoop is locally “installed” on each machine
Installed location is in /localtmp/hadoop/hadoop-0.15.3
Slave nodes store their data in /localtmp/hadoop/hadoop-dfs (this is automatically created by the DFS)
/localtmp/hadoop is owned by group gbg (someone in this group must administer this or a cs admin)
Files are divided into 64 MB chunks (this is configurable)