Table of Contents:
- What we want to do
- Tutorial approach and structure
- Prerequisites
  - Configuring single-node clusters first
  - Done? Let’s continue then!
- Networking
- SSH access
- Hadoop
  - Cluster Overview (aka the goal)
  - Masters vs. Slaves
  - Configuration
    - conf/masters (master only)
    - conf/slaves (master only)
    - conf/*-site.xml (all machines)
  - Starting the multi-node cluster
    - HDFS daemons
    - MapReduce daemons
  - Stopping the multi-node cluster
    - MapReduce daemons
    - HDFS daemons
  - Running a MapReduce job
- Caveats
  - java.io.IOException: Incompatible namespaceIDs
    - Workaround 1: Start from scratch
    - Workaround 2: Updating namespaceID of problematic datanodes
- What’s next?
- Related Links
- Changelog
What we want to do
In this tutorial, I will describe the required steps for setting up a multi-node Hadoop cluster using the Hadoop Distributed File System (HDFS) on Ubuntu Linux.
Are you looking for the single-node cluster tutorial? Just head over there.
Hadoop is a framework written in Java for running applications on large clusters of commodity hardware, and it incorporates features similar to those of the Google File System (GFS) and of the MapReduce computing paradigm. HDFS is a highly fault-tolerant distributed file system and, like Hadoop in general, is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications that have large data sets.
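To give you a quick taste of what "access to application data" looks like in practice, here is a minimal sketch of interacting with HDFS from the command line. The file names and HDFS paths are illustrative assumptions of mine, and the commands assume you run them from your Hadoop installation directory; the tutorial itself walks through the real setup step by step below.

    # copy a local file into HDFS (paths are hypothetical, for illustration only)
    $ bin/hadoop dfs -copyFromLocal /tmp/example.txt /user/hduser/example.txt

    # list the HDFS directory to verify the file arrived
    $ bin/hadoop dfs -ls /user/hduser

Once the multi-node cluster is up, the same commands work unchanged: HDFS transparently spreads the file's blocks across the datanodes in the cluster.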