Table of Contents:
- What we want to do
- Tutorial approach and structure
- Prerequisites
  - Configuring single-node clusters first
  - Done? Let’s continue then!
- Networking
- SSH access
- Hadoop
  - Cluster Overview (aka the goal)
  - Masters vs. Slaves
  - Configuration
    - conf/masters (master only)
    - conf/slaves (master only)
    - conf/*-site.xml (all machines)
  - Starting the multi-node cluster
    - HDFS daemons
    - MapReduce daemons
  - Stopping the multi-node cluster
    - MapReduce daemons
    - HDFS daemons
  - Running a MapReduce job
- Caveats
  - java.io.IOException: Incompatible namespaceIDs
    - Workaround 1: Start from scratch
    - Workaround 2: Updating namespaceID of problematic datanodes
- What’s next?
- Related Links
- Changelog
What we want to do
In this tutorial, I will describe the required steps for setting up a multi-node Hadoop cluster using the Hadoop Distributed File System (HDFS) on Ubuntu Linux.
Are you looking for the single-node cluster tutorial? Just head over there.
Hadoop is a framework written in Java for running applications on large clusters of commodity hardware, and it incorporates features similar to those of the Google File System (GFS) and of the MapReduce computing paradigm. HDFS is a highly fault-tolerant distributed file system and, like Hadoop in general, is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications that have large data sets.
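To give you a quick taste of what "access to application data" looks like in practice, here is a minimal sketch of interacting with HDFS from the command line. The file names and HDFS paths are illustrative assumptions of mine, and the commands assume you run them from your Hadoop installation directory; the tutorial itself walks through the real setup step by step below.

    # copy a local file into HDFS (paths are hypothetical, for illustration only)
    $ bin/hadoop dfs -copyFromLocal /tmp/example.txt /user/hduser/example.txt

    # list the HDFS directory to verify the file arrived
    $ bin/hadoop dfs -ls /user/hduser

Once the multi-node cluster is up, the same commands work unchanged: HDFS transparently spreads the file's blocks across the datanodes in the cluster.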