Setting up a single-node Hadoop cluster on CentOS or RHEL involves several steps to install and configure the Hadoop components. Here's a basic guide to help you get started:

Note: This guide assumes you have a basic understanding of Linux and command-line usage.

  1. Prerequisites:
    • CentOS or RHEL system (preferably a clean installation)
    • Java JDK (Hadoop requires Java)
    • SSH access to your system
  2. Install Java: Ensure you have Java installed and set as the default Java version.

                    sudo yum install java-1.8.0-openjdk-devel
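After the install, you can confirm the JDK is active and locate its directory; the resolved path varies with the OpenJDK build yum installed, so treat the output below as your system's value:

java -version
# Resolve the real JDK directory behind the /usr/bin/java symlink and
# strip the trailing /bin/java to get a JAVA_HOME candidate
readlink -f /usr/bin/java | sed 's:/bin/java::'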

  3. Download and Extract Hadoop: Download the Hadoop binary package from the official Apache Hadoop website and extract it.

wget https://downloads.apache.org/hadoop/common/hadoop-x.y.z/hadoop-x.y.z.tar.gz
tar -xvf hadoop-x.y.z.tar.gz
sudo mv hadoop-x.y.z /usr/local/hadoop

  4. Set Environment Variables: Edit the .bashrc or .bash_profile file to set Hadoop-related environment variables.

                    export HADOOP_HOME=/usr/local/hadoop

                    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
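Reload the file so the variables take effect, and point Hadoop at your JDK as well, since the startup scripts read JAVA_HOME from hadoop-env.sh. A minimal sketch, assuming the OpenJDK 8 package installed above (the JDK path is an example; verify it with the readlink command shown earlier):

source ~/.bashrc
# The JDK path below is an example for the OpenJDK 8 package; adjust it
# to whatever readlink reported. sudo tee is used because /usr/local/hadoop
# is root-owned after the mv above.
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk' | sudo tee -a $HADOOP_HOME/etc/hadoop/hadoop-env.sh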

  5. Configure Hadoop: Navigate to the etc/hadoop directory in your Hadoop installation and configure core-site.xml, hdfs-site.xml, and mapred-site.xml as needed. Here's a minimal setup:

core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
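By default HDFS keeps its data under /tmp (controlled by the hadoop.tmp.dir property), which many systems clear on reboot. Optionally, add a property like the following inside the <configuration> element, pointing at a persistent, writable directory of your choice (the path here is only an example):

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
    </property>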

hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
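Note that running MapReduce on YARN also requires the NodeManager's shuffle service, which the files above don't configure. Create a minimal yarn-site.xml in the same directory:

yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>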

  6. Format HDFS: Before starting Hadoop for the first time, format the NameNode. This initializes the directory where HDFS stores its metadata.

                    hdfs namenode -format
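The start-dfs.sh and start-yarn.sh scripts in the next step connect to localhost over SSH, even on a single node. If ssh localhost prompts for a password, set up key-based login first (a standard OpenSSH sketch):

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Should now log you in without a password prompt
ssh localhost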

  7. Start Hadoop Services: Start the Hadoop daemons using the following commands:

                    start-dfs.sh

                    start-yarn.sh
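To confirm the daemons came up, list the running Java processes with jps (it ships with the JDK):

jps
# On a healthy single-node setup you should see: NameNode, DataNode,
# SecondaryNameNode, ResourceManager, and NodeManager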

  8. Access Hadoop UI: Hadoop services provide web-based user interfaces. Access them in your web browser:

    • NameNode UI: http://localhost:9870 (Hadoop 3.x; use port 50070 on Hadoop 2.x)
    • YARN ResourceManager UI: http://localhost:8088

  9. Run a Test Job: You can run a test job to ensure that Hadoop is working correctly.

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-x.y.z.jar pi 2 5
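If the job complains that your user's home directory is missing in HDFS, create it first (a common first-run step; the directory name comes from your Unix username):

hdfs dfs -mkdir -p /user/$(whoami)
hdfs dfs -ls /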

That's a basic setup of a single-node Hadoop cluster on CentOS or RHEL. Keep in mind that this setup is meant for learning and testing purposes. For a production environment or more advanced setups, you would need to consider high availability, security, and performance optimizations.

 
