Setting up a Hadoop single-node cluster on CentOS or RHEL involves several steps to install and configure Hadoop components. Here's a basic guide to help you get started:
Note: This guide assumes you have a basic understanding of Linux and command-line usage.
- Prerequisites:
- CentOS or RHEL system (preferably a clean installation)
- Java JDK (Hadoop requires Java 8 or later)
- SSH access to your system, with passwordless SSH to localhost (the Hadoop start scripts rely on it; see the key setup below)
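The start-dfs.sh and start-yarn.sh scripts used later in this guide log in to localhost over SSH, so key-based login should be in place first. A minimal sketch, assuming sshd is running and the default key paths:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa         # key pair with an empty passphrase
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys  # authorize the key for local logins
chmod 600 ~/.ssh/authorized_keys                 # SSH rejects keys with loose permissions
ssh localhost exit                               # should now succeed without a password prompt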
- Install Java: Ensure you have Java installed and set as the default Java version.
sudo yum install java-1.8.0-openjdk-devel
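To verify the installation and locate the JDK (useful when setting JAVA_HOME later), you can run:
java -version               # should report an OpenJDK 1.8.x runtime
readlink -f $(which java)   # resolves the real install path, typically under /usr/lib/jvm on CentOS/RHEL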
- Download and Extract Hadoop: Download the Hadoop binary package from the official Apache Hadoop website and extract it, replacing x.y.z below with an actual release number.
wget https://downloads.apache.org/hadoop/common/hadoop-x.y.z/hadoop-x.y.z.tar.gz
tar -xvf hadoop-x.y.z.tar.gz
sudo mv hadoop-x.y.z /usr/local/hadoop
- Set Environment Variables: Edit the .bashrc or .bash_profile file to set Hadoop-related environment variables.
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
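Then reload the file so the variables take effect in your current shell:
source ~/.bashrc   # or source ~/.bash_profile, whichever file you edited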
- Configure Hadoop: Navigate to the etc/hadoop directory in your Hadoop installation and configure core-site.xml, hdfs-site.xml, and mapred-site.xml as needed; two further tweaks (hadoop-env.sh and yarn-site.xml) follow these examples. Here's a minimal setup:
core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
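Two more tweaks are commonly needed before this configuration works end to end. First, Hadoop's scripts read JAVA_HOME from etc/hadoop/hadoop-env.sh; the path below assumes the CentOS OpenJDK 8 package location, so verify it against the readlink output from earlier:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk   # add to etc/hadoop/hadoop-env.sh; adjust to your JDK path
Second, running MapReduce on YARN requires the shuffle auxiliary service, set in yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>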
- Format HDFS: Before starting Hadoop for the first time, format the NameNode's storage. This is a one-time step; reformatting later wipes the filesystem's metadata.
hdfs namenode -format
- Start Hadoop Services: Start the Hadoop daemons using the following scripts (they live in $HADOOP_HOME/sbin, which is already on your PATH):
start-dfs.sh
start-yarn.sh
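You can confirm that all five daemons are up with the JDK's jps tool:
jps   # expect NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager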
- Access Hadoop UI: Hadoop services provide web-based user interfaces; the ports below are the Hadoop 3.x defaults. Access them using your web browser:
- HDFS Namenode UI: http://localhost:9870
- YARN Resource Manager UI: http://localhost:8088
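These URLs work from the machine itself. To reach them from another host, you may need to open the ports, since firewalld blocks them by default on CentOS/RHEL; assuming the default zone:
sudo firewall-cmd --permanent --add-port=9870/tcp --add-port=8088/tcp
sudo firewall-cmd --reload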
- Run a Test Job: Run one of the bundled example jobs to confirm that HDFS and YARN work end to end; the pi example below launches 2 map tasks with 5 samples each.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-x.y.z.jar pi 2 5
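You can also exercise HDFS directly. The sketch below assumes your Linux user name doubles as the HDFS home directory under /user:
hdfs dfs -mkdir -p /user/$USER          # create your HDFS home directory
hdfs dfs -put /etc/hosts /user/$USER/   # copy a local file into HDFS
hdfs dfs -ls /user/$USER                # list it back
When you are finished, stop the daemons in the reverse order they were started:
stop-yarn.sh
stop-dfs.sh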
That completes a basic single-node (pseudo-distributed) Hadoop cluster on CentOS or RHEL. Keep in mind that this setup is meant for learning and testing; for a production environment or more advanced setups, you would also need to consider high availability, security, and performance optimization.