Setting up a single-node Hadoop cluster on CentOS or RHEL involves several steps to install and configure the Hadoop components. Here's a basic guide to help you get started:

Note: This guide assumes you have a basic understanding of Linux and command-line usage.

  1. Prerequisites:
    • CentOS or RHEL system (preferably a clean installation)
    • Java JDK (Hadoop requires Java)
    • SSH access to your system
  2. Install Java: Ensure you have Java installed and set as the default Java version.

                    sudo yum install java-1.8.0-openjdk-devel
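After the install, you can confirm the JDK is active and locate its directory; the resolved path varies with the OpenJDK build yum installed, so treat the output below as your system's value:

java -version
# Resolve the real JDK directory behind the /usr/bin/java symlink and
# strip the trailing /bin/java to get a JAVA_HOME candidate
readlink -f /usr/bin/java | sed 's:/bin/java::'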

  3. Download and Extract Hadoop: Download the Hadoop binary package from the official Apache Hadoop website and extract it.

wget https://downloads.apache.org/hadoop/common/hadoop-x.y.z/hadoop-x.y.z.tar.gz
tar -xvf hadoop-x.y.z.tar.gz
sudo mv hadoop-x.y.z /usr/local/hadoop

  4. Set Environment Variables: Edit the .bashrc or .bash_profile file to set Hadoop-related environment variables.

                    export HADOOP_HOME=/usr/local/hadoop

                    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
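Reload the file so the variables take effect, and point Hadoop at your JDK as well, since the startup scripts read JAVA_HOME from hadoop-env.sh. A minimal sketch, assuming the OpenJDK 8 package installed above (the JDK path is an example; verify it with the readlink command shown earlier):

source ~/.bashrc
# The JDK path below is an example for the OpenJDK 8 package; adjust it
# to whatever readlink reported. sudo tee is used because /usr/local/hadoop
# is root-owned after the mv above.
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk' | sudo tee -a $HADOOP_HOME/etc/hadoop/hadoop-env.sh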

  5. Configure Hadoop: Navigate to the etc/hadoop directory in your Hadoop installation and configure core-site.xml, hdfs-site.xml, and mapred-site.xml as needed. Here's a minimal setup:

core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
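By default HDFS keeps its data under /tmp (controlled by the hadoop.tmp.dir property), which many systems clear on reboot. Optionally, add a property like the following inside the <configuration> element, pointing at a persistent, writable directory of your choice (the path here is only an example):

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
    </property>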

hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
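Note that running MapReduce on YARN also requires the NodeManager's shuffle service, which the files above don't configure. Create a minimal yarn-site.xml in the same directory:

yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>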

  6. Format HDFS: Before starting Hadoop for the first time, format the NameNode. This initializes the directory where HDFS stores its metadata.

                    hdfs namenode -format
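The start-dfs.sh and start-yarn.sh scripts in the next step connect to localhost over SSH, even on a single node. If ssh localhost prompts for a password, set up key-based login first (a standard OpenSSH sketch):

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Should now log you in without a password prompt
ssh localhost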

  7. Start Hadoop Services: Start the Hadoop daemons using the following commands:

                    start-dfs.sh

                    start-yarn.sh
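To confirm the daemons came up, list the running Java processes with jps (it ships with the JDK):

jps
# On a healthy single-node setup you should see: NameNode, DataNode,
# SecondaryNameNode, ResourceManager, and NodeManager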

  8. Access Hadoop UI: Hadoop services provide web-based user interfaces. Access them in your web browser:

    • NameNode UI: http://localhost:9870 (Hadoop 3.x; use port 50070 on Hadoop 2.x)
    • YARN ResourceManager UI: http://localhost:8088

  9. Run a Test Job: You can run a test job to ensure that Hadoop is working correctly.

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-x.y.z.jar pi 2 5
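If the job complains that your user's home directory is missing in HDFS, create it first (a common first-run step; the directory name comes from your Unix username):

hdfs dfs -mkdir -p /user/$(whoami)
hdfs dfs -ls /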

That's a basic setup of a single-node Hadoop cluster on CentOS or RHEL. Keep in mind that this setup is meant for learning and testing purposes. For a production environment or more advanced setups, you would need to consider high availability, security, and performance optimizations.

 
