Installing Hadoop 2.2.0 with YARN (Yet Another Resource Negotiator) involves several steps. Here's a high-level guide to help you get started:

1. Prerequisites: Before you start, make sure you have the following prerequisites:

  • A Linux-based system (such as CentOS, Ubuntu, or Debian)
  • Java JDK installed (Hadoop 2.2.0 requires Java 6 or higher)
  • SSH access to the machine

2. Download Hadoop: Download the Hadoop 2.2.0 distribution from the Apache Hadoop website or a mirror site. You can use the following command to download it to your server:

wget https://archive.apache.org/dist/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz

3. Extract Hadoop: Extract the downloaded archive using the following command:

tar -zxvf hadoop-2.2.0.tar.gz

4. Configure Environment Variables: Add the following environment variables to your shell profile (e.g., .bashrc, .bash_profile):

export HADOOP_HOME=/path/to/hadoop-2.2.0

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

5. Configure Hadoop: Navigate to the Hadoop configuration directory:

                    cd $HADOOP_HOME/etc/hadoop

Edit the configuration files in this directory according to your cluster's settings. Key files to configure include:

  • core-site.xml: Basic Hadoop settings like filesystem URLs.
  • hdfs-site.xml: HDFS-specific settings.
  • yarn-site.xml: YARN-specific settings.
  • mapred-site.xml: MapReduce-specific settings (if needed).

6. Format HDFS: Before starting Hadoop services, you need to format the HDFS. This command will initialize the HDFS filesystem:

                    hdfs namenode -format

7. Start Hadoop Services: Start the Hadoop services using the following commands:

start-dfs.sh

start-yarn.sh

8. Access Hadoop UI: You can access the Hadoop UIs using the following URLs:

9. Run MapReduce Job: To test your Hadoop installation, you can run a MapReduce job:

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount input output

Replace input and output with your input and output paths.

10. Stop Hadoop Services: After you're done testing, stop the Hadoop services using the following commands:

stop-yarn.sh

stop-dfs.sh

Please note that Hadoop 2.2.0 is quite old, and newer versions of Hadoop offer more features, improvements, and bug fixes. It's recommended to use a more recent version of Hadoop for production environments.

Also, remember that this is a simplified overview of the installation process. Depending on your cluster's requirements and your environment, additional configurations and security considerations may be necessary.

Previous Post Next Post