Installing Hadoop 2.2.0 with YARN (Yet Another Resource Negotiator) involves several steps. Here's a high-level guide for a single-node setup to help you get started:
1. Prerequisites: Before you start, make sure you have the following prerequisites:
- A Linux-based system (such as CentOS, Ubuntu, or Debian)
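- A Java JDK installed (Hadoop 2.2.0 requires Java 6 or higher; it was released before Java 8, so a Java 6 or 7 JDK is the safest choice)
- SSH installed, with passwordless SSH to localhost set up (the start and stop scripts use SSH to launch the daemons)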
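The Java and SSH prerequisites can be checked from the shell before going further. The sketch below is only an illustration: parse_java_major and can_ssh_localhost are hypothetical helper names, not part of Hadoop, and the version parsing assumes the usual `version "x.y..."` line printed by `java -version`.

```shell
#!/usr/bin/env bash
# Sketch: helpers for checking the Java and SSH prerequisites.
# The function names are illustrative and not part of Hadoop.

# Reads `java -version` output on stdin and prints the major version
# (7 for "1.7.0_45", 11 for "11.0.2"). Assumes a dotted version string.
parse_java_major() {
  sed -n 's/.*version "\([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1 \2/p' | head -n 1 | {
    read -r first second
    # Old-style versions start with "1." and carry the major number second.
    if [ "$first" = "1" ]; then echo "$second"; else echo "$first"; fi
  }
}

# Succeeds only if SSH to localhost works without a password prompt.
can_ssh_localhost() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true 2>/dev/null
}
```

To use it, run java -version 2>&1 | parse_java_major (java prints its version to stderr) and can_ssh_localhost || echo "passwordless SSH to localhost is not set up". If the SSH check fails, generating a key with ssh-keygen and appending the public key to ~/.ssh/authorized_keys is the usual fix.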
2. Download Hadoop: Download the Hadoop 2.2.0 distribution from the Apache Hadoop website or a mirror site. You can use the following command to download it to your server:
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
3. Extract Hadoop: Extract the downloaded archive using the following command:
tar -zxvf hadoop-2.2.0.tar.gz
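4. Configure Environment Variables: Add the following environment variables to your shell profile (e.g., .bashrc or .bash_profile), adjusting the paths to match your system, then reload the profile or open a new shell:
export JAVA_HOME=/path/to/jdk
export HADOOP_HOME=/path/to/hadoop-2.2.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin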
5. Configure Hadoop: Navigate to the Hadoop configuration directory:
cd $HADOOP_HOME/etc/hadoop
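Edit the configuration files in this directory according to your cluster's settings. Key files to configure include:
- hadoop-env.sh: Environment settings for the daemons, such as JAVA_HOME.
- core-site.xml: Basic Hadoop settings like filesystem URLs.
- hdfs-site.xml: HDFS-specific settings.
- yarn-site.xml: YARN-specific settings.
- mapred-site.xml: MapReduce-specific settings. Hadoop 2.2.0 ships only mapred-site.xml.template, so copy it to mapred-site.xml and set mapreduce.framework.name to yarn if you want MapReduce jobs to run on YARN.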
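For a single-node (pseudo-distributed) setup, the four site files usually need only one property each. The sketch below writes such a minimal configuration. The helper names write_site_file and write_single_node_conf are hypothetical, and the NameNode address hdfs://localhost:9000 is an assumption you should adjust for your machine.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal single-node (pseudo-distributed) configuration.
# write_site_file and write_single_node_conf are illustrative helper names.

# write_site_file FILE NAME VALUE [NAME VALUE ...]
# Writes a Hadoop *-site.xml file containing the given properties.
write_site_file() {
  local file=$1
  shift
  {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    while [ "$#" -ge 2 ]; do
      echo "  <property><name>$1</name><value>$2</value></property>"
      shift 2
    done
    echo '</configuration>'
  } > "$file"
}

# write_single_node_conf CONF_DIR
# Overwrites the four site files in CONF_DIR; back up existing files first.
write_single_node_conf() {
  local conf_dir=$1
  # Assumed NameNode address; adjust host and port for your machine.
  write_site_file "$conf_dir/core-site.xml" fs.defaultFS hdfs://localhost:9000
  # A single DataNode cannot hold more than one replica of each block.
  write_site_file "$conf_dir/hdfs-site.xml" dfs.replication 1
  # The shuffle service lets NodeManagers serve map output to reducers.
  write_site_file "$conf_dir/yarn-site.xml" \
    yarn.nodemanager.aux-services mapreduce_shuffle
  # Run MapReduce jobs on YARN instead of the local job runner.
  write_site_file "$conf_dir/mapred-site.xml" mapreduce.framework.name yarn
}
```

Calling write_single_node_conf "$HADOOP_HOME/etc/hadoop" would replace the existing site files, so keep copies if you have already edited them. A real cluster needs more settings than this (data directories, memory limits, hostnames), so treat it as a starting point only.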
6. Format HDFS: Before starting the Hadoop services for the first time, format the HDFS NameNode. This initializes the filesystem metadata. Run it only once, because reformatting erases the existing HDFS metadata:
hdfs namenode -format
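Because formatting wipes the NameNode metadata, it can help to guard the command so it only runs on a fresh directory. This is a minimal sketch: namenode_is_formatted and format_once are hypothetical helpers, and they rely on the fact that a formatted NameNode directory contains a current/VERSION file. You pass in the directory yourself; if neither dfs.namenode.name.dir nor hadoop.tmp.dir is configured, it defaults to /tmp/hadoop-<username>/dfs/name.

```shell
#!/usr/bin/env bash
# Sketch: format the NameNode only when its directory is still unformatted.
# The function names are illustrative and not part of Hadoop.

# namenode_is_formatted NAME_DIR
# A formatted NameNode directory contains current/VERSION.
namenode_is_formatted() {
  [ -f "$1/current/VERSION" ]
}

# format_once NAME_DIR
# Skips formatting when NAME_DIR already holds NameNode metadata.
format_once() {
  if namenode_is_formatted "$1"; then
    echo "already formatted: $1"
  else
    hdfs namenode -format
  fi
}
```

For example, format_once "/tmp/hadoop-$(id -un)/dfs/name" would format a default installation on the first call and do nothing on later calls.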
7. Start Hadoop Services: Start the Hadoop services using the following commands:
start-dfs.sh
start-yarn.sh
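After the start scripts return, the jps tool from the JDK shows which Java daemons are actually running. On a single node you would expect NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager. The sketch below reports any that are absent; missing_daemons is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: report expected Hadoop daemons that do not appear in jps output.
# missing_daemons is an illustrative helper, not part of Hadoop.

# Reads `jps` output on stdin and prints each expected daemon that is
# not listed, one per line. Prints nothing when all five are running.
missing_daemons() {
  local listing daemon
  listing=$(cat)
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # jps prints "<pid> <main class>"; -w matches the class as a whole
    # word, so SecondaryNameNode is not mistaken for NameNode.
    echo "$listing" | grep -qw "$daemon" || echo "$daemon"
  done
}
```

Run it as jps | missing_daemons. If a daemon is reported missing, its log file under $HADOOP_HOME/logs is the first place to look.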
8. Access Hadoop UI: You can access the Hadoop UIs using the following URLs:
- HDFS NameNode UI: http://localhost:50070/
- YARN ResourceManager UI: http://localhost:8088/
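If the machine has no browser, the same URLs can be probed from the command line. This sketch assumes curl is installed; ui_reachable is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: check that a Hadoop web UI answers over HTTP.
# ui_reachable is an illustrative helper; it assumes curl is installed.

# ui_reachable URL
# Succeeds if the URL responds without an HTTP error, following redirects
# (-L) and treating HTTP status codes of 400 and above as failures (-f).
ui_reachable() {
  curl -sfL -o /dev/null --max-time 5 "$1"
}
```

For example, ui_reachable http://localhost:50070/ && echo "NameNode UI is up", and likewise for http://localhost:8088/.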
9. Run MapReduce Job: To test your Hadoop installation, you can run a MapReduce job:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount input output
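Replace input and output with your input and output paths in HDFS. The input path must already exist and contain the files to process, and the output path must not exist yet, or the job will fail.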
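A complete test run also has to put some input into HDFS first and read the result afterwards. The sketch below strings those steps together, using Hadoop's own XML configuration files as sample text. The names run and wordcount_smoke_test are hypothetical, and the DRY_RUN variable is an assumption of this sketch that lets you print the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch: end-to-end wordcount test against a running single-node setup.
# run and wordcount_smoke_test are illustrative helpers, not part of Hadoop.

# run CMD...
# Prints the command when DRY_RUN is set, otherwise executes it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# wordcount_smoke_test HADOOP_HOME
wordcount_smoke_test() {
  local home=$1
  # Relative HDFS paths resolve under /user/<name>, so create that first.
  run hdfs dfs -mkdir -p "/user/$(id -un)/input"
  # Upload the XML config files as sample input text.
  run hdfs dfs -put "$home"/etc/hadoop/*.xml input
  run hadoop jar \
    "$home/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar" \
    wordcount input output
  # Each reducer writes one part-r-NNNNN file into the output directory.
  run hdfs dfs -cat 'output/part-r-*'
}
```

Try DRY_RUN=1 first to review the commands, then unset it and call wordcount_smoke_test "$HADOOP_HOME". Before repeating the run, remove the old results with hdfs dfs -rm -r output.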
10. Stop Hadoop Services: After you're done testing, stop the Hadoop services using the following commands:
stop-yarn.sh
stop-dfs.sh
Please note that Hadoop 2.2.0 was released in October 2013, and newer versions of Hadoop offer more features, improvements, and bug fixes. For production environments, use a more recent, supported version of Hadoop.
Also, remember that this is a simplified overview of the installation process. Depending on your cluster's requirements and your environment, additional configurations and security considerations may be necessary.