Posts

Showing posts from November, 2014

IMPORTING DATA FROM MYSQL TO HADOOP

Steps to install MySQL
Run the command: sudo apt-get install mysql-server and give an appropriate username and password.

Using Sqoop to import into Hadoop from MySQL
Download mysql-connector-java-5.1.28-bin.jar and move it to /usr/lib/sqoop/lib using the command
user@ubuntu:~$ sudo cp mysql-connector-java-5.1.28-bin.jar /usr/lib/sqoop/lib/
Login to MySQL using the command
user@ubuntu:~$ mysql -u root -p
Login to the secure shell using the command
user@ubuntu:~$ ssh localhost
Start Hadoop using the command
user@ubuntu:~$ bin/start-all.sh
Run the command
user@ubuntu:~$ sqoop import --connect jdbc:mysql://localhost:3306/sqoop --username root --password abc --table employees -m 1
This command imports the employees table from the sqoop database of MySQL to HDFS.

Error points
Do check whether Hadoop is in safe mode using the command
user@ubuntu:~$ hadoop dfsadmin -safemode get
If it reports that safe mode is on, run the command
user@ubuntu:~$ hadoop dfsadmin -safemode leave
and again run the command
u…
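If the import succeeds, the table lands in HDFS under the current user's home directory. As a rough check, something like the following can be run (a sketch; the /user/hduser/employees path assumes the default import location and an hduser account, neither of which is stated in the post):
user@ubuntu:~$ hadoop dfsadmin -safemode get   # should now report that safe mode is OFF
user@ubuntu:~$ hadoop fs -ls /user/hduser/employees   # lists the part-m-* files written by the import
user@ubuntu:~$ hadoop fs -cat /user/hduser/employees/part-m-00000   # prints the imported rows as comma-separated text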

HIVE INSTALLATION

This section covers the installation and configuration of Hive on a standalone system as well as on a node in a cluster.
INTRODUCTION
Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. Apache Hive supports analysis of large datasets stored in Hadoop's HDFS and compatible file systems such as the Amazon S3 filesystem. It provides an SQL-like language called HiveQL (Hive Query Language) while maintaining full support for map/reduce.

Hive Installation
Installing Hive:
Browse to the link: http://apache.claz.org/hive/stable/
Click the apache-hive-0.13.0-bin.tar.gz
Save and extract it

Commands
user@ubuntu:~$ cd /usr/lib/
user@ubuntu:~$ sudo mkdir hive
user@ubuntu:~$ cd Downloads
user@ubuntu:~$ sudo mv apache-hive-0.13.0-bin /usr/lib/hive

Setting the Hive environment variable:
Commands
user@ubuntu:~$ cd
user@ubuntu:~$ sudo gedit ~/.bashrc
Copy and paste the following lines at the end of the file
# Set HIVE…
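The excerpt is cut off above; as a minimal sketch, the lines typically added to ~/.bashrc look like the following (assuming Hive was moved to /usr/lib/hive as in the commands above):
# Set HIVE_HOME
export HIVE_HOME=/usr/lib/hive
export PATH=$PATH:$HIVE_HOME/bin
Afterwards, run user@ubuntu:~$ source ~/.bashrc so the current shell picks up the new variables.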

MULTI-NODE INSTALLATION - YARN

Running Hadoop on Ubuntu Linux (Multi-Node Cluster)
From single-node clusters to a multi-node cluster
We will build a multi-node cluster by merging two or more single-node clusters into one multi-node cluster, in which one Ubuntu box will become the designated master (but also act as a slave) and the other box will become only a slave.
Prerequisites
Configure single-node clusters first; here we have used two single-node clusters. Shut down each single-node cluster with the following command
user@ubuntu:~$ bin/stop-all.sh

Networking
The easiest approach is to put both machines in the same network with regard to hardware and software configuration. Update /etc/hosts on both machines, mapping an alias to the IP address of each machine. Here we are creating a cluster of two machines: one is the master and the other is slave1.
hduser@master:~$ sudo gedit /etc/hosts
Add the following lines for a two-node cluster
10.105.15.78 master (IP address of the master node)
10.105.15.43 slave1 (IP address of the sl…
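Beyond /etc/hosts, a classic Hadoop 1.x layout (which the bin/stop-all.sh command above suggests this cluster uses) also needs the conf/masters and conf/slaves files on the master to name the nodes. A minimal sketch, assuming the master doubles as a slave as described above:
hduser@master:~$ cat conf/masters
master
hduser@master:~$ cat conf/slaves
master
slave1
Note that conf/masters actually controls where the SecondaryNameNode runs, while conf/slaves lists every machine that should run DataNode and TaskTracker daemons.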

SQOOP 1.4.4 INSTALLATION

SQOOP INSTALLATION
This section covers the installation and configuration of Sqoop.
INTRODUCTION
Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS. Sqoop automates most of this process, relying on the database to describe the schema for the data to be imported. Sqoop uses MapReduce to import and export the data, which provides parallel operation as well as fault tolerance. This document describes how to get started using Sqoop to move data between databases and Hadoop and provides reference information for the operation of the Sqoop command-line tool suite.

Stable release and Download
Sqoop is an open source software product of the Apache Software Foundation. Sqoop source code is held in the Apache Git repository…
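To make the two directions concrete, here is a sketch of the basic import and export invocations (the database, table, and directory names below are placeholders, not from the original post):
user@ubuntu:~$ sqoop import --connect jdbc:mysql://localhost:3306/mydb --username root -P --table mytable
user@ubuntu:~$ sqoop export --connect jdbc:mysql://localhost:3306/mydb --username root -P --table mytable --export-dir /user/hduser/mytable
The -P flag prompts for the password interactively, which avoids putting it on the command line.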

Steps To Setup Hadoop 2.4.0 (Single Node Cluster) on CentOS/RHEL

Steps To Setup Hadoop 2.4.0 (Single Node Cluster) on CentOS/RHEL
Apache Hadoop 2.4.0 brings significant improvements over the previous stable releases, with many improvements in HDFS and MapReduce. This how-to guide will help you install Hadoop 2.4.0 on a CentOS 6.5 system. This article doesn't include the overall configuration of Hadoop; we cover only the basic configuration required to start working with Hadoop.

Step 1. Install JAVA/JDK
Java is the primary requirement for running Hadoop on any system, so make sure you have Java installed using the following command.
# java -version
java version "1.8.0_05"
Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
Java HotSpot(TM) Client VM (build 25.5-b02, mixed mode)
If you don't have Java installed on your system, use the following link to install it first.
Install JAVA/JDK 8 on CentOS and RHEL 6/5
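As a sketch, the JDK can also be installed from the distribution's own packages (the exact package name varies by CentOS/RHEL release and may not include Java 8 on older repositories):
# yum install java-1.8.0-openjdk-devel
# java -version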
Step 2. Setup Hadoop User
We recommend creating a normal (not root) account for working with Hadoop. So create a…
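The excerpt cuts off here; as a sketch of that step (the hadoop username below is a placeholder; any non-root account name works):
# useradd hadoop
# passwd hadoop
# su - hadoop
After switching to the new user, all subsequent Hadoop commands should be run from this account.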